THE UNIVERSITY OF MICHIGAN COMPUTING RESEARCH LABORATORY

ON THE DESIGN AND ANALYSIS OF REAL-TIME COMPUTERS

C. M. Krishna

CRL-TR-37844

September 1984

Room 1079, East Engineering Building, Ann Arbor, Michigan 48109 USA. Tel: (313) 763-8000

Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the funding agency.

ON THE DESIGN AND ANALYSIS OF REAL-TIME COMPUTERS

by
C. M. Krishna

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Electrical Engineering) in The University of Michigan, 1984.

Doctoral Committee:
Associate Professor Kang G. Shin, Chairman
Professor Kuei Chuang
Associate Professor Trevor N. Mudge
Associate Professor Tony C. Woo

TABLE OF CONTENTS

ACKNOWLEDGMENT ..... ii
LIST OF FIGURES ..... v
LIST OF TABLES ..... viii

CHAPTER

INTRODUCTION
1. INTRODUCTION ..... 2
2. REAL-TIME SYSTEMS ..... 8

THE PERFORMANCE MEASURES
3. THE PERFORMANCE MEASURES ..... 17
   A. Terminology and Notation
   B. Definition of the Performance Measures
   C. Obtaining the Hard Deadline
   D. Obtaining the Finite Cost Function
   E. Remark on Finite Cost Functions
   F. Allowed State-Space and its Decomposition
   G. A Final Remark on this Chapter
4. CASE STUDY ..... 41
   A. The Controlled Process
   B. Derivation of Performance Measures
   C. Allowed State Space
   D. Designation of Subspaces
   E. Finite Cost Functions

iii

APPLICATIONS OF THE MEASURES
5. THE NUMBER-POWER TRADEOFF ..... 65
   A. System Description and Analysis
   B. Numerical Results and Discussion
   C. Extension
6. SYNCHRONIZATION AND FAULT-MASKING ..... 89
   A. Synchronization
   B. Voting and Byzantine Generals Algorithm
   C. Reconfigurable and Non-Reconfigurable Systems
7. SCHEDULING TASKS WITH A QUICK RECOVERY FROM FAILURE ..... 131
   A. Assumptions and Problem Statement
   B. Main Result
   C. Example
   D. Discussion

DISCUSSION
8. DISCUSSION ..... 163

APPENDIX ..... 167
REFERENCES ..... 169

iv

LIST OF FIGURES

Figure
1. A Typical Real-Time Control System ..... 9
2. Schematic Decomposition of a Real-Time Control Computer ..... 12
3. Illustration of Cost Functions ..... 27
4. Aircraft Control System ..... 42
5. Definition of Aircraft Angles ..... 46
6. The Approximate State Equations ..... 51
7. Elevator Deflections without Abnormality ..... 53
8. Elevator Deflections with Abnormality ..... 54
9(a). Allowed State Space: Altitude ..... 57
9(b). Allowed State Space: Descent Rate ..... 58
9(c). Allowed State Space: Pitch Angle ..... 59
9(d). Allowed State Space: Pitch Angle Rate ..... 60
10. Landing Job Cost Function ..... 63
11. A Real-Time Multiprocessor ..... 67
12. Markov Model for Number-Power Tradeoff ..... 69
13. Dependence of Probability of Dynamic Failure on Processor Number when Tasks are Non-Critical ..... 71

v

14. Dependence of Probability of Dynamic Failure on Processor Number and Hard Deadlines ..... 73
15. Minimum Achievable Probability of Dynamic Failure ..... 75
16. Race Between Static and Non-Static Components of Probability of Dynamic Failure ..... 76
17. Mean Cost for Configurations of Figure 15 ..... 79
18. Probability of Dynamic Failure for a Constant Number-Power Product ..... 80
19. Mean Costs for a Constant Number-Power Product ..... 84
20. Functional Block Diagram of a Real-Time System ..... 90
21. Component of a Fault-Tolerant Phase-Locked Clock ..... 94
22. Trigger Graph: Example ..... 96
23. Scenarios in Example ..... 102
24. Trigger Graph for Scenarios in Figure 23 ..... 103
25. Davies and Wakerly's Multistage Synchronizer ..... 112
26. Software Synchronization Overhead ..... 116
27. Comparison of Reconfigurable and Non-Reconfigurable Systems ..... 124
28. Comparison of Reconfigurable and Non-Reconfigurable Systems ..... 130
29. A Typical Pipe ..... 135
30. Effect of Ghosts on Primary Schedules: An Example ..... 141
31. Illustrating Condition C2 ..... 144

vi

32. Nesting a Ghost inside a Primary ..... 147
33. Amalgamating Ghost and Contingency Schedules ..... 151
34. Obtaining S ..... 156
35. Schedules when Ghosts are Activated ..... 157
36. Piecing Together a Contingency Schedule ..... 159
37. All Clones Treated as Primary ..... 161

vii

LIST OF TABLES

Table
1. Difference between Central and Peripheral Areas ..... 13
2. Comparison of Performance Measures ..... 31
3. Feedback Equation Constants ..... 45
4. Weights for the Performance Index ..... 49
5. Minimum Feasible Number-Power Products ..... 78
6. Maximum Cluster Size Permissible for Software Synchronization ..... 117
7. Overhead of Byzantine Generals Algorithm: Lower Bound for SIFT ..... 122

viii

PART I

INTRODUCTION

CHAPTER 1

INTRODUCTION

There are indications that progress toward higher chip density, lower cost, and greater reliability of microprocessors and memories will continue into the next decade. This trend naturally leads to the design of multiprocessors that are faster and more reliable than their uniprocessor counterparts. However, the use of multiple microprocessors to speed up general-purpose computations requires the solution of such important problems as task partitioning, interconnection/intercommunication, synchronization, reliability, I/O interface and handling, software structure and programmability, etc. The efficacy of a multiprocessor depends crucially on the application tasks that it executes, and no single multiprocessor can at present embody the optimal solution to the above issues for general-purpose computations. Consequently, the tendency has been to develop special-purpose multiprocessors. One example is the real-time multiprocessor, whose primary function is the control of critical real-time systems, e.g., aircraft, spacecraft, nuclear reactors, and power distribution and monitoring. The use of multiple processors/memories for real-time control is motivated by the potential for high operating speed and improved reliability through component multiplicity [1], [2]. A real-time control system comprises two components: a controlled process and a control computer. Despite their synergistic relationship, these two components

have been designed and analyzed in isolation: the former by control scientists and the latter by computer designers. Moreover, control computer (called for brevity the controller) design has usually relied on ad hoc or empirical methods, whereas there has been significant progress in the theory and design of controlled processes. In order to narrow this gap and provide a bridge between these two components, this dissertation considers the controller with the controlled processes taken into account. A computer controller has three communicating functions: data acquisition, data processing, and output. The data acquisition function is responsible for gathering input (feedback) data from sensors, input panels, and other associated equipment; the processing function, performed by the computer (in our case the multiprocessor), generates output control/display signals from the input data; and the output function sends the processed results to mechanical actuators, displays, and other output devices. The system may thus logically be regarded as a three-stage pipe. The controller software in the processing section consists of a set of tasks, each of which corresponds to some job to be performed repetitively in response to particular sets of environmental stimuli (triggers). These include both regular task triggers according to a predetermined schedule and unexpected, situation-dependent task triggers. The set of tasks to be executed by the controller is predetermined, and the stochastic nature and behavior of the software are known in advance -- at least in outline -- to the designer. This fact makes it both easier and more necessary to obtain a reasonably good performance analysis of the system. The determining characteristic of a real-time multiprocessor's performance is a combination of reliability and high throughput. The throughput requirements arise

from the need for quick system response to environmental stimuli. Speed is of the essence in a real-time controller, since failure can occur not only through massive hardware failures in the system, but also on account of the system's not responding quickly enough to events in the environment. As a result of these special performance requirements, performance measures used to characterize general-purpose uniprocessor systems are no longer appropriate for real-time multiprocessors. Conventional throughput, reliability, and availability by themselves have little meaning in the context of control; a suitable combination of these is necessary. New performance measures are required: measures that are congruent to the application, that permit the expression of specifications reflecting, without contortion, true system characteristics and application requirements, and that allow an objective comparison of rival systems for particular applications. We cannot stress too heavily that it is meaningless to speak of the performance of a computer out of the context of its application. The form the performance measures take must reflect the needs of the application, and the computer system must be modelled within this context. The multiprocessor controller and the controlled process form a synergistic pair, and any effort to study the one must take account of the needs of the other. It is important that performance measures depend on variables that can be definitively estimated or objectively measured. It is our policy in this dissertation, therefore, to always base performance indices on experimentally measurable quantities: controller response times for the various system tasks.

It must also be realized that there is a distinction to be drawn between the measurement of performance parameters and their interpretation. In parameter measurement, we are concerned, for example, with the ease and accuracy with which the parameter can be measured. Interpretation, on the other hand, consists of a procedure to integrate the results of the measurement into a complete picture of the computer's performance. Different parameter values (reliability, throughput, etc.) can depend on one another in quite complex ways: we do not have the luxury of assuming that they are independent of each other. We have either to present computer performance as a vector (which makes comparison between different systems difficult) or to derive an objective metric for the performance vector. It is the purpose of our research to develop one such metric and then use it for the design and analysis of real-time computers. Performance measures that partially meet real-time requirements have been suggested by the following authors. Beaudry [3] considers measures emanating from the volume of computation delivered by a computer system over a given period of operation. Mine and Hatayama [4] consider job-related reliability, by which they mean the probability that the system will successfully complete a certain job. Huslende [5] attempts to be as general as possible, and presents what amounts to a re-statement of Markov modeling with traditional measures. Chou and Abraham [6] present performance-availability models, and Castillo and Siewiorek [7] performance-reliability models. Osaki and Nishio [8] consider the "reliability of information", by which they mean the expected percentage of wrong outputs per unit time in steady state. All these measures consider the computer system in isolation, i.e., without explicit regard to the requirements of the operating environment. For this reason,

they are quite unsuitable for use in real-time control situations. Among all existing performance measures, Meyer's performability [9] seems to meet, though in an abstract form, the real-time requirements discussed above. His measure explicitly links the application with the computer by listing "accomplishment levels", which are expressions of how well the computer has performed within the context of the application. His work focuses on the development of a framework for modeling and performance evaluation, rather than on a methodology for deriving the performance measures themselves. No guidelines are given for appropriately specifying the accomplishment levels: what is provided is a set of mathematical tools for their computation, once they have been defined. Meyer's more recent work continues this trend, developing the theory of stochastic Petri nets. By contrast, we focus, in part of this dissertation, on presenting a methodology for objectively characterizing and determining controller performance. If one wished to translate our work into the terms of Meyer's performability, we show how to derive a set of uncountably many accomplishment levels that are completely objective and capable of definitive estimation and/or measurement. It is this that makes our measures complementary to Meyer's. The next step is to apply the performance measures to the design and analysis of real-time computers. We consider the problems of trading off the number of processors in a system against their processing speed, of reliable synchronization and fault-location, and of generating fault-tolerant schedules for real-time tasks.

This dissertation is organized as follows. In Chapter 2, controlled systems are discussed, and Chapter 3 introduces our performance measures. Chapter 4 contains two examples to show how to determine the performance measures: one is an idealized motion control problem and the other is a more realistic example, the aircraft landing problem. Chapters 5 to 7 treat allied problems in real-time computer design, in which the use of our measures is demonstrated. In Chapter 5, we introduce and consider the number-power tradeoff. In Chapter 6, synchronization and fault-masking in real-time systems are considered. In Chapter 7, we provide an algorithm for the generation of fault-tolerant schedules. The dissertation concludes with Chapter 8.

CHAPTER 2

REAL-TIME SYSTEMS

Figure 1 shows the block diagram of a typical real-time control system. The inputs to the control computer come from sensors that provide data about the controlled process, and from the environment. These data are typically fed to the control computer at regular intervals. Data rates are usually low: generally fewer than 20 words a second for each sensor. The job list represents the fact that all the control software is pre-determined and partitioned into individual jobs. Central to the operation of the system is the trigger generator that initiates execution of one or more of the control programs. In most systems, this is physically part of the controller itself, but we separate them here for purposes of clarity. Triggers can be classed into three categories.

(1) Time-generated trigger: These are generated at regular intervals, and lead to the corresponding controller job(s) being initiated at regular intervals. In control-theoretic terms, these are open-loop triggers.

(2) State-generated trigger: These are closed-loop triggers, generated whenever the system is in a particular set of states. A simple example is a thermostat that switches on or off according to the ambient temperature. For practicality, it might be necessary to space these triggers by more than a specified minimum

Figure 1. A Typical Real-Time Control System.

duration. If time is regarded as an implicit state variable, the time-generated trigger is a special case of the state-generated trigger. One can also have combinations of the two.

(3) Operator-generated trigger: The operator can generally over-ride the automatic systems, generating and cancelling triggers at will.

The output of the controller is fed to the actuators and/or the display panel(s). Since the actuators are mechanical devices and the displays are meant as a human interface, the data rates here are usually very low. Indeed, a control-computer system generally exhibits a fundamental dichotomy from many points of view. Firstly, the I/O is carried out at rather low rates (the only exceptions to this that we know of are control systems that depend on real-time image processing: such applications have extremely high input data rates), while the computations have to be carried out at very high rates owing to real-time constraints on control. Secondly, the complexity of the data processing carried out at the sensors and the actuators is much less than that carried out in the main data-processing area. Thirdly, the sensors, actuators, and the associated equipment are entirely dedicated to the performance of a particular set of tasks, while the hardware in the region where the complex data processing takes place is usually not dedicated. It is therefore possible to logically partition real-time computer systems into central and peripheral areas. The peripheral area consists of the sensors, actuators, displays, and the associated processing elements used for the pre-processing and formatting of data to be put into the central area, and for the "unpacking" of data put out by the central area to the actuators and/or displays. The central area consists of the processors and associated hardware where all the higher-level

computation takes place. Designing the peripheral area is relatively straightforward; the most difficult design problems that arise in these systems usually concern the central area. Figure 2 and Table 1 emphasize these points. A control system executes "missions." These are periods of operation between successive periods of maintenance. In the case of aircraft, a mission is usually a single flight. The operating interval can sometimes be divided into consecutive sections that can be distinguished from each other. These sections are called phases. For example, Meyer et al. [11] define the following four distinct phases in the mission lifetime of a civilian aircraft:

(a) Takeoff/cruise until VHF Omnirange (VOR)/Distance Measuring Equipment (DME) out of range.

(b) Cruise until VOR/DME in range again.

(c) Cruise until landing is to be initiated.

(d) Landing.

The current phase of the controller partially determines its job load, job mix, job priorities, and so on. A real-time system typically has to function under more constraints than does its general-purpose counterpart. Firstly, there are hard deadlines which, if missed, can lead to catastrophic failure. Timing is therefore crucial to job execution. Secondly, there are physical constraints that are not quite so restricting for the general-purpose computer. Examples are weight and power consumption. The applications software has the following properties.

Figure 2. Schematic Decomposition of a Real-Time Control Computer.

Peripheral Area                     | Central Area
------------------------------------|------------------------------------------
Low baud rates                      | High baud rates
Complete dedication                 | Complete generality of function
Low-capability processors           | High-capability processors
Simple interconnection structure    | Complex interconnection structure
Almost totally decoupled processors | Processors highly coupled in many cases
Trivial executive software          | Complex executive software

Table 1. Difference between Central and Peripheral Areas.

(1) The interaction between individual processes is minimal.

(2) The effects of (computing) processes upon one another are well understood.

(3) Clear lines of authority are recognized.

(4) Clear lines of information flow are recognized.

(5) The products of the (computing) processes are well defined.

These are precisely the five conditions for efficiency in a distributed system as listed by Fox [12]. Because of this, and also due to their potentially high reliability, distributed systems are particularly suited to real-time use. Also, the problems that arise when one attempts to partition programs in general-purpose applications for implementation on a distributed computer do not usually arise in the real-time context. The software for a control computer is not so much a single partitionable package as a set of cleanly interacting subroutines. Macro-instruction languages show much promise in this context [13]. The constraints on real-time systems, as well as the properties of the applications software, have a very great influence on the system architecture and the executive software. The overall computer system has to be much more reliable than any of its components, so fault-tolerance is essential. Massive replication of hardware is commonplace, as are high interconnection-link bandwidths. The system must be as symmetric as possible so that reconfiguration is easy. The nature of the executive software must reflect constraints on time and resources. The executive is responsible for the control of queues at shared resources, for the scheduling of events, for the handling of interrupts, and for the allocation of

memory. While all these tasks are common to general-purpose systems, the existence of hard deadlines makes the efficient execution of such activities imperative. The designer of the real-time system does not have the luxury of assuming that occasional serious degradation of performance is acceptable, if unfortunate. An additional important task of the executive is fault-handling and recovery. This includes reconfiguration where that is possible, and the rescheduling of tasks upon processor failure. Here again, the constraints on time make this a difficult problem.

PART II

THE PERFORMANCE MEASURES

CHAPTER 3

THE PERFORMANCE MEASURES

A. Terminology and Notation

A real-time computer executes pre-defined control jobs repeatedly, upon environmental or other stimuli. A job is a well-defined stretch of software, e.g., a subroutine. Each job maps into one or more tasks. The mapping is determined by the current state of the controlled process. This is further clarified later in this chapter. System response time is defined as the time between the initiation/triggering of a control job and the actuator and/or display output that results. This quantity is the sum of controller response time and actuation time. Environmental or other occurrences trigger the tasks, a unique version being created as a result. A version is said to be extant as long as it continues to execute in the system. Versions of task i are denoted by V_ij, where V_ij represents the jth execution of task i. Denote the response time of a no-longer-extant version V_ij by RESP(V_ij). The response time of an extant version is undefined. The extant time of a version V_ij triggered at time τ_ij, when the system is in state n_ij, is given by

    ξ(V_ij, τ_ij, n_ij, t) = min(t − τ_ij, RESP(V_ij))

Note, however, that this does not imply that no state changes occur during the course of a task execution. RESP(V_ij) is an

implicit function of n_ij. A controller task is said to be critical if it has an associated finite hard deadline [14] which, if exceeded by any of its versions, results in catastrophic or dynamic failure. Hard deadlines do not exist for non-critical tasks, although in some cases it might be convenient to pretend they exist and set them at infinity. Ordinarily, repair to the real-time computer is not allowed while the computer is in operation. In this connection, we define the mission lifetime as the duration of operation between successive stages of service. We let the mission lifetime be a random variable with probability distribution function L(t). At the beginning of a mission (i.e., immediately after service), a system is assumed to be free of faults.

B. Definition of the Performance Measures

Our performance measures are all based on the extant and response times. For critical tasks i, with hard deadlines t_di, the cost function is defined by:

    C_i(ξ) = g_i(ξ)   if 0 ≤ ξ ≤ t_di
             ∞        if ξ > t_di                                        (1)

where g_i is called the finite cost function of task i and is only defined in the interval [0, t_di]; we omit for notational convenience the arguments of ξ, the extant time of the task. For non-critical tasks, the same definition for the cost function can be used, with the associated hard deadline set at infinity.
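A minimal sketch of this two-regime cost function in Python; the quadratic finite cost and the 50 ms hard deadline are hypothetical values chosen for illustration, not taken from the text:

```python
import math

def make_cost_function(g, t_d):
    """Build a critical task's cost function: the finite cost g(xi)
    while the extant time xi is within the hard deadline t_d, and
    infinite cost (dynamic failure) once the deadline is exceeded."""
    def cost(xi):
        if xi < 0:
            raise ValueError("extant time must be non-negative")
        return g(xi) if xi <= t_d else math.inf
    return cost

# Hypothetical task: cost grows quadratically up to a 50 ms deadline.
c = make_cost_function(lambda xi: xi ** 2, t_d=0.050)
assert c(0.020) == 0.020 ** 2   # within the deadline: finite cost
assert c(0.060) == math.inf     # deadline missed: dynamic failure
```

For a non-critical task one would pass `t_d=math.inf`, matching the convention of setting the hard deadline at infinity.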

Let q_i(t) denote the number of times task i is initiated in the interval [0, t). Then the cumulative cost function for task i is defined as

    Γ_i(t) = Σ_{j=1}^{q_i(t)} C_i(ξ(V_ij, τ_ij, n_ij, t))                (2)

and the system cost function is defined as

    S(t) = Σ_{i=1}^{r} Γ_i(t)                                            (3)

where r is the number of tasks in the system. Both Γ_i and S are clearly defective random variables. Our performance measures are then given by:

    Cost Index:  K(x) = ∫_0^∞ Prob{S(t) ≤ x} dL(t)                       (4)

    Probability of dynamic failure:  p_dyn = ∫_0^∞ Prob{S(t) = ∞} dL(t)  (5)

    Mean Cost:  M = ∫_0^∞ E{S(t) | no hard deadlines are missed} dL(t)   (6)

    Variance Cost:  V = ∫_0^∞ Var{S(t) | no hard deadlines are missed} dL(t)   (7)

where E{·} and Var{·} represent conditional expectation and variance, respectively. The probability of dynamic failure subsumes the traditional probability of failure (called here, for distinction, the probability of static failure), since the latter can be viewed as the probability that the expected system response time is infinite. Clearly, in the case of non-critical tasks, the probabilities of static and of dynamic failure are equal. The following auxiliary measures are useful when one focuses on the contribution to the cost of individual tasks.
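These measures can be estimated by Monte Carlo simulation once a response-time distribution is assumed or measured. The sketch below is illustrative only: the single task type, the exponentially distributed response times, and the fixed number of task initiations per mission are assumptions of the example, not of the model.

```python
import random

def estimate_measures(g, t_d, n_tasks, draw_resp, n_missions=2000):
    """Estimate the probability of dynamic failure and the mean and
    variance cost, conditioned on no missed deadlines, for missions
    that each run n_tasks versions of a single task type."""
    costs, failures = [], 0
    for _ in range(n_missions):
        total, failed = 0.0, False
        for _ in range(n_tasks):
            resp = draw_resp()
            if resp > t_d:            # hard deadline missed
                failed = True
                break
            total += g(resp)          # finite cost accrues
        if failed:
            failures += 1
        else:
            costs.append(total)
    p_dyn = failures / n_missions
    mean = sum(costs) / len(costs) if costs else float("nan")
    var = (sum((c - mean) ** 2 for c in costs) / len(costs)
           if costs else float("nan"))
    return p_dyn, mean, var

random.seed(1)
p_dyn, mean_cost, var_cost = estimate_measures(
    g=lambda x: x, t_d=0.05, n_tasks=100,
    draw_resp=lambda: random.expovariate(1 / 0.01))  # mean response 10 ms
```

Conditioning on missions with no missed deadlines mirrors the definitions of the mean and variance costs, which are taken only over non-failed histories.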

    Cost Index for task i:  K_i(x) = ∫_0^∞ Prob{Γ_i(t) ≤ x} dL(t)        (4a)

    Mean Cost for task i:  M_i = ∫_0^∞ E{Γ_i(t) | no hard deadlines are missed} dL(t)   (5a)

    Variance Cost for task i:  V_i = ∫_0^∞ Var{Γ_i(t) | no hard deadlines are missed} dL(t)   (7a)

The computation of these measures can sometimes be complicated by the fact that the mission might end while one or more versions are still extant. In most instances, however, the mission lifetimes are very much longer than individual task execution times -- for example, several hours for aircraft and several days or even months for spacecraft -- and the number of times tasks are executed to completion before the mission ends is also very large. For this reason, it is usually an acceptable approximation to compute the costs assuming that all jobs that enter the system during the mission complete executing before the mission ends (as long as they do not miss any hard deadlines). In what follows, we consider how to determine these performance measures, beginning with the determination of the hard deadline.

C. Obtaining the Hard Deadline

The dynamics and the nature of the operating environment of the critical process are both known a priori. This follows from the critical nature of the process -- for example, the dynamics and operating environment of aircraft have both been studied carefully -- and from advances in the theory and design of controlled processes. The process can most conveniently be expressed by a state-space model. Let x ∈ R^n denote the process state, u ∈ R^m the input vector, and t the time. The input vector is made up of two sub-vectors, u_c ∈ R^{m1} and u_e ∈ R^{m2}: u_c denotes the input delivered at the command of the computer, and u_e the input generated and then applied by the operating environment. We characterize state transitions by the mapping φ: T × T × X × U → X, where T ⊂ R represents time, X ⊂ R^n the state-space, and U ⊂ R^m the input space:

    x(t) = φ(t, t_0, x(t_0), u)                                          (9a)

Measurement of the system is described by a vector y ∈ R^p and a mapping η: X × U × T → Y:

    y(t) = η(x(t), u(t), t)                                              (9b)

Catastrophic failure can follow if the process leaves the "safe" region of the state space. For example, a boiler may explode if its temperature becomes too high. This is formally expressed by defining an allowed state space, X_a(t), which defines the "safe" region of operation. The task of the controller or real-time computer is to derive the optimal control, u_c(t), as a function of the perceived process state. Since the response time, denoted by w, is positive, we have

    u_c(t) = h(x(t−w), u(t−w), t)

where h expresses the control algorithm for the task in question. Then the hard deadline associated with this task is given by the maximum value of w that may be permitted if the process is to remain in X_a with probability one. More precisely, the hard deadline associated with controller task a, triggered at t_0 when the system is in state x(t_0), is given by:

    t_da(x(t_0)) = inf_{u ∈ Ω} sup{ r | φ(t_0 + r, t_0, x(t_0), u) ∈ X_a }     (10)
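For real systems the inf/sup in (10) is usually evaluated numerically. One common approach is bisection on the controller delay w, assuming a worst-case input has already been fixed (so only the sup remains), a trajectory simulator standing in for φ is available, and safety is monotone in w. The unit-mass braking dynamics below are a hypothetical stand-in, not the dissertation's model:

```python
def stays_allowed(w, simulate, in_allowed):
    """True if the whole trajectory under control delay w stays in X_a."""
    return all(in_allowed(state) for state in simulate(w))

def hard_deadline(simulate, in_allowed, w_max=10.0, tol=1e-4):
    """Largest delay w for which the process remains in the allowed
    state space, found by bisection (assumes safety is monotone in w)."""
    lo, hi = 0.0, w_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if stays_allowed(mid, simulate, in_allowed):
            lo = mid
        else:
            hi = mid
    return lo

def simulate(w, dt=1e-3):
    """Stand-in dynamics: a unit mass at x = 0 is hit with velocity 1
    and braked by unit thrust once the controller reacts, after delay w."""
    x, v, t = 0.0, 1.0, 0.0
    while v > 0.0:
        if t >= w:
            v -= dt              # deceleration H/m = 1
        x += v * dt
        t += dt
        yield x

# Allowed state space { x : |x| <= 1 }: the mass drifts for w seconds,
# then needs v^2/2 = 0.5 further units of travel to stop, so the
# analytic deadline for these parameter choices is 0.5.
t_d = hard_deadline(simulate, lambda x: abs(x) <= 1.0)
```

The bisection converges because lengthening the delay only lengthens the uncontrolled drift, so safety can only be lost, never regained, as w grows.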

where Ω is the admissible input space. One can also define conditional hard deadlines if it is only required to perform the computation over a certain subset of the admissible input or state space. The conditional hard deadline of task a, denoted by t_da|ω, is defined as

    t_da|ω(x(t_0)) = inf_{u ∈ ω ⊂ Ω} sup{ r | φ(t_0 + r, t_0, x(t_0), u) ∈ X_a }     (11)

The hard deadline, defined in this general way, is a function both of the process state at the moment of task initiation and of time. It is a random variable if the environment is stochastic. Consider the following example to see how the hard deadline can be determined.

Example 1: A body of mass m is constrained to move in one dimension. Its state vector consists of three components: position (x_1), velocity (x_2), and acceleration (x_3). The allowed state-space is defined by X_a = { x : |x_1| ≤ b, |x_2| < ∞, |x_3| < ∞ }, where b > 0 is a constant. The body is subject to impact from either direction with equal probability. Each impact changes the velocity of the body by k > 0 units, in the appropriate direction. The change of velocity takes place in a negligible duration. The body has devices that can exert thrust of magnitude H in either direction. This thrust is imposed only after the controller has recognized an impact and has determined how to react to it. It takes a negligible amount of time to switch the thrust on or off in either direction. The controller's job is to bring the body to x = 0. The controller operates in open-loop: when it recognizes an impact, it computes the thrusts as a function of time, following which the control response is assumed to be instantaneous. The problem now is to compute the hard deadline associated with this task. The hard deadline is only a function of the state and X_a. In computing it, we do not need to take into account the possibility of a second impact before the

controller has finished responding to the first, since the state of the process contains all necessary information. The allowed state space is static and simply connected, so that if the body is within X_a when it is first brought to rest following the impact, it must have been within X_a throughout the period following the impact, assuming only that it was within X_a at the moment of impact. Therefore, we have only to compute the position of the body when it first comes to rest (after the impact) as a function of the response time, ξ, and set the hard deadline equal to the greatest ξ for which the body comes to rest within X_a. Let the initial state of the body be x_i^T = [x_1i, x_2i, x_3i]. Since the impact duration and the switch-off or -on time for the thrust are assumed to be zero, we can always take x_3i = 0. Define

    t_1 = m |x_2i + k| / H,    t_2 = m |x_2i − k| / H

By an elementary derivation, we arrive at the following.

Case 1: x_2i < −k:

    t_d(x_i) = min{ [b + x_1i + (x_2i + k) t_1 + H t_1² / 2m] / [−(x_2i + k)],
                    [b + x_1i + (x_2i − k) t_2 + H t_2² / 2m] / [k − x_2i] }       (12)

For future convenience, denote the right-hand side of the above by t_d^(1).

Case 2: |x_2i| < k:

    t_d(x_i) = min{ [b − x_1i − (x_2i + k) t_1 + H t_1² / 2m] / [x_2i + k],
                    [b + x_1i + (x_2i − k) t_2 + H t_2² / 2m] / [k − x_2i] }       (13)

Denote the right-hand side of the above equation by t_d^(2).

Case 3: x_2i > k:

    t_d(x_i) = min{ [b − x_1i − (x_2i + k) t_1 + H t_1² / 2m] / [x_2i + k],
                    [b − x_1i − (x_2i − k) t_2 + H t_2² / 2m] / [x_2i − k] }       (14)

Denote the right-hand side of the above equation by t_d^(3).

If the velocity imparted to the body upon an impact is not constant at k, but is a random variable (which would be more realistic) the magnitude of which has probability distribution function F_impact, then the hard deadline will be a random variable, whose distribution is a function of the state at the moment of impact. The following can be written down by inspection.

Case 1, x_2i < 0:

    t_d(x_i) = t_d^(1) with probability F_impact(|x_2i|),
               t_d^(2) with probability 1 − F_impact(|x_2i|)

Case 2, x_2i > 0:

    t_d(x_i) = t_d^(3) with probability F_impact(|x_2i|),
               t_d^(2) with probability 1 − F_impact(|x_2i|)

We could similarly treat the case in which the allowed state-space is stochastic. The above example is meant only to illustrate the hard deadlines, and should not lull the reader into a false sense of security. Obtaining closed-form expressions for the deadlines of any but the most trivial systems with a static allowed state-space is usually extremely difficult, if not impossible. For example, if we relax the assumption that the controller acts in open-loop, the equations of motion become too difficult to solve exactly in closed form. It is generally necessary to resort to numerical methods to obtain deadlines for real-life systems. Since most of the state-spaces one uses in practice have uncountably many points, we must define hard deadlines as functions of sets of states, not of the states themselves, if the entire allowed state-space is to be covered. Subdividing the state-space into these subsets while keeping errors low is not always easy. For an example of subdivision where the application is the control of aircraft elevator deflections, see the case study that follows in Chapter 4. A further remark is in order here. The hard deadlines are not dependent upon the performance functional (time, energy, etc.) that the controller is attempting to optimize, since the paramount duty of the controller is to keep the system within the allowed state-space, and only secondarily to optimize the performance functional.

D. Obtaining the Finite Cost Functions

Performance functionals have long been known in control theory as optimization criteria and measures of controlled-process performance. We exploit this fact to derive the control-computer cost functions, by directly linking the performance of the controller to the resulting value of the controlled-process performance functional. Performance functionals in control theory are functionals of system state and input, and express the cost of running the process over some interval [t_0, t_1]. The

performance functionals can be stated as:

e(x0, t0, tf) = ∫ from t0 to tf E[ f0(x(t), u(t), ξ, t) | y(τ), t0 ≤ τ ≤ t ] dt   (17)

where x(t0) = x0 and f0 is the instantaneous performance functional. Since the controller response time affects the state trajectory of the controlled process, it affects the performance functional as well. If we use the expected contribution to the performance functional e(x0, t0, tf) of the control delivered as a result of executing task i with response time ξ, we can derive the finite cost function as:

gi(x, ξ) = Ω(x, ξ) - Ω(x, 0)  for 0 ≤ ξ < tdi,  and 0 otherwise   (18)

where Ω(x, ξ) denotes the contribution to e(x0, t0, tf) of a task with response time ξ, initiated when the process state was x. By doing so, we directly couple the response time of the controller to the fuel, energy, or other commodity expended by the controlled process. See Figure 3.

Notice that while we use the response time to compute the cost function, the finite costs were originally defined as functions of the extant time. The latter is the case because we wish costs to accrue as the execution proceeds, so that the system cost function is continuous. This ensures, if two systems are compared under an identical load with the first faster than the second with regard to a particular task, that the faster system will never exhibit a mean cost greater than the slower system, as long as both have approximately the same probability of dynamic failure.

The possibility of correlation between successive tasks can complicate calculations considerably. To see this, take the system in the example above. If a second impact comes in before the system has finished reacting to the first, then the energy or

[Figure: a timeline showing four control triggers, the corresponding task executions, and the finite costs gi(ξj) accrued; ξj = RESP(Vij) for j = 1, 2, 3, 4.]

Figure 3. Illustration of Cost Functions.

time to be expended will not, in general, be the sum of the energies or the times that would have to be expended if the second impact had arrived after the system had finished reacting to the first (i.e., had arrived at x = 0). Assuming that successive tasks are decoupled leads to a certain measure of "double-counting" of the energy or time spent. The same remark applies to fuel, force, or any other performance functional used for the controlled process. Due to this double-counting, assuming that successive tasks are decoupled yields an upper bound on the energy, time, or other quantity expended. If we find an upper bound acceptable, we can simplify our computations greatly. If exact figures are called for, a detailed and complicated model has to be worked out in which each instance of inter-task coupling is itemized and its probability of occurrence computed. Whether or not this is worth the effort depends entirely on the requirements of the analysis.

There is also an irritating anomaly. Since the mean costs are defined by an expectation that is conditioned on not failing in the mission lifetime, it is possible to construct pathological examples where a system with a probability of dynamic failure of, say, 0.5 over a given lifetime will exhibit a lower mean cost over that lifetime than another that has a Pdyn of 10^-10; we shall see examples of these in Chapter 5. Such cases are, however, generally no more than an academic curiosity.

Example 2: Consider again the controlled process described in Example 1. This time, we set out to compute the finite cost function associated with the task under review. As before, assume that the state of the body at the moment of impact is given by xi^T = [z1i, z2i, z3i]. We make the assumption that a function that provides an upper

bound of the cost expended is sufficient, so that it is not necessary to consider the correlative effect of successive jobs. We provide cost functions relating to two different control policies.

Case A: Assume that the duty of the controller is to bring the body back to x = 0 within as short a time period as possible. (Note that x = 0 means that all three components -- position, velocity, and acceleration -- are zero.) The cost function is the time taken. This is the well-known minimum-time problem in optimal control theory [15]. If, after the impact, the body is moving away from z1 = 0, it must be stopped and brought back using bang-bang control. If it is moving toward z1 = 0, depending on the velocity after impact and the response time of the controller, the body is either first accelerated toward z1 = 0 and then decelerated, or first brought to a stop on the other side of z1 = 0 and then brought back to the origin using bang-bang control. The derivation of the time taken is elementary, if tedious, and is omitted. See the Appendix for expressions of the finite cost function under such a control policy. The case when the velocity imparted upon impact is not constant, but a random variable, can be handled as in Example 1.

Case B: Suppose the controller is to minimize the energy expended while, after every impact, keeping the body within the allowed state-space. Then the control policy is simply to bring the system to rest anywhere inside Xa, and the cost function is in terms of energy. If the controller computer responds to an impact within the hard deadline, it can, by definition, keep the system from failing. As may easily be verified, the energy expended in doing this is the energy required to bring the body to rest, which is equal to the kinetic energy of the body immediately after impact. Therefore, as long as the allowed state-space is not violated, the energy expended will remain the same no matter what the response time (assuming that the response

time is within the hard deadlines derived in the preceding section). So the finite cost function over the entire allowed state-space is here the zero function, which signifies that, as long as the hard deadlines are honored, the response time makes no difference to the overhead under this control policy.

This example has served to emphasize the intimate relation between the control policy and the controller (finite) cost function. The same system, with the same constraints on the allowed state-space, has different cost functions depending on what the duty of the controller is. This reflects our goal of having the cost functions express the control overhead in the context of the application, and it is what distinguishes our measures from those extant in the literature. See Table 2 for a comparison between our measures and those of others.

Once again, it should be noted that the simplicity of the above expository examples does not usually exist in real-life systems. Real-life analyses are much more difficult, and the same comments as applied to Pdyn above apply to the mean cost also. There is also one additional complication. The controller is to optimize the performance functional subject to the condition that the system must not leave the allowed state-space, if this is at all possible. Such a difficulty did not arise in this simple example, but it can sometimes prove difficult to obtain optimal control policies under this requirement. This, however, is a problem for the designer of the controlled process, not of the controlling computer. See Chapter 4 for a computation of the cost function in a realistic case.
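The minimum-time cost of Case A can be made concrete on the standard double integrator (a unit thrust-to-mass ratio is assumed here purely for illustration; it is a simplification, not the thesis's model). The classical bang-bang time-to-origin is written down, and the cost as a function of the response time ξ follows, the body simply drifting during the delay:

```python
from math import sqrt

def min_time_to_origin(x, v):
    """Classical minimum-time bang-bang solution for xdd = u, |u| <= 1.
    The switching curve is x = -v|v|/2."""
    if x > -0.5 * v * abs(v):      # above the curve: u = -1 first, then u = +1
        return v + 2.0 * sqrt(0.5 * v * v + x)
    if x < -0.5 * v * abs(v):      # below the curve: u = +1 first, then u = -1
        return -v + 2.0 * sqrt(0.5 * v * v - x)
    return abs(v)                  # already on the final switching arc

def time_cost(x, v, xi):
    """Total time cost when the controller responds only after the delay xi,
    during which the body drifts at constant velocity."""
    return xi + min_time_to_origin(x + v * xi, v)
```

For instance, from rest at unit distance the time-optimal return takes 2 time units, so `time_cost(1.0, 0.0, 0.5)` is 2.5: the delay simply adds to the cost when the body is not moving.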

Other Measures | Our Measures
Wide applicability to almost all applications of fault-tolerant, especially gracefully-degrading, systems. | Limited applicability: aimed specifically at real-time, especially control, applications.
Measures express performance in rather gross terms. | Measures express performance in rather exact terms.
Performance linked to characteristics of the computer alone. | Performance measures specifically designed to reflect the overhead of the computer on the real-time system.

Table 2. Comparison of Traditional and New Methods of Characterizing Performance
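When no closed form is available, the finite cost function of equation (18) can be estimated by simulation. The sketch below is a minimal illustration on a hypothetical double-integrator plant with a saturating PD control law and a quadratic instantaneous functional f0 = x^2 + u^2 (all of these are stand-ins, not the thesis's model): Ω(x, ξ) is the integrated cost when the control is withheld for the response time ξ, and g is the difference Ω(x, ξ) − Ω(x, 0).

```python
def omega(x0, v0, xi, T=10.0, dt=1e-3):
    """Contribution Omega(x, xi): Euler integration of f0 = x^2 + u^2
    along the trajectory, with the control held at zero until time xi."""
    x, v, cost, t = x0, v0, 0.0, 0.0
    while t < T:
        # hypothetical saturating PD law, applied only after the delay
        u = 0.0 if t < xi else max(-1.0, min(1.0, -2.0 * x - 2.0 * v))
        cost += (x * x + u * u) * dt
        x += v * dt
        v += u * dt
        t += dt
    return cost

def g(x0, v0, xi):
    """Finite cost function of eq. (18): extra cost due to the delay xi."""
    return omega(x0, v0, xi) - omega(x0, v0, 0.0)
```

As expected, g vanishes at ξ = 0 and grows with the delay for a state drifting away from the origin.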

E. Remark on Finite Cost Functions

It is not necessary that the process performance functional used to derive the cost function (e.g., energy, fuel, time) be the same as the process performance functional that the system is trying to optimize. For example, the system may be given the task of optimizing fuel, while the cost function measures the extra energy consumed as a result of controller delay. However, care should be taken to ensure that the two functionals (the one used to optimize the process, and the one used to express the cost) do not conflict. For the functionals not to conflict, the optimal control actions (i.e., the controller decisions) taken on the basis of one functional should be identical to the optimal control actions that would have been taken on the basis of the other. For example, if the fuel consumed were linearly related to the energy expended, the cost function could be expressed in terms of energy while the controller was trying to minimize the fuel used.

To see why conflicts must not be allowed, assume, in the system of the above example, that the job of the controller is to minimize the time taken in bringing the body back to the origin, while the cost function is in terms of energy. Take the instance in which the body after the impact is moving slowly toward z1 = 0. The controller should in such a case apply full thrust throughout the motion, first speeding the body up toward z1 = 0, and then slowing it down to reach x = 0 in minimum time. However, since the cost function is in terms of energy, not time, it is easy to see that the shorter the time period over which the controller exerts thrust, the smaller is the value of the cost measured. Thus, over a certain range of states, the cost function would actually decrease with an increase in response time. This is

not only counter-intuitive, but also results in inefficient operation. Task priorities, scheduling policies, etc., for the control computer are meant to be derived so as to optimize the mean cost as expressed by the cost functions. If inconsistencies such as the above arose, the operating system of the controller would tend to oppose the goals inherent in its own applications software, resulting in an unsatisfactory overall computer-controlled system.

F. Allowed State-Space and Its Decomposition

As we said above, it is difficult to determine the hard deadline and the finite cost function as a function of the state over the entire state space. The solution of the controlled-process state equations cannot usually be obtained in closed form when controller delay is taken into account. To obtain the functional dependence of the hard deadlines or the finite cost function of each controller job on the current state vector is therefore impossible to do analytically, and prohibitively expensive to do numerically for a large number of sample states. To get around this problem, we divide the allowed state-space into subspaces. Subspaces are aggregates of states in which the system exhibits roughly the same behavior. Even if there do not exist clear boundaries for these subspaces, one can always force the allowed state space to be divided into subspaces so that a sufficient safety margin is provided. This is a designer's choice of approximation. In each subspace, each critical controller job has a unique hard deadline.

Remark: In some subspaces, a job described in general as "critical" might not be critical in the sense that, even if the execution delay associated with it is infinite,

catastrophic failure does not occur. That is, the associated hard deadline may be infinite for a particular subspace. What usually happens in these circumstances is that the system moves into a new subspace -- or at least toward the subspace boundary -- in which the danger of catastrophic failure is greater. In this new subspace, the requirements on controller delay are more stringent, and there might well be a hard deadline, representing a critical task. Thus a "critical" job need not be truly critical in every subspace; it only has to map into a critical task -- defined in the sequel -- in at least one subspace. Also, subspaces are job-related, i.e., the same allowed state space can divide into a different set of subspaces for each control job. For convenience, a controller "task" is defined as follows.

Definition: A controller task, often abbreviated to "task", is defined as a controller job operating within a designated subspace of the allowed state space.

Let Si for i = 0, 1, ..., s be disjoint subspaces of XA with XA = ∪ Si, and let J denote a controller job. Then we define the projection:

(J, XA) → ((T0, S0), (T1, S1), ..., (Ts, Ss))

where Ti is the controller task generated by executing J in Si. With each controller task we may now associate a hard deadline without the coupling problem mentioned above. We denote it by tdi^J for critical task Ti (for convenience, however, the superscript J will be omitted in the sequel). We will see that a critical job can map into a noncritical task for one or more allowed subspaces; it only needs to map into a critical task in at least one such subspace to be considered critical.
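The projection of a job onto per-subspace tasks is straightforward to represent. A minimal sketch (all names hypothetical), with None standing for the infinite hard deadline of a non-critical task:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Task:
    job: str
    subspace: int
    hard_deadline: Optional[float]   # None: non-critical in this subspace

def project(job: str, deadlines: List[Optional[float]]) -> List[Task]:
    """(J, X_A) -> ((T_0, S_0), ..., (T_s, S_s)): one task per subspace,
    given the per-subspace hard deadlines of the job."""
    return [Task(job, i, d) for i, d in enumerate(deadlines)]

def is_critical(tasks: List[Task]) -> bool:
    # A job is critical iff it maps into a critical task in at least one subspace
    return any(t.hard_deadline is not None for t in tasks)
```

A job whose deadline is infinite in every subspace is thereby classified as non-critical.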

Allowed State-Space

The allowed state-space is the set of states that the system must not leave if catastrophic failure is not to occur. Consider the two sets of states X1 and X2, defined as follows.

(i) X1 is the set of states in which the system must reside if catastrophic failure is not to occur immediately. For example, we may define, in the case of an aircraft, a situation in which the aircraft flies upside down as unacceptable to the passengers and as constituting failure. Notice that terminal constraints are not taken into consideration here unless the task in question is executed just prior to mission termination.

(ii) X2 is the set of acceptable states given the terminal constraints, i.e., it is the set of states from which, given the constraints on the control, it is possible to satisfy the terminal constraints.

Note that leaving X1 means that, no matter how good our subsequent control, failure has occurred. (Strictly speaking, of course, there can be no subsequent control, since by leaving X1 the system has failed catastrophically before the next control could be implemented.) On the other hand, altering the allowed input space, i.e., changing the control available, can affect the set X2. The allowed state space is then defined as XA = X1 ∩ X2.

Obtaining the state-space X2 can be difficult in practice. The curse of dimensionality ensures that even systems with four or five state variables make unacceptable demands on computation resources for the accurate determination of the allowed state-space. However, while it can be very difficult to obtain the entire allowed state-space, it is somewhat easier to obtain a reasonably large subset,

XA' ⊂ XA. By defining this subset as the actual allowed state-space (i.e., by artificially restricting the range of allowed states), we make a conservative estimate of the allowed state-space. Note that by making a conservative approximation, we err on the side of safety. Also, the information we need about XA may be determined to as much precision as we are willing to invest in computing resources. In what follows, to avoid needless pedantry, we shall refer to the artificially restricted allowed state-space, XA', simply as the "allowed state-space", XA.

On Obtaining the Subspaces

The job of dividing XA into S = (S0, S1, ..., Ss) is sometimes made easy by the existence of natural cleavages in the state-space, when the latter is viewed as an influence on system behavior. In most cases, however, such conveniences do not exist, and artificial means must be found. The problem then becomes one of finding discrete subdivisions of a continuum. The method we employ is to quantize the state continuum in much the same way as analog signals are quantized into digital ones. Intervals of hard deadlines and of expected operating cost (i.e., the mean of the cost function conditioned on the controller delay time, using the distribution of the latter) are defined. Then, points are allocated to subspaces corresponding to these intervals.

To take a concrete example, consider a state-space X ⊂ R^n that is to be subdivided on the basis of the hard deadlines. The first step is to define a quantization step for the hard deadlines; let this be Δ. Then, define subspace Si as containing all states in which the hard deadline lies in the interval [(i-1)Δ, iΔ). Alternatively, one might define a sequence of numbers Δ1, Δ2, ..., such that the subspaces are defined by intervals with the Δi's as their end-points. This would correspond to quantizing with variable step

sizes. The subspace in which the job under consideration maps into a non-critical task is a special case and is denoted by S0. Subspaces can also be defined based on a quantization of the expected operating cost, or on both the operating cost and the hard deadlines. We provide an example of subdivision by hard deadlines in Chapter 4.

Computational Complexity

When closed-form solutions are available for the state equations, they can be solved backwards from the terminal constraints to obtain analytical results for X2. When closed-form solutions are not available, numerical techniques must be used, and solving differential equations backwards can give rise to numerical instability. In the most general case, the amount of computation required increases rapidly with the number of state variables and the complexity of the state equations; the exact amount of computation therefore depends on the solution approach employed and the actual system under consideration. One begins by deriving the optimal trajectory, which requires an amount of computation specific to the problem and the solution method. Then, obtaining X2 consists of searching the n-dimensional state-space surrounding this trajectory. In general, this becomes an exhaustive search for the boundary of X2. If the search is confined to an (n+1)-dimensional parallelepiped (the (n+1)-th dimension represents time) with edge lengths k1, k2, ..., kn+1, and the digital resolution is Δ, then, defining mi = ⌈ki/Δ⌉ for i = 1, ..., n+1, the state equations must be solved O(∏ mi) times. The values for the ki must be set by the user.
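The quantization just described can be sketched as follows: sample the parallelepiped at resolution Δ, evaluate the hard deadline at each of the ∏ mi grid points, and assign subspace S0 to non-critical points and Si to points whose deadline falls in [(i-1)Δt, iΔt). The deadline function used here is a stand-in for the numerical solution of the state equations:

```python
from itertools import product
from math import ceil, inf

def quantize(edge_lengths, delta, deadline_fn, delta_t):
    """Assign each grid point of the parallelepiped to a subspace index:
    0 if the hard deadline is infinite (non-critical), else i such that
    the deadline lies in [(i-1)*delta_t, i*delta_t). Returns {point: index}."""
    m = [ceil(k / delta) for k in edge_lengths]      # m_i = ceil(k_i / delta)
    subspace = {}
    for idx in product(*(range(mi) for mi in m)):    # O(prod m_i) evaluations
        point = tuple(i * delta for i in idx)
        td = deadline_fn(point)
        subspace[point] = 0 if td == inf else int(td // delta_t) + 1
    return subspace

# Stand-in deadline: the farther from the origin, the tighter the deadline
def toy_deadline(point):
    r = sum(p * p for p in point) ** 0.5
    return inf if r < 0.1 else 1.0 / r
```

The exhaustive-search cost is visible directly: the dictionary holds ∏ mi entries, one per solution of the state equations.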

In the event that X2 is known to be convex, a binary search can be instituted along one of the dimensions, reducing the search complexity to O(m1 ... mi-1 mi+1 ... mn+1 log2 mi), where mi = max{m1, ..., mn+1}.

As for the cost functions and the hard deadlines, these must also be derived, in general, by an exhaustive analysis. The allowed state-space is quantized, and deadlines and cost functions must be found for each of the resulting "cells". It is therefore clear that for a process with large n, these methods become prohibitively expensive. Two comments are now in order. The first is that this approach is not uniquely susceptible to the curse of dimensionality: all optimal control theory techniques suffer this problem when complex processes are considered. The second is that, in certain cases, additional knowledge about the process dynamics can be used to improve the efficiency of the search for the XA boundary, and for the hard deadlines and cost functions.

G. A Final Remark on this Chapter

The relevance of the finite cost functions, and of the performance measures -- the mean and variance of cost -- that are derived from them, should be obvious. What is not immediately apparent is the practical difference between the traditional probability of failure and the probability of dynamic failure introduced here. The standard devil's-advocate objection runs as follows. Just as fault-tolerant systems are designed with considerable physical redundancy, so they can be designed also to allow considerable time redundancy. This would entail, for example, a task having a worst-case run time (given no environmental interference) of,

say, a fifth or a tenth of the interval between its release and its deadline. Moreover, the tasks are usually executed many -- five or even ten -- times for every output that is required at the actuator. In such a case, the argument runs, the probability of dynamic failure is as close to the traditional probability of failure as makes no matter.

The fallacy in this argument arises from the lack of a fixed worst-case run time for tasks. Environmental influences can, on rare occasions, prolong the execution time of tasks almost indefinitely. A good example of this is lightning strikes on aircraft, which cause electromagnetic interference. This appears only recently to have been seriously studied as a source of disruption to avionic electronics. Unfortunately, since preliminary measurements are protected from unlimited disclosure, we have to fall back on a hypothetical example, with hypothetical results.

Assume that the prolongation of the execution time caused by lightning follows an exponential law, with a mean of 0.1 second. This is a reasonable figure to choose: if lightning were, on the average, to have a much smaller effect, it would only rarely be noticed. Let the probability of a lightning strike on an aircraft during a flight of 10 hours' duration be 10^-3. Imagine a computer using 7-modular redundancy, with processor MTBFs of 10,000 hours, and consider a task that takes 0.01 second to execute under normal circumstances, and which is run continuously in a loop that ends only with the mission. The task is run ten times a second. Failure will occur if not even one out of ten consecutive runs produces a result within 0.1 second of task release. Denoting by p the probability that a given processor fails over the ten-hour flight, the traditional probability of failure -- which is the probability that more than 3 of the 7 processors fail during the 10 hours of

flight -- is given by the sum over i = 4, ..., 7 of C(7, i) p^i (1-p)^(7-i), which for small p is approximately 35 p^4. With the numerical values as assumed above, this is equal to about 3.5 × 10^-11. The probability of dynamic failure, however, is dominated by the probability of a lightning strike delaying the response beyond the hard deadline, and is lower-bounded by the probability of a lightning strike lasting more than one second. Under the parameter values assumed above, this quantity is 4.5 × 10^-8, i.e., more than a thousand times the traditional probability of failure, and 45 times the NASA benchmark of 10^-9. Furthermore, the fact that the slack time was so large a multiple of the run time made little difference to the probability of failure.

The data used above are purely hypothetical (although credible from a common-sense perspective), so that no absolute inferences may be drawn from these calculations, but the point that we wish to make should be clear: when ultra-reliable systems are being considered, all conceivable sources of failure must be considered, not just the traditional ones of massive hardware failure. Traditional reliability computations do not indicate any need to consider the effects of lightning in this example. Designers attempt to compensate for this oversight by making the task execute many times for each output that is absolutely required. Unfortunately, as we saw in this example, such attempts are not always viable when ultra-reliable systems are being considered.
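The arithmetic of this example is easy to check mechanically; the following sketch recomputes both quantities from the assumed parameter values:

```python
from math import comb, exp

# Traditional probability of failure for 7-modular redundancy: more than
# 3 of 7 processors fail. Per-processor failure probability over a
# 10-hour flight with a 10,000-hour MTBF (exponential failure law):
p = 1 - exp(-10 / 10_000)                       # about 1e-3
p_trad = sum(comb(7, i) * p**i * (1 - p)**(7 - i) for i in range(4, 8))

# Lower bound on the probability of dynamic failure: a lightning strike
# (probability 1e-3 per flight) whose exponentially distributed
# prolongation (mean 0.1 s) exceeds one second.
p_dyn = 1e-3 * exp(-1.0 / 0.1)

print(f"traditional:       {p_trad:.2e}")       # about 3.5e-11
print(f"dynamic (l.bound): {p_dyn:.2e}")        # about 4.5e-8
print(f"ratio:             {p_dyn / p_trad:.0f}")
```

The ratio comes out above a thousand, as claimed in the text.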

CHAPTER 4

CASE STUDY

A control system executes "missions": periods of operation between successive periods of maintenance. In the case of aircraft, a mission is usually a single flight. The operating interval can sometimes be divided into consecutive sections that can be distinguished from each other; these sections are called phases. As pointed out in Section 2, Meyer et al. [11] define four distinct phases in the mission lifetime of a civilian aircraft. The phase to be considered here is landing; it takes about 20 seconds. The controller job that we shall treat is the control of the aircraft elevator deflection during landing.

The specific system employed is assumed to be organized as shown in Figure 4. Sensors report on the four key parameters -- altitude, descent rate, pitch angle, and pitch angle rate -- every 60 milli-seconds. We have a time-generated trigger, with a time period of 60 milli-seconds. Every 60 milli-seconds, the controller computes the optimal setting for the elevator, which is the only actuator used in the landing phase. (There are other actuators used aboard the aircraft for purposes of stability, horizontal speed control, etc. We do not, however, consider them here, concentrating exclusively on the control of the elevator.) The execution time for the computation is nominally 20 milli-seconds, although this can vary in practice due to failures. Since the aircraft is a dynamical system,

[Figure: block diagram linking the sensors and sensor I/O processors (A/D conversion, buffering, formatting), the controller processors, the actuator processors and I/O (D/A conversion, buffering, checking), the aircraft dynamics and actuators, the display and peripherals for the pilot, and ground communication (broadcasting).]

Figure 4. Aircraft Control System Schematic.

the effects of controller delay are considerable, as we shall see below. Since the process being controlled is critical (i.e., one in which some failures can lead to catastrophic consequences), variations of controller delay and other abnormal behavior by the controller must be explicitly considered. For simplicity, we do not allow job pipelining in the controller; in other words, a controller job must be completed or abandoned before its successor can be initiated. The following controller abnormalities can occur:

(i) The controller orders an incorrect output to the actuator.

(ii) The controller takes substantially more than 20 milli-seconds (the nominal execution time), but less than the inter-trigger interval of 60 milli-seconds, to complete executing.

(iii) The controller takes more than 60 milli-seconds to complete executing. In such a case, the abnormal job is abandoned and the new one initiated. We say that a control trigger is "missed" when this happens.

An analysis of controller performance during the landing phase must take each of the above abnormalities into account.

A. The Controlled Process

The model and the optimal control solution used are due to Ellert and Merriam [16]. The aircraft dynamics are characterized by the equations:

ż1(t) = b11 z1(t) + b12 z2(t) + b13 z3(t) + c11 m1(t, ξ)   (19a)
ż2(t) = z1(t)   (19b)
ż3(t) = b32 z2(t) + b33 z3(t)   (19c)
ż4(t) = z3(t)   (19d)

where z2 is the pitch angle, z1 the pitch angle rate, z3 the altitude rate, and z4 the altitude. m1 denotes the elevator deflection, which is the sole control employed. The constants bij and c11 are given in Table 3. Recall that ξ denotes the controller response time.

The landing phase takes about 20 seconds. Initially, the aircraft is at an altitude of 100 feet, travelling at a horizontal speed of 256 feet/sec. This latter velocity is assumed to be held constant over the entire landing interval. The rate of ascent at the beginning of this phase is -20 feet/sec. The pitch angle is ideally to be held constant at 2°. Also, the motion of the elevator is restricted by mechanical stops: it is constrained to lie between -35° and 15°. For linear operation, the elevator may not operate against the elevator stops for nonzero periods of time during this phase. Saturation effects are not considered; neither are wind gusts and other random environmental effects.

The constraints are as follows. The pitch angle must lie between 0° and 10° to avoid landing on the nose-wheel or on the tail, and the angle of attack (see Figure 5) must be held to less than 18° to avoid stalling. The vertical speed with which the aircraft touches down must be less than around 2 feet/sec, so that the undercarriage can withstand the force of landing. The desired altitude trajectory is given by

Feedback Term    Value
b11             -0.600
b12             -0.780
b13              0.003
b32            102.4
b33             -0.4
c11             -2.374

Table 3. Feedback Equation Constants

[Figure: the lift, drag, and weight vectors acting on the aircraft, with the angle of attack measured between the airplane reference line and the flight path, relative to the horizontal.]

Figure 5. Definition of Aircraft Angles.

h_d(t) = 100 e^(-t/5)  for 0 ≤ t < 15;   h_d(t) = 20 - t  for 15 ≤ t ≤ 20   (20)

while the desired rate of ascent is

ḣ_d(t) = -20 e^(-t/5)  for 0 ≤ t < 15;   ḣ_d(t) = -1  for 15 ≤ t ≤ 20   (21)

The desired pitch angle is 2° and the desired pitch angle rate is 0° per sec.

The performance index (for the aircraft) chosen by Ellert and Merriam, suitably adapted here to take account of the nonzero controller response time ξ, is given by

η(ξ) = ∫ from t0 to tf e_m(t, ξ) dt   (22)

where t represents time, [t0, tf] is the interval under consideration, and

e_m(t, ξ) = φ_h(t)[h_d(t) - z4(t)]^2 + φ_ḣ(t)[ḣ_d(t) - z3(t)]^2 + φ_θ(t)[z2d(t) - z2(t)]^2 + φ_θ̇(t)[z1d(t) - z1(t)]^2 + [m1(t, ξ)]^2   (23)

where the d-subscripts denote the desired (i.e., ideal) trajectory. To ensure that the touch-down conditions are met, the weights φ must be impulse-weighted. Thus we define:

φ_h(t) = φ4(t) + φ4,tf δ(20 - t)   (23a)
φ_ḣ(t) = φ3(t) + φ3,tf δ(20 - t)   (23b)
φ_θ(t) = φ2,tf δ(20 - t)   (23c)
φ_θ̇(t) = φ1(t)   (23d)

where the functions φ must be given suitable values, and δ denotes the Dirac delta function. The values of the φ are chosen on the basis of a study of the trajectory that

results. The chosen values are listed in Table 4. The control law for the elevator deflection is given by

m1(t, ξ) = ω_s^2 K_s T_s [k1(t-ξ) - k11(t-ξ) z1(t-ξ) - k12(t-ξ) z2(t-ξ) - k13(t-ξ) z3(t-ξ) - k14(t-ξ) z4(t-ξ)]   (24)

where the aircraft parameters are K_s = -0.95 sec^-1, T_s = 2.5 sec, and ω_s = 1 radian/sec, and the constants k are the feedback parameters derived (as shown in [16]) by solving the Riccati differential equations that result upon minimizing the process performance index. For these differential equations we refer the reader to [16].

B. Derivation of Performance Measures

We consider here only one controller task: that of computing the elevator deflection so as to follow the desired landing trajectory. The inputs to the controller are the sensed values of the four states. We seek the following information. As the controller delay increases, how much extra overhead is added to the performance index? Also, it is intuitively obvious that too great a delay will lead to a violation of the terminal (landing) conditions, thus resulting in a crash. This corresponds to dynamic failure, and we are naturally interested in determining the range of controller delays that permit a safe landing.

Consider first a formal treatment of the problem. The control problem is of the linear-feedback form. The state equations can be expressed as:

Weighting Factor          Value
φ2,tf                    20.0
φ3(t)  (0 ≤ t < 15)       0.0
φ3(t)  (15 ≤ t ≤ 20)      0.0001
φ3,tf                     1.000
φ4(t)                     0.00005
φ4,tf                     0.001

Table 4. Weights for the Performance Index

ẋ(t) = A x(t) + B u(t)   (25)

where the symbols have their traditional meanings. Denote the feedback matrix by F(t). Then, clearly,

u(t) = F(t-ξ) x(t-ξ)   (26)

For a small controller delay (i.e., a small ξ), the above can be expanded in a Taylor series and the terms of second order and higher discarded, yielding a linear approximation. Carrying out the obvious mathematical steps, we arrive at the equation

ẋ(t) = E(t, ξ) x(t) + f(ξ)   (27)

as representing the behavior of the controlled process, assuming that the initial conditions are given. For further details, see Figure 6. Given closed-form expressions for the kij(t) that appear in E(t, ξ), we could then proceed to study the characteristics of the system as a function of the matrix E. In the absence of such closed formulations for the kij, however, we must have recourse to the less elegant medium of numerical solution.

The procedure we follow for obtaining the numerical solution is as follows. First, the feedback values are computed by solving the feedback differential equations that define the kij; these are not affected by the magnitude of the controller delay. Then, the state equations are solved as simultaneous differential equations. These are used to check that the terminal constraints have been satisfied, and in the event that they are, the performance functional is evaluated. This procedure must be repeated for each new subspace. Since the environment is deterministic in this case (no wind gusts or other random disturbances are permitted in the model), the hard deadline associated with each process subspace is a constant and not a random variable.
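The quality of the first-order expansion can be checked numerically on a scalar stand-in (not the aircraft model): the true delayed loop ẋ = -a x(t - ξ) is simulated against its linearized version, which the expansion x(t-ξ) ≈ x(t) - ξ ẋ(t) reduces to ẋ = -a x / (1 - aξ).

```python
from collections import deque

def simulate_delayed(a, xi, T=2.0, dt=1e-4):
    """Euler simulation of the true delayed loop xdot(t) = -a x(t - xi),
    with constant history x(t) = 1 for t <= 0."""
    n = int(round(xi / dt))
    hist = deque([1.0] * (n + 1), maxlen=n + 1)   # hist[0] = x(t - xi)
    for _ in range(int(round(T / dt))):
        hist.append(hist[-1] + dt * (-a * hist[0]))
    return hist[-1]

def simulate_first_order(a, xi, T=2.0, dt=1e-4):
    """First-order approximation of the same loop: xdot = -a x / (1 - a*xi)."""
    x = 1.0
    for _ in range(int(round(T / dt))):
        x += dt * (-a * x / (1.0 - a * xi))
    return x

x_true = simulate_delayed(1.0, 0.1)
x_approx = simulate_first_order(1.0, 0.1)
print(x_true, x_approx)   # the two agree to within a few per cent
```

For a delay of a tenth of the system time constant, the approximate trajectory tracks the true one closely, which is what licenses the use of E(t, ξ) for small ξ.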

Figure 6. The Approximate State Equations. (The entries a₁ⱼ of E(t, ξ) are expressed in terms of the feedback gains k₁ⱼ(t), with b₃₂ = horizontal velocity / T_s and b₃₃ = −1/T_s. When the execution delay is ξ, the approximate state equations are x'(t) = E(t, ξ)x(t) + f(ξ).)

The trajectory followed by the aircraft when the delay is less than about 60 milli-seconds follows the optimal trajectory closely, although the elevator deflections required may intuitively be expected to increase as the delay increases. Also, the susceptibility of the process to failure in the presence of incorrect or no input is expected to rise with the introduction of random environmental effects. The control that is required for various values of controller delay is shown in Figure 7. Due to the absence of any random effects, elevator deflections for all the delays considered tend to the same value as the end of the landing phase (20 seconds) is approached, although much larger controls are needed initially. In the presence of random effects, the divergence between the controls needed at low and high values of controller delay is even more marked. We present an example of this in Figure 8. The random effect considered here is the elevator being stuck at −35° for 60 milli-seconds, 8 seconds into the landing phase, due to a faulty controller order. The controlled process is assumed in Figure 8 to be in the subspace in which the landing job maps into a non-critical process (defined in the sequel as S0). The diagrams speak for themselves. We shall show later that this demand on control is fully represented by the nature of the derived cost function. Also, above a certain threshold value of controller delay, we would expect the system to become unstable. This is indeed the case in the present problem, although this point occurs beyond a delay of 60 milli-seconds for all points in the allowed state space (obtained in the next section), and so cannot by definition occur here.

Figure 7. Elevator Deflection (rad) versus Time for Various Values of Controller Delay.

Figure 8. Elevator Deflection (rad) versus Time with the Elevator Stuck at −35° for 60 milli-seconds, 8 Seconds into the Landing Phase.

C Allowed State Space

In this subsection, we derive the allowed state space of the aircraft system. To do so, note that in Ellert and Merriam's model the angle of attack does not appear as a state: the state equations do not take it into account. In the idealized model we are considering, it is implicitly assumed that the constraint on the angle of attack is always honored, so that the only constraints to be considered are the terminal constraints. The terminal constraints have been given earlier but are repeated here for convenience. The touchdown speed must be less than 2 feet/sec in the vertical direction, and the pitch angle at touchdown must lie between 0° and 10°. To avoid overshooting the runway, touchdown must occur at between 4864 and 5120 feet in the horizontal direction from the point at which the landing phase begins. The horizontal velocity is assumed to be kept constant throughout the landing phase at 256 feet/sec. Thus, touchdown should occur between 19 and 20 seconds after the descent phase begins. The only control is the elevator deflection, which must be kept between −35° and 15°.

The restriction that we place on the allowed state-space, X_A, is for ease of representation. We define allowed "component state-spaces" for the individual components of the state-vector such that any point in the 4-dimensional state-space whose individual components each lie in the corresponding component state-spaces is in the overall allowed state-space. The component state-spaces are determined by perturbing the state values around the desired trajectory and determining the maximum perturbation possible under the requirement that no terminal constraint be

violated. They were obtained by a search process and are plotted in Figure 9. It should again be stressed that what we plot in Figure 9 is a subset of X_A.

D Designation of Subspaces

We subdivide the allowed state-space found above using the method described in Section 3. The criterion used is the hard deadline, since the finite cost function (derived in the next subsection) is found not to vary greatly within the whole of the allowed state-space. The value of Δ chosen is 60 milli-seconds. In other words, we wish to consider only the case where a trigger is "missed." The allowed state-space in Figure 9 is subdivided into two subspaces, S0 and S1. These correspond to the deadline intervals [120, ∞) and [60, 120) respectively. S0 is the non-critical region corresponding to the [120, ∞) interval. Here, even if the controller exhibits any of the abnormalities considered earlier, the airplane will not crash. In other words, if the controller orders an incorrect output, exhibits an abnormal execution delay, or simply provides no output at all before the following trigger, the process will still survive at the end of the current inter-trigger interval if, at the beginning of that interval, it was in S0. On the other hand, if the process is in S1 at the beginning of an inter-trigger interval, it may safely endure a delay in controller response. However, if the controller behaves abnormally, either by providing no output at all for the current trigger cycle or by ordering an incorrect output, there is a positive probability of an air crash.

Figure 9(a). Allowed State Space: Altitude.

Figure 9(b). Allowed State Space: Descent Rate.

Figure 9(c). Allowed State Space: Pitch Angle.

Figure 9(d). Allowed State Space: Pitch Angle Rate.

Notice that we explicitly consider only missing a single trigger, not the case when two or more triggers might be missed in sequence. This is because dynamic failure is treated here as a function of the state at the moment of triggering. If two successive triggers are missed, for example, we have to consider two distinct states, namely the states the process is in at the moments of those respective triggers. To speak of deadline intervals beyond 120 milli-seconds is therefore meaningless in this case, since the triggers occur once every 60 milli-seconds. This is why the second deadline interval considered is [120, ∞), not [120, 180). The hard deadline may conservatively be assumed to be 60 milli-seconds in S1. By definition it is infinity in S0.

E Finite Cost Functions

The finite cost does not vary greatly within the entire allowed state-space. It is therefore sufficient to find a single cost function for both S0 and S1. The determination of the cost function is carried out as a direct application of its definition. That is, the process differential equations are solved with varying values of ξ. The value of ξ cannot be greater than the inter-trigger interval of 60 milli-seconds since, by assumption, no job pipelining is allowed and the controller terminates any execution in progress upon receiving a trigger. The finite cost function is found by computation to be approximately the same over the entire allowed state-space as defined in Figure 9.

In Figure 10, the finite cost function is plotted. The cost function is in the units of the performance index. Bear in mind that these measures are the result of an idealized model. We have, for example, ignored the effects of wind gusts and other random effects of the environment. When these are taken into account, the demands on controller speed become even greater, i.e., the costs increase. The reader should compare the nature of the cost function with the plots showing elevator deflection in Figure 7, and notice the correlation between the marginal increase in cost with increased execution delay and the marginal increase in control needed, also as a function of the execution delay.

Figure 10. Landing Job Cost Function. (Cost, in performance-index units, versus controller delay in milli-seconds, 0 to 60.)

PART III
APPLICATIONS OF THE MEASURES

CHAPTER 5
THE NUMBER-POWER TRADEOFF

The number-power tradeoff problem can be stated as follows [17]. It appears intuitively obvious, if there are no device failures, that a system with a single processor with exponentially distributed service time with mean 1/μ is more efficient than an N-processor (N > 1) system with each processor providing exponential service at rate μ/N. When failure is allowed for in the model, the above assertion is no longer obvious, and may not even be true in specific instances. So, we ask the question: "Given that the total processing power (number of processors × service rate per processor, called here the number-power product) is fixed at μ, what is the optimal number, N, of processors that the system should start out with for a specific mission lifetime?" We extend an adaptation of this problem to demonstrate the use of our performance measures for real-time computers.

Two observations are in order here. Firstly, we tacitly assume that processors with any prescribed power are available. This is not strictly true, although a wide variety of processors is available. Secondly, such a tradeoff depends for its resolution upon the cost functions and hard deadlines introduced above. It is this second point we pursue here. Specifically, we set out to determine, for an example control computer, the configurations that meet specifications of reliability, and the sensitivity of reliability and the mean cost to changes in the number-power product under different

operating conditions of hard deadline and mission lifetime. Implicit in all this will be the tradeoff between device redundancy and device speed.

A. System Description and Analysis

We use the multiprocessor system in Figure 11 to demonstrate the idea. Assume that there is a single job class that enters the system as a Poisson process with rate λ, and which requires an exponentially distributed amount of service with mean 1/μ. Then the system at any given time is an M/M/c queue (if the small dispatch time is ignored), where c is the number of processors functioning at that time. The distribution function of the response time for an M/M/c queue is well known [18]. It is given by:

F_MM,c(t) = W_c(0)[1 − e^(−μt)] + [1 − W_c(0)]{1 − [(cμ−λ)e^(−μt) − μe^(−(cμ−λ)t)] / (cμ − λ − μ)}    (28)

where W_c(0) is the probability that, when there are c processors functional, at least one functional processor is free. This is given by:

W_c(0) = 1 − [(λ/μ)^c c / (c!(c − λ/μ))] / [Σ_{k=0}^{c−1} (λ/μ)^k / k! + (λ/μ)^c c / (c!(c − λ/μ))]    (29)

This distribution function is not defined unless cμ > λ.

Assume that the hard deadline has a probability distribution function F_d, and that the finite cost function for the task is denoted by g (as in Eq. (18)). Let the processors fail according to an exponential law with rate μ_p. Also, let n be the smallest integer for which nμ > λ; clearly, if there are fewer than n processors functioning, the utilization exceeds unity, and failure takes place with certainty.

Figure 11. A Real-Time Multiprocessor.

The system fails if a hard deadline is violated, or if there are fewer than n processors functioning. (We do not consider here the failure of the interconnection net, or of the dispatcher. Taking account of these is easy, but would obscure the analysis somewhat.) When the system is in a state i ≥ n, the rate at which failure can happen is equal to the product of the task input rate and the probability that the response time of the system exceeds the hard deadline. If we assume that steady state is achieved between each state transition (i.e., between processor failures), then the probability distribution function of the response time in state i is always given by F_MM,i. Such an assumption is valid, since the Mean Time Between Failures for components used in such systems typically ranges from 1,000 to 10,000 hours.

The probability of dynamic failure can therefore be computed using the Markov model in Figure 12. Denoting the probability of being in state i by π_i, the quantity λ[1 − F_MM,j(t)] by a(j,t), the failure state by fail, and the number of processors at start-up by N, the following balance equations can be written:

π'_N(t) = −[Nμ_p + ∫₀^∞ a(N,τ) dF_d(τ)] π_N(t),    π_N(0) = 1    (30a)

π'_i(t) = −[iμ_p + ∫₀^∞ a(i,τ) dF_d(τ)] π_i(t) + (i+1)μ_p π_{i+1}(t),    π_i(0) = 0,  for n ≤ i < N    (30b)

π'_fail(t) = Σ_{i=n}^{N} [∫₀^∞ a(i,τ) dF_d(τ)] π_i(t) + nμ_p π_n(t),    π_fail(0) = 0    (30c)

Clearly, P_dyn(t) = π_fail(t), so that a solution of the above equations yields the probability of dynamic failure. Implicit in these calculations is the assumption that the hard deadline is very much smaller than the mission lifetime or the sojourn times in the various states. This is invariably the case in practice.
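The M/M/c quantities above are easy to compute numerically. The sketch below uses the standard Erlang C formula and the two-exponential form of the response-time CDF, valid when cμ − λ ≠ μ; the function names are ours, and the last function assumes a fixed number-power product split evenly across the processors.

```python
import math

def erlang_c(c, lam, mu):
    """Probability that an arriving task must queue in an M/M/c system
    (Erlang C); W_c(0) = 1 - erlang_c(...) is then the probability
    that at least one of the c processors is free (Eq. (29))."""
    a = lam / mu                       # offered load
    assert c * mu > lam, "requires c*mu > lambda"
    last_term = (a ** c / math.factorial(c)) * c / (c - a)
    denom = sum(a ** k / math.factorial(k) for k in range(c)) + last_term
    return last_term / denom

def response_time_cdf(t, c, lam, mu):
    """F_MM,c(t): CDF of the M/M/c response (sojourn) time, in the
    generic case c*mu - lam != mu."""
    C = erlang_c(c, lam, mu)
    a = c * mu - lam                   # decay rate of the queueing delay
    tail = (1 - C) * math.exp(-mu * t) \
         + C * (a * math.exp(-mu * t) - mu * math.exp(-a * t)) / (a - mu)
    return 1.0 - tail

def miss_rate(c, lam, mu_total, td):
    """Deadline-miss rate lam * (1 - F_MM,c(td)) when a number-power
    product mu_total is divided among c processors."""
    mu = mu_total / c
    return lam * (1.0 - response_time_cdf(td, c, lam, mu))
```

For example, `miss_rate(3, 100.0, 2200.0, 0.01)` gives the rate at which a three-processor configuration with number-power product 2200 misses a 0.01-hour hard deadline under the arrival rate used later in this chapter.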

Figure 12. Markov Model for Number-Power Tradeoff.
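The balance equations (30a)-(30c) can be integrated numerically. A minimal sketch follows, assuming a constant hard deadline so that the integral ∫a(i,τ)dF_d(τ) reduces to a single number α_i per state; the α values and rates below are illustrative, not taken from the dissertation's tables.

```python
def p_dyn(alpha, mu_p, N, n, T, dt=0.01):
    """Forward-Euler integration of the balance equations (30a)-(30c):
    returns pi_fail(T), the probability of dynamic failure by time T.
    alpha[i] is the deadline-miss rate lam * (1 - F_MM,i(td)) in
    state i, where i is the number of functioning processors."""
    pi = {i: 0.0 for i in range(n, N + 1)}
    pi[N] = 1.0                        # all N processors up at start
    pi_fail = 0.0
    for _ in range(int(T / dt)):
        d = {}
        for i in range(n, N + 1):
            outflow = (i * mu_p + alpha[i]) * pi[i]
            inflow = (i + 1) * mu_p * pi[i + 1] if i < N else 0.0
            d[i] = inflow - outflow
        # Eq. (30c): deadline misses from every state, plus the
        # processor failure that drops the system below n processors.
        pi_fail += dt * (sum(alpha[i] * pi[i] for i in range(n, N + 1))
                         + n * mu_p * pi[n])
        for i in d:
            pi[i] += dt * d[i]
    return pi_fail

# Illustrative per-hour rates for a 4-processor start-up with n = 2:
pf = p_dyn({2: 1e-6, 3: 1e-7, 4: 1e-8}, mu_p=1e-4, N=4, n=2, T=10.0)
```

Because the rates are tiny relative to the step size, simple forward Euler is adequate here; a stiff solver would be the safer choice for larger μ_p.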

To compute the mean finite cost over the mission lifetime, we first have to evaluate the distribution function of the response time, conditioned on the event that no hard deadline is violated. This distribution function for a c-processor system, denoted by F̃_MM,c, is given by:

F̃_MM,c(x | τ) = F_MM,c(x) / F_MM,c(τ),  for x < τ    (31)

where the function is only defined for arguments less than the associated hard deadline τ. Then, the mean cost is defined by:

Mean Cost = Σ_{c=n}^{N} ∫₀^∞ ∫₀^∞ ∫₀^τ π_c(t) g(x) dF̃_MM,c(x | τ) dF_d(τ) dL(t)    (32)

where L denotes the mission lifetime distribution. The above expressions are used in the following section to obtain values for the probability of dynamic failure and the mean cost as a function of the mission lifetime.

B Numerical Results and Discussion

In what follows, it is assumed that the job arrival rate is λ = 100 and that the processor failure rate is μ_p = 10^-4. All time units are in hours. Figure 13 shows the probability of dynamic failure when the task is non-critical, i.e., the deadline is at infinity. In such cases, the probability of dynamic failure reduces to the probability of static failure, namely the probability that hardware components fail to the point where the system utilization exceeds 100%. This is because catastrophic failure does not occur in this case until the system utilization is greater than unity. This explains the monotonic nature of the plot.
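As a concrete single-state instance of the conditioning in Eqs. (31)-(32), suppose the hard deadline is a constant t_d, the cost function is g(t) = t, and the response time is exponential with rate θ; all three are illustrative assumptions. The conditional mean cost then follows by direct numerical integration:

```python
import math

def conditional_mean_cost(theta, td, steps=100_000):
    """Mean cost E[g(T) | T <= td] for g(t) = t and T ~ Exp(theta),
    i.e. the cost averaged only over tasks that meet the deadline."""
    dt = td / steps
    num = prob = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt              # midpoint rule
        pdf = theta * math.exp(-theta * t)
        num += t * pdf * dt             # contribution to E[g(T); T <= td]
        prob += pdf * dt                # contribution to F(td)
    return num / prob

mc = conditional_mean_cost(theta=1.0, td=1.0)
```

For this exponential case the integral has the closed form 1/θ − t_d e^(−θt_d)/(1 − e^(−θt_d)), which the numerical value matches to four decimal places.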

Figure 13. Dependence of Probability of Dynamic Failure on Processor Number when Tasks are Non-Critical. (Curves are labeled by mission lifetime, ML = 1, 5, 25, and 125 hours.)

Figure 14 shows the probability of dynamic failure when the task is critical, with the deadlines for the respective curves noted in the figure. The number-power product is 2200. The curves that result for the failure plot form an inverted bell. The portions of the failure curve where the slope is positive can be explained as follows. When the number of processors increases, the response time distribution is skewed to the right. This in turn increases the probability of failing to meet the hard deadline to a greater extent than the static failure probability is reduced by the addition of further redundancy. The positive slope is the result of this tendency. When the hard deadline is smaller, the premium on speed is increased, and as a result, the trough of the curves moves to the left. In the region corresponding to the fewest processors, there tends to be a tradeoff between dynamic failure probability and the number of processors, leading to a negative slope for the failure curves. When there are few processors, the fault-tolerance is less and the probability of static failure is therefore greater. Since the total processing power of the processor bank is fixed, the few processors each have greater power, and the mean waiting time is low. This accounts for the very small non-static component of the failure probability. As the speeds of the individual processors are decreased, but their numbers increased commensurately, the static failure probability drops, but the probability of missing the hard deadline increases. In the region of the curve where the slope of the probability of failure is negative, the benefits accrued from adding redundancy outweigh the negative impact of the lowered individual speeds that result; elsewhere the reverse is the case. Notice that the curve corresponding to a deadline of 0.01 has no region of negative slope. This means that 2200 is a number-power product that is too small with respect to that hard deadline.

Figure 14. Dependence of Probability of Dynamic Failure on Processor Number and Hard Deadlines. (Number-power product = 2200; mission lifetime = 10 hours; curves for t_d = 0.010, 0.025, 0.050, and 0.075.)

In Figure 15, the dependence of the probability of dynamic failure on the number-power product for a mission lifetime of 10 hours is considered. Each point on the curves represents the configuration yielding the lowest possible failure probability for the number-power product represented. The label of each point on this plot is the number of processors in the configuration for which this lowest failure probability is achieved. As the product increases, the optimal configuration tends to contain more processors; this also is due to the lowering of the non-static component of the dynamic failure probability when the product is increased. Naturally, the curves are monotonically non-increasing. They serve to show the marginal gain in maximum achievable reliability that is to be had on increasing the number-power product at each point for the class of systems under consideration. Notice the "elbows" in the plot. These occur when the minimum-failure-probability configuration changes, and are the result of a tradeoff between the static and non-static components of the failure probability. P_dyn drops exponentially with an increase in the product as long as the static component is a small fraction of P_dyn. When the non-static component drops sufficiently below the static component value, the optimal configuration changes, and the static component once again becomes negligible compared to the non-static component. This race continues indefinitely and is portrayed in Figure 16. The discrete nature of the processors causes the elbows: if the number of processors were a continuous quantity, they would not appear. The probability of dynamic failure is used as a pass-fail test for control computers. Plots such as Figure 15 can be used in this connection. As an example, let the mission lifetime be 10 hours, and the specified probability of dynamic failure

Figure 15. Minimum Achievable Probability of Dynamic Failure. (Point labels give the number of processors in the optimal configuration; curves for t_d = 0.075, 0.05, and 0.02.)

Figure 16. Race Between Static and Non-Static Components of the Probability of Dynamic Failure. (t_d = 0.010.)

equal to 10^-7 over that period. Let the system parameters be those of the model in this section. Then, corresponding to each of the four deadlines considered, we can obtain graphically from Figure 15 the minimum number-power product that is required to satisfy the P_dyn specification. These products are listed in Table 5. Any system that has a smaller number-power product must be rejected, no matter what its other credentials may be. When this stage of the evaluation is complete, one has a set of acceptable configurations. Only after this point does the mean cost come into consideration. The mean costs associated with each of the points in Figure 15 are graphed in Figure 17, where the finite cost function, g, has been taken as equal to the response time for demonstrative purposes. The curves take the form of a sawtooth wave, with each upward transition occurring when the optimal configuration increases by one processor. Clearly, the greater the power of each processor, the smaller is the mean cost.

In Figure 18 (A, B, and C), we show the effects on P_dyn of changing the mission lifetime for various values of the hard deadline, t_d. In the light of the preceding discussion, these plots should be largely self-explanatory. It is worth pointing out, however, that as the lifetime increases, the optimal configuration contains a larger number of processors. The trough (around the optimal point) becomes shallower as one increases the mission lifetime, until finally it disappears, to be replaced by a shallow trough one unit to the right. As the lifetime increases still further, the new trough deepens, then begins to become shallow. Whether or not this cycle continues depends upon the hard deadline: it will continue so long as the number-power product is sufficiently large to cope with the hard deadline at the lifetimes used; the plot will rise monotonically to a failure probability of one if this is not the case (cf.

Hard Deadline    Number-Power Product
0.010            7165
0.025            2126
0.050            1496
0.075            1180

Required P_dyn = 10^-7; Mission Lifetime = 10 hours.

Table 5. Minimum Number-Power Product for Various Hard Deadlines.

Figure 17. Mean Cost for Configurations of Figure 15. (Curves for t_d = 0.075, 0.05, and 0.01; point labels give the number of processors.)

Figure 18(a). Probability of Dynamic Failure for a Constant Number-Power Product. (t_d = 0.01; number-power product = 1000; curves labeled by mission lifetime, ML.)

Figure 18(b). (t_d = 0.05; number-power product = 1000.)

Figure 18(c). (Number-power product = 1000.)

Figure 14). In Figure 19 (A, B, and C), we show the associated mean costs per unit time of operation, using the same cost function as was used in Figure 17. In all three curves, we may note the anomaly mentioned in Section 2: as the lifetime increases, and as the number of processors increases, there is a region over which the mean costs per hour actually drop. It is most pronounced in Figure 19A, where the probability of dynamic failure is close to unity under almost all configurations. This anomaly, of course, is due to the fact that the mean costs are computed on a response-time distribution that is conditioned on the system's not failing. Thus, on comparing Figures 19A, 19B, and 19C, we see that the system operating with the longest hard deadline, and therefore having greater reliability, exhibits a higher mean cost per hour in some configurations than its identical counterparts that operate under more difficult conditions. If this causes undue irritation, the anomaly can be made to vanish by redefining the finite cost function for a task i to be:

ĝ_i(t) = g_i(t) if t ≤ t_d;  ĝ_i(t) = g_i(t_d) if t > t_d    (33)

introducing the following functions:

a_j(t) = ĝ_j(ξ_j(t)), where ξ_j(t) is the response time when the system is in state j at time t    (34)

q(t) = Σ_{j=1}^{N} a_j(t) π_j(t)    (35)

and defining the mean and variance costs to be:

Mean Approximate Cost (MAC) = ∫₀^∞ E{q(t)} dL(t)    (36)

Figure 19(a). Mean Costs for a Constant Number-Power Product. (Number-power product = 1000; curves labeled by mission lifetime, ML.)

Figure 19(b). (t_d = 0.05; number-power product = 1000.)

Figure 19(c). (Number-power product = 1000.)

Variance Approximate Cost (VAC) = ∫₀^∞ Var{q(t)} dL(t)    (37)

It is easy to see that MAC > Mean Cost always, and that the approximate costs approach the accurate costs when the probability of dynamic failure is small. Indeed, the anomaly does not appear until the probability of dynamic failure is significant. Since the applications under consideration are all critical processes, P_dyn is always small for the accepted configurations, and the configurations that exhibit this anomaly will be rejected by the P_dyn pass-fail test, so their mean costs need never be computed.

C. Extension

The number-power tradeoff can easily be extended to make it very useful in the process of design. The reader will have noticed that in the cost functions with which we measure the goodness of controller performance, no account is taken of controller hardware cost. All that the cost functions express is the control overhead incurred in actually running the process. Indeed, we may regard the mean costs as average operating overheads. It is not easy to incorporate the hardware cost directly into the cost functions themselves. Instead, one may consider the set of hardware configurations available for a particular hardware cost outlay. Then, constant-cost plots can be drawn, showing the range of performance (in terms of probability of dynamic failure and average operating costs) that is available for any particular hardware cost outlay. From similar curves, one may arrive at the minimum finite average operating cost associated with a particular hardware cost, given that
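The claim that the approximate cost exceeds the conditioned mean cost can be checked with a small Monte Carlo experiment on a one-state simplification; the exponential response time and the choice g(t) = t are illustrative assumptions, not part of the original model.

```python
import math, random

def clipped_cost(t, td):
    """g-hat of Eq. (33), with g(t) = t assumed for illustration."""
    return t if t <= td else td

def compare_costs(theta=1.0, td=1.0, n=200_000, seed=1):
    rng = random.Random(seed)
    samples = [rng.expovariate(theta) for _ in range(n)]
    # Approximate cost: clipped at the deadline, no conditioning.
    mac = sum(clipped_cost(t, td) for t in samples) / n
    # Accurate cost: averaged only over responses meeting the deadline.
    met = [t for t in samples if t <= td]
    mean_cost = sum(met) / len(met)
    return mac, mean_cost

mac, mean_cost = compare_costs()
```

Because the clipped cost charges every task at least as much as the tasks that met the deadline would have paid, the approximate figure always sits above the conditioned one, and the two converge as the fraction of missed deadlines shrinks.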

specifications for the probability of dynamic failure are met. This approach can easily be illustrated with the number-power tradeoff considered here. When the hardware cost of a processor is proportional to its processing speed, the curves in Figure 13 become dynamic failure curves for a particular hardware cost. Curves such as those of Figure 18 can be used to identify the configurations that meet requirements for the probability of dynamic failure for given mission lifetimes, and the sensitivity of P_dyn to changes in mission lifetime. Also, one can study any other tradeoffs that may exist between the hardware cost or the number-power product and the minimum mean cost per lifetime associated with such a cost or product. The computer studied here is simple; however, the analysis can be extended in some useful directions relatively easily. It is easy to take care of the case where the hardware cost or the finite cost function is a more complicated function of the processing speed. More complicated multiprocessors require a more involved analysis, but the basic ideas should now be clear.

CHAPTER 6
SYNCHRONIZATION AND FAULT-MASKING

A. Synchronization

Figure 20 is a schematic showing the handling of data as it enters the system through the sensors and leaves it (in a figurative sense) at the actuators.

Synchronization and fault-masking are integral parts of any fault-tolerant distributed system. When a multiplicity of processors executes code in parallel, care must be taken to keep them reasonably in step. Therefore, the issue of synchronization is focal to all methods of forward error recovery. There are two basic methods of synchronization:

(1). Each processor has an ultra-precise clock. When the computer is switched on, the clocks are synchronized. If the clocks are sufficiently precise, the processors will continue to run in lock-step for an appreciable period. Unfortunately, such highly precise clocks are extremely expensive to build, and unsuited to incorporation in computer circuits. (For a description of highly precise clocks, see [19].) Clocks that are generally used in computer circuits drift too rapidly for this method to be employed in practice. We shall not

Figure 20. Functional Block Diagram of a Real-Time System. (Sensors, interactive consistency/voting, processor synchronization, output voting, actuators.)

consider this method any further.

(2). The synchronization is carried out mutually. There is no single component whose functioning is critical to the security of the whole system. One may choose to synchronize the processor clocks, or the processors themselves at pre-defined boundaries of software execution. In the first case, one has a system operating more or less in lock-step, such as the FTMP system [1]. Both methods of synchronization are based on the same basic concepts; the only difference is the frequency with which synchronization is carried out.¹

The notion of virtual time sources now arises naturally. These are not necessarily clocks in the traditional sense; they mark the points at which an individual processor performs synchronization. It is convenient to view them as virtual clocks, whose transitions represent either clock "ticks" or execution of a stretch of code up to a pre-specified boundary. In the sequel, unless it is otherwise stated, the term "clock" is used to mean "virtual clock."

When synchronization is mutual, no "absolute" underlying time-source exists, only a set of time-sources whose relative behavior must be kept in step. The synchronizer (which may or may not be a physical part of the processor, and which may be implemented either in hardware or in software) must therefore in each case have a perception of the state of the other time-sources. This perception may or may not be identical to that of the other synchronizers: if faulty modules necessarily behave consistently with respect to all synchronizers, it is identical; otherwise it need not be so.

¹ An important corollary of this is that the maximum clock drift rates that can be tolerated decrease with a decrease in the frequency of synchronization.
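The idea of mutual synchronization can be made concrete with a toy discrete-time simulation in which each clock corrects its own frequency toward a reference built from its perception of the other clocks; the median-based reference, gain, and drift magnitudes below are our own illustrative choices, not taken from any real design.

```python
import random

def simulate_mutual_sync(n=4, steps=2000, gain=0.1, drift_scale=1e-4, seed=0):
    """Toy model of n mutually synchronized clocks.

    Each clock has its own frequency error (drift). On every step it
    estimates its phase error against a voted (median) reference formed
    from the other clocks' phases, and applies a proportional correction.
    Returns the residual skew (max phase spread) after `steps` ticks."""
    rng = random.Random(seed)
    phase = [rng.uniform(0.0, 0.01) for _ in range(n)]
    drift = [rng.uniform(-drift_scale, drift_scale) for _ in range(n)]
    for _ in range(steps):
        new_phase = []
        for i in range(n):
            others = sorted(phase[j] for j in range(n) if j != i)
            ref = others[len(others) // 2]      # voted reference
            err = phase[i] - ref                # perceived phase error
            new_phase.append(phase[i] + 1.0 + drift[i] - gain * err)
        phase = new_phase
    return max(phase) - min(phase)

skew = simulate_mutual_sync()
```

With this proportional correction the residual skew settles near (drift spread)/gain, which illustrates the footnoted corollary: the less often the correction is applied, the smaller the drift rates that can be tolerated for a given skew bound.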

The synchronization process contributes to the system overhead in two ways. Firstly, there is the overhead imposed by the synchronization task itself. Secondly, the task-to-task communication overhead is proportional to the degree of synchronization achieved. If hardware synchronization with phase-locked clocks is employed, the synchronization overhead can be reduced to vanishing point. If software synchronization is used, the overhead is significant. Both approaches to synchronization will be considered in succeeding sections. First, however, we will consider the second component. Because of severe timing constraints, real-time systems do not generally use sophisticated mechanisms for task-to-task communication. Typically, data are transmitted from one task to another via timing rules agreed upon in advance. As a result, the receiving task has to wait for a time equal to the sum of the maximum transmission time and the maximum possible clock skew before it can read the data. Where synchronization is carried out in software and depends on the transmission of timing data on regular data channels, this transmission delay feeds back to increase the synchronization delay itself. We will consider this matter in detail in the sequel. In what follows, we present a detailed discussion of both the hardware and the software implementations of synchronization.

Hardware Synchronization

In this section, we consider synchronization by phase locking. Phase-locked clocks were first used to ensure that the processors of FTMP [1] operated in lock-step. We consider a total of N clocks to be synchronized in the face of up to m

faulty clocks. The clocks are at the nodes of a completely connected graph. The basic theory behind their operation is simple. In Figure 21, we provide a schematic diagram of an individual clock. Each clock consists of a receiver which monitors the clock pulses of the N-1 other clocks in the arrangement, and these are used to generate a reference signal. By comparing this reference with its own pulse, the receiving clock computes an estimate of its own phase error. This estimated phase error is then put into an appropriate filter, and the output of the filter controls the clock oscillator's frequency. By thus controlling the frequency of the individual clocks, they can be kept in phase-lock and therefore synchronized for as long as the initial phase error is below a prescribed bound, i.e. for as long as the clocks started reasonably in step and their drifts are sufficiently low. A discussion of clock stability is provided in [21]. The arrangement for N=4, m=1 is, to our knowledge, the only phase-locked clock constructed and fully analyzed [20]. Unfortunately, when one attempts to increase m without care, synchronization can be lost due to the presence of malicious faults. In this section, we show how to design phase-locked clocks to tolerate a given arbitrary number of malicious failures. Our work is a generalization of the original design [20], which can tolerate at most one failed clock. The following notation and definitions are used in this section. Definition 1: If the overall system of clocks is properly synchronized, all individual non-faulty clocks must agree closely with each other. A well-synchronized system thus has global clock cycles. Global clock cycle i is the interval between the i-th tick of the fastest non-faulty clock (i.e. the non-faulty clock that has its i-th tick before that of all the other non-faulty clocks) and the (i+1)-th tick of the fastest

Figure 21. Component of a Fault-Tolerant Phase-Locked Clock (external clock signals feed a voting circuit, then a comparator, filter, and oscillator, which produces the clock output).

non-faulty clock. For brevity, we shall denote global clock cycle i by gcci. Definition 2: Each of the clocks "sees", through its receiving circuitry, the ticks of the other clocks. These ticks, together with the receiving clock's own tick, can be totally ordered in any gcci by the relation "prior or equal to". Such an ordered set, called a scenario, for clock a in gcci is denoted by Sa. We shall frequently drop the superscript denoting the cycle for convenience; where this is done, it will be understood that we are talking about some gcci. If a non-faulty clock c does not receive a tick from clock d within a given timeout period in any global clock cycle, the tick for d is arbitrarily assumed by c to be at the end of that timeout period. The scenario of every non-faulty clock therefore has exactly N elements. Definition 3: If clock a has clock b as its reference in some gcci, it is said to trigger on b in that gcci. Definition 4: Given the various triggers, we can draw a directed graph with the clocks as the vertices, and the directed arcs reflecting the relationship "triggers" in some gcci. Such a graph is called the trigger graph. For example, in Figure 22, a triggers b and c, and is itself triggered by d, while d is triggered by b. A clique of clocks is a component of the trigger graph. In Figure 22, there are two cliques: {a,b,c,d} and {e,f,g}. Notation: G and NG are the set of clocks and the set of non-faulty clocks, respectively, in the system. There are N clocks in all, and up to m failures must be sustained. Definition 6: A partition of G is defined as a set P = {G1, G2}, where G1 and G2 are

Figure 22. Trigger Graph: An Example.

subsets of G with the following properties: (i) G = G1 ∪ G2; (ii) G1 ∩ G2 ∩ NG = ∅; (iii) Gi ∩ NG ≠ ∅, i = 1, 2. From (i), each clock must belong to at least one of G1 and G2. From (ii), only faulty clocks may belong to both G1 and G2. From (iii), there must be at least one non-faulty clock in each of G1 and G2. Definition 7: A clock a is said to be faster than a clock b in scenario S if a precedes b in S. In a partition P = {G1, G2}, G1 is said to be faster than G2 if every non-faulty clock in G1 is faster than every non-faulty clock in G2. Notation: Given a partition P = {G1, G2}, NG1 and NG2 are the sets of non-faulty clocks in G1 and G2, respectively. By Definition 6, neither NG1 nor NG2 can be empty, and NG1 ∩ NG2 = ∅. Definition 8: Cliques A and B (of clocks) are said to be non-overlapping if the non-faulty clocks of A are either all faster than those of B, or vice versa. Notation: Denote the position of a clock c in its own scenario Sc in gcci by pc. Again, we shall frequently drop the superscript for convenience. The reference signal (i.e. the trigger) is a function of N and of pc. It is denoted by fpc(N). By this, we mean that clock c triggers on the fpc(N)-th signal in Sc, not counting itself. For the system to operate satisfactorily, all the non-faulty clocks must have their ticks close together. Also, they should tell good time, i.e. the length of every

global clock cycle should be about the length of an ideal (or absolute-time) clock's inter-tick interval. These requirements dictate the following two conditions of correctness, C1 and C2. Definition 9: Each of the following conditions of correctness must be satisfied in gcci if the system is to operate correctly in every gcci. C1. For all partitions P = {G1, G2} of the set of clocks G, in which the non-faulty clocks in G1 are all faster than those in G2, each of the following (K1 and K2) must apply: K1. If, in gcci, all clocks in NG1 trigger on clocks in G1, then there is at least one clock in NG2 that triggers on a clock in G1. Furthermore, if no clock in NG2 triggers on a clock in NG1, at least one clock k ∈ NG2 must trigger on a faulty clock h ∈ G1 such that in the scenario Sk, there is at least one clock r ∈ NG1 that is slower than the clock h. K2. If, in gcci, all clocks in NG2 trigger on clocks in G2, then there is at least one clock in NG1 that triggers on a clock in G2. Furthermore, if no clock in NG1 triggers on a clock in NG2, at least one clock k ∈ NG1 must trigger on a faulty clock h ∈ G2 such that in Sk, there is at least one clock r ∈ NG2 that is faster than h. C2. If a non-faulty clock z triggers on a faulty clock y, then there must exist non-faulty clocks z1 and z2 such that z1 is faster than or equal to y, and y is faster than or equal to z2. Either z1 or z2 may be z itself. Intuitively, we may regard C1 as preventing the formation of non-overlapping cliques -- which would obviously destroy synchrony -- and C2 as ensuring that the

system keeps good time, i.e. that each global clock cycle is close to being the clock cycle of an ideal clock. Finally, we assume that the transmission of clock signals through the system takes negligible time. This ensures that all non-faulty clocks are seen by all clocks in the same mutual order. The phase-locked clock system for N=4, m=1 is simple enough to be proved correct by an exhaustive enumeration of all eventualities. It is, to our knowledge, the only phase-locked clock actually constructed [20]. Here, the reference used is the second incoming pulse (in temporal order), i.e. the median pulse. Such a clock is proof against the malice of a single faulty clock. To give the reader a feeling for why this is so, and to enhance his intuition about malicious failure, we provide below a simple explanation. Call the four clocks a, b, c, and d. Let d be the maliciously faulty clock. Because d is malicious, it may provide different timing signals (i.e. lie) to different receivers. Since the non-faulty clocks by definition send their ticks at the same moment (i.e. do not lie) to all the other receiving clocks, the mutual ordering of the non-faulty clocks within every scenario is the same for all non-faulty clocks. That is to say, if clock b sees clock a as faster than clock c in some gcci (i.e. clock a sends its i-th tick to b before clock c does so), then a will appear faster than c to the other non-faulty clocks in the system, i.e. to a and c, in that gcci. Clock d, however, may appear in different positions in the scenarios of the non-faulty clocks, since it is malicious. One way of proving that the four-clock arrangement works despite d's being malicious is to enumerate all possible actions of d and show that the system still continues to satisfy the conditions of correctness.
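The enumeration just described is small enough to mechanize. The sketch below (our illustration, not part of the original analysis) places the faulty clock d in each of the four possible positions of every non-faulty clock's scenario, applies the median rule for N=4, and checks condition C2:

```python
from itertools import product

GOOD = ["a", "b", "c"]          # non-faulty clocks, in their (fixed) mutual order

def scenario(d_pos):
    """Full scenario: the non-faulty ticks in order, with faulty d inserted at d_pos."""
    s = GOOD[:]
    s.insert(d_pos, "d")
    return s

def trigger(observer, scen):
    """Median rule for N=4: trigger on the 2nd incoming pulse, not counting self."""
    others = [c for c in scen if c != observer]
    return others[1]

def c2_holds(observer, scen):
    """C2: a trigger on faulty d is acceptable only if d is sandwiched between
    non-faulty clocks (at least one non-faulty faster and one slower)."""
    if trigger(observer, scen) != "d":
        return True
    i = scen.index("d")
    return any(c != "d" for c in scen[:i]) and any(c != "d" for c in scen[i+1:])

# d may occupy any of 4 positions independently in each non-faulty clock's scenario.
violations = 0
for pa, pb, pc in product(range(4), repeat=3):
    for obs, pos in (("a", pa), ("b", pb), ("c", pc)):
        if not c2_holds(obs, scenario(pos)):
            violations += 1

print(violations)   # → 0: all 64 combinations satisfy C2
```

Condition C1 can be checked in the same style by building the trigger graph for each of the 64 combinations.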

Assume without loss of generality that a is prior or equal to b, which in turn is prior or equal to c, in some gcci. Consider a sample set of scenarios for our four-clock example; the triggering clock is marked with an asterisk:

Sa = a < b < c* < d
Sb = a < d* < b < c
Sc = a < b* < d < c

The scenario Sd is irrelevant, since d is faulty. Notice first that the position of the faulty clock d changes relative to the others, while the mutual ordering of the non-faulty clocks remains unchanged, as indeed it should. It is easy to see that both conditions of correctness will be satisfied, and that the clock will operate correctly if the above scenarios hold. It is not difficult to write down all the 4^3 = 64 possible scenario combinations (with the ordering of the non-faulty clocks fixed as above) that are made possible by the arbitrary positioning of d, and to convince oneself that, for all possible scenarios, C1 and C2 are satisfied. Unfortunately, if we try to allow for m=2,3,... by expanding the system arbitrarily, without sufficient care, the conditions of correctness can be violated. In fact, it is even possible for a system to contain an arbitrarily large number of clocks and still be vulnerable to just two malicious failures. To see this, consider the following example. Let us choose, for each clock y in the system, fpy(N) as the median clock signal in the scenario, not counting clock y. If N is odd (and there is thus an even number of "other" clocks), choose the slower of the two middle clocks. Then, fpy(N) is only a function of N. We therefore drop

the subscript for this example. Choosing the median signal certainly seems intuitively sound. Let there be only two faulty clocks, z1 and z2, and n = N-2 non-faulty clocks a1, ..., an. Case 1: N > 7. Consider some gcci. Assume that ak is faster than al in gcci if k < l. Now, let z1 and z2 present themselves as the fastest two clocks to a1, ..., ap, and as the slowest two clocks to the other non-faulty clocks, i.e. a(p+1), ..., an, where p = ⌈n/2⌉ = f(N) - 1. Then, the set of scenarios can be represented as in Figure 23. Recalling that a clock triggers on the f(N)-th tick in its scenario, not counting itself, we can draw the trigger graph as in Figure 24. It follows that {a1, ..., ap} and {a(p+1), ..., an} will be two non-overlapping cliques, no matter how large n may be. It is easy to work out the case for N = 7 to convince oneself of this fact. Case 2: N < 7. This case is trivial, and showing that the system is incapable of sustaining even two maliciously faulty clocks is left to the reader. This has been a cautionary tale of the unbridled use of intuition in designing phase-locked clocks. Assured now that a more careful approach is needed, we turn in the following section to showing how to expand phase-locked clocks. Our job is to (i) find the lower bound, N, on the size of a system of clocks that must sustain up to m maliciously faulty clocks, and (ii) find the functions fx(N) for x = 1, ..., N. We begin with the following two lemmas that smooth our way to the main result.
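The Case 1 construction can also be checked mechanically. The sketch below (ours; it takes N = 9, so n = 7, p = 4, and f(N) = 5) builds the scenarios, computes every non-faulty clock's trigger, and extracts the components of the trigger graph:

```python
N, f = 9, 5                              # nine clocks; median rule triggers on the 5th signal
good = [f"a{i}" for i in range(1, 8)]    # a1 (fastest) .. a7 (slowest)
bad = ["z1", "z2"]
p = 4                                    # z1, z2 appear fastest to a1..a4, slowest to a5..a7

def scen(clock):
    """Scenario seen by a non-faulty clock, per the malicious placement above."""
    return (bad + good) if good.index(clock) < p else (good + bad)

def trig(clock):
    """Trigger: the f-th signal in the clock's scenario, not counting itself."""
    others = [c for c in scen(clock) if c != clock]
    return others[f - 1]

# Components of the trigger graph restricted to the non-faulty clocks (union-find).
parent = {c: c for c in good}
def find(c):
    while parent[c] != c:
        c = parent[c]
    return c
for c in good:
    t = trig(c)
    if t in good:
        parent[find(c)] = find(t)
components = {frozenset(x for x in good if find(x) == r) for r in {find(c) for c in good}}
print(len(components))   # → 2
```

The two components, {a1, a2, a3, a4} and {a5, a6, a7}, are precisely the non-overlapping cliques of Figure 24, confirming that the naive median rule fails for m = 2.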

Figure 23. Scenarios in Example.

Figure 24. Trigger Graph for Scenarios in Example.

Lemma 1: Condition C2 is satisfied for all partitions P = {G1, G2} if and only if there exist functions fx(N), for x = 1, ..., N, such that

min{m, x-1} < fx(N) < max{N-m, x}    (38)

Proof: Let k be a non-faulty clock such that pk = x. We must show that Eq. (38) holds for all x for which pk is defined iff condition C2 holds. Suppose that there exist functions fx(N) for x = 1, ..., N satisfying Eq. (38). This implies min{m, x-1} + 1 < max{N-m, x} for all x ∈ {1, 2, ..., N}, leading to N > 2m+1. Hence, it is sufficient to consider the following three cases: (i) x ≤ m: Clearly, max{N-m, x} = N-m, min{m, x-1} = x-1, and therefore x-1 < fx(N) < N-m. If the reference clock is non-faulty, we have nothing to prove. If it is faulty, then since there are at most m faulty clocks, there must be at least one non-faulty clock slower than the reference clock. Also, from the left half of the inequality, fx(N) > x-1, and since clock k is non-faulty, there is a non-faulty clock (i.e. k itself) faster than the reference clock. So, C2 is satisfied. (ii) N-m ≥ x > m: min{m, x-1} = m, max{N-m, x} = N-m, and therefore m+1 ≤ fx(N) ≤ N-m-1. Since at most m faulty clocks exist, if the reference clock in Sk were faulty, it must appear in Sk as slower than at least one non-faulty clock (the right half of the inequality), and faster than at least one non-faulty clock (the left half of the inequality), and C2 is satisfied.

(iii) N ≥ x > N-m: min{m, x-1} = m, max{N-m, x} = x, and m+1 ≤ fx(N) ≤ x-1. As with the previous cases, there must appear in Sk at least one non-faulty clock that is faster than the reference clock, if the reference clock is faulty. Also, since k is non-faulty and appears in the x-th (i.e. pk-th) position, there is at least one non-faulty clock, in particular clock k itself, that is slower than the reference clock in Sk, thus satisfying C2. Conversely, suppose fx(N) ≤ min{m, x-1}. Then, C2 is violated when faulty clocks appear in positions 1, ..., fx(N) of Sk. Similarly, if fx(N) ≥ max{N-m, x}, C2 is violated when faulty clocks appear in positions fx(N)+1, ..., N of Sk.² Q.E.D.

² Once again, the addition of 1 occurs because a clock does not count itself when counting to fx(N).

Lemma 2: If all clocks in NG1 trigger only on clocks in G1 (where the notation is the same as in Definition 9), then the following are equivalent: (i) q1 > min_{k∈NG2} fpk(N), where q1 is the number of non-faulty clocks in G1. (ii) K1 is satisfied. Proof: (i) implies (ii): If (i) holds, then it is easy to see that no matter how the up to m faulty clocks in G arrange themselves, K1 is satisfied. (ii) implies (i): Suppose, to the contrary, that q1 ≤ min_{k∈NG2} fpk(N). Consider the nonempty set L = {y : y ∈ NG2 and fpy(N) = min_{k∈NG2} fpk(N)}. Assume that there are i ≤ m faulty clocks in G1. Since the faulty clocks may present themselves in any

position in any scenario, consider the case where they present themselves in the scenarios of every y ∈ L in positions q1+1, ..., q1+i. Then, there is no non-faulty clock in G1 that is slower than the reference clock of any clock in NG2, a contradiction. Q.E.D. The two theorems below yield the main result of this section. Theorem 1: To ensure that, despite up to m malicious failures, the conditions of correctness are satisfied, the system must have N ≥ 3m+1 clocks. Proof: We will only consider here the case of partitions P = {G1, G2} in which all clocks in NG1 trigger on clocks in G1. The other case (i.e. K2) can be dealt with similarly. Let there be q1 and q2 clocks respectively in NG1 and NG2. Let M = {y : y ∈ NG1 and fpy(N) = max_{k∈NG1} fpk(N)}. Let i be the number of faulty clocks that belong to G1. Then, the assumption that all non-faulty clocks in G1 trigger on clocks in G1 is equivalent to saying that one of the following Eqs. (39) and (40) must apply:

q1 + i ≥ max_{k∈NG1} fpk(N) + 1 = fpy(N) + 1    (39)

which applies if there exists at least one py, y ∈ M, such that py ≤ fpy(N). The addition of 1 follows from the fact that clock y does not count itself when counting to fpy(N). If py > fpy(N) for all y ∈ M, the following Eq. (40) applies:

q1 + i ≥ max_{k∈NG1} fpk(N)    (40)

First consider the case where Eq. (39) applies. The condition that C1 (more

specifically, K1) holds implies, from Lemma 2, that

q1 > min_{k∈NG2} fpk(N)    (41)

Since this must be true for all partitions of G, we have, for all q1 ∈ {1, ..., N-i-1}: q1 ≥ max_{k∈NG1} fpk(N) - i + 1 implies q1 > min_{k∈NG2} fpk(N). Hence, K1 can be written as: for all q1 ∈ {1, ..., N-i-1},

{q1 ≥ max_{k∈NG1} fpk(N) - i + 1} implies {q1 > min_{k∈NG2} fpk(N)}.

In particular, this is true for q1 = max_{k∈NG1} fpk(N) - i + 1. Thus,

max_{k∈NG1} fpk(N) - i + 1 > min_{k∈NG2} fpk(N)

or

max_{k∈NG1} fpk(N) - min_{k∈NG2} fpk(N) > i - 1    (42)

Recall that this is true if Eq. (39) applies. Similarly, if Eq. (40) applies, we have from an identical argument,

max_{k∈NG1} fpk(N) - min_{k∈NG2} fpk(N) > i    (43)

Eqs. (39)-(43) must hold for all possible i. Since there are at most m faulty clocks, we must have:

max_{k∈NG1} fpk(N) - min_{k∈NG2} fpk(N) > m - 1    (42')

if Eq. (39) applies, and

max_{k∈NG1} fpk(N) - min_{k∈NG2} fpk(N) > m    (43')

if Eq. (40) applies. We first consider the case where Eq. (39) applies. We claim that it implies that N > 3m. To see why, let y be the slowest clock in M and z the slowest clock in L (with L defined as in Lemma 2). Then, due to Lemma 1 and Eq. (42'), the following inequality must hold:

max{N-m, py} > max_{k∈NG1} fpk(N) > m-1 + min_{k∈NG2} fpk(N) > m-1 + min{m, pz-1}    (44)

The up to m faulty clocks in the system can arrange themselves in any order. In particular, they can so order themselves in Sy that py ≤ N-m, and so order themselves in Sz that pz > m. Since Eq. (44) must hold always, no matter what the faulty clocks do, we must have:

N-m > max_{k∈NG1} fpk(N) > m-1 + min_{k∈NG2} fpk(N) > (m-1) + (m+1)    (45)

from which we arrive at

N > 3m    (46)

Recall that this applies whenever Eq. (39) holds. If, instead, Eq. (40) applies, we can similarly show that

N > 3m+1    (47)

Since we seek the smallest N satisfying the conditions of correctness, we are done if we can show that there exist functions fx(N) such that Eq. (39) always applies (and therefore Eq. (40) never applies), and for which Eq. (45) is satisfied. But we can always construct the fx(N) to (i) be monotonically non-increasing functions of x and (ii)

satisfy Eq. (45); an example of such a construction is provided in the statement of Theorem 2 below. Hence Eq. (39) always applies, and N ≥ 3m+1 is the necessary condition. The case when all clocks in NG2 trigger on clocks in G2 can be treated similarly. Q.E.D.

Theorem 2: If N ≥ 3m+1 and

fx(N) = 2m if x < N-m,
fx(N) = m+1 if x ≥ N-m,

then the conditions of correctness are satisfied. Proof: fx(N) as defined here satisfies Lemmas 1 and 2 and is monotonically non-increasing in x. Clearly, C2 holds. Also, it is easy to see that if N > 3m and Eq. (39) implies Eq. (41), then case K1 in Definition 9 will hold. We therefore only have to show that the definition of fx(N) as given above satisfies Eq. (41) if Eq. (39) is satisfied. This can easily be verified by direct substitution. Case K2 can similarly be seen to hold. Q.E.D. It should be noted that the set of functions fx(N) is not always unique. From the proofs of Theorems 1 and 2, the following inequalities are sufficient: (i) m+1 ≤ fx(N) ≤ N-m-1 for all x = 1, ..., N; (ii) fx(N) ≥ m-1 + f(N-m)(N) for all x ≤ m+1; (iii) f(N-m)(N) ≤ fx(N) ≤ f(m+1)(N) for N-m > x > m+1; (iv) fi(N) ≥ fj(N) if i < j. The intervals x ≥ N-m and x ≤ m+1 arise from the up to m faulty clocks in the system. All that we can tell about the fastest non-faulty clock g in the system (this

clock must have the maximum value of fx(N)), in clock g's scenario, is that it lies among the first m+1 clocks in that scenario. Similarly, all that we can tell about the position of the slowest non-faulty clock in the system (which must have the minimum value of fx(N)) is that it occupies a place among the last m+1 clocks. This leads at once to the intervals x ≥ N-m and x ≤ m+1. It is interesting to note that if conditions C1 and C2 are both satisfied, and the functions fx(N) are monotonically non-increasing in x, then a stronger condition than C2 automatically holds. Corollary: If the conditions of correctness are satisfied, with the fx(N) defined as monotonically non-increasing functions of x, then the following condition C3 holds. C3. Every non-faulty clock necessarily triggers on either a non-faulty clock, or a faulty clock that is sandwiched between non-faulty clocks. Proof: C3 follows immediately from C2 for all but the fastest and slowest non-faulty clocks. Consider the fastest non-faulty clock. In the course of proving Theorem 1, it was established that N-m > fx(N) > m for all x = 1, ..., N, and that max_{k∈NG} fpk(N) - min_{k∈NG} fpk(N) > m-1, leading to 2m as the smallest value for max_{k∈NG} fpk(N), where NG ⊆ G is the set of non-faulty clocks. From the monotonic nature of the fpk(N), the trigger for the fastest non-faulty clock must lie in the interval 2m+1, ..., N-m. But, since N ≥ 3m+1, any faulty clock in this interval must be sandwiched between non-faulty clocks.

The proof for the slowest non-faulty clock is similar. Q.E.D.

Remark 1: Synchronization Overhead In the case of a phase-locked clock, there is some time overhead due to the oscillations that are possible as a result of malicious behavior. However, these are minimal when good crystal clocks are used, and so it is reasonable to treat the overhead of hardware synchronization as negligible. Also, the clock skew in a well-designed phase-locked clock is negligible.

Remark 2: An Alternative Design The only other hardware arrangement that we are aware of for keeping synchronization in the face of malicious behavior is the multi-stage synchronizer arrangement proposed by Davies and Wakerly [22]. The idea is shown in Figure 25. It consists of m stages of N synchronizers each. The system works on the principle that, with this redundancy, there must be at least one level of synchronizers that assures proper synchronization in the presence of malicious faults. An informal proof is provided in [22]. This arrangement results in a proliferation of hardware. As may readily be verified, the total number of devices (processors and synchronizers) in the cluster is 2m² + 3m + 1. The total number of I/O ports required is 8m³ + 16m² + 10m + 2. The potential enormity of these numbers should be driven home by the consideration that, in order to maximize returns from redundancy, the individual modules must be isolated from one another as much as possible. This dictates that power supplies must also be replicated in large numbers, and that the benefits of large-scale integration cannot be brought to bear on the issue: individual synchronizers must be on separate devices -- even, perhaps, on separate cards. Otherwise, correlated and common-cause failures could wipe out reliability gains made by device redundancy.

Figure 25. Davies and Wakerly's Multistage Synchronizer (a processor cluster feeding m synchronizer clusters, coupled by Go/Ready handshakes).

Compared with the gargantuan redundancy required by the Davies and Wakerly approach, the N = 3m+1 requirement of phase-locked clocks represents an extremely elegant hardware solution to the problem of synchronization in the presence of malicious faults.

Software Synchronization The use of decentralized algorithms for synchronization offers an alternative to the hardware methods described above. Such algorithms enable a system consisting of many processors with their own clocks to operate in close synchrony. The degree of synchronization obtained by these algorithms depends primarily on the performance of the communications system, the precision of the clocks, and the frequency of resynchronization. The task-to-task communication system's one-way message time is at least B + δ, where B is the maximum transmission time and δ is the maximum clock skew. The most time-efficient of the software algorithms that we know of is the interactive convergence algorithm [23]. In the interactive convergence algorithm, each processor in the system determines its skew relative to every other processor in the system. Any relative skew greater than a predetermined threshold is set to zero. The average of the relative skews is then calculated and used by the processor to correct its own clock.
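A single round of this averaging correction can be sketched as follows; the clock offsets and threshold below are illustrative values of ours, not figures from any actual system:

```python
def converge_round(clocks, threshold):
    """One interactive-convergence round: each processor averages the relative
    skews it observes, zeroing any skew whose magnitude exceeds the threshold."""
    corrected = []
    for own in clocks:
        skews = [(other - own) if abs(other - own) <= threshold else 0.0
                 for other in clocks]
        corrected.append(own + sum(skews) / len(clocks))
    return corrected

# Four well-behaved clocks plus one wildly faulty one (offsets in seconds).
clocks = [0.0, 0.001, -0.001, 0.002, 0.5]
after = converge_round(clocks, threshold=0.01)

spread = lambda xs: max(xs) - min(xs)
good_before, good_after = clocks[:4], after[:4]
print(spread(good_before), spread(good_after))   # the non-faulty spread shrinks
```

Note that the faulty clock's huge offset is discarded by every non-faulty processor (it exceeds the threshold), so it cannot pull the others apart; the non-faulty clocks are drawn closer together in one round.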

The following theorem (a trivial adaptation of one proved in [23]) characterizes the maximum clock skew of the system in terms of the following system parameters:

ε - maximum error in reading another processor's clock
ρ - maximum drift rate between any two clocks in the system
N - number of clocks in the system
m - maximum number of faulty clocks accommodated
R - resynchronization period
S(N) - execution time of the resynchronization task
δ0 - maximum clock skew at start-up

Theorem 3 (adapted from [23]): If the following conditions hold:

N > 3m,
δ ≥ [1 - 3m/N - 2ρ(1 - m/N)]⁻¹ [2ε{1 + ρ(1 - m/N)} + ρ{R + 2(1 - m/N)S(N)}],
δ ≥ δ0 + ρR,
max(δ, S(N)) < R,
ρδ << ε,

then the non-faulty clocks remain in synchrony, i.e. the maximum skew is δ. The synchronization algorithm is run periodically, the major component of the execution time usually being the time required to read every other processor's clock in the system. In the SIFT system, each processor's clock value is broadcast during a window of time allocated to it. There are N such windows, one for each processor in the system. All other processors wait during this window to receive the broadcast data value.

In order to accommodate the worst-case situation, each window must be at least B + δ long. The interactive convergence algorithm thus takes an execution time of S(N) = N(B+δ) + K, where K is the time needed to compute and carry out the clock correction. It should be noted that this execution time of the synchronization task affects the synchronization process itself. Indeed, since it is a function of N, there is a maximum cluster size that can be synchronized in this way. To see this, substitute the above expression for S(N) into the formula for δ, and obtain:

δ ≥ N[N - 3m - 2ρ(N² + N - mN - m)]⁻¹ [2ε + ρ{R + 2(N-m)(B+K)}]    (40)

From this, one can (a) compute the minimum execution time of the synchronization task as a function of the cluster size, (b) obtain the quality of synchronization (the smaller the δ, the better the synchronization), and (c) determine the largest possible cluster that can be thus synchronized: this is the largest N for which S(N) < R. The values for the SIFT system are B = 18.2 microseconds, with an execution time for the synchronization task of 1.760 milliseconds. Numerical results on the synchronization overhead using these values are plotted in Figure 26. The maximum cluster size permissible for synchronization is tabulated in Table 6. Although the expression S(N) = N(B+δ) + K was presented as emanating from the SIFT system, it is easy to see that it will hold in any system where communication is by broadcast and clock transmission slots are pre-determined. It should also be reiterated that such communication protocols are the most commonly used protocols in real-time systems. In any case, it is obvious that, whatever the protocol used, S(N) is very unlikely to be less than a first-order function of N.
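The cluster-size limit implied by Eq. (40) can be estimated numerically. In the sketch below, only B and ρ are SIFT values quoted in the text; ε, K, and R are illustrative assumptions of ours, and the skew formula is our plain-text reconstruction of Eq. (40):

```python
# B (transmission time) and rho (drift rate) are the SIFT figures from the text;
# eps (clock-read error), K (correction time) and R (resync period) are assumed.
B, rho = 18.2e-6, 2.2e-5
eps, K, R = 10e-6, 200e-6, 0.1

def skew(N, m):
    """Reconstructed Eq. (40): skew bound after substituting S(N) = N(B+delta)+K."""
    denom = N - 3 * m - 2 * rho * (N * N + N - m * N - m)
    if denom <= 0:
        return None                      # no finite skew achievable
    return N * (2 * eps + rho * (R + 2 * (N - m) * (B + K))) / denom

def max_cluster():
    """Largest N for which the synchronization task still fits: S(N) < R."""
    best = None
    for N in range(4, 1000):
        m = (N - 1) // 3                 # worst case: as many faults as N allows
        d = skew(N, m)
        if d is None or N * (B + d) + K >= R:
            break
        best = N
    return best

print(max_cluster())
```

The qualitative behavior matches the text: as N grows, the skew δ grows, the resynchronization task S(N) lengthens, and beyond some finite cluster size the task no longer fits inside the resynchronization period.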

Figure 26. Software Synchronization Overhead.

Drift Rate      Maximum Cluster Size
1 × 10⁻⁶        43
5 × 10⁻⁶        40
1 × 10⁻⁵        40
2.2 × 10⁻⁵ **   40
5 × 10⁻⁵        37
1 × 10⁻⁴        34
5 × 10⁻⁴        22
1 × 10⁻³        16

** SIFT value

Table 6. Maximum Cluster Size Permissible for Software Synchronization

Even if, in a hypothetical case, S(N) were negligible (which, of course, can never happen, but which represents an extreme case), δ would continue to be a function of N, and there would be a point at which δ > R, at which synchrony would break down.

B. Voting and Byzantine Generals Algorithm

Voting Once the delay involved in synchronization is taken account of, there is very little additional delay if the voting is carried out in hardware. With software voting, however, the additional overhead can be significant. Voting in software is carried out by individual processors placing the data to be voted on in pre-specified "mailboxes" or "pigeonholes". The voter searches the mailboxes for valid data, fetches them, and then votes on them. The execution time of the retrieval step is directly proportional to the number of processors in the cluster, N. The execution time required to vote on N values and diagnose up to m faults is at least (N-1)C1 + C2 but less than [(N-1) + 2m - 2]C1 + C2 = [2(N-1) - 2]C1 + C2 (taking the maximum m = (N-1)/2), where C1 and C2 are constants [24]. Experimental data exist for 3-MR and 5-MR in SIFT. These data can easily be introduced into the linear model obtained above. If s is the number of data values voted on, and VN(s) is the time taken for an N-way vote on s data values, the following expression was found to hold for SIFT:

VN(s) = 58.5sN + 91.5s + 38 microseconds    (41)

This is a large overhead: in SIFT, for example, voting is performed at the beginning

of a 3.2-millisecond subframe. If s = 6 and N = 5, then 73% of the subframe is consumed by the voting algorithm [25].

Byzantine Generals Algorithms The Byzantine Generals, or interactive consistency, algorithm must be used when it is necessary to isolate the sources of errors as well as to mask the errors themselves. It finds use when reconfiguration upon failure is to be attempted and the executive is distributed. The algorithm takes into account the fact that faulty processors may be malicious, in other words, that they need not fail only in "safe" directions. To be absolutely certain that faulty processors can be properly identified for isolation, it is necessary to allow for every possible misbehavior: thus the case when a faulty processor is malicious, i.e. when it actively and intelligently attempts to hide its malfunction, must also be handled. Such algorithms are typically used to reach agreement between processors in a cluster on incoming sensor data, and in certain clock synchronization algorithms. For further details, see [26-28]. The input of data is accomplished either by every processor reading the external sources independently, or by one processor reading the external sources and then distributing the obtained value to the rest of the processors. In the first case, each processor would very probably get a different value -- even if they were in perfect synchrony -- due to the inherent instability in reading analog data. Hence, a subsequent exchange of the values read, along with a mid-value selection, is required to get a consistent value. However, this process is sensitive to maliciously faulty processors, and interactive consistency (or Byzantine Generals) algorithms are essential where fault isolation and reconfiguration are required.
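Returning briefly to the voting-cost model, the 73% figure follows directly from Eq. (41); a one-line check:

```python
def vote_time_us(s, N):
    """Software-voting cost model of Eq. (41), in microseconds."""
    return 58.5 * s * N + 91.5 * s + 38

subframe_us = 3200.0                     # the 3.2-millisecond SIFT subframe
t = vote_time_us(s=6, N=5)
print(t, t / subframe_us)                # → 2342.0 0.731875
```

That is, a 5-way vote on six values costs 2.342 ms, about 73% of the 3.2 ms subframe, as stated.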

The interactive consistency algorithm consists of the following steps: (1) The source value is distributed to the N processors. (2) The received values are exchanged m times to handle up to m faulty processors. (3) A consistent value is obtained by use of a recursive algorithm. When m = 1, this reduces to a majority calculation. The overhead of these interactive consistency algorithms can be considerable. N must be at least 3m+1. The number of messages required to obtain interactive consistency is of the order of N^(m+1). To give an idea of the actual numbers incurred in practice, some experimental results from the SIFT computer [25] are used. In SIFT, with five-way voting, only one fault can be located. The simple flight-control applications currently running in SIFT use 63 external sensor values, each of which goes through the interactive consistency algorithm. From the data collected, execution times for steps (1) and (2) of the algorithm can be estimated, and a lower bound determined for step (3). The following data were measured: step (1): 3.05 ms, step (2): 2.22 ms, and step (3): 6.57 ms (total 11.84 ms). For larger m, the step (1) execution time should not change significantly, while the step (3) calculation would require at least 6.57 ms (very likely much more). The step (2) process consists only of message exchanges and thus varies directly with the number of messages sent. The following formula represents an approximate execution time for step (2) as a function of m: 2.22 N^(m-1) ms. We may add the timing values for steps (1) and (3) above to this expression to obtain a lower bound for the overhead of the Byzantine Generals algorithm in SIFT. Since the interactive consistency tasks must be executed at the data sample rate, a large

portion of the available CPU time is consumed: see Table 7. These results indicate the extremely high overhead imposed in an attempt to achieve interactive consistency. It should be pointed out that there have lately been some more efficient implementations of the Byzantine Generals algorithm [29] than the one implemented on SIFT. However, even such implementations exhibit high overheads as the number of faulty modules to be accommodated increases.

C. Reconfigurable and Non-Reconfigurable Systems

To locate faults after they have been detected by voting, diagnostic tests must be run. To ensure agreement amongst all non-faulty processors about the results of the tests, the interactive consistency algorithm must be executed. Unfortunately, as we have seen, this algorithm is extremely time-consuming to run. Reconfigurable systems must therefore contend with a large overhead as compared to non-reconfigurable systems. However, reconfigurable systems have the advantage of dynamic redundancy management. When widespread failures occur, it is possible to retire some clusters in order to keep others at full strength. Also, by periodically purging itself of faulty components, a reconfigurable system can survive in the face of more failures than can a non-reconfigurable system. For example, if one started operation with a 7-cluster, the reconfigurable system would not fail unless either (a) all but two processors fail, or (b) more than m (m=2 for a 7-cluster, and m=1 for a 4-cluster) processors fail between successive tests, while the corresponding non-reconfigurable system would fail if more than 3 processors failed. This does not automatically mean that a

Data Sample Period     m=1       m=2       m=3
     100 ms           11.8 %    25.1 %    > 380 %
      50 ms           23.7 %    50.2 %    > 760 %
      33 ms           35.9 %    76.2 %    > 1140 %
      25 ms           47.4 %    > 100 %   > 1520 %

m = number of faulty processors accommodated

Table 7. Overhead of Byzantine Generals Algorithm: Lower Bound for SIFT

reconfigurable system is necessarily better than a non-reconfigurable one, since, as we shall see, timing requirements impose severe constraints on the size of reconfigurable clusters. We shall contrast the reliability of reconfigurable and non-reconfigurable systems with the following example. Assume that there is a single critical task in the system that requires 1.6 milli-seconds to run, that this task is dispatched every 50 milli-seconds, and that the system must be ready to begin executing the task the moment it is released. There is a total of N processors available. Processors fail according to an exponential law with specified MTBF. The mission lifetime (duration between successive service stages) is also specified. In the following sections, we consider non-reconfigurable and reconfigurable systems separately. In both cases, we assume that synchronization is by means of phase-locked clocks. Since these can be made arbitrarily reliable and are common to both reconfigurable and non-reconfigurable systems, we do not consider the probability of clock failure in what follows. Numerical results in the section on reconfigurable systems are based on the lower bounds obtained from SIFT. Processor failures are assumed to occur independently, forming a Poisson process with mean interarrival time 5 × 10^5 seconds.

Non-Reconfigurable System

Under the above assumptions, the probability of dynamic failure is simply equal to the probability of static failure, i.e., the probability that at least ⌈N/2⌉ processors fail over the mission lifetime. This probability is graphed in Figure 27.
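This static-failure probability is straightforward to compute under the stated assumptions (exponential processor failures; N-way voting fails once ⌈N/2⌉ processors have failed). A minimal sketch; the function name and the choice of seconds as the time unit are ours, not the text's:

```python
import math

def p_fail_nonreconfigurable(N, mission_time_s, mtbf_s=5e5):
    """Probability that a non-reconfigurable N-processor voting
    cluster loses its majority during the mission."""
    p = 1.0 - math.exp(-mission_time_s / mtbf_s)   # single-processor failure prob.
    need = math.ceil(N / 2)                        # failures that kill the majority
    return sum(math.comb(N, k) * p**k * (1 - p)**(N - k)
               for k in range(need, N + 1))
```

This is the quantity plotted for the non-reconfigurable curves of Figure 27; for small failure probabilities, going from a 3-cluster to a 5-cluster drops it by several orders of magnitude.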

[Figure 27(a). Comparison of Reconfigurable and Non-Reconfigurable Systems: probability of dynamic failure vs. number of processors.]

[Figure 27(b). Probability of dynamic failure vs. number of processors (continued).]

[Figure 27(c). Mission lifetime = 10^6 seconds: probability of dynamic failure vs. number of processors, for the non-reconfigurable and reconfigurable systems.]

Reconfigurable System

We assume here that the interactive consistency algorithm will only be invoked when a vote detects processor failure. The problem of replicating simplex data amongst multiple processors is not treated here: it is assumed that the slight variations in analogue sensor data obtained without the Byzantine algorithm are acceptable. The purpose of the interactive consistency algorithm here is to obtain agreement on the results of diagnostic tests. The assumptions are that the tests have 100% coverage and, for convenience, that the diagnostics take 3 ms. Clearly, the diagnostic period is insensitive to the value of m. It is not difficult to alter the analysis to allow for a relaxation of these assumptions. Doing so may alter the numerical values presented, but will not change the qualitative nature of these results. The execution time is bounded below by 2.22 N^(m-1) + 9.6 milli-seconds. Since the task execution time is 1.6 milli-seconds, an unreplenishable reserve of 50 - 1.6 = 48.4 milli-seconds of time is available. If the overhead is smaller than this, the probability of dynamic failure is equal to the probability of hardware failure; if not, it is equal to unity. As may be seen from a simple calculation, for clusters with m > 2 the overhead exceeds the reserve of time, so that the maximum allowed size of the cluster is N=7, m=2. For this reason, if we start with more than 7 processors, the additional processors will have to be on stand-by for inclusion upon a failure within the cluster. Failure occurs if either more than m failures occur in the cluster during a single execution, or the processor pool is exhausted, i.e., if there is an insufficient number of processors left to make up a cluster.
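The overhead bound and the cluster-size test above can be checked mechanically. A sketch, using the SIFT step timings quoted in Section B (3.05, 2.22, and 6.57 ms, whose 9.62 ms sum the text rounds to 9.6) and the minimum cluster size N = 3m+1; the function names are illustrative:

```python
def overhead_ms(m, N):
    """Lower bound on Byzantine-agreement overhead per sample period (ms):
    step (2) scales as 2.22*N^(m-1); steps (1) and (3) contribute 9.62 ms."""
    return 2.22 * N ** (m - 1) + 9.62

def overhead_percent(m, N, sample_period_ms):
    """Entries of Table 7 (lower bounds) for the SIFT measurements."""
    return 100.0 * overhead_ms(m, N) / sample_period_ms

def largest_feasible_cluster(period_ms, task_ms):
    """Largest (N, m) = (3m+1, m) cluster whose overhead fits into the
    unreplenishable reserve left over by the critical task."""
    reserve = period_ms - task_ms
    best, m = None, 1
    while overhead_ms(m, 3 * m + 1) <= reserve:
        best, m = (3 * m + 1, m), m + 1
    return best
```

overhead_percent(1, 5, 100) reproduces the 11.8% entry of Table 7, and largest_feasible_cluster(50, 1.6) returns (7, 2), the maximum cluster found above; for the 35 milli-second task considered at the end of this section it returns (4, 1).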

The reconfiguration policy is simple. The system begins operation with either a 7-MR or a 4-MR cluster (depending on the value of N). As processors fail, they are replaced if spares are available. If the stock of spares is exhausted, further failures are handled by the 7-cluster reconfiguring into a 4-MR cluster. If N<7, only a single failure can be tolerated. The policy is thus a combination of hybrid and adaptive voting. Numerical results for the probability of failure of reconfigurable systems are plotted in Figure 27 for a ready comparison with their non-reconfigurable counterparts. It is apparent from Figure 27 that while increasing the number of available processors in a non-reconfigurable system reduces its probability of failure, there is a lower bound to the probability of failure for reconfigurable systems. This bound is caused by the fact that the cluster size is limited to 7, since the overhead exceeds 100% for larger clusters. There is therefore a point after which the probability of more than m processor failures over a single execution (aggregated over the mission lifetime) becomes the dominant component in the probability of failure. As one might expect, the reconfigurable system performs better than the non-reconfigurable system when the mission lifetime is larger. This comes, in this case, at the heavy price of a 50.3% overhead. Clearly, the results in Figure 27 are problem-specific; indeed, they are critically dependent on the length of the inter-dispatch interval, the unreplenishable reserve of time that is available, and the time taken to execute the Byzantine algorithm. Also, while the reconfigurable system may appear to be the better performer in Figure 27, this is largely due to the large reserve of time available. Suppose that the task, instead of taking a maximum of 1.6 milli-seconds to perform, took 35 milli-seconds. Then, the reserve of time is 50 - 35 = 15 milli-seconds, and the largest reconfigurable cluster that could fit in this reserve is N=4, m=1. In Figure 28, we display results for such a task. Naturally, the reconfigurable system comes out much more poorly here.

[Figure 28. Comparison of Reconfigurable and Non-Reconfigurable Systems: probability of dynamic failure vs. number of processors, for the task with a 35 milli-second run time.]

CHAPTER 7

SCHEDULING TASKS WITH A QUICK RECOVERY FROM FAILURE

Part of the procedure for recovery from a processor failure must consist of ensuring that hard deadlines of critical tasks continue to be met to a very high probability. This is usually done by replicating tasks. One may choose (i) to have all replicates, or clones, of tasks always running in parallel, or (ii) to have some of them (called "ghosts" in the sequel) initially inactive, and then to activate them in case a processor carrying an active (or "primary") colleague fails. When the mission lifetime is long, or the failure probability requirements are stringent, the first approach causes a considerable incremental processor loading (due to the large number of replicates that are then required), and a worsening of response times. Even if such a worsening does not increase the probability of missing a hard deadline, it will tend to degrade the quality of control provided to the real-time system. It might therefore be profitable to consider the second approach, which has the advantage of requiring fewer replicates to run in parallel. Implementing this second approach involves optimally (more realistically, quasi-optimally) inserting the ghosts into the schedule. The ghosts cannot be regarded as primary clones, or the schedule reduces to that in the first approach. In many cases, it is unwise to plan to schedule them on-line when a processor failure is detected: that might not be possible under the imposed timing constraints. In such

instances, one must embed a set of contingency schedules into the original or primary schedule and invoke them whenever a processor failure is detected. In this chapter, we first show how to find the contingency schedules. Then, a dynamic programming approach is used to construct out of these contingency schedules the primary schedule, which is defined as the schedule followed if there is no processor failure. The contingency and primary schedules consist only of primary clones. A ghost schedule is also derived, and it is then a simple matter to amalgamate the ghost and contingency schedules on-line when one or more ghosts have to be activated due to processor failure. While a more detailed problem statement is provided in the next section, it might be useful to preview here what we have done in this chapter. In general, the job of scheduling tasks on processors consists of optimizing some objective function while ensuring that given constraints are met. In this case, the constraints are that the hard deadlines be satisfied, and the objective function is the expected sum of the cost functions of the finishing times of the executed tasks. We assume that an algorithm exists that accepts as input a list of tasks with their release times, run times, deadlines, and cost functions, and produces as output a schedule that minimizes (more realistically, quasi-minimizes) the above objective function. Such a schedule will not be fault-tolerant. We show how to use such an algorithm to create a fault-tolerant schedule, consisting of a quasi-optimal primary schedule which is executed as long as no processors fail, and a set of contingency schedules that are invoked when one or more processors fail, making it necessary to activate the ghosts of the primary tasks that were originally supposed to have run on the now-failed processors. The object is to keep the number of replicates that must run in parallel at the desired level.
The primary and contingency schedules between them ensure

that reliability specifications are met, i.e., that even in the face of a given number of processor failures, the probability of missing a hard deadline is kept acceptably low.

Papers in the literature [38,39] that have addressed the problem of fault-tolerant scheduling for real-time controllers have taken a different approach to handling the effects of processor failure. The scheduling algorithm used in [38] is a best-fit procedure that is claimed to be quick enough to compute a new post-failure schedule in real time. There is, however, no guarantee that a certain given number of processor failures can be sustained: indeed, the method assumes that as failures occur, tasks that cannot be scheduled under this approach are simply shed. The algorithm in [39] has the same features, being an improved version of the one in [38]. Section A contains a list of assumptions and a more detailed statement of the problem. Section B contains the main result of the chapter, Section C a numerical example, and Section D a brief indication of this algorithm's practical usefulness.

A. Assumptions and Problem Statement

The computer is assumed to be a multiprocessor following the definition of Enslow [40]; that is to say, the processors of this computer each have their own private memories. The schedule to be derived is locally-preemptive, i.e., tasks resident on the same processor are allowed to preempt each other, but tasks resident on separate processors are not. The locally-preemptive regime has been assumed in order to avoid the possibility of causing excessive congestion on the interconnection structure. It also removes the need to allow for queueing delays as part of the schedule.

By a feasible schedule, we mean a schedule in which all hard deadlines are met. As in [38] and [39], the tasks are treated as having no precedence constraints. This does not rule out tasks forming a pipe, as in Figure 29, if the tasks can effectively be decoupled by bounding above the deadline of a parent task in the task-precedence tree by the minimum of the release times of all its child tasks. This is not always possible, since, for example, the release time of a child task may depend solely on the moment that the parent task finishes executing. In such cases, the approach presented here (as also those in [38] and [39]) breaks down, and a more powerful algorithm must be sought. Each version has a given release time, a deadline, and represents a certain volume of computational demand. Other assumptions are as follows.

A1. The mission lifetime, L, is much smaller than the mean time between successive processor failures.

A2. The release time, deadline, and computational demand of each version are deterministic. So is L.

A3. The number of processors available is sufficient to satisfy all deadlines and reliability requirements.

A1 is reasonable, considering that mean times between processor failure are nowadays in the 1,000- to 10,000-hour range. Given a ten-processor system with processor MTBF's of 10,000 hours, the mean time between successive processor failures varies from about 1,000 to 10,000 hours, depending on how many processors are left functional, and assuming Poisson fault occurrences. We shall later consider the consequences of relaxing A1. A2 is unrealistic: the real-time environment is

[Figure 29. A Typical Pipe.]

stochastic, and this influences the task characteristics. However, in practice, worst-case numbers may be used to ensure that performance bounds can be met, thereby making A2 acceptable. Without A2, the problem would turn into one of stochastic scheduling, which is known to be extremely difficult, if not impossible. Without sufficient processors, we clearly cannot proceed with a scheduling operation, which justifies A3. It is worth pointing out here that determining the minimum number of processors required is an NP-complete problem.

As we saw in Chapter 3, a completion time C_j of a particular version j can be linked to a cost, or overhead, f_j(C_j), where f_j is the cost function. Each schedule defines the task completion times, and therefore has a cost associated with it, computed by adding together all the contributions from individual tasks. This is the objective function that must be minimized in determining an optimal schedule, subject to meeting the specified probability of dynamic failure.

We now consider rescheduling upon processor failure. Rescheduling must not be expected to rescue a given version of a task: that must be done, if at all, by other means. It is at the end of a version execution that the system responds to a failure by rescheduling the affected tasks. To understand why, we must consider the fault-detection mechanism. The traditional approach to fault tolerance using massive redundancy is to partition the multiprocessor logically into clusters of processors with respect to the individual tasks, and to have the tasks run in parallel on each processor in the cluster. At the end of their execution, voting is carried out on the results. If unanimity is achieved, then it is assumed that no failure has occurred, the event that all members of the cluster arrive at the same wrong result being considered practically impossible. If

the majority of processors agrees on a result, that is considered the "true" result, and all processors in the cluster that do not agree with the majority result (or put out no result at all up to a given timeout moment) are regarded as suspect. Of course, if there is no majority result, the cluster as such fails, hard deadlines are missed, and catastrophic or otherwise unacceptable failure is the result. When unanimity is not achieved, fault-location algorithms must be invoked to locate the faulty processors. One way to do this is to run diagnostic tests on all the processors and then to agree on the results by using, say, the interactive consistency algorithm. In any case, the detection and location of a faulty processor occurs shortly after a given execution on that processor is complete. There is nothing that the rescheduling algorithm can do if the cluster as a whole fails, i.e., if the voter can find no majority result. The probability of that happening must be kept acceptably low by using sufficient active redundancy. Also, since everything (except failures) in the system is deterministic by assumption A2, the voting times are fixed, and so faults can only be located at certain given moments. When a processor failure is located, the primary clones allocated to the failed processor and not already started cannot be executed. One ghost of each of these clones must therefore be activated. Clearly, the system will know of the need to activate a certain ghost only when it knows of the processor failure that gave rise to this need. We define the moment by which the system knows of the need to activate a ghost as that ghost's notification time. The notification time for ghost j is denoted by v_j, with the ghosts being numbered 1, 2, .... For convenience, we define v_0 = 0. We assume that v_j is less than or equal to ghost j's release time. Notification interval j on processor i is defined as the interval between the j-th and

(j+1)-th notification times of ghosts on that processor. Failure can occur due to one of two reasons:

F1. Failure of one or more redundant clusters during one execution cycle, so that at least one task misses a hard deadline sometime during the mission. As we said above, there is nothing that the rescheduling algorithm can be expected to do about this.

F2. F1 does not happen, but so many failures occur over the entire mission lifetime that the system is unable to ensure that each version begins executing on a full-strength cluster, i.e., one might have to make do with fewer than the prescribed number of clones running in parallel (so that the probability of F1 occurring is raised above that considered tolerable).

F2 is a simplification born of pessimism: not every weakened cluster is guaranteed to fail. From assumption A2, F1 and F2 can be used to determine the maximum number, N_sust, of processor failures that should be sustained. It is easy to see that

P_dyn ≤ Prob{F1} + Prob{F2}    (55)

where the inequality arises from the fact that F2 is pessimistic. N_sust can then easily be computed from (55) and the given reliability specification, p_dyn^spec. This is illustrated in the following example.

Example: Let there be two clusters of three processors each, together with some standbys. Assume that processor failures occur independently as a Poisson process with rate ρ. Let the period between fault-detection tests be constant at α, and the mission lifetime be L. Define β = ⌈L/α⌉. Clearly,

Prob{F1} ≈ 2βp²(α)[3 − 2p(α)]    (56)

where p(τ) = 1 − e^(−ρτ). The approximation in (56) holds so long as the probability of F1 is much less than one. In such a case, the probability of F2 is given by:

Prob{F2} = 1 − Σ_{i=0}^{M} C(N,i) [p(L)]^i [1 − p(L)]^(N−i)    (57)

where M is the number of failures that can be sustained and N is the total number of processors. Then, N_sust is the greatest M in (57) for which

Prob{F2} > Prob{F1}    (58)

When p_dyn^spec = 10^-9, MTBF = 10,000 hours, the mission lifetime L = 10 hours, and fault-location checks are carried out every 0.1 second, Prob{F1} is about 1.67 × 10^-11 (totally negligible when compared to p_dyn^spec = 10^-9), and N_sust = 2.

As noted before, in order to sustain the allowed-for number of processor failures, ghost clones of the various critical tasks are scheduled. Now, ghost clones are not activated if there is no processor failure, and we have assumed (A1) that failures are not common. So, the only constraint upon the scheduling of the ghosts is that their hard deadlines must be met: we are not unduly concerned with the efficiency of a schedule that has one or more ghosts activated. Although the ghosts are initially passive, and do not in that state require any processor time, they represent a latent demand for processor time that must be allowed for. For this reason, they affect the scheduling of the primary clones (since they share processors with them), and tend to lower the efficiency of the primary schedules. This can best be illustrated through the following example. Consider a processor to which are allocated primary clones 1 and 2, and ghost clone 3. Let their release times be rel_1 = 0, rel_2 = 1, and rel_3 = 3, their deadlines be td_1 = 10, td_2 = 12, and td_3 = 7, and their cost

functions be f_1(t) = t − 1, f_2(t) = (t − 2)², and f_3(t) = 0 for arguments t less than the corresponding hard deadlines, and infinity above them. Let their run times be r_1 = 6, r_2 = 0.5, and r_3 = 4. Without the ghost, the optimal primary schedule is clearly as shown in Figure 30a. However, although the ghost is not initially activated, does not even form part of the primary schedule, and if not activated needs no processor time, we must always allow for the possibility of its being activated, and then occupying the processor from t=3 to t=7. To allow for this, the only acceptable primary schedule is that shown in Figure 30b, which is much more expensive than the one in Figure 30a. Since, in the overwhelming majority of cases, the primary schedules are the only ones that will be executed, their efficiency is of prime concern. Any fault-tolerant scheduling with ghosts must therefore strive to minimize the increase in the cost of the primary schedule that can be attributed to the existence of the ghosts, while ensuring, if the ghosts ever have to be activated, that their hard deadlines are met. For this reason, we define the ghost cost functions as identically zero for values of the response time less than their associated hard deadlines. Since the penalty to be paid for an activated ghost's missing its deadline is as great as that for a primary clone, the ghost cost function for response times greater than the deadline will continue to be infinity. When this cost function is used as input to an algorithm that performs optimal scheduling, the result will be to degrade the efficiency of the primary schedules as little as is consistent with the need to allow the ghost clones to meet their deadlines. We also assume that the cost functions of the primary clones for response times less than their hard deadlines are monotonically increasing, not just monotonically nondecreasing as defined in Chapter 3.
Given that the set of real numbers is everywhere dense, this inflicts no practical inaccuracy on our

[Figure 30. Effect of Ghosts on Primary Schedules: An Example. (a) Optimal primary schedule without influence of ghost 3. (b) Optimal primary schedule showing influence of ghost 3.]

calculations. The number of ghosts per critical task, and the scheduling rules that they follow, are developed in the next section.

We can now state our problem. Assume we are given an algorithm P that finds either an optimum or quasi-optimum feasible schedule by minimizing the costs associated with individual schedules when all clones are assumed to be primary.¹ Owing to the NP-completeness of the scheduling problem, P is most likely to be a heuristic procedure. A good example is a best-fit procedure, along the lines of the algorithm in [38] (although some changes might be needed to accommodate the different optimization criterion here). As in [38], we make the assumption that we can split P into two sub-algorithms: P1, which finds the optimal allocation of tasks to processors and also the optimal schedule by calling P2, which is an optimum scheduler for uniprocessor systems. Our task is to develop an algorithm Q that can be used in conjunction with P1 and P2 to obtain an optimum schedule when enough ghosts are incorporated into the task set to sustain up to N_sust processor failures, and also to obtain contingency schedules that can be invoked when the system is notified that a given ghost is to be activated.

B. Main Result

We begin by stating certain conditions that the ghosts should satisfy.

¹To avoid tedium in the sequel, by "optimal" we shall mean either optimal or quasi-optimal. When we wish to exclude quasi-optimality, we shall use the term "truly optimal".

C1. There are N_sust ghost clones for each version.

C2. Ghosts are conditionally transparent. That is to say, (a) two ghost clones may overlap in the schedule if and only if none of their corresponding primary clones are scheduled on the same processor, and (b) primary clones may overlap ghosts on the same processor, and may indeed be scheduled without regard for the ghosts, so long as the following condition is met: for every instant t in the schedule, if all ghosts whose notification time is greater than t are activated, the schedule beyond t can be reordered to ensure that all ghosts and all primaries allocated to that processor meet their hard deadlines, provided only that when some ghosts mutually overlap, at most one of them is activated.

C3. Two clones of the same version, whether primary or ghostly, cannot be scheduled on the same processor.

An illustration of C2 is provided in Figure 31. In this example, the notification time of a ghost is its release time. (Recall that the notification time must be less than or equal to the corresponding release time.) In Figure 31a, we see an overlap of primary and ghostly clones that satisfies condition C2: when the system is notified that the ghost is to be activated, the schedule beyond that time can be reordered to ensure that all hard deadlines are met. Figure 31b shows an overlap that does not meet condition C2: in the event that the ghost is activated, either ghost i or primary j will miss its deadline. It is easy to see that C1, C2, and C3 are necessary and sufficient conditions to ensure that reliability specifications are achieved.
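Condition C2(b) is in effect a schedulability test: at a ghost's notification time, activate it and ask whether the remaining work on the processor can be reordered so that every hard deadline is still met. Since schedule cost is irrelevant once a failure has occurred, preemptive earliest-deadline-first ordering is one standard way to answer the question. A minimal sketch; the tuple representation and function name are illustrative, not from the text:

```python
def edf_feasible(tasks, t0):
    """tasks: (release, deadline, remaining_run_time) tuples sharing one
    processor. True iff preemptive earliest-deadline-first scheduling,
    started at time t0, meets every hard deadline."""
    pending = sorted(tasks)               # by release time
    active = []                           # [deadline, remaining] pairs
    t, i = t0, 0
    while i < len(pending) or active:
        while i < len(pending) and pending[i][0] <= t:
            active.append([pending[i][1], pending[i][2]])
            i += 1
        if not active:
            t = pending[i][0]             # idle until the next release
            continue
        active.sort()                     # earliest deadline first
        d, c = active[0]
        next_release = pending[i][0] if i < len(pending) else float("inf")
        run = min(c, next_release - t)    # run until done or a new release arrives
        t += run
        if t > d:                         # deadline passed with the job unfinished/late
            return False
        if run == c:
            active.pop(0)
        else:
            active[0][1] = c - run
    return True
```

Using the clone parameters of the example above: if ghost 3 (deadline 7, run time 4) is activated at t=3 while primary clone 1 still has 3 units of work (deadline 10) and clone 2 has 0.5 (deadline 12), edf_feasible([(3, 7, 4), (3, 10, 3), (3, 12, 0.5)], 3) succeeds; inflating clone 1's remaining work to 3.5 makes it fail.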

[Figure 31. Illustrating Condition C2. (a) Permissible overlap: ghost i and primary j overlap, and upon ghost i's activation at v_i the schedule can be reordered so that both meet their deadlines td_i and td_j. (b) Impermissible overlap: if ghost i is activated, either ghost i or primary j misses its deadline.]

We can now turn to obtaining algorithm Q. We begin by stating some additional definitions and notation pertinent to algorithm Q. A hole arising out of an allocation of tasks to a particular processor is defined as an interval in which it is impossible to schedule any of the allocated primary clones, owing to their release times and deadlines. A point in the schedule representing time t is in a hole if and only if (i) the deadlines of all clones released prior to t are less than t, and (ii) no clone is released at t. Clearly, a hole depends only on the allocation of clones and not on their specific positioning in a schedule. In other words, all feasible schedules for a given allocation have the same holes. In order to avoid confusion, we emphasize that not all blank spaces in a schedule need be holes: under our definition, only zones in which it is impossible to schedule anything allocated are holes. By primary holes, we mean the holes that would exist in a processor's schedule if the ghosts allocated to that processor did not exist. Algorithm P2 has as input the deadlines of the clones and their remaining run times. Running P2(A) means that P2 is invoked with the set of clones A allocated to the processor. By initializing P2 at some time τ, we mean that P2 generates a schedule starting at τ, using the inputs mentioned above. We denote the set of ghosts that have been allocated by P1 to processor i by Θ_i. By ghost(j), we mean the j-th ghost (in order of notification time) allocated to processor i. Algorithm Q adjusts or transforms the hard deadlines at step 3(b). A deadline transformation is said to be feasible if, whenever it is possible to generate a feasible schedule of clones with pre-transformed deadlines, it is also possible to do so after the clones' deadlines have been transformed. Finally, we define P̂2 to be a modification of P2 that fills the holes to the extent possible with ghosts. In the event of

more than one ghost competing for the same hole space, P̂2 will try to overlap them. If it is not possible to do this and still satisfy condition C2 (recall that if ghosts overlap in the schedule, only one of the overlapping ghosts can be activated), priority will be given to the ghost with the earliest deadline. The remaining ghosts will be scheduled as primary clones, with the cost functions indicated earlier, under the additional condition that the time-slices given to the ghosts in the schedule are moved as far right as possible, consistent with the need to meet all hard deadlines. This last condition is easily met: since all primaries have monotonically increasing cost functions, the only case in which it is not automatically satisfied by P2 is when moving a ghost right would not alter the finishing time of a primary clone; see Figure 32. In that event, all that is required is to "nest" the ghost within the primary. This means that, for any given time t, the non-primary-hole space occupied by the ghosts prior to t is the minimum possible.

Algorithm Q is as follows. Let the set of primaries and ghosts allocated to processor i be π_i and Θ_i, respectively. For every allocation of primaries and ghosts to processors by P1, do steps 1 to 6 for every processor i:

Step 1. Record the holes for processor i, defined by the allocation.

Step 2. Run P̂2(π_i ∪ Θ_i). Return control to P1 if the schedule is found to be infeasible. Otherwise, record the positions of the ghosts in G.

Step 3. Set j := 0, and, while Θ_i is nonempty, do (a)-(f):

(a) Initialize P2 at v_ghost(j).

[Figure 32. Nesting a Ghost Inside a Primary.]

(b) Transform the hard deadlines of the primary clones k ∈ π_i to td_k − u_{G,Θ_i}(0, td_k), where td_k is the original hard deadline of clone k and u_{G,Θ_i}(a,b) is the non-primary-hole space occupied in the interval [a,b] by the ghosts in Θ_i, according to the ghost schedule G.

(c) Run P2(π_i) with these new deadlines, and let S_ij be the schedule obtained.

(d) Calculate the remaining run time of the clones scheduled in (c) by subtracting from the current remaining run time the duration for which they were scheduled in the interval [v_ghost(j), v_ghost(j+1)].

(e) Set Θ_i := Θ_i − ghost(j+1).

(f) Set j := j+1.

Step 4. If all primary clones k ∈ π_i have not been completed by v_ghost(j), restore the original hard deadlines td_k, initialize P2 at v_ghost(j), and run P2(π_i), recording the output in S_ij.

Step 5. Form schedule S_i* by piecing together the schedules represented by S_ij in the intervals [v_ghost(j), v_ghost(j+1)]. Letting {C_k}, k ∈ π_i, be the completion times of the primary clones k according to schedule S_i*, compute Σ_k f_k(C_k), and return this value as the cost associated with S_i*.

Step 6. Restore all hard deadlines of the primary clones k ∈ π_i to their original values td_k.
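The deadline transformation of step 3(b) shrinks each primary clone's hard deadline by the ghost time that must be reserved ahead of it. A sketch, assuming ghost positions (from the ghost schedule G) and primary holes are available as lists of (start, end) intervals; the function names are illustrative, not from the text:

```python
def interval_overlap(a1, a2, b1, b2):
    """Length of the intersection of intervals [a1,a2] and [b1,b2]."""
    return max(0.0, min(a2, b2) - max(a1, b1))

def ghost_reserve(ghost_slots, primary_holes, a, b):
    """u_{G,Theta}(a,b): ghost-occupied time in [a,b] that does NOT fall
    inside primary holes (hole time cannot hold primaries anyway, so it
    need not be reserved). Holes are assumed non-overlapping."""
    total = 0.0
    for g1, g2 in ghost_slots:
        lo, hi = max(g1, a), min(g2, b)        # ghost slot clipped to [a,b]
        if hi <= lo:
            continue
        occupied = hi - lo
        occupied -= sum(interval_overlap(lo, hi, h1, h2)
                        for h1, h2 in primary_holes)
        total += max(occupied, 0.0)
    return total

def transform_deadlines(deadlines, ghost_slots, primary_holes):
    """Step 3(b): td_k -> td_k - u(0, td_k) for each primary clone k."""
    return {k: td - ghost_reserve(ghost_slots, primary_holes, 0.0, td)
            for k, td in deadlines.items()}
```

With a single ghost occupying [3,7] and no primary holes, deadlines 10 and 12 are transformed to 6 and 8; a primary hole [4,5] under the ghost gives one unit back, yielding 7 and 9.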

Notice that only in step 2 are the ghosts scheduled: this ghost schedule determines the positions of the ghosts for the given allocation. Q puts out a matrix of contingency schedules S_ij, with the optimal primary schedule (optimality is proved in Theorem 2) being given by S_i', where i is the processor index and j the notification-sequence index. Contingency schedule S_ij is invoked at time v_ghost(j) if (a) ghost(j) is to be activated, and (b) all ghosts l with v_l < v_j that are allocated to processor i will not be activated.

Q is a dynamic programming algorithm. For every processor i in the system, it begins by obtaining the ghost positions. Then, it obtains the optimal schedule that also satisfies conditions C1-C3 for the ghosts in Ω_i. This is schedule S_i0. The portion of S_i0 in the interval [v_ghost(0), v_ghost(1)] is also the corresponding portion of S_i'. The run time of each primary clone is reduced by the amount of time it has been scheduled for under S_i0 in the interval [v_ghost(0), v_ghost(1)]. Then, at v_ghost(1), ghost(1) is discarded from Ω_i, and the above procedure is repeated to obtain S_i1. The portion of S_i1 in the interval [v_ghost(1), v_ghost(2)] is the corresponding portion of S_i'. Proceeding in this manner, Q pieces together the optimum primary schedule, which is the one that will be run if there is no ghost invocation.

If ghost(n) is activated and no ghost allocated to that processor with notification time prior to that of n is activated, the contingency schedule to be followed is S_in. This schedule will ensure that even if any or all of the succeeding ghosts (i.e., ghosts in Ω_i with notification times greater than v_ghost(n)) are activated, all hard deadlines of tasks assigned by P1 to processor i will continue to be met.

The actual schedule to be used in the case of failure is then trivially obtained by amalgamating the ghost schedule onto the contingency schedule. When a ghost is activated, it retains its position as defined in the ghost

schedule, interrupting any primary clone that may be occupying the same time-stretch in the contingency schedule. This effect can propagate, and all primary clones that are affected by it are moved right by an appropriate amount. Figure 33 provides an illustration of this. In this example, there is no hole in the schedule prior to t=4.0. If only ghost i is activated, it retains in the actual schedule the position it had in the ghost schedule and interrupts the primary clone occupying that interval. If ghosts i, j, and k are all activated, each ghost retains its position as prescribed by the ghost schedule, appropriately interrupting and translating the primary clones.

Theorem 1: If step 2 of Q yields a feasible schedule for a processor i, then all deadline transformations in Q pertinent to processor i for that allocation are feasible.

Proof: This follows immediately from the fact that ghosts are only scheduled by P2 either in the holes left by step 2 of Q, or at the last possible moment (consistent with the need to satisfy all hard deadlines). Indeed, owing to the way we have defined P2, the schedule of ghosts G obtained from P2 is the one for which util_G(0,t) is a minimum over any ghost schedule G′ and any time t, if all deadlines are to be satisfied. All this means that if one or more primaries cannot be scheduled feasibly under the deadline transformation in step 3(b), then neither can a feasible schedule have been found in step 2, leading to a contradiction. Q.E.D.

Theorem 2: If P is truly optimal, S_i' is truly optimal.

Proof: Suppose the schedule S_i' found by Q is not truly optimal for some i. Then there exists a schedule S*, with an objective-function value less than that of S_i', in which at least one primary clone is scheduled to finish after the transformed hard

Figure 33. Amalgamating Ghost and Contingency Schedules (panels: Ghost Schedule; Contingency Schedule, notification that i will be activated; Schedule if only i is Activated; Schedule if i, j, and k are Activated)

deadline as defined in step 3(b). Consider one such clone j, assigned to processor i. Let clone j finish at time δ in schedule S_i' and at time ε in schedule S*. By assumption, ε > δ. Let δ lie in notification interval h, i.e., δ ∈ [v_ghost(h), v_ghost(h+1)). Denote by util_B(a,b) the amount of non-hole space occupied on processor i, in the interval [a,b], by the ghosts B = {c : v_c > v_ghost(h) and c is allocated to i}. Clearly, ε ∈ (t_d(j) − util_B(0, t_d(j)), t_d(j)], where t_d(j) denotes the original (i.e., pre-transformation) deadline of clone j.

If util_B(0,ε) = util_B(0, t_d(j)), then, by the minimality of util_B, we have a contradiction: clone j will miss its deadline if all the ghosts in B are activated. So util_B(0,ε) < util_B(0, t_d(j)), i.e., util_B(ε, t_d(j)) > 0. If all ghosts in B scheduled prior to ε are activated, then clone j's finishing time will be increased by at least util_B(0,ε) + util_B(ε, util_B(0,ε)+ε). Now, util_B(ε, util_B(0,ε)+ε) = a_1 > 0, since otherwise util_B(util_B(0,ε)+ε, t_d(j)) > t_d(j) − ε − util_B(0,ε), a contradiction. So the finishing time of clone j will be incremented again, this time by a_1. Similarly, we can show that a_2 = util_B(util_B(0,ε)+ε, util_B(0,ε)+ε+a_1) > 0, and so on: an infinite sequence {a_k} is generated.

Now, if we can show that Σ_{k=1}^∞ a_k = util_B(0, t_d(j)) − util_B(0,ε), we have proved that S* is infeasible. Suppose, to the contrary, that Σ_{k=1}^∞ a_k = m < util_B(0, t_d(j)) − util_B(0,ε). Then util_B(ε + m + util_B(0,ε), t_d(j)) = util_B(0, t_d(j)) − util_B(0,ε) − m, and so, clearly, util_B(ε + m + util_B(0,ε), t_d(j)) < t_d(j) − ε − m − util_B(0,ε), i.e., ε < t_d(j) − util_B(0, t_d(j)), a contradiction. So S* does not exist. Q.E.D.

If P is optimal, then Q will also be optimal, since util_G(0, t_d(j)) depends only on P, not on P2.
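The piecing-together of S_i' from the contingency schedules S_ij can be sketched as follows: each stage schedule contributes only its restriction to the notification window [v_ghost(j), v_ghost(j+1)). This is a simplified illustration with hypothetical data, representing a schedule as a list of (clone, start, end) triples.

```python
def piece_together(stage_schedules, v_times):
    """Build S_i' by clipping stage schedule S_ij to the window
    [v_ghost(j), v_ghost(j+1)) and concatenating the pieces."""
    windows = list(v_times) + [float("inf")]  # last window is open-ended
    pieces = []
    for j, sched in enumerate(stage_schedules):
        lo, hi = windows[j], windows[j + 1]
        for clone, s, e in sched:
            s2, e2 = max(s, lo), min(e, hi)
            if s2 < e2:               # keep only the part inside the window
                pieces.append((clone, s2, e2))
    return sorted(pieces, key=lambda p: p[1])

# Hypothetical stages: S_i0 applies on [0, 15), S_i1 from 15 onward.
s_i0 = [("A", 0.0, 10.0), ("B", 10.0, 20.0)]
s_i1 = [("B", 5.0, 25.0)]
optimal = piece_together([s_i0, s_i1], [0.0, 15.0])
```

Here clone B's slice in S_i0 is truncated at the second notification time, and S_i1 supplies the remainder of B from that point onward.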

It is important to realize that only S_i' is optimal; a contingency schedule S_ij, taken as a whole, is not guaranteed to be optimal, in that one or more of the schedules that arise from superimposing the ghost schedule on the contingency schedule need not be optimal. For S_ij to be optimal, we would have to piece it together as we did S_i'. This would give rise to schedules S_i,j+1, S_i,j+2, etc. One would then have a contingency schedule for each possible combination of ghost invocations. (This is further explored in the example in the next section.) The set of such schedules would be very large. However, under assumption A1, failures are rare, and so the probability of any of the contingency schedules being invoked is very small. This means that the expected cost (in terms of the sum of the cost functions of the clone finishing times) of the above set of schedules (i.e., S_i' and the contingency schedules) is only slightly greater than the expected cost of the schedule consisting of S_i' as the primary schedule and the whole set of contingency schedules, pieced together as explained above. Indeed, it is because of assumption A1 that we have been able to use the cost of S_i' as the optimal cost associated with the given allocation of clones to processor i.

This brings one to the question of relaxing assumption A1. If A1 is relaxed because the mean time between successive processor failures is small, our method breaks down, since S_i' can no longer be regarded as yielding the optimal cost associated with a given task allocation. Fortunately, as remarked earlier, the mean time between failures is usually not small. It is more likely that A1 must be relaxed because the mission lifetime is very long. This is true, for example, of spacecraft. In such a case, it is useful to use the contingency schedules, not for the entire duration of the mission, but to

cover the interval during which a new optimum schedule (complete with a new primary and corresponding contingency schedules) is computed. This is profitable as long as the mean time between successive processor failures is much greater than the time taken to compute a new optimum schedule.

Finally, we look at the burden this algorithm places on the computer during operation. This consists solely of amalgamating the ghost schedule with the contingency schedule, which amounts only to preempting the contingency schedule with the ghosts wherever necessary. The maximum number of preemptions is equal to the number of ghosts, so the overhead for any given processor is directly proportional to the number of ghosts carried on that processor. Specifically, it is the product of the number of ghosts and the time consumed in handling preemptions. Since the latter time is usually very small, this is indeed an on-line algorithm. By contrast, nothing can be said a priori about the complexity of the algorithm Q, since it depends on the given algorithm P.

C. Example

We illustrate here the formation of an optimal schedule for a given allocation (specified by algorithm P1). We provide schedules for one of the processors only; the others are similarly derived. Assume that P1 has allocated primary clones of tasks 1, 2, 3, and 4, and ghost clones of tasks 5, 6, 7, and 8, to this processor. The cost function for primary clones of task i is given by:

f_i(t) = 5(t − rel_i) if t ≤ t_d(i), and f_i(t) = ∞ otherwise.

The hard deadlines for the tasks are t_d(1) = 130, t_d(2) = 110, t_d(3) = 90, t_d(4) = 60, t_d(5) = 22, t_d(6) = 39, t_d(7) = 55, and t_d(8) = 105. The run time of each of these tasks is 15 time units. The release times are rel_i = i for i = 1, 2, 3, 4, rel_5 = 0, rel_6 = 24, rel_7 = 40, and rel_8 = 90. The ghost notification times are v_5 = 0, v_6 = 20, v_7 = 40, and v_8 = 90. These values were chosen arbitrarily and have no physical significance in themselves.

The algorithm P2 employed by us is a heuristic that makes scheduling decisions at the release and finishing times of clones. At each such moment t it computes, given the set of available clones R (i.e., clones released but not yet finished), an array of values V(j), where V(j) = Σ_{k∈R} f_k(t + rrt_j + rrt_k) and rrt_k is the remaining (at time t) run time of clone k. The clone scheduled is the clone i for which V(i) = min{V(j) : j ∈ R}.

The holes are the intervals [0,1) and (130,∞). The contingency schedules are shown in Figure 34. The optimal schedule S_i' for that processor is pieced together from these as shown. This schedule, together with the cost it represents, is returned to algorithm P1. In Figure 35, we show the actual schedules followed upon ghost activation. The case treated is one in which contingency schedule S_i0 has had to be invoked, i.e., the system has been notified that ghost 5 is to be activated. If only ghost 5 is activated, it occupies the schedule during the intervals allocated to it by the ghost schedule, G. Primary clone 4 is thus preempted, and only resumes execution after ghost 5 has stopped running at time 22. The rest of the schedule is shifted right by 14 time units to absorb the non-hole space occupied by ghost 5. If ghosts 5 and 6 are activated, then primary 4 is interrupted twice: once by 5 and then by 6.
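The event-driven heuristic just described can be sketched as follows. This is one plausible reading of it: the exact form of V, the tie-breaking rule, and the handling of releases are assumptions here, not the dissertation's code.

```python
def p2_heuristic(clones):
    """Greedy event-driven scheduler in the spirit of P2: at each release
    or finishing time t, run the available clone j minimising
    V(j) = sum over available clones k of f_k(t + rrt_j + rrt_k).
    `clones` maps name -> (release time, run time, cost function f)."""
    rrt = {c: rt for c, (rel, rt, f) in clones.items()}
    t, sched = 0.0, []
    while any(r > 1e-9 for r in rrt.values()):
        avail = [c for c in rrt
                 if clones[c][0] <= t + 1e-9 and rrt[c] > 1e-9]
        if not avail:                      # idle until the next release
            t = min(clones[c][0] for c in rrt if rrt[c] > 1e-9)
            continue
        def V(j):
            return sum(clones[k][2](t + rrt[j] + rrt[k]) for k in avail)
        j = min(avail, key=V)
        # run j until it finishes or the next release, whichever is earlier
        future = [clones[c][0] for c in rrt
                  if rrt[c] > 1e-9 and clones[c][0] > t + 1e-9]
        run = min([rrt[j]] + [r - t for r in future])
        sched.append((j, t, t + run))
        rrt[j] -= run
        t += run
    return sched
```

For instance, with two hypothetical clones released at time 0, a short clone whose cost grows ten times faster than a longer one, the heuristic runs the short, costly clone first and then the longer one.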

Figure 34. Obtaining S_i' (panels: Ghost Schedule G; Contingency Schedules S_i0, S_i1, S_i2, S_i3; Optimal Schedule S_i')

Figure 35. Schedules when Ghosts are Activated (panels: Ghost Schedule G; Contingency Schedule S_i0; Schedule if Only Ghost 5 is Activated; Schedule if Ghosts 5 and 6 are Activated; Schedule if Ghosts 5 and 7 are Activated; Schedule if Ghosts 5, 6, and 7 are Activated)

In each case, it must wait for the ghosts to complete before resuming execution. The rest of the schedule now gets shifted right by 29 time units. If ghosts 5 and 7 are activated, primary 4 is interrupted only by ghost 5, but primary 1 is also interrupted, by ghost 7. If ghosts 5, 6, and 7 are activated, then primary 4 is interrupted thrice, once by each of the ghosts, as shown. Notice that primary 4 just manages to meet its deadline: if it, in its turn, had not interrupted primary 1 upon release, this would have been impossible.

If the system is notified that ghost 5 is not to be activated, then schedule S_i0 is discarded. Since v_5 = 0, this can be done at time 0, which is why S_i' does not contain any portion of S_i0.

We can now easily see that S_i0 is not in itself optimal, in that it does not always lead to an optimal schedule if the ghosts are activated ("optimal" here meaning at least as good as the best schedule obtainable using algorithm P2). In S_i0, primary 3 is set for execution before primary 2, although, if they exchanged positions, the resulting schedule would have lower cost. This is because we want to ensure that primary 3 meets its deadline under all the contingencies that are notified after v_5 = 0, specifically the contingency that ghosts 5, 6, and 7 are all activated. If we were willing to afford the extra memory space required to store more contingency schedules, we could piece together S_i0 as we did S_i', and store, in addition to it, the second-order contingency schedules S_i0,0, S_i0,1, and S_i0,2. These second-order contingencies are shown in Figure 36, together with the "new" S_i0. (Clearly, S_i0,0 is just the same as the old S_i0.) Of course, these changes do not affect S_i' as found by piecing together the original contingency schedules, since the new contingency schedules S_im, for m = 1, 2, ..., differ from their old counterparts, if at all,

Figure 36. Piecing Together a Contingency Schedule (panels: Second-Order Contingency Schedules S_i0,1 and S_i0,2; New Contingency Schedule S_i0)

only for values of t > v_ghost(m+1) (defining v_ghost(n) = ∞ if there are fewer than n ghosts allocated to processor i), and this piece of S_im is not included in S_i'.

D. Discussion

This chapter has introduced a dynamic programming algorithm to generate fault-tolerant schedules. This algorithm is expected to be used in embedded computers where extremely high reliabilities are required; examples are aircraft and spacecraft controllers, life-support systems, etc. The algorithm is useful because it minimizes the number of primary clones. This makes the schedule S_i' efficient for each processor i, while still providing fallback schedules to be invoked in the (rare) instances of processor failure. The usefulness of this approach increases as the demands made on reliability increase: in the "obvious" approach, which makes all clones primary, an extremely stringent reliability requirement would swamp the system with a large number of primary clones, with all the implications that this would have for computer response times and the consequent quality of (control) service provided by the computer. An example of this is provided in Figure 37, where the obvious approach is used, all the clones in the earlier example being declared primary. The degradation in response times is clear on comparison with Figure 35.

Σ_{j=1}^{8} f_j(c_j) = 3,725,796

Figure 37. All Clones Treated as Primary

PART IV

DISCUSSION

CHAPTER 8

DISCUSSION

In this dissertation, we have introduced new performance measures for assessing more appropriately the performance of multiprocessors in the control of critical real-time processes. In addition, we have, among other things:

(i) Provided detailed examples of the derivation of these measures.

(ii) Studied problems of synchronizing real-time systems in the presence of malicious failure, showing the impracticality of software methods, and presenting means for the indefinite expansion of phase-locked clocks.

(iii) Presented a dynamic programming algorithm to generate fault-tolerant schedules.

The work presented here was impelled by the need for a more orderly approach to the design and evaluation of multiprocessors used in real-time applications. The proliferation in the literature of (oftentimes ill-considered) multiprocessor architectures is partially a consequence of the lack of an effective framework in which designers can function.

Coupled with this design activity is a growing awareness of the limitations of the theory associated with distributed processing. Queueing theory and scheduling theory are in their infancy. Even moderately complex queueing and scheduling

problems are proving to be, for the present at least, intractable. The need is therefore growing for carefully considered approximate models that combine the results of experience and intuition with the accuracy and power of the analytical method.

Before such models can be confidently used, however, a great deal of groundwork has to be completed. Central to this groundwork is the definition of appropriate characterizations of computer performance. As we have pointed out elsewhere [42], "by the simple act of choosing a particular criterion, the researcher imposes a bias on the results that follow. Performance measures by definition specify the commodity that is of importance... They have a subtle influence on the way systems are viewed. They are languages through which we seek to convey system performance. We know from experience with natural languages that these affect not only the way in which ideas are expressed, but also the very ideas themselves. Performance measures are no exception to this. They manipulate our view of system behavior, contorting it to fit a pre-conceived mould. For this reason, the choice of a performance measure determines the practical usefulness of the results that are then derived."

We have therefore concentrated the first part of this effort on the derivation of appropriate performance measures. The rest of the dissertation may be considered a natural follow-up on these measures, and the problems treated there are among the most troublesome in fault-tolerant real-time system design.

Extensions of this work are not difficult to think of. These include the following.

(i) Determining ways to find the cost functions in real-life systems more efficiently than is currently possible. One way might be to derive approximate closed

form expressions for the functions.

(ii) Extending our performance measures to general-purpose systems. The chief difference between these measures and the traditional ones is that the former are task-oriented (i.e., user-oriented), and the latter are system-oriented. Part of this work would include further investigations of user-perceived computer performance. In particular, much work remains to be done on user-perceived response delay: it is not clear that mean response time is an accurate measure of this. Perhaps some measure based on the second moment would be more appropriate. Some preliminary work in this area has been done by Geist and Trivedi [41].

(iii) Studying the optimal placement of checkpoints. We have argued elsewhere [42] that the currently used measures of mean response time and availability are inappropriate, and proposed new measures that are more sensitive to the benefits that accrue from checkpointing. This is interesting and relevant since it is likely, as we pointed out above, that mean response time is not a correct measure of perceived response delay.

(iv) Measuring the cost of status information in computer systems with distributed control. There is increasing interest in such systems. When one employs distributed control, the perception of the state of the system can vary from controller to controller. Naturally, the closer each perception is to actuality, the better the control provided. However, distributing status information amongst the controllers is expensive, since it takes time. If we used time-based performance measures such as these (they would obviously have to be modified if we were dealing with general-purpose computers), the cost of

distributing such status information could then be determined. One can then transform the problem into one of taking decisions in the face of costly information: a problem widely studied in statistical decision theory.

The work that we have presented here was motivated by the need to study control computers as a distinct, and important, branch of computer engineering. Control computers have for too long been treated as an unimportant appendage of general-purpose computers. The problems specific to control computers are too difficult and important to be subjected to benign neglect.

APPENDIX

EXPRESSIONS FOR FINITE COST FUNCTIONS

The finite cost functions for Example 2 in Chapter 2 can be expressed by an if-then-else construct as follows:

if x_1i · x_2i > 0 then
    if |x_2i| > k then
        g(x_i, ξ) = (1/2)[t_1(x_i, k, ξ) + t_1(x_i, −k, ξ)]
    else
        g(x_i, ξ) = (1/2)[t_1(x_i, sgn(x_2i)k, ξ) + t_2(x_i, −sgn(x_2i)k, ξ)]
else
    if |x_2i| > k then
        g(x_i, ξ) = (1/2)[t_2(x_i, k, ξ) + t_2(x_i, −k, ξ)]
    else
        g(x_i, ξ) = (1/2)[t_1(x_i, −sgn(x_2i)k, ξ) + t_2(x_i, sgn(x_2i)k, ξ)]
end if;

where a = H/m, y(x_2i, k) = x_2i + k, x_i = (x_1i, x_2i)^T,

v(x_i, k, ξ) = [2a|x_1i| − y²(x_2i, k) − 2a|y(x_2i, k)|ξ] / [4a · y(x_2i, k)],

ε(x_i, k, ξ) = [2a|x_1i| − y²(x_2i, k) − 2a|y(x_2i, k)|] / [4a · y(x_2i, k)],

t_1(x_i, k, ξ) = |y(x_2i, k)|(1 + ξ)/a + (a|x_1i| − |y(x_2i, k)|)(ξ + v(x_i, k, ξ)),

and t_2 is given piecewise:

t_2(x_i, k, ξ) = |y(x_2i, k)|(1 + ξ)/a + ε(x_i, k, ξ)   if ξ < [2a|x_1i| − y²(x_2i, k)] / [2a|y(x_2i, k)|],
t_2(x_i, k, ξ) = √(y²(x_2i, k) + 2a(1 + ξ)|x_1i|)/a − |y(x_2i, k)|/a   otherwise.
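The case analysis above can be transcribed directly into code. In this sketch, t_1 and t_2 are passed in as callables, since their closed forms are specific to the example, and the absolute-value reading of the tests on x_2i is an assumption; only the branch structure is transcribed.

```python
import math

def finite_cost(x1, x2, k, xi, t1, t2):
    """g(x_i, xi) of the appendix: the branch selects which pair of the
    auxiliary time functions t1 and t2 (callables here) is averaged."""
    s = math.copysign(1.0, x2)            # sgn(x_2i)
    if x1 * x2 > 0:
        if abs(x2) > k:
            return 0.5 * (t1(x1, x2, k, xi) + t1(x1, x2, -k, xi))
        return 0.5 * (t1(x1, x2, s * k, xi) + t2(x1, x2, -s * k, xi))
    if abs(x2) > k:
        return 0.5 * (t2(x1, x2, k, xi) + t2(x1, x2, -k, xi))
    return 0.5 * (t1(x1, x2, -s * k, xi) + t2(x1, x2, s * k, xi))
```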

REFERENCES

[1] A. L. Hopkins, et al., "FTMP -- A Highly Reliable Fault-Tolerant Multiprocessor for Aircraft," Proc. IEEE, Vol. 66, No. 10, pp. 1221-1239, October 1978.

[2] J. H. Wensley, et al., "SIFT: Design and Analysis of a Fault-Tolerant Computer for Aircraft Control," Proc. IEEE, Vol. 66, No. 10, pp. 1240-1255, October 1978.

[3] M. S. Fox, "An Organizational View of Distributed Systems," IEEE Trans. Sys., Man, and Cyber., Vol. 11, No. 1, pp. 70-80, January 1981.

[4] J. Gertler and J. Sedlak, "Software for Process Control -- A Survey," Automatica, Vol. 2, No. 6, pp. 613-625, 1975.

[5] J. F. Meyer, D. G. Furchtgott, and L. T. Wu, "Performability Evaluation of the SIFT Computer," IEEE Trans. Comput., Vol. C-29, No. 6, pp. 501-509, June 1980.

[6] R. E. Barlow and F. Proschan, Mathematical Theory of Reliability, John Wiley, New York, 1965.

[7] M. D. Beaudry, "Performance-Related Reliability Measures for Computing Systems," IEEE Trans. Comput., Vol. C-27, No. 6, pp. 540-547, June 1978.

[8] J. Losq, "Effects of Failure on Gracefully Degrading Systems," Proc. FTCS-7, pp. 29-34, March 1977.

[9] R. Troy, "Dynamic Reconfiguration: An Algorithm and its Efficiency Evaluation," Proc. FTCS-7, pp. 44-49, March 1977.

[10] F. A. Gay and M. L. Ketelsen, "Performance Evaluation of Gracefully Degrading Systems," Proc. FTCS-9, pp. 51-58, June 1979.

[11] J. M. De Souza, "A Unified Method for the Benefit Analysis of Fault-Tolerance," Proc. FTCS-10, pp. 187-192, October 1980.

[12] X. Castillo and D. P. Siewiorek, "A Performance-Reliability Model for Computing Systems," Proc. FTCS-10, pp. 187-192, October 1980.

[13] T. C. K. Chou and J. A. Abraham, "Performance/Availability Model of Shared Resource Multiprocessors," IEEE Trans. Reliability, Vol. R-29, No. 1, pp. 70-78, April 1980.

[14] H. Mine and K. Hatayama, "Performance-Related Reliability Measures for Computing Systems," Proc. FTCS-9, pp. 59-62, June 1979.

[15] S. Osaki and T. Nishio, Reliability Evaluation of Some Fault-Tolerant Computer Architectures, Springer-Verlag Lecture Notes in Computer Science, Berlin, 1980.

[16] R. Huslende, "A Combined Evaluation of Performance and Reliability for Degradable Systems," Proc. ACM SIGMETRICS Conf. on Measurement and Modelling of Computer Systems, pp. 157-164, September 1981.

[17] J. F. Meyer, "On Evaluating the Performability of Degradable Computing Systems," IEEE Trans. Comput., Vol. C-29, No. 8, pp. 720-731, August 1980.

[18] J. F. Meyer, "Performability Modelling of Distributed Real-Time Systems," Int'l Workshop Appl. Math. and Perf./Reliab. Model. Comp./Comm. Syst., pp. 345-358, September 1983.

[19] G. K. Manacher, "Production and Stabilization of Real-Time Task Schedules," J. ACM, Vol. 14, No. 3, pp. 439-465, July 1967.

[20] D. E. Kirk, Optimal Control Theory, Prentice-Hall, Englewood Cliffs, NJ, 1970.

[21] F. J. Ellert and C. W. Merriam, "Synthesis of Feedback Controls Using Optimization Theory -- An Example," IEEE Trans. Auto. Control, Vol. AC-8, No. 4, pp. 89-103, April 1963.

[22] L. Essen, The Measurement of Frequency and Time Interval, H.M.S.O., London, 1973.

[23] T. B. Smith, "Fault-Tolerant Clocking System," Proc. FTCS-11, pp. 262-264, 1981.

[24] A. W. Holt and J. M. Myers, "An Approach to the Analysis of Clock Networks," R-1289, C. S. Draper Laboratory Contract Report (Preliminary), June 1978.

[25] D. Davies and J. F. Wakerly, "Synchronization and Matching in Redundant Systems," IEEE Trans. Comput., Vol. C-27, No. 6, pp. 531-539, June 1978.

[26] J. Goldberg, et al., "Development and Analysis of the Software Implemented Fault-Tolerance (SIFT) Computer," NASA CR-172146, June 1983.

[27] S. L. Hakimi and K. Nakajima, "Adaptive Diagnosis: A New Theory of t-Fault-Diagnosable Systems," Twentieth Allerton Conf. Comm., Control,

Comput., pp. 231-240, October 1982.

[28] D. Palumbo and R. W. Butler, "SIFT -- A Preliminary Evaluation," Proc. Fifth Digital Avionics Symp., August 1983.

[29] M. Pease, et al., "Reaching Agreement in the Presence of Faults," J. ACM, Vol. 27, No. 2, pp. 228-234, April 1980.

[30] L. Lamport, et al., "The Byzantine Generals Problem," ACM Trans. Prog. Lang. and Syst., Vol. 4, No. 3, pp. 382-401, July 1982.

[31] D. Dolev, "The Byzantine Generals Strike Again," J. Algorithms, Vol. 3, No. 1, pp. 14-30, 1982.

[32] D. P. Siewiorek and R. S. Swarz, The Theory and Practice of Reliable System Design, Digital Press, Bedford, MA, 1982.

[33] J. Losq, "A Highly Efficient Redundancy Scheme: Self-Purging Redundancy," IEEE Trans. Comput., Vol. C-25, No. 6, pp. 569-578, June 1976.

[34] F. P. Mathur and A. Avizienis, "Reliability Analysis and Architecture of a Hybrid-Redundant Digital System," Spring Joint Computer Conf., AFIPS Conf. Proc., Vol. 36, pp. 373-383, 1970.

[35] P. J. B. King and I. Mitrani, "The Effect of Breakdown on the Performance of Multiprocessor Systems," Performance '81, North-Holland, Amsterdam, 1981.

[36] N. A. Lynch, M. J. Fischer, and R. J. Fowler, "A Simple and Efficient Byzantine Generals Algorithm," Proc. Reliab. Dist. Soft. and Database Syst., pp. 46-52, 1982.

[37] D. Gross and C. M. Harris, Fundamentals of Queueing Theory, John Wiley, New York, 1974.

[38] K. S. Trivedi and J. A. Bannister, "Task Allocation in Fault-Tolerant Distributed Systems," Acta Informatica, Vol. 20, Fasc. 3, pp. 261-281, 1983.

[39] J. B. Mazzola and A. W. Neebe, "A Heuristic Procedure for Allocating Tasks in Fault-Tolerant Distributed Computer Systems," Naval Research Logistics Quarterly, Vol. 30, pp. 493-504, July 1983.

[40] P. H. Enslow, "What is a Distributed Data Processing System?" Computer, Vol. 11, No. 1, pp. 13-21, January 1978.

[41] R. M. Geist and K. S. Trivedi, "The Integration of User Perception in the Heterogeneous M/G/2 Queue," in A. K. Agrawala and S. K. Tripathi, eds., Performance '83, North-Holland, Amsterdam, 1983.

[42] C. M. Krishna, Y.-H. Lee, and K. G. Shin, "On Optimization Criteria for Checkpoint Placement," Communications of the ACM, to appear.
