AL-TDR-64-24

FOREWORD

This report describes results of a study, performed under U.S. Air Force Contract No. AF 33(657)-7391, conducted for the Air Force Avionics Laboratory, Research and Technology Division, Wright-Patterson Air Force Base, Ohio. Mr. D. J. Boaz was the project engineer. The contractor participants were drawn from the Information Systems Laboratory, Department of Electrical Engineering, The University of Michigan. Those engaged in the research were: Dr. Harvey L. Garner, project director, Rodolfo Gonzalez, Sandra Palais, Thomas F. Piatkowski, and Jon S. Squire, with frequent and useful collaboration from the rest of the members of the Information Systems Laboratory. The work reported was performed during the period October, 1961, to December, 1963. This is the final report on the contract.


ABSTRACT

The main objective of this research has been to investigate the problems and possibilities generated by the idea of a computer built as an iterative array of elementary self-contained processors. Since the problem was a new one, a large number of areas of study were available, and the following were treated in detail:

1. Programming aspects:
   a. Decomposition of a program in order to obtain maximum concurrency.
   b. A translation algorithm to facilitate programming.
   c. Algorithms for path-building.
2. Proposed organizations:
   a. A multi-layer computer.
   b. An iterative circuit computer with n-dimensional geometry.
3. Statistical evaluation of accessibility for different geometrical structures.
4. Reliability problems and new possibilities.
5. A theoretical model, linking the iterative circuit computer structure to that of an n-head automaton.

Due to the variety of topics covered, each section has been provided with its own introduction, leading the reader to the problem and providing the necessary relationship with previous work. In this way, each section provides a comprehensive treatment of the title subject in a self-contained form, allowing for independent reading.

Publication of this technical documentary report does not constitute U.S. Air Force approval of the report's findings or conclusions. It is published only for the exchange and stimulation of ideas.

TABLE OF CONTENTS

Page

1. INTRODUCTION 1
2. PROGRAMMING ASPECTS OF MULTI-PROCESSOR COMPUTERS 6
2.1 Path Building Procedures 6
2.1.1 Introduction 6
2.1.2 General Outline of the Path Building Problem 8
2.1.3 Definitions and Nomenclature 13
2.1.3.1 Definitions: Neighborhood Relations 13
2.1.3.2 Nomenclature 14
2.1.3.3 Labels 14
2.1.4 Detection of Barriers and Isolated Regions 19
2.1.4.1 Programming the Detection of Barriers 22
2.1.5 Final Considerations for Path Building 25
2.1.5.1 Restriction on the Class of Paths Admissible 25
2.1.5.2 Use of Redundant Paths 28
2.1.5.3 Extra Requirements Introduced by the Redundant Mode of Operation 31
2.1.5.4 Assignment of Priorities 31
2.1.5.5 Path-Building Procedure 34
2.1.6 Flow Diagrams for the Implementation of the Barrier Detection and Path-Tracing Procedures 40
2.1.6.1 Adapting the Procedures for Computer Solutions 40
2.1.6.2 Flow Diagram for the Detection of Barriers 40
2.1.6.3 Flow Diagram for the Path-Tracing Algorithm 42
2.2 Translation Algorithms 44
2.2.1 Maximal Decomposition of Algorithms 44
2.2.1.1 Step 1. Choice of Primitives 45
2.2.1.2 Step 2. Branch Assignment 46
2.2.1.3 Step 3. Formation of Tree 47
2.2.1.4 Step 4. Determine Computation Time and Degree of Concurrency 48
2.2.2 Analysis of Sixteen Numerical Computation Algorithms for Concurrency of Arithmetic and Control 53
2.2.2.1 Sum or Product of N Numbers 53
2.2.2.2 Evaluate Nth Degree Polynomial 54
2.2.2.3 Multiply Vector by Matrix 55
2.2.2.4 Multiply Two Matrices 55
2.2.2.5 Matrix Inversion 55
2.2.2.6 Solving System of Linear Equations 56

TABLE OF CONTENTS (Continued)

Page

2.2.2.7 Solving Ordinary Differential Equations 56
2.2.2.8 Least Square Fit 56
2.2.2.9 Compute the Nth Prime 57
2.2.2.10 Unordered Search 57
2.2.2.11 Sort N Elements 57
2.2.2.12 Coefficients of Fourier Series 58
2.2.2.13 Evaluating Fourier Series 58
2.2.2.14 Neutron Diffusion Equation 58
2.2.2.15 Neutron Transport Equation 58
2.2.2.16 Eigenvalues of a Matrix 59
2.2.2.17 Summary 59
2.2.3 Machine Implementation Via a Translation Algorithm 60
2.2.3.1 Phase 1-Recognition of Concurrency 63
2.2.3.2 Expression Scan 65
2.2.3.3 Phase 2 of Translation Algorithm 70
2.2.3.4 Phase 3 of Translation Algorithm 71
2.2.3.5 Machine Instructions 72
2.2.4 An Augmented Language to Permit More Concurrence in Processing 84
2.2.4.1 Limitations and Potential Refinements 96
2.2.5 Example of Application of the Translation Algorithm 98
3. MACHINE ORGANIZATION 102
3.1 A Multi-Layer Iterative Circuit Computer 102
3.1.1 Introduction 102
3.1.2 Description of the Computer 107
3.1.3 Description of the Planes 109
3.1.4 Description of the Modules 110
3.1.5 Word Format 112
3.1.6 Path-Building Procedure 113
3.1.7 List of Instructions 118
3.1.8 Operation of the Computer 120
3.1.9 Geometrical Operations 131
3.1.10 Conclusions 132
3.2 Physical and Logical Design of a Highly Parallel Computer 135
3.2.1 Introduction 135
3.2.2 Objectives 135
3.2.3 Organization 137
3.2.4 Instruction Code 144
3.2.4.1 Instruction Format 145
3.2.4.2 Execution Bits 146
3.2.4.3 Interprogram Protection 148

TABLE OF CONTENTS (Continued)

Page

3.2.4.4 Indirect Addressing 148
3.2.4.5 Arithmetic Operations 149
3.2.4.6 Byte Modification 150
3.2.4.7 Transfer Instructions 151
3.2.4.8 Inhibit Modification 152
3.2.4.9 Input-Output Instruction 152
3.2.4.10 Operation Codes 153
3.2.5 Physical and Logical Design 156
3.2.6 Conclusion 172
3.3 Hardware Requirements for Machine as Described 175
3.4 Matrix Inversion Program for an I.C.C. 176
4. DETERMINATION OF ACCESSIBILITY 180
4.1 Description of an Iterative Circuit Computer 180
4.2 Evaluation Techniques 182
4.3 The Matrix Inversion Problem 183
4.4 Analysis of the Path Building Problem 189
4.5 Conclusions 203
5. RELIABILITY IN ITERATIVE MACHINES 205
5.1 Redundancy at the Module Level 205
5.2 Checking Program Approach 211
6. A MATHEMATICAL MODEL OF AN I.C.C. 214
6.1 Introduction 214
6.2 n-Head Finite State Machines-A Description 217
6.2.1 Alphabets 217
6.2.2 Tapes 217
6.2.3 Machines 220
6.2.4 State Graphs 224
6.3 The Language 231
6.3.1 Operations on Alphabets 231
6.3.2 Operations on Partial Tapes 232
6.3.3 Operations on m-Tuples of Partial Tapes 237
6.3.4 Operations on Sets of m-Tuples of Partial Tapes 238
6.3.5 Regular Expressions 240
6.4 Equivalence Theorems 242
6.4.1 1-Way 1-Dim 1-Head Machines 242
6.4.2 1-Way 1-Dim n-Head n-Tape Machines 247
6.4.3 2-Way 1-Dim 1-Head Machines 251

TABLE OF CONTENTS (Concluded)

Page

6.4.4 2-Way D-Dim 1-Head Machines 256
6.4.5 2-Way D-Dim n-Head n-Tape Machines 258
6.4.6 2-Way D-Dim n-Head m-Tape Machines 261
6.5 Assorted Algorithms and Theorems Dealing with the Decision Problems and Speed of Operation of n-Head Machines 266
6.5.1 Algorithm for Deciding 1-Wayness of Machines 266
6.5.2 Algorithm for Deciding the Realizability of Regular Expressions 268
6.5.3 1-Way 2-Head Equivalents of 2-Way 1-Dim 1-Head Machines 269
6.5.4 The "Particular Input" Decision Problem 277
6.5.5 The Emptiness Decision Problem 279
6.5.6 Boolean Properties of n-Head Machines 283
6.5.7 Speed Theorems 286
6.6 Topics for Further Study 294
6.6.1 Reduction Problems 294
6.6.1.1 Head Reduction 294
6.6.1.2 State Reduction 296
6.6.1.3 Speed Reduction 297
6.6.2 Representability Problems 298
6.7 Summary 300
6.8 Some Practical Problems with n-Head Machines 305
6.8.1 The Finite Nature of Real-Life Problems 305
6.8.2 Non-Implications of Section 6.4 307
6.8.3 The Advantage of End-Marks on Tapes 307
6.8.4 "Time" as a Tape Dimension 308
6.8.5 A Consequence of Touched Heads 309
6.8.6 Applications of n-Head Machines 309
7. REFERENCES 311

LIST OF FIGURES

Figure Page

1. Internal switching network. 9
2. Internal registers. 9
3. Terminal and connecting modules. 9
4. States of connecting modules. 10
5. Modules belonging to several paths. 10
6. Barriers formed by single paths. 12
7. Multiple path barriers. 12
8. Adjoining and contiguous modules. 15
9. Neighboring modules. 15
10. Normal and non-regressive paths. 16
11. Regressive paths. 16
12. Labeling of modules. 17
13. Generating method for vertex coordinates. 17
14. Labeling of vertices. 18
15. Pre-existent paths. 21
16. Duals of the pre-existent paths. 21
17. Continuous barrier. 23
18. Tracing path A-B. 23
19. Dual and barrier. 26
20. Dual and barrier proper. 26
21. Series connection of n modules. 30
22. m parallel paths of n modules each. 30

LIST OF FIGURES (Continued)

Figure Page

23. The two cases of priorities. 33
24. Two cases starting with the low priority direction. 33
25. Maze problem. 36
26. First solution with priority V > H. 36
27. Change in path route induced by the inclusion of G'. 38
28. Change in path route induced by the inclusion of I'. 39
29. Flow diagram for the detection of barriers. 41
30. Flow diagram for path tracing. 43
31-32. Example of phase 1 of translation. 74-75
33. Status of list before each quintuple is generated. 76
34-35. Example of phase 2 of translation. 77-78
36. Merge-link table at end of phase 2. 79
37-38. Final program after phase 3 of translation. 80-81
39. General block diagram of translator. 82
40. Expression scan flow diagram. 83
41. Inter-layer and wrap-around connections. 108
42. Three-plane structure and common buses. 111
43. Information-line switching in a module. 115
44. Column and row information lines. 115
45. Path connection for the instruction (33;22)(store)(55;66). 116
46. Overlapping of phases. 121
47. Execution phase of instruction 1. 125

LIST OF FIGURES (Continued)

Figure Page

48. Execution phase of instruction 2. 126
49. Execution phase of instruction 3. 127
50. Execution phase of instruction 4. 128
51. Operation complete pulse. 130
52. Transfer of instruction 6. 130
53. Operation 6 executed. 130
54. Delayed operation complete pulse. 130
55. Extend operation. 133
56. Reproduce operation. 133
57. Displace operation. 133
58. Detail of path segment wiring. 142
59. Instruction format. 146
60. Location referred to by indirect address. 149
61. Format for full-word number. 150
62. Top view of the I.C.C. 161
63. Side view of the I.C.C. 161
64. Function and flow block diagram of module. 163
65. Information flow during execution. 165
66. Function and flow diagram for path-connecting circuitry. 168
67. Two-dimensional priority selector. 169
68. Progression of a path connection. 171
69. Simple I.C.C. program. 181

LIST OF FIGURES (Continued)

Figure Page

70-71. I.C.C. program to invert a matrix. 185-186
72. Master control timing program for a 5x5 matrix inversion. 188
73. Distribution of module distance for the 2-dimensional I.C.C. 195
74. Distribution of module distance for N-cube I.C.C. 196
75. Tape t1. 218
76. Tape t2. 219
77. Subtape t'. 220
78. State graph of M2.1. 225
79. Simplified state graph of M2.1. 226
80. Machine M2.2. 230
81. Machine M4.1. 243
82. Machine M4.2. 245
83. Machine M4.2. 246
84. Machine M4.3. 248
85. Machine M4.4. 250
86. Machine M4.4. 250
87. Machine M4.5. 253
88. Machine M4.6. 255
89. Machine M4.6. 255
90. Machine M4.7. 256
91. Machine M4.8. 257
92. Machine M4.8. 257

LIST OF FIGURES (Concluded)

Figure Page

93. Machine M4.9. 259
94. Machine M4.9. 259
95. Machine M4.10. 260
96. Machine M4.10. 261
97. Machine M4.11. 263
98. Machine M4.12. 265
99. Machine M5.1. 27
100. Machine M5.1(1). 276
101. Machine M. 285
102. Form of tapes in Ak. 289
103. Machine M6.1. 297
104. Machine M6.2. 297

LIST OF TABLES

Table Page

1. Summary of Operations 61
2. Comparison of Three Computer Organizations for the Matrix Inversion Problem 184
3. Analytic and Simulation Results 202
4. Methods of Applying Redundancy 209
5. Detail of Lines 5 and 6 of Table 4 210
6. The Existence of Effective Procedures for Decision Problems 302

1. INTRODUCTION

It appears that the ever present need for faster and more efficient computers will continue. In the past, computer advances have been obtained by improvements in physical devices which have allowed faster operation and greater logical complexity. While component advances will continue, there is no projected development, with the possible exception of coherent optics, which will obtain a speed increase of many orders of magnitude within the framework of conventional machine organization.

The subject of parallel computation is not new. Examples of parallelism and concurrency are found in early computer designs. The ENIAC obtained increases in basic speed over electro-mechanical computers by electronic circuitry and also employed a parallel arithmetic structure consisting of 20 accumulators. Difficulties in programming this machine led to the stored program concept in conjunction with a single high-speed arithmetic unit, as seen in the Princeton class of machines, or the EDVAC and its descendants. This organization has been universally used until recently, when again computational requirements are beginning to exceed the capability of present machines. The most recent generation of computers, LARC, STRETCH, GAMMA 60, Bendix G20, and the RW 400, employ either concurrency or parallelism to a limited extent. At least one computer being constructed, but not yet delivered, has a processing unit capable of simultaneously executing many arithmetic operations. It would appear that the question in the area of computer organization is not whether

Manuscript released by authors April, 1964, for publication as an ASD Technical Documentary Report.

parallel machine organization is needed, but rather what the parallel organization should be to obtain effective and economical computation.

The Iterative Circuit Computer (I.C.C.) concept was developed abstractly by John Holland of The University of Michigan. The mathematical existence of this class of machines was required for Holland's studies in the theory of adaptive systems.37 The original abstract specification is sufficiently broad so that all previous machines, both theoretical and real, can be represented in the I.C.C. framework. In the theory of adaptive systems, the I.C.C. provides a homogeneous, isotropic medium for imbedding programs which simulate some physical systems. The interaction and modification of programs in the I.C.C. medium correspond to the interaction and changes in the physical system.

In our current research we are evaluating the I.C.C. concept with respect to the organization, and the analysis of organization, of practical computing machines. In this respect, the I.C.C. appears to represent a reasonable abstract structure for the study of problems related to the organization of parallel computers. We also seek to determine whether some form of the I.C.C. concept provides a basis for the design of a practical parallel computing machine.

In this paper, some of the preliminary results obtained from studies considering the I.C.C. as a real computer are presented. These studies have been undertaken to determine the effectiveness and the nature of the limitations of the I.C.C. concept. It should be emphasized that our research is being conducted to learn more about the I.C.C. concept. At this time, we are not in a position to either encourage or discourage the concept of a practical I.C.C.

Ultimately, the I.C.C. as a practical computer concept must rise or fall on the basis of application. The apparent problems of the I.C.C. concept can be summarized by four pertinent questions:

(1) What are the programming difficulties?
(2) What are the path building limitations?
(3) Can large numbers of components be used reliably?
(4) Can the components in the I.C.C. organization be used efficiently, and how many components are required for an effective computer for a given problem or class of problems?

Aspects of the concept which seem extremely favorable are the possibility of economical manufacturing of iterated structures and the degree of local control inherent in the Holland I.C.C. structure. It is the degree of local control which distinguishes the Holland I.C.C.8 from other I.C.C.'s such as the Ungar machine,30 McCormick's machine,38 or the SOLOMON.16 In the Holland I.C.C., control must be established by the program. Thus, the degree of local or global control is variable and controlled by the user. There is no doubt that parallel computation requires some local control. The pertinent questions which must be answered in detail are whether effective use can be made of the high degree of local control which is obtainable, and whether the price which must be paid to obtain variability in the degree of control is reasonable. A high degree of local control was required for the abstract studies of adaptive systems. It is expected that the development of a highly parallel computer, and of techniques for applying such machines to problems not effectively solved on existing machines, will necessitate variable local control. The need for local control in conventional problems is being studied. The question of the need for local control is critical to the evaluation of the Holland concept.

Path building and iterative structure are also fundamental to the Holland I.C.C. concept. Detailed considerations such as the number and types of instructions and the path building mode are pertinent to a detailed design study; they are treated in Section 2.1 and also when two proposed organizations are presented in Sections 3.1 and 3.2. The actual number of instructions that can be executed simultaneously is limited severely by the geometry of the machine. Maximum accessibility is gained when the modules are interconnected as if they were located at the vertices of an n-dimensional cube. Comparative studies for typical problems are presented in Section 4.

The large amount of hardware implicit in any iterative structure brings an old problem into a completely new environment: the large number of elements in itself seems to increase tremendously the probability of failure, while at the same time, the very redundant structure allows for several new schemes, totally impractical in machines with a standard organization. While redundancy can still be introduced internally at the module level, it is much more significant that the availability of "extra" modules not busy with the actual calculations permits the introduction of a checking program, which "circulates" over the array, testing each module in turn. The problem is thus changed radically; it is now only a matter of guaranteeing the simultaneous availability, in a working state, of a number of modules for the short interval between two scans by the test program. Section 5 treats this problem and presents calculations on the expected scan time between failures for a practical-size machine using microelectronic elements available at this date.
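The scan-interval argument above can be made concrete with a standard exponential failure model. This is an illustrative sketch, not the report's own Section 5 calculation: the module count, per-module failure rate, and scan interval below are assumed numbers, and independent exponential failures are an assumption of the model.

```python
import math

def survival_probability(n_modules, failure_rate_per_hour, scan_interval_hours):
    """Probability that every one of n_modules survives one scan interval,
    assuming independent exponential failures (illustrative model only)."""
    return math.exp(-n_modules * failure_rate_per_hour * scan_interval_hours)

# Illustrative numbers, not the report's: 10,000 modules, one failure per
# 10^6 hours per module, and a 1-second interval between test-program scans.
p = survival_probability(10_000, 1e-6, 1.0 / 3600.0)
```

The point of the model is the one the text makes qualitatively: shortening the scan interval (or shrinking the set of modules that must all be up at once) drives the survival probability toward 1, even though the aggregate failure rate of the whole array is large.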

In a new field like this, where there exist no practical guidelines or previous experience, it is perfectly admissible to make assumptions about the number of components, mode of interconnection, etc. But if we expect to solve the problems of how to program such a machine effectively, what types of problems are more suitable, what advantages can be expected, etc., a formal mathematical model becomes a necessity. One such model is presented in Section 6, along with comments on the practical consequences and limitations of the theory.

The material presented in this report has appeared in, or has been the basis for, the following publications:

"Iterative Circuit Computers," by H. L. Garner and J. S. Squire, Proc. of the Workshop on Computer Organization, Spartan Books (1963).

"A Multi-Layer Circuit Computer," by R. Gonzalez. Presented at the 1963 ACM National Conference. Also to appear in the IEEE Trans. on Elect. Computers, Special Issue on Machine Organization, December, 1963.

"Programming and Design Considerations of a Highly Parallel Computer," by J. S. Squire and S. M. Palais. Presented at the Spring Joint Computer Conference, May, 1963.

"A Translation Algorithm for a Multi-Processor Computer," by J. S. Squire. Presented at the ACM National Conference, August, 1963.

"n-Head Finite State Machines," by T. F. Piatkowski, Ph.D. Thesis, The University of Michigan, December, 1963.

2. PROGRAMMING ASPECTS OF MULTI-PROCESSOR COMPUTERS

2.1 PATH BUILDING PROCEDURES

2.1.1 Introduction

The need for increased computational capabilities, brought into evidence when dealing with problems in the fields of pattern recognition, game playing, or simulation of physical models, has suggested the use of highly parallel iterative circuit computers.8 Furthermore, the recent advances in the technology of micro-miniature, integrated, and "grown" electronic devices show promise that in the near future the availability of the low cost modules essential for the practical realization of this type of machine will make even more pressing the need for detailed studies of this new concept.

A study of the pertinent literature reveals that the increase in versatility is accompanied by the introduction of several new problems inherent in this novel machine organization. Some of these problems are:

(a) Data allocation difficulties due to the "floating" address, as used in Holland's paper.8
(b) Programming difficulties due to the unlimited interaction possible between concurrently running programs.
(c) Necessity of some allocation protection method due to the presence of several programs running simultaneously.
(d) Problems in the flow of control due either to the lack of a centralized organ of command or to its impractically enormous complexity.
(e) Problems in programming and in interconnections generated by the lack of continuity of the geometrical properties of the space over which the machine is spread.

A number of these problems appear only in some of the proposed machine organizations, but others are germane to the very essential features of the

general class of I.C.C.'s. Therefore, these latter problems deserve especial consideration and detailed study. The flow of information and control, as determined by the procedures used to connect operators and operands, is one of the factors that lies at the crux of the successful operation of I.C.C.'s.

One approach to this problem is given in Holland's paper.8 In Holland's theory, access to new operands is gained by adding or deleting modules from the termination of a fixed path. This method is essentially a counting procedure along a direction specified by the current instruction, and requires a small amount of hardware to implement it. This advantage is offset by the long time needed for path building, since the procedure is a sequential one, and by the fact that the time needed for completion of the path-building phase is a function of the relative position of the modules to be connected. At the other extreme, one could suggest a method using detection of coincidence of addresses, resulting in a very fast but expensive procedure. As in many physical situations, speed is traded for complexity, since these are the only two factors that can be rearranged in this case. A different technique would be to generate a fresh path from each successively active module, and to leave the already used paths connected for possible future use. The potential gain in speed could, however, be offset by the difficulty in finding a new connection through all the pre-existent paths.

An algorithm for path tracing is presented for the case of paths of some restricted shapes. In many cases, a situation will be reached in which paths belonging to one or more programs will form an obstacle such that some region of the network becomes isolated. In this report, attention is given to the problem of path interference as related to the question of generation of barriers. An algorithm is also given for the detection of barriers formed by paths of any shape.
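The sequential character of Holland's scheme can be sketched abstractly. The representation below (a path as a list of module coordinates, with extend/retract operations on its termination) is our own illustration, not the report's or Holland's specification; the class and method names are assumed.

```python
# Sketch of Holland-style sequential path building: access to a new operand
# is gained by adding or deleting modules at the termination of a path, one
# module per step, in a direction named by the current instruction.
DIRECTIONS = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}

class Path:
    def __init__(self, origin):
        self.modules = [origin]          # the terminal module is modules[-1]

    def extend(self, direction):
        dr, dc = DIRECTIONS[direction]
        r, c = self.modules[-1]
        self.modules.append((r + dr, c + dc))   # one module per step

    def retract(self):
        if len(self.modules) > 1:
            self.modules.pop()

p = Path((0, 0))
for _ in range(3):
    p.extend("E")
```

Because each `extend` adds exactly one module, the time to reach an operand grows with its distance from the current termination, which is precisely the drawback the text attributes to the counting procedure.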

2.1.2 General Outline of the Path-Building Problem

A network of modules arranged in an n x n grid is given. The modules are all alike and contain a switching network (Fig. 1) and four registers that can be connected to any of the four outputs. Figure 2 indicates these for one of the registers. The problem consists of successively connecting pairs of modules, leaving intact a fixed number of the previously established connections for possible future use. This is referred to as path building. The end modules, i.e., the pair connected, are the originators and receptors of information and are called the terminal modules. The intermediate modules in the path serve only as connections, and these are called the connecting modules. See Fig. 3. The connecting modules can be in any of the nine states shown in Fig. 4 (a through i). A terminal module of one path can be at the same time a connecting module for another path, as shown in Fig. 5. Here the shaded squares indicate different states of the connecting and terminal modules. A pre-existent path may be crossed by a new one, and in general, all the possible connections which the internal structure of the modules will allow can be made as long as the independence of the paths is maintained. Therefore, any connecting module can be part of as many as two different paths, and any terminal module can be associated with as many as four different paths, as the connection diagrams of Fig. 5 show. In this and subsequent diagrams, the modules are shown contiguous to each other, but this is simply an illustrational convenience not implying any other connections between the individual modules.

The paths are composed of segments, these being defined as the connecting lines between adjacent modules. However, since the internal configuration of the modules is not shown on the diagrams, the connecting segments are considered to extend from center to center of the modules.

Fig. 1. Internal switching network. Fig. 2. Internal registers.

Fig. 3. Terminal and connecting modules.

Fig. 4. States of connecting modules.

Fig. 5. Modules belonging to several paths.

As the number of pre-existing paths increases, it becomes more and more difficult to find a path connecting two given modules. The difficulty is caused by the accumulation of "obstacles" of various types. In some cases, these obstacles (previous paths) combine in such a way that a region of the network becomes isolated from the rest of the modules. In other cases, a single path of special shape suffices to isolate a region. Whenever this happens, we call the isolating line a barrier. Thus, a barrier is defined as a path or combination of path segments such that, because of its particular shape and neighborhood relations, it divides the whole network into two disconnected regions. In other words, it becomes impossible to connect two modules lying on opposite sides of the barrier. However, it is not easy to show the barrier as a physical entity, because it entails geometrical shape as well as a positional relationship of the path or paths. In general, if a path or a combination of path segments divides the network into two disconnected regions, one can visualize the barrier as the set of modules through which the segments constituting the path run.

A barrier may be generated by:

(a) A single path: see Fig. 6 a, b, c, d.
(b) Several paths: any number of paths may contribute segments to form a barrier, as in Fig. 7. Paths a and b form a barrier between B and C; paths c, d, and e form a barrier between A and C.

Since a barrier is a function of the shape of the paths and of some neighborhood relation, it becomes imperative to find a mapping such that when both conditions defining a barrier are met, some easily detected geometrical property is simultaneously satisfied. This leads to the definition of a "dual" of a path, as explained in Section 2.1.4.
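The "disconnected regions" criterion can be tested directly by a flood fill over unobstructed modules. The sketch below is ours, and it coarsens the problem: it blocks whole modules rather than individual path segments, whereas the report's dual construction is finer-grained. Function and variable names are assumptions.

```python
from collections import deque

def same_region(n, blocked, a, b):
    """True if modules a and b of an n x n grid are connected through
    modules not in `blocked` (a crude stand-in for barrier detection)."""
    if a in blocked or b in blocked:
        return False
    seen, frontier = {a}, deque([a])
    while frontier:
        r, c = frontier.popleft()
        if (r, c) == b:
            return True
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            nxt = (nr, nc)
            if 0 <= nr < n and 0 <= nc < n and nxt not in seen and nxt not in blocked:
                seen.add(nxt)
                frontier.append(nxt)
    return False

# A vertical wall of blocked modules isolates the left columns from the right.
wall = {(r, 2) for r in range(5)}
```

Note that this search is itself a sequential maze-style procedure; the text's motivation for restricting path shapes and working with duals is precisely to avoid paying this cost for every new path.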

Fig. 6. Barriers formed by single paths.

Fig. 7. Multiple path barriers.

Once this easy identification of barriers is obtained, another procedure is needed to ascertain whether two given modules lie in the same region or in disconnected regions with respect to every barrier detected up to then. If the two modules belong to disconnected regions there exists no path connecting them, and one has to resort to erasing some of the existent paths. This is done even if the allowable fixed number of paths has not yet been reached. It must be remembered that here we are concerned with a path-finding procedure that must be repeated in its entirety for every path that is to be created. Consequently, some restriction on the types of paths to be considered is almost mandatory in order to avoid resorting to maze-solving techniques. Maze-solving techniques are essentially sequential algorithms and, as such, are much too slow for this application.

2.1.3 Definitions and Nomenclature

The purpose of this section is to present definitions and to establish uniquely the meaning of labels and names used in this report. The need for this rigorous defining of commonly used words is evident when, for example, several neighborhood relations with subtle differences have to be distinguished. Although the language is rich enough to permit this fine gradation of shades in meaning, the everyday use of these words has assigned to them almost synonymous implications. In the next sections, the words defined below will be used exclusively within the meaning here indicated and, in many cases, a word of attention will be included to prevent misinterpretations.

2.1.3.1 Definitions: Neighborhood Relations

Contiguous: Meeting or touching on one side.

Adjoining: Meeting or touching at least at some point. This concept includes that of contiguity.

Neighbor: Satisfying some empirical rule established as the "neighborhood relation," not necessarily implying contiguity.

As an example, the shaded modules of Fig. 8a and b are adjoining, but only the ones in Fig. 8b are simultaneously contiguous. This apparently excessive detailing is useful in cases like that of Fig. 8c: if we refer to the modules adjoining A, we refer to B, C, D, and E. But if we refer to the modules contiguous to A, only B and C qualify as such. The neighborhood relation can assume any form and is not restricted to the common meaning of "immediate" neighbors. For example, the shaded modules of Fig. 9a and b represent the neighbors of module A under the conditions of the following rules:

For 9a: Those modules having one side in common with A.

For 9b: Those modules within a "Manhattan" distance of 3 from A.

2.1.3.2 Nomenclature

Non-regressive path: Any path traced in such a way that no two consecutive turns are in the same direction. Figure 10c and 10d.

Regressive path: Any path with two or more consecutive turns in the same direction. Figure 11.

Normal path: A distinguished set of the class of non-regressive paths characterized by having only one turn. Figure 10d.

Barrier: A collection of paths dividing the network of modules into two disconnected regions.

2.1.3.3 Labels

Labels of modules: Modules are labeled with a pair of coordinates, as indicated in Fig. 12. The first term of the pair indicates the row, and the second the column to which the module belongs.

Labels of vertices: The labels of vertices are derived from the ones of the corresponding modules according to the rule indicated in Fig. 13. The resulting coordinates are shown in Fig. 14.
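The two concrete rules just stated, the Fig. 9b neighborhood and the Fig. 13 vertex labeling, can be written out directly. The function names and the grid bound are our own illustrative choices; the vertex rule assumes, as Fig. 13 suggests, that module (i, j) takes (i, j) as the label of its lower-right corner vertex.

```python
def neighbors_manhattan(i, j, n, radius=3):
    """Neighbors of module (i, j) in an n x n grid under the Fig. 9b rule:
    all other modules within Manhattan distance `radius` of (i, j)."""
    return [(r, c) for r in range(n) for c in range(n)
            if (r, c) != (i, j) and abs(r - i) + abs(c - j) <= radius]

def vertex_labels(i, j):
    """Corner vertices of module (i, j) under the Fig. 13 rule; the
    module's own label names its lower-right vertex."""
    return [(i - 1, j - 1), (i - 1, j), (i, j - 1), (i, j)]
```

Under this labeling, two contiguous modules share exactly two vertex labels (the endpoints of their common side), which is what makes the vertex coordinates convenient for describing dual segments later on.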

Fig. 8. Adjoining and contiguous modules.

Fig. 9. Neighboring modules.

Fig. 10. Normal and non-regressive paths.

Fig. 11. Regressive paths.
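The path classes of Figs. 10 and 11 can be distinguished mechanically: record the sense (left or right) of each turn along the path and reject two consecutive turns in the same sense. The representation below (a rectilinear path as a list of module coordinates) and the treatment of a straight run as trivially normal are our own assumptions.

```python
def turns(path):
    """Sense of each turn along a rectilinear path of (row, col) points:
    +1 for one sense, -1 for the other, via the 2-D cross product."""
    t = []
    for (r0, c0), (r1, c1), (r2, c2) in zip(path, path[1:], path[2:]):
        cross = (r1 - r0) * (c2 - c1) - (c1 - c0) * (r2 - r1)
        if cross != 0:
            t.append(1 if cross > 0 else -1)
    return t

def non_regressive(path):
    """True if no two consecutive turns are in the same direction."""
    t = turns(path)
    return all(a != b for a, b in zip(t, t[1:]))

def normal(path):
    """Normal path: non-regressive with at most one turn (we count a
    straight run as trivially normal; Fig. 10d shows the one-turn case)."""
    return len(turns(path)) <= 1
```

A staircase path alternates turn senses and so is non-regressive; a U-shaped path turns the same way twice in a row and is regressive, matching the Fig. 10/11 distinction.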

Fig. 12. Labeling of modules:

    (01;01) (01;02) ... (01;n)
    (02;01) (02;02) ... (02;n)
      ...
    (m;01)  (m;02)  ... (m;n)

Fig. 13. Generating method for vertex coordinates: the vertex shared by modules (i-1;j-1), (i-1;j), (i;j-1), and (i;j) receives the label (i;j).

Fig. 14. Labeling of vertices:

    (00;00) (00;01) (00;02) ... (00;n)
    (01;00) (01;01) (01;02) ... (01;n)
      ...
    (m;00)  (m;01)  (m;02)  ... (m;n)

2.1.4 Detection of Barriers and Isolated Regions

As explained in Section 2.1.2, a physical barrier is difficult to define because it entails the shape as well as a neighborhood relation between the paths contributing to the formation of the barrier. This difficulty is also present in the identification of barriers by means of an algorithm, because several tests have to be applied in sequence to ascertain whether a particular combination of path shapes and geometrical disposition constitutes a barrier. Therefore, it seems logical to resort to some kind of mapping technique, such that when applied to the original configuration of paths it produces an image in which the desired property is easily identified. At this point, we shall introduce some necessary definitions.

Let us define the dual segment as the common side of a pair of contiguous modules connected by a segment of a path. Note that there is a one-to-one correspondence between dual segments and path segments associated with a particular path. Let the set of dual segments corresponding to a path be known as the dual of that path.

The path lists contain the labels of the modules through which the path is traced, and when the path is extended the list is updated by adding the label of the new module or modules. Simultaneously with the building of the path and the path list, the labels of the dual segments corresponding to the paths are ordered in lists. If the dual remains a connected line, a single list is maintained; but when a disconnected dual segment is generated, it starts a new list. It can happen that while generating a new path, all the corresponding dual segments generate independent new lists. Further, if a dual segment S1 belonging to a dual list D1 is connected to some dual segment S2 belonging to a dual list D2 originally created by some other path, then the contents of the dual list containing S1 are added to the list containing S2. At any stage, a dual list contains only a connected sequence of dual segments, perhaps contributed by several paths. Therefore, there is no one-to-one correspondence between the path lists and the dual lists. The paths simply generate the dual segments, and these, depending on their resulting connections, can add to the dual of their own path, generate other dual lists, or join some pre-existent dual lists. Consequently, each dual list contains the coordinates of dual segments such that consecutive segments have one coordinate in common, identifying them as belonging to a connected tree.

Lemma 1: A list of dual segments containing either a closed sequence of coordinates or the coordinates of two points belonging to the border of the network defines a line which divides the grid into two disconnected regions. This line is called a barrier.

Proof: Since each dual segment arises from a corresponding path segment connecting two contiguous modules, clearly no new path can connect them. Hence, no new path can cross the dual segment. Therefore, any continuous sequence of dual segments cannot be crossed anywhere by a new path. So if both ends belong to some border, or the sequence closes on itself, a disconnected region is generated. Hence, by definition, a barrier exists.

Corollary 1: Discontinuity of the set of dual segments arising from a single path indicates the possibility of crossing the path by a new path at every point of discontinuity.

Example: It is desired to determine if points A, B, and C can be connected two at a time, given that the four paths indicated in Fig. 15 are pre-existent. First, the duals of the paths are generated according to the definition of dual. In Fig. 16, the four duals corresponding to the paths are shown distinctly to emphasize their origin. However, we are really interested in

Fig. 15. Pre-existent paths.

Fig. 16. Duals of the pre-existent paths.

connected sequences of dual segments, independently of whether or not they belong to the same path. Therefore, at this stage, there should be nine dual lists, as can be verified from Fig. 16 by counting the number of connected dual lines. Of all these nine lines, only one satisfies the requirements of Lemma 1; this line is shown in Fig. 17. Because it is a continuous line formed by dual segments with its ends on the border of the network, it is defined as a barrier and thus divides the network into two disconnected regions. As a result, it can be stated that module C is isolated from A and B, but A and B can be connected. If now all the duals are shown again, as in Fig. 18, it is easy to see how to trace the connection A-B. This suggests the possibility of solving the problem of connecting two points through the existent paths by treating it as a maze problem in which the walls are represented by the duals instead of by the actual paths. While this is perfectly possible when only a few of these connections are needed, it is completely impractical in this case, where the procedure is to be repetitive and one has to create a new path as soon as a connection has been made and some information has been transmitted between the terminal modules. Thus, a much simpler and faster, if less general, procedure is needed.

2.1.4.1 Programming the Detection of Barriers

While the number of paths is increasing, the lists of paths and of dual segments are constantly being kept up to date, and thus they increase both in length and number. A new kind of list, denoted a "barrier list," is generated from the list of duals whenever either of the following cases occurs: (a) when one entry happens to have each of its two coordinates in common with one coordinate of two previous entries, identifying it as a link closing a loop; or (b) when the list contains two coordinates belonging to the borders of the network.
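The dual-list bookkeeping described above, where a newly generated dual segment may start a new list, extend an existing one, or merge two lists created by different paths, behaves like the classical disjoint-set (union-find) problem. The following sketch is our own illustration, not the report's program: a dual segment is stored as a pair of vertex labels, and two segments fall in the same dual list when they share a vertex.

```python
# Sketch (ours, not the report's program): maintaining dual lists with a
# disjoint-set structure keyed on vertex labels.

class DualLists:
    def __init__(self):
        self.parent = {}
        self.lists = {}          # representative vertex -> list of dual segments

    def _find(self, v):
        self.parent.setdefault(v, v)
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]   # path halving
            v = self.parent[v]
        return v

    def add_segment(self, u, v):
        """Add dual segment (u, v); merge the dual lists it connects."""
        ru, rv = self._find(u), self._find(v)
        seg = (u, v)
        if ru == rv:                       # segment closes a loop in one list
            self.lists[ru].append(seg)
            return
        self.parent[ru] = rv
        merged = self.lists.pop(ru, []) + self.lists.pop(rv, [])
        merged.append(seg)
        self.lists[rv] = merged

d = DualLists()
d.add_segment((0, 0), (0, 1))    # starts a new dual list
d.add_segment((2, 2), (2, 3))    # disconnected segment: starts another list
d.add_segment((0, 1), (2, 2))    # hypothetical link joining the two lists
print(len(d.lists))              # a single connected dual list remains
```

The loop-closing case detected in `add_segment` corresponds to case (a) of the barrier-list test; case (b) would additionally check whether a list touches the border of the network.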

Fig. 17. Continuous barrier.

Fig. 18. Tracing path A-B.

The continuous updating of the three kinds of lists suffices to tell if a path is possible between any two given modules at any moment. It is evident that if no barriers have as yet been completed, and if no restrictions are placed upon the path as to the number of corners or its permissible length, then the connection is always possible. But even if one or more barriers are already present, it is still possible that the two modules to be connected lie in the same region with respect to the barrier or barriers. Two solutions for this problem are suggested:

(a) For each new barrier, two tables would be generated containing the labels of all modules belonging to each of the two disconnected regions. Any two new modules to be connected would be checked to see if their labels are contained in a single table, in which case both lie in the same region with respect to the barrier inducing the partition under consideration. Obviously, this method implies the shifting of enormous amounts of data (the labels of all modules), and the whole procedure has to be repeated for every new barrier. Furthermore, the amount of data to be treated remains appreciably constant and does not diminish as might be expected, since the modules belonging to previous barriers can act again as originators of new paths.

(b) A normal path connecting the two modules is built on an assumed blank network, and its dual list is generated. Then Lemma 2 is applied.

Lemma 2: Given two modules, if the dual of either of the two normal paths connecting them contains an odd number of segments in common with the barrier proper, then the modules lie in disconnected regions and a path is not possible.
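Lemma 2 reduces the region test to a parity count over dual segments. A minimal sketch (ours, not the report's program), representing each dual segment as a sorted pair of vertex labels so that the same side is recognized regardless of orientation:

```python
# Sketch of the Lemma 2 test: count the dual segments common to the dual
# of a normal path and the barrier proper, and examine the parity.

def seg(u, v):
    """Canonical form of a dual segment between vertex labels u and v."""
    return tuple(sorted((u, v)))

def same_region(normal_path_dual, barrier_proper):
    """Odd intersection: the modules lie in disconnected regions.
    Even intersection (including zero): they lie in the same region."""
    common = normal_path_dual & barrier_proper
    return len(common) % 2 == 0

barrier = {seg((1, 1), (1, 2)), seg((1, 2), (1, 3)), seg((1, 3), (1, 4))}
dual_a = {seg((1, 2), (1, 3))}                        # one crossing: odd
dual_b = {seg((1, 1), (1, 2)), seg((1, 2), (1, 3))}   # two crossings: even
print(same_region(dual_a, barrier))   # False: a path is not possible
print(same_region(dual_b, barrier))   # True: a path may still be possible
```

The example barriers and duals here are invented for illustration; in the program proper they would come from the dual lists maintained during path building.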

In Fig. 19, the black segments indicate the dual of the normal path connecting A with B. The green line is the barrier created by the red path. In Fig. 20a, the dual of the normal path (black) and the barrier (green) have one common segment, but Lemma 2 specifies the barrier proper, so 20b is the appropriate representation of this case. Here, C, A, and B are found to be in the same region because an even number of segments, namely two, is common to both the dual of the normal path and the barrier proper. If the other normal path is traced, it is found that it too has an even number of segments, namely zero, in common with the barrier proper.

2.1.5 Final Considerations for Path Building

2.1.5.1 Restriction on the Class of Paths Admissible

It has been mentioned in Section 2.1.2 that the path-building procedure has to be repeated for every instruction to be executed, and therefore it is imperative to employ a very simple and fast algorithm for path tracing. Furthermore, since some of the paths may be of considerable length, it is helpful if the algorithm is amenable to parallel processing. If one tries to resort to maze-solving techniques, it is found that these use a method of cell classification, assigning relative weights according to some neighborhood relation and in a monotonically varying sequence. Therefore, these methods are intrinsically sequential, since the weight of a cell cannot be determined until the weight of its immediately preceding cell is specified. In order to simplify the problem, we restrict the types of admissible paths to what are referred to as "non-regressive paths." A non-regressive path is defined as one traced following a set of priorities on the vertical and horizontal directions. The set of priorities establishes which of the

Fig. 19. Dual and barrier.

Fig. 20. Dual and barrier proper.

two senses are to be followed when tracing the segments in each of the horizontal and vertical directions. This means that once the vertical and horizontal priorities are established, for example vertical down and horizontal to the right, all the path segments have to be traced in one of the two specified senses exclusively.

It is to be noted that this restriction on the allowable class of paths has the advantage of eliminating the need for an algorithm capable of tracing a minimum-length path, since all non-regressive paths have the same length when measured in terms of the number of segments needed to connect the modules. This method of measuring distance has been called "Manhattan distance," and if each module is assigned a pair of coordinates, then this distance can be assimilated to the "Hamming distance."

It could be thought that this restriction on the types of paths would severely limit the possibility of connecting two modules through a set of obstacles, but it is easy to show that this is not the case even for networks of small size. For an m x n network, the number of non-regressive paths connecting two opposite corners is:

    N = C(m+n-2, m-1) = (m+n-2)! / [(m-1)! (n-1)!]

For a minimal iterative circuit computer, with a 40 x 40 network, the number of non-regressive paths is

    N = C(78, 39) = 78! / (39! 39!), approximately 2.7 x 10^22

Even for the impractical case of a 10 x 10 network, the number of paths is still very high:

    N = C(18, 9) = 18! / (9! 9!) = 48,620
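These counts follow directly from the binomial coefficient, since a non-regressive path between opposite corners is determined by choosing where its m-1 vertical steps fall among the m+n-2 steps in all. A quick check (the function name is ours):

```python
# Checking the path counts: the number of non-regressive paths between
# opposite corners of an m x n network is C(m+n-2, m-1).

from math import comb

def n_nonregressive(m, n):
    return comb(m + n - 2, m - 1)

print(n_nonregressive(10, 10))   # 48620, as quoted in the text
print(n_nonregressive(40, 40))   # C(78, 39), on the order of 10**22
```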

Within the class of non-regressive paths we distinguish the "normal paths" and the "broken paths," illustrated as a, b, and c, respectively, in Fig. 10.

2.1.5.2 Use of Redundant Paths

Since the number of available paths is so large, it is necessary to define some priorities as to the type of path most convenient to build first. Evidently the normal paths employ a very simple algorithm and therefore seem to be the natural choice for a first try. Furthermore, the length of all non-regressive paths, including the normal ones, is the same, and therefore there is no particular advantage with respect to propagation time through the path. But there is an advantage in time during the path-building phase, since only one "turn" instruction is needed in the case of the normal paths.

Since it is very probable that more than one path can be traced, one could very well use the extra paths to add redundancy to the transmission of information between the modules. If only one path is found, it is used for the transmission with no checks. If two paths are available, the information received from them at the receiving module is checked for agreement and is then used, or discarded, stopping the program. If three or more paths happen to be available, a "majority vote" can be taken at the receiving end to determine the correct information. The "majority vote" technique, when used with three paths, affords a high degree of reliability, since the condition for acceptance of incorrect information is the occurrence of the same type of error at the same time in two of the three channels.

All the modules in the path except the end ones act only as transmission elements and therefore perform no logical function. It is safe then to assume that the only type of error that can be introduced is the failure of transmission, as opposed to the dropping of bits or the generation of erroneous "ones" filling the spaces of some original "zeros." Under these conditions, the bounds for cumulative errors as treated in Reference 4 do not apply. The simpler rules that follow give an estimate of the increase in reliability that can be expected from a transmission line composed of m parallel paths.

Let us denote by Ri the reliability of an individual module, and by Rp the reliability of the total path. Similarly, Fi = 1 - Ri will indicate the individual "failability" of the modules. It is necessary here to remark that reliability is understood as the probability that the element will perform correctly during a certain time interval; in this case, during the time it takes to execute the transmission phase. That is, a reliability of 0.95 indicates that the element functions correctly 95 out of 100 times that it is pressed into service. It does not indicate that the element performs correctly during 95% of the transmission phase in any one attempt, since in that case only a very few special patterns would be transmitted with no errors.

In order to calculate the reliability of systems with elements having individual reliabilities Ri, one proceeds in the following way. The total reliability Rp of a series of n elements with individual Ri's is the product of these Ri's:

    Rp = R1 · R2 ··· Rn

In this case, all the modules are physically alike, so that, barring different environmental conditions, all are supposed to possess the same Ri = R. The total reliability of a path consisting of n elements in series is then simply:

    Rp = R^n

See Fig. 21.

Fig. 21. Series connection of n modules.

When m such paths, each of individual reliability Rp and each consisting of the same number n of elements, are connected in parallel with no intermediate interconnections, as in Fig. 22, the total "failability" is FT = Fp^m, and therefore:

    RT = 1 - FT = 1 - Fp^m = 1 - (1 - Rp)^m = 1 - (1 - R^n)^m

Fig. 22. m parallel paths of n modules each.

It is evident from the previous equality that the number m of parallel paths has a greater influence than the number n of elements in series in a path. This is especially true for values of R approaching unity, and means that even for a long path there exists the possibility of compensating the effect of the large number of elements in series with a few parallel paths. For example, for R = 0.95, n = 10, m = 3:

    For a single path: Rp = (0.95)^10 = 0.5987
    For 3 paths in parallel: RT = 1 - (1 - Rp)^m = 0.9354

It can be seen that the total reliability has not yet reached the original reliability of the individual module.
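The series and parallel formulas above can be checked numerically; the function names below are ours:

```python
# Reproducing the reliability figures: a path of n modules in series has
# reliability R**n, and m such paths in parallel give 1 - (1 - R**n)**m.

def path_reliability(r, n):
    return r ** n

def parallel_reliability(r, n, m):
    return 1 - (1 - path_reliability(r, n)) ** m

print(round(path_reliability(0.95, 10), 4))        # 0.5987
print(round(parallel_reliability(0.95, 10, 3), 4)) # 0.9354
print(round(path_reliability(0.99, 10), 4))        # 0.9044 (the text rounds to 0.9045)
print(round(parallel_reliability(0.99, 10, 3), 5)) # 0.99913
```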

For R = 0.99, n = 10, m = 3:

    For a single path: Rp = (0.99)^10 = 0.9045
    For 3 paths in parallel: RT = 1 - (1 - 0.9045)^3 = 1 - (0.0955)^3 = 1 - 0.00087 = 0.99913

In this case, the total reliability of the network is greater than the reliability of the individual module.

2.1.5.3 Extra Requirements Introduced by the Redundant Mode of Operation

It is to be noted that this feature of built-in redundancy is obtained with almost no penalty in time or equipment. Time is not lost, since all m paths can be traced at the same time and all are of the same length; thus the procedures terminate simultaneously. An extra requirement is the necessity of having extra hardware in the modules to implement the majority vote. This simply means that for a fair-sized machine there is a slight reduction in the number of modules available for the rest of the program.

2.1.5.4 Assignment of Priorities

Given two modules to be connected, we will consider the one with the lowest row coordinate as the starting point. Informally, we can say that the "uppermost" module is the starting point. The priority is established as a sequence of two directions, and is indicated as the pair consisting of V (for vertical) and H (for horizontal) in one of the two orders. Thus, the pair (V,H) indicates that the vertical direction is followed as long as possible, whether or not a change to the horizontal direction is possible. Only when an obstacle is met does the change to the horizontal direction take place. But still the vertical direction has latent priority,

and consequently even if the horizontal direction is clear, the vertical one is resumed as soon as it becomes possible to do so. As the starting point has been defined as the uppermost module, the vertical direction needs no further qualification, since it can only be in the "down" sense. The qualification for the horizontal direction is automatically given by the column-wise position of the "lower" module relative to the "upper" module.

Given two modules, the two combinations (V,H) and (H,V) can, and generally do, give rise to completely different paths. In Fig. 23 the two different combinations of priorities are illustrated. In 23a the priority is (V,H) and the starting direction is vertical. Notice that when the path is proceeding in the horizontal direction, it resumes the vertical direction as soon as this becomes possible, notwithstanding that two or more modules may be available in the horizontal direction.

In some cases, the path starts from the uppermost module to be connected following the low-priority direction, thereby apparently violating the rules of direction precedence. What actually happens, as in Fig. 24, is that the immediate neighbor of the starting module in the direction specified by the priority is either an obstacle or is non-available because it lies in the shadow of some other obstacle. Therefore, as the high-priority direction finds no available modules through which to trace the path, the secondary priority is followed, but only as long as is necessary to find an available module in the high-priority direction. In Fig. 24a the obstacles are such that even with an (H,V) priority the resultant path has all the features of one traced according to a (V,H) priority. The same happens for a (V,H) priority, as seen in Fig. 24b.

Fig. 23. The two cases of priorities: (a) priority (V,H); (b) priority (H,V).

Fig. 24. Two cases starting with the low-priority direction: (a) (H,V) priority; (b) (V,H) priority.

2.1.5.5 Path-Building Procedure

The path-building procedure involves two different steps: (i) eliminating zones of modules non-acceptable as path components, and (ii) the tracing procedure.

(i) The elimination of module zones non-acceptable as possible path components is carried out by a "shadowing" technique. This procedure takes into account the priority pair and indicates which zones are forbidden to path penetration. If the path were allowed to enter one of these zones, it would become necessary to trace back part of the path in a direction opposite to one of the directions specified in the priority, thereby producing a regressive path, which is not admissible. The shadowing method operates in the following way:

(a) Determine the starting point and choose a priority pair.

(b) From the priority pair and the relative position of the two modules, determine the sequence of directions in which the path has to progress.

(c) From this sequence of directions, determine the two sides of the network which will serve to determine the starting obstacles for the shadowing procedure. If the directions are vertical down and horizontal left, they can be indicated by a pair of arrows, down and left. From here we deduce that the right-hand border and the lower border are the starting places for the shadowing procedure.

(d) For every obstacle contiguous to the lower side, determine its highest point and project a horizontal line to the right, until it intersects the right-hand side of the network, even if this means running over some obstacle. The set of modules limited by this line, plus the obstacle and the two sides of the network, constitutes

the "shadow" of the obstacle, which is considered a zone forbidden to the path-tracing procedure.

(e) Repeat the procedure of (d) for obstacles attached to the right-hand side, projecting the shadow vertically from the leftmost point until it intersects the lower side of the network.

(f) Repeat the horizontal and vertical shadowing procedures as described in (d) and (e) for every obstacle now adjoining any of the shadowed zones. Adjoining means having at least one vertex in common.

(ii) When no more shadowing is possible, because the remaining obstacles are detached from the shadowed zones, the path is traced following the assigned priorities, treating both the obstacles and the shadowed zones as obstacles.

In Fig. 25 the preceding method has been applied to the problem of connecting modules A and B through the set of obstacles indicated by C through S. The elimination and tracing procedures are explained below, with reference to steps (a) through (f):

(a) The starting module is determined by the lowest row coordinate. A priority pair is set arbitrarily: (V,H).

(b) Module B is to the left of A; therefore the sequence of directions is vertical down and horizontal left.

(c) From this sequence of directions, the right-hand side and the lower border are the starting places for the shadowing procedure.

(d) In Fig. 25, obstacles R and S are contiguous to the lower side. Thus their high points project shadows to the right, indicated by (1) in Fig. 26, extending to the right side of the network.
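The net effect of shadowing is to remove every module from which the path would be forced to regress. For senses down and left, an equivalent sweep (our reformulation, not the report's procedure) marks a free module usable only if it is the target or can hand the path on downward or leftward:

```python
# Our reformulation of shadow elimination for senses "down" and "left":
# a free module is dead if the path could leave it only by regressing,
# i.e., if both its lower and its left neighbor are obstacles, dead, or
# off the network. Grid cells: '.' free, '#' obstacle; b is the target.

def forbidden_zones(grid, b):
    m, n = len(grid), len(grid[0])
    usable = [[False] * n for _ in range(m)]
    for r in range(m - 1, -1, -1):          # bottom row first
        for c in range(n):                  # leftmost column first
            if grid[r][c] == '#':
                continue
            if (r, c) == b:
                usable[r][c] = True
                continue
            down = r + 1 < m and usable[r + 1][c]
            left = c - 1 >= 0 and usable[r][c - 1]
            usable[r][c] = down or left
    return [(r, c) for r in range(m) for c in range(n)
            if grid[r][c] == '.' and not usable[r][c]]

grid = ["....",
        "....",
        "....",
        "..#."]
# The obstacle touches the lower side, so the module to its right is
# shadowed, matching step (d) above.
print(forbidden_zones([list(row) for row in grid], b=(3, 0)))
```

The sweep order matters: each cell is examined after the cells below and to its left, so a single pass over the network suffices.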

Fig. 25. Maze problem.

Fig. 26. First solution, with priority (V,H).

(e) Obstacle J is contiguous to the right border, and therefore it projects shadow (2) vertically, extending to the lower border. Here it is not necessary to cover region (1), in the shadow of S, because it is already eliminated as a possible region for path tracing.

(f) Obstacle N is now contiguous to a shaded zone, and so it projects a horizontal shadow (3) and a vertical shadow (3'). Q is now contiguous to a shaded zone; it projects shadows (4) and (4'). The shadowing possibilities are now exhausted.

The limitation to non-regressive paths makes the route pattern very sensitive to small changes in the shape of the obstacles. If all types of paths were admissible, a small change in the shape of one of the obstacles would probably imply only a small detour of the original path; but with non-regressive paths, not only the shape but also the adjacency of other obstacles to the modified shadow intervenes to modify radically the shape of the path. This means that a small variation in the configuration of one obstacle has a local influence plus a "long distance" effect, according to whether the new shadow profile becomes contiguous, or ceases to be contiguous, to some other obstacle.

In order to illustrate the wide variation induced in the path route by minor changes in the shape of the obstacles, the same basic pattern of Fig. 25 has been used in Fig. 27 with the addition of a one-module obstacle G'. The priority is still (V,H), but the exit from A is impossible in the vertical direction, and consequently the path is started in the direction of secondary priority, H. If a second one-module obstacle is added, like I' in Fig. 28, the path again is modified substantially, this time by the inclusion of obstacles L and O and their shadows in the forbidden zone. Here, the new

Fig. 27. Change in path route induced by the inclusion of G'.

Fig. 28. Change in path route induced by the inclusion of I'.

additional shadow projected by I' is enough to cause obstacle L to become contiguous, which in turn brings O into the forbidden zone. The new path, as a result, is very different from the one in the original problem (Fig. 26).

2.1.6 Flow Diagrams for the Implementation of the Barrier Detection and Path-Tracing Procedures

2.1.6.1 Adapting the Procedures for Computer Solution

The implementation of the barrier-detection procedure does not present any difficulty. On the contrary, the implementation of the path-tracing procedure, as explained in Section 2.1.5, requires the use of pattern-recognition techniques, since it depends on the determination of such features as the leftmost and uppermost corner of an irregular pattern of modules constituting an obstacle. In order to circumvent this difficulty, the obstacles are not treated as patterns of modules; instead, each individual module in the pattern is treated as a "unit obstacle." The obstacles referred to in previous sections are thus conglomerates of "unit obstacles," and these unit obstacles are now treated individually as independent contiguous obstacles. The procedure is applicable in the same way as before, since the contiguity of the unit obstacles assures the final treatment of the whole pattern by the shadowing method.

2.1.6.2 Flow Diagram for the Detection of Barriers

Figure 29 shows the flow diagram for the detection of barriers. It starts with the given coordinates of the two modules to be connected. Then the existence of any barrier is checked by reference to the barrier list. If no barriers are present, the whole algorithm is skipped and the path-tracing algorithm is entered directly. If there is a barrier, then the normal path connecting the two modules is traced, and the number of segments in common between the normal path and the barrier proper is determined. If there is an odd number of common segments, then the two modules are in separate regions and there is no possibility of a path. If there is an even number, including zero, of common segments, then the two modules lie in the same region with respect to the barrier and a path is still possible; the path-tracing algorithm is therefore entered.

Fig. 29. Flow diagram for the detection of barriers:

    Given (x1;y1) and (x2;y2), the coordinates of the modules A and B to be connected.
    Are there any barriers?
        No: go directly to the path-tracing algorithm.
        Yes: trace the normal path between A and B (either of the two) and
             determine the number of segments in common with the barrier proper.
            Even: a path is possible; go to the path-tracing algorithm.
            Odd: a path is impossible; erase some paths.

2.1.6.3 Flow Diagram for the Path-Tracing Algorithm

Figure 30 shows the flow diagram for the path-tracing algorithm. It starts with the determination of the sequence of directions, derived from the arbitrarily assigned priority and a comparison of the relative addresses of the two modules to be connected. Next, the baseline is determined. In the initial step, the baseline is constituted by two sides of the original network, but later in the iteration it is formed by the horizontal and vertical projections of the uppermost and leftmost (or rightmost) corner of the obstacle being considered. Any module adjoining the baseline (having at least one corner in common with it) is considered the new neighbor, and its coordinates are labeled (xo,yo). Then all modules whose x,y coordinates are greater than xo and yo, respectively, are eliminated. After the elimination or "shadowing" of these modules, a new baseline is determined and the procedure is repeated. If there are no new neighboring modules, the other half of the baseline is considered and a similar procedure is followed. If there are no neighbors to either of the two halves of the baseline, then the shadowing procedure is terminated and the path can be traced.

Fig. 30. Flow diagram for path tracing:

    Determine the priority (fixed, arbitrary).
    Determine the sequence of directions by comparing (x1,y1) and (x2,y2).
    Determine the baseline, horizontal. Any new neighbor (xo,yo)?
        Yes: eliminate all modules such that x > xo and y > yo; repeat.
        No: determine the baseline, vertical. Any new neighbor (xo,yo)?
            Yes: eliminate all modules such that x > xo and y > yo; repeat.
            No: trace the path. Start at A; follow the first priority until
                hitting a shadowed zone or reaching B. At a shadowed zone,
                follow the second priority for only one module, then test:
                can the first priority now be followed?
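The tracing loop at the bottom of Fig. 30 can be sketched as follows. This is our own illustration, assuming senses down (first priority, V) and left (second priority, H), with obstacles and shadowed zones merged into one blocked set:

```python
# Sketch (ours) of the Fig. 30 tracing loop: follow the first-priority
# direction while it is clear; otherwise advance one module in the
# second-priority direction, then try the first priority again.

def trace(a, b, blocked, m, n):
    first, second = (1, 0), (0, -1)          # (V,H) priority: down, then left
    path = [a]
    r, c = a
    while (r, c) != b:
        r1, c1 = r + first[0], c + first[1]
        if 0 <= r1 < m and 0 <= c1 < n and (r1, c1) not in blocked:
            r, c = r1, c1                    # first priority is clear
        else:
            r, c = r + second[0], c + second[1]   # one module, second priority
            if not (0 <= r < m and 0 <= c < n) or (r, c) in blocked:
                return None                  # trapped: elimination was incomplete
        path.append((r, c))
    return path

# A at the top right, B at the bottom left, one blocked module in the way.
print(trace((0, 3), (3, 0), blocked={(2, 3)}, m=4, n=4))
```

If the shadowing step has eliminated the forbidden zones correctly, the loop never becomes trapped, and the path it produces is non-regressive by construction.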

2.2 TRANSLATION ALGORITHMS

2.2.1 Maximal Decomposition of Algorithms

The fundamentals which govern the following discussion are from the general area of mathematical logic and, in particular, from metamathematics. Markov defines a formal mathematical entity known as the "normal algorithm." The metamathematical properties of the class of normal algorithms have been shown to be equivalent to those of other computational structures by Detlovs (Ref. 22). It is interesting to note that there exists a universal normal algorithm with properties similar to those of a universal Turing machine. Also, any number whose representation is computable by a normal algorithm is computable by a Turing machine, and vice versa. Further, the unsolvability of the halting problem for normal algorithms can be proven using the same basic techniques as are used for Turing machines. The results of Markov add to the intuitive belief in Church's thesis (Ref. 9), but do not seem directly applicable to the analysis of programming algorithms.

The normal algorithm is a finite sequence of symbols which defines a unique mapping from an input string of symbols into an output string of symbols. The number of distinct symbols in all three classes must be finite. Few programmers consider computation, input, and output in this limited sense. For example, variables in algorithms are considered as representing real numbers rather than truncated approximations. The significance is that a formal procedure for the analysis of the most general forms of algorithms is desired. Yet, it has been proven that no procedure can be developed to analyze all possible normal algorithms (Refs. 20, 21, 23, 25). It is worthwhile to emphasize the difference between these classes of algorithms. There

are a countable number of normal algorithms. This follows from the fact that each normal algorithm can be represented by a finite number of symbols from a finite alphabet. The set of algorithms in its most general form has cardinality aleph-one, the order of the real numbers. This follows from consideration of the set of algorithms: input a value for the real variable X, add X to a real constant, and output the real variable Y, which is the sum. By letting the constant be each real number, a set with cardinality at least aleph-one is generated. Since there can be at most a countable number of variations in the computation of such an algorithm, the set of all algorithms of the most general form has cardinality at most aleph-one.

By observing the methods for proving undecidability results about algorithms (Ref. 25), two key factors become apparent. First, any formal procedure for analysis of algorithms must prevent self-reference in order to guarantee termination in a finite number of steps, i.e., eliminate the possibility of the halting problem. Second, the presentation of the algorithm must be interpreted as a sequence of subalgorithms, where the lowest level of subalgorithm analyzed is a primitive operation on variables or constants; i.e., addition of two real numbers is considered as a primitive operation.

2.2.1.1 Step 1. Choice of Primitives

Choose a set of primitive operators. This set must be sufficient to express the algorithm of interest. The algorithm must be explicitly stated in terms of the primitive operators plus connectives and control information which designate the flow of computation in the algorithm.

Remark 1

Implicit in the word algorithm, as used here, is a specific beginning and termination as well as a unique sequence of operations for any given data set. It is assumed that at least one set of data allows the algorithm to terminate after a finite number of computations. The decomposition proceeds by replacing the conditional branch instructions which are a function of data by definite functions which represent a reasonable data set. Although no specific set of data on which the algorithm might operate is required, a reasonable knowledge of such data sets is assumed.

2.2.1.2 Step 2. Branch Assignment

To each conditional branch statement in the algorithm, assign a function whose single argument is the number of times the conditional branch has been reached. The range of this function is the set of places in the algorithm from whence the computation may proceed. The operations performed in computing this function are not considered as operations of the algorithm itself. When following a sequence of operations through the algorithm at a branch: (i) the operations to evaluate a conditional branch variable in the algorithm are considered in the sequence, (ii) the next operation of the sequence is determined by the function assigned to that branch, and (iii) the counter which indicates the number of times the branch has been reached is incremented by one. There will be some finite number of encounters for each possible branch.

Remark 2

The purpose of these functions is to obtain an explicit sequence of computation that is independent of the data values required by the algorithm. The

functions need not model any particular data but should represent a first order approximation of a hypothetical average data set. The type of function that will usually suffice is defined as follows: the conditional branch in the algorithm is replaced by a function which specifies a branch every kth time the function is evaluated and otherwise continues to the next step in the algorithm.

2.2.1.3 Step 3. Formation of Tree

Form a tree of primitive operations from the algorithm, starting with the terminal branches and proceeding toward the base, as follows: Create a terminal node for each operation that can be performed on constants and/or input data. Replace each such operation and its associated arguments by a number sign in the algorithm. The number sign may appear as an argument for other operations or may be a result produced by the algorithm. On successive levels of the tree, create nodes for each remaining operation that can be performed on constants and/or data and/or the number sign. Repeat this construction until only number signs remain in the algorithm. Connect the remaining number signs to a single base node. The instance of the algorithm determined by the branch assignments is thus completely encoded as a tree structure of operators.

Remark 3

For any algorithm, Step 2 guarantees that Step 3 may be effectively carried out, i.e., the number of operations stated in the algorithm is finite. For each conditional branch there is a finite number of encounters before all possible branches have been taken. Thus, with a finite number of observations it can be

determined if an operation meets a condition of Step 3. If the algorithm can be expressed in some formal language, Step 3 can be performed on a computer. The description of such a computer program is presented in 2.2.3, Machine Implementation via a Translation Algorithm. Section 2.2.4 extends the capability of 2.2.3 through an augmented language.

2.2.1.4 Step 4. Determine Computation Time and Degree of Concurrency

Begin at the base of the tree resulting from Step 3 and label each node with ci, where ci is the length of the longest path from the base to the ith node. The maximal length computation sequence, T, is max(ci). This is the number of execution cycles required on a highly parallel computer. To determine the degree of concurrency, apply the scheduling procedure of T. C. Hu, i.e., define Nj as the number of nodes labelled with T-j. The minimum number of processors required to complete execution in T steps is

    P = max rk,  where  rk = (1/k) SUM(j=1 to k) Nj  for k = 1,...,T.
         k

Remark 4

The computation of Step 4 is trivial in the sense that one need only count nodes, then sum, divide, and pick out the largest of a few numbers.

Example

The given algorithm: (Compute C0, C1, ..., C2k, which are the Fejér approximation

of the Fourier series

    X = C0 + C1 sin θ + C2 cos θ + C3 sin 2θ + C4 cos 2θ + ... + C2k cos kθ

where the n equally spaced values which would be supplied as data are x1, x2, ..., xn. Note k and n would also be supplied as data.)

    C0 = (1/n) SUM(i=1 to n) xi

    C2j-1 = (2(k+1-j)/((2k+1)n)) SUM(i=1 to n) xi sin(2πij/n),   j = 1,2,...,k

    C2j = (2(k+1-j)/((2k+1)n)) SUM(i=1 to n) xi cos(2πij/n),   j = 1,2,...,k

Step 1. Choose the elementary operations add, subtract, multiply, and divide, each assigned 1 unit of time for computation, with sin and cos assigned 7 units of time for computation.

Step 2. Rather than use a specific integer for n and k, let the symbols n and k denote two fixed integers.
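The branch assignment of Step 2 (Remark 2) can be sketched in modern terms as follows. This is our illustration, not part of the report; the function and label names are hypothetical.

```python
# Sketch of the Step 2 branch assignment (Remark 2): a conditional
# branch is replaced by a definite function of the number of times
# the branch has been reached.  Names here are illustrative.

def make_branch_function(k, branch_target, next_step):
    """Return a function that branches every k-th evaluation and
    otherwise falls through to the next step in the algorithm."""
    count = 0
    def branch():
        nonlocal count
        count += 1                      # (iii) increment the counter
        if count % k == 0:              # branch every k-th time
            return branch_target
        return next_step
    return branch

loop_back = make_branch_function(3, "L1", "L2")
sequence = [loop_back() for _ in range(6)]
# Falls through twice, then branches on every third evaluation.
```

The counter is private to the branch function, matching the report's stipulation that its evaluation is not counted among the operations of the algorithm itself.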

Step 3. Form the tree in sufficient detail.

[Tree diagram: levels TIME 0 (data x1, x2, ..., xn, n, k) through TIME 13+[log2n] (finish), showing the pairwise summation tree which yields C0 at TIME [log2n]+1, the sin( ) and cos( ) evaluations, the products with xi, and the final scaling which yields C2j-1 and C2j for each j.]

Step 4. The minimal time for computation of all coefficients simultaneously is 13 + [log2n].
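Step 4's processor bound, P = max over k of rk with rk = (1/k) SUM(j=1 to k) Nj, can be sketched as follows. The function name is ours; the test values are the operation counts for Times 5 through 8 of the sine/cosine example which follows.

```python
from math import ceil

# Sketch of the processor bound of Step 4 (after T. C. Hu): given the
# number of operations N_j required at each successive execution time,
# r_k is the running average (1/k) * (N_1 + ... + N_k), and the number
# of processors needed is the largest r_k, rounded up.

def min_processors(level_counts):
    total = 0
    best = 0
    for k, n_j in enumerate(level_counts, start=1):
        total += n_j
        best = max(best, ceil(total / k))
    return best

# Operation counts for Times 5-8 of the sine/cosine example:
print(min_processors([16000, 20000, 12000, 8000]))   # maximum occurs at k = 2
```

As the report observes, this is only counting, summing, dividing, and taking a maximum.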

To determine the minimal number of processors for a specific case, assume n = 200, k = 10. Also assume the sin and cos routines have the following computational graph:

[Computational graph: eight levels, Time 0 through Time 7, showing the operations which evaluate the truncated series below.]

This graph computes

    sin x = x - x^3/3! + x^5/5! - x^7/7! + x^9/9! - x^11/11!

For computation of the number of processors required, a table of the number of operations for each execution time is prepared:

             Number of
             Operations      rk*
Time 1          2002         2002
Time 2          2000         2000
Time 4          8000         8000
Time 5         16000        16000/1 = 16000
Time 6         20000        36000/2 = 18000   Maximum
Time 7         12000        48000/3 = 16000
Time 8          8000        56000/4 = 14000
Time 9          4000        etc.
Time 10         4000
Time 11         2100
Time 12         1050
Time 13          525
Time 14          252
Time 15          137
Time 16           74
Time 17           52
Time 18           31
Time 19           21

Total sequential operations, less indexing: 84,244

*rk must account for the precedence relations where the computation is defined by a graph and not a tree. Each rk is the number of operations at the kth time until the computation structure becomes a tree.

Thus, at most 18,000 processors could be used. The number of execution times would be 19 on a computer with this number of processors, while 84,244 execution times would be required on a single-processor computer. If the requirement is made that only one type of operation may be performed during a given execution time,

then at most 6000 processors could be used and about 60 execution times would be required.

2.2.2 Analysis of Sixteen Numerical Computation Algorithms for Concurrency of Arithmetic and Control

A brief description of each algorithm will be given. Significant degrees of concurrency, methods of obtaining concurrency, and the degree of local or global control will be mentioned when applicable. A table with the pertinent data summarized appears at the end of this part of the report. Algorithms are presented beginning with trivial computations so that complicated algorithms may be analyzed in terms of simpler ones. The following computations are examined:

1) Sum or product of N numbers.
2) Evaluation of an nth degree polynomial.
3) Multiplication of a vector by a matrix.
4) Multiplication of 2 matrices.
5) Matrix inversion.
6) System of linear equations.
7) Ordinary differential equations.
8) Least square fit.
9) Computation of the nth prime number.
10) Search procedure.
11) Sorting procedure.
12) Fourier expansion.
13) Evaluation of Fourier series.
14) Neutron diffusion equation.
15) Neutron transport equation.
16) Eigenvalues of a matrix.

2.2.2.1 Sum or Product of N Numbers

The fundamental limitation is the combination of partial results. During the first step at most N/2 operations may be performed on distinct pairs of numbers. During the second step at most N/4 operations may be performed on the N/2 previous results. The significant form of the computation is a tree:

[Tree diagram: x1, x2, ..., xn combined in pairs, the partial results again combined in pairs, and so on down to a single result.]

In this form it is easily seen that at most N/2 operations could be performed concurrently, and the computation time using maximum concurrency is at most [log2N]. The brackets [ ] denote the least integer equal to or greater than the enclosed quantity. The number of sequential computations, N-1, is the same as the number of computations when using concurrency.

2.2.2.2 Evaluate Nth Degree Polynomial

This computation has the form of two superimposed trees:

[Tree diagram: a product tree forming the powers of x superimposed on a summation tree combining the terms a0, a1 x, a2 x^2, ..., aN x^N.]

The minimum concurrency is found by eliminating redundant computation from this tree and applying the scheduling algorithm of T. C. Hu.8 As might be expected, the concurrent time is twice that of a single tree, 2[log2N]. By removing redundant computation of the x^k terms, at most N simultaneous operations are needed to compute

    SUM(k=0 to n) ak x^k

in the minimum concurrent time. The optimal sequential computation,

    a0 + (a1 + ... (aN-2 + (aN-1 + aN x)x)...)x,

requires 2N sequential times.

2.2.2.3 Multiply Vector by Matrix

An N element vector is multiplied by an N x N matrix to produce another N element vector. During the first step, each element of the vector is multiplied by each element of the corresponding column, i.e., N2 multiplications. During the next [log2N] steps the addition of the row products is completed as per 2.2.2.1 above. The sequential time consists of N2 multiplications followed by N2 additions.

2.2.2.4 Multiply Two Matrices

During the first step all possible products are formed between the elements of the two N x N matrices. Then N2 applications of 2.2.2.1 during the next [log2N] steps complete the computation of the product matrix,

    Cij = SUM(k=1 to N) Aik Bkj.

The sequential time is N3 multiplications plus N3 additions.

2.2.2.5 Matrix Inversion

The computation of the inverse of an N x N matrix is given in detail in the 1963 SJCC paper by Squire and Palais. This computation is in the local control category because various operations are performed on different rows of the matrix simultaneously.

2.2.2.6 Solving System of Linear Equations

Given N equations in N unknowns with K different constant vectors, the K sets of solutions can be found in 3N steps by performing the same operations as in 2.2.2.5 on the augmented N x (N+K) matrix. The additional concurrency needed is 2NK and the additional sequential time is 5N2K over and above that in 2.2.2.5.

2.2.2.7 Solving Ordinary Differential Equations

Given a system of N simultaneous second order ordinary differential equations, solve for K intervals by the fourth order Runge-Kutta method. The sequential time of 12N2K was computed on the assumption that 17 operations would be required by the forcing functions. The concurrent time of 4(4 + [log2N])K and the maximum concurrency of N2/2 were based on the same assumption. The algorithm may be found in most introductory texts. There is no gain by overlapping the computation for each of the K intervals, but there is a sizable reduction of computation time within each interval. The majority of the computation is four applications of 2.2.2.3 to obtain the required derivatives. The classification as global is marginal up to a system of 20 equations, where concurrent time is about 1/100 of sequential time.

2.2.2.8 Least Square Fit

Given N data sets with K factors in each, find K coefficients for a linear polynomial that is the least square predictor of the data. The computation consists of constructing a matrix of sums of squares and cross products, i.e., NK2 multiplications and NK2 additions to form a K x K matrix. This requires 2N concurrent time steps with K2 operations each step. Once formed, the matrix is inverted, requiring 3K concurrent steps as in 2.2.2.5. Then a final application of 2.2.2.3 yields the desired K coefficients in [log2K] + 1 steps. Since [log2K] + 1 is small compared to 3K, it does not appear in the summary. For N at least five times K the procedure may be classified as global.

2.2.2.9 Compute the Nth Prime

Increasing numbers of divisions are performed at each step. During the kth step, the k-1 known primes are divided into the integers greater than the (k-1)th prime. A non-zero remainder test, made simultaneously, exhibits the value of the next prime. Storage requirements grow too rapidly for this to be of practical use, but it is of interest that the method yields one prime for each computation step.

2.2.2.10 Unordered Search

A simultaneous comparison is made between a given value and all N values in a table. At the entry where the comparison succeeds, the corresponding second entry is fetched to a common cell. This requires 4 steps for any length of list. Sequentially, the log-two search is optimal if the table is ordered.

2.2.2.11 Sort N Elements

An interchange sort is used which repeats the following N times for a list of N elements: consider adjacent pairs and interchange those pairs which are not in the proper sequence. Alternate steps consider pairs displaced one position. With 2 instructions for the test and interchange, a slight advantage

over a sequential process is realized.

2.2.2.12 Coefficients of Fourier Series

See the example on p. 49.

2.2.2.13 Evaluating Fourier Series

Evaluate A0 + A1 sin X + A2 cos X + ... + A2N-1 sin NX + A2N cos NX. Let CS be the sequential time for the sine and cosine computation. During the first step compute KX for K = 1,2,...,N; then compute the sines and cosines during the next CS steps; then compute the products; and finally apply 2.2.2.1. Classification is global if N is large, due to the simultaneous computation of sines and cosines.

2.2.2.14 Neutron Diffusion Equation

Solution of the parabolic partial differential equation is by the alternating direction method. The computation involves computing a tridiagonal matrix and then solving a system of equations using that matrix. Greater concurrency is possible in higher dimensions since the computation involves relaxing one dimension per iteration. For N intervals in the grid in each dimension, there can be an order of N to N2 reduction in concurrent time over sequential time. As in 2.2.2.7, the saving is within an iteration.

2.2.2.15 Neutron Transport Equation

This is an integro-differential equation. For the N group case there are N equations. K is the order of integration used at each step. As with 2.2.2.7 and 2.2.2.14, the saving is within an individual iteration and not in overlapping

iterations. No estimate of local or global control can be made without a specific choice of the integral approximation method.

2.2.2.16 Eigenvalues of a Matrix

On real symmetric matrices there are several procedures where modifications of elementary row operations are the major computation. Thus, the figures for matrix inversion are increased to include the extra computation. Control is essentially the same as in matrix inversion.

2.2.2.17 Summary

The notation used in the summary is defined as follows: Description refers to the previous discussion. An X in the column headed "local" implies more than 50% of the operations could not be systematized to allow global control, or the use of global control would have more than doubled the concurrent time. An X in the column headed "global" implies more than 50% of the operations could be performed such that only one type of operation was performed in a given execution cycle. The "local" and "global" categories are disjoint but not all-encompassing. Sequential time indicates the number of sequential operations required to complete the computation. Concurrent time indicates the minimum number of operation times required to complete the computation when the maximum degree of concurrency is used. Maximum concurrency is a measure of the maximum number of operations that could be performed during a given computation cycle.
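An entry such as 2.2.2.1 (sum of N numbers) can be checked by simulating the tree combination directly; this sketch is ours, and the function name is illustrative. It counts the concurrent steps and the operations performed in each step.

```python
# Sketch: pairwise (tree) summation of N numbers, as in 2.2.2.1.
# Counts the number of concurrent execution cycles and the number of
# operations performed in each cycle.

def tree_sum(values):
    steps = 0
    ops_per_step = []
    while len(values) > 1:
        pairs = [(values[i], values[i + 1])
                 for i in range(0, len(values) - 1, 2)]
        ops_per_step.append(len(pairs))          # concurrent additions
        leftover = [values[-1]] if len(values) % 2 else []
        values = [a + b for a, b in pairs] + leftover
        steps += 1
    return values[0], steps, ops_per_step

total, steps, ops = tree_sum(list(range(1, 9)))  # N = 8
# steps = [log2 8] = 3; ops = [4, 2, 1]; total operations = N-1 = 7
```

The step count matches the [log2N] concurrent time and the total operation count matches the N-1 sequential time claimed in the summary.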

TABLE 1

SUMMARY OF OPERATIONS

DESCRIPTION                          LOCAL  GLOBAL  SEQUENTIAL     CONCURRENT        MAXIMUM
                                                    TIME           TIME              CONCURRENCY
1. Sum or product of N numbers.              X      N-1            [log2N]           N/2
2. Evaluate Nth degree polynomial.                  2N             2[log2N]          N
3. Multiply vector by matrix.                X      2N2            [log2N]+1         N2
4. Multiply two matrices.                    X      2N3            [log2N]+1         N3
5. Matrix inversion.                   X            5N3            3N                2N2
6. Solving system of linear
   equations.                          X            5N2(N+K)       3N                2N(N+K)
7. Solving ordinary differential
   equations.                                X      12N2K          4(4+[log2N])K     N2/2
8. Least square fit.                         X      2NK2+5K3       2N+3K             2K2
9. Compute the Nth prime.                           N[lnN]         N                 N/[lnN]
10. Unordered search.                        X      2[log2N]       4                 2N
11. Sort N elements.                         X      N[log2N]       2N                4N
12. Coefficients of Fourier series.                 (see example on page 49)
13. Evaluating Fourier series.               X      CS·N           [log2N]+CS+2      2N
14. Neutron diffusion equation,
    2-dimensional                                   20N2           6N                4N
    3-dimensional                                   30N3           9N                4N
15. Neutron transport equation.        ?     ?      2N2+KN         2[log2N]+[log2K]  2N2
16. Eigenvalues of a matrix.           X            30N3           18N               2N2

Table 1 summarizes the sequential time, concurrent time, and maximum concurrency possible in the sixteen different operations studied.

2.2.3 Machine Implementation Via a Translation Algorithm

Many computers with the ability to simultaneously perform a number of arithmetic computations have been proposed.13,14,16,17 Some have been manufactured and others are in various stages of development. In most cases, these machines have no predecessors with programming compatibility. This creates a need for many programs to be rewritten and new programs developed. Thus, it is desirable to have an ALGOL-type language available so that people other than "the experts" can gain early access to the machine. The purpose of this paper is to present a method of extending current translation techniques to obtain object programs for multi-processor computers, while keeping the translator straightforward and fast. A specific translation algorithm which has been implemented on an IBM 7090 is used for expository purposes. It will become obvious that there are many variations possible in table structures, scanning methods, and intermediate translations. The purpose here is to present the complete picture of producing code for multiprocessor computers via height assignment rather than to justify the particular implementation.

The translation algorithm can be divided into three basic phases. Phase 1 reduces the input statements to tabular form, phase 2 generates sequences of operations from the tables of phase 1, and phase 3 fills in addresses to generate the translated program.

Phase 1 processing includes reading statements, determining the statement type, and applying the decomposition routine appropriate to that type. Substitution type statements are decomposed into tabular form, quintuple and symbol tables, by the expression scan routine. Iteration and conditional type statements are reduced to control quintuples, generated directly, plus quintuples from using the expression scan on the appropriate parts of the statement. All statements in a program are processed by phase 1 before phase 2 processing begins. At the end of phase 1 the original statements are no longer needed since the sequence of quintuples contains all of the computational information. Information from declarative type statements is preserved in appropriate tables. Figure 31, which will be explained later for a specific example, shows the result of the phase 1 translation.

Phase 2 processing converts each quintuple to machine operations. Each quintuple consists of two operands, an operator, and two pieces of height information. By a table lookup procedure each operator which appears in a quintuple is converted to a sequence of machine operations. The operands and created temporary locations may appear as data addresses for the machine operations. The height information is used to create a merge-link table, Fig. 36, which is used by phase 3 to generate sequencing addresses. Phase 2 processes one block of quintuples (statements), then phase 3 does the final translation on the code for that block. A block is defined as the largest sequence of quintuples which does not contain a label or a conditional jump operator. Intuitively, if any statement in a block is executed, all statements in the block must be executed. Figure 34 shows an example of the result of the phase 2 translation.

Phase 3 processing uses the merge-link table to set up sequencing addresses. The most general translation algorithm is described, assuming an unlimited number of processors are available. By a trivial modification, the results of T. C. Hu18 can be applied in phase 3 to produce code for a computer with any fixed number of processors. In the example described later and in Figure 37, symbolic addresses and operation codes are used for exposition. Sufficient information is known after phase 2 so that completely numeric code could be produced. Figure 39 gives a macroscopic flow diagram of the general translation algorithm.

2.2.3.1 Phase 1. Recognition of Concurrency

The previous section could apply to a translation algorithm for a conventional computer. The significant features needed in the translation algorithm for a multiple processor computer are the use of the block and the assignment of heights to operands and partial results. The translation algorithm uses a one pass scan of a statement applying a set of simple rules. Before giving the rules, an intuitive understanding is more readily obtained from the tree representation of a few statements. Consider the statement w + x + y + z and two of its possible tree representations:

[Tree diagrams: a left-linear tree of height 3 and a balanced tree of height 2 for w + x + y + z.]

Notice that the tree on the left computes w + x, then adds this to y, then adds

the sum to z, obtaining the result in 3 execution cycles. Thus, the height of this tree is 3, as indicated at the far right. Also, only one arithmetic unit was required during each execution cycle. Now compare the tree on the right, which simultaneously computes w + x and y + z during the first execution cycle, then adds the two sums during the second execution cycle. Thus, the same numerical result is obtained, but by having more than one processor, two arithmetic operations could be performed during the same execution cycle.

In the next section all information will be presented as quintuples, which are equivalent ways of representing trees. Each quintuple represents a computation of a partial result, as does each operator in the tree. The quintuples corresponding to the trees above are:

    R1   w   +  x    0  1        R1   w   +  x    0  1
    R2   R1  +  y    1  2        R2   y   +  z    0  1
    R3   R2  +  z    2  3        R3   R1  +  R2   1  2

The first column is an implicit numbering of the quintuples (rows of the table). The next five columns are respectively: operand 1, operator, operand 2, starting height, ending height. The starting height corresponds to the execution cycle during which the operation is performed. The ending height corresponds to the execution cycle in which the partial result represented by a quintuple may be used.

In addition to concurrency within a single statement, there can be concurrency between statements in the same block. There cannot be concurrency between statements of different blocks since the sequencing of blocks can be a function of data and thus is not generally known at translation time.* An example of two blocks and their trees is given below:

    In      A = B + C
            D = B + E
            I = J + K
            X = A - I
    Out     I = I + 1
            Y = B/C

[Tree diagrams for the two blocks, with heights 1 and 2 indicated.]

*Using a combined execution and translation mode of operation suggested by A. J. Perlis, this restriction could be eliminated.

2.2.3.2 Expression Scan

Almost every executable statement will require some use of the expression scan. It is in the expression scan that the algebraic notation of a statement is reduced to quintuples. All recognition of concurrency occurs during this scan. There are a number of cases where the previous discussion must be expanded.
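The height-minimizing pairing that produces the second quintuple table above can be sketched as follows. This is our illustration for a chain of a single commutative operator, not the report's implementation; ties here are broken by list order, whereas the report's rules prefer row references.

```python
# Sketch: build quintuples for a chain of one commutative operator,
# always pairing the two operands of lowest availability height, as
# the expression scan does to minimize tree height.

def build_quintuples(operands, op):
    table = []                               # rows of (op1, op, op2, start, end)
    pool = [(name, 0) for name in operands]  # (operand, availability height)
    while len(pool) > 1:
        pool.sort(key=lambda item: item[1])  # two lowest-height operands
        (a, ha), (b, hb) = pool[0], pool[1]
        start = max(ha, hb)
        table.append((a, op, b, start, start + 1))
        row = "R%d" % len(table)             # row reference replaces operand A
        pool = [(row, start + 1)] + pool[2:]
    return table

quints = build_quintuples(["w", "x", "y", "z"], "+")
# Three quintuples; the last has ending height 2, the balanced tree.
```

For four operands of height zero this yields two height-0 quintuples and one height-1 quintuple, reproducing the two-cycle schedule of the balanced tree.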

These cases include requiring all arguments to be computed before a function call may be executed, compensating the height values if an operation requires more than one execution cycle, etc. These cases will be defined by a set of rules for performing the expression scan. The rules will be more precise and clear than a prose description of each case.

Because function calls and subscription are allowed in expressions, each symbol must have heights associated with it. The occurrence height of a symbol is the largest numbered execution cycle in which the symbol was used. The availability height of a symbol is the execution cycle when the symbol may next be used. In the example, Fig. 31 has the symbol table and heights for each set of quintuples. Constants always have occurrence and availability heights of zero. At the beginning of each block all heights in the symbol table are set to zero.

a. Precedence Scan

The statement is scanned from right to left using a well-known precedence scan.1 The essential features of the precedence scan needed to understand the following discussions are:

1) A simple push down list is established which is denoted hereafter as "LIST." Figure 33 shows the status of the LIST for the example which follows.

2) Precedences are established for operators and punctuation which are not handled as special cases. The relative magnitude of precedences must be such that addition has lower precedence than multiplication. Also, addition and subtraction must have the same precedence, as must multiplication and division.

3) The purpose of the precedence scan is to reduce the input statement to tree form. As was shown earlier, this tree is not always unique, and thus height considerations can be used to generate the tree with lowest height.

The flow diagram of the precedence scan is given in Fig. 40. The references on the flow diagram to 7.1, 7.2, and 7.3 refer to application of the height rules which follow.
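Features 2 and 3 can be sketched as follows (our illustration): a LIST of alternating operands and operators is reduced across the highest-precedence operator first, yielding the tree grouping.

```python
# Sketch of features 2 and 3: addition and subtraction share one
# precedence level, multiplication and division a higher one; the LIST
# is reduced across the highest-precedence operator first.

PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

def reduce_once(LIST):
    # index of the leftmost operator with maximal precedence
    best = max(range(1, len(LIST), 2), key=lambda i: PRECEDENCE[LIST[i]])
    combined = (LIST[best - 1], LIST[best], LIST[best + 1])
    return LIST[:best - 1] + [combined] + LIST[best + 2:]

LIST = ["w", "+", "x", "*", "y"]
while len(LIST) > 1:
    LIST = reduce_once(LIST)
# The multiplication is reduced before the addition.
```

The resulting grouping, w + (x * y), is the tree form the precedence scan would hand to the height rules.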

b. Height Determination Rules

The procedure to be applied at point 7.3 on Fig. 40 is:

1) Scanning the LIST from left to right, select the first two operands such that no other operands have lower height. In comparing two operands of equal height, preference is given to row references over variables or constants. Denote the leftmost operand selected as A and denote the other as B. Do not scan past an operator with precedence different from the leftmost operator on the LIST. The form of the LIST during this scan is alternate operands and operators.

2) If there is a subtraction or division operator immediately to the left of the A operand, then the operator immediately to the left of the B operand is replaced by its dual.

3) If the B operand has lower height than the A operand and the operator is commutative, then the A and B operands are interchanged on the LIST. In comparing two operands of equal height, preference is given to row references over variables or constants.

4) A quintuple is produced consisting of:
   i    the A operand
   ii   the operator immediately to the left of B
   iii  the B operand
   iv   the maximum of the heights of A and B (START)
   v    the maximum of the heights of A and B plus 1 (END)
   Note that either A or B can be variables or row references; thus the height used would be the availability or ending height, respectively.

5) If neither A nor B is a row reference, item iv is the maximum of the height of A and the height of B minus 1. Item v is the new item iv plus 2.

6) If the operator is "=" and B is a row reference, and the starting height

of the last quintuple of the chain of first operands which are row references beginning with B is greater than the occurrence height of A, then item v is set equal to item iv.

7) The quintuple is assigned the next row in the quintuple table. This row reference replaces the A operand on the LIST. The B operand and the operator immediately to its left are removed from the LIST.

8) If the A operand is a variable and its occurrence height is less than item iv, then the occurrence height of A is set equal to item iv. If the B operand is a variable and its occurrence height is less than or equal to item iv, the occurrence height of the B operand is set equal to item iv plus 1.

9) If the operator is "=", the availability and occurrence heights of the A operand are set equal to item v.

10) Repeat 1) through 9) until the precedence of the first operator on the LIST is different from the precedence of the first operator on the LIST before these rules were applied.

The procedure to be applied at point 7.4, unary operator, on Fig. 40 is:

1) The quintuple produced consists of:
   i    a blank operand
   ii   the operator being scanned from the statement
   iii  the first item on the LIST (operand)
   iv   the height of the first item on the LIST
   v    the height of the first item on the LIST plus 1
   Note that the operand may be a variable or a row reference; thus the availability or ending heights, respectively, are used.

2) If the operand is a variable, item v is increased by one. Further, if the operand is a variable and its occurrence height is less than or equal to item iv, the occurrence height of the operand is set equal to item iv plus 1.

The procedure to be applied at point 7.2, function call, of Fig. 40 is:

1) The arguments for the function call are on the LIST, enclosed in parentheses and separated by commas. An argument may be either a variable or a row reference. The arguments are scanned from left to right until the closing parenthesis is encountered. The first quintuple produced consists of:
   i    the name of the function
   ii   the CALL operation
   iii  the return height of the function (next merge number)
   iv   the maximum height of the arguments and function name
   v    the same as iii

2) The second quintuple generated consists of:
   i    the row reference of the first quintuple
   ii   the operation PAR (denoting parameter)
   iii  the first argument
   iv   blank
   v    blank

3) The third and successive quintuples consist of:
   i    the next parameter
   ii   the operation PAR
   iii  the next parameter after item i, if any
   iv   blank
   v    blank

4) For those arguments which are variables and have occurrence height less than or equal to the maximum height of the arguments, their occurrence heights are set to one greater than the maximum height of the arguments.

5) The availability height of the function is set to item v.

The procedure to be applied at point 7.1, subscription, on Fig. 40 is:

1) If there is only one subscript, then a quintuple is generated which consists of:
   i    the variable to be subscripted
   ii   the subscription operation
   iii  the subscript
   iv   the availability height of the subscript
   v    the maximum of iv plus 1 and the availability height of the subscripted variable.

2) If there is more than one subscript, the same procedure as the function call quintuple generation is used. The variable to be subscripted is used in place of the function name, except for the additional condition on item v above. The number of subscripts is used in place of the CALL operation. The subscripts are treated as arguments.

A refinement must be made to the definition of height to allow for variable heights. A variable height is necessary for the returned value from a function call. The number of execution cycles required by a function is not always known and is usually dependent on the values of the arguments. The technique used is to assign merge numbers to each height value. If a maximum availability height is required and the two heights being compared are of different merge numbers, a merge quintuple is generated and the maximum height is given a new merge number. All rules given previously apply directly to sets of heights with the same merge number. The merge number is set to zero at the beginning of each block.

2.2.3.3 Phase 2 of Translation Algorithm

After all statements have been processed by phase 1, the generation of machine code from the quintuples begins. Phase 2 is essentially a table lookup procedure. For each operation that may appear in a quintuple there is a sequence of machine instructions to be inserted. Either or both symbolic and numeric machine instructions can be generated. Each quintuple is examined and the corresponding sequence of machine instructions is located. Flags set from previous sequences can be used to recognize instructions which may be deleted or changed to produce more efficient code.
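The merge-link table described below can be sketched as a counter keyed by (merge number, height) pairs; the names in this illustration are ours.

```python
from collections import defaultdict

# Sketch: the merge-link table counts the instructions generated for
# each (merge number, height) pair; phase 3 uses these counts to
# generate sequencing addresses.

merge_link = defaultdict(int)   # cleared at the beginning of each block

def emit(instruction, merge, height):
    merge_link[(merge, height)] += 1
    return (instruction, merge, height)

code = [emit("ADD  T1", 0, 0),
        emit("ADD  T2", 0, 0),
        emit("ADD  T3", 0, 1)]
# Two instructions execute in cycle 0 and one successor in cycle 1.
```

Clearing the table at each block boundary mirrors the report's rule that concurrency is never carried across blocks.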

This process is very dependent on the specific machine* for which the object code is being prepared. Insertion of one or both operands from a quintuple is made as designated by flags in the sequence of instructions. In case a row reference is to be inserted, a temporary location is assigned. As each temporary is assigned and used, pertinent height information is retained. In phase 3 of the translation, temporaries are reused as much as the logic of the program permits.
As each instruction is generated, its position in the execution sequence is computable from the height information in the quintuple. For each instruction produced, an entry in the merge-link table is augmented. The merge-link table consists of entries which count the number of instructions for each (merge number, height) pair. The merge-link table is cleared at the beginning of each block.
2.2.3.4 Phase 3 of Translation Algorithm
When all quintuples in a block have been processed, phase 3 begins a scan of the instructions generated by phase 2. Using the information in the merge-link table, a successor address is appended to each instruction. Each instruction is transferred from phase 2 to phase 3 with its execution number in the form of a (merge, height, number) triple. If the merge-link table has a positive count in the entry for the same merge number and one greater height, then a successor address is generated and inserted. If there is no successor for a given instruction, a zero is inserted as the successor address. Each time a

*A specific set of instructions is presented in the example.
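The merge-link bookkeeping can be sketched as a counter per (merge number, height) pair. The class and method names below are illustrative assumptions, not the report's code:

```python
from collections import defaultdict

class MergeLinkTable:
    """Counts the instructions generated for each (merge number, height)
    pair; recreated (cleared) at the beginning of each block."""
    def __init__(self):
        self.count = defaultdict(int)

    def record(self, merge, height):
        # Phase 2: augment the entry as each instruction is produced.
        self.count[(merge, height)] += 1

    def take_successor(self, merge, height):
        # Phase 3: a positive count at (merge, height+1) yields a successor
        # address; each nonzero successor decreases that count by 1.
        if self.count[(merge, height + 1)] > 0:
            self.count[(merge, height + 1)] -= 1
            return True
        return False

mlt = MergeLinkTable()
for h, n in enumerate([14, 10, 8]):   # e.g. merge 0, heights 0-2, as in Fig. 36
    for _ in range(n):
        mlt.record(0, h)
```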

nonzero successor address is generated, the corresponding count in the merge-link table is decreased by 1.
The condition may arise that the last instruction of a given merge number and height is encountered while the count in the merge-link table for the same merge number and one greater height is greater than 1. In this case, an indirect address is generated and placed in the successor address of the instruction. The required number of indirect address words is generated to form a tree. The tree begins with the indirect successor address and terminates at the remaining locations indicated by the count in the merge-link table.
If the last entry in a row of the merge-link table is a merge designation, an indirect successor address tree is generated. The tree begins with the last instruction for that merge row and terminates at all the instructions of the merge designation with height zero. The beginning of a block is a special case of the above: each zero-merge-number, zero-height instruction is at the termination of an indirect successor address tree which begins with the final instruction of the previous block.
2.2.3.5 Machine Instructions
Continuing with the philosophy of presenting a specific example, the following instructions were chosen to implement the final phase of the translation. In all cases γ is the location of an instruction to be executed during the next execution cycle. The next execution cycle does not begin until all instructions being executed have been completed. A zero for a γ address indicates an instruction has no successor.
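The indirect-address fan-out can be sketched as grouping the target addresses four at a time until a single root word remains. This is a model of the four-to-one expansion, with the function name and list representation as assumptions:

```python
def build_indadr_tree(targets):
    """Group target addresses into a 4-ary tree of indirect-address words.
    Returns (root, words): the nested grouping, and the number of indirect
    words generated to reach all targets from one successor address."""
    level, words = list(targets), 0
    if len(level) <= 1:
        return (level[0] if level else None), 0   # no indirection needed
    while len(level) > 1:
        # Each indirect word fans out to (up to) four addresses.
        level = [level[i:i + 4] for i in range(0, len(level), 4)]
        words += len(level)
    return level[0], words

# Fourteen height-zero targets (like S0001'...S0014' in Fig. 37) need one
# root word plus four second-level words.
root, words = build_indadr_tree(["S%04d" % i for i in range(1, 15)])
```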

List of Operations

Operation  Addresses  Description
ADD        α,β,γ      Add contents of α to contents of β and place result in α.
SUBTR      α,β,γ      Subtract contents of β from α and place result in α.
MULT       α,β,γ      Multiply contents of α by contents of β and place result in α.
DIVIDE     α,β,γ      Divide contents of α by contents of β and place result in α.
LOAD       α,β,γ      Replace contents of α by contents of β.
LOADA      α,β,γ      The address, β, replaces the address portion of the contents of α.
ADDRM      α,β,γ      Address modification: the address, α, is added to the contents of β and the result is retained as an address until the next execution of this instruction. If this instruction is referred to via an indirect address, the retained address is used as the effective address.
COUNT      α,β,γ      One is added to α; if this sum equals β, then α is set to zero and γ is interpreted normally. If the sum is unequal to β, then the sum replaces α and γ is treated as a zero for the current execution cycle.
INDADR                A pseudo-operation which generates a tree of indirect addresses with a four-to-one expansion on each level of the tree.
SYN                   A pseudo-operation which appears only in symbolic code to indicate several variable names refer to the same machine location.
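The COUNT operation is the synchronizer behind the merge quintuples, so its semantics are worth stating executably. An illustrative model (the report defines the instruction, not this code; returning 0 stands for a suppressed successor):

```python
def count_step(alpha, beta, gamma):
    """One execution of COUNT alpha, beta, gamma: returns the new counter
    value and the effective successor address for this cycle."""
    s = alpha + 1
    if s == beta:
        return 0, gamma     # all beta converging sequences have arrived
    return s, 0             # not yet: gamma treated as zero this cycle

# Two converging sequences (beta = 2), as in the merge instructions of
# Figs. 37 and 38: the second arrival releases the successor.
a, nxt1 = count_step(0, 2, "S1101")
a, nxt2 = count_step(a, 2, "S1101")
```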

A = A*(B+C*C+D)/(B+C)/D

Row  Operand  Operator  Operand  Start  End
R1   B        +         C        0      2
R2   C        *         C        0      2
R3   B        +         D        0      2
R4   R3       +         R2       2      3
R5   A        /         D        0      2
R6   R5       /         R1       2      3
R7   R6       *         R4       3      4
R8   A        =         R7       4      4

B = (C*B-D-E)/D*E+A

R9   C        *         B        0      2
R10  D        +         E        0      2
R11  R9       -         R10      2      3
R12  D        /         E        0      2
R13  R11      /         R12      3      4
R14  R13      +         A        4      5
R15  B        =         R14      5      5

A = 3+C

R16  3        +         C        0      2
R17  A        =         R16      4      5

[The symbol-table entries (Occur. and Avail. heights for A, B, C, D, E) shown alongside in the original are illegible in this copy.]

Fig. 31. Example of phase 1 of translation (quintuple generation).

C(I) = F(D,E(I),I+J)+G(A(J))*C(J)

Row  Operand  Operator  Operand
R18  C        sub       J
R19  A        sub       J
R20  G        CALL
R21  R20      PAR       R19
R22           MERGE
R23  R20      *         R18
R24  I        +         J
R25  E        sub       I
R26  F        CALL
R27  R26      PAR       D
R28  R25      PAR       R24
R29           MERGE
R30  R26      +         R23
R31  C        sub       I
R32  R31      =         R30

C(I,A+J) = G(I)

R33  G        CALL
R34  R33      PAR       I
R35  A        +         J
R36  Φ2       CALL
R37  R36      PAR       I
R38  R35      PAR
R39           MERGE
R40  R36      =         R33

[Here "sub" stands for the subscription operation. The Start/End heights and symbol-table entries of the original are illegible in this copy.]

Fig. 32. Example of phase 1 of translation (quintuple generation concluded).

[Figure content illegible in this copy: it shows the contents of the LIST immediately before each of quintuples R1-R40 is generated.]

Fig. 33. Status of list before each quintuple is generated. (Circled items are triggers for generation of quintuples.)

Execution number          Operation Code    Result
(merge, height, cycle)    & Operands

A = A*(B+C*C+D)/(B+C)/D
0,0,1    LOAD    T1,B
0,1,1    ADD     T1,C      R1
0,0,2    LOAD    T2,C
0,1,2    MULT    T2,C      R2
0,0,3    LOAD    T3,B
0,1,3    ADD     T3,D      R3
0,2,1    ADD     T3,T2     R4
0,0,4    LOAD    T5,A
0,1,4    DIVIDE  T5,D      R5
0,2,2    DIVIDE  T5,T1     R6
0,3,1    MULT    T5,T3     R7
         CHANGE  T5,A      R8
B = (C*B-D-E)/D*E+A
0,0,5    LOAD    T9,C
0,1,5    MULT    T9,B      R9
0,0,6    LOAD    T10,D
0,1,6    ADD     T10,E     R10
0,2,3    SUBTR   T9,T10    R11
0,0,7    LOAD    T12,D
0,1,7    DIVIDE  T12,E     R12
0,3,2    DIVIDE  T9,T12    R13
0,4,1    ADD     T9,A      R14
         CHANGE  T9,B      R15
A = 3+C
0,0,8    LOAD    T16,3
0,1,8    ADD     T16,C     R16
0,4,2    LOAD    A,T16     R17

Fig. 34. Example of phase 2 of translation (preliminary code generation).

Execution number          Operation Code    Result
(merge, height, cycle)    & Operands

C(I) = F(D,E(I),I+J)+G(A(J))*C(J)
0,0,9         ADDRM   C,J              R18
0,0,10        ADDRM   A,J              R19
0,5,1         CALL    G',(1,0,1)       R20
0,5,2         LOADA   G+1',T20         R21
0,5,3         LOADA   G+2',(0,0,10)'
1,0,1-0,1,9   MERGE   0,2,(1,1,1)      R22
1,1,1         MULT    T20,(0,0,9)'     R23
0,0,11        LOAD    T24,I
0,1,10        ADD     T24,J            R24
0,0,12        ADDRM   E,I              R25
0,2,4         CALL    F',(2,0,1)       R26
0,2,5         LOADA   F+1',T26         R27
0,2,6         LOADA   F+2',D
0,2,7         LOADA   F+3',(0,0,12)'   R28
0,2,8         LOADA   F+4',T24
2,0,1-1,2,1   MERGE   0,2,(2,1,1)      R29
2,1,1         ADD     T26,T20          R30
0,0,13        ADDRM   C,I              R31
2,2,1         LOAD    (0,0,13)',T26    R32
C(I,A+J) = G(I)
1,1,2         CALL    G',(3,0,1)       R33
1,1,3         LOADA   G+1',T33         R34
1,1,4         LOADA   G+2',I
0,0,14        LOAD    T35,J
0,5,4         ADD     T35,A            R35
0,6,1         CALL    Φ2',(4,0,1)      R36
0,6,2         LOADA   Φ2+1',T36
0,6,3         LOADA   Φ2+2',I          R37
0,6,4         LOADA   Φ2+3',T35        R38
4,0,1-3,0,1   MERGE   0,2,(4,1,1)      R39
4,1,1         LOAD    T36',T33         R40

Fig. 35. Example of phase 2 of translation (preliminary code generation concluded).

Merge-link table (instruction count per height)

         Height: 0   1   2   3   4   5   6   7
Merge 0         14  10   8   2   2   4   4   0
Merge 1          1   4   1
Merge 2          1   1   1
Merge 3          1
Merge 4          1   1   0

[The "maximum number of execution cycles per merge" column of the original is illegible in this copy.]

Fig. 36. Merge-link table at end of phase 2.

Loading    Operation Code,
Address    Operand, Operand, Successor

S0000   INDADR   S0001'...S0014'
S0001   LOAD     T1,B,S0101
S0101   ADD      T1,C,S0201
S0002   LOAD     T2,C,S0102
S0102   MULT     T2,C,S0202
S0003   LOAD     T3,B,S0103
S0103   ADD      T3,D,S0203
S0201   ADD      T3,T2,S0301
S0004   LOAD     A,A,S0104
S0104   DIVIDE   A,D,S0204
S0202   DIVIDE   A,T1,S0302
S0301   MULT     A,T3,S0401
S0005   LOAD     T9,C,S0105
S0105   MULT     T9,B,S0205
S0006   LOAD     T10,D,S0106
S0106   ADD      T10,E,S0206
S0203   SUBTR    T9,T10,0
S0007   LOAD     T12,D,S0107
S0107   DIVIDE   T12,E,S0207
S0302   DIVIDE   B,T12,S0402
S0401   ADD      B,A,S0501
S0008   LOAD     T16,3,S0108
S0108   ADD      T16,C,S0208
S0402   LOAD     A,T16,S0500
S0500   INDADR   S0502'...S0504'
S0009   ADDRM    C,J,S0109
S0010   ADDRM    A,J,S0110
S0501   LOADA    G',S1001,G
S0502   LOADA    G+1',T20,0
S0503   LOADA    G+2',S0010',0
S0109   SYN      S1001
S1001   COUNT    0,2,S1101
S1101   MULT     T20,S0009',S1201

Fig. 37. Final program after phase 3 of translation (symbolic and/or numeric).

Loading    Operation Code,
Address    Operand, Operand, Successor

S0011   LOAD     T24,I,S0110
S0110   ADD      T24,J,0
S0012   ADDRM    E,I,0
S0204   LOADA    F',S2001,F
S0205   LOADA    F+1',T26,0
S0206   LOADA    F+2',D,0
S0207   LOADA    F+3',S0012',0
S0208   LOADA    F+4',T24,0
S1201   SYN      S2001
S2001   COUNT    0,2,S2101
S2101   ADD      T26,T20,S2201
S0013   ADDRM    C,I,0
S2201   LOAD     S0013',T26,S2301
S1102   LOADA    G',S3001,G
S1103   LOADA    G+1',T33,0
S1104   LOADA    G+2',I,0
S0014   LOAD     T35,J,0
S0504   ADD      T35,A,S0604
S0601   LOADA    Φ2',S4001,Φ2
S0602   LOADA    Φ2+1',T36,S0701
S0603   LOADA    Φ2+2',I,0
S0604   LOADA    Φ2+3',T35,0
S3001   SYN      S4001
S4001   COUNT    0,2,S4101
S4101   LOAD     T36',T33,S4201
S0701   SYN      S9999
S2301   SYN      S9999
S4201   SYN      S9999
S9999   COUNT    0,3,—     Entry to next block

Fig. 38. Final program after phase 3 of translation (concluded).

[Flow diagram; the legible portions read:] Start → initialization → read next statement. If the end of the program is reached, produce the storage assignment and end. If the statement is an internally generated or block-end label, re-initialize the height and produce a block-end quintuple. Otherwise determine the statement type and enter the decomposition routines:
1. Augment tables with information in declarations.
2. Generate control quintuples and use the expression scan to reduce all executable statements to quintuple form.
3. Add new symbols to the symbol table as found.
Do post-processing for iteration terminations and merge points if applicable. For each block: generate the phase 2 code, produce the merge-link table, generate the final (phase 3) code, then reset for the next block.

Fig. 39. General block diagram of translator.

[Flow diagram; the legible portions read:] The scan examines each incoming item Ri in turn. If Ri is a variable name, a constant, or the character ⊢, Ri is added to the list. If Ri is a unary operator, a quintuple is generated using Ri and the first item on the list, and that item is deleted. If Ri is the character ")": when the matching "(" on the list is preceded by a variable, a subscription quintuple is generated (7.1; Ri's variable is the one being subscripted) or a function-call quintuple (7.2; the variable is the name of the function); otherwise the redundant parentheses are removed from the list. If the precedence of Ri exceeds the precedence of the list, Ri is added to the list; otherwise quintuples are produced according to the general rule using height considerations, considering only operators up to the first operator of different height, and the precedence of the list is updated. The scan ends at the end character ⊣.

Fig. 40. Expression scan flow diagram.

The approach taken in this section has been considerably different from the sequencing procedures of Schwartz15 and others. In evaluating the translation technique presented here, the following points are noteworthy: (1) The translation algorithm does not vary radically from several translators now in use; thus past experience can be utilized to keep development time to a minimum. (2) The translation is fast, since statements and quintuples are each scanned only once. (3) The object programs are optimal in the sense that no faster code could be produced within the limits of the input language.

2.2.4 An Augmented Language to Permit More Concurrence in Processing

It was indicated in 2.2.3 that a number of computers are being developed which have facilities for doing concurrent arithmetic operations. It was also shown that an augmented language is needed if machine translation from an ALGOL-type language is to approach the efficiency of object programs obtainable by programmers. There are a number of ALGOL implementations on conventional computers, and ALGOL is a fairly well accepted publication language for algorithms. Since there is no established language of the algebraic type for these computers, it is time to introduce concurrency into ALGOL. Although the following additions are directed at a hardware representation, they conform to the seven guidelines for the reference language.

A set of additions is proposed below which apply to the ALGOL 60 report.26 For lack of a better name, these additions are called the Concurrent Report on Algorithmic Language, CALGOL 60. The additions below appear in the order which applies to the report.26 The notation ... indicates: repeat the same information as in the report. The numerical headings in the following refer to the original report.

2.5. DELIMITERS

< operator > ::= ... | < concurrent operator >
< concurrent operator > ::= &
< sequential operator > ::= ... | commence if | halt

4.1.1 Statement Syntax

< statement > ::= ... | < concurrent statement > | < commence statement > | < halt statement >

4.8 Concurrent Statements

4.8.1 Syntax

< concurrent statement > ::= < statement > < concurrent operator > < unlabelled statement > | < concurrent statement > < concurrent operator > < unlabelled statement >

< unlabelled statement > ::= < unlabelled basic statement > | < for statement > | < unlabelled block > | < unlabelled compound > | < if clause > < unconditional statement > | < if clause > < unconditional statement > else < statement >

4.8.2 Examples

C := A+B & for q := 1 step S until n do X[q] := B[q]

D := sin(x) & E := cos(x) & begin z := 0; f := sqrt(x); if z > f then z := 2 × f + x end

4.8.3 Semantics

The & indicates that all connected statements may begin execution when the first statement begins execution. It is implied that the computational order of the connected statements is irrelevant to the overall computation of the algorithm. The successor statement is not executed until all connected statements have completed execution. See 4.9 for a method of commencing execution before the connected statements complete execution.

4.9 Commence Statements

4.9.1 Syntax

< commence statement > ::= commence if < Boolean expression > | < commence statement > : < label >

4.9.2 Examples

commence if p ∨ q
commence if x > y: S1: S7: S12

4.9.3 Semantics

If no labels follow the Boolean expression, the next statement in sequence is executed whenever the Boolean expression has value true. If labels are present, all statements so designated are executed whenever the Boolean expression has value true, and the next statement in sequence is not executed.
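The semantics of & in 4.8.3 (all connected statements may begin together; the successor waits for all) map naturally onto threads. A sketch in modern terms, not part of the proposal itself; `run_concurrent` is an assumed name:

```python
import threading

def run_concurrent(*statements):
    """Execute the connected statements as 4.8.3 prescribes: all may begin
    when the first begins, and control does not pass to the successor
    statement until every connected statement has completed."""
    threads = [threading.Thread(target=s) for s in statements]
    for t in threads:
        t.start()   # all connected statements may begin together
    for t in threads:
        t.join()    # the successor statement waits for all of them

results = {}
A, B = 1, 2
run_concurrent(
    lambda: results.__setitem__("C", A + B),                        # C := A+B
    lambda: results.__setitem__("X", [q * q for q in range(1, 5)]), # a for loop
)
# Both results are guaranteed present once run_concurrent returns.
```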

4.10 Halt Statement

4.10.1 Syntax

< halt statement > ::= halt

4.10.2 Example

halt

4.10.3 Semantics

A sequence of computation is terminated by inserting the statement halt. It is assumed some other sequence is still in the process of computing.

5. Declarations

< declaration > ::= ... | < independent declaration >

5.5 Independent Declaration

5.5.1 Syntax

< independent declaration > ::= independent < type list > | independent < type declaration > | independent < array declaration >

5.5.2 Examples

independent A,B,C
independent own integer array D[5:n]

5.5.3 Semantics

Declaring a variable or array to be independent allows execution of instructions relating to the variable or array in any order. Each occurrence is considered as a distinct identification. In terms of subscripted variables, each subscript is distinct. In terms of arguments in procedure calls, the arguments declared to be independent are not modified by the procedure. The usual conventions about the scope of a declaration within a block apply.

Example

procedure sfm (f, n, time); value time; integer n; independent integer array f[1:2↑n];
comment sfm stands for switching function minimization. Two procedures will be initiated simultaneously, one using Quine's method and another using Ashenhurst's decomposition method. There is a maximum time limit for the minimization, denoted by the argument "time" in seconds. If either minimization finishes before the allowed time is exceeded, that routine will set its indicator. The corresponding result is then moved into the f region and return is made to the caller. Note that the WAITING indicator is necessary to prevent reactivation of the commence instruction. In case of a tie, the Quine results are used. If time is up before either routine finishes, the result remains the same as the input. Local storage and variables as well as the procedure are given below. The routines of Quine and Ashenhurst are conventional procedures which set Q and A respectively to true before returning to the caller. The independence declaration is used for f, g, h, and i to allow simultaneous transfer of results into the f region;
begin independent integer array g,h[1:2↑n];

Boolean T,Q,A,WAITING; independent integer i;
WAITING := true; T := Q := A := false;
Quine (f,n,g,Q) & Ashenhurst (f,n,h,A) & T := timeup (time);
halt;
commence if WAITING ∧ (T ∨ Q ∨ A);
WAITING := false;
if Q then for i := 1 step 1 until 2↑n do f[i] := g[i]
else if A then for i := 1 step 1 until 2↑n do f[i] := h[i];
end sfm.

Example for Algorithm 7

Problem: Given an n-loop network of R, L, C fixed with respect to time, and voltage sources which vary as functions of time (assuming the network quiescent at t = 0), find all loop currents as functions of time.

Numerical Solution: Using Kirchhoff's law around each loop, n second-order differential equations are obtained. These equations are of the form shown below:

Q1'' = a11 Q1' + a12 Q2' + ... + a1n Qn' + b11 Q1 + b12 Q2 + ... + b1n Qn + f1(t)
Q2'' = a21 Q1' + a22 Q2' + ... + a2n Qn' + b21 Q1 + b22 Q2 + ... + b2n Qn + f2(t)
. . .
Qn'' = an1 Q1' + an2 Q2' + ... + ann Qn' + bn1 Q1 + bn2 Q2 + ... + bnn Qn + fn(t)

The a's and b's are constants depending only on the values of the RLC components of the network. f1(t), ..., fn(t) must be evaluated at each time step. The initial conditions are Q1 = Q2 = ... = Qn = 0.

The algorithm for the numerical solution for the loop currents Q1, ..., Qn is given below. (The notation jQi means the value of Qi computed during the jth time step, i.e., t = j × H, where H is the basic time step.)

iK1 = H × Qi''(t, jQi, jQi');  n+iK1 = H × jQi'                    do for i = 1, n
t = t + H/2
j+1/4Qi' = jQi' + iK1/2;  j+1/4Qi = jQi + n+iK1/2                  do for i = 1, n
iK2 = H × Qi''(t, j+1/4Qi, j+1/4Qi');  n+iK2 = H × j+1/4Qi'        do for i = 1, n
j+1/2Qi' = jQi' + iK2/2;  j+1/2Qi = jQi + n+iK2/2                  do for i = 1, n
iK3 = H × Qi''(t, j+1/2Qi, j+1/2Qi');  n+iK3 = H × j+1/2Qi'        do for i = 1, n
t = t + H/2
j+3/4Qi' = jQi' + iK3;  j+3/4Qi = jQi + n+iK3                      do for i = 1, n
iK4 = H × Qi''(t, j+3/4Qi, j+3/4Qi');  n+iK4 = H × j+3/4Qi'        do for i = 1, n
j+1Qi' = jQi' + (iK1 + 2×iK2 + 2×iK3 + iK4)/6                      do for i = 1, n
j+1Qi = jQi + (n+iK1 + 2×n+iK2 + 2×n+iK3 + n+iK4)/6

Go to the first step of the algorithm until m repetitions have been completed. The solutions are then 1Qi, 2Qi, ..., mQi for i = 1, n.
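The quarter-step scheme above is the classical fourth-order Runge-Kutta method applied componentwise to Q'' = f(t, Q, Q'). A sketch for a single equation (the function and variable names are illustrative; the loop-current system applies this per component):

```python
import math

def rk4_second_order(f, t, q, qp, h, m):
    """Advance Q'' = f(t, Q, Q') through m steps of size h, carrying the
    pairs (iK, n+iK) of the text as (k, l): k increments Q', l increments Q."""
    for _ in range(m):
        k1 = h * f(t, q, qp);                      l1 = h * qp
        k2 = h * f(t + h/2, q + l1/2, qp + k1/2);  l2 = h * (qp + k1/2)
        k3 = h * f(t + h/2, q + l2/2, qp + k2/2);  l3 = h * (qp + k2/2)
        k4 = h * f(t + h, q + l3, qp + k3);        l4 = h * (qp + k3)
        q  += (l1 + 2*l2 + 2*l3 + l4) / 6
        qp += (k1 + 2*k2 + 2*k3 + k4) / 6
        t  += h
    return t, q, qp

# Check against Q'' = -Q with Q(0) = 0, Q'(0) = 1, whose solution is sin t.
t, q, qp = rk4_second_order(lambda t, q, qp: -q, 0.0, 0.0, 1.0, 0.01, 100)
```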

[Figure residue; the legible portions read:] Case: 20 second-order differential equations. The busywork of evaluating f1(H), ..., fn(H) is not suitable for global control; the k1, k2, k3, k4 stages show a decreasing number of active elements at each step (possible global control); 2n multiplies occur in the parallel sequence. Roughly 37 steps per iteration in the parallel sequence, about 4(4 + log n), versus at least 4.5n² in a single sequence, i.e., 4800 — a ratio of over 100:1 for the 20 second-order differential equations.

Example for Algorithm 14

Parabolic Partial Differential Equation (General Diffusion Equation)

∂T(x̄,t)/∂t = ∇·[D(t,x̄)∇T(x̄,t)] + W(x̄,t)

1.  ∂T/∂t = α(∂²T/∂x² + ∂²T/∂y²)

    T(0,y,t) = Ta,  T(Lx,y,t) = Tb,  T(x,0,t) = Tc,  T(x,Ly,t) = Td,
    T(x,y,0) = Te   (supplied boundary conditions)

Writing nTi,j for the value of T at mesh point (i,j) at time level n, the alternating-direction difference equations are

2.a.  (n+1Ti,j - nTi,j)/Δt = α[(n+1Ti-1,j - 2 n+1Ti,j + n+1Ti+1,j)/(Δx)² + (nTi,j-1 - 2 nTi,j + nTi,j+1)/(Δy)²]

2.b.  (n+2Ti,j - n+1Ti,j)/Δt = α[(n+1Ti-1,j - 2 n+1Ti,j + n+1Ti+1,j)/(Δx)² + (n+2Ti,j-1 - 2 n+2Ti,j + n+2Ti,j+1)/(Δy)²]

i = 1, 2, ..., N in the x direction, Δx·N = Lx
j = 1, 2, ..., M in the y direction, Δy·M = Ly
n = 1, 2, ..., T in the temporal direction, Δt·T = total time of observation

2.c.  Let S = Δx/Δy and P = Δx·Δy/(α·Δt).

2.d.  n+1Ti-1,j - (2 + PS) n+1Ti,j + n+1Ti+1,j = -Dx

where

Dx = S² nTi,j-1 - (2S² - PS) nTi,j + S² nTi,j+1 + { n+1(Ta)j if i = 1; n+1(Tb)j if i = N; 0 otherwise }

and similarly for the second half-step

n+2Ti,j-1 - (2 + P/S) n+2Ti,j + n+2Ti,j+1 = -Dy

Dy = (1/S²) n+1Ti-1,j - (2/S² - P/S) n+1Ti,j + (1/S²) n+1Ti+1,j + { n+2(Tc)i if j = 1; n+2(Td)i if j = M; 0 otherwise }

The n+1st step is a tri-diagonal N × N system in the x subscripts for each of the M values of j; the n+2nd step is the same, an M × M system in the y subscripts for each of the N values of i, solved by Gaussian elimination and back substitution.

Algorithm for Alternating Method of Solving Parabolic Partial Differential Equations

1. Initialize mesh.
2. Compute matrix Dx — N × M elements.
3. Solve systems of equations — tri-diagonal, 3N steps.
4. Compute matrix Dy — M × N elements.
5. Solve systems of equations — tri-diagonal, 3M steps.
6. Test convergence — M × N elements.
7. Go to 2.

Evaluating Degree of Parallelism

Sequential operation counts per double time step: compute Dx, 5NM; solve the x systems, 2.2NM; compute Dy, 5NM; solve the y systems, 2.2MN; test, 3MN. In parallel, 3(M+N+3) execution cycles are required per double-length time step; sequentially, 21MN operations. If M = N, the parallel time is of order N times the number of steps,
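The tri-diagonal solves in steps 3 and 5 are the standard Gaussian elimination plus back substitution (roughly 3N operations per system). A sketch with assumed names, not the report's code:

```python
def tridiag_solve(a, b, c, d):
    """Solve a tri-diagonal system: a is the sub-diagonal (a[0] unused),
    b the diagonal, c the super-diagonal (c[-1] unused), d the right side.
    Forward elimination then back substitution -- about 3N steps."""
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):                       # forward elimination
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):              # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

# A 3x3 system whose exact solution is x = (1, 1, 1):
x = tridiag_solve([0.0, 1.0, 1.0], [2.0, 2.0, 2.0], [1.0, 1.0, 0.0], [3.0, 4.0, 3.0])
```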

while the sequential time is of order N² times the number of steps.

If the last entry in a row of the merge-link table is a zero, the successor address of the last instruction of that merge number is set to the address of the final instruction of the block. The final instruction of the block is a count instruction which causes execution of the next block when all merge sequences of the current block have executed. The use of a final merge instruction prevents subsequent blocks from beginning execution before the current block is finished. A secondary naming of temporaries is performed in phase 3. Conventional methods of minimizing temporaries are applicable, but only within sets of instructions which have the same merge number.

2.2.4.1 Limitations and Potential Refinements

It is easy to verify that the object program is as efficient in execution time as the input language allows. The precedence scan of a statement determines the order in which operations are to be performed within a permutation of operators of equal precedence. The height scan determines the order of these operators to obtain the fastest computation. By making the height cumulative within a block, every operation within a block is assigned to execute at the earliest possible instruction time. Arbitrary simultaneity between blocks is impossible with an ALGOL-type input language, since the sequence of blocks is, in general, a function of data and, as such, not specified in the input language.

By a limitation of the source language is meant that a general rule with a finite number of exceptions has not been found to detect a useful property. For example, in the translation algorithm given here, all occurrences of a subscripted

variable were deemed to refer to the same variable. This method was used since arbitrary expressions can be used as subscripts, and there is no general method of determining whether two expressions compute the same value without knowing the values of all the variables. A similar example is the inability to decide whether two blocks can run concurrently or not.

A restriction was made for the expression scan as presented. A function in an expression was assumed to leave the values of its arguments unchanged. Further, it was assumed a function would not use, or would preserve within itself, values of arguments needed more than one execution cycle after its entry. These assumptions allow the availability height of the arguments to be unchanged and the occurrence height of the arguments to be one greater than the function call. For subroutine calls which are for the purpose of computing values for arguments, a separate statement using a single call must be used. Heights of arguments and quintuples are computed as though new values for the arguments will be available upon return from the subroutine.

The algorithm is complete in the sense that special statements for input-output and tape or disc operations just generate subroutine calls. Storage allocation and/or isolation is essentially the same as on single-processor computers. Techniques suitable for current translators should extend with minor modification to this multiprocessor translation scheme. Heuristics which sometimes but not always give advantages are purposely omitted from this text. These can be useful and should be considered in the light of a specific implementation of a translation algorithm. A minor detail is that redundant parentheses are respected. This is by convention rather than a limitation, since in some numerical computation the order of performing arithmetic can prevent overflow and

excessive roundoff error conditions.

Object programs from a translator as described here are far less efficient than a coding by a good programmer. This great difference in efficiency indicates a need for research on languages for multiprocessor computers. The problem areas include recognizing when an iteration can be reduced to a number of independent noniterative sequences, as well as the ability to quickly recognize block separations. It seems reasonable to add additional statements to an ALGOL-type language so that the programmer could supply information about potential concurrency. The difficulty is to determine what information the programmer can supply easily and what information is needed by the translator.

2.2.5 Example of Application of the Translation Algorithm

To clarify and expand on the previous discussion, an example which uses most of the features of the algorithm is presented. In discussing the example, special cases will be mentioned only at their first occurrence. The example is meant to cover many cases in a relatively short space. Rather than give a prolific explanation of the example, tables are given as they would appear during translation.

Figures 31 and 32 show five statements as they would be given to the translator. For each statement the quintuples and part of the symbol table are given. These tables are as they appear at the end of processing the statement. The column entitled ROW in the quintuple table is for the reader only and exists implicitly during computer translation. The operands are denoted symbolically by variable names, constants, or row references. Row references are of the form

R number. During computer translation, operands in the quintuple table would be of the form S number or C number, where the numbers refer to the position in the symbol or constant table. The dotted lines in the columns for heights (START, END, OCCUR, AVAIL) separate the merge number from the height in that merge number. The merge number, which is to the left of the dotted line, is left blank if it is zero.

For quintuples, the starting height, START, indicates the machine execution cycle during which computation of an intermediate result begins. The ending height, END, indicates the first machine execution cycle during which the intermediate result may be used. For variables in the symbol table, the availability height, AVAIL, indicates the first machine execution cycle during which the symbol may be used. The occurrence height, OCCUR, indicates the highest numbered machine execution cycle in which a symbol has been used to date.

Figure 33 is discussed in conjunction with Fig. 32 because it shows the status of the LIST before each quintuple is generated. The two additional symbols ⊢ and ⊣ appearing in Fig. 33 denote the left and right termination operators and are inserted by the translator before processing of an expression. The set of precedences assigned to operators and punctuation for this example is:

Operator     Precedence
⊢ ⊣          0
( )          1
, =          2
+ -          3
* /          4

Figure 40 is the flow diagram for the precedence scan used in this example. The first line of Fig. 33 shows operands and operators added to the LIST

by the normal rules of the precedence scan. The "(" is recognized as having lower precedence than the leftmost element, "+", on the LIST. There is no choice of ordering operands in the general height scan which is applied to "B + C". The quintuple is generated with designation R1, implying the first quintuple of the block. The starting height of R1 is zero, since the AVAIL heights of both B and C are zero. The OCCUR height of C is set to 1 as per the height scan rules. The END height of R1 is 2 because the operation requires one execution cycle, and neither operand is a row reference, thus requiring another execution cycle to load a temporary.

The height selection is illustrated by the third LIST of Fig. 33. The partial result "B + R2 + D" is to be converted to quintuples. "B" and "D" have lower heights than R2, thus the R3 quintuple "B + D" is generated. The row reference, R3, replaces the "B" on the LIST, and "+D" is deleted from the LIST. The partial result "R3 + R2" is denoted by R4. The START height is the maximum of the END heights of R2 and R3, i.e., 2, and the END height is one greater.

The last LIST in the first group of Fig. 33 indicates a substitution quintuple is to be generated. Since the chain of row references R7, R6, R5 ends with R5, which has a greater starting height than the occurrence height of A, the value for A will be computed by R7. The START and END heights of R8 are thus the same as the END height of R7. The OCCUR and AVAIL heights of A are also set to 4.

The second LIST in the second group of Fig. 33 indicates a quintuple is to be generated involving D and E. Since the operator to the left of D is -, the operator in the quintuple is the dual of the operator between D and E on the
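The START/END rule applied to R1-R4 above can be stated compactly. A sketch under the stated convention (one cycle per operation, one extra cycle to load a temporary when neither operand is a row reference); the function name is an assumption:

```python
def quintuple_heights(h_left, h_right, left_is_row, right_is_row):
    """START is the maximum availability (END) height of the operands;
    END adds one cycle for the operation, plus one more when a temporary
    must first be loaded (neither operand is a row reference)."""
    start = max(h_left, h_right)
    end = start + 1
    if not (left_is_row or right_is_row):
        end += 1    # extra cycle to load a temporary
    return start, end

# R1 = B + C: both variables available at height 0  ->  START 0, END 2.
# R4 = R3 + R2: both row references ending at height 2  ->  START 2, END 3.
r1 = quintuple_heights(0, 0, False, False)
r4 = quintuple_heights(2, 2, True, True)
```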

LIST. This is necessary to compute the correct result in the form -(D + E) in place of -D - E. A constant appears in the third statement as a reminder that constants always have zero AVAIL and OCCUR heights.

The fourth statement, Fig. 32, involves subscription and function calls. Note that the first subscript quintuple, R18, requires one execution cycle. R19 also requires one execution cycle and begins at the same time, but a fictitious END height is needed, since R19 will appear elsewhere as an operand and the value of A is not available until cycle 5. R20, which is a function call quintuple, has a START height equal to the maximum of the heights of its arguments, 5. The END height is 1:0, indicating merge number 1 and height 0. The next quintuple, R21, has as first argument the location of the returned value (which later will be a temporary). The quintuple R22 indicates a computation follows from results with different merge numbers. Since this will ultimately be an instruction, one cycle will be required, making the END height 1:1. Note that the START height of the next quintuple is where the 1:1 is used. In R32 the substitution operation requires one cycle, thus the END height is one greater than the START height. R36 shows a multiple subscript to be computed by a function call. In this case, the temporary assigned to R36 as an operand will contain the resultant machine address of the subscripted variable.

3. MACHINE ORGANIZATION

3.1 A MULTI-LAYER ITERATIVE CIRCUIT COMPUTER

3.1.1 Introduction

A study of the organization of the latest large-scale computers shows a trend to ever increasing complexity from the system design point of view. This evolution towards complex systems has been dictated by the desire to increase the power of the computers, sometimes in the productivity aspect, and in a few other cases in the computational capability aspect. Most machine designs have had as a goal the maximization of the use factor of the computer, or at least of the most expensive units, generally the fast memory. This has been achieved by resorting to input-output buffering, by incorporating multiprogramming facilities reducible in the last analysis to time-sharing procedures, by the inclusion of partial multiprocessing capabilities, and in some cases by creating a programmable-structure organization, as in the polymorphic machine.

It must, however, be recognized that most of the available commercial machines tend to maximize productivity; that is, they try to minimize the cost per instruction, which is proportional to the ratio of speed to cost per operation. On the other hand, very little has been achieved with respect to increasing the bounds of practical computability. Thus the problems encountered in the fields of pattern recognition, games, simulation, and adaptation still need a computer capable of handling them in an efficient manner.

The iterative circuit computer has been considered the most suitable solution for these types of problems, which have in common the characteristic

that the spatial distribution of the modules is a homomorphic image of the relations governing the interaction of the variables. The undisputed suitability of this class of computers for these problems has relegated to second place some other properties of the iterative structure that are not exclusive to this organization, but which are much more easily implemented in it than in a system with specialized units. The iterative circuit computer provides the possibility of true simultaneous multiprogramming, plus the powerful resource of infinite interaction between the programs. Neither of these characteristics is present in any of the more sophisticated systems now available. Furthermore, an individual module can function at various times as an accumulator, register, memory cell, or simply as a connecting link, and can be activated in any of these functions at any time during the execution of the program. Therefore, we are in the presence of an organization even more flexible than that of a polymorphic machine.2 Although it would seem inappropriate to apply this term when there exists a lack of specialized units, it must be remembered that the modules are structurally alike, but their instantaneous functional behavior is different and is defined by the current instruction. The polymorphic system is a programmable-structure machine, but the changes in structure are performed on the interconnections between modules and not on the internal organization of the computer modules. Therefore, the full advantage of a changing structure is not realized, although a great increase in the use factor of the system is obtained. There are two

factors affecting the effectiveness of the system. The first is due to the specialization and non-convertibility of the units; that is, there is a fixed number of components of each type available, resulting in a limited number of combinations that can cover only a limited class of problems. Many problems cannot be handled efficiently because of the lack of more units of a certain type, while at the same time there may be a certain number of idle units that cannot be put into service because they perform different functions. The second factor is closely related to the first, and is connected with the problem of priority assignment. It has been shown13 that an attempt to obtain a high utilization factor for the computing modules increases the mean queue length. If there exists a number of programs with low priorities, then a high use factor can be obtained, but usually a compromise must be reached between efficient utilization of equipment and length of waiting lines. In an I.C.C., however, any number of modules can be performing any one of the possible functions in one step of the program, and entirely different ones in the following operations. Also, the polymorphism of the machine is a function of the current instruction, not of the maximum requirements of the program. The available literature on I.C.C.'s is surprisingly scant. References 1, 2, 3 cover the design considerations, both for uni-dimensional and two-dimensional networks. Reference 3 also includes the treatment of the problems of stability and equivalence of iterative networks. References 4, 5, 6 cover the specific problem of embedding a computer in the logical iterative network. These are practically the only proposals for a computer based on this type of network. Unfortunately, reference 4 covers only a special-purpose computer intended for pattern recognition and allied spatial problems. The paper by Holland5 has been the starting point for a number of projects, but its title has misled many into believing this was a proposal for a practical machine. While most of the ideas are worthwhile, they are by no means unique or optimum, as S. Amarel8 has clearly pointed out in his review. Holland only describes a mathematical model of a space in which a simulation of the physical laws governing the interaction of a system with the environment can be set up. As such, the model possesses all the uniform properties and generality necessary for its use as a simulator in which the process of adaptation can be studied. While it still retains the power of an ordinary computer, its use as such would imply a wasteful employment of its potential capabilities, while its performance would be hindered by an excess of nonessential features for this particular role. Especially open to criticism are the following features, which affect characteristics that are fundamental in any I.C.C.: The scheme used for selecting operands suffers as a consequence of both the method used for addressing and the path-building procedure necessary to reach them. The addressing method employs a "floating" reference; that is, all the addresses are relative to the address of the module active at that moment, and therefore an operand address assumes a different representation in every instruction that refers to it.
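The drawback of the "floating" reference can be made concrete with a small sketch. This is our own illustration of the general idea, not Holland's actual addressing scheme: the same absolute operand location is encoded differently in every instruction that names it, because each encoding is relative to the module active at that moment.

```python
# Hypothetical illustration (ours, not the report's or Holland's scheme):
# an operand is addressed by its offset from the currently active module,
# so its representation changes with every active module.

def relative_address(active, operand):
    """Encode an operand position relative to the currently active module."""
    return (operand[0] - active[0], operand[1] - active[1])

operand = (5, 6)
print(relative_address((3, 2), operand))   # one encoding from one active module
print(relative_address((7, 7), operand))   # a different encoding from another
```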

The path-building procedure has the disadvantage of being essentially sequential, resulting in a long effective access time, and therefore assigning great importance to the problem of data allocation. These difficulties can very well be attributed to the lack of organs of command and to the circumstance that both control and information channels flow through the same network. In Newell's paper,7 however, we find the first reference to a multi-layer iterative structure, and furthermore, he suggests solutions to the problems of grouping modules to function as single entities and of the simultaneous selection of operands. It seems therefore logical to try to specify the organization of a multi-layer machine having each of the layers fulfilling some specialized function, yet being in itself a complete iterative structure. The purpose of this paper is to present one possible example of such an organization, in which the following new characteristics are incorporated: (a) A path-building procedure having the short-time access advantage of the common-bus system, but which also allows simultaneous multiple path building with no mutual interference. (b) Three-phase operation, with specialized networks operating simultaneously in different phases on three consecutive instructions. (c) A specialization in the functions performed by the stacked networks. (d) Inclusion of geometrical operations in addition to the arithmetic and logical ones.

3.1.2 Description of the Computer The computer is composed of three stacked layers, each consisting of an iterative network of m x n modules. The three layers are exactly alike in size, shape, and type of modules used. One layer is called the "program plane." This contains at the start the original program or programs, and later the modified programs resulting from the interaction of the original ones (Fig. 41). The intermediate layer is called the "control plane" and its function is to interpret the instruction following the one being executed at the moment, determining the operand(s) and storing in them the full instruction. These "image operand(s)" in the control plane will in turn generate activation signals which will be transmitted on the wires connecting correlative modules and will determine which modules in the third plane will act as operand(s) II in the next phase. The third layer is the "computing plane" where the actual arithmetic, logical, and geometrical operations are performed. The number and distribution of the modules active at any time is determined by the signals transmitted from the "control plane" in response to an "operation complete" pulse from the computing plane. Communication between the three layers is provided by the following sets of connections (Fig. 41): (a) A set of connections from every module in the program plane to every correlative module in the control plane.

Fig. 41. Inter-layer and wrap-around connections.

(b) A similar network of connections from the modules in the control plane to those in the computing plane. (c) A similar set of connections from the modules in the program plane to those in the computing plane. (d) A common bus line connecting all the modules in the computing plane, and transmitting the "operation complete" pulse to all the modules in both the program and control planes (Fig. 42). The connection lines described in (a), (b), and (c) are called activation lines. 3.1.3 Description of the Planes Each plane consists of a network of m x n modules, the modules being connected by a line called the information line running in each row and column through the normally conducting gates in each module (Fig. 44). Besides these inter-modular connections, there exists an end-around connection between the first and last module of each row, and similarly for the terminal modules of the columns. Therefore, there is a separate information line for each column and row, which is closed on itself by the end-around connection. These end-around connections provide the spatial continuity of the structure, transforming the planar distribution into one where a uniform neighborhood relation holds for all the modules, with no constraints due to physical boundaries. The resulting continuity provides the same behavior as that of a network spread over the surface of a torus.
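The toroidal neighborhood relation just described can be sketched directly. This is an illustrative model of our own (0-based coordinates for convenience): with wrap-around in both rows and columns, every module of an m x n plane has the same four neighbors, whether it sits in the interior or at a physical boundary.

```python
# Sketch (ours) of the end-around connections: neighbor addresses wrap
# modulo the plane dimensions, giving every module a uniform four-neighbor
# relation, as on the surface of a torus.

def neighbors(row, col, m, n):
    """Four neighbors of module (row, col) on an m x n toroidal plane."""
    return [((row - 1) % m, col), ((row + 1) % m, col),
            (row, (col - 1) % n), (row, (col + 1) % n)]

# A corner module has the same number of neighbors as an interior one:
print(neighbors(0, 0, 8, 8))   # wraps to the opposite edges
```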

3.1.4 Description of the Modules All the modules in the three planes are exactly alike. They communicate with each other in the same plane by means of the column and row information lines running through them, and with the correlative modules in the other layers by means of connections called activation lines. The internal structure of the modules includes an accumulator, a register, a decoder, and several switching matrices to connect the former to the information lines. These units can be described as follows: (a) An accumulator capable of performing addition, whose input is supplied from the output of a switching matrix connected to the four possible inputs to the module. The accumulator is connected through parallel gates with a register of the same length, which is described in (b). (b) A register of the same length as the accumulator, connected with it through parallel gates. This register simply copies the contents of the accumulator every time a load-type operation is completed. At the same time, it supplies the only output lead over which the contents of the accumulator can be read out. This means that every time some instruction requires the transmission of the contents of the accumulator, the information is actually taken from the corresponding register. The output can be directed to any of the module's four terminals by the switching matrix (d). (c) A switching matrix with inputs from the module's four input terminals, and whose output is connected to the input of the accumulator. The

Fig. 42. Three-plane structure and common buses.

setting signals for the matrix are supplied by a decoder, described in (g). (d) A similar switching matrix, whose input is the output of the register, and whose output feeds any of the four terminals. (e) A pair of gates, normally conducting, that connect the terminals belonging to opposite sides of the module, thereby maintaining the continuity of the vertical and horizontal information lines across the module, with no connection between them. (f) A switching matrix that can connect any input terminal to its immediate neighbor. In this way, a corner in the transmission path can be formed. (g) A decoder, which receives the complete instruction on the normally continuous information line, and which compares the address in the instruction with its own address, producing output signals that govern the setting of (c), (d), (e), and (f). (h) A similar decoder for the other information line. (i) A gate connecting the output of the register with the activation line going to the input of the correlative module in the computing plane. 3.1.5 Word Format A word is composed of four fields: the operand I field, the code field, the operand II field, and the successor field. The operand I field contains two pairs of symbols; the first pair indicates the rows to which the first and last modules in the pattern or group of modules

belong; that is, it gives an indication of the extension of the pattern of operands I. The second pair does the same with respect to the initial and final column coordinates. Example:
Operand I field   Code field   Operand II field   Successor field
( 36;22 )         Load         ( 22;33 )          ( 33;77 )
Since we are dealing with linear patterns, that is, modules that are all in one column or row, at least one of the pairs in the operand I or operand II fields must contain a repeated number to indicate that only one column or row is involved. Example: (36;22) indicates the pattern extending from row 3 through row 6 and belonging to column 2. Therefore, the operand I field indicates which modules will be operands I when the instruction is executed. Similarly, the operand II field gives the address of the group of modules which will become operands II. The successor field specifies the address of the location of the next instruction, and therefore is always of the form (XX;YY) since it must necessarily refer to a single module. 3.1.6 Path-Building Procedure The method used for communicating between modules in an iterative circuit computer is one of the key factors that determine the efficiency of the machine. The method described here is not strictly a path-building procedure since the

connections are permanently established as row and column information lines. The procedure only determines the operator and operand locations, and sectionalizes the corresponding row and column information lines into segments that are connected together at the cross-over point. The general structure of the switching arrangement within each module is shown in Fig. 43. The gates are shown as bi-directional to simplify the diagrams. The row and column information lines run through all the modules in the corresponding row or column, forming a closed loop since all the switches in the path are normally closed (Fig. 44). The sequence of operations is as follows: The current instruction is stored in the active module, in this case the module at the upper left corner of Fig. 45. The instruction word is transmitted over one of the two information lines, the choice depending on the shape of the pattern of operand I. If the operands I are all in one column, then the information is transmitted on the row information line, and vice-versa. Since we deal with linear patterns, one of the pairs of coordinates in the operand I field will always be of the form XX, the repeated number indicating that the pattern extends linearly over the X column or row. The instruction transmitted on the information line that spans the whole row or column where the operand I is located is received by the decoders in all the modules in that row or column. Each decoder checks for coincidences between the operand I and II addresses contained in the instruction and the corresponding addresses of its own module. This includes the module originating the information. In Fig. 45, the operand I has the address (33;22) and the operand II (55;66). When the decoders in the modules of the first column compare these addresses

Fig. 43. Information-line switching in a module. (Legend: bi-directional gate, normally non-conducting; bi-directional gate, normally conducting.) Fig. 44. Column and row information lines.

Fig. 45. Path connection for the instruction: (33;22) (Store) (55;66).

with their own, two types of coincidence may arise: (a) Double coincidence between the row and column coordinates belonging to one of the operand fields in the instruction, and the corresponding ones in the module address. (b) Double coincidence between one row address in one field, one column address in the other field, and the corresponding ones in the module address. It is evident that case (a) occurs only when checking the addresses of either the operand I or the operand II. In the first instance, the addresses in the operand I field will coincide with the addresses in the corresponding field in the module address. When the operand II is checked, the operand II field addresses will coincide. The second case will occur at the module situated at the intersection of the column and row to which the operand I and operand II belong. In Fig. 45, the following situation will arise: Instruction: (33;22) Store (55;66) Intersection address: (55;22) The double coincidence is between row and column addresses belonging to different fields in the instruction word. The different results of the coincidence-checking procedure are used to trigger two different sequences of events: (A) If case (a) occurs, then either an operand I or II location has been reached, and the decoder activates one unit (e) and either (c) or (d), as described in Section 3.1.4. As a consequence, the following operations take place: (i) The switch in the information line is opened, isolating the rest of the line.

(ii) Either the input or the output of the accumulator is connected to the information line. The operand I is always the source of information, and therefore the transmission is from the operand I location to that of operand II. (B) If case (b) occurs, then a corner in the path has been reached, and the decoder activates both (e) units and the (f) unit. As a consequence, the following operations take place: (i) Both switches in the row and column information lines are opened, completing the isolation of a piece of line from the terminal module to the corner in each information line. (ii) One of the switches connecting adjacent sides is closed, connecting the two isolated pieces of line and forming a continuous path from operand I to operand II. The above procedure takes only two pulse times because all the decoding takes place simultaneously in all the modules in a row or column. Furthermore, it doesn't depend on the relative position of the modules to be connected, but only on the addresses of the operands. In the case of instructions with multiple operands I and/or multiple operands II, it is possible to connect them in a one-to-one, one-to-many, or many-to-many way. 3.1.7 List of Instructions It is very common to speak of the operand I and the operand II when referring to an instruction, and usually no further distinctions are needed because there is only one accumulator. However, in the case of the iterative circuit computer, where both operands are stored in modules that have the same capabilities, the distinction is no longer adequate. It then becomes

necessary to specify the direction in which information must flow, since both modules can process the instruction and store the result. Actually, either of the two modules could perform the role of accumulator, and then we would have a left and a right instruction of each type, depending on which module executes the instruction and stores the result. In order to simplify the list of instructions, it is arbitrarily agreed that the operand II will always be the accumulator. Following this convention, the list of instructions for elementary arithmetic and logical operations is reduced to the following: LOAD: Loads the contents of the location specified by the operand I into the location specified by operand II. The result appears in the location of operand II. ADD: Adds the contents of operand I to the contents of operand II. The result remains in operand II. COMPLEMENT: The contents of operand I are complemented. TRANSFER ON NON-ZERO: If the contents of operand I are different from zero, control is transferred to the instruction located in the module specified by operand II. If the contents are equal to zero, the normal sequence of instructions is followed; that is, the next instruction executed will be the one specified by the successor field.
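The linear-pattern rule of the word format lends itself to a short sketch. This is our reading of the text, not a specification from the report: a field such as (36;22) names rows 3 through 6 of column 2, the repeated digit marking the single row or column involved.

```python
# Hedged sketch (our interpretation) of decoding an operand or successor
# field: a field 'rr;cc' pairs a row range with a column range, and a
# repeated digit collapses that range to a single row or column.

def decode_field(field):
    """Expand a field like '36;22' into the set of module (row, col) pairs."""
    rows, cols = field.split(';')
    r1, r2 = int(rows[0]), int(rows[1])
    c1, c2 = int(cols[0]), int(cols[1])
    return {(r, c) for r in range(r1, r2 + 1) for c in range(c1, c2 + 1)}

print(sorted(decode_field('36;22')))   # rows 3 through 6 of column 2
print(decode_field('33;77'))           # a single module, as a successor field
```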

3.1.8 Operation of the Computer The execution of an instruction takes place in three phases, and each phase is performed in a different layer. Once a plane has executed its phase on an instruction, it waits until the next operation complete pulse from the computing plane causes the transfer of a new instruction to be operated on. Thus, the execution of the three phases proceeds simultaneously in the three layers, but with a different instruction in each layer. As a result, the effective operation time is one instruction per phase, the duration of the phases being determined by the computing phase being executed at the moment. Fig. 46 shows the sequence of phases and the transfer of each instruction from plane to plane after the execution of each phase. The roman numerals indicate the phase, and the subscript the particular instruction being operated on. Thus, III2 indicates that the second instruction is undergoing phase III. Phase III is the one that takes the longest time to perform and therefore is the one that generates the operation complete pulse that triggers the initiation of all phases in the three layers. The sequence of operations that an instruction undergoes during the three phases is as follows: PHASE I: The initiation of this phase is triggered either by an operation complete pulse or a start pulse. During this phase, the instruction following the one already in the control plane is made ready to be copied from the program plane onto the control plane in the same relative position. The net effect is to choose the successor to the current instruction already in phase

Fig. 46. Overlapping of phases.

II. The successor is specified by the address in the successor field and can be any location in the plane. In other words, it is not required that it be a contiguous neighbor. PHASE II: The instruction activated in the program plane is copied in the same position in the control plane. In this plane, the operand II field of the instruction is interpreted to determine which modules are to be active (operands II) during the computing phase. Once the positions of the future operands II are determined, the whole instruction is transmitted and copied in these "image" positions in the control plane. The location of the "image" operators is found following a path-building procedure, as described in Section 3.1.6. Furthermore, the necessary data are copied from the correlative modules in the program plane. In this way, each future operand II position is now loaded with the instruction and the operand II itself. PHASE III: The next operation complete pulse from the computing plane bus line initiates the third phase. The instruction and data now stored in the module or modules in the control plane are now transferred to the correlative modules in the computing plane, and each of these modules initiates a path-building procedure to connect itself to its operand or operands I. At the end of this process, the modules containing the instruction are connected to their respective operands I. This connection can be from one module to another, from one to many, or from many to many, depending on the operation specified by the current instruction contained in the operands II. The locations thus selected receive the necessary data from the correlative positions in the program plane.

Once the connections have been established, the instruction is executed with information flowing in the correct direction. Operand I is always the source of information and therefore the output of its register has been connected to the information line. Similarly, operand II acts as the accumulator and the information line is connected to the input of its accumulator. The result is then transmitted to the correlative module in the program plane, where it is stored. Simultaneously with this activity in the computing plane, the control plane is now executing phase II on the next instruction, since it remains free once the "image" operators have been transferred to the computing plane. The completion of the execution phase is signalled by an "operation complete" pulse which is transmitted over the common bus from the computing plane to the similar buses in the program and control planes. This pulse initiates phase II in the control plane and phase I in the program plane. This sequence of operations can be visualized by following the transfer of instructions between the layers in Figs. 47 through 50, while the computer executes the following sequence of instructions, supposedly part of a program:
1. (77;88) Load (44;55) (33;44)
2. (57;77) Add (57;33) (33;55)
3. (56;77) Load (66;23) (33;66)
4. (33;24) Load (66;22) (33;77)
5. (00;00) Clear (56;33) (22;77)
6. (55;66) Tnz (33;99) (22;88) (Transfer on Non-Zero)
7. (00;00) No op (00;00) (22;99) (No operation)
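The overlap of phases shown in Fig. 46 can be modeled with a toy function of our own devising: at each operation complete pulse every instruction advances one plane, so three consecutive instructions are always in flight, one per plane.

```python
# Toy model (ours) of the three-phase overlap: at step k, instruction k+1
# undergoes phase I in the program plane, instruction k undergoes phase II
# in the control plane, and instruction k-1 undergoes phase III in the
# computing plane.

def pipeline_state(step):
    """Instruction numbers in phase I, II, III at a given step.

    Numbers below 1 would correspond to empty planes during start-up."""
    return (step + 1, step, step - 1)

for step in range(2, 5):
    i1, i2, i3 = pipeline_state(step)
    print(f"phase I on instruction {i1}, phase II on {i2}, phase III on {i3}")
```

This reproduces the snapshots of Figs. 47 and 48: with instruction 1 in phase III, instructions 2 and 3 occupy phases II and I; one pulse later, 2 executes while 3 and 4 move up.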

Figure 47 shows the computer at the moment the first instruction is undergoing phase III. The state of the machine can be indicated by: III1; II2; I3. The program plane is executing phase I on instruction 3; that is, it activates instruction 3 as the successor of instruction 2. In the control plane, both instruction 2 and the data corresponding to the operands II are being loaded into the locations assigned to the operands II. In the computing plane, instruction number 1 is being executed, with the contents of module (77;88) going into module (44;55). Figure 48 shows the machine in the state III2; II3; I4. The control plane is interpreting instruction 3, locating the positions of the image operands II, in this case the modules in row 6 and columns 2 and 3. Both instruction number 3 and the contents of modules (66;23) in the program plane are now copied into the correlative positions just determined in the control plane. The computing plane is executing instruction number 2, in this case adding the contents of (57;77) into (57;33). In a similar way, Figs. 49 and 50 show the execution phases of instruction numbers 3 and 4. All instructions except the Transfer on Non-Zero instruction are treated in a similar way. The execution of instruction number 6, which is a Transfer on Non-Zero, gives an opportunity to explain in more detail the sequence of operations for this type of instruction. When instruction number 4 is executed, an Operation Complete pulse is sent back to the program plane, and instruction number 6 is activated. It has to be remembered that instruction number 5 is already in the control plane.

Fig. 47. Execution phase of instruction 1: (77;88) Load (44;55) (33;44).

Fig. 48. Execution phase of instruction 2: (57;77) Add (57;33) (33;55).

Fig. 49. Execution phase of instruction 3: (56;77) Load (66;23) (33;66).

Fig. 50. Execution phase of instruction 4: (33;24) Load (66;22) (33;77).

The situation is that illustrated in Fig. 50, and again in Fig. 51, but this time in a lateral view. The operation complete pulse changes the situation to that illustrated in Fig. 52. Instruction number 6 is transferred to the control plane, and a path is built there connecting the module containing the instruction with the operand II, in this case module (33;99). The instruction is then stored in this module, which receives the alternate address from the correlative module in the program plane. The next operation complete pulse, signalling the termination of instruction number 5, produces a copy of module (33;99) in the computing plane (Fig. 53). This module is then connected to the operand I module, in this case (55;66). The operand I module contains the word of data on which the result of the transfer instruction depends. The active module (33;99) then determines whether the number in (55;66) is equal to zero or not. If the number is equal to zero, an operation complete pulse is emitted and the normal sequence of operations is resumed. If the number is not equal to zero, the active module (33;99) sends an activation signal to the correlative module in the program plane, activating it as the immediate successor and overriding the active status already obtained by instruction number 8. After a suitable delay, an operation complete pulse is emitted, and the normal sequence of operations is resumed. The delay is necessary in order to allow the newly designated successor to activate its own successor, which may be any position in the plane, not necessarily a contiguous neighbor (Fig. 54).

Fig. 51. Operation complete pulse. Fig. 52. Transfer of instruction 6. Fig. 53. Operation 6 executed. Fig. 54. Delayed operation complete pulse.
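One way to read the successor override just described is as a delayed branch: because the next instruction is already in the control plane when the transfer is executed, it runs before the branch takes effect. The following is our own simplified model of that behavior, not a mechanism specified in the report.

```python
# Simplified model (our interpretation) of the transfer's effect on the
# instruction stream: the instruction already fetched behind a taken
# branch still executes, so one extra instruction follows the transfer.

def execute_sequence(start, branch_at, branch_target, length):
    """Instruction numbers executed, with one already-fetched instruction
    running after a taken branch at `branch_at` to `branch_target`."""
    seq, pc = [], start
    while len(seq) < length:
        seq.append(pc)
        if pc == branch_at:
            seq.append(pc + 1)       # the already-fetched successor runs too
            pc = branch_target
        else:
            pc += 1
    return seq[:length]

print(execute_sequence(6, 6, 11, 4))   # the transfer at 6 yields 6, 7, 11, 12
```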

Therefore, the sequence of instructions resulting from the transfer instruction is: 6, 7, 11, 12... instead of the normal sequence: 6, 7, 8, 9... 3.1.9 Geometrical Operations When an attempt is made to process geometrical patterns in a computer in which the instructions refer to only two operands, it is necessary to divide the pattern into individual elements and operate on them one at a time. In the machine described here, the availability of multiple operand instructions reduces most of the geometrical operations to one of the arithmetic or logical ones. In order to simplify the operation code, it is convenient to establish the relationship between the geometrical and arithmetic operations since most of the former can be interpreted as a particular case of multiple operand I and/or multiple operand II arithmetic operations. Thus, a Store operation can refer to a One-to-One (OTO), One-to-Many (OTM) or to a Many-to-Many (MTM) operation. The OTO Store operation is the normal one, and in the geometrical interpretation would be called a COPY instruction. The OTM Store instruction has two versions in the geometrical case: (i) The pattern of one module is to be repeated contiguously and linearly. Fig. 55. The corresponding geometrical operation is called EXTEND. (ii) The module has to be copied in repeated positions, all consecutive, but not contiguous to the original one. Fig. 56. The corresponding geometrical operation is called REPRODUCE.

The MTM Store operation repeats the pattern in a position parallel to the original one (Fig. 57). The corresponding geometrical operation is called DISPLACE and reproduces the pattern in a parallel position. It implies a simultaneous one-to-one copy operation on many modules. Therefore, a correspondence between the geometrical and arithmetic operations can be established, in which the first column can roughly be assimilated to a compiler language and the second one to a machine language.
COPY - Store OTO
EXTEND - Store OTM
REPRODUCE - Store OTM
DISPLACE - Store MTM
3.1.10 Conclusion The organization presented here is not intended to be an ultimate design. Rather, it presents one possible way of combining the intrinsic capabilities of the iterative structure with the advantages of an organization having some form of specialized control unit. The principal advantage of the proposed organization resides in the fact that the multi-layer structure makes it possible to include a control plane which acts as a look-ahead unit, interpreting the instructions before the actual execution takes place. This disposition provides the capability of dealing with instructions

Fig. 55. EXTEND operation. Fig. 56. REPRODUCE operation. Fig. 57. DISPLACE operation.

that operate on any number of modules simultaneously, yet retaining in every step the possibility of true simultaneous operation of several programs with an unlimited degree of interaction between them. Moreover, the introduction of the look-ahead feature doesn't detract from the effective speed of computation, since the delay introduced by the pre-interpretation phase is compensated for by the overlapping of the sequence of phases, which process consecutive instructions in different planes simultaneously. Furthermore, the method used for path building provides communication between the modules in the network with a very short access time. Therefore, the combination of features given by the pre-interpretation of instructions and by the overlapping of phases can be regarded as a net advantage, with no penalty in time or complexity of the individual modules. The sole and inevitable penalty is the inclusion of two more layers. While this increases the number of modules by a factor of three, it must be remembered that the whole feasibility of this type of machine organization depends on the availability of components whose cost depends very weakly on the internal structure, and whose easy reproducibility assures a low cost per unit when used in large numbers.

3.2 PHYSICAL AND LOGICAL DESIGN OF A HIGHLY PARALLEL COMPUTER

3.2.1 Introduction

There are a number of features each programmer would like to have in a large-scale digital computer. The desires are as numerous as the programmers and often serve opposite purposes. Thus, one of the foremost problems in developing an improved computer organization is to recognize the fundamental requirements of the programmer. It can safely be said that, in general, programmers desire computers to be large, fast, versatile, and easy to program. We present a machine organization for a general-purpose computer that could be superior to existing computers for some problems and would extend the range of problems solvable on computers.

3.2.2 Objectives

Our primary objective will be to increase the amount of computation that can be done in a given time by means of a new organization rather than faster components. To this end, our organization provides for simultaneous execution of many instructions, each by a separate processing unit. The machine's ability to do parallel processing leads to the desire for providing unlimited interaction among processing units, i.e., each processor not only has circuitry that interprets an instruction and causes control action, but also is able to communicate with every other processor and to operate on any data. In a parallel computer these abilities are needed for the efficient running of large programs and for programs well suited to parallel computation. But the

organization must also provide for a partitioning of the machine so that a number of small programs can run simultaneously without interaction. This is usually referred to as "interprogram protection" and would need to be under program control to be completely versatile. Since processing units can interact directly with each other, there is no need for a central control. In fact, any distinguished or superior processing unit would unreasonably complicate programming. The organization should allow the size of the machine to be flexible, a variety of I-O devices to be provided, and a powerful set of operations to be available for the benefit of the programmer. In particular, there should be complete flexibility with respect to the number of instructions and the number of data words, the only restriction being that their sum does not exceed the storage capacity of the machine. Further, it would be convenient to the programmer and conservative of storage if one instruction could cause an operation to be performed at a number of locations simultaneously. For example, a single instruction could cause one number to be added to the contents of many memory locations. In addition, the hardware should be able to accept an arbitrary number of instructions for execution at any given program step. If all instructions cannot be processed simultaneously, then the computer should process them in groups. When all instructions for a program step have been completed, the next program step should begin executing all instructions designated as successors by the instructions of the immediately preceding program step. And finally, in the light of the previous requirements, it would be unreasonable to cause any computation bottleneck due to a shortage of arithmetic units or delay while accessing data. Therefore, every memory location should be directly accessible by an arithmetic unit. The computer, being equally limited by computation and memory access, could employ lookahead efficiently, thus enabling the greatest overall computation speed. These objectives form a basis from which a very powerful computer organization can be developed. We realize, of course, that there are other objectives which might be added or substituted to meet other criteria. Our choice of objectives is based on the fact that a machine organization fulfilling these objectives could substantially reduce average computation time for some problems as compared to a computer with a single processor constructed from similar components. As might be expected, any computer fulfilling these objectives would require many times the number of components in existing computers. Potentially inexpensive components and useful construction techniques are presented in Section 3.5.2 of this report. A brief look at cost versus problem-solution time indicates such a machine would be uneconomical in the next year or so. Yet, technological improvements that reduce the cost of logical components without necessarily increasing their speed, plus reasonable development of parallel programming techniques, could make such a machine economically competitive in the near future.

3.2.3 Organization

The following computer organization meets the objectives outlined above

and has some novel features for logical design and physical construction. The computer consists of many identical modules, blocks of logical circuits, embedded in a passive connecting network. Each module contains a basic arithmetic unit, storage for one word of data or one instruction, and some control circuitry. There is central timing and synchronization, but no other common memory or control units. This is sufficient to form a complete general-purpose computer minus input-output equipment. Our main consideration is the description of a module, of module connection, and of program execution within this computer. Input-output devices are to be connected directly to modules, with, at most, one per module. In this way, arbitrarily many I-O devices can be operating simultaneously without slowing down computation in other modules. Since we are considering a highly parallel computer, we wish to allow arbitrarily many instructions to be executed simultaneously. Rather than having instruction counters hold the locations of the next instructions to be executed, an additional bit position, the execution bit, is appended to each memory location. At the time when execution is to begin, the contents of each memory location having an execution bit equal to 1 are executed as an instruction. A 0 then replaces the 1 in the execution bit of those locations from which instructions were just executed. Thus an instruction specifies its successors, if any, by setting the execution bit to 1 in the memory locations of the instructions to execute next. To avoid the priority problems of assigning instructions to processing units, each memory location has an instruction processor directly connected to it.
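The execution-bit scheme just described can be simulated in a few lines. The sketch below is illustrative only (the data layout and names are ours): each memory word carries an execution bit; in a cycle, every word whose bit is 1 executes, its bit is cleared, and it names its successors by setting their bits for the next cycle.

```python
# Hypothetical simulation of execution bits replacing an instruction counter.
# memory: dict addr -> {'ebit': 0/1, 'succ': [addrs], 'op': callable}

def run(memory, cycles):
    for _ in range(cycles):
        active = [a for a, w in memory.items() if w['ebit']]
        for a in active:
            memory[a]['ebit'] = 0          # a 0 replaces the 1 just used
        for a in active:                   # all active words execute "at once"
            memory[a]['op']()
            for s in memory[a]['succ']:    # successors get their bit set
                memory[s]['ebit'] = 1
    return [a for a, w in memory.items() if w['ebit']]

trace = []
mem = {
    0: {'ebit': 1, 'succ': [1, 2], 'op': lambda: trace.append('start')},
    1: {'ebit': 0, 'succ': [],     'op': lambda: trace.append('left')},
    2: {'ebit': 0, 'succ': [],     'op': lambda: trace.append('right')},
}
run(mem, 2)
assert trace == ['start', 'left', 'right']
```

Note that word 0 activates two successors at once, so the second cycle executes two "instructions" simultaneously, which is exactly what an instruction counter cannot express.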

The instruction processor operates, or is active, only when the execute bit is 1 and the execute signal is received from the central timing circuits. The function of the instruction processor is to route operands to an arithmetic unit with information as to what operation is to be performed. To avoid the problem of assigning arithmetic units to active instruction processors, each memory location has its own arithmetic unit. The memory register itself serves as the accumulator, the register that contains the operand which is to be used and then replaced by the result of an arithmetic operation. Thus, by using the arithmetic unit at the location where the result is to be stored, there will never be a time when an instruction processor must wait for an arithmetic unit. An attempt by two instructions to store information in the same location at the same time is considered a programming error. An additional feature of having many arithmetic units is to allow one instruction to specify that an operation is to take place at many locations simultaneously. The addressing of many locations by a single instruction is accomplished by indirect addressing. As might be expected, in this machine the data accessing for arithmetic operations is considerably different from conventional computers. No fetching of instructions is required; thus the data-accessing circuits can be simplified and computation speed is increased. Even with this simplification, far too much circuitry would be required if each instruction processor needed the ability to access every memory location directly. To cut down the number of components and yet keep flexibility of accessing, the following organization is used.

For each memory location there is a module containing the memory register, instruction processor, arithmetic unit, and what we will call path-connecting circuitry. Each module has a direct connection, by wire without gates, to a few other modules. For one module, say X, to gain access to a module not directly connected, say Y, the destination (the address of Y) is gated onto the wire that directly connects X to a module closer to Y. As soon as a path has been completed from X to Y, X has access to the memory register and arithmetic unit in the Y module. In this way every module can have access to every other module while having a direct connection to only a few modules. One of the most significant factors in the design of such a machine is the logical organization of path segments (directly connected modules). The two extremes of path-segment organization are: (1) every module connected to just two other modules (geometrically, the modules could be placed in a line with wire connecting adjacent points on the line and the two end points), and (2) every module connected to every other module (geometrically, k modules would form a (k-1)-dimensional simplex). Neither of these seems acceptable for the general-purpose computer being considered by this report. The line of modules uses fewer components than any other, but relatively few accesses could be made simultaneously; e.g., several short paths could isolate many modules from others to which access is required. Let the machine under consideration have 2^n modules. Now, as a compromise between the number of components and the expected number of simultaneous accesses, let each module have a direct connection to n other modules. Thus the number of direct connections is a function of the size of the machine. For a 32,768-word machine, each module would be connected to 15 other modules. The logical organization of these connections would be to have the modules as the vertices of a 15-dimensional cube, with the edges of the cube being the direct connections between modules. Since each module can be represented by a unique 15-bit number, the direct connections correspond to a wire from each module to the 15 other modules whose numbers differ in one bit position, i.e., at unit Hamming distance. There is a physical construction whereby the wires for the direct connections can be laid out in n layers for a machine with 2^n modules. The modules are laid out in a two-dimensional square array as shown below. Each layer contains exactly one connection for each module, and no connections cross within a layer. The layers could be made by deposition or printed-circuit techniques. Only n/2 masks would be required, and each mask would have 2^(n-1) lines on it, formed from repetitions of a smaller line pattern; e.g., for a 4096-module machine n = 12, thus each mask would have 2048 lines formed from 64 copies of a 32-line pattern. The mask, layers, and composite view of a machine with 16 modules are given below.

[Figure: masks 1 and 2, wiring layers 1 through 4, and module representations as binary numbers (addresses).] Fig. 58. Detail of path-segment wiring.

There are several interesting measures of accessibility which depend on the logical organization. First, the maximum length, in number of segments, of any minimal path is n in a machine with 2^n modules, i.e., the maximum Hamming distance between two n-bit numbers is n. Second, the number of different paths between two modules differing in k bits is k!, i.e., all permutations of the order in which the Hamming distance is reduced by 1 at each of the k steps. Finally, statistically, the expected number of simultaneous accesses that could be made in a 4096-module machine is over 300, assuming random storage assignment of data and instructions. Of course, instructions and data are not randomly assigned storage. Considering that each module is directly connected to only a few others, it is not difficult to see that clever programming could yield many more simultaneous accesses than the random case, while intentionally poor programming could yield many fewer. Due to the inherent limitation on parallel accessing as the number of paths increases, it seems advisable to remove all path connections when each access has been completed. In this way each step in the execution of a program starts with an uncluttered machine. Actually, by allowing the machine to have some paths still connected when the next execution step begins, there can be a path-connecting lookahead which could, in general, speed up computation more than no lookahead and an uncluttered machine. Preliminary logical design of a module indicates that clever logical circuit design could make the average path-connecting time about the same as the longest arithmetic-operation time. Thus the average time to perform an operation becomes equal to the time required by the slowest operation, but there is no accessing time required during a sequence of execution cycles. To allow simultaneous path connecting from an active instruction to the

first operand (also an arithmetic unit), to the second operand, and to the succeeding instruction, three independent path-connecting circuits are provided. There are no index registers (relative addressing) as exist in conventional computers. This is necessary due to the unconventional scheme of accessing. In place of index registers, operations are provided so that one instruction can do arithmetic directly on the address part of another instruction, i.e., the address part of every instruction is essentially an index register. To further supplement addressing, an indirect address can be specified. When a path has been connected from an instruction to some module, say X, and if the instruction specified indirect addressing, the path is extended according to the address in the memory register of X. The address in X may also be designated as indirect. Since the memory register of X is large enough to hold several addresses, each address position is interpreted and each can start an extension of the path from X. In this way one instruction with an indirect address can refer to a memory location with several indirect addresses, each of which can refer to other locations, etc. Thus, one instruction can control the arithmetic units of many randomly placed modules simultaneously. A more detailed description of this machine's operation is given in the next section, which is essentially a programming manual. A more detailed description of the logic follows that section.

3.2.4 Instruction Code

The programming of an iterative circuit computer must be flexible enough

to justify having a highly parallel computer rather than a number of single-processor computers. Since it is possible that hundreds of instructions could be executing simultaneously, and these instructions could be using the same data, the hardware must provide some basic synchronization of instructions. Therefore, in order to simplify programming, the computer execution cycle proceeds as follows: a number of instructions are being executed simultaneously; each specifies locations of instructions to be executed next; when all instructions have completed execution, all of the "next" instructions start executing simultaneously; and so on. Thus the execution of individual instructions is asynchronous, but the execution of sequences is synchronous. Even if the programmer specifies more instructions than the machine can execute simultaneously, the hardware is set up to process all of them in several bunches. Then, when all have been executed, the "next" instructions are started.

3.2.4.1 Instruction Format

Every instruction has the same basic format: an operation code and three addresses. The first address, α, may designate the location of one operand for arithmetic and logical operations. The result of the arithmetic and logical operations replaces the contents of α. With conditional transfer operations, α may be used to specify the location of the next instruction. The second address, β, may designate the location of the second operand for arithmetic and logical operations, i.e., multiplier, divisor, etc. For some conditional transfer instructions, β is the location tested for the condition. With shift instructions, β is the shift count rather than an address. The third address, γ, designates the location of the next instruction. For some conditional transfer operations, the condition determines whether α or γ specifies the next instruction. The number of bits and relative positions of the fields of an instruction are shown in the following figure (n might range from 10 to 20 depending on the number of locations, 2^n, in the computer).

[Figure: operation code and α, β, γ address fields, n + 1 bits each.] Fig. 59. Instruction format.

The word length is 5(n+1) bits. If the leftmost bit of α, β, or γ is 1, then that address is indirect. The remaining n bits specify the location to be used. The three-address scheme allows flexible arithmetic and control instructions to aid parallel programming and spatial program organization.

3.2.4.2 Execution Bits

There are three more bits at each location to control the execution of instructions, called e1, e2, and e3. The e2 bit of a location is set by any instruction referring to that location as a successor by a γ address, i.e., the contents of the location where the e2 bit is set will be active, or execute as an instruction, during the next execution cycle. The e3 bit of a location is set if the contents of the location are to be inactive during the next execution

cycle. Whether the e3 bit is set rather than the e2 bit depends on the operation code. At the beginning of each execution cycle, the e1 bit is set if the e2 bit was set and the e3 bit was not set during the previous instruction cycle. If both e2 and e3 were set (by different instructions), the e1 bit is not set. Once the e1 bit is computed, both e2 and e3 are reset. These three bits influence execution in the following way: once the e1 bit has been determined at every location, the instructions in all these locations become active. Those instructions which successfully completed paths for the α, β, and γ accesses reset their e1 bit and perform their operations. Some of these instructions will be setting e2 and e3 bits of other locations. While operations are being performed by these instructions, the others with the e1 bit set but paths not completed try again to connect their α, β, and γ paths. This process repeats itself, possibly requiring a number of attempts for some instruction to complete its path. The execution cycle terminates when all e1 bits have been reset. At this time, the next execution cycle begins with the computation of e1 bits as specified by the instructions of the preceding execution cycle setting the e2 and e3 bits. The hardware has been designed so that a large number of instructions can simultaneously have their paths connected. After an instruction has completed its operation, its paths are removed (disconnected). A priority scheme has been developed which allows any number of paths to be forming simultaneously, and which also guarantees that at least one path will be connected on each attempt. Therefore, in the worst possible case, the time required to execute a group of instructions activated during a given execution cycle will never exceed the time required to execute them sequentially.

3.2.4.3 Interprogram Protection

To provide isolation of instructions and data, an additional bit is required at each memory location. If this "isolation" bit is set in some module, the hardware will not allow a path to be built through the module, but the module may still be a path termination. The ability to be a termination is necessary in order to allow for the resetting of the isolation bit. To isolate a program, those locations containing instructions or data which form a spatial boundary must have their isolation bits set. If all programs in the machine have their boundary isolation bits set, there will be a barrier that prevents any program from accessing any other program.*

3.2.4.4 Indirect Addressing

The contents of a location referred to as an indirect address are interpreted as shown in the figure below. There are five possible addresses, and any combination may be used. An unused address is recognized by the fact that it is all zero. Any combination of addresses may be specified as indirect by setting the leftmost bit of those addresses. The computer timing imposes a limit of 40, at most, on the length of a sequence of dependent indirect addresses.

*The isolation bit can be set or reset by the LOAD instruction with the appropriate modification of the operation code.
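The path building that underlies these execution cycles (Section 3.2.3) steps between modules at unit Hamming distance, clearing one differing address bit per segment. The sketch below illustrates the two accessibility measures given earlier: a minimal path has length equal to the Hamming distance, and k differing bits admit k! distinct minimal paths. Addresses and function names are ours, chosen for illustration.

```python
# Illustrative n-cube routing: modules are addresses, wires join addresses
# differing in exactly one bit (unit Hamming distance).
from math import factorial

def neighbors(addr, n):
    """The n modules directly wired to addr: one per bit position."""
    return [addr ^ (1 << i) for i in range(n)]

def route(src, dst):
    """One greedy minimal path: clear the lowest-order differing bit
    at each step until the destination is reached."""
    path = [src]
    while src != dst:
        diff = src ^ dst
        src ^= diff & -diff          # isolate and flip the lowest set bit
        path.append(src)
    return path

n = 4                                # a 16-module machine
assert len(neighbors(0b0000, n)) == n
path = route(0b0000, 0b1011)
assert len(path) - 1 == 3            # path length = Hamming distance
assert factorial(3) == 6             # k! distinct minimal paths for k = 3
```

Any order of clearing the differing bits gives another minimal path, which is why the count is k! and why the priority hardware always has at least one route to try.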

[Figure: an indirect-address word holding five address fields of n + 1 bits each.] Fig. 60. Location referred to by indirect address.

Through the use of indirect addressing it is possible to have one instruction perform its operation on the contents of many locations simultaneously. This is done by having α be an indirect address. The contents of the location that α refers to can have up to five more indirect addresses, each of which can refer to five more, etc. Thus, a tree structure of paths is connected from the instruction to many modules. Upon execution of the instruction, the operation code followed by the second operand is sent down the tree, and all the terminal modules perform the operation simultaneously. Similarly, γ can specify one successor directly, or many successors through indirect addressing.

3.2.4.5 Arithmetic Operations

The four basic arithmetic operations (addition, subtraction, multiplication, and division) are available. The normal mode of full-word arithmetic is floating point. The mantissa is shifted to make the characteristic zero whenever no loss of accuracy occurs. In this way the programmer has the benefits of high speed when working with integers, and full accuracy by automatic scaling of non-integers. The format for full-word numbers is given below. The magnitude of a number is the mantissa (binary point to the right of the low-order bit) times two raised to the power of the characteristic.
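The indirect-addressing tree can be sketched as a small recursion: a word marked indirect holds up to five further addresses, and the operation ultimately reaches every leaf of the resulting tree. The data layout below is a stand-in of our own, not the report's word format.

```python
# Hypothetical sketch of the indirect-address fan-out.
# memory maps an indirect word's address to its list of addresses;
# 'indirect' is the set of words whose contents are address vectors.

def terminals(addr, memory, indirect):
    """All leaf modules reached from addr through chains of indirect words."""
    if addr not in indirect:
        return [addr]                     # a direct operand location
    leaves = []
    for a in memory[addr]:                # up to five addresses per word
        leaves.extend(terminals(a, memory, indirect))
    return leaves

# one instruction whose alpha names word 10; words 10 and 11 are indirect
memory = {10: [1, 2, 11], 11: [3, 4]}
indirect = {10, 11}
assert terminals(10, memory, indirect) == [1, 2, 3, 4]
```

All four terminal modules would receive the operation code and second operand simultaneously, so a single instruction operates on four locations at once.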

[Figure: sign of characteristic, overflow-underflow bit, and sign of mantissa (three bits), an (n-2)-bit characteristic, and a 4(n+1)-bit mantissa.] Fig. 61. Format for full-word number.

Each number has an overflow-underflow bit which is set if at any time the magnitude exceeds 2^(2^(n-2) - 1) or is less than 2^(-[2^(n-2) - 1]). When the overflow bit is set, the remaining bits of the number are reset, made zero. Any succeeding arithmetic operation on a number with its overflow bit set results in another number which also has its overflow bit set. Normal arithmetic is performed on numbers even if their overflow bits are set. Overflow or underflow can occur only on a full-word addition, subtraction, multiplication, or division. There is also a conditional transfer instruction capable of testing the overflow bit.

3.2.4.6 Byte Modification

To facilitate relative addressing and instruction composition, the arithmetic operations ADD and SUBTRACT, as well as LOAD, COMPLEMENT, AND, OR, EXCLUSIVE OR, and some conditional transfer instructions, can refer to any of the five n+1 bit bytes. For designating which bytes are to be operated on, there are five bits in the operation code which can be used to modify these operations.

0 0 0 0 0 specifies full-word arithmetic.
0 0 0 0 1 specifies the operation is to be performed on the low-order n+1 bits (the a1 byte) of the operands.
1 0 1 1 0 specifies the operation is to be performed on the a5, a3, and a2 bytes.
1 1 1 1 1 specifies the operation is to be performed on all five n+1 bit bytes.

All bytes are assumed positive; a negative result remains in the byte as its 1's complement. There is no carry from one byte to the next. Carry out of the high-order position is lost. These partial-word operations may be used directly on instructions or indirect-address words as if the locations were index registers.

3.2.4.7 Transfer Instructions

Since each instruction has a γ address to specify the location of its successor, only conditional transfer instructions are assigned specific operation codes. There are two basic forms of conditional transfer instructions: BRANCH and PROCEED. The BRANCH operation tests the contents of the location specified by the β address. If the condition (specified in the three low-order bits of the operation code) is met, a signal is sent to the modules designated by the α address. Otherwise, the signal is sent as usual to the modules designated by the γ address. The PROCEED operation compares the contents of the locations specified by α and β. The comparisons, such as =, >, etc., are specified by the three

low-order bits of the operation code. If the comparison is not true, the instruction is treated as if no successor were specified. Otherwise, the successor specified by the γ address is treated in the normal way.

3.2.4.8 Inhibit Modification

The use of the word "signaled" rather than "transferred to" is necessary because in this machine many instructions can be executing simultaneously. It is possible for an instruction to specify itself as its successor for incrementing or counting purposes. Another instruction could be testing for a desired value. When this value is reached, there must be a way to stop the instruction which is transferring to itself. The ability to stop an instruction from executing during the next execution cycle is called the inhibit modification. Every instruction has a bit in its operation code which, if 1, causes the signal to the successor to set the e3 bit (described in Section 3.2.4.2). If the inhibit modification bit is 0, the signal goes to the e2 bit of the successor. In either case, the effect of a signal applies only to the next execution cycle. Any instruction can be a local HALT instruction by having an all-zero γ address, i.e., no successor.

3.2.4.9 Input-Output Instruction

The α address of the input-output instruction refers to a module which is directly connected to a particular I-O device of the desired type. The memory register of this module may contain control information for the I-O device. The location of information which is entering or leaving the computer is specified by the β address of the I-O instruction. Each I-O device has its own simple buffer between itself and the main computer. For a magnetic tape unit, the buffer may be core storage which holds several blocks of information. As long as there is information available on reading or space available on writing, the main computer uses only a normal length of execution cycle for an I-O operation. If the buffer is empty or full, execution is held up until the I-O operation is completed. Backspace, rewind, skip file, etc., are determined by the information in the location connected to the I-O device. These require only a normal-length execution cycle unless the queue of commands exceeds the buffer capacity, in which case further execution in the main computer must wait. With this type of I-O, the programmer should give control information as early as possible and do information I-O at the last possible moment.

3.2.4.10 Operation Codes

Arithmetic

1. ADD α, β, γ. The contents of α are replaced by α + β. (Byte or full word)

2. SUBTRACT α, β, γ. The contents of α are replaced by α - β or β - α, depending on the high-order bit of the operation code being 0 or 1 respectively. (Byte or full word)

3. MULTIPLY α, β, γ. The contents of α are replaced by α * β. (Full word only)

4. DIVIDE α, β, γ. The contents of α are replaced by α/β or β/α, depending on the high-order bit of the operation code being 0 or 1 respectively. (Full word only)
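The arithmetic operations above replace the contents of α in place, optionally under the byte modification of Section 3.2.4.6 (independent n+1-bit bytes, no inter-byte carry). The following sketch shows that behavior for ADD and the reversed SUBTRACT; the mask ordering and small word size are our illustrative assumptions.

```python
# Hypothetical sketch of ADD/SUBTRACT with byte modification.
B = 4                                    # n + 1 bits per byte (n = 3 here)

def add_op(alpha, beta, mask=(0, 0, 0, 0, 0), reverse=False):
    """Result replaces alpha. An all-zero mask means full-word arithmetic;
    otherwise each selected n+1-bit byte is updated independently,
    with carry out of a byte lost."""
    if not any(mask):
        return beta - alpha if reverse else alpha + beta
    out = 0
    for i in range(5):                   # i = 0 is the low byte a1
        a = (alpha >> (i * B)) & ((1 << B) - 1)
        b = (beta >> (i * B)) & ((1 << B) - 1)
        if mask[4 - i]:                  # mask listed as (b5, b4, b3, b2, b1)
            a = (a + b) & ((1 << B) - 1)
        out |= a << (i * B)
    return out

assert add_op(7, 5) == 12                         # full word: alpha + beta
assert add_op(7, 5, reverse=True) == -2           # beta - alpha form
# mask 0 0 0 0 1: only byte a1 changes; 0xF + 0x1 wraps and the carry is lost
assert add_op(0x1F, 0x01, mask=(0, 0, 0, 0, 1)) == 0x10
```

The last case shows why byte mode suits address arithmetic: stepping the α field of another instruction can never spill a carry into the neighboring field.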

Logical - Byte modification applies to 5 thru 9

5. LOAD α, β, γ. The contents of α are replaced by the contents of β.

6. AND α, β, γ. The contents of α are replaced by the bitwise AND of α with β.

7. OR α, β, γ. The contents of α are replaced by the bitwise OR of α with β.

8. EXCLUSIVE OR α, β, γ. The contents of α are replaced by the bitwise EXCLUSIVE OR, ring sum, of α with β.

9. COMPLEMENT α, γ. The contents of α are bitwise complemented.

Shifting - Full word only

10. SHIFT α, β, γ. β is not an address; β is the number of bit positions the contents of α are to be shifted. The first and second bits of the operation code determine left or right and end-around or linear, respectively. Vacated positions on linear shifts are filled with zeros. A shift instruction with β = 0 is a NO OPERATION that requires one execution cycle and can specify a successor.

11. SCALE α, β, γ. The contents of α are treated as a floating-point number. The low-order n-1 bits of β are treated as a sign and magnitude of a characteristic; β is not an address. If the first bit of the operation code is 1, then the mantissa in α is shifted so as to make the characteristic equal to β. If the first bit of the operation code is 0, then β is added to the characteristic in α and the mantissa in α is shifted accordingly. The second bit of the operation code being 1 or 0 specifies rounding or truncation respectively.

Transfer

12. BRANCH if Φ(β): α, β, γ. If Φ(β) is true, a signal is sent to location α; otherwise the signal is sent to γ. Φ(β) may be any of the following:
a) β = 0 (Byte or full word)
b) β negative

c) β has its overflow bit set

13. PROCEED if α ρ β: α, β, γ. If α ρ β is true, a signal is sent to location γ; otherwise no signal is sent. ρ may be any of the following:
a) α > β
b) α ≥ β
c) α = β
d) α ≤ β
e) α < β
f) α ≠ β
All relations above may apply to byte or full word.
g) the βth bit of α is a 1
h) the βth bit of α is a 0
If α refers to more than one location through indirect addressing, the logical OR of the α's will be used to test the relation.

Other - Full word only

14. INPUT-OUTPUT α, β, γ. α is the module which controls the I-O device. The memory register of α contains the command for the I-O device, while β specifies the location into which information is read, or out of which information is written.

15. SENSE PANEL α, γ. The contents of the display panel are the address of the last module where an unresolvable programming error was detected, e.g., trying to execute an undefined operation code. The contents of the display panel replace the β address position of location α.

16. SET ISOLATION α, γ. The isolation bit is set to 1 at location α. α may still be referred to by other instructions, but no access can be made which would use a path through α.

17. RESET ISOLATION α, γ. The isolation bit is reset to 0 at location α.

18. ERROR MODE γ. Depending on the first two bits of the operation code being 1 or 0, this instruction sets the mode of operation to:

continue or stop executing instructions of type 1 through 18, and activate or inhibit ERROR START instructions, respectively. The mode remains set until changed.

19. ERROR START  γ
If there is an error and the computer is in ERROR START activate mode, all instructions with this operation code become active during the next execution cycle.

This concludes the description of instructions available in the hardware of the computer. Because many operations have bits which further qualify them, an assembly language distinguishing the various operations would be useful to programmers. The operations are meant to be convenient to the general-purpose programmer. Many special instructions, symbol and list manipulation, etc., have purposely been omitted to keep the amount of hardware to a minimum. This should cause no loss of speed, since special instructions can be achieved by clever programming using simultaneous application of those instructions given above.

3.2.5 Physical and Logical Design

Now comes the problem of determining how much circuitry would be required by a machine as described in the earlier sections of this report. The most accurate way to determine the required number of components is to do a complete logical design. Even then, the cleverness of the logical designer and the choice of component types could affect the result by a factor of two or three. Considering the time involved to do a detailed logical design, and considering that we are far from the cleverest logical designers available, the following approach was taken: the part of this machine not found in conventional computers, the path-connecting circuitry, was designed in some detail. The logical circuitry for arithmetic operations, timing pulse generation, etc., was not designed. Instead, its requirements are given with estimates for the number of components required based on current technology.

We will first consider the somewhat conventional hardware that must be in each module. Even here the logical design would not really be conventional. Where, at most, hundreds of computers of a given type may have been built, we are talking of building thousands of modules for a single machine. Although a module is versatile when embedded in an I.C.C., it is far from being a complete computer. Thus, due to the greater importance of economical design and the lesser requirements, a greater effort could be justified for a fully integrated, clever logical design. A number of trial modules could be built, tested, and perfected, with the goal being low-cost, mass-produced modules.

There are several components which could be used in the construction of modules. For example, RTL circuits can be produced fairly inexpensively in quantity using low-speed, low-power transistors. The RTL circuits which are currently being manufactured by deposition techniques have a density of about 100 transistors and 400 resistors per square inch. Circuits such as these used in an I.C.C. have the advantage of less noise pick-up, since the physical size of a module can be small and the connecting leads between modules can be correspondingly short. A second component potentially useful for module construction is all-magnetic logic. Again, this is not the fastest possible logical component, but it is reliable and potentially can be manufactured by automated equipment. Multi-aperture cores and other types of all-magnetic logic require fairly close tolerances, thus careful design. Moreover, this type of design is well suited to modules which have relatively few external connections to other modules. A final example of a potentially inexpensive and fast component is the cryotron. Again, automated production may be possible, and making a large number of identical modules should reduce considerably the cost per module.

At first glance, everyone considers an I.C.C. impractical, even with inexpensive construction, since it could have thousands of modules which seem to be simplified versions of the processing units in conventional computers. Although an I.C.C. requires many times the number of components in conventional computers, one cannot expect to get simultaneous accessing, simultaneous arithmetic, and simultaneous instruction processing without more components. To show that a module is far less than the processing unit in conventional computers, we will list all the circuitry that is not in a module but is in conventional processing units. First, there are several obvious registers that are not required in a module. There is no sense (storage) register or address register, since there is no store to access. There is no instruction register or instruction counter, since execution, not instructions, moves from module to module. The one-word store in the module corresponds to a conventional accumulator. By having numbers in integer form with scale factors, the multiply and divide operations can be performed in a single-length accumulator.

The next major block of circuitry not within modules is for timing. There could be one timing unit which would make all the required sequences

of control pulses available to all modules. By having more specific timing sequences available than in conventional computers, the amount of logic in a module can be greatly decreased. The central timing unit becomes correspondingly larger, but the component saving in one module, multiplied by the number of modules, should be far greater. Considering that instructions all have the same basic format and are relatively stationary, a scheme exists for having various bits in the memory register of a module directly control gates when the module is executing an instruction. This would eliminate most of the instruction-decoding circuitry existing in conventional machines. There would have to be a basic adder and the arithmetic control logic in every module. Here, a decision between serial and parallel arithmetic would have to be made based on the differences in speed and cost.

The remaining circuitry in a module is the path-connecting logic. In place of drivers, cores, and sense amplifiers, the path-connecting logic closes gates in various modules, forming a path to access information in other modules. To give an idea of how much circuitry is required for path connecting, a fairly complete logical design of this follows: The basic segments from which paths are formed are conductors from the periphery of one module to the periphery of another. Fig. 62 shows the top view of an I.C.C. with the modules appearing as squares. Fig. 63 shows how each module passes through a number of thin layers on which the path segments (the conductors) are placed (by printed-circuit or deposition techniques). The addresses α, β, and γ are shown here to have completely isolated path

structures. Correspondingly, three times as much path-connecting circuitry would be required in each module. The choice of separate layers for each address stems from the fact that this is well over three times as fast from a computation standpoint, and yet requires less than three times the circuitry, since each layer can be specialized to its particular address bit positions.

Before describing the logical circuitry for path connecting, we will explain the function the circuitry must perform. Basically, the problem can be stated as follows: There is a binary number in the memory register of some module. This binary number refers to another module. The circuitry must close gates to form a path between these two modules. The path must allow information to flow in both directions. A path need not be a single wire; it could physically be a bunch of wires for transmission by bytes or in parallel, and there could be two separate circuits for transmission in each direction. For convenience of explanation and simplification of the logical circuitry, a scheme with one wire for each direction will be used. See Fig. 64. The connections to the module labelled P1 through P6 are physically n wires each (where there are 2^n modules in the computer). Information can flow in P1 out P2, in P3 out P4, and in P5 out P6 without affecting the operation of this module. This module may initiate paths along a wire of P2, P4, and P6 as determined by the α, β, and γ addresses. Other modules may access this module by having their paths terminate in the P1, P3, or P5 lines of this module. When this module is used as the α address of some instruction, the operation code of the instruction enters before the β operand. The operation code enters via some wire of P3 and is placed in the arithmetic control operation register.

Fig. 62. Top view of the I.C.C.

Fig. 63. Side view of the I.C.C. (Conductors in layers directly connect pairs of modules; timing and synchronization run from central clocks to every module.)

The control of arithmetic operations in this module comes from the operation register and not from the memory register. No addresses need be sent to the module acting as an arithmetic unit, since the module containing the active instruction is doing the required switching to set up the operand accesses.

Figure 65 shows the significant information flow when an instruction executes. In this example, the contents of the memory register of module Y are being added to the memory registers of modules X1 and X2. The ADD instruction is in module R, and indirect addresses are in module X. Execution proceeds as follows:

Step 0. The e1 bit (described on page 146 of this report) is set, assuming an activate signal was sent to module R over one of its P5 paths. The e2 and e3 bits in R are reset.

Step 1a. A path is connected from R to X. (The prime indicates that X is an indirect address.) Then two paths connect from X to X1 and X2 respectively.
1b. A path is connected from R to Y (second operand).
1c. A path is connected from R to S (next instruction).

Step 2a. The e2 bit in module S is set. Removal of the path between R and S begins at S.
2b. The operation code from the memory register of module R is sent to the operation registers of X1 and X2 via X.

Step 3. Module R controls the gating of the contents of the memory register of Y to the path into X. Module X controls the gating from its P1 input to the two lines to X1 and X2. X1 and X2 set their gates to send the contents of their memory registers and the incoming path from X to their adders respectively.

Step 4a. X1 and X2 gate the resultant sum back to their respective memory registers.

Fig. 64. Function and flow block diagram of a module. (Blocks shown: memory register; path-connecting circuitry with ports P1 through P6; basic arithmetic, logical, and shift circuits; arithmetic control operation register; other miscellaneous module control circuitry; central timing and synchronization.)

4b. Removal of the paths into X1 and X2 is begun at X1 and X2 respectively.
4c. Removal of the path from R to Y is begun at Y.

Step 5. When all paths have been removed to R, the e1 bit of R is reset.

Step 6. When all e1 bits are reset, the central synchronization emits a signal to all modules, which compute the new e1 bits, and Step 0 begins again.

This completes the description of Fig. 65 involving the overall path structure. We will now concentrate on one type of path, say α. (For convenience of construction, all three types of paths, α, β, and γ, would probably use the same logic, or all three could be operating simultaneously in the same circuitry if some restrictions were placed on programming.) The decision procedure for connecting a path that must be performed in each module requires two pieces of information. Each module must know its own binary representation as an address, called 'HERE'. (This can be wired into the layers shown in Fig. 58, thus allowing all modules to be identical and interchangeable.) Also, each module must know which of its accessible n path segments are busy. (This we will call the 'BUSY' register.) The n-bit address of the termination of a path can come from n+5 places, i.e., n from the n path segments connected to this module plus 5 from the 5 byte positions of the memory register. For instructions, only the three low-order bytes are addresses which could initiate paths, but an indirect address can cause all 5 bytes to initiate paths. Suppose an n-bit address has reached a module. This is the address of the termination of a path. By taking the bit-wise exclusive or of this n-bit

Fig. 65. Information flow during execution. (Activation and synchronization signals reach all modules from central control; modules R, S, X, X1, X2, and Y are shown with their memory registers.)

address with the n-bit representation 'HERE', those positions of the result which are 1 denote the possible path segments which can serve as extensions for the path. This is just a reduction of 1 in the Hamming distance, since each neighbor of a module differs from it in exactly one bit position. We will establish the convention that the lowest 1 bit resulting from the exclusive or will be tried first as an extension of the path. It may be that the desired segment is already being used by another path, in which case the BUSY register has a 1 in that position. To eliminate busy segments from the potentially useful segments, the complement of BUSY is ANDed with the result of the previous exclusive or. This result is retained in a 'GO' register. To connect from the n possible incoming path segments to the n possible outgoing segments, an n x n switching matrix is used. Five more inputs are appended to the switching matrix to allow for path initiation, and a diagonal pair of wires allows for path termination at a module. The operations of path connecting are staggered such that all modules with an even number of 1's in their addresses, 'HERE', extend (or remove) their paths one segment during alternate times with modules having an odd number of 1's in their addresses. In this way, priority problems are avoided which involve two adjacent modules trying to connect to their common segment. It is possible to have two modules connect a path to the same module at the same time and have the same destination for both paths. This priority decision is made by the circuitry just prior to setting the gates of the switching matrix. The circuit for this two-dimensional priority selector is shown in Fig. 67. The composite of the logic just described is shown in Fig. 66. The

one-directional segments are shown as B1, B2, B3, ..., Bn, grouped under the P1 designation. The one-directional outputs are grouped under the P2 designation. For a path to pass through a module, two inputs will be connected to two outputs with reversed subscripts, thus forming a piece of a two-directional path. If a path cannot be extended due to complete blockage by other paths, a NO-GO signal is sent back towards the origination of the path. Upon receipt of a NO-GO signal, a module selects the next (higher-order) potentially useful segment from the 'GO' register.

To further explain the logic involved, an example of the progression of a path connection is given in Fig. 68. Here we have a machine with n equal to 4. Only 8 of the 16 total modules are shown, and only the values of 'HERE', 'DESTINATION', 'BUSY', and 'GO' are shown in boxes. The path-segment connections B1, B2, B3, B4 each correspond to a pair P1 and P2 shown in Fig. 66. We will concern ourselves with the path originating at module 0001 with the destination 1010. We assume two other paths, indicated by dotted and dashed lines, are already present. The path connecting proceeds in two phases. During phase A, the modules with an odd number of 1's in their address perform the logic to compute the contents of their 'GO' registers, and the modules with an even number of 1's in their address transmit the destination over the path segment specified by the lowest 1 bit in their 'GO' registers. During phase B, the roles of the two sets of modules are reversed. To get the phasing started, there is no transmission between modules on the first step and no logic on the last step of path connecting. Thus we have for Fig. 66:

Fig. 66. Path-connecting circuitry of a module. (Five sets of this logic serve the memory-register byte positions; inputs from the memory register and the P1 segments feed the GO logic and an n x n switching matrix, with a NO-GO line back to the previous module and a diagonal connection for path termination at this module. Switching-matrix notation: if x = 1, a is connected to b; if x = 0, a is not connected to b. There are two inputs to every gate in the switching matrix except for the gates on the diagonal.)

Fig. 67. Two-dimensional priority selector.
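The per-module decision rule described above (GO set to the exclusive or of 'HERE' and the destination, ANDed with the complement of 'BUSY', lowest 1 bit tried first) can be sketched as follows. This is our own illustration, not part of the report; the function names are ours, and the example values are taken from the walkthrough of the Fig. 68 example.

```python
# Sketch of the GO-register rule: GO = (HERE XOR DESTINATION) AND NOT BUSY.
# Bit i of the result corresponds to path segment B(i+1).

def go_register(here: int, destination: int, busy: int, n: int = 4) -> int:
    """Candidate outgoing segments: bit positions where HERE and the
    destination differ and the corresponding segment is not busy."""
    mask = (1 << n) - 1
    return (here ^ destination) & ~busy & mask

def lowest_one(x: int) -> int:
    """Isolate the lowest 1 bit (the segment tried first); 0 if none."""
    return x & -x

# Module 0001 routing toward 1010 with no busy segments (Step 1A).
go = go_register(0b0001, 0b1010, 0b0000)
assert go == 0b1011
assert lowest_one(go) == 0b0001          # segment B1, toward module 0000

# Module 0000 with BUSY = 0111: only the B4 segment remains (Step 1B).
assert go_register(0b0000, 0b1010, 0b0111) == 0b1000

# Module 1000 with BUSY = 1010: no usable segment, so a NO-GO results.
assert go_register(0b1000, 0b1010, 0b1010) == 0b0000
```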

Step 1A. The module 0001 is initiating a path and therefore computes the contents of its GO register: (HERE ⊕ DESTINATION) ∧ (BUSY)' → GO, which is (0001 ⊕ 1010) ∧ (0000)' = 1011. Other modules could simultaneously be initiating or extending paths, but for simplicity of explanation only one path is being considered. The lowest 1 in the GO register of 0001 determines which segment will become a part of the path. This segment connects to module 0000, which is one step closer to the destination than 0001.

Step 1B. The BUSY bit for the B1 segment is set in both 0001 and 0000. The destination is sent along segment 1 to 0000. The GO register of module 0000 is set to (0000 ⊕ 1010) ∧ (0111)' = 1000. The second and third BUSY bits had been previously set when the path coming in B3 and going out B2 was built. The lowest 1 in the GO register of 0000 specifies that the B4 segment, 2, is to become a part of the path.

Step 2A. The 4th BUSY bit is set in 0000 and 1000, and the destination is sent from 0000 to 1000. The GO register of module 1000 is set to (1000 ⊕ 1010) ∧ (1010)' = 0000.

Step 2B. Since the GO register is all zero, a NO-GO signal is sent back along 2. The 4th BUSY bits in 1000 and 0000 are reset. Upon receipt of the NO-GO signal, the module 0000 sets the lowest 1 in its GO register to zero (in this case making it all zero).

Step 3A. Since the GO register is all zero, a NO-GO signal is sent back along 1. The first BUSY bits of 0000 and 0001 are set to zero. Upon receipt of the NO-GO signal, 0001 sets the lowest 1 in its GO register to zero.

Step 3B. The lowest 1 in the GO register of 0001 is now in position 2; thus segment 3 is added to the path. The 2nd BUSY bits are set in 0001 and 0011, and the destination is sent from 0001 to

Fig. 68. Progression of a path connection. (Note 1: the BUSY bits in modules 0000, 0010, 1000, and 1010 were set by other paths before the 1-2-3-4-5-6 path reached these modules. Note 2: the logical circuitry within the modules determines when segments are in paths that got there first. Note 3: solid lines are two-directional path segments connecting pairs of modules; the active path to 1010 starts at 0001, and trial segments are built and erased along the way.)

0011. The GO register of 0011 is set to (0011 ⊕ 1010) ∧ (0010)' = 1001.

Step 4A. Segment 4 is added. The GO register of 0010 is set to (0010 ⊕ 1010) ∧ (1011)' = 0000.

Step 4B. A NO-GO signal is sent back along 4. The lowest GO bit is set to 0 in 0011.

Step 5A. Segment 5 is added. The GO register of 1011 is set to (1011 ⊕ 1010) ∧ (1000)' = 0001.

Step 5B. Segment 6 is added. The termination is detected, since (1010 ⊕ 1010) = 0000. Execution using this path can now take place.

The logic and transmission properties could be designed to perform one phase per basic clock time. Thus the example given above would require 10 basic clock times to complete the path. Lookahead could be accomplished by having paths connected by successors while the arithmetic operations of the predecessors are being performed. It seems that the average path-connecting time will require about the same time as an average arithmetic operation.

3.2.6 Conclusion

The basic question of the economics of an I.C.C. is: How much is fast computation worth? We have yet to hear a concrete answer to this question, and indeed there will probably never be a simple answer. The consensus seems to be that a computer twice as fast in every respect is not worth twice the

cost. To determine how much speed is worth, the reliability, type of problem, qualifications of the programmers, and numerous other factors must be considered. For a computer such as is being described here, there is one further factor to consider. That is: How parallel is highly parallel? There are examples of problems that could be done on this machine in 1/1000 the time required by a conventional computer built from the same components. There are other examples where this machine could barely cut the computation time in half. We have no accurate measure of the average parallelism possible in this machine. Based on our experience in considering a few problems, we estimate that on the average between 10 and 100 instructions could be executing simultaneously on a medium-sized computer. This is to be contrasted with our educated guess of a cost 10 to 100 times that of a conventional machine.

A medium-sized I.C.C. is certainly within engineering feasibility. Perhaps the first machine of this type should be designed for a user with much computation suitable for parallel processing, i.e., on problems involving matrices (solving systems of equations, inverting matrices, finding eigenvalues), or on other specific problems such as solving boundary-value differential equations by the relaxation method. In these and some other problems, hundreds of calculations could be made simultaneously. By specifically choosing the command structure and size of the machine for a few specific problems, an economically competitive computer could be built today. The rather powerful machine described in detail in this report is tailored to a need not yet fully developed. Until some good programmers and numerical analysts have such a machine in their hands, it is difficult to predict how much potential a computer of this type will have. We are optimistic that the iterative circuit computer organization is one of the methods that will enable computers to do much more computation in a given time.
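The backtracking search traced in Steps 1A through 5B above can be reproduced in a short simulation. This is our own sketch, not the report's: the function and variable names are ours, the two-phase timing is collapsed into a simple depth-first search, and the busy segments are those implied by the two pre-existing paths in the Fig. 68 example.

```python
# Simulate path building on an n-cube: try the lowest differing non-busy
# bit first, and back up on a NO-GO, as described in the report's example.

def build_path(start, dest, busy_edges, n=4):
    """Return the list of modules on the connected path, or None if the
    path is completely blocked. busy_edges holds frozensets {a, b} of
    segments already used by other paths."""
    busy = set(busy_edges)

    def edge(a, b):
        return frozenset((a, b))

    def extend(here, path):
        if here == dest:
            return path
        go = here ^ dest                      # bits where HERE differs from the destination
        for i in range(n):                    # lowest 1 bit tried first
            bit = 1 << i
            if not (go & bit) or edge(here, here ^ bit) in busy:
                continue
            nxt = here ^ bit
            busy.add(edge(here, nxt))         # claim the trial segment
            result = extend(nxt, path + [nxt])
            if result is not None:
                return result
            busy.discard(edge(here, nxt))     # NO-GO: release it and try the next bit
        return None                           # all useful segments blocked

    return extend(start, [start])

# Busy segments implied by the two pre-existing paths of the example.
other_paths = {frozenset(e) for e in
               [(0b0000, 0b0010), (0b0000, 0b0100),
                (0b1000, 0b1010), (0b0010, 0b1010)]}
path = build_path(0b0001, 0b1010, other_paths)
assert path == [0b0001, 0b0011, 0b1011, 0b1010]
```

Because every extension flips a bit in which the module still differs from the destination, the Hamming distance falls by one at each step, matching the report's convention; like the hardware procedure, the sketch never detours away from the destination.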

3.3 HARDWARE REQUIREMENTS FOR MACHINE AS DESCRIBED

The path-connecting logic and registers require the following hardware in each module:

Quantity    Bits of Storage    Logical Elements      Description
n+5         n each             -                     'GO' storage
1           5(n+1)+4           -                     memory storage
1           n                  -                     'BUSY' storage
3(n+5)      -                  n-input NOR logic     input logic
n x n       -                  switching gates       switching matrix*
n(n+5)      -                  one-stage priority    priority circuit

For a 4096-module machine, n would be 12. The number of bits of storage would be (204 + 69 + 12) x 4096 = 1,167,360. (This is a few less than the number of storage bits in a conventional memory of 32k 36-bit words.) There would also need to be about another 1.6 million simple logical elements. We estimate approximately 400 logical elements for the arithmetic unit, in addition to 26 bits of storage for the operation control register. Another 100 logical elements would be needed for miscellaneous module control. These would be for computing execution bits, signaling successors, routing operands to the appropriate paths, etc. Assuming the central timing and synchronization to be less than 10% of the machine, the total number of storage bits and logical elements would be less than five million.

* For serial two-way transmission, multiply by the number of bits to be sent in parallel.
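The totals quoted above can be checked arithmetically. The grouping of the per-module expressions below is our own reading of the table (an assumption, not the report's working), with n = 12 for the 4096-module machine:

```python
# Check the per-module hardware totals for n = 12 (4096 modules).
n = 12
modules = 2 ** n                     # 4096 modules

# Storage bits per module: GO (n+5 registers of n bits), memory, BUSY.
storage_bits = n * (n + 5) + (5 * (n + 1) + 4) + n
assert storage_bits == 204 + 69 + 12
assert storage_bits * modules == 1_167_360

# A conventional 32k x 36-bit memory holds a few more bits than that.
assert 32 * 1024 * 36 == 1_179_648

# Path-connecting logical elements: input NOR logic, switching matrix,
# and one-stage priority circuits; the machine total is about 1.6 million.
logic_elements = 3 * (n + 5) + n * n + n * (n + 5)
assert 1_600_000 <= logic_elements * modules < 1_700_000
```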

3.4 MATRIX INVERSION PROGRAM FOR AN I.C.C. INSTRUCTION OPERATION ADDRESSES,, 7 LOCATION * THE NXN MATRIX IN THE'A' REGION IS * INVERTED BY A GAUSS-JORDAN METHOD. * THE INVERTED MATRIX REPLACES THE * ORIGINAL CONTENTS OF THE'A' REGION ENTRY PROCEED= 0,0, SET1' (1) * FIRST EXECUTION STEP OF A THREE-EXECUTION * STEP LOOP THAT WILL BE PERFORMED N TIMES. SET1'(1)...SETL'(N) INDADR LA'(1),E(1),Q'(1)... LA' (N),E(N),Q'(N) SETi' (N+1) INDADR EXIT E(1)...E(N) LOAD AKK, A(1,1),F(1)...AKK,A(N,N),F(N) * LA' (1)...LA'(N) INDADR L(1,1)...L(N, 1)...,L(1,N)...L(N,N) L(l, 1)...L(1,N) LOAD B' (1),A(l,1),M(l,l)...B' (1),A( 1,N),M(N, 1) L(N,1)...L(N,N)... B'(N),A(N,1),M(1,N)...B' (N), A(N,N),M(N,N) B' (1)...B'(N) INDADR AT(1,1)...AT(1,N),...,AT(N,1)...AT(N,N) * Q'(1).. Q (N) INDADR U(1,2)...U(1,N),,U(I,1)...U(I,I-1),U(II+1)... ETC U( I,N),, U(N,1)... U(N,N-1) U(1i,1)...U(1,N) DIVIDE A(1,1),A(1,l).. A(1,N),A(1,1),T(1) U(N,1)...U(N,N)... A(N,1),A(N,N)...A(N,N),A(N,N), T(N) * * SECOND EXECUTION STEP F( 1)... F(N) DIVIDE A(1,1), AKK, G( 1)... A(N, N), AKK, G(N) * M(1,1)...M(1,N) MULTIPLY C'(1),A(1,1),S(,l)...C' (N),A(1,N),S(1,N) M(N., 1)...M(N,N)... C'(1),A(N,1),S(N, )... C'(N),A(N,N),S(N,N) C'(1)...C'(N) INDADR AT(1,1)...AT(N,1),...,AT(1,N)...AT(N,N) T(1)... T(N) LOAD V' (1), ZERO,Y(1).. V' (N), ZERO,Y(N) V'(1))...V' (N) INDADR A(2,1)...A(N,1),,A(1,I)...A(I-1,I),A(I+1,I)... ETC A(N, I),, A(1,N)... A(N-1, N) 176

INSTRUCTION i LOCATIN OPERATION ADDRESSES, _, LCATION * THIRD EXECUTION STEP G(1)..G(N) DIVIDE A(1,1), AKK, SIET' ( 1+1)... A(N,N),,A, SEKS' (N+1) * S(1,1)... S(1,N) SUBTRACT A(1,1),AT(1,1l)...A(N,1),AT(N,1) S(N, 1)...S(N,N).... A(N,1),AT(N,1)...A(N,N),AT(N,N) * * ON THE ITH PASS THROUGH THE LOOP * THE ITH ROW OF SUBTRACT INSTRUCTIONS * I IS INHIBITED. Y( 1)....Y(N) INHIBIT — Z 1)... (N) Z'(1)... Z' (N) INDADR S(l,)...S(l,N),... S(N,l)... S(N,N) * ~ * i END OF COMPUTATION LOOP * ~* I: STORAGE ASSIGNMENT A(1,l1)...A(1,N) DATA A(N,1)...A(N,N) AT(1,1)... AT(1, N) TMPSTR AT(N, 1)... AT(N, N) *. i AKK TMPSTR ZERO DEC 0 END 177

To illustrate the Gauss-Jordan method, the following ALGOL program is given.

procedure INVERT (N,A); value N; integer N; real array A;
comment The N by N matrix in the A region is inverted by a Gauss-Jordan method. The inverted matrix replaces the original contents of the A region;
begin integer I,J,K; real AKK,AIK;
for K:=1 step 1 until N do
begin AKK:=A[K,K]; A[K,K]:=1.;
for J:=1 step 1 until N do A[K,J]:=A[K,J]/AKK;
for I:=1 step 1 until N do
begin if I≠K then
begin AIK:=A[I,K]; A[I,K]:=0.;
for J:=1 step 1 until N do A[I,J]:=A[I,J]-AIK×A[K,J];
end skip reduction of the Kth row;
end Kth column finished and matrix to left reduced;
end all N columns finished;
EXIT: end INVERT

A few additional comments should enable the interested reader to understand the details of the I.C.C. program. First, the sequence of instructions which form the "loop" is: ENTRY → SET1'(1) → E(1) → F(1) → G(1) → SET1'(2) → E(2) → F(2) → ... → F(N) → G(N) → SET1'(N+1) → EXIT. Thus, the number of execution steps is 3N+1.
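For readers more comfortable with present-day notation, the ALGOL procedure can be transcribed directly. This Python rendering is our own; like the original, it performs the in-place Gauss-Jordan reduction without pivoting and so assumes no zero appears on the diagonal during the reduction.

```python
# Python transcription of the ALGOL procedure INVERT above:
# in-place Gauss-Jordan inversion, no pivoting.

def invert(a):
    """Invert the square matrix a (a list of lists of floats) in place."""
    n = len(a)
    for k in range(n):
        akk = a[k][k]
        a[k][k] = 1.0
        for j in range(n):
            a[k][j] /= akk                     # normalize the pivot row
        for i in range(n):
            if i != k:                         # skip reduction of the kth row
                aik = a[i][k]
                a[i][k] = 0.0
                for j in range(n):
                    a[i][j] -= aik * a[k][j]   # eliminate column k
    return a

m = [[4.0, 7.0], [2.0, 6.0]]
inv = invert([row[:] for row in m])
# The product m * inv should be the identity matrix.
prod = [[sum(m[i][k] * inv[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)]
assert all(abs(prod[i][j] - (1.0 if i == j else 0.0)) < 1e-9
           for i in range(2) for j in range(2))
```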

Next, the instructions are of the 3-address type described in Part 4. The first address is one operand and the location of the result; the second address is the second operand; and the third address is the next instruction to be executed. An omitted last address implies it is not used, while a "-" implies an intermediate address is not used. An address with a prime (') is indirect. Indirect address words may refer to more than one other word indirectly, thus enabling one instruction address to refer to many locations. Further, the "*" at the front of a line implies that the line is a comment. Finally, the "..." notation has the usual meaning: to generate all intermediate subscripts.

4. DETERMINATION OF ACCESSIBILITY

4.1 DESCRIPTION OF AN ITERATIVE CIRCUIT COMPUTER

Only a brief summary of the I.C.C. concept is presented here. For additional details the reader is referred to References 8 and 30. Basically, the I.C.C. consists of a large number of identical modules. The complexity of the module may be high or low. For example, each module could be considered to be a general-purpose digital computer. At the other extreme, the module could consist of the logic required for some logical primitive. Furthermore, the communication paths between modules are established in a uniform manner. In general, every module may communicate directly only with the modules which are its immediate neighbors. The number of neighbors is determined by the geometry of the I.C.C. In the Holland I.C.C. concept, data are accessed and instructions are sequenced by the construction of paths to and from modules. A module which is executing or interpreting an instruction is termed an "active module." The module from which a path originates is called a "P module." Any number of modules may be active simultaneously. Thus, within a given I.C.C., many different programs may be executed simultaneously. Since all modules are identical, every module is required to have the capability for instruction interpretation and execution, data or instruction storage, and path interconnections between modules. The nature of the Holland I.C.C. concept is shown by the example presented in Figure 69. In this example, the simple problem "Add X to Y and store in Z" is executed by an I.C.C. consisting of eight modules. The machine operation is divided into two phases: path building and execution. These two phases are not permitted to overlap in time.

Figure 69 presents the example in four stages, each stage showing the eight-module array alongside the following commentary:

This is the program as initially stored in the computer. The program starts in the path-building phase with the module marked 'P' active. The symbol ← indicates the next module to be activated. The symbol -o indicates the path from the active module to the accumulator (A). The circled numbers are for module reference only and are not part of the program.

The path is built down 1 (D1) from the 'P' module under the control of module 2. The operation, clear and add (CLA), clears the accumulator of the 'A' module and adds the contents of module 6 to the accumulator.

Module 1 is the successor, denoted by ←, of module 2. Module 1 becomes active when the CLA instruction of module number 2 is completed. Under the control of module 1, the path is extended down 1 (D1), and the contents of module 7 are added to the accumulator of the 'A' module. The successor is now module 5.

The path is extended right one (R1) to access module 8 under the control of module 5. The store operation (STO) causes the contents of the accumulator of the 'A' module to be placed in the storage register of module 8.

Figure 69. Simple I.C.C. Program
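The net effect of the four stages can be condensed into a toy rendering. This sketch is ours, not the report's; the operand values 3 and 4 are hypothetical, chosen only to make the result checkable.

```python
# Toy rendering of the Fig. 69 program: execution moves from module to
# module while paths carry operands to the single accumulator module 'A'.
store = {"X": 3, "Y": 4, "Z": 0}   # storage registers of modules 6, 7, 8

# CLA X: clear the accumulator and add the contents of module 6.
acc = store["X"]
# ADD Y: add the contents of module 7 to the accumulator.
acc += store["Y"]
# STO Z: place the accumulator in the storage register of module 8.
store["Z"] = acc
assert store["Z"] == 7
```

In the machine itself, each of these steps also includes a path-building phase before the execution phase, and the two phases never overlap in time.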

Note that computation of this program could be halted temporarily before storing the sum in module 8 if some other program had a path passing horizontally through modules 7 and 8. Although two paths can cross in a module, there is only one path segment (communication channel) permitted between any pair of modules. Thus, if some segment is needed by two paths at the same time, one path must be deferred or altered. In the I.C.C. concept, the term accessibility refers to the ability of active modules to complete the path-building phase. This criterion of path interference is used in the analysis of path building which follows.

4.2 EVALUATION TECHNIQUES

In general, the evaluation of the effectiveness or the efficiency of a given machine design is difficult. Specific machine parameters which contribute to machine utility are known, but the relative weights which should be assigned to each are usually impossible to determine. The difficulties in the evaluation of general-purpose machines are due in part to the fact that the class of solvable problems is not defined in a manner which facilitates a discussion of such things as efficiency or effectiveness. In the absence of a general measure for machine utilization, two specific types of studies are conducted in an attempt to evaluate a given machine design: (1) The machine's performance can be analyzed with respect to different types of problems. The analysis can be conducted by direct simulation or by means of analytical techniques. (2) Often it is possible to isolate basic characteristics of the machine design. Such basic characteristics frequently may be evaluated independently of problem characteristics or other machine characteristics.

Studies of the first type become meaningful only when a large number of different problems have been considered. In fact, it is difficult to obtain sufficiently large samples except over a long period of actual computation and evaluation by machine users. Thus, when only small samples of actual computation experience are available, studies of the second type may be of great value to the machine designer. Both types of studies are presently being conducted to determine the relative effectiveness of the Holland I.C.C. concept.

4.3 THE MATRIX INVERSION PROBLEM

The material in this section is a comparative study of three different machines applied to the general problem of matrix inversion. The first computer considered is the conventional computer containing a single processing unit. The second machine is a two-dimensional I.C.C. of the Holland type. The preliminary studies indicated specific difficulties due to path interference. Thus, the third machine considered is an I.C.C. of the Holland type constructed on an N-cube. This type of geometry greatly facilitates path building, since each module has N immediate neighbors. The interconnections required between neighbors in an N-cube geometry are obtained by N connection planes. Each plane has a uniform connection pattern. There exist many different numerical methods for matrix inversion. The degree of local control required is different for each method. The Gauss-Jordan method was chosen since this appears to permit the greatest degree of parallel computation. The comparative results are given in Table 2. Only the relative order of

TABLE 2

Comparison of Three Computer Organizations for the Matrix Inversion Problem (M x M Matrix)

                                        Conventional    2-Dimensional    N-Cube
                                        Computer        I.C.C.           I.C.C.

Number of data words                    M^2             M^2              M^2

Number of instruction words
and temporary storage                   150             5M^2 to 9M^2     7M^2

Total words                             M^2 + 150       6M^2 to 10M^2    8M^2

Minimum possible length
of the program                          5M^3            3M^2             3M + 1

Maximum number of instructions
executed simultaneously                 1               M                M^2

Total words times length
of program                              5M^5            18M^4 to 30M^4   24M^3
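As a hedged illustration (not part of the report), the symbolic entries of Table 2 can be evaluated for a concrete matrix size. The functions below simply transcribe the table's order-of-magnitude expressions; the 2-dimensional storage entry is kept as a range.

```python
# Sketch: evaluate Table 2's order-of-magnitude entries for a concrete M.
# The symbolic forms are transcribed from the table; nothing here is new data.
def table2(M):
    return {
        "conventional": dict(total_words=M**2 + 150,
                             program_length=5 * M**3,
                             parallel_instructions=1),
        "2-dim I.C.C.": dict(total_words=(6 * M**2, 10 * M**2),  # a range
                             program_length=3 * M**2,
                             parallel_instructions=M),
        "N-cube I.C.C.": dict(total_words=8 * M**2,
                              program_length=3 * M + 1,
                              parallel_instructions=M**2),
    }

t = table2(5)  # the 5 x 5 example discussed in the text
print(t["N-cube I.C.C."])  # total_words 200, program_length 16, 25 in parallel
```

The order-of-magnitude gaps discussed in the text (M between each pair of machines in parallelism, and in program length) are visible directly in the returned dictionary.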

Fig. 70. Matrix inversion program for a 5 x 5 matrix, two-dimensional I.C.C. (Gauss-Jordan method), Part 1. Numbers in circles refer to Part 2. (Diagram omitted.)

Fig. 71. Matrix inversion program for a 5 x 5 matrix, two-dimensional I.C.C. (Gauss-Jordan method), Part 2. (Diagram omitted.)

the magnitude of these figures is significant. Observe that an order of magnitude decrease in the length of the program occurs between the conventional machine and the two-dimensional I.C.C., and again between the two-dimensional I.C.C. and the N-cube I.C.C. At the same time, the number of instructions executed in parallel changes by a factor of M between the conventional machine and the two-dimensional I.C.C., and by another factor of M between the two-dimensional I.C.C. and the N-cube I.C.C.

The two-dimensional I.C.C. performance is limited because of path building conflicts. Often it is impossible to access data, and instruction cycles are wasted until the desired data can be accessed. This situation occurs less frequently in the N-cube I.C.C. Path building conflicts could be removed by the multiple storage of data, but this is an expensive solution in the I.C.C. structure. Frequently additional instructions must be used in the two-dimensional I.C.C. to alleviate accessibility problems. Because of this, the two-dimensional I.C.C. programs may require more modules than the N-cube I.C.C. In both I.C.C. structures, a word requires a module. Thus, a good estimate of the number of modules required to perform the inversion of an M x M matrix is 8M^2. The matrix inversion program for a 5 x 5 matrix using a two-dimensional I.C.C. is shown in Figures 70 and 71. In this specific case, approximately 215 modules are used. The I.C.C. has no unprogrammed central control between modules. The central control program required for the matrix inversion is shown in Figure 72. Notice that the I.C.C. organization generally requires more instructions and temporary data locations. This is due directly to the increased degree of parallel computation. Ordinarily, modules active at the same time period are not executing identical instructions. Therefore different instruction sequences are required for different module assemblies.

Fig. 72. Master control timing program for a 5 x 5 matrix inversion. (Diagram omitted.)

The minimum possible program length is 3M + 1. This limitation is due to the nature of the Gauss-Jordan algorithm and cannot be lowered. This program length can be achieved by the N-cube machine, but is difficult to achieve with the two-dimensional machine because of the limitation in the accessibility of data.

The results of other programming studies are similar to those presented here for the matrix inversion problem. Specific types of problems are certainly non-ideal for an I.C.C. For example, the finite difference solution for Laplace's equation in two dimensions requires 17 modules per node if each node is locally controlled. This is obviously inefficient, since each node can be controlled from a global control program; however, this approach leads to path building conflicts.

The study of the performance of specific I.C.C.'s for specific types of problems has shown that path building is a major factor in the design and programming of an I.C.C. A statistical approach has been pursued in an attempt to evaluate the path building capability inherent in different I.C.C. designs.

4.4 ANALYSIS OF THE PATH BUILDING PROBLEM

In this section two intuitive measures of path building are considered and comparisons of the two-dimensional and the N-cube I.C.C. are made. Finally, the results are presented of a statistical analysis and simulation of the path connection problem. One estimate of the ability to build a path from module A to module A' is the number of possible paths there are from A to A'. Only paths that are of minimal length (non-regressive) are considered.

In a two-dimensional machine, a path between two modules consisting of d1 horizontal path segments and d2 vertical path segments has a path length equal to d1 + d2. It makes no difference in which order the horizontal and vertical segments are chosen. No path can connect the modules with fewer than d1 + d2 segments; therefore d1 + d2 is the minimal length. The number of minimal, but not necessarily independent, paths between two modules is given by

    (d1 + d2)! / (d1! d2!)

Notice that d1 is the number of horizontal path segments, which is one less than the number of modules in a horizontal row. These numbers grow very fast. For a 5 x 5 array of modules there are 8!/(4! 4!) = 70 connecting paths between opposite corners. For a 10 x 10 array there are 48,620, and for a 20 x 20 array there are about 10^10. This is a very large number of potential paths, but they are not disjoint; there are only two disjoint paths.

The path structure of an N-cube I.C.C. is most easily visualized by considering each module as an N-bit number. There are the same number of modules as N-bit numbers, i.e., 2^N. Again, only non-regressive paths are considered. For these, it is possible to state reasonable path connecting algorithms in the N-cube machine. Further, with only minimal paths, definite limits for maximum data access along a path can be determined. A minimal path is generated by choosing path segments to other modules such that the Hamming distance between the current module and the termination module is reduced by one at each step. Thus, we can easily note two facts about paths. The maximum path length is N, since two N-bit numbers can differ in at most N positions. And the number of ways of connecting two modules at distance k is k!, i.e., all permutations of the order of reducing the Hamming distance by one for k steps. The k! possible paths between two modules are different but not disjoint, and involve 2^k modules.

Intuitively, the more paths that can be built in a given area, the better; in other words, there is less limitation by accessibility. Therefore, the number of possible paths divided by the area used by the paths is a possible figure of merit of accessibility for a particular I.C.C. configuration. Thus for a two-dimensional I.C.C. this figure of merit is given as

    (d1 + d2)! / [d1! d2! (d1 + 1)(d2 + 1)]

and for the N-cube I.C.C. it is k!/2^k. If an I.C.C. consisting of 4096 modules is considered, the number of ways of connecting two modules at a distance k = 6 is 20 for the two-dimensional I.C.C. and 720 for the N-cube machine (N = 12). The accessibility figures of merit are 1.25 and 11.25 for the respective machines.

Another measure of accessibility is the average distance between module pairs. This shall be called the module distance. A low average path distance is desirable. The average path distance and the distribution of path distances can be determined. Note that W(p), the number of paths of length p for a two-dimensional I.C.C. with D modules on a side, is given by:

    W(p) = 2 Σ[j=0..p-1] max(0, D-p+j) · max(0, D-j)

W(p) is the number of distinct pairs of modules at Manhattan distance p, and does not refer to the number of ways of connecting a pair of modules. The range of p is from 1 to 2(D-1). The average module distance A is given by:

    A = Σ[p=1..2(D-1)] p W(p) / Σ[p=1..2(D-1)] W(p)

For computation purposes, Σ[p=1..2(D-1)] W(p) can be expressed as the total number of possible paths T in the machine. Thus

    T = 2 Σ[i=0..D-1] Σ[j=1..D-1] (D-i)(D-j) = D^2 (D^2 - 1) / 2

Also, the numerator can be written as

    S = 2 Σ[i=0..D-1] Σ[j=1..D-1] (D-i)(D-j)(i+j)

The resultant expression for the average module distance simplifies to:

    A = S/T = 2D/3
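The closed forms above can be checked by brute force. The following sketch (illustrative, not from the report) enumerates a small two-dimensional array and compares the census of Manhattan distances with the formulas for the path count, W(p), T, and A = 2D/3.

```python
from itertools import product
from math import comb

def minimal_paths(d1, d2):
    # number of minimal (non-regressive) paths = (d1 + d2)! / (d1! d2!)
    return comb(d1 + d2, d1)

def W(D, p):
    # closed form from the text for module pairs at Manhattan distance p
    return 2 * sum(max(0, D - p + j) * max(0, D - j) for j in range(p))

# spot checks of figures quoted in the text
assert minimal_paths(4, 4) == 70        # 5 x 5 array, opposite corners
assert minimal_paths(9, 9) == 48620     # 10 x 10 array

# brute-force distance census for a small D x D array
D = 6
mods = list(product(range(D), repeat=2))
census = {}
for i, a in enumerate(mods):
    for b in mods[i + 1:]:
        p = abs(a[0] - b[0]) + abs(a[1] - b[1])
        census[p] = census.get(p, 0) + 1

assert all(census[p] == W(D, p) for p in range(1, 2 * (D - 1) + 1))
T = sum(census.values())
assert T == D**2 * (D**2 - 1) // 2      # total number of module pairs
A = sum(p * n for p, n in census.items()) / T
assert abs(A - 2 * D / 3) < 1e-9        # average module distance is 2D/3
```

For D = 6 the average module distance comes out exactly 4 = 2D/3, in agreement with the closed form.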

The average module distance in a two-dimensional I.C.C. is proportional to the number of modules along one side of the machine, or to the square root of the total number of modules.

For an N-cube I.C.C. with 2^N modules, the range of p is from 1 to N. The average module distance, A, can now be computed from:

    A = Σ[p=1..N] p W(p) / Σ[p=1..N] W(p)

W(p), here the number of module pairs at Hamming distance p, is given as

    W(p) = 2^(N-1) C(N,p)

where C(N,p) is the binomial coefficient. For computational purposes, Σ[p=1..N] W(p) can be expressed as the total number of possible paths, T, in the machine:

    T = 2^(N-1) (2^N - 1)

where N is the dimension of the N-cube. The numerator can be simplified to

    S = Σ[p=1..N] p W(p) = 2^(N-1) Σ[p=1..N] p C(N,p) = 2^(N-1) N 2^(N-1)

Thus the average module distance is

    A = S/T = N 2^(N-1) / (2^N - 1) ≈ N/2
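A similar brute-force check (again illustrative, not from the report) can be made for a small N-cube, using the N-bit-number picture of the modules:

```python
from itertools import combinations
from math import comb

N = 6  # 64 modules, each an N-bit number; distance = Hamming distance
census = {}
for a, b in combinations(range(2**N), 2):
    p = bin(a ^ b).count("1")
    census[p] = census.get(p, 0) + 1

# W(p) = 2^(N-1) C(N,p) and T = 2^(N-1) (2^N - 1)
assert all(census[p] == 2**(N - 1) * comb(N, p) for p in range(1, N + 1))
T = sum(census.values())
assert T == 2**(N - 1) * (2**N - 1)

# average module distance A = N 2^(N-1) / (2^N - 1), approximately N/2
A = sum(p * n for p, n in census.items()) / T
assert abs(A - N * 2**(N - 1) / (2**N - 1)) < 1e-9
```

For N = 6 the exact average is 192/63 ≈ 3.05, already close to N/2 = 3; the approximation improves rapidly with N.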

Here the average module distance is proportional to log2 of the number of modules in the machine. The average module distance for the two-dimensional I.C.C. consisting of 4096 modules is 43, while for the N-cube I.C.C. (N = 12) the average module distance is 6. The maximum module distances are then 126 and 12, respectively.

General results for the average module distance have been calculated, and the results are presented in Figures 73 and 74. One interesting aspect is that the maximum number of module pairs are separated by the average module distance. Note that the values exist only for integer values of the distance p; the points are connected by a smooth curve to give a better presentation. Interesting comparisons can be made between the graphs for the two-dimensional I.C.C. and the N-cube I.C.C., since the total numbers of modules, D^2 and 2^N respectively, satisfy D^2 = 2^N for the values of D and N chosen for the respective curves in Figures 73 and 74.

We now wish to get an estimate of the number of instructions that could be executed simultaneously. Since each module which holds an active instruction is capable of executing the instruction, the limiting factor is the accessibility of the operands. In order to get an estimate of the accessibility, two assumptions are made: (1) instructions and data are randomly located, and (2) all path lengths are equal to the average module distance. In general, programmers should be able to do much better than the statistical estimates by a planned placement of data and instructions to keep operand paths short, and by dispersing simultaneously active instructions as much as possible.

For the two-dimensional machine the average module distance is 2D/3. If

Figure 73. Distribution of module distance for the 2-dimensional I.C.C. (normalized graph). Curves for D = 4 (16 modules, peak W = 34), D = 8 (64 modules), D = 16 (256 modules), and D = 64 (4096 modules, peak W = 144,744). The normalized number of module pairs W(p) peaks at the average module distance A = 2D/3; the Manhattan distance p between modules ranges up to 2(D-1). (Graph omitted.)

Figure 74. Distribution of module distance for the N-cube I.C.C. (normalized graph). Curves for N = 4 (16 modules, peak W = 48), N = 6 (64 modules, peak W = 640), and N = 12 (4096 modules, peak W = 1,892,352). The normalized number of module pairs W(p) peaks at the average module distance A = N/2; the Hamming distance p between modules ranges up to N. (Graph omitted.)

i instructions are executed simultaneously, then i·2D/3 path segments are active. The probability that a given segment is used is

    (segments used by i paths) / (total number of segments) = (i·2D/3) / (2D(D-1)) = i / (3(D-1))

An approximation is made: it is assumed that the connectedness of segments can be ignored and only the number of segments is considered. The probability that a given segment is available for the (i+1)st path is:

    1 - i/(3(D-1))

The probability that a given set of 2D/3 segments is available is:

    [1 - i/(3(D-1))]^(2D/3)

Finally, if there were k independent ways of building the (i+1)st path, the probability that it could be built is:

    P(i+1) = 1 - {1 - [1 - i/(3(D-1))]^(2D/3)}^k

k is approximated by

    k = Σ[i+j=A] [2m(m+1) + (n-m)(2m+1)] / (A(A+1))

where, for each pair, m = minimum(i,j), n = maximum(i,j), i and j are positive integers, and A = [2D/3].
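The availability probability just derived can be evaluated numerically. In the sketch below (an illustration, not the report's program), k is left as a parameter supplied by the caller, since the text's approximation for k is itself rough:

```python
# P(i+1): probability that the (i+1)st average-length path can be built in a
# D x D two-dimensional I.C.C., given i paths already built and k independent
# candidate routes (k is an assumed input here, not the text's approximation).
def p_next(i, D, k):
    if i >= 3 * (D - 1):                     # no free segments remain
        return 0.0
    seg_free = 1 - i / (3 * (D - 1))         # a given segment is free
    return 1 - (1 - seg_free ** (2 * D / 3)) ** k

# the first path always succeeds; later paths get progressively harder
probs = [p_next(i, 8, 5) for i in range(6)]
assert probs[0] == 1.0
assert all(x >= y for x, y in zip(probs, probs[1:]))
```

The monotone decrease of P(i+1) in i is the statistical expression of path interference: each built path consumes segments that later paths may need.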

There are two statistical measures of accessibility which use P(i+1). E is the expected number of paths of average length that can be built; H is the number of average-length paths that can be built with a probability of 1/2 of building all of them. These are computed as follows:

    E = Σ[i=0..3(D-1)] (i+1) P(i+1) = Σ[i=0..3(D-1)] (i+1) (1 - {1 - [1 - i/(3(D-1))]^(2D/3)}^k)

    H = max r such that P(1) P(2) ... P(r) > 1/2, i.e.,

    Π[i=0..r-1] (1 - {1 - [1 - i/(3(D-1))]^(2D/3)}^k) > 1/2

The significance of these measures will be discussed after a similar development has been presented for an N-dimensional I.C.C.

For the N-cube I.C.C. the average path length is N/2. The total number of path segments in the machine is needed to compute P(i), the probability that the ith path can be built successfully. This number of segments is

    N 2^(N-1)

Assume that i average-length paths have been built. This requires i·N/2 segments. The probability of a given segment being used is:

    (used segments) / (total segments) = (i N/2) / (N 2^(N-1)) = i / 2^N

The probability of a given segment being available is

    1 - i/2^N

The distribution of segments is assumed to be random; thus the probability of N/2 segments being available is

    (1 - i/2^N)^(N/2)

If there are k independent ways of building the (i+1)st path, then the probability that it could be built is:

    P(i+1) = 1 - [1 - (1 - i/2^N)^(N/2)]^k

The number of independent ways of building the (i+1)st path is approximately

    k = (N/2) 2^(N/2) / (N/2 + 1)

The expected value, E, is a statistic which is an estimate of the mean number of paths simultaneously connected in an infinite number of experiments. Each experiment consists of selecting random pairs of coordinates at Hamming distance N/2, then attempting to connect the paths. The expected value is

    E = Σ[i=0..2^N] (i+1) P(i+1) = Σ[i=0..2^N] (i+1) {1 - [1 - (1 - i/2^N)^(N/2)]^k}

A more stringent requirement, that there is a probability of 1/2 that all H paths of average length could be built, is given by

H = max r such that P(1) P(2) ... P(r) > 1/2, i.e.,

    Π[i=0..r-1] {1 - [1 - (1 - i/2^N)^(N/2)]^k} > 1/2

where the probability of building both the first and second paths is P(1)·P(2), and so on, until the probability of building all r + 1 paths becomes less than 1/2.

Simulation of the accessibility problem was performed to supplement the statistical analysis. This was desirable because of the approximations used in the statistical analysis; in particular, the approximation for the number of independent paths k is critical to the analysis. For comparison, a program was developed to be run on an IBM 7090 which simulated the path connection (1) in an N-cube I.C.C., and (2) in a two-dimensional I.C.C. The program represented each module by a storage location and used the bits of the words to keep track of paths previously connected and being connected. The simulation results are more accurate than the calculations obtained from the analysis. The only difference between the simulation and actual running conditions is the use of random starting and ending points of the paths.

The statistics for the distributed path length case were considered too complicated for analytical study. However, the distributed path length situation was not difficult to simulate, and it is a closer approximation to the actual I.C.C.'s operating conditions. For each path built, a random starting location was generated; then either an average length was used, or a length was determined by a weighted random selection from the distribution of lengths. Bits were then randomly inverted in the starting

location until an end location, the proper distance from the starting location, was generated. Each simulation started with no paths in the machine; statistics were compiled both on the basis of the number of paths built until the first path that could not be built was encountered, and on the basis of the number of paths built when many paths are allowed to be unsuccessful. The results obtained from the simulation and the analytical studies are presented in Table 3. The estimates for the maximum number of active modules are obtained by dividing the total number of path segments by the average path length.

The results from the simulation using distributed paths yield slightly fewer connections than the simulation results using average path lengths, as would be expected. However, the percentage difference is small. This verifies that the assumptions used in the definition of an average path length are valid, so that the more complicated distributed path case need not be used in future investigations except for occasional verification.

The estimate of the probability of successfully connecting the (i+1)st path, P(i+1), was used in the analytic derivation of E and H. This estimate for P(i+1) considers the path segments of all paths in the machine, as well as the (i+1)st path, to be randomly scattered; i.e., the connectedness of a path is neglected. To compensate, a factor would have to be introduced into the analytic expression for E to make the probability that a segment is available higher if all its neighbors are also available, and lower according to the number of neighbors not available. The simulation runs produced statistics which accounted for connectedness, and are thus more accurate. The simulation results indicate that connectedness does not seriously affect H, the number of paths built before the

TABLE 3

ANALYTIC AND SIMULATION RESULTS

                 Assuming All Paths of Average Length                         Using Distributed Path Lengths
   N    H_A*   H_S**  Std.Dev.        E_A    E_S   Std.Dev.  Max. No.     H_S  Std.Dev.   E_S  Std.Dev.      T
                      of H_S                       of E_S    of Active         of H_S          of E_S
                                                             Modules
N-CUBE I.C.C.
   4      6      5       2             34   14.6      --         16        4      2        15      1        32
   6     19      6       5            506   51         2         64       13      5        49      5       192
   8     69     66      21          7,925   175        4        512       55     23       173      6      1024
  12   1151    986     158      2,022,879   2368      23       4096      549    160      2117     62     24576

2-DIM. I.C.C.
   4      4      3       1             11   16.5      --         12        4      2        15      2        24
   8      5      5       2             20   14.8      --         24        6      3        24      3       112
  16      6      8       3             39   29        1.5        48        8      3        38      4       480
  64     12     15       9            114   93        --        192       19      3       118     --      8064

*The subscript A denotes analytic results.
**The subscript S denotes simulation results.

first unsuccessful try. The simulation statistics, H_S, are from 5 to 10% lower than the H_A obtained by analysis. On the other hand, E_S disagreed by as much as a factor of 1000 with the corresponding E_A found analytically. Notice that E_A (col. 5, Table 3) is even larger than the maximum possible number of active modules (col. 8, Table 3). The disagreement is primarily due to neglecting connectedness; heuristically, a better estimate would be about 1/2 of the maximum possible number of active modules. For this case the simulation is accurate, but the analysis must be refined.

4.5 CONCLUSIONS

The simulation and analysis results for the path building problem are consistent with the results obtained from the study of the matrix inversion program for the I.C.C. structure. The relative decrease in path conflicts accounts for the superiority of the N-cube I.C.C. over the two-dimensional I.C.C. for the matrix inversion problem. The limiting factor in parallel computation, at least on matrix problems, is accessibility of operands. The results show conclusively that path building is a limiting factor in the performance of an I.C.C. Attempts to find other configurations for the path structure which could do better than the N-cube have not been successful. Generally, the amount of hardware increases exponentially as the accessibility increases. In the limit, every module is able to access every other module directly by a unit-length path segment; this approaches the point where every module is a conventional computer, and is thus far too costly. The N-cube I.C.C. requires over twice as

much circuitry as the two-dimensional I.C.C., but the increased accessibility justifies the increase on problems such as matrix inversion and others where the advantage of highly parallel computation can be used.
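The kind of path-building simulation described in Section 4.4 can be re-created in miniature. The sketch below is a deliberate simplification (greedy routing with no re-routing of blocked paths), not the original 7090 program: random module pairs at Hamming distance N/2 are connected by non-regressive paths over free edges until a path fails, giving an H-type statistic.

```python
import random

def paths_until_blocked(N, rng):
    """Build average-length (N/2) paths in an N-cube until one fails."""
    used = set()                              # occupied edges of the cube
    built = 0
    while True:
        start = rng.randrange(2**N)
        end = start
        for b in rng.sample(range(N), N // 2):
            end ^= 1 << b                     # endpoint at distance N/2
        node, path = start, []
        while node != end:                    # non-regressive greedy routing
            differing = [i for i in range(N) if (node ^ end) >> i & 1]
            rng.shuffle(differing)
            for b in differing:
                edge = frozenset((node, node ^ (1 << b)))
                if edge not in used:
                    path.append(edge)
                    node ^= 1 << b
                    break
            else:
                return built                  # blocked: first failure
        used.update(path)                     # commit the completed path
        built += 1

rng = random.Random(7)
runs = [paths_until_blocked(6, rng) for _ in range(200)]
avg = sum(runs) / len(runs)                   # crude analogue of H_S
assert avg >= 1                               # the first path always succeeds
```

Because a blocked greedy route counts as a failure even when an alternative route exists, this sketch tends to understate the machine's true accessibility; the report's program kept enough state to do better.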

5. RELIABILITY IN ITERATIVE MACHINES

5.1. REDUNDANCY AT THE MODULE LEVEL

The reliability problem in an iterative structure machine can be attacked through two different approaches:

a) Each module has some computational capability, but the failure of a module does not necessarily influence the behavior of the rest of the machine, if there exists some way of checking and marking those modules which are not in working order. It is still desirable, of course, to have the reliability of each module as high as possible.

b) The functional redundancy of the machine, and its ability to execute more than one program at a time, allows the introduction of a checking program, whose mission is to periodically conduct tests on the modules and to indicate their condition by means of tags on the modules themselves.

In this section we are concerned with the first problem. It is very similar to the reliability problem in standard machines, with the introduction of a few particular characteristics. Since the whole concept of an iterative machine depends on the availability of components and fabrication methods capable of producing large numbers of identical modules at a low unit price, it is almost mandatory to assume that the technology employed will be some variation of the vacuum-deposition techniques. Also, since the module is here the basic unit, it is imperative to keep

it as simple as possible; therefore, any redundancy scheme with a high penalty ratio in number of components is not suitable.

It is usual to compare the penalty in equipment for a given redundancy scheme by referring to a redundancy ratio, defined as the ratio of the number of components in the redundant circuit to the number of components in the basic original circuit. While this gives an idea of how much the complexity of the circuit will increase, it is completely misleading to think of this ratio as an indicator of relative cost. The relative cost is a weak function of complexity or number of components, and depends strongly on the technology used to manufacture the components and also on the technique used to assemble the individual components into stages and then the stages into units. This fact is clearly shown in the following actual case. In an optimal design, two-out-of-three majority voting requires triplication of the logically active networks, plus the addition of triplicated voters, plus some additional complexity in the wiring. Roughly, this is a six-fold increase in complexity (Ref. 17). If micrologic techniques are employed, as described in Ref. 19, the majority voters can be obtained by the simple addition of only four resistors to the original circuit. Furthermore, even the triplication of flip-flops doesn't produce a corresponding increase in cost, and the influence of the increased complexity in wiring is negligible. Thus, we see that the technological aspect influences the cost in two ways:

a) The fabrication and assembly procedures can make the cost very

lightly dependent on the complexity of the circuit.

b) The type of logic used can allow the introduction of majority voting in a very simple and economical way.

As in most physical systems, there exists in information processing and transmitting networks a natural balance between performance and cost. These two words are taken here in a generalized meaning, but no matter what parameter is used to measure them, it will be found that an increase in one will produce a corresponding increase in the other. The ratio performance/cost will not be a constant; at least in computing systems, it seems to be a monotonically increasing function of complexity or size. The opposite is true for transmission or contact networks, in which a point of diminishing returns is reached very soon. In summary, there seems to be an "ethereal," nondefinable constant that warns us to expect to pay more, or to find restrictions or degraded performance in some part of the system, when improvements are made in some other aspect.

While it is true that one cannot get anything for free, sometimes a bargain can be found. This is especially easy where the system has one or more characteristics that are not important or relevant to our problem. The designer's ability should then be dedicated to locating these more or less elusive areas containing parameters onto which he can "discharge" the inevitable increase that will follow as a consequence of the improvement forced on the relevant or useful characteristics.

In reliability problems this is especially difficult, since there are basically only two factors to be traded for increased reliability: time and complexity. Inevitably, an increase in reliability will bring about an increase either in the time needed to perform the operation or in the amount of hardware needed. Since we must be resigned to pay one price or the other, one can proceed to a finer inspection of the factors and determine in what ways time or complexity can be used to introduce redundancy.

In general, time is traded for reliability when the following methods are used: repetition of the message, or error-correcting codes. It is true that some time will be lost too when methods like majority voting are used, since the number of stages through which the information has to pass is increased, but this is a second-order effect, introducing only timing problems that are easily solved. Furthermore, the increased time delay is a consequence of the increase in complexity, and not the original factor which we chose to produce better reliability.

The rest of the methods produce an increase in the amount of hardware, that is, in components plus wiring, and they differ only in the levels at which they are applied, in the size of the "boxes" which are replicated, or in the introduction of new "boxes" whose function is foreign to the main purpose of the network. It is therefore timely to try to extract the basic factors in these approaches and to present them in the form of a classification. Due to the very nature of the problem, it is very difficult to present a picture of mutually exclusive methods, since even at the very first level of generality some methods imply the use of some others.

Table 4 presents a summary of these ideas. It is to be remembered that

these are not mutually exclusive methods, since the use of one will most certainly force the use of some other method or methods at other levels.

TABLE 4

METHODS OF APPLYING REDUNDANCY

Where redundancy is applied:

1) In time: Repetition.
2) In the information: Error-correcting codes; transmission networks (Shannon). (8,19,27)
3) In the components: Active — logical nets (quadruplication) (10); stand-by.
4) In the circuitry: Active (Von Neumann) (32); stand-by.
5) In the presentation: See Table 5.
6) In the organization: Self-repairing (variable logic); detection and switching (fixed logic). See Table 5.
7) In the outputs: Majority voting on the outputs. (7),(33)

This is even more evident in the expanded Table 5, where lines 5 and 6 of Table 4 are presented in detail. A comparison between the corresponding horizontal entries in columns 2 and 5 will show that for most of the methods used, introducing redundancy in the presentation makes it necessary to introduce a replication of units or networks.

TABLE 5

DETAIL OF LINES 5 AND 6 OF TABLE 4

(Original columns: redundancy method; approach; element; procedure; general structure; types of errors corrected; special requirements and comments; references. Condensed from the original two-page chart.)

IN THE PRESENTATION

Majority voting:
- Analog vote taker (fixed rule, fixed weight criteria): multiple lines; corrects inputs within the range; optimal if all errors are equally possible. (1,2)
- Threshold element (fixed rule): multiple nodes; the sensitivity to noise increases for large numbers of inputs. (1,2,14,17)
- Quantile element (adaptive weights, deterministic): multi-nodal; corrects inputs outside the range; handles a larger class of input errors than the fixed-rule approach. (26,27,29)
- Transor element (adaptive weights, Bayesian): corrects steady errors; no accurate probability distribution of the errors is needed.

Fault-proof nets: no algorithmic procedure known; single errors corrected; for transmission networks; average redundancy ratio 4/1. (22)

Swamping (component replication): parallel operation; N-replication of similar units; the failed element must not hinder the performance of the rest.

Quadded logic (network replication): quadruplicated logic with crisscrossing between stages; corrects all single errors, and also multiple errors generated in stages separated by two other stages, but not errors generated in consecutive stages. (32)

FUNCTIONAL MANIPULATION

Composition of logical networks: majority voting with common replicated units, some of which are common to two lines transmitting different information; corrects all single errors, 1/10 of the double errors, and 1/10 of the triple errors; the majority voter is taken as perfect.

Decomposition of networks having a specified property: multiple lines between coder and decoder; requires a coder and a decoder, which are themselves not checked.

LOGICALLY STABLE NETWORKS

Composition of triangular, tree-like networks of variable functions: corrects all errors generated by unstable elements (variations in the functions performed by the elements); a mapping technique is used. (25)

Interaction of threshold elements with independent threshold variations: highly connected networks of threshold elements; corrects errors generated by variation of the thresholds of the individual elements; no formal procedure known. (29,33)

IN THE ORGANIZATION

Self-repairing (variable logic):
- Kinematic models: reactivation of spare units; replacement and checking circuits are specified. (6,20,21,37)
- Tesselation automata: self-reproducing elements; the possibility of an immortal wave is shown. (20)

Detection and switching (fixed logic):
- Component level: error detection and switching to stand-by replacement components; doesn't correct the first error; shelf-aging of inactive units is taken into account. (1,5,15,18,35)
- System level: the same with stand-by units; the failure rate of the switching element determines whether this system or a simple parallel connection is best.
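As a concrete illustration of the majority-voting entries above, a two-out-of-three voter can be written bitwise. This is only a functional sketch; the report's point is that in micrologic the voter is a few resistors, not extra logic.

```python
# Bitwise two-out-of-three majority vote over three replicated outputs:
# each output bit is 1 iff at least two of the three replicas agree on 1.
def majority(a, b, c):
    return (a & b) | (b & c) | (a & c)

word = 0b1011_0110
faulty = word ^ 0b0000_1000        # one replica with a single-bit fault
assert majority(word, word, faulty) == word     # single fault masked
assert majority(word, faulty, word) == word     # regardless of position
```

Any single failed replica is outvoted bit by bit, which is exactly the error class the triplication schemes in the tables are designed to mask.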

5.2. CHECKING PROGRAM APPROACH

The second possibility stated in 5.1 rests on the capability of the iterative machine to afford running an extra program in the form of a checking program. In order to avoid interference between the main program or programs and the checking program, it is necessary to think of a machine organized in the shape of a torus or a cylinder. In these geometrical configurations there is continuity of the medium in which the programs move. Therefore, both the main and the checking program can incorporate in themselves the necessary instructions to move themselves in a uniform way, for example, one position to the right after each instruction. If one thinks of the main program as a "stationary" program, then the net effect is that of a machine whose basic array is constantly circulating under the pattern occupied by the instructions. A column of modules is taken as a reference starting point for each of the cycles or revolutions of the programs. Under these conditions the failure of a module can occur at three different times during a cycle:

I) A module fails after the main program has used it and before the test program reaches it.

II) A module fails after the test program and before or during the main program.

III) A module fails while it is a part of the test program.

In case I, the test program merely detects the failure and "tags" the failed module to prevent its further use.

In case II, the failure will not be detected until the next cycle, and the test program not only has to tag the failed module but has to

generate a "repeat cycle" signal. This signal produces an interruption of the present program, and the instruction counter and all registers are reset to the condition existing at the beginning of the cycle in which the failure occurred. This back-tracking of the main program is a very stringent condition, since it means that the state of the machine has to be reproducible; therefore, the contents of the accumulators, registers and memory have to be restored to their values at the starting point of the previous cycle. Since the starting instructions for each cycle are well defined and come spaced a fixed number of positions apart, it is only a matter of storing the contents of the registers and accumulator for those instructions which are starting instructions, and of using temporary storage locations for all the store instructions in the cycle. The use of the test program has reduced the reliability problem from one of keeping n x n modules in working condition during the whole length of the program to one of assuring the good condition of the modules during the time it takes to complete one cycle. Actually, the situation is even better, since the modules left behind by the program can fail with no ill effects, even before the cycle is completed. Therefore, only the last column has to be active during the whole cycle; the (n-1)th column has to be active only during (n-1)/n of the cycle, and so on. An even more critical analysis shows that when a module fails between the test and main program, or within the main program itself, the probability of producing an error in the computation is zero, since the condition will be detected and the whole cycle repeated, provided the test program is able to survive until the end of the cycle. Survival

of the test program implies that no module may go bad while it is part of the test program; a module may, however, go bad before the test program uses it, since it will then be detected, "flagged" and avoided. Therefore, the probability that an error will occur in the computation is equal to the probability that a module goes bad between the test and the main program, or while it is a part of the main program, and that a second module goes bad while part of the test program at any time between the occurrence of the bad module in the main program and the moment the test program reaches the reference column to initiate a new cycle. The only combination that can generate an undetected error in the computation is the one described above, with the specified sequence of errors. If the sequence is reversed, that is, if the error occurs in the test program first, the same failed module will introduce an error in the main program, but the change introduced in the test program will induce a "repeat cycle" condition, since the test program is compared after each cycle with a stored copy of itself.
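The tagging and repeat-cycle mechanism described above can be sketched in a few lines of code. Everything here (function names, the step-indexed failure model, the register checkpoint) is a hypothetical illustration constructed for this sketch, not a design taken from the report.

```python
# Hypothetical sketch of the checking-program cycle: a ring of n modules is
# swept once per cycle by a test program.  Modules that fail before the test
# reaches them are merely tagged (case I); modules certified by the test that
# fail afterward (case II) force a repeat of the cycle from a checkpoint.

def run_cycle(n, failed_at):
    """Sweep the ring once.  failed_at[i] is the step at which module i
    fails (absent = healthy).  Returns (tagged_modules, repeat_cycle)."""
    tagged = set()
    repeat = False
    for step in range(n):                  # test program visits module `step`
        t_fail = failed_at.get(step)
        if t_fail is None:
            continue
        if t_fail <= step:
            # Case I: failure happened before the test reached the module;
            # just tag it so later cycles route around it.
            tagged.add(step)
        else:
            # Case II: the module fails after the test has certified it, so
            # the main program may have used a bad module; the failure is
            # caught only on the next sweep and the cycle must be repeated.
            tagged.add(step)
            repeat = True
    return tagged, repeat

def run_with_checkpoint(n, failed_at, registers):
    """Registers are saved only at the cycle's starting instruction, as the
    text prescribes, and restored on a repeat-cycle signal."""
    checkpoint = dict(registers)           # saved at the reference column
    tagged, repeat = run_cycle(n, failed_at)
    if repeat:
        registers.clear()
        registers.update(checkpoint)       # back-track to the cycle start
    return tagged, repeat, registers
```

In this toy model a module that fails before the sweep reaches it is only tagged, while a later failure raises the repeat flag and rolls the registers back, mirroring cases I and II of the text.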

6. A MATHEMATICAL MODEL OF AN I.C.C.

6.1 INTRODUCTION

The preceding sections have presented studies in the areas of programming, accessibility evaluation, machine organization, and reliability. In order to determine the full capabilities of this novel organization in the form of an iterative array of modules, it is necessary either to resort to simulation procedures or to construct a mathematical model. Once the behavior and the language of the model are determined, the model serves as a tool with which one might expect to answer questions relating to the computational capabilities of the machine, to the types of problems in which this power is most evident, and to the dependency of computational speed on the path connection mode. Developing a model of an I.C.C. in the most general sense is a formidable task. We may, however, simplify the problem slightly, without losing any valuable properties, by establishing an analogy between the I.C.C. and an ensemble of Turing machines acting on a multi-dimensional tape. The next step is to keep the tape-space multi-dimensional but to replace the set of Turing machines with finite-state machines. Even then the problem is far from simple, as shown by the fact that, except for an occasional instance, the literature is barren of discussion dealing with multiple-head automata. The subject is not without merit, for multiple-head automata possess capabilities beyond those of single-head machines, capabilities yet to be thoroughly explored.

This section extends the results of automata theory beyond the usual limit of one-dimensional, one-tape, single-head, non-halting finite state machines to encompass, in the most general case, multi-dimensional, multi-tape, multi-head, self-halting finite state machines. A familiarity with the material contained in the papers by McNaughton and Yamada,(41) E. F. Moore,(42) Minsky,(45) and Rabin and Scott(43) will be necessary and sufficient for an intelligent reading of this section. Established results of other persons will usually be stated and used without proof. The author has attempted to give proper credit to the work of others. Thus, all theorems and remarks contained in this section which are not credited to others are, to the best of the author's knowledge, original.

The material presented in this section is arranged into seven subsections. Section 6.1 is the introduction. Section 6.2 introduces the concepts of alphabet, tape, and n-head machine. The operation of n-head machines on tapes is defined, and the manner in which n-head machines accept and reject inputs is described along with the notion of how n-head machines define sets of inputs. Section 6.3 presents a number of operations on alphabets, tapes and sets of tapes which constitute a language by which, beginning with primitive alphabets, one can represent certain sets of m-tuples of tapes. The language developed in this section includes as one of its parts the language of regular expressions. Section 6.4 contains a set of six pairs of analysis-synthesis theorems relating the sets of inputs defined by n-head machines to expressions in the language of Section 6.3. The theorem pairs are ordered according to the complexity of the machines involved, beginning with 1-way 1-dim 1-head machines and terminating with 2-way D-dim n-head m-tape machines. Section 6.5 consists of a collection of algorithms and theorems pertaining to n-head machines. In particular, algorithms are given to decide if any given n-head machine is 1-way and to decide if any given regular expression is realizable. The section also develops theorems dealing with the questions: 1) Does a given machine accept a given input (the "particular input decision question")? 2) Does a given machine accept any input (the "emptiness decision question")? 3) What is the relationship between state and transition accessibility and the emptiness decision question? 4) What are the Boolean properties of n-head machines? 5) What is the relationship between the number of heads a particular machine possesses and the speed with which this machine reacts to inputs? Section 6.6 suggests several topics for further study. The topic areas are described and some partial results pertinent to each area are given. Section 6.7 is the concluding section, in which the results are summarized and discussed.

6.2 n-HEAD FINITE STATE MACHINES - A DESCRIPTION

6.2.1 Alphabets

Def. 2.1 An alphabet is a finite collection of symbols. By convention alphabets will be denoted by some variation on the letter Σ. Thus Σ = {B,0,1}, Σ' = {B,a,b,c} and Σ'' = {#,?,~} are all examples of alphabets.

6.2.2 Tapes

Def. 2.2 If D is a positive integer then D-space is defined as a space of dimension D in which a Cartesian coordinate system has been embedded, each coordinate ranging over the integers from -∞ to +∞; around each coordinate point is centered a unit D-cube called a cell. Thus D-space consists of a D-dimensional space divided and covered by an orderly array of unit D-cubes (or cells) where each cell is labelled with a unique coordinate point.

Def. 2.3 Let Σ be an alphabet; t is defined as a D-dimensional (D-dim) tape over Σ if t consists of a D-space in which each cell contains precisely one element of Σ. We adopt the convention that a cell in which no symbol is written will be called empty; a cell containing a symbol will be called filled. It follows from the definition of tape that if t is a tape in some D-space then every cell in that D-space is filled. In this paper the symbol B will be used exclusively to denote the blank. B is a legitimate possible symbol in any alphabet. Any

cell of any tape will be considered blank if and only if it contains B.

Def. 2.4 Any tape t will be a finite tape if and only if t contains a finite number of non-blank cells. If t is a finite tape of dimension D, it is equivalent to say that the non-blank portion of t can be enclosed in a rectangular D-dimensional parallelepiped of finite dimensions. In this section we will limit consideration to arbitrarily large but finite tapes. Therefore, whenever the term "tape" is used it will be understood that "finite tape" is implied.

Def. 2.5 The initial cell of any tape will be that cell located at the origin of that tape's coordinate system. It will be convenient to omit explicit representation of the coordinate system of a tape; in such cases the initial cell of the tape will be indicated by a double boundary and the coordinate directions established by prior convention. In this paper, for all 1-dim and 2-dim tapes the up, down, left, right directions will be respectively the coordinate directions +2, -2, -1, +1. For example, Figure 75 gives an illustration of a 1-dim tape over Σ = {B,a} and Figure 76 gives an illustration of a 2-dim tape over Σ = {B,0,1}.

B B a a a a a B B

Tape t1
Figure 75
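As an illustrative sketch (not part of the original report), Defs. 2.2-2.4 can be modelled directly by storing only the finite non-blank portion of a tape: a mapping from integer coordinate tuples to symbols, with every unlisted cell holding the blank B. The class and method names below are assumptions chosen for the sketch.

```python
# Sketch of Defs. 2.2-2.4: a finite D-dim tape stored as a mapping from
# integer coordinate tuples to symbols; cells absent from the mapping hold
# the blank B, so only the finite non-blank portion is represented.

BLANK = "B"

class Tape:
    def __init__(self, dim, cells=None):
        self.dim = dim
        # keep only non-blank cells; everything else is B by convention
        self.cells = {c: s for c, s in (cells or {}).items() if s != BLANK}

    def read(self, coord):
        assert len(coord) == self.dim
        return self.cells.get(coord, BLANK)

    def write(self, coord, symbol):
        if symbol == BLANK:
            self.cells.pop(coord, None)    # erasing restores the blank
        else:
            self.cells[coord] = symbol

# Tape t1 of Figure 75 (B B a a a a a B B), assuming the initial cell is the
# leftmost a; the figure's double boundary is not recoverable from the scan.
t1 = Tape(1, {(i,): "a" for i in range(5)})
```

In this representation every tape is automatically finite in the sense of Def. 2.4, since only finitely many non-blank cells can be stored.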

[Figure 76: a 2-dim tape t2 over Σ = {B,0,1}; the cell contents are largely illegible in the scan.]

Tape t2
Figure 76

Def. 2.6 Let Σ be an alphabet; t' is defined as a D-dim partial tape over Σ if t' consists of a D-space in which a finite number of cells contain precisely one element of Σ, all other cells being empty.

Def. 2.7 If t is a tape, ts is defined as a subtape of t if and only if ts is a partial tape of the same dimension as t and for each filled cell in ts the corresponding cell of t contains the same symbol. Thus, for example, t2s given in Figure 77 is a subtape of t2 (t2 is given in Figure 76).

[Figure 77: the subtape t2s of t2; the cell contents are largely illegible in the scan.]

Subtape t2s
Figure 77

Def. 2.8 If one is in any cell of a D-space with the coordinate axes identified from the 1-st to the D-th, to move d, where d is some integer in the range -D ≤ d ≤ +D, is defined as moving one cell in the |d| direction, negative d meaning backward, positive d meaning forward and zero d meaning no move.

6.2.3 Machines

Def. 2.9 An n-head finite state machine (or just n-head machine) is a system 𝒜 = <C, S, sI, M> where

C: the characterization of the machine, is a list of
a) the set of heads H = {hi}, i = 1,2,...,n;
b) a partitioning of H into disjoint subsets H1, H2, ..., Hm (m ≤ n); 𝒜 works on m tapes, the heads of Hi reading tape ti;
c) two sets {Σi} and {Di}, i = 1,2,...,m, where Σi is the alphabet that all the heads in Hi read in common and where Di is the dimension of the space (tape) in which the heads of Hi move.

S: a finite non-empty set which, together with the states ACCEPT (abbreviated A) and REJECT (abbreviated R), which are not in S, makes up the set of internal states of 𝒜.

sI: an element of S designated as the initial state of 𝒜.

M: a mapping from

S × Σ1^|H1| × ··· × Σm^|Hm|

to

(S ∪ {A,R}) × D̄1^|H1| × ··· × D̄m^|Hm|,

where |Hi| denotes the number of elements in Hi and D̄i = {d | d is an integer in the range -Di to +Di}; M constitutes the table of transitions of 𝒜.

Def. 2.10 𝒜 = <C, S, sI, M> accepts or rejects any m-tuple of tapes t = (t1, t2, ..., tm) in the following manner [it is understood that for i = 1,2,...,m, ti is a Di-dim tape written over Σi in accordance with C of 𝒜]:

1) 𝒜 starts in state sI with all the heads of each Hi resting on the initial cell of each ti.

2) If 𝒜 is in state sk and the heads read the n-tuple of symbols α (α ∈ Σ1^|H1| × ··· × Σm^|Hm|) and M of 𝒜 has the entry

(sk, α) → (sℓ, d1, d2, ..., dn), where (d1, d2, ..., dn) ∈ D̄1^|H1| × ··· × D̄m^|Hm|, then 𝒜 goes to state sℓ and each head hi of 𝒜 moves di.

3) 𝒜 continues to repeat step 2 above; the heads of 𝒜 move back and forth on their respective tapes and the machine passes through a sequence of internal states. If in a finite number of cycles 𝒜 goes into the A (R) state, then the machine stops and is said to accept (strongly reject) t. If 𝒜 never goes into A or R, then 𝒜 is said to weakly reject t.

Example 2.1 𝒜2.1 = <C1, S1, sI1, M1>, where C1 states that 𝒜2.1 is a 2-head machine operating with both heads reading the same 1-dim tape written over Σ1 = {B,0,1}; S1 = {s1, s2}; sI1 = s1; and M1 is the transition table below.

[Table M1: one row per internal state, one column per pair of scanned symbols (BB, B0, B1, 0B, 00, 01, 1B, 10, 11), each entry giving a next state and a pair of head movements; the individual entries are not recoverable from the scan.]

𝒜2.1 accepts the tapes illustrated in the original (cell contents largely illegible),

while strongly rejecting and weakly rejecting, respectively, the other two tapes illustrated (also illegible in the scan).

Def. 2.11 Given any internal state s of machine 𝒜 which works over m-tuples of tapes, s is said to be accessible (an accessible state) if and only if there is some input m-tuple that takes 𝒜 from sI to s.

Def. 2.12 Given any transition T of 𝒜 (a transition of 𝒜 is an entry in the M table of 𝒜) corresponding to reading the n-tuple of symbols α while being in state s, T is said to be accessible (an accessible transition) if and only if there is some input m-tuple that takes 𝒜 from sI to s and presents 𝒜 with input α.

Note 2.1 If T is an inaccessible transition of machine 𝒜 (as is, for example, the transition from s1 on the pair 0,B in M1 of Example 2.1) then the destination state and the head movement of T can be left unspecified without affecting the behavior of 𝒜.

Def. 2.13 𝒜, an n-head machine, is called 1-way if and only if for each head hi of 𝒜, on all accessible transitions of 𝒜, hi moves in a fixed direction di. If 𝒜 is not 1-way it is 2-way.

Note 2.2 For 𝒜 to be 1-way it is sufficient, but not necessary, that all transitions specify the same head movements; clearly inaccessible transitions can have any head movement at all and never affect the operation of 𝒜.

Def. 2.14 The set of all m-tuples of tapes accepted by any n-head finite state machine 𝒜 is denoted by T(𝒜).

Def. 2.15 If 𝒜 is any n-head machine working on single tapes and t any tape in T(𝒜), then g𝒜(t), the generator of 𝒜 in t, is defined as that subtape of t in which the filled cells are precisely those cells of t that 𝒜 actually scans while accepting t. If 𝒜 works on m-tuples of tapes and t is any m-tuple in T(𝒜), then g𝒜(t) is the m-tuple of subtapes derived by retaining as filled only the cells actually scanned in accepting t. For example, the original shows a tape t in T(𝒜2.1) together with its generator g𝒜(t) [see Example 2.1]; the cell contents are largely illegible in the scan.

Def. 2.16 The set of all generators accepted by any n-head finite state machine 𝒜 is denoted by G(𝒜); G(𝒜) = {g𝒜(t) | t ∈ T(𝒜)}.

6.2.4 State Graphs

As in Example 2.1, any n-head finite state machine 𝒜 = <C, S, sI, M> can be described by listing the set of states S, mentioning the initial state sI, and by giving the table of moves M in tabular form. There is, however, a convenient graphical representation of any finite state machine known as the state graph. In it the set of internal states is represented as labelled circles.

The initial state is indicated by an inscribed square. Transitions are represented by labelled arrows such that if an arrow emanates from state sk and impinges on state sℓ and is labelled with the symbol α/d, then 𝒜, when in state sk and reading n-tuple α, will fall into state sℓ with head movements according to n-tuple d. For example, the state graph of machine 𝒜2.1 is given in Figure 78 below.

[Figure 78: State Graph of 𝒜2.1; the graph is not recoverable from the scan.]

Note 2.3 If in any machine 𝒜 several inputs α1, α2, ..., αp all cause 𝒜 to go from state sk to state sℓ, with associated head movements d1, d2, ..., dp, then only one arrow will be drawn from sk

to sℓ in 𝒜's state graph and it will be labelled α1/d1, α2/d2, ..., αp/dp. If d1 = d2 = ··· = dp = d, one may further simplify the arrow label to α1, α2, ..., αp/d.

Note 2.4 In order to simplify the drawing of state graphs this paper will adopt the convention that the R state and all transition arrows to R will not be represented explicitly. One will understand that, given any machine 𝒜 in some state sk and reading the input n-tuple of symbols α, if no arrow with the input label α leaves sk then 𝒜 will go to REJECT. This convention in no way alters the behavior of any machine, for T(𝒜) and G(𝒜) remain unchanged, as does the ability of 𝒜 to strongly or weakly reject any tape. Applying the conventions of Notes 2.3 and 2.4 to 𝒜2.1 yields the state graph given in Figure 79.

[Figure 79: Simplified State Graph of 𝒜2.1, with arrow labels such as (0,0),(1,1)/1,0; the full graph is not recoverable from the scan.]

Note 2.5 Since this section is concerned only with finite tapes it follows that all tapes to be considered must contain the symbol B an infinite number of times. Because of this we will require all heads of all machines to include B in their alphabets.

Note 2.6 Observe that in the definition of any finite state machine |H1| + |H2| + ··· + |Hm| = n.

Note 2.7 We will adopt the convention that if H = {h1, h2, ..., hn} then the first |H1| heads of H will constitute H1, the next |H2| heads of H will constitute H2, etc. One in no way limits the class of n-head machines by doing this, since any machine can be put in this form by judicious labelling of the heads.

Note 2.8 In the definition of n-head machine it is required that each head begin on the initial cell of its respective tape. One may ask if the power of n-head machines is increased by allowing the heads to adopt some other fixed, but not initial-cell, starting configuration. The answer is negative: if 𝒜 is any n-head machine in which each head starts on some fixed but not necessarily initial cell, then there exists an n-head machine 𝒜' which has all heads starting on initial cells and which is equivalent to 𝒜 (i.e., T(𝒜') = T(𝒜)). The construction of 𝒜' from 𝒜 consists of adding a set of states s0', s1', ..., sp' to S, the set of states of 𝒜; s0' is the initial state of 𝒜'. For all inputs 𝒜' has the transitions s0' → s1' → ··· → sp' → sI; p is made sufficiently large and appropriate movement n-tuples are associated with each transition such that after p + 1 cycles 𝒜' is in state sI and the heads are in the desired starting position; from then on 𝒜' acts precisely like 𝒜.

Note 2.9 In the definition of n-head machine it is required that each head movement be either a stand-still or a unit jump along one of the coordinate axes. One may ask if the power of n-head machines is

increased by allowing each head movement to be a finite determined jump, not necessarily unit or along a coordinate direction. The answer is negative: if 𝒜 is any n-head machine in which each head movement is a finite determined jump, then there exists an n-head machine 𝒜' which has all head movements unit jumps along coordinate axes and which is equivalent to 𝒜. The construction of 𝒜' from 𝒜 consists of adding a number of states to 𝒜 such that each non-unit jump is decomposed into a chain of unit jumps, each chain replacing a non-unit jump transition.

Note 2.10 Readers familiar with the work of Kleene, Rabin and Scott, McNaughton and Yamada, et al. may wonder at the relationship between the machines defined by Rabin and Scott (RS machines) and the n-head machines we have defined in this section. RS machines and n-head machines are both finite state deterministic machines; they do, however, differ in several essential ways:

1) An RS machine has one reading head. An n-head machine has n reading heads; each head may read a different alphabet and one or more heads may be placed on a tape.

2) An RS machine works only on 1-dim tapes. An n-head machine can, in general, work on tapes of finite but arbitrarily large dimension.

3) The method by which n-head machines accept or reject tapes differs from that of RS machines. One of the internal states of any n-head machine is the ACCEPT state; if the machine ever goes to

ACCEPT the machine stops and is said to accept the tape; the tape is rejected if the machine goes to the REJECT state. An RS machine, on the other hand, can only decide on accepting or rejecting a given tape precisely at the moment that the reading head leaves the filled portion of the tape and "steps off" the tape in some manner.

Note 2.11 In a real sense, given any n-head machine 𝒜, G(𝒜) is a better parameter of the behavior of 𝒜 than T(𝒜). For all 𝒜, |T(𝒜)| = 0 or ∞. This is clear since if 𝒜 accepts no input then |T(𝒜)| = 0; if, however, T(𝒜) is not empty then there is at least one t1 ∈ T(𝒜). Consider g𝒜(t1); g𝒜(t1) has an infinite number of empty cells; therefore, by filling these cells of g𝒜(t1) with elements of Σ, the alphabet of t1, we can generate an infinite number of distinct tapes all in T(𝒜), so |T(𝒜)| = ∞. |G(𝒜)| is not limited to 0 or ∞, but can be any integer value depending on 𝒜. Further, if g is a generator of 𝒜, then any tape t containing g as a subtape is accepted by 𝒜 whether t contains symbols out of the alphabets of 𝒜 or not (in other words, the empty cells of g are "don't care" cells whose contents do not affect the behavior of 𝒜). Thus, given G(𝒜) we know T(𝒜).

This section is concluded by an example, machine 𝒜2.2, which demonstrates that 2-head machines are more powerful than 1-head machines. 𝒜2.2 is a 2-head machine reading 1-dim tapes over the alphabet Σ = {B,0,1}. 𝒜2.2 will accept any tape which, starting at the initial

cell and moving right, has p 0's followed by p 1's followed by B, where p = 1,2,3,.... It is an established fact that such a set of tapes cannot be represented by a 1-head machine.(6)

[State graph of 𝒜2.2; the arrow labels are only partly legible in the scan.]

Machine 𝒜2.2
Figure 80

For 𝒜2.2 observe that

G(𝒜2.2) = {g | g = 0 0 ... 0 1 1 ... 1 B with p 0's and p 1's, p = 1,2,...}

T(𝒜2.2) = {t | t is a 1-dim tape and g is a subtape of t for some g ∈ G(𝒜2.2)}.
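The behavior of a machine like 𝒜2.2 can be checked with a small simulator for Def. 2.10. Since Figure 80's arrow labels are only partly legible, the transition table below is a reconstruction that accepts the same set of tapes (p 0's, then p 1's, then B), not a transcription of the figure; all the Python names are illustrative assumptions.

```python
# Sketch of Def. 2.10: a generic simulator for n-head machines on a single
# 1-dim tape (a dict: cell index -> symbol, unlisted cells blank).  A missing
# table entry is a transition to REJECT, per the convention of Note 2.4.

ACCEPT, REJECT = "A", "R"

def run(transitions, initial_state, tape, n_heads, max_steps=10_000):
    """Returns 'accept', 'strong reject', or 'weak reject' (the latter
    approximated by a step bound, since weak rejection means the machine
    never halts)."""
    state = initial_state
    heads = [0] * n_heads                  # all heads start on the initial cell
    for _ in range(max_steps):
        scanned = tuple(tape.get(h, "B") for h in heads)
        state, moves = transitions.get((state, scanned),
                                       (REJECT, (0,) * n_heads))
        if state == ACCEPT:
            return "accept"
        if state == REJECT:
            return "strong reject"
        heads = [h + d for h, d in zip(heads, moves)]
    return "weak reject"

# Reconstructed table for a machine with the language of A2.2: head 1 runs
# ahead to the first 1 while head 2 waits; then both advance in lockstep,
# head 1 over the 1's and head 2 over the 0's; accept when head 1 steps onto
# B exactly as head 2 reaches the first 1.
M22 = {
    ("s1", ("0", "0")): ("s1", (1, 0)),    # head 1 scans the block of 0's
    ("s1", ("1", "0")): ("s2", (1, 1)),    # head 1 found the first 1
    ("s2", ("1", "0")): ("s2", (1, 1)),    # advance in lockstep
    ("s2", ("B", "1")): (ACCEPT, (0, 0)),  # counts matched: p 0's, p 1's
}

def tape_from_string(s):
    return {i: c for i, c in enumerate(s)}
```

With this table, run(M22, "s1", tape_from_string("000111B"), 2) accepts, while a tape with unequal counts of 0's and 1's is rejected, illustrating how the second head supplies the counting ability a 1-head machine lacks.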

6.3 THE LANGUAGE

6.3.1 Operations on Alphabets

Def. 3.1 If Σ1, Σ2, ..., Σm are alphabets then the column alphabet of Σ1, Σ2, ..., Σm is defined as the alphabet consisting of all column m-tuples over the alphabets Σ1, Σ2, ..., Σm. For example, if Σ1 = {B,0} and Σ2 = {a,b,c} then the column alphabet of Σ1 and Σ2 consists of the six column pairs formed by placing an element of Σ1 over an element of Σ2: (B,a), (B,b), (B,c), (0,a), (0,b), (0,c).

Def. 3.2 If Σ is an alphabet and D some positive integer then Σ indexed by D, denoted by Σ/D, is defined as the alphabet consisting of all doubletons of the form σ/d where σ ∈ Σ and d ∈ D̄; i.e., Σ/D = {σ/d | σ ∈ Σ, d an integer in the range -D to +D}.

For example, if Σ = {0,1} then Σ/2 = {0/-2, 0/-1, 0/0, 0/1, 0/2, 1/-2, 1/-1, 1/0, 1/1, 1/2}.

6.3.2 Operations on Partial Tapes

Def. 3.3 If t is a partial tape existing in some D-space and written over the column alphabet of Σ1, Σ2, ..., Σm, then t will be understood to have m channels, where the i-th channel of t is the partial tape existing in that D-space, written over Σi, and obtained from t by replacing every occurrence of a column m-tuple with its i-th element alone. For example, let t be the 2-dim partial tape whose four filled cells contain the column pairs (a,0), (b,B), (a,1), (a,0), written over the column alphabet of Σ1 and Σ2

where Σ1 = {a,b} and Σ2 = {B,0,1}. Then the 1-st channel of t is the partial tape whose corresponding cells contain a, b, a, a, and the 2-nd channel of t is the partial tape whose corresponding cells contain 0, B, 1, 0.

Def. 3.4 If t is a partial tape over the column alphabet of Σ1, ..., Σm, then the separation of t, denoted by t↓, is defined as the m-tuple of partial tapes (t1, t2, ..., tm) where ti equals the i-th channel of t. (The worked example in the original, a three-channel tape and its separation, is largely illegible in the scan.)

Note 3.1 If t is a tape over a column alphabet with m = 1 then t↓ = t.

Def. 3.5 t will be said to be an initial partial tape if and only if t is 1-dim and all the cells to the left of the initial cell are empty. For example, of the tapes illustrated in the original, t1 and t2 are initial partial tapes and t3 is not.

Def. 3.6 t will be said to be a connected partial tape if and only if all cells of t are empty, or the initial cell of t is filled and for any two filled cells in t there exists a string of adjacent filled cells connecting the original two cells.

For example, of the partial tapes illustrated in the original, t1 and t2 are connected while t3 and t4 are not.

Def. 3.7 If t is an initial connected partial tape over an alphabet of the form Σ/D, then the fold of t (or t fold), denoted by t^f, is defined as the D-dim partial tape obtained from t in the following manner:

1) t = σ0/d0 σ1/d1 σ2/d2 ... σp-1/dp-1 σp/dp, where σi ∈ Σ and di ∈ D̄.

2) Read t from left to right, one cell at a time, and simultaneously write out the following partial tape t' in an originally all-empty D-space:
a) let i = 0,
b) write σ0 in the initial cell of the D-space and move d0,
c) augment i by 1,
d) write σi in the cell under consideration and move di,
e) repeat c, d until i = p, at which time one writes σp in the cell under consideration and then stops.

The resulting partial tape t' will be finite (since t was finite) and each cell of t' will contain a finite number of elements of Σ.

3) Examine the cells of t' that contain more than one element of Σ. For each such cell:
a) if the elements of Σ it contains are identical, erase all but one of them; the resulting D-dim partial tape is t^f;
b) if any one of the cells of t' contains non-identical elements of Σ, then there is no partial tape that equals t^f and t^f is defined as ∅, the null set.

For example, if t = B/0 B/0 B/-1 0/-1 1/1 0/1, then t^f is the 1-dim partial tape 1 0 B, with the initial cell holding the B. (A 2-dim example also appears in the original, but its cell contents are not fully recoverable from the scan.)

If the symbols written into a common cell conflict, the fold is ∅.

Note 3.1 If σp/dp is the last symbol of some initial connected tape t, then t^f is independent of dp. Therefore we can omit dp if we wish and still define the fold operation without introducing any ambiguity.

Def. 3.8 If t1 and t2 are partial tapes of the same dimension then the cover of t1 and t2, denoted by t1 C t2, is defined as the smallest partial tape that contains t1 and t2 as subtapes; if no such partial tape exists then t1 C t2 = ∅. For example, if t1 and t2 agree on every cell that is filled in both, then t1 C t2 is their superposition, but if some cell is filled with different symbols in t1 and t2, then t1 C t2 = ∅.

Note 3.2 t1 C t2 can be defined operationally as follows:
1) let D be the dimension of t1 and t2;
2) start with an initially empty D-space; copy t1 into it;
3) copy t2 into the space; the result will be a finite partial tape t', each cell of which contains at most two symbols (one from t1, one from t2);
4) consider the cells of t' that contain two symbols; for each such cell, if the symbols are identical, erase one of them; the resulting partial tape is t1 C t2; if any cell contains non-identical symbols then t1 C t2 = ∅.
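The stepwise recipes of Def. 3.7 and Note 3.2 translate almost directly into code. The sketch below is an illustration under stated assumptions (tapes as Python dicts, None standing in for the null set), not notation from the report.

```python
# Sketch of the fold (Def. 3.7) and cover (Def. 3.8) operations.  A 1-dim
# initial connected tape over Sigma/D is given as a list of (symbol, move)
# pairs; partial tapes are dicts from coordinate tuples to symbols.

def fold(t, dim):
    """Fold a tape over Sigma/D into a dim-dim partial tape, or None if two
    non-identical symbols land in the same cell."""
    cell = (0,) * dim                      # start at the initial cell
    out = {}
    for symbol, move in t:
        if cell in out and out[cell] != symbol:
            return None                    # non-identical symbols: null set
        out[cell] = symbol
        if move != 0:                      # move one cell along axis |move|
            axis, step = abs(move) - 1, (1 if move > 0 else -1)
            cell = cell[:axis] + (cell[axis] + step,) + cell[axis + 1:]
    return out

def cover(t1, t2):
    """Smallest partial tape containing t1 and t2 as subtapes, or None."""
    out = dict(t1)
    for cell, symbol in t2.items():
        if out.get(cell, symbol) != symbol:
            return None                    # conflicting symbols in one cell
        out[cell] = symbol
    return out
```

Folding the tape B/0 B/0 B/-1 0/-1 1/1 0/1 with this sketch yields the cells {0: B, -1: 0, -2: 1}, i.e. the 1-dim partial tape 1 0 B read left to right with the initial cell at the right end.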

Note 3.3 If we define ∅ C t = t C ∅ = ∅ for all partial tapes t, then the cover operation becomes commutative and associative, i.e., t1 C t2 = t2 C t1 and t1 C (t2 C t3) = (t1 C t2) C t3.

Def. 3.9 If t1 and t2 are initial connected tapes (therefore 1-dim) then t1 concatenated by t2, denoted by t1 · t2 or just t1 t2, is defined as the tape obtained by copying into the empty tail of t1 (the empty cells of t1 that most immediately follow, and perhaps include, the initial cell of t1) the contents of t2 beginning with the initial cell of t2. The initial cell of t1 t2 corresponds to the initial cell of t1. For example, if t1 = 1 0 1 and t2 = 1 1 then t1 t2 = 1 0 1 1 1.

Note 3.4 The "null partial tape" (not to be confused with the null set) is that partial tape in which every cell is empty. The 1-dim null partial tape is denoted by Λ. Observe that Λ is an initial connected partial tape and that for any initial connected partial tape t, tΛ = Λt = t.

Note 3.5 Observe that the concatenation operation is not commutative but is associative.

6.3.3 Operations on m-Tuples of Partial Tapes

Def. 3.10 If t = (t1, t2, ..., tm) is an m-tuple of partial tapes such that ti^f is defined for i = 1,2,...,m, then the fold of t (or t fold), denoted by t^f, is defined as the m-tuple (t1^f, t2^f, ..., tm^f); if for some i = 1,2,...,m, ti^f = ∅, then t^f = ∅.

(The fold examples shown in the original are largely illegible; in each case the fold is taken component-wise, and the m-tuple folds to ∅ whenever any component does.)

Def. 3.11 If t = (t1, t2, ..., tm) is an m-tuple of tapes and r = (r1, r2, ..., rℓ) an ℓ-tuple of non-zero positive integers whose sum equals m (Σ ri = m), then the cover of t with respect to r1, r2, ..., rℓ, denoted by t C^(r1,...,rℓ), is defined as the ℓ-tuple

(t1 C t2 C ... C tr1, tr1+1 C tr1+2 C ... C tr1+r2, ..., tm-rℓ+1 C tm-rℓ+2 C ... C tm);

if any element of t C^(r1,...,rℓ) is ∅, then t C^(r1,...,rℓ) = ∅. (The worked example in the original is only partly legible: a pair of tapes covered with respect to r = (2) yields their single cover, which is ∅ when the two tapes conflict.)

6.3.4 Operations on Sets of m-Tuples of Partial Tapes

Def. 3.12 If T is a set of partial tapes (i.e., a set of 1-tuples of tapes) then the separation of T, denoted by T↓, is defined as the set of all m-tuples obtained by taking the separation of each element of T

(i.e., T↓ = {t↓ | t ∈ T}).

Def. 3.13 If T is a set of m-tuples of partial tapes such that t^f is defined for all t ∈ T, then the fold of T (or T fold), denoted by T^f, is defined as the set of all m-tuples obtained by taking the fold of each element of T (i.e., T^f = {t^f | t ∈ T}).

Def. 3.14 If T is a set of m-tuples of partial tapes and r = (r1, r2, ..., rℓ) an ℓ-tuple of non-zero positive integers such that t C^(r1,...,rℓ) is defined for all t ∈ T, then the cover of T with respect to r1, r2, ..., rℓ, denoted by T C^(r1,...,rℓ), is defined as the set of all ℓ-tuples obtained by taking the cover with respect to r1, r2, ..., rℓ of each element of T (i.e., T C^(r1,...,rℓ) = {t C^(r1,...,rℓ) | t ∈ T}).

Def. 3.15 If T1 and T2 are sets of initial connected partial tapes then T1 concatenated by T2, denoted by T1 · T2 (or just T1 T2), is defined as the set of initial connected tapes obtained by concatenating all elements of T1 with all elements of T2 (i.e., T1 T2 = {t1 t2 | t1 ∈ T1, t2 ∈ T2}).

Def. 3.16 If T is a set of initial connected partial tapes then T star, denoted by T*, is defined as the set {Λ} ∪ T ∪ T² ∪ T³ ∪ ..., where Tⁱ = T T ... T concatenated i times and ∪ denotes the conventional union of sets.
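Defs. 3.15 and 3.16 can be sketched directly for finite sets of 1-dim tapes written as strings. Since T* is infinite, the sketch below (all names are illustrative assumptions) enumerates it only up to a fixed number of concatenations, with the empty string standing in for the null partial tape Λ.

```python
# Sketch of Defs. 3.15-3.16 on finite sets of 1-dim initial connected
# tapes, represented as strings; "" plays the role of the null tape Lambda.

def concat_sets(T1, T2):
    """T1 T2 = {t1 t2 | t1 in T1, t2 in T2} (Def. 3.15)."""
    return {t1 + t2 for t1 in T1 for t2 in T2}

def star(T, max_power):
    """The terms of T* = {Lambda} U T U T^2 U ... up to T^max_power."""
    result, power = {""}, {""}
    for _ in range(max_power):
        power = concat_sets(power, T)      # power is now T^(i+1)
        result |= power
    return result
```

For instance, star({"01"}, 3) enumerates the first few elements of {01}*: the null tape, 01, 0101, and 010101.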

Note 3.6 T* is the smallest set that contains T, contains Λ, and is closed under concatenation.

6.3.5 Regular Expressions

A regular expression (RE) is a symbolic means of representing certain sets of initial connected 1-dim partial tapes. The union and intersection of sets of 1-dim partial tapes will be indicated by ∪ and ∩ respectively. If T is a set of 1-dim partial tapes written over Σ, then the complement of T, denoted by ~T, will consist of all 1-dim partial tapes written over Σ and not in T.

Def. 3.17 If Σ is an alphabet then
1) all elements of Σ are simple terms and all simple terms are RE's over Σ; if σ ∈ Σ then σ denotes the partial tape whose only filled cell is the initial cell, which contains σ;
2) Λ and ∅ are RE's over Σ;
3) if α is an RE over Σ then ~α and α* are RE's over Σ;
4) if α and β are RE's over Σ then α∪β, α∩β and αβ are RE's over Σ;
5) no expression is an RE over Σ unless it is obtainable by 1) to 4) above.

For example, (01 ∪ 10)* 01 = {01, 0101, 1001, 010101, ...}, the set of tapes obtained by following any string of 01's and 10's with a final 01.

Note 3.7 In any partial tape represented by a regular expression, the leftmost symbol of the partial tape is in the initial cell of the tape and all filled cells are connected.

Note 3.8 Any finite set of 1-dim initial connected partial tapes can be represented by a regular expression simply by taking the finite union of the enumerated tapes. Not all infinite sets of partial tapes can be represented by RE's; for example, the set of tapes 0^n 1^n (n = 1,2,...) cannot be represented by an RE.(6)
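As a loose illustration (not from the report), the concatenation-union-star fragment of this RE language coincides with the regular expressions of modern programming languages, so membership in a set such as (01 ∪ 10)* 01 can be tested with Python's re module. Note that the report's complement and intersection operators have no direct counterpart in this notation.

```python
import re

# The RE (01 U 10)* 01 rendered in Python's regex notation; fullmatch tests
# whether an entire 1-dim tape, written as a string, lies in the denoted set.
pattern = re.compile(r"(01|10)*01")

def in_re(tape):
    return pattern.fullmatch(tape) is not None
```

Thus in_re("1001") holds (the tape 10 followed by the final 01), while no such decomposition exists for 0110, so in_re("0110") does not.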

6.4 EQUIVALENCE THEOREMS

6.4.1 1-Way 1-Dim 1-Head Machines

Theorem 4.1 If 𝒜 is a 1-way 1-dim 1-head machine working on tapes written over Σ then G(𝒜) = β, where β is an RE over Σ.

Proof: An effective procedure exists to determine if any 𝒜 is 1-way (see Section 6.5). Without any loss of generality we can assume 𝒜 to be 1-way in the +1 direction, in which event all the accessible transitions of 𝒜 will carry labels of the form σ/1 where σ ∈ Σ. Since one can remove all inaccessible transitions from the state graph of 𝒜 without altering G(𝒜), one finds that the state graph of 𝒜 is precisely the state graph of a "one-input, one-output automaton" as described by McNaughton and Yamada,(41) 𝒜 having a single output state, namely the ACCEPT state. Therefore, using the procedure given in Part II of the McNaughton-Yamada paper, one can construct β, the RE over Σ that represents all 1-dim partial tapes taking 𝒜 from sI to A; i.e., β = G(𝒜). QED

Note 4.1 If 𝒜 is a 1-way D-dim 1-head machine then G(𝒜) consists of a set of partial tapes, each consisting of a D-space empty except for a finite line of symbols along one of the D coordinates. This line of symbols can be represented as an RE over Σ, the alphabet of 𝒜.

Example 4.1 Figure 81 gives 𝒜4.1, a 1-way 1-dim 1-head machine working on tapes over Σ = {a,b,B}; find G(𝒜4.1).

[Figure 81: state graph of machine 𝔄4.1]

Using the technique of McNaughton and Yamada one finds that G(𝔄4.1) = B ∪ (a∪b)a ∪ (a∪b)(B∪b)[a(a∪b)(B∪b)]*[b ∪ aB ∪ a(a∪b)a].

Theorem 4.2 If β is an RE over Σ then there exists a 1-way 1-dim 1-head machine 𝔄 working over Σ such that G(𝔄) = β.

Proof: Construct, via Part III of the McNaughton and Yamada paper, the state graph of 𝔄', the "one-input, one-output automaton" that represents β. 𝔄' will in general have more than one terminal state (output = one); merge all terminal states of 𝔄' into one state labelled ACCEPT and delete all transitions from this state; call the new machine thus obtained 𝔄. If t is a tape accepted by 𝔄 then t must have a subtape that takes 𝔄' from sI to a terminal state, i.e., t has a subtape in β; conversely if t has a subtape in β then t will be accepted by 𝔄. Thus G(𝔄) = β. QED

Example 4.2 Let β = (a∪b)*bB ∪ abBB be an RE over Σ = {B,a,b}. Find a 1-way 1-dim 1-head machine 𝔄4.2 such that G(𝔄4.2) = β. Using the McNaughton and Yamada technique one first constructs 𝔄'4.2 (Figure 82), the one-input, one-output machine that represents β [terminal states of 𝔄'4.2 are represented by double circles]. By merging the terminal states of 𝔄'4.2 into a single ACCEPT state and by deleting all transitions from ACCEPT one obtains the desired machine 𝔄4.2 (Figure 83). The head movements for all transitions in 𝔄4.2 are understood to be +1.

[Figure 82: machine 𝔄'4.2]

[Figure 83: machine 𝔄4.2]
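The terminal-state merge at the heart of Theorem 4.2 can be sketched as follows. The machine below is an invented deterministic toy for the first disjunct (a∪b)*bB, not a transcription of Figure 82, and the dictionary encoding is an assumption:

```python
# Hedged sketch of Theorem 4.2's merge step: collapse all terminal
# states into a single ACCEPT state with no outgoing transitions.

ACCEPT = "ACCEPT"

def merge_terminals(transitions, terminals):
    """transitions: {(state, symbol): next_state}; terminals: set of states."""
    merged = {}
    for (state, sym), nxt in transitions.items():
        if state in terminals:
            continue  # delete all transitions leaving a terminal state
        merged[(state, sym)] = ACCEPT if nxt in terminals else nxt
    return merged

def accepts(transitions, start, tape):
    state = start
    for sym in tape:
        if state == ACCEPT:
            return True  # a subtape (prefix) already reached ACCEPT
        state = transitions.get((state, sym))
        if state is None:
            return False
    return state == ACCEPT

# Invented toy machine recognizing (a U b)* b B:
t = {("s1", "a"): "s1", ("s1", "b"): "s2",
     ("s2", "a"): "s1", ("s2", "b"): "s2", ("s2", "B"): "t1"}
m = merge_terminals(t, {"t1"})
print(accepts(m, "s1", "abbB"))  # True
```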

6.4.2 1-Way 1-Dim n-Head n-Tape Machines

Theorem 4.3 If 𝔄 is a 1-way 1-dim n-head machine operating such that each head hi works on a distinct tape written over Σi (i = 1,2,...,n) then G(𝔄) = βˢ, the separation of β, where β is an RE over Σ1 × Σ2 × ⋯ × Σn.

Proof: An effective procedure exists to determine if any 𝔄 is 1-way (see Section 6.5). Without any loss of generality we can assume 𝔄 to be 1-way in the +1 direction for all heads, in which event all the accessible transitions of 𝔄 will carry labels of the form σ1,σ2,...,σn/1,1,...,1 where σi ∈ Σi. One can remove all inaccessible transitions from 𝔄 without altering G(𝔄). Since the heads of 𝔄 move in synchronism, one can imagine the input to 𝔄 to be either a set of n single-channel tapes or a single n-channel tape (or more precisely the separation of a single n-channel tape). If one adopts the latter point of view, then the n reading heads h1, h2, ..., hn reading over Σ1, Σ2, ..., Σn respectively can be considered as one reading head reading over the alphabet Σ1 × Σ2 × ⋯ × Σn. Therefore, via Theorem 4.1, β, the set of single n-channel generators accepted by the 1-head machine reading over Σ1 × ⋯ × Σn, is expressible as an RE over Σ1 × ⋯ × Σn. Taking the separation of β one gets G(𝔄); i.e., G(𝔄) = βˢ. QED

Example 4.3 Let 𝔄4.3 be the 1-way 1-dim 2-head machine given in Figure 84. Head h1 works on tapes written over Σ1 = {B,0,1} and h2 works on tapes written over Σ2 = {B,a}. Find G(𝔄4.3).

[Figure 84: machine 𝔄4.3]

Considering 𝔄4.3 to be 1-head reading over Σ1 × Σ2, one finds via Theorem 4.1 the two-channel RE β accepted by that 1-head machine (display omitted), and G(𝔄4.3) = βˢ.

Theorem 4.4 If β is an RE over Σ1 × Σ2 × ⋯ × Σn then there exists a 1-way 1-dim n-head machine 𝔄 in which each head hi reads over Σi and for which G(𝔄) = βˢ.

Proof: Via the method of Theorem 4.2 construct 𝔄'', a 1-way 1-dim 1-head machine reading over Σ1 × ⋯ × Σn which has G(𝔄'') = β. Each transition of 𝔄'' will be labelled (σ1,σ2,...,σn)/1. Convert 𝔄'' to a 1-way 1-dim n-head machine by changing each transition label of 𝔄'' as follows:

  (σ1,σ2,...,σn)/1 → (σ1,σ2,...,σn)/(1,1,...,1)

The resulting machine 𝔄 has as generators precisely the separation of G(𝔄''); i.e., G(𝔄) = G(𝔄'')ˢ = βˢ. QED

Example 4.4 Given a two-channel RE β over {B,0,1} × {B,a} (display omitted), construct a machine 𝔄4.4 such that G(𝔄4.4) = βˢ. Using the method presented in Theorem 4.4 one first derives the machine 𝔄''4.4 (Figure 85). Applying the mapping above to the transition labels of 𝔄''4.4 one gets the desired machine 𝔄4.4 (Figure 86).

[Figure 85: machine 𝔄''4.4]

[Figure 86: machine 𝔄4.4]
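The separation used in Theorems 4.3 and 4.4 can be sketched as follows; the list-of-tuples encoding of an n-channel tape is an assumption:

```python
# Hedged sketch: an n-channel tape (a sequence of n-tuples of symbols)
# separates into an n-tuple of single-channel tapes, one per head.

def separate(channel_tape):
    """[(s1, ..., sn), ...] -> tuple of n single-channel tapes."""
    if not channel_tape:
        return ()
    return tuple("".join(cell[i] for cell in channel_tape)
                 for i in range(len(channel_tape[0])))

# A two-channel tape over {B,0,1} x {B,a}:
print(separate([("0", "B"), ("1", "a"), ("B", "a")]))  # ('01B', 'Baa')
```

Each output component is what the corresponding head reads when the heads move in synchronism.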

Note 4.2 If 𝔄 is a 1-way 1-dim n-head machine working on m tapes (m < n) then some tapes can have more than one head per tape. If any two heads hi and hj are on the same tape and move in the same direction, then their positions will always coincide and they can be replaced by a single head; if such is the case for all heads on each tape then 𝔄 can be replaced by a 1-way 1-dim m-head machine 𝔄''' that is equivalent to 𝔄 (i.e., G(𝔄) = G(𝔄''') = βˢ where β is an RE with m channels).

6.4.3 2-Way 1-Dim 1-Head Machines

Def. 4.1 Let β be an RE over Σ1/D1 × Σ2/D2 × ⋯ × Σn/Dn; β will be said to be realizable if and only if neither β nor any of its equivalent RE's has a well-formed part of the form

  (σ1/d1, σ2/d2, ..., σn/dn)A1 ∪ (σ1/d1′, σ2/d2′, ..., σn/dn′)A2

where for some i = 1,2,...,n, di ≠ di′ (A1 and A2 are sets of partial tapes over Σ1/D1 × ⋯ × Σn/Dn).

Theorem 4.5 If 𝔄 is a 2-way 1-dim 1-head machine working on tapes written over Σ then G(𝔄) = βᶠ where β is a realizable RE over Σ/1.

Proof: Let 𝔄' be derived from 𝔄 by considering 𝔄 to be a 1-way 1-dim 1-head machine reading over Σ/1, with the head movement of +1 for all transitions of 𝔄' understood. β is the RE over Σ/1 representing G(𝔄'). From the fact that for a given state in 𝔄' and for each σ ∈ Σ there is only one transition in 𝔄', one deduces that β is realizable; the assumption that β is not realizable would imply that 𝔄 has a state with two transitions for the same input, which is not allowed.

Let t ∈ T(𝔄). The behavior of 𝔄 on g(t) can be described by the sequence

  ρ = sI, σ0/d0; s1, σ1/d1; ...; sp−1, σp−1/dp−1; sp, σp

where 𝔄 starts in state sI, reads σ0 (in cell 0), moves its head d0 and goes to state s1; ...; 𝔄 in state si during the i-th cycle reads σi (not necessarily in cell i), moves its head di and goes to state si+1; ...; 𝔄 in state sp during the p-th cycle reads σp and goes to A (𝔄 accepts g(t)). Consider the partial tape t' = σ0/d0 σ1/d1 ⋯ σp extracted from ρ. Since t' is derived from the functioning of 𝔄 on t, it follows that σ0/d0 is an initial symbol of β, σp a final symbol of β, and (σi/di, σi+1/di+1) a transition of β for i = 1,2,...,p−1. Thus t' ∈ β. The definition of the fold operator exactly parallels the head movement of 𝔄, so that t'ᶠ = g(t). But t' ∈ β implies t'ᶠ ∈ βᶠ, so that g(t) = t'ᶠ ∈ βᶠ; one has the partial proof g ∈ G(𝔄) ⇒ g ∈ βᶠ.

To complete the proof one must show that g ∈ βᶠ ⇒ g ∈ G(𝔄). Take any g in βᶠ. There is some t' in β such that t'ᶠ = g; t' is in β, therefore t' ∈ G(𝔄'). Write the sequence ρ = sI, σ0/d0; s1, σ1/d1; ...; sp, σp that describes the behavior of 𝔄' on t' = σ0/d0 σ1/d1 ⋯ σp: 𝔄' starts in sI, reads σ0/d0 of t', goes to state s1, reads σ1/d1, goes to s2, ..., goes to sp, reads σp, goes to A. But if 𝔄' accepts t' then 𝔄 accepts t'ᶠ = g, since the fold operation parallels the head movement of 𝔄. Thus g ∈ G(𝔄), which completes the proof. QED

Example 4.5 Let 𝔄4.5, the 2-way 1-dim 1-head machine working on tapes written over Σ = {B,0,1}, be shown in Figure 87. Find G(𝔄4.5).

[Figure 87: machine 𝔄4.5]

From 𝔄4.5 one gets β = (0/−1)(0/−1)*B ∪ (1/1)(1/1)*B, so that G(𝔄4.5) = [(0/−1)(0/−1)*B ∪ (1/1)(1/1)*B]ᶠ. Observe that β is realizable.

Theorem 4.6 If β is a realizable RE over Σ/1 then there exists a 2-way 1-dim 1-head machine 𝔄 such that G(𝔄) = βᶠ.

Proof: An effective method exists to determine if β is realizable (see Section 6.5). Construct via Theorem 4.2 the machine 𝔄' that reads over Σ/1 and has G(𝔄') = β. Since β is realizable we are assured that for each σ ∈ Σ and each state of 𝔄', 𝔄' will have just one transition. Thus if we convert 𝔄' to a 2-way 1-dim 1-head machine 𝔄 reading over Σ by applying to the transition labels of 𝔄' the mapping (σ/d)/1 → σ/d, we are assured that 𝔄 is in legitimate form (i.e., only one transition leaving each state for each input). The proof follows by reversing the arguments of the proof of Theorem 4.5. QED

Example 4.6 Let β equal the realizable RE (b/1)*B ∪ (c/−1)*B. Find a 2-way 1-dim 1-head machine 𝔄4.6 such that G(𝔄4.6) = βᶠ. 𝔄'4.6, the 1-way 1-dim 1-head machine reading over {B,b,c}/1 that satisfies G(𝔄'4.6) = β, is computed via Theorem 4.2 and is given in Figure 88. The machine 𝔄4.6, which satisfies G(𝔄4.6) = βᶠ, is obtained from 𝔄'4.6 by applying the mapping (σ/d)/1 → σ/d to all transition labels in 𝔄'4.6. 𝔄4.6 is given in Figure 89.

[Figure 88: machine 𝔄'4.6]

[Figure 89: machine 𝔄4.6]
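One reading of the fold operator used in Theorems 4.5 and 4.6, as described in the proof of Theorem 4.5, can be sketched as follows; the pair-list encoding and the treatment of inconsistent overlays are assumptions:

```python
# Hedged sketch of the fold operator f: a string of symbol/move pairs
# is laid onto a 1-dim tape by writing each symbol at the current head
# position and then moving by the attached displacement.  The fold is
# taken to be undefined (None) if a revisited cell would have to hold
# two different symbols.

def fold(pairs):
    """pairs: [(symbol, move), ...]; the final pair's move is ignored.
    Returns {cell: symbol}, or None if the overlay is inconsistent."""
    tape, pos = {}, 0
    for sym, move in pairs:
        if tape.get(pos, sym) != sym:
            return None  # conflicting symbol in a revisited cell
        tape[pos] = sym
        pos += move
    return tape

# Folding (0/-1)(0/-1)B, one tape of Example 4.5 (final move irrelevant):
print(fold([("0", -1), ("0", -1), ("B", 0)]))
```

The result places 0 in cells 0 and −1 and B in cell −2, mirroring a head that twice moves left.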

6.4.4 2-Way D-Dim 1-Head Machines

Theorems 4.5 and 4.6 can be immediately extended to 1-head machines working over D-dim tapes; the proofs are essentially the same as in Theorems 4.5 and 4.6, differing only in those places where the head movement goes to D dimensions. The D-dim theorems are given below without proofs but with examples.

Theorem 4.7 If 𝔄 is a 2-way D-dim 1-head machine working on tapes written over Σ then G(𝔄) = βᶠ where β is a realizable RE over Σ/D.

Example 4.7 𝔄4.7, shown in Figure 90, is a 2-way 3-dim 1-head machine working on tapes written over Σ = {B,0,1}. G(𝔄4.7) is derived to be {(B/−3)*[(0/1)(1/2)(1/−1) ∪ (1/1)(1/−1)]}ᶠ.

[Figure 90: machine 𝔄4.7]

Theorem 4.8 If β is a realizable RE over Σ/D then there exists a 2-way D-dim 1-head machine 𝔄 such that G(𝔄) = βᶠ.

Example 4.8 Construct a 2-way 2-dim 1-head machine 𝔄4.8 such that G(𝔄4.8) = βᶠ when β = (a/0)(a/1)*(b/2)(a/2)*b. 𝔄'4.8, the 1-way 1-dim 1-head machine that reads over {a,b}/2 and for which G(𝔄'4.8) = β, is computed via Theorem 4.2 and is given in Figure 91.

[Figure 91: machine 𝔄'4.8]

The machine 𝔄4.8, which satisfies G(𝔄4.8) = βᶠ, is obtained from 𝔄'4.8 by applying the mapping (σ/d)/1 → σ/d to all transition labels in 𝔄'4.8. 𝔄4.8 is given in Figure 92.

[Figure 92: machine 𝔄4.8]

6.4.5 2-Way D-Dim n-Head n-Tape Machines

Theorem 4.9 If 𝔄 is a 2-way n-head machine with each head hi (i = 1,2,...,n) working on a distinct tape of dimension Di and written over Σi, then G(𝔄) = βᶠ where β is a realizable RE over Σ1/D1 × Σ2/D2 × ⋯ × Σn/Dn.

Proof: The RE β is obtained by applying the mapping

  (σ1,σ2,...,σn)/(d1,d2,...,dn) → (σ1/d1, σ2/d2, ..., σn/dn)/(1,1,...,1)

to each transition label of 𝔄, thereby obtaining a 1-way n-head machine 𝔄' whose heads read respectively over Σ1/D1, Σ2/D2, ..., Σn/Dn; let β = G(𝔄'). Arguing as in the proof of Theorem 4.5, if t' is an n-tuple in β then 𝔄' accepts t'; and if t'ᶠ ≠ ∅ then 𝔄 when working on t'ᶠ will go through the same sequence of states as 𝔄', and therefore t'ᶠ is accepted by 𝔄 (i.e., t'ᶠ ∈ G(𝔄)). Thus βᶠ ⊆ G(𝔄). Conversely, if t is some input in G(𝔄) then by examining the behavior of 𝔄 in accepting t we can deduce the sequence t' ∈ β such that t'ᶠ = t. Thus βᶠ ⊇ G(𝔄). The conclusion then is that G(𝔄) = βᶠ. That β is realizable follows from the observation that if β were not, then one could show 𝔄 must have a state with two transitions leaving it for the same input n-tuple; this is not allowed. Therefore β must be realizable. QED

Example 4.9 Let 𝔄4.9 be the 2-way 3-head machine shown in Figure 93. Each head of 𝔄4.9 works on a distinct tape with D1 = 1, D2 = 2, D3 = 3 and Σ1 = Σ2 = Σ3 = {B,0,1}. Find G(𝔄4.9).

[Figure 93: machine 𝔄4.9]

Applying the mapping of Theorem 4.9 one obtains the 1-way n-head machine 𝔄'4.9 shown in Figure 94.

[Figure 94: machine 𝔄'4.9]

Theorem 4.3 applied to 𝔄'4.9 yields G(𝔄'4.9) = βˢ, where β is a three-channel RE over {B,0,1}/1 × {B,0,1}/2 × {B,0,1}/3 (display omitted).

Thus G(𝔄4.9) = βᶠ (display omitted).

Theorem 4.10 If β is a realizable RE over Σ1/D1 × Σ2/D2 × ⋯ × Σn/Dn then there exists a 2-way n-head machine 𝔄 such that G(𝔄) = βᶠ.

Proof: Construct via Theorem 4.2 the machine 𝔄' that reads over Σ1/D1 × ⋯ × Σn/Dn and has G(𝔄') = β. Obtain 𝔄 from 𝔄' by applying the mapping

  (σ1/d1, σ2/d2, ..., σn/dn)/(1,1,...,1) → (σ1,σ2,...,σn)/(d1,d2,...,dn)

to all transition labels of 𝔄'. Since β is realizable we are assured that 𝔄 will have only one transition leaving each state for each input n-tuple. The proof follows by reversing the arguments of the proof of Theorem 4.9. QED

Example 4.10 Construct a 2-way 2-head machine 𝔄4.10 such that G(𝔄4.10) = βᶠ, where β is a two-channel realizable RE (display omitted). 𝔄'4.10, the machine with G(𝔄'4.10) = β, is shown in Figure 95.

[Figure 95: machine 𝔄'4.10]

Applying the mapping of Theorem 4.10 to 𝔄'4.10 one obtains 𝔄4.10, shown in Figure 96.

[Figure 96: machine 𝔄4.10]

6.4.6 2-Way D-Dim n-Head m-Tape Machines

Theorem 4.11 If 𝔄 is an n-head machine operating on m tapes (m < n) such that the first n1 heads work on tape t1 written over Σ1 in D1 dimensions, the next n2 heads work on tape t2 written over Σ2 in D2 dimensions, ..., the last nm heads work on tape tm written over Σm in Dm dimensions (ni ≥ 1; i = 1,2,...,m), then G(𝔄) = βᶠC(n1,n2,...,nm), where β is a realizable RE over the stacked alphabet consisting of n1 channels of Σ1/D1, n2 channels of Σ2/D2, ..., nm channels of Σm/Dm, and C(n1,...,nm) denotes the operator that combines each group of ni coincident generators into a single tape.

Proof: Let 𝔄' be the same machine as 𝔄 but with each head on a distinct tape; then via Theorem 4.9 let β be the realizable RE over the stacked alphabet such that G(𝔄') = βᶠ. If t' = (t'1, t'2, ..., t'n) ∈ βᶠ and t = t'C(n1,...,nm), then 𝔄 when working on t will go through the same sequence of states as 𝔄' working on t'. Since every filled cell of t is scanned by 𝔄 (since every filled cell of t' is scanned by 𝔄') and t is accepted by 𝔄, t ∈ G(𝔄); in other words, βᶠC(n1,...,nm) ⊆ G(𝔄). Conversely, if t = (t1, t2, ..., tm) ∈ G(𝔄) then there is an n-tuple t' = (t'1, t'2, ..., t'n), where t'i is the generator scanned by the i-th head of 𝔄, and t'C(n1,...,nm) must equal t. Thus G(𝔄) ⊆ βᶠC(n1,...,nm). Concluding then, G(𝔄) = βᶠC(n1,...,nm). QED

Example 4.11 Let 𝔄4.11 be the 3-head machine shown in Figure 97. Heads h1 and h2 work on the same tape of dimension 2 written over Σ1 = {B,a,c}; head h3 works on a tape of dimension 1 written over Σ2 = {B,0}. Find G(𝔄4.11).

[Figure 97: machine 𝔄4.11]

Applying Theorem 4.9 one obtains β (display omitted), and thus G(𝔄4.11) = βᶠC(2,1).

Theorem 4.12 If β is a realizable RE over the stacked alphabet consisting of n1 channels of Σ1/D1, ..., nm channels of Σm/Dm, then there exists a 2-way n-head machine 𝔄 (n = n1 + n2 + ⋯ + nm) such that G(𝔄) = βᶠC(n1,...,nm).

Proof: Let 𝔄 be the machine obtained by applying Theorem 4.10 to β (i.e., G(𝔄) = βᶠ). Instead of letting 𝔄 operate with one head per tape, alter 𝔄 so that the first n1 heads of 𝔄 operate on a single tape t1, the next n2 heads operate on a single tape t2, ..., the last nm heads operate on a single tape tm. The proof is completed by reversing the arguments of Theorem 4.11. QED

Example 4.12 Construct a machine 𝔄4.12 that works on 1-dim tapes over Σ = {B,0,1} and such that G(𝔄4.12) = {t | t consists of k 1's followed by k 0's, k = 1,2,...}. One can show that G(𝔄4.12) = βᶠC(2), where β is a two-channel RE (display omitted). Thus 𝔄4.12 is the 2-head machine shown in Figure 98, with both heads working on the same tape.

[Figure 98: machine 𝔄4.12]

6.5 ASSORTED ALGORITHMS AND THEOREMS DEALING WITH THE DECISION PROBLEMS AND SPEED OF OPERATION OF n-HEAD MACHINES

6.5.1 Algorithm for Deciding 1-Wayness of Machines

The algorithm will be given below assuming that the machine 𝔄 under consideration is n-head working on n tapes (i.e., one head per tape); the remarks following the presentation of the algorithm indicate how the method may be extended to include machines with more than one head per tape.

Algorithm 5.1 Let 𝔄 be an n-head machine working on n tapes (one head per tape) and let the state set of 𝔄 be S ∪ {A,R} with sI ∈ S. The transitions of 𝔄 going to A or R will be assumed to cause no head motion of 𝔄.

1) Let i = 0 and 𝒮(0) = {sI}.

2) Pick any transition leaving sI and not going to A or R; let the head motion associated with this transition be the n-tuple d = (d1,d2,...,dn). If no such transition exists, 𝔄 is trivially 1-way (i.e., 𝔄 never moves, since all transitions from sI go to A or R).

3) Consider all transitions leaving states in 𝒮(i); all these transitions must either go to A or R or must have head-movement n-tuples equal to d. If this criterion is not met, 𝔄 is not 1-way. If it is met, let 𝒮(i+1) = 𝒮(i) ∪ {all destination states of transitions leaving states in 𝒮(i)}.

4) If 𝒮(i+1) = 𝒮(i), halt; 𝔄 is 1-way. If 𝒮(i+1) ⊃ 𝒮(i), augment i by 1 and go to step 3).

Proof: First of all, transitions leaving sI are accessible, since any symbol can be put in the initial cell of each tape. If 𝔄 is to be 1-way, all transitions leaving sI therefore must go either to A or R or else have the same movement n-tuple d. If in the application of the algorithm 𝔄 has not been disqualified as a 1-way machine after i repetitions of step 3), then we know that for all inputs to 𝔄, 𝔄 either accepts or rejects the input or else has moved to some state sj (≠ A,R), the heads of 𝔄 always moving d each machine cycle; thus after i cycles each head of 𝔄 is scanning a previously unscanned cell, so all transitions leaving 𝒮(i) are accessible and therefore must go to A or R or also have head movement d. If a transition from 𝒮(i) has a head movement not equal to d, there is an input to 𝔄 for which 𝔄 is not 1-way. The algorithm halts in at most |S| + 2 repetitions of step 3), since S ∪ {A,R} ⊇ 𝒮(i+1) ⊇ 𝒮(i). QED

Note 5.1 Let 𝒮 = 𝒮(i) where 𝒮(i) = 𝒮(i+1) in Algorithm 5.1; 𝒮 is then the set of accessible states of 𝔄. T(𝔄) = ∅ if and only if A ∉ 𝒮.

Note 5.2 One can extend the algorithm to the case of many heads per tape by implementing the following step:

If two (or more) heads, h1 and h2, of 𝔄 work on the same tape, then during the first machine cycle of 𝔄 we need only consider those transitions from sI in which h1 and h2 read the same symbols. If the d associated with any one of these transitions indicates that h1 and h2 move in the same direction, then in applying the algorithm one observes that transitions leaving 𝒮(i) are accessible if and only if h1 and h2 read the same symbols (assuming 𝔄 is 1-way); therefore transitions leaving 𝒮(i) in which h1 and h2 read different symbols can be considered inaccessible and can be ignored in applying the algorithm. If the d associated with the transitions leaving sI indicates that h1 and h2 move in different directions, then for all states in 𝒮(i) − {sI} transitions for which h1 and h2 read different symbols must be considered accessible; furthermore, if for some i there is a transition leaving a state in 𝒮(i) and returning to sI, then all transitions leaving sI for which h1 and h2 read different symbols must be considered as now being accessible.

6.5.2 Algorithm for Deciding the Realizability of Regular Expressions

Algorithm 5.2 Let β be an RE over Σ1/D1 × Σ2/D2 × ⋯ × Σn/Dn.

To check if β is realizable, attempt to construct via Theorem 4.10 an n-head machine 𝔄 such that G(𝔄) = βᶠ. When the proposed 𝔄 is obtained, check each state of 𝔄 to see that only one transition per state is labelled with a given input. If the check is unsatisfactory then it follows that β is not realizable; further, if β were not realizable, 𝔄 would not pass the check. Therefore β is realizable if and only if 𝔄 has one transition per state for each input.

6.5.3 1-Way 2-Head Equivalents of 2-Way 1-Dim 1-Head Machines

Shepherdson(7) has shown that for any 2-way 1-dim 1-head machine 𝔄, if one restricts the inputs of 𝔄 to those 1-dim tapes for which 𝔄 never scans cells to the left of cell 0, then there is a 1-way 1-dim 1-head machine which is equivalent to 𝔄. It is impossible in general to construct a 1-way 1-dim 1-head machine equivalent to 𝔄 for all inputs. One can construct, however, a 2-head machine that is 1-way and equivalent to 𝔄.

Theorem 5.1 If 𝔄 is any 2-way 1-dim 1-head machine then there exists a 1-way 1-dim 2-head machine, constructable from 𝔄 and denoted by 𝔖(𝔄), such that G(𝔄) = G(𝔖(𝔄)).

Proof: 𝔖(𝔄) will have two heads h1 and h2. Initially h1 and h2 will both be placed on the initial cell of the tape to be examined. Once 𝔖(𝔄) is operating, h1 will move one cell per machine cycle in the −1 direction and h2 will move one cell per machine cycle in the +1 direction; therefore 𝔖(𝔄) will be a 1-way 1-dim 2-head machine.

For all 1-dim tapes the head positions of 𝔖(𝔄) after k machine cycles will be cell −k (head h1) and cell k (head h2), and the input to 𝔖(𝔄) will be (σ−k, σk). For any tape t let tk be the subtape of t consisting of the cells −k, −k+1, ..., 0, ..., k−1, k.

The crux of the construction of 𝔖(𝔄) depends on the observation that, given 𝔄 = ⟨Σ,S,sI,M⟩ working on tapes over Σ, for any 1-dim tape t and any integer k, tk can be put into one of 2 + 2|S|(2|S| + 2)^2|S| 𝔄-equivalence classes depending on the behavior of 𝔄 on tk. Furthermore, if [tk] is the equivalence class of tk, and σ−k−1 and σk+1 are the contents of cells (−k−1) and (k+1) of t, then [tk+1] is uniquely determined by [tk] and (σ−k−1, σk+1). The state set of 𝔖(𝔄) is made up precisely of these equivalence classes [tk], and the transitions of 𝔖(𝔄) on inputs (σ−k−1, σk+1) ∈ Σ × Σ are determined as follows:

1) If on reading tk, 𝔄 goes to A (R) then [tk] = A (R); in the event 𝔄 weakly rejects tk without ever leaving tk then [tk] = R; thus we have identified two of the equivalence classes, A and R.

2) If on reading tk, 𝔄 does not accept or reject tk then 𝔄 must step off tk at either the left (−1) or right (+1) end in some state si ∈ S. If one knew the behavior of 𝔄 on tk when started on cell −k, and again on cell k, beginning in each state of 𝔄, then one could find [tk+ℓ] for all ℓ > 0 without knowing precisely what tk was; i.e., one need only know [tk]. Thus for any tk, [tk] can be A, R, or a behavior label of the form

  si,p ; (α−1,1, α+1,1), (α−1,2, α+1,2), ..., (α−1,|S|, α+1,|S|)

where p = ±1 and si,p denotes that, when working on tk, 𝔄 steps off the p-th end of tk in state si, and where αx,y denotes the behavior of 𝔄 on tk if started on the x-th end of tk in state sy: αx,y = A (R) if 𝔄 moves to A (R) without leaving tk; αx,y = R if 𝔄 weakly rejects tk without leaving tk; αx,y = sj,ℓ if 𝔄 leaves tk on the ℓ-end of tk in state sj. Since for every tk and a given 𝔄 one can put tk in precisely one of the above-mentioned equivalence classes, one gets that the number of equivalence classes is 2 + 2|S|(2|S| + 2)^2|S|.

1) If [tk] = A (R) then for all ℓ > 0, [tk+ℓ] = A (R). This means that if 𝔄 accepts (rejects) tk without reading σ−k−1 or σk+1, then σ−k−1−i and σk+1+i can be anything without affecting the behavior of 𝔄 or 𝔖(𝔄) on t.

2) If [tk] is a behavior label si,p ; (α−1,1, α+1,1), ..., (α−1,|S|, α+1,|S|), then one determines [tk+1] in the following manner (assume p = −1; if p = +1, just alter the following presentation accordingly):

a) If 𝔄 moves to A (R) on reading σ−k−1 in state si, then [tk] on input (σ−k−1, σu) goes to A (R); σu indicates that σk+1 can be anything, even a symbol not in Σ, since 𝔄 would never read σk+1 (i.e., cell k+1 is not a filled cell in this particular generator of 𝔄 with respect to t).

b) If 𝔄 on reading σ−k−1 moves back onto tk in state sx, then consult α−1,x of [tk] to see what 𝔄 would do on tk: if α−1,x = A (R) then [tk] on input (σ−k−1, σu) goes to A (R); if α−1,x = sw,−1 then one knows 𝔄 will return to read σ−k−1 in sw without scanning σk+1, so examine what 𝔄 would do in sw reading σ−k−1 and re-apply b);

if α−1,x = sw,+1 then one knows 𝔄 will step off tk on the right to read σk+1 in state sw; examine what 𝔄 would do and re-apply b) or apply c), getting [tk] on input (σ−k−1, σk+1) goes to [tk+1] (σk+1 is used in place of σu since, if σk+1 is scanned by 𝔄, then [tk+1] is not independent of σk+1).

c) If in applying b) one discovers 𝔄 would move −1 to scan σ−k−2, or move +1 to scan σk+2, then [tk] on input (σ−k−1, σk+1) goes to [tk+1], a behavior label with p = −1 if 𝔄 scans σ−k−2, or p = +1 if 𝔄 scans σk+2, sj being the state 𝔄 is in when moving to scan σ−k−2 or σk+2; the α-entries of the new label are also determined from 𝔄 and [tk] by using a), b), c), but by starting 𝔄 in state sy on σ−k−1 if x = −1 and on σk+1 if x = +1.

An efficient way of constructing 𝔖(𝔄) is to begin with an initial state, I, and let 𝔖(𝔄) start with both heads on the initial cell of t. Thus the only transitions from I that can occur are transitions on inputs of the form (σ,σ), since h1 and h2 read the same symbol when in I. By applying a), b), c) to I one finds all the states of 𝔖(𝔄) immediately accessible from I. To these states one applies all inputs from Σ × Σ (all inputs are possible since 𝔖(𝔄) is 1-way) and finds the second rank of accessible states of 𝔖(𝔄); one continues in this manner until a closed machine 𝔖(𝔄) is formed. The manner of constructing 𝔖(𝔄) assures one that G(𝔄) = G(𝔖(𝔄)). Furthermore, 𝔖(𝔄) may strongly reject some tapes only weakly rejected by 𝔄; if one desires 𝔖(𝔄) to reject (weakly reject) a tape if and only if 𝔄 does, then this can be accomplished by adding a weak-reject state WR to 𝔖(𝔄) (WR = a state that loops on itself for all inputs); when in constructing 𝔖(𝔄) a weak reject by 𝔄 is uncovered, do not send 𝔖(𝔄) to R but rather to WR. QED

Example 5.1 Let 𝔄5.1 be the 2-way 1-dim 1-head machine shown in Figure 99 that reads tapes written over Σ = {B,1} and accepts input tape t if and only if t has a blank initial cell and a 1 to the right and left of the initial cell. Find 𝔖(𝔄5.1).

[Figure 99: machine 𝔄5.1]

Let I be the initial state of 𝔖(𝔄5.1). When 𝔖(𝔄5.1) is in I, the only possible inputs to 𝔖(𝔄5.1) are (B,B) and (1,1). Considering t0 = B and t0 = 1, one computes the equivalence classes reached from I on inputs (B,B) and (1,1); continuing, one computes the classes reached from each accessible class on every input in {B,1} × {B,1}. (Tables of behavior labels omitted.) Thus a suitable state graph for 𝔖(𝔄5.1) is given in Figure 100.

[Figure 100: machine 𝔖(𝔄5.1)]
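The 1-wayness test of Algorithm 5.1 (Section 6.5.1) can be transcribed directly; the dictionary layout of the transition table below is an assumption:

```python
# Hedged transcription of Algorithm 5.1: transitions map
# (state, input) -> (next_state, move_tuple); "A" and "R" are the
# accept and reject states and cause no head motion.

def is_one_way(transitions, s_I):
    d = None
    for (state, _), (nxt, move) in transitions.items():
        if state == s_I and nxt not in ("A", "R"):
            d = move  # step 2: any transition from s_I not going to A or R
            break
    if d is None:
        return True  # trivially 1-way: s_I only goes to A or R
    frontier = {s_I}
    while True:
        nxt_states = set(frontier)
        for (state, _), (nxt, move) in transitions.items():
            if state not in frontier:
                continue
            if nxt not in ("A", "R"):
                if move != d:
                    return False  # step 3 criterion violated
                nxt_states.add(nxt)
        if nxt_states == frontier:
            return True  # step 4: fixed point reached
        frontier = nxt_states

# Invented 1-head machine that always moves +1:
t = {("s1", "a"): ("s2", (1,)), ("s2", "a"): ("s1", (1,)),
     ("s2", "b"): ("A", (0,))}
print(is_one_way(t, "s1"))  # True
```

A machine whose accessible transitions mix movement tuples is reported not 1-way on the first offending frontier.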

6.5.4 The "Particular Input" Decision Problem

Def. 5.1 Let 𝔄 be any n-head machine and t any input to 𝔄 (t is in general an m-tuple of tapes); then T𝔄(t) is defined if and only if 𝔄 accepts or strongly rejects t, and in that event T𝔄(t) equals the number of machine cycles it takes for 𝔄 to accept or reject t.

Theorem 5.2 If 𝔄 = ⟨Σ,S,sI,M⟩ is any 1-head machine working on D-dim tapes, and if t is a D-dim tape for which the initial cell and all nonblank cells can be enclosed in a D-dim rectangular parallelepiped of dimensions ℓ1 × ℓ2 × ⋯ × ℓD, and if T𝔄(t) is defined (if 𝔄 accepts or strongly rejects t), then T𝔄(t) ≤ |S|(ℓ1 + 2|S|)(ℓ2 + 2|S|)⋯(ℓD + 2|S|).

Proof: Let P1 be the rectangular parallelepiped of dimensions ℓ1 × ℓ2 × ⋯ × ℓD that encloses the initial cell and the nonblank cells of t. Enclose P1 with a larger rectangular parallelepiped P2 such that the corresponding sides of P1 and P2 are |S| cells apart. P2 therefore has dimensions (ℓ1 + 2|S|) × (ℓ2 + 2|S|) × ⋯ × (ℓD + 2|S|). Let 𝔄 work on t, its head starting on the initial cell inside P1. After τ = |S|(ℓ1 + 2|S|)⋯(ℓD + 2|S|) machine cycles one of three possibilities must have occurred:

Possibility 1) 𝔄 accepts or strongly rejects t, in which event the theorem holds.

Possibility 2) 𝔄 neither accepts nor strongly rejects t and the head of 𝔄 never left P2. But τ equals the total number of possible combinations of head position in P2 and state of 𝔄; if after τ machine cycles 𝔄 never left P2 nor accepted or rejected t, then 𝔄 must be in a loop and therefore never will accept or reject t. Thus the theorem holds.

Possibility 3) 𝔄 neither accepted nor rejected t and the head of 𝔄 left P2. Let h (the head of 𝔄) have left P2 for the first time during the i-th machine cycle. By the construction of P2 and P1 one knows that h has read B for the last |S| machine cycles preceding the i-th. Since in reading these |S| B's 𝔄 neither accepted nor strongly rejected t but instead moved away from P1, we are assured that 𝔄 will continue to read blanks and move farther away from P1, never accepting or strongly rejecting t (in |S| consecutive cycles reading B, 𝔄 must repeat some state, and its behavior on blanks thereafter is periodic, carrying it steadily away). Thus the theorem holds. QED

Note 5.3 In the trivial case of 𝔄 being a 0-head machine, the acceptance or rejection of all tapes is a function only of S and M of 𝔄. If T𝔄(t) is defined in this case then for all t, T𝔄(t) ≤ |S|.

Note 5.4 Minsky(4,5) has shown that no procedure exists for determining if a general 2-head 2-tape machine accepts or strongly rejects a particular input t. His results in no way require the heads of the machine to work on separate tapes, and so one can conclude that if n ≥ 2 no procedure exists to determine if a general n-head machine accepts or strongly rejects a particular input t. In contrast with Theorem 5.2, it is a direct consequence of Minsky's result that there is no function f(𝔄,t) of 𝔄 and t such that, if 𝔄 is a general n-head machine and t an input, T𝔄(t) ≤ f(𝔄,t) whenever T𝔄(t) exists. If such a function existed then there would indeed be a procedure to decide if any general n-head machine accepted a particular input t.
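The decision procedure implicit in Theorem 5.2 can be sketched for the 1-dim case (D = 1): simulate for at most |S|(ℓ + 2|S|) cycles and report a loop if neither acceptance nor strong rejection has occurred by then. The encoding, and the treatment of a missing transition as rejection, are assumptions:

```python
# Hedged sketch of the Theorem 5.2 bound as a decision procedure for
# 1-head 1-dim machines.  transitions: {(state, symbol): (next, move)};
# tape: {cell: symbol}, blank cells read "B".

def bounded_run(transitions, states, s_I, tape):
    """Returns 'A', 'R', or None (machine never halts on this input)."""
    cells = set(tape) | {0}
    length = max(cells) - min(cells) + 1          # enclosing interval l
    bound = len(states) * (length + 2 * len(states))
    state, pos = s_I, 0
    for _ in range(bound):
        sym = tape.get(pos, "B")
        nxt = transitions.get((state, sym))
        if nxt is None:
            return "R"        # assumption: missing transition = reject
        state, move = nxt
        if state in ("A", "R"):
            return state
        pos += move
    return None               # bound exceeded: the machine loops

# Invented machine that walks right over 1's and accepts at a blank:
t = {("s1", "1"): ("s1", 1), ("s1", "B"): ("A", 0)}
print(bounded_run(t, {"s1"}, "s1", {0: "1", 1: "1"}))  # 'A'
```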

6.5.5 The Emptiness Decision Problem

Of the several decision problems one can propose dealing with n-head machines, there are three which can be shown to be equivalent. These decision problems are:

1) The emptiness decision problem: given any n-head machine 𝔄, does 𝔄 accept any input whatsoever? (i.e., does T(𝔄) = ∅?)

2) The state accessibility problem: given any n-head machine 𝔄 and any internal state s of 𝔄, is s accessible?

3) The transition accessibility problem: given any n-head machine 𝔄 and any transition T of 𝔄, is T accessible?

Theorem 5.3 The emptiness decision problem (1), the state accessibility problem (2), and the transition accessibility problem (3) are equivalent in the sense that one can devise a general procedure to answer one of the problems for all n-head machines if and only if one can devise a general procedure to answer all of the problems for all n-head machines.

Proof: One can present the proof by showing that a general procedure to solve (3) implies a general procedure to solve (2), which implies a general procedure to solve (1), which implies a general procedure to solve (3) [in short notation, gp(3) ⇒ gp(2) ⇒ gp(1) ⇒ gp(3)].

a) gp(3) ⇒ gp(2): let 𝔄 be any n-head machine and s any state of 𝔄. gp(3) assures us we can determine if any transition of 𝔄 is accessible. Consider each of the transitions entering s and determine if each is accessible. s is accessible if and only if one or more of the transitions entering s is accessible. Thus gp(3) ⇒ gp(2).

b) gp(2)_=- gp(l): let Olbe any n-head machine with ACCEPT state Ao T(Ot) 4 0 if and only if A is an accessible state of Oto But gp(2) assures us we can determine if A is accessible. Thus gp(2) =- gp(l)o c) gp(l) ==gp(3) let O be any n-head machine and T any transition of ( o Alter aO by letting all inputs to A go to R and by changing the destination of T to A (if T goes to A originally then leave it) Call this new machine O o T(O' )' 0 z T accessible in aC and gp(l) assures us we can determine if T(O ) = 0~ Thus gp(1i)zgp(3). QED Theorem 5.4 Given any 1-dim 1-head machine 01 there is a general procedure for determining if T(OL) = o Proof: If 0t is 1-way then one can apply the result of Theorem 7 (6) of Rabin and Scott to 01 and thereby decide if T(OL) =,o If Ot is 2-way then Theorem 7 of Rabin and Scott can be applied to; (OL), the 2-head l-way equivalent of OL; T(OL) =, if and only if T(3 (Ot)) = o QED Note 5.5 If Ct is a 1-dim 1-way machine then Theorem 9 of Rabin and Scott can be applied to Otto determine if G(Ot) is infinite~ If Ot is 1-dim 2-way then Theorem 9 of Rabin and Scott can be applied to g (t) to determine if G(OL) is infinite. Note 5.6 Since every 1-way n-head machine reading over,..o. is isomorphic to a I-way I-head machine reading over Xn it is evident via 280

Theorem 5.4 that a general procedure exists to determine if T(Ot) = ~ if (Jis a 1-way n-head machine. Theorem 5.5 There is no effective procedure for deciding if T(OL) = for any general n-head machine O, if n > 2. troof: This result is proved by Rabin and Scott in their Theorem 19. QED Theorem 5.6 There is no effective procedure for deciding if T(0i,) = for any general 1-head machine O1 if O1 works on tapes of dimension D > 2. Proof: Consider the set _ of all 2-way 1-dim 2-tape 2-head machines such that the state set S of each machine in X is partitioned into two sub-sets S1 and S2 and such that on all transitions from states in S1 only head h1 will move and on all transitions from states in S2 only head h2 will move. The set % is precisely the set of "two-way two-tape automata" described by Rabin and Scott. The input to any machine ) in will be restricted to pairs of 1-dim partial tapes of the form (htlh, ht2h) where the initial cell of each tape corresponds to the first cell in tl and t2 respectively and where E, the alphabet of t1 and t2 does not contain h. h is an endmark and in operation ) confines its head movements strictly to the cells filled by htlh and ht2h. Rabin and Scott have shown in their Theorem 19 that in general no effective procedure exists to determine if T( ) = 6. One can show that for any / in ) there is a 1-head machine 0rA working on 2-dim tapes such that T(,)) = - if and only if T(Ot) = 0 281

and therefore no effective method exists to determine if T(𝒜_ℬ) = ∅; for if such a method did exist we could determine (contra Rabin and Scott) if T(ℬ) = ∅ for all ℬ in 𝔛.

Let ℬ ∈ 𝔛. Let t₁ and t₂ be any 1-dim partial tapes over Σ, the alphabet of ℬ. One defines (ht₁h) × (ht₂h) as follows: (ht₁h) × (ht₂h) will be a 2-dim partial tape written over (Σ ∪ {h})² such that cell (0,0) is the initial cell of (ht₁h) × (ht₂h), and such that if σᵢ is in the i-th cell of ht₁h and τⱼ is in the j-th cell of ht₂h then cell (i,j) of (ht₁h) × (ht₂h) contains (σᵢ, τⱼ). If ℓg(t) is the number of filled cells in t, then the contents of cell (i,j) in (ht₁h) × (ht₂h) is defined only for −1 ≤ i ≤ ℓg(t₁) and −1 ≤ j ≤ ℓg(t₂).

From ℬ one constructs 𝒜₂ such that 𝒜₂ has the same transition structure as ℬ; however, 𝒜₂ is 1-head and reads over inputs in (Σ ∪ {h})², whereas ℬ is 2-head and each head reads over Σ ∪ {h}. Thus the input labels on transitions in 𝒜₂ and ℬ will be identical. As for the head movements, if a particular transition of ℬ had head movement

a) (1,0), then 𝒜₂ moves its head +1,
b) (−1,0), then 𝒜₂ moves its head −1,
c) (0,1), then 𝒜₂ moves its head +2,
d) (0,−1), then 𝒜₂ moves its head −2.
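The product-tape construction and head-movement recoding above can be sketched concretely. This is a minimal sketch, assuming an illustrative encoding (tapes as strings, cells as a Python dict, '#' standing in for the endmark h); none of these representational choices appear in the original proof.

```python
def product_tape(t1, t2, h='#'):
    """Build (h t1 h) x (h t2 h) as in the proof of Theorem 5.6.

    t1, t2 are strings over an alphabet not containing the endmark h.
    The result maps cell (i, j) to the pair (i-th symbol of h t1 h,
    j-th symbol of h t2 h); the endmarks occupy indices -1 and len(t)."""
    r1, r2 = h + t1 + h, h + t2 + h
    return {(i - 1, j - 1): (a, b)
            for i, a in enumerate(r1)
            for j, b in enumerate(r2)}


# Recoding of B's 2-head moves as single-head moves of A2 on the product
# tape: a move (d1, d2) with exactly one nonzero component becomes a unit
# move along dimension 1 (for head h1) or dimension 2 (for head h2).
MOVE_MAP = {(1, 0): '+1', (-1, 0): '-1', (0, 1): '+2', (0, -1): '-2'}
```

For example, `product_tape('ab', 'c')` fills exactly the cells (i, j) with −1 ≤ i ≤ 2 and −1 ≤ j ≤ 1, with cell (0, 0) holding ('a', 'c').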

By the construction of ℬ, all head movements of ℬ must be one of the four listed above; thus 𝒜₂ is well defined for each. It follows directly from the manner in which 𝒜₂ was constructed that (ht₁h, ht₂h) ∈ T(ℬ) if and only if (ht₁h) × (ht₂h) ∈ T(𝒜₂).

One can construct a 1-head machine 𝒜₁ that accepts any 2-dim tape t if and only if t has a subtape of the form (ht₁h) × (ht₂h). Furthermore, 𝒜₁ can be built such that it will halt on the initial cell of t if t is accepted. If one merges and identifies the A state of 𝒜₁ with the initial state of 𝒜₂, one obtains a composite machine 𝒜 such that

T(𝒜) ≠ ∅ ⟺ there exist t₁ and t₂ such that (ht₁h) × (ht₂h) ∈ T(𝒜₂) ⟺ T(ℬ) ≠ ∅.

Since T(ℬ) ≠ ∅ is not effectively decidable, one concludes that T(𝒜) ≠ ∅ is not effectively decidable. QED

Note 5.7 In a manner similar to the proof of Theorem 5.6, one can show that no effective procedure exists to decide if any general 2-dim 1-head machine strongly rejects any tape.

6.5.6 Boolean Properties of n-Head Machines

Theorem 5.7 If 𝒜₁ and 𝒜₂ are n₁-head and n₂-head machines respectively, then there exist machines ℬ₁ and ℬ₂, each with at most n₁+n₂ heads, such that

a) T(ℬ₁) = T(𝒜₁) ∩ T(𝒜₂) and b) T(ℬ₂) = T(𝒜₁) ∪ T(𝒜₂).

Proof: a) Let ℬ₁ have n₁+n₂ heads, with the first n₁ heads placed on tapes in the manner of 𝒜₁ and the second n₂ heads placed on tapes in the manner of 𝒜₂. Let the states of ℬ₁ be doubletons of the form (s_i1, s_i2), where s_i1 ∈ S_𝒜₁ ∪ {A,R} and s_i2 ∈ S_𝒜₂ ∪ {A,R}. Let (s_01, s_02) be the initial state of ℬ₁. If ℬ₁ is in state (s_i1, s_i2) and reads input (σ₁, σ₂, ..., σ_(n1+n2)), then ℬ₁ goes to state (s_j1, s_j2) with head movements (d₁, d₂, ..., d_(n1+n2)), where s_j1, s_j2 and d₁, d₂, ..., d_(n1+n2) are determined from the transition tables of 𝒜₁ and 𝒜₂ as follows:

M_𝒜₁: (s_i1, σ₁, ..., σ_n1) → (s_j1, d₁, ..., d_n1)
M_𝒜₂: (s_i2, σ_(n1+1), ..., σ_(n1+n2)) → (s_j2, d_(n1+1), ..., d_(n1+n2))

[it is understood that on all inputs 𝒜₁ and 𝒜₂ go from A to A]. The ACCEPT state of ℬ₁ is (A, A). As constructed, ℬ₁ will accept an m-tuple of tapes t if and only if t is accepted by both 𝒜₁ and 𝒜₂; thus T(ℬ₁) = T(𝒜₁) ∩ T(𝒜₂).

b) Construct ℬ₂ exactly as ℬ₁ above and then merge all states of the form (A, A), (A, s_i2), (s_i1, A) into a single ACCEPT state. ℬ₂ will accept an m-tuple of tapes t if and only if 𝒜₁ or 𝒜₂ or both accept t; thus T(ℬ₂) = T(𝒜₁) ∪ T(𝒜₂). QED
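The doubleton-state idea behind Theorem 5.7 is the familiar product construction. The sketch below specializes it to 1-way 1-head machines (ordinary deterministic acceptors), with the understanding that in the general case the head movements simply carry over componentwise; the transition-table encoding is an illustrative assumption.

```python
def product_machine(d1, s1, F1, d2, s2, F2, mode):
    """Run the product of two deterministic acceptors on a word.

    d1, d2 map (state, symbol) -> state; s1, s2 are start states; F1, F2
    are accepting-state sets.  mode 'and' gives T(B1) = T(A1) & T(A2)
    (Theorem 5.7a); mode 'or' gives the union (Theorem 5.7b)."""
    def accepts(word):
        p, q = s1, s2                 # a doubleton state (s_i1, s_i2)
        for sym in word:
            p, q = d1[p, sym], d2[q, sym]
        if mode == 'and':
            return p in F1 and q in F2
        return p in F1 or q in F2
    return accepts
```

For instance, intersecting "even number of a's" with "ends in b" accepts exactly the words with both properties, while the union accepts words with either one.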

Theorem 5.8 If 𝒜 is any n-head machine that strongly represents T(𝒜) (i.e., any input to 𝒜 is either accepted or strongly rejected), then there is an n-head machine ℬ that strongly represents the complement ∁T(𝒜).

Proof: Interchange the labels of the A and R states of 𝒜. One obtains an n-head machine ℬ that strongly represents ∁T(𝒜), since if t takes 𝒜 to A then it takes ℬ to R, and if t takes 𝒜 to R it takes ℬ to A. QED

Note 5.8 One is obliged to restrict the hypothesis of Theorem 5.8 to machines that strongly represent their sets. The reason for this is that there are some sets which can be weakly represented at best, and thus the construction of Theorem 5.8 would not be possible. A case in point: let T be the set of all 1-dim tapes over {B, 0} such that at least one cell to the right of the initial cell contains 0. T can be weakly represented by the 1-way 1-dim 1-head machine 𝒜₀ shown in Figure 101.

Figure 101. Machine 𝒜₀
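The label swap in the proof of Theorem 5.8 is purely mechanical; a minimal sketch, assuming (as an illustrative encoding) that the transition table is a dict from (state, input) to next state, with 'A' and 'R' the halting ACCEPT/REJECT states:

```python
def complement(delta):
    """Swap the A and R labels of a machine that strongly represents its
    set (Theorem 5.8): transitions into A now enter R and vice versa."""
    swap = {'A': 'R', 'R': 'A'}
    return {k: swap.get(v, v) for k, v in delta.items()}
```

As Note 5.8 observes, the construction says nothing about machines that only weakly represent their sets: an input on which the machine never halts (weak rejection) is untouched by the label swap.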

Any tape t′ with all blanks to the right of the initial cell is weakly rejected by 𝒜₀; therefore any machine ℬ that purports to represent ∁T(𝒜₀) must accept t′. But no such ℬ can exist, for we would have to require that ℬ check all cells to the right of the initial cell for blanks, thereby implying that ℬ must go through an infinite number of cycles before accepting t′. But via Theorem 5.2 one deduces that ℬ must accept t′ in a finite number of cycles. Therefore, by contradiction, ℬ cannot exist.

6.5.7 Speed Theorems

Theorem 5.9 If 𝒜 is any 1-dim 1-head machine and t any tape for which T_𝒜(t) is defined, then T_𝒜(t) ≥ T_β(𝒜)(t). Furthermore, if in accepting or strongly rejecting t, 𝒜 stands still or reverses direction, then T_𝒜(t) > T_β(𝒜)(t).

Proof: For any 1-dim tape t, let tₖ be the subtape of t consisting of the cells −k, −k+1, ..., −1, 0, 1, ..., k. If t is accepted or strongly rejected by 𝒜 then there exists a smallest k such that 𝒜 accepts or strongly rejects tₖ and never leaves tₖ. Since k is the smallest such number, it follows that 𝒜 must read cell −k or +k of t. Therefore T_𝒜(t) ≥ k. But by construction β(𝒜) is 1-way; thus one deduces that T_β(𝒜)(t) = k. Therefore T_𝒜(t) ≥ T_β(𝒜)(t). If in addition one knows that 𝒜 stands still or reverses direction in accepting or strongly rejecting t, then T_𝒜(t) > k = T_β(𝒜)(t). QED

Theorem 5.10 If 𝒜 is any 1-dim 1-head machine and t any tape for which T_𝒜(t) is defined, then there is no n-head machine ℬ for any n such that G(ℬ) = G(𝒜) and such that T_ℬ(t) < T_β(𝒜)(t).

Proof: If ℬ is any such n-head machine then, as in the proof of Theorem 5.9, ℬ must scan cell −k or +k of tₖ in order to accept or strongly reject t. Thus T_ℬ(t) ≥ k = T_β(𝒜)(t). Thus it is not possible that T_ℬ(t) < T_β(𝒜)(t). QED

Theorem 5.11 There exists an infinite collection of sets of 1-dim tapes 𝒞 = {Aⱼ}, each set Aⱼ representable by 1-head machines, such that if 𝒜ⱼ is any 1-head machine representing Aⱼ (i.e., T(𝒜ⱼ) = Aⱼ), then for any tape t for which T_𝒜ⱼ(t) is defined, T_𝒜ⱼ(t) > T_β(𝒜ⱼ)(t).

Proof: Let Aⱼ be the set of 1-dim tapes written over Σ = {B, a} such that Aⱼ = {t | there are at least j a's to the right and to the left of the initial cell}. Aⱼ can be represented by a 1-head machine 𝒜ⱼ which operates as follows:

1) 𝒜ⱼ reads the initial cell; if B, go to 2); if a, go to 3).
2) Move right counting the a's but not the B's; after j a's, reverse and count left for 2j a's. On the 2j-th a moving left, accept t.
3) Move right counting the a's but not the B's; after j a's, reverse and count left for (2j+1) a's. On the (2j+1)-th a moving left, accept t.
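The two-pass behavior of the machine of Theorem 5.11 can be simulated directly. In this sketch the dict tape encoding and the `fuel` cutoff (standing in for the machine's failure to halt when too few a's exist on one side) are illustrative assumptions, and the reversal is taken to recount the a under the head, which reproduces the 2j / 2j+1 counts in the text.

```python
def run_Aj_machine(tape, j, fuel=10_000):
    """Simulate the machine of Theorem 5.11 on a 1-dim tape.

    `tape` maps cell index -> symbol; unlisted cells are blank 'B'.
    Returns ('accept', cycles), or ('diverges', None) when the machine
    would search blank tape forever (i.e., t is only weakly rejected)."""
    read = lambda i: tape.get(i, 'B')
    # Count to 2j a's if the initial cell is blank, 2j+1 if it holds an a.
    target = 2 * j if read(0) == 'B' else 2 * j + 1
    pos, cycles, count = 0, 0, 0
    # Phase 1: move right, counting a's but not B's, until j a's are seen.
    while count < j:
        pos += 1
        cycles += 1
        if cycles > fuel:              # fewer than j a's on the right
            return ('diverges', None)
        if read(pos) == 'a':
            count += 1
    # Phase 2: reverse; the a under the head is recounted, then the head
    # moves left one cell per cycle, accepting on the target-th a.
    count = 1
    while count < target:
        pos -= 1
        cycles += 1
        if cycles > fuel:              # fewer than j a's on the left
            return ('diverges', None)
        if read(pos) == 'a':
            count += 1
    return ('accept', cycles)
```

On a tape with three a's on each side of a blank initial cell, the machine accepts for j = 2 but, as the proof observes, it cannot strongly reject: for j = 4 it wanders right forever.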

Thus there is at least one 1-head machine that represents Aⱼ.

Let 𝒜ⱼ be any 1-head machine that represents Aⱼ. 𝒜ⱼ cannot strongly reject any tape, since if t′ is strongly rejected by 𝒜ⱼ then t′ must have fewer than j a's either to the left or to the right of the initial cell, and no 𝒜ⱼ could check this in a finite number of cycles. Thus for any 𝒜ⱼ, T_𝒜ⱼ(t) is defined if and only if t ∈ Aⱼ. But if t ∈ Aⱼ then 𝒜ⱼ must reverse direction in accepting t, since the definition of Aⱼ requires that 𝒜ⱼ check both to the left and to the right of the initial cell. Thus via Theorem 5.9, T_𝒜ⱼ(t) > T_β(𝒜ⱼ)(t) for all t such that T_𝒜ⱼ(t) is defined. QED

Note 5.9 The final paragraph of the proof of Theorem 5.1 assures one that T_𝒜(t) defined ⟹ T_β(𝒜)(t) defined, for any 1-dim 1-head machine 𝒜 and any tape t. Furthermore, if β(𝒜) is constructed such that β(𝒜) weakly rejects t if and only if 𝒜 weakly rejects t, then T_𝒜(t) defined ⟺ T_β(𝒜)(t) defined.

Theorem 5.12 For any integer k > 0 there exists an infinite number of sets of 1-dim tapes, all representable by 1-dim 1-head machines, and such that if A is any such set and 𝒜 any 1-head machine representing A, then

a) for all t in A, T_𝒜(t) > T_β(𝒜)(t) + 2k;
b) for all t in A′, A′ being an infinite subset of A, T_𝒜(t) > k·T_β(𝒜)(t).

Proof: Part b) of the theorem will be proved first. For convenience, and without loss of generality, one can limit k to the even integers.

Let Σ = {B, a₁, a₂, ..., a_(k+3)}. Let the set Aₖ be defined as all 1-dim tapes over Σ of the form shown in Figure 102, where αᵢ ≠ αⱼ and αᵢ ≠ B for all i, j, i ≠ j.

Figure 102. Form of Tapes in Aₖ (symbols α₀, α₁, ..., αₖ placed on each side of the initial cell, separated by blank gaps of lengths y₁, y₂, ..., y_(k+1)).

There exists at least one 1-head machine 𝒜ₖ that represents Aₖ. 𝒜ₖ works as follows:

1) Read the initial cell and remember α₀; if α₀ = B, reject t.
2) Move right to the first non-blank cell. This contains α₁. Check α₁ ≠ α₀ and α₁ ≠ B, and remember α₁.
3) Move left past α₀ to the first occurrence of α₁. Check that α₁ has not occurred more than once. Move left to read α₂. Check α₂ ≠ B and α₂ ≠ α₁, α₀.
4) Move right past α₁, α₀, α₁, ... to α₂, α₃, etc. 𝒜ₖ will finally move left to read α_(k−1), will check for proper occurrences of α₀, α₁, ..., and will move left to read αₖ. Check

that αₖ ≠ B, α₀, ..., α_(k−1). Then move right passing α_(k−1), α_(k−2), ..., α₁, α₀, α₁, ... and stop. Accept t.

The complete process described above requires only a finite memory and therefore can be done by a finite state machine.

Referring to the tape form in Figure 102, let the distance from the initial cell to αᵢ on the left and on the right be x_iL and x_iR respectively. Any tape in Aₖ is governed by the relations

yᵢ ≥ 1 for all i

x_1R = y₁
x_2R = x_1R + y₃
x_3R = x_2R + 1
x_4R = x_3R + y₅
...
x_(k−1)R = x_(k−2)R + 1
x_kR = x_(k−1)R + y_(k+1)

x_1L = y₂
x_2L = x_1L + 1
x_3L = x_2L + y₄
...
x_kL = x_(k−1)L + 1

Consider any 1-head machine 𝒜′ that represents Aₖ. Consider also the set of tapes in Aₖ such that yᵢ > S₀ for all i, for a suitable constant S₀ depending on 𝒜′. Call this

subset of Aₖ by the name Aₖ″. For all t ∈ Aₖ″, T_𝒜′(t) ≥ T_𝒜ₖ(t), since 𝒜′, if it represents Aₖ, can go at most a distance S₀ past each αᵢ before reversing direction and discovering the value of α_(i+1).

Consider Aₖ′, the subset of Aₖ″ that contains all tapes of Aₖ″ such that y₁ > [rk² + k(k−2)]/2 and y₂ = y₃ = ⋯ = y_(k+1) = r, where r = S₀ + 1. Aₖ′ is an infinite subset of Aₖ, and for any t ∈ Aₖ′, T_𝒜′(t) ≥ T_𝒜ₖ(t), since Aₖ′ ⊂ Aₖ″.

But T_𝒜ₖ(t) = 2y₁ + 2(y₂+1) + 2(y₁+y₃+1) + 2(y₂+1+y₄+1) + ⋯ + (y₁+y₃+1+y₅+1+⋯+y_(k+1)) > ky₁ + y₁. Thus T_𝒜′(t) > ky₁ + y₁ for all t ∈ Aₖ′.

Now T_β(𝒜)(t) = max[x_kR, x_kL]. But for all t ∈ Aₖ′, x_kR = x_kL + y₁ > x_kL. Thus

T_β(𝒜)(t) = y₁ + y₃ + 1 + y₅ + 1 + ⋯ + y_(k+1) = y₁ + (rk + k − 2)/2

for all t ∈ Aₖ′.

So for all t ∈ Aₖ′,

T_𝒜′(t) − k·T_β(𝒜)(t) > ky₁ + y₁ − ky₁ − k(rk + k − 2)/2 = y₁ − [rk² + k(k−2)]/2.

But if t ∈ Aₖ′ then y₁ > [rk² + k(k−2)]/2, and so T_𝒜′(t) − k·T_β(𝒜)(t) > 0, or T_𝒜′(t) > k·T_β(𝒜)(t), which proves part b) of the theorem.

If Aₖ satisfies the theorem, then A_(k+2ℓ) for all ℓ > 0 also satisfies the theorem. Therefore the number of sets of tapes satisfying the theorem for any particular k is infinite.

To deduce part a) of the theorem, one can argue that if 𝒜 is any 1-head machine that represents Aₖ then 𝒜 must at least go out to read αₖ on one end and then reverse and read out to αₖ on the other end. Thus for any t ∈ Aₖ,

T_𝒜(t) ≥ min[2x_kL + x_kR, 2x_kR + x_kL].

But for any t ∈ Aₖ, T_β(𝒜)(t) = max[x_kL, x_kR]. Thus for all t ∈ Aₖ,

T_𝒜(t) − T_β(𝒜)(t) ≥ min[x_kL + x_kR, 2x_kL, 2x_kR] ≥ 2k,

or T_𝒜(t) > T_β(𝒜)(t) + 2k. QED
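The distance recurrences used in the proof of Theorem 5.12 can be tabulated mechanically. A sketch, where y is the list [y₁, ..., y_(k+1)] of gap lengths (k even) and the alternation of the increments follows the recurrence pattern printed above:

```python
def head_excursions(y):
    """Compute the distances x_iR, x_iL of Theorem 5.12's proof.

    y is the list [y1, ..., y_(k+1)] of gap lengths (k even).  Returns
    (xR, xL): distances from the initial cell to a_1, ..., a_k on the
    right and on the left, following the printed recurrences."""
    k = len(y) - 1
    yv = lambda n: y[n - 1]            # 1-based access to y_n
    xR, xL = [yv(1)], [yv(2)]          # x_1R = y1, x_1L = y2
    for i in range(2, k + 1):
        # Right side alternates +y_(i+1) (i even) with +1 (i odd);
        # the left side alternates the other way around.
        xR.append(xR[-1] + (yv(i + 1) if i % 2 == 0 else 1))
        xL.append(xL[-1] + (1 if i % 2 == 0 else yv(i + 1)))
    return xR, xL
```

With y₂ = ⋯ = y_(k+1) = r, one can check numerically that x_kR = y₁ + (rk + k − 2)/2, the value used in the proof.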

Theorem 5.13 Let 𝔔 = {𝒜_b} be the set of all 1-head machines recognizing some set of 1-dim tapes A. Let β(𝔔) be the 1-way 2-head equivalent of any machine in 𝔔. Then for any particular tape t₀ in A there is a machine 𝒜′ in 𔔔 such that T_𝒜′(t₀) ≤ 3·T_β(𝔔)(t₀).

Proof: β(𝔔) is independent of which machine in 𝔔 was used as its basis, since all machines in 𝔔 have the same set of generators. Let t₀ ∈ A and let T_β(𝔔)(t₀) = x. 𝒜′ can be constructed to first check any tape t by reading left x cells and then right 2x cells; this gives 𝒜′ enough information to decide if t and t₀ have the same generator. If t has the same generator as t₀ then 𝒜′ accepts t; if not, 𝒜′ moves x cells left (which returns its head to the initial cell) and then proceeds to examine t according to the procedure of any machine 𝒜 in 𝔔. By the construction of 𝒜′ it is necessary that 𝒜′ ∈ 𝔔 and that T_𝒜′(t₀) ≤ 3x = 3·T_β(𝔔)(t₀). QED
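The strategy in the proof above can be sketched schematically. Here tapes are dicts mapping cell to symbol (blank 'B' elsewhere), the window comparison stands in for the generator test, and `fallback` is a hypothetical stand-in for an arbitrary 1-head machine recognizing the set; all three are illustrative assumptions, not part of the original argument.

```python
def make_recognizer(t0, x, fallback):
    """Sketch of the machine of Theorem 5.13 for a target tape t0 whose
    generator lies within x cells of the initial cell.

    Scanning left x cells and then right 2x cells costs 3x cycles and
    reveals the window [-x, x]; if it matches t0's we accept at once.
    Otherwise we spend x more cycles returning the head to cell 0 and
    defer to `fallback`, which returns (verdict, extra_cycles)."""
    window = lambda t: tuple(t.get(i, 'B') for i in range(-x, x + 1))
    target = window(t0)

    def run(t):
        cycles = 3 * x                 # left x cells, then right 2x cells
        if window(t) == target:        # same generator as t0: accept now
            return ('accept', cycles)
        cycles += x                    # walk the head back to cell 0
        verdict, more = fallback(t)
        return (verdict, cycles + more)
    return run
```

The point of the construction is visible in the cycle counts: the favored tape t₀ is accepted in exactly 3x cycles, while every other tape pays only a 4x overhead before the ordinary procedure takes over.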

6.6 TOPICS FOR FURTHER STUDY

6.6.1 Reduction Problems

Among the possible criteria one can use as a measure of the complexity of n-head machines are three that arise naturally from the structure of n-head machines; namely, the number of heads, the number of states, and the speed in accepting or strongly rejecting inputs. Relative to these criteria, three problems can be formulated:

1) Head Reduction Problem: given a set of tapes T, produce a machine with as few heads as possible that represents T.
2) State Reduction Problem: given a set of tapes T, produce a machine with as few states as possible that represents T.
3) Speed Reduction Problem: given a set of tapes T, produce a machine that represents T and that accepts or rejects inputs as quickly as possible.

The above three problems, both in their most general form and in many special forms, constitute an area of almost totally unexplored questions. A collection of remarks and observations on these reduction problems follows below.

6.6.1.1 Head Reduction

Two heads, hᵢ and hⱼ, of any machine 𝒜 will be said to be bound if and only if hᵢ and hⱼ are on the same tape and if for all inputs to 𝒜 there is a finite upper bound on the distance that ever exists between hᵢ and hⱼ. The bound property determines an equivalence relation on the set

of heads H of 𝒜, in that heads are in the same equivalence class if and only if they are bound to each other. It is a consequence of the bound property that if H is divided into p such equivalence classes, then 𝒜 can be shown to be computationally equivalent to a machine with p heads (one head per equivalence class of H). However, no general method is known to determine if two heads are bound, and further there is no guarantee that the p-head machine is indeed the minimum-head machine equivalent to 𝒜.

One might try to show that for each i = 1, 2, ... there is a set of inputs Cᵢ such that Cᵢ can be represented by a machine with i heads but no fewer. This is indeed the case if Cᵢ equals some non-trivial set of i-tuples; thus to represent Cᵢ any machine must have at least one head per tape, or at least i heads. In order to render the question more significant, one might re-ask the question but restrict Cᵢ to be a set of 1-dim tapes. It is the author's conjecture that the set Cᵢ defined as the set of 1-dim tapes written over Σ = {B, 0, 1} and having generators of the form 0x₁0x₁0x₂0x₂0⋯0x_(i−1)0x_(i−1)0 cannot be represented by any machine having fewer than i heads. Certainly Cᵢ can be represented by an i-head machine.

It is an interesting application of Minsky's paper that if the initial cell of every tape submitted to a machine is uniquely distinguishable, then every set of m-tuples definable by a Turing machine is representable by an n-head machine with at most m+2 heads. This result follows from letting two heads, in conjunction with the uniquely distinguishable initial cells of their tapes, represent the total state transition of the Turing machine via Minsky, and letting the remaining m heads be placed one head per tape and read and move according to the inputs and the state of the Turing machine.
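The partition of H induced by the bound relation is just the transitive closure of pairwise binding. Assuming the bound pairs are somehow known (no general method exists, as noted above), the equivalence classes, and hence the reduced head count p, fall out of a standard union-find; the list-of-pairs encoding is an illustrative assumption.

```python
def head_classes(n_heads, bound_pairs):
    """Group heads 0..n_heads-1 into the equivalence classes induced by
    the 'bound' relation of Section 6.6.1.1 (transitively closed).
    A machine whose heads fall into p classes is computationally
    equivalent to a p-head machine."""
    parent = list(range(n_heads))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for a, b in bound_pairs:
        parent[find(a)] = find(b)
    classes = {}
    for h in range(n_heads):
        classes.setdefault(find(h), []).append(h)
    return sorted(classes.values())
```

For example, four heads with bindings (0,1) and (2,3) reduce to p = 2 classes, so the machine is equivalent to a 2-head one.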

6.6.1.2 State Reduction

If one confines one's interest to 1-way machines, then the classical reduction methods as introduced by Moore (42) suffice to yield the minimum-state equivalent of any machine. The general problem for 2-way machines is, however, unsolved. Namely, given a representable set of inputs, no method is known for securing a minimum-state machine to represent the set.

Some remarks can be made about reducing the number of states in a given machine. All inaccessible states can be eliminated from any machine. All inaccessible transitions can be made "don't care" transitions. Further, given a machine 𝒜, possibly with some don't-care transitions, one can ignore the head movement associated with each transition and apply a conventional state reduction procedure to 𝒜, thus partitioning the state set of 𝒜 into equivalence classes of mergeable states; given any two states in the same equivalence class, one proceeds to merge them if and only if for any input the transitions leaving each state on that input have identical head movements.

The above technique of state reduction never alters the number of heads in a given machine. In general the head reduction and state reduction problems are not independent. Consider the machines 𝒜₆.₁ and 𝒜₆.₂ shown in Figures 103 and 104 respectively: 𝒜₆.₁ has one head and four states, while 𝒜₆.₂ has four heads and one state (A and R are not counted here). 𝒜₆.₁ is a 1-way 1-head machine in reduced form, but 𝒜₆.₂ is a 2-way 4-head machine with fewer states than 𝒜₆.₁. Careful inspection will show that G(𝒜₆.₁) = G(𝒜₆.₂) = aa*bb*cc*B.
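The merge test described above — conventional equivalence plus identical head movements — can be sketched as a Moore-style partition refinement in which each state's signature includes, per input, the block of its successor together with the associated head movement. The dict transition-table encoding, with 'A' and 'R' as halting states, is an illustrative assumption.

```python
def reduce_states(states, inputs, delta):
    """Partition states into merge classes (Section 6.6.1.2).

    delta maps (state, input) -> (next_state, head_move).  Two states end
    up in the same class only if, for every input, their transitions agree
    on head movement and lead to the same class; 'A' and 'R' halt and sit
    in blocks of their own."""
    block = {s: 'A' if s == 'A' else ('R' if s == 'R' else 'Q')
             for s in states}
    while True:
        # A state's signature: its current block plus, per input, the
        # successor's block and the head movement (None if don't-care).
        sig = {s: (block[s],) + tuple(
                  ((block[delta[s, a][0]], delta[s, a][1])
                   if (s, a) in delta else None)
                  for a in inputs)
               for s in states}
        if len(set(sig.values())) == len(set(block.values())):
            break                      # refinement stabilized
        block = sig
    groups = {}
    for s in states:
        groups.setdefault(block[s], []).append(s)
    return sorted(groups.values())
```

Note how a differing head movement blocks a merge even when the destination states agree, exactly as required by the criterion in the text.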

Figure 103. Machine 𝒜₆.₁

Figure 104. Machine 𝒜₆.₂

6.6.1.3 Speed Reduction

If A₁ is a set of tapes representable by some 1-dim 1-head machine 𝒜₁, then via Theorem 5.10 one knows that β(𝒜₁) is the fastest (or one of a set of the fastest) machines that recognizes any tape in A₁. If A₂ is a set of D-dim tapes representable by some n-head machine 𝒜₂, then if G(𝒜₂) is finite one can construct a machine 𝒜₂′ such that G(𝒜₂′) = G(𝒜₂) and such that no machine is faster than 𝒜₂′. [𝒜₂′ will be provided with a

suitably large number of heads that will fan out from the initial cell of the tape such that after each machine cycle an increasing region of tape will have been scanned; if g is any generator in G(𝒜₂) and if md(g) is the Manhattan distance to the cell of g farthest away from the initial cell, then 𝒜₂′ will recognize g in md(g) machine cycles; no machine could do it faster.]

If A₃ is a set of D-dim tapes representable by some n-head machine 𝒜₃ and such that G(𝒜₃) is infinite, then in general it appears that there is no single machine equivalent to 𝒜₃ which detects all g ∈ G(𝒜₃) faster than any other machine; rather, it seems that for any machine 𝒜₃′ computationally equivalent to 𝒜₃ there is another machine 𝒜₃″ such that for all inputs 𝒜₃″ is just as rapid as 𝒜₃′ and for some inputs 𝒜₃″ is more rapid. One might also expect that the state reduction and head reduction problems are not independent of the speed reduction problem.

6.6.2 Representability Problems

In the synthesis theorems of Section 6.4 one was required to begin with a realizable RE; failure to do so resulted in an "improper" machine, i.e., a machine in which some of the states had several transitions leaving on the same input, each transition having a different associated head movement. In general it appears that non-realizable RE's cannot be used as a basis for machine synthesis; however, some techniques can be tried in an effort to procure "proper" machines to represent sets of inputs based on non-realizable RE's. For example: if 𝒜 is the improper machine derived in an attempt to represent a set of inputs based on 𝔓, a non-realizable RE,

1) if one of the offending transitions goes to A, then the remaining offending transitions that leave the same state as the transition going to A can be deleted from 𝒜 without affecting T(𝒜);

2) if any offending transition can be shown to be inaccessible, then that transition can be deleted from 𝒜 without affecting T(𝒜);

3) if the number of times the machine will pass through a state s from which offending transitions emanate is finite for all inputs, then by expanding the number of heads and states of the machine one can construct a new machine 𝒜′ that is proper in regard to all transitions leaving s and equivalent to 𝒜 [𝒜′ operates by dividing part of its head set every time it embarks on the offending transitions, a part of the set following each transition; since 𝒜 passes through s a finite number of times, 𝒜′ will have to split its head set at most a finite number of times];

4) if 𝔓f is finite, then one can always construct a realizable RE 𝔓′ such that 𝔓′f = 𝔓f; thus the machine 𝒜′ based on 𝔓′ will be equivalent to 𝒜; if 𝔓f is not finite, one can still search for a realizable RE 𝔓′ such that 𝔓′f = 𝔓f, in which case a machine derived from 𝔓′ will be equivalent to the machine derived from 𝔓.

6.7 SUMMARY

Section 6 attempts to treat the problems associated with multiple-head finite state machines. It begins, in Section 6.2, by (1) defining n-head machines, (2) defining the form of their inputs, and (3) prescribing the manner in which these machines accept and reject inputs. As defined in this section, n-head machines are the same as classical single-head automata, as understood by, say, McNaughton and Yamada, with the restrictions and additions that (1) there can be only two final states, namely ACCEPT and REJECT, (2) if the machine enters one of these final states it halts operation immediately, (3) each transition in these machines is specified by the present state of the machine and by the n-tuple of input symbols scanned by the heads, (4) each transition is accompanied by an n-tuple of head movements which need not be identical for all transitions in a given machine, and (5) the inputs are multi-dimensional tapes that in general can extend in all directions from the initial (or starting) cell of each tape. Resulting from these machines' ability to accept and reject inputs is the notion of using them to define sets of inputs, depending on whether an input set is accepted or rejected by a particular machine. Section 6.2 develops the concept of generators as it applies to sets of defined inputs and shows that for each machine its generator set is equivalent to its set of defined inputs.

It is evident from the examples included in Section 6.2 that n-head machines are more powerful than single-head machines. It is further demonstrated that even with the restrictions that (1) n-head machines always

start with their heads on the initial cells of their input tapes and (2) all movements are one cell at a time in a coordinate direction, the computational power of the machines is just as great as that of machines that do not start on initial tape cells and whose head movements may not all be unit moves.

Section 6.3 introduces a language which is later shown to be equivalent to n-head machines in its ability to define sets of tapes. The language presented includes the already well-known language of regular expressions, which has been augmented to include the newly defined operations of column alphabets, indexed alphabets, and the separation, fold, and cover of tapes. These newly defined operations correspond in a natural manner to the structure of n-head machines; i.e., column alphabets correspond to multiple heads, indexed alphabets correspond to the movements associated with each head, separation corresponds to several distinct heads working simultaneously, fold corresponds to 2-way D-dim head movements, and cover corresponds to several distinct heads scanning the same tape.

In Section 6.4 an equivalence is developed, in the form of twelve theorems, between the input generators defined by n-head machines and particular expressions in the language of Section 6.3. The theorems constitute six analysis-synthesis pairs which treat n-head machines of various complexities, beginning with 1-way 1-dim 1-head machines and concluding with 2-way D-dim n-head m-tape machines. Aside from their academic value, these theorems are useful in that, given a desired set of generators, if one can represent them by a suitable expression in the language, then the synthesis theorems allow direct implementation of a machine possessing the given generators.

Section 6.5 deals with a number of questions relating to n-head machines. It begins by presenting two algorithms: one to decide if a given n-head machine is 1-way, the other to decide if a given regular expression is realizable; both of these algorithms are necessary for execution of some of the theorems in Section 6.4. Section 6.5 develops a 1-way 2-head equivalent of every 2-way 1-dim 1-head machine. Note that under the assumptions of this paper a 2-way automaton is allowed to scan both sides of the initial cell; under this condition the fifteenth theorem of Rabin and Scott becomes invalid and is replaced by Theorem 5.1 of this section.

The work of Rabin and Scott is extended in Section 6.5 to include all n-head machines. The results of Theorems 5.3 to 5.6 can be summarized as follows: the existence or non-existence of effective procedures to answer certain decision questions partitions the class of n-head machines into three categories, as shown in Table 6.

TABLE 6. THE EXISTENCE OF EFFECTIVE PROCEDURES FOR DECISION PROBLEMS

Decision Problem                          | 1-Dim 1-Head | D-Dim 1-Head, D ≥ 2 | General n-Head
Particular Input Problem                  | Yes          | Yes                 | No
Emptiness, State Accessibility and
Transition Accessibility Problems         | Yes          | No                  | No
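Table 6 amounts to a small lookup from machine class and decision problem to decidability. Encoded directly (the class and problem labels are illustrative; 'emptiness+' bundles the emptiness, state accessibility, and transition accessibility problems, which stand or fall together by Theorem 5.3):

```python
# Table 6 as a lookup: (machine class, decision problem) -> does an
# effective procedure exist?
DECIDABLE = {
    ('1-dim 1-head', 'particular input'):      True,
    ('1-dim 1-head', 'emptiness+'):            True,
    ('D-dim 1-head, D>=2', 'particular input'): True,
    ('D-dim 1-head, D>=2', 'emptiness+'):       False,
    ('general n-head', 'particular input'):    False,
    ('general n-head', 'emptiness+'):          False,
}
```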

Section 6.5 continues by presenting a number of theorems treating the Boolean properties of n-head machines and concludes with a number of theorems treating the relative speeds of computationally equivalent machines. The speed theorems are developed within the milieu of 1-dim machines. Some, but not all, of the speed theorem results can be extrapolated to multi-dimensional machines. The speed theorems can be paraphrased as follows:

For each 1-head machine 𝒜 working over 1-dim tapes there is a 2-head 1-way machine β(𝒜) which is computationally equivalent to 𝒜. β(𝒜) is always as fast as 𝒜, and is faster than 𝒜 if and only if 𝒜 reverses or halts its head movement during examination of an input.

There are sets of 1-dim tapes A₁, A₂, ..., Aⱼ, ... such that if 𝒜ⱼ is any 1-head machine defining Aⱼ, then β(𝒜ⱼ) is faster than 𝒜ⱼ for all inputs. Furthermore, the Aⱼ can be defined such that for all inputs in Aⱼ, β(𝒜ⱼ) is faster than 𝒜ⱼ by an arbitrarily large difference, and for all inputs in some infinite subset of Aⱼ, β(𝒜ⱼ) is faster than 𝒜ⱼ by an arbitrarily large factor.

For any set A of 1-dim tapes definable by 1-head machines and for any particular tape t₀ in A, there is a 1-head machine 𝒜₀ that defines A and has the property that no machine that defines A is more than three times faster than 𝒜₀ in recognizing t₀.

Section 6.6 contains some suggestions for further study. These suggestions lie in the areas of (1) head-state-speed reduction and (2) representability problems. A number of partial results are included with each suggestion. Some of the partial results are:

1) It is evident that the number of heads and states a machine has, and the speed with which it recognizes inputs, are not independent quantities. The work of previous authors on these reduction problems has been confined to 1-way 1-dim 1-head machines; expansion of the field of inquiry to 2-way n-head machines seems reasonable and re-opens many questions considered answered for the 1-way case.

2) Given any set of inputs, one can ask if an n-head machine exists that defines the set. Using the work of Minsky for direction, one can conclude that if the initial cells of all tapes are uniquely distinguishable by machines — as they must be by us — then all sets of m-tuples of tapes definable by Turing machines are definable by finite state machines with at most m+2 heads. If, however, as this paper has assumed, the initial cell is not uniquely distinguishable by the machines, then it is an open question in general whether one can decide, given a set of inputs, if an n-head machine exists that defines the set.

6.8 SOME PRACTICAL PROBLEMS WITH n-HEAD MACHINES

The purpose of this section is to treat in an informal manner some of the practical considerations that came to T. F. Piatkowski's attention while he was conducting gedanken experiments on multiple-head automata. The work of this section in no way summarizes nor replaces material presented in Sections 6.1 through 6.7, but rather complements the prior work and can most probably be best understood after a reading of those sections.

The topics to be considered are six in number and in the order of presentation are:

1) The finite nature of real-life problems
2) Non-implications of Section 6.4
3) The advantage of end-marks on tapes
4) "Time" as a tape dimension
5) A consequence of touched heads
6) Applications of n-head machines (two heads are better than one ... sometimes!)

6.8.1 The Finite Nature of Real-Life Problems

The abstract development of the theory of multiple-head machines, as presented in Sections 6.1 through 6.7, presumes that, in general: 1) input tapes can be and, in fact, are infinite in extent, and 2) the reading heads of n-head machines can maneuver themselves arbitrarily far apart. In real life both of these presumptions are false, and the negation of either one of them leads to a precise statement of the minimum number of heads needed

to recognize any real-life set of tapes. Consider the following arguments:

a) While theoretical problems submitted to n-head machines may be arbitrarily large, real-life problems are finite; i.e., due to the finite life of machines and their operators, we are interested only in sets of tapes in which the size of the input tapes and the duration of machine computation will be confined to ranges known a priori. Thus, any "real-life" set of n-tuples of tapes will have a finite number of generators and can be shown to be recognizable with a machine carrying n heads or fewer (i.e., at most one head per tape).

b) Since each head of any n-head machine is required to keep within communication distance of the central control mechanism of the machine, and since practical limitations place an upper limit on such a distance (i.e., telephone wires can only be so long, radio signals sent only so far in a reasonable time, etc.), it follows that in all real-life n-head machines heads working on the same tapes are "bound" and that, consequently (via the results in Section 6.6), any "real-life" n-head machine can be replaced with an equivalent machine that requires at most one head per input tape.

The above arguments indicate that the finite nature of real-life problems dictates that any "practical" problem, if solvable by any n-head machine, is solvable by some n-head machine with at most one head per tape. However, such arguments are true in theory only, for real life also imposes a priori bounds on the number of internal states any practical n-head machine can

possess. This imposes a further restriction on the number of sets of tapes recognizable by practical n-head machines. We are therefore confronted with the result that, of all sets of inputs theoretically recognizable by n-head machines, only an infinitesimal subset are recognizable by real-life machines. Furthermore, constraints on the allowable separation between heads and on the total number of internal states together determine the minimum number of heads one can employ to recognize a given set of inputs (in general, for computationally equivalent machines, as the number of heads goes up the number of states goes down, and vice versa).

6.8.2 Non-Implications of Section 6.4

Following the procedures presented in Section 6.4, one can produce for each n-head finite state machine a closed expression that precisely describes the set of inputs that particular machine recognizes. However, we cannot determine, in general, if the set of inputs any such expression represents is empty or not; that follows from the results of Section 6.5, which established that for 2-way multi-head or 2-way multi-dimensional 1-head machines the emptiness question is not answerable. It should be recognized, however, that our inability to discern if a given expression represents an empty set or not is not a fault of the language; indeed, Section 6.5 substantiates that any language used to describe n-head machines must contain this inadequacy.

6.8.3 The Advantage of End-Marks on Tapes

End-marks are special symbols, recognizable by some n-head machines, which are used solely to delineate the regions of the input tapes to which the head

movements are to be confined. When used correctly, each input tape will consist of a matrix (not necessarily a parallelepiped) of cells containing the input symbols and surrounded by end-marks. Any machine recognizing end-marks will constrain its heads to remain within the region the end-marks encompass; i.e., when the head of such a machine encounters an end-mark, it retreats backward into the permitted region. From the above remarks it is evident that n-head machines utilizing end-marks form a proper subset of all n-head machines. In contrast to the case for general n-head machines, it turns out that for all end-mark n-head machines the "particular input" decision problem is answerable. This follows from the fact that, for a given end-mark machine and a given set of inputs (with end-marks), there is a finite, computable number of combinations of internal state and head positions in which the machine can be; thus, one can always set an upper bound on the number of cycles the machine will take to accept or strongly reject a given input. It should be noted that, in general, the emptiness, state accessibility, and transition accessibility questions are not answerable for end-mark machines. This is proved in Theorem 19 of Rabin and Scott, the proof of which was for end-mark machines. 6.8.4 "Time" as a Tape Dimension There are two fundamentally different ways of presenting input symbols to the heads of any machine. The first is to let the heads scan over the cells of a spatial tape; the second is to present an input symbol to each head at regularly spaced intervals of time. In the first instance, the tape exists in space and the heads may move over it in a two-way manner. In the second instance, the tape exists in

time only, and due to the irreversible manner in which time passes, the input is one-dimensional and the head movements one-way. Thus, all n-head machines which read inputs in time (not space) are one-dimensional and one-way and, therefore, 1) need at most one head per input, and 2) always yield answers to the "particular input," emptiness, state accessibility, and transition accessibility questions. 6.8.5 A Consequence of Touching Heads In Sections 6.1 through 6.7, for any machine, the heads that work on the same tape are oblivious of each other; i.e., any number of heads can occupy the same tape cell simultaneously and not be aware of each other's presence. Such a condition has some very practical drawbacks, and we can very easily construct a model in which reading heads can detect each other's presence. In such a case, it turns out that every set of Turing-definable m-tuples of tapes can be recognized with a multiple-head machine having at most m + 3 heads. This result follows from the work of Minsky and can be implemented by allowing three heads to simulate the internal state of the Turing machine in question and the remaining m heads to read the m input tapes, one head per tape. 6.8.6 Applications of n-Head Machines It is quite difficult to conceive of all the possible applications of n-head machines. Finite-state machines are, in essence, pattern-recognition devices and, in the most general sense, they can be said to recognize the symmetries of regular expressions over column alphabets. These symmetries include all of the simple n-dimensional spatial symmetries, such as symmetries with

respect to points, lines, planes, etc. This property of these machines suggests that n-head machines could play a useful role in studies concerned with pattern recognition. Piatkowski has not carried out any such studies, nor does he have any knowledge of others having done so using n-head machines.
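As a concrete illustration of such a spatial symmetry (this example is not from the report; the function name and encoding are hypothetical), consider a two-head finite-state machine whose heads start at opposite ends of a single end-marked tape and step toward each other, with the central control comparing the pair of symbols under the heads each cycle. Such a machine recognizes exactly the tapes symmetric with respect to the tape's center. A minimal sketch of the simulation:

```python
def two_head_symmetry(tape):
    """Simulate a hypothetical 2-head finite-state machine.

    Head A scans left-to-right, head B right-to-left; each cycle
    the central control compares the two symbols under the heads.
    The machine accepts iff the tape is symmetric with respect to
    its center (i.e., reads the same forward and backward).
    """
    a, b = 0, len(tape) - 1       # initial head positions
    while a < b:
        if tape[a] != tape[b]:
            return False          # control enters the rejecting state
        a += 1                    # head A steps right
        b -= 1                    # head B steps left
    return True                   # heads met or crossed: accept

print(two_head_symmetry("abcba"))  # True
print(two_head_symmetry("abcab"))  # False
```

Note that the two heads here never need to detect one another; the constant-size control suffices, so the sketch stays within the oblivious-head model of Sections 6.1 through 6.7.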

7. REFERENCES

1. Moore, E. F., "Shortest Path Through a Maze," Annals of the Computation Laboratory of Harvard University, Harvard University Press, Cambridge, Mass., Vol. 30, pp. 285-92 (1959).
2. Lee, C. Y., "An Algorithm for Path Connections and Its Applications," IRE Trans. on Electr. Computers, Vol. EC-10, No. 3, pp. 346-65, September 1961.
3. Loberman, H., and Weinberger, A., "Formal Procedures for Connecting Terminals with a Minimum Total Wire Length," ACM Proc., Vol. 4, p. 428 (1957).
4. Von Neumann, J., "Probabilistic Logics," Automata Studies, Princeton University Press (1956).
5. Miller, R. E., and Selfridge, J. L., "Maximal Paths on Rectangular Boards," IBM Journal, Vol. 4, No. 5, p. 479, November 1960.
6. Wilcox, R., and Mann, W., Redundancy Techniques for Computing Systems, Spartan Books (1962).
7. Hald, A., Statistical Theory with Engineering Applications, Wiley and Sons (1952).
8. Holland, J., "Iterative Circuit Computers," Proc. W.J.C.C., p. 259, May 1960.
9. Unger, S. H., "A Computer Oriented Towards Spatial Problems," Proc. IRE, Vol. 46, p. 1749, October 1958.
10. Newell, A., "On Programming a Highly Parallel Machine to be an Intelligent Technician," Proc. W.J.C.C., p. 267, May 1960.
11. Arden, B. W., Galler, B. A., and Graham, R. M., "An Algorithm for Translating Boolean Expressions," J. of the ACM, p. 222, April 1962.
12. Arden, B. W., and Graham, R. M., "On GAT and the Construction of Translators," Comm. of the ACM, p. 24, July 1959.
13. Bull Gamma 60 Reference Manual, No. 0943(6-10), published by Compagnie des Machines Bull, Paris.
14. Holland, J. H., "Iterative Circuit Computers," Proc. of the 1960 W.J.C.C., p. 259.
15. Schwartz, E. S., "An Automatic Sequencing Procedure with Application to Parallel Programming," J. of the ACM, p. 153, October 1961.
16. Slotnick, D. L., Borch, W. C., and McReynolds, R. C., "The Solomon Computer," Proc. of the 1962 F.J.C.C., p. 97.

17. Squire, J. S., and Palais, S. M., "Programming and Design Considerations of a Highly Parallel Computer," Proc. of the 1963 S.J.C.C.
18. Hu, T. C., "Parallel Sequencing and Assembly Line Problems," J. Operations Res., Vol. 9, No. 6, pp. 841-8 (1961).
19. Church, A., Introduction to Mathematical Logic I, Princeton University Press (1956).
20. Church, A., "An Unsolvable Problem of Elementary Number Theory," Amer. J. Math., Vol. 58, pp. 345-63 (1936).
21. Davis, M., Computability and Unsolvability, McGraw-Hill (1958).
22. Detlovs, V. K., "Normal Algorithms and Recursive Functions," Doklady Akad. Nauk SSSR (Proc. of the Acad. of Sci., USSR), Vol. 90, No. 3, pp. 249-52 (19).
23. Kleene, S. C., Introduction to Metamathematics, Van Nostrand (1952).
24. Markov, A. A., Theory of Algorithms, Acad. of Sci., USSR (1954). Translation available from U.S. Dept. of Commerce.
25. Smullyan, R. M., Theory of Formal Systems, Princeton University Press, Annals of Mathematics Studies 47 (1961).
26. Naur, P., et al., "Report on the Algorithmic Language ALGOL 60," Comm. of the ACM, Vol. 3, No. 5, May 1960.
27. Hennie, F. C., Iterative Arrays of Logical Circuits, J. Wiley, New York (1961).
28. McCluskey, E. J., "Iterative Combinatorial Switching Networks--General Design Considerations," IRE Trans. on Electr. Computers, Vol. EC-7.
29. Hennie, F. C., "Analysis of Bilateral Iterative Networks," IRE Trans. on Circuit Theory, Vol. CT-6, p. 35 (1959).
30. Holland, J., "A Universal Computer Capable of Executing an Arbitrary Number of Sub-Programs Simultaneously," Proc. E.J.C.C., p. 108, December 1959.
31. Amarel, S., Review of "A Universal Computer Capable of Executing an Arbitrary Number of Sub-Programs Simultaneously," by J. Holland, and "Iterative Circuit Computers," by J. Holland, IRE Trans. on Electr. Computers, Vol. EC-9, p. 384, September 1960.
32. Bauer, W., "Horizons in Computer System Design," Proc. W.J.C.C., p. 41, May 1960.
33. Squire, J., "A Comparative Study of Module Communications," Internal Report, The University of Michigan Information Systems Laboratory (1962).

34. Carroll, A. B., and Comfort, W. T., "The Logical Design of a Holland Machine," Internal Report, The University of Michigan, Electrical Engineering Department (1961).
35. West, G. P., and Koerner, R. J., "Communications Within a Polymorphic System," Proc. W.J.C.C., p. 225, December 1960.
36. Carlsen, R. A., Feingold, M. G., and Fife, D. W., "A Simulation of the AN/FSQ-27 Data Processing System," RADC-TR-61-254, The University of Michigan, Department of Electrical Engineering (1961).
37. Holland, John H., "Outline for a Logical Theory of Adaptive Systems," ACM J., p. 297, July 1962.
38. McCormick, M., and Divilbiss, J. L., "Tentative Logical Realization of a Pattern Recognition Computer," Engineering Summer Conference Report, The University of Michigan (1961).
39. Harrison, M. A., "Electrical Engineering 467 Class Notes," The University of Michigan, Electrical Engineering Department, October 1961.
40. Kleene, S. C., "Representation of Events in Nerve Nets and Finite Automata," Automata Studies, Annals of Mathematics Studies, Princeton University Press (1956).
41. McNaughton, R. F., and Yamada, H., "Regular Expressions and State Graphs for Automata," IRE Trans. on Electr. Computers, Vol. EC-9, No. 1, March 1960.
42. Moore, E. F., "Gedanken-Experiments on Sequential Machines," Automata Studies, Annals of Mathematics Studies, Princeton University Press (1956).
43. Rabin, M. O., and Scott, D., "Finite Automata and Their Decision Problems," IBM J. of Res. and Development, Vol. 3, No. 2, April 1959.
44. Shepherdson, J. C., "The Reduction of Two-Way Automata to One-Way Automata," IBM J. of Research and Development, Vol. 3, No. 2, April 1959.
45. Minsky, M. L., "Recursive Unsolvability of Post's Problem of 'Tag' and Other Related Topics," Annals of Mathematics, Vol. 74, No. 3, pp. 437-55 (1961).