THE UNIVERSITY OF MICHIGAN SYSTEMS ENGINEERING LABORATORY Department of Electrical Engineering College of Engineering SEL Technical Report No. 8 LECTED TOPICS IN AUTOMATA THEORY by Richard M. Karp September 1966 This research was supported by United States Air Force Contract AF 30(602)-3546.

Preface During the 1964-65 academic year the staff of the Systems Engineering Laboratory undertook a systematic review of automata theory and its applications, in an effort to identify significant research problems. This report represents a part of that study. The contents of this report were presented by the author as a course (Communication Sciences 622) in the Communication Sciences Program at The University of Michigan during the spring of 1965. The author wishes to acknowledge the helpful suggestions of the students in that course, and to thank Professors H. L. Garner and E. L. Lawler for their advice and encouragement. iii

TABLE OF CONTENTS Section Page 1. INTRODUCTION.................... 1 2. THE FUNCTIONS rp(x) AND res(x).......... 10 30 THE RESPONSE TREE; ACCESSIBILITY........... 16 4.m SEQUENTIAL CIRCUITS................. 23 5. RIGHT CONGRUENCES AND THE EQUIRESPONSE RELATION. 29 6. THE LATTICE OF STRUCTURE TYPES OF TRANSITION SYSTEMS. 40 7. SEQUENTIAL MACHINES WITH INPUT ALPHABET AI..... 57 8. SEMIGROUPS AND SIMULATION.............. 78 9. REGULAR EVENTS AND REPRESENTABLE FUNCTIONS..99 10. PAIR ALGEBRAS AND STATE ASSIGNMENT........ 124 11o LOOP-FREE DECOMPOSITIONS OF SEQUENTIAL MACHINES.. 183 12. TURING MACHINES AND RECURSIVELY UNSOLVABLE PROBLEMS. 213 13. READ-ONLY AUTOMATA............ 221 14. GRAMMARS AND LANGUAGES................ 248 References........................ 306 v

1. INTRODUCTION Automata theory is a young mathematical discipline dealing with conceptual models of information processing devices and methods. It is the purpose of this report to give a formal development of the properties of some of these models, and to discuss their relationships to some possible areas of application. Automata theory has its origins in several disciplines. One important model, the finite-state sequential machine, was originated by engineers seeking systematic methods for the analysis and synthesis of digital circuits. Other links with engineering include the similarity between communication channels and probabilistic sequential machines, and the use of abstract state-transition models in the study of control systems and sequential decision processes. There is a close relationship between formal generative grammars, devised by theoretical linguists and computer programmers to define the sentences of natural or artificial languages, and automata which distinguish sentences of a language from non-sentences. Finally, the Turing machine was invented in response to the inner needs of mathematics itself; Turing machines provide one satisfactory means of formalizing the intuitive notions of "effectively calculable function" and "effectively decidable proposition." We shall not attempt to give a general definition of the term "automaton." Instead, we shall illustrate the concept by informally discussing the two most widely studied models: finite-state sequential machines and Turing machines. We shall then point out some features common to these two models, and briefly mention other related models. 1

2 The most direct motivation for introducing sequential machines is to provide a means of describing formally the structure and behavior of sequential switching circuits. Since the definition of a sequential circuit requires some preliminaries, we shall postpone it, and shall instead consider a simplified model of digital computer operation. Consider a digital computer with a fixed, finite amount of internal storage (core memory, tapes, registers, switches, etc.) whose input data is in the form of a stack of punch cards. The computer proceeds by reading a card, performing some computations, (which amount to a change of its "internal state"), reading the next card, and so on until the computer receives an indication that all the cards have been read. The computer then prints out a single "record," consisting of some of the data contained in its memory. We can model this type of computer operation as follows: a finitestate sequential machine is specified by a sextuple (A, Q, A, q0' X, 5) where: AT is a finite set of input symbols Q is a finite set of internal states A., is a finite set of output symbols q0 C Q is the initial state is the transition function K: Q X A> - Q 5 is the output function 6: Q -> A. In modelling a digital computer, these entities may be interpreted as follows: A: the set of possible configurations of holes in a punched card

5 Q: the set of all possible internal configurations of the computer (Thus, for an IBM 7090, the approximate number 220 of states is (2).) A: the set of records that may possibly be printed out 0 qIO: the initial internal state of the computer X: If the computer is in the state q and reads the input a, its deterministic internal operations take it to the state X(q,a) just before reading the next card. 5: 5(q) is the record printed out if the computer arrives at state q. It is evident that certain important considerations about digital computers (the distinction between instructions and data, the subdivision into arithmetic unit, control unit, input-output units, and memory units with various characteristics, etc.) are not captured in this general model, and that the very large number of states is likely to preclude the use of the model for the exact description of any given computer. Still, it appears that general theorems about the computations that sequential machines represent could have implications concerning computers. The second model we shall describe also has bearing on digital computers, but it existed before computers did. Beginning in the 1930's several mathematicians (including Church, Kleene, Turing, Post, and Markov) grappled with the problem of defining what is meant by the term "algorithm", or "effective calculation procedure." Their definitions of an effectively calculable (general recursive, computable, K-definable) function are quite different in form, but the same in content. Turing's definition is based

on what are now called Turing machines. A Turing machine consists of a finite-state machine, together with a tape, ruled into a two-way infinite sequence of squares on which symbols may be written. The initial data, as well as all intermediate and final results of the calculation, are stored on the tape. The way in which the tape is read and the writing of symbols on the tape are controlled by the finite-state machine and the data it reads from the tape. A formal definition is as follows: a Turing machine is a sextuple (A, Q, qO, D, ya, f) where A is a finite set of at least two symbols Q is a finite set of internal states qO c Q is the initial state D = (LCR is the set of directions of tape motion (d is a "halt" indicator f is a function f: Q X A- ( U U A XQ > D). One symbol, bcA, is designated as a "blank." When a Turing machine begins its operation, with some initial data on its tape, all but a finite number of squares contain the symbol b, and the rightmost "non-blank" square is being scanned (If every square is blank, it doesnvt matter which square is being scanned.). Suppose, at some point in its operation, the Turing machine is in state q reading a square containing the symbol a. Then, (a) if f(q,a) = C, the machine halts (b) if f(q,a) = (a', qt, d) the machine i) replaces the symbol a in the square being scanned by a' ii) enters the state q

5 iii) reads the square to the right of the present one if d' = R; reads the same square if d' = C; reads the square to the left if d' = L. A Turing machine may be considered to represent a function. This function has the initial tape configuration (i.e. the unique shortest sequence of consecutive symbols bounded by the leftmost and rightmost non-blanks) as its argument, and the final tape configuration as its value; the function is undefined for initial tape configurations such that the Turing machine does not halt. With suitable conventions, a Turing machine can also be considered to represent a partial function f(x1,x2:,....,xn) whose arguments and values are nonnegative integers. To compute f(al,a2,...,an), put the initial configuration al b a2 b... b an on the tape, where beA, leA, and a represents a string of a+l l's. The rightmost 1 is initially being scanned. If the Turing machine halts, then f(al,a2,...,a) is equal to the number of occurrences of 1 in the final tape configuration; if the Turing machine does not halt, f(a,a2,...,an) is undefined. It is generally felt that the Turing machine (as well as the other equivalent formulations) gives a satisfactory solution to the problem of formalizing what is meant intuitively by an "effectively calculable Sunction." Thus, it is an accepted proposition, but not a provable one, that any function that can be calculated at all can be calculated by a Turing machine. This proposition is called Church's thesis. The finite-state sequential machine and the Turing machine have several properties in common. Both operate on symbols drawn from finite

6 alphabets, rather than symbols drawn from a continuum of values such as the real numbers. Thus, both are digital machines. Their basic operation steps, including the reading and writing of symbols, changes of internal state, and tape motion, occur in a discrete sequence, rather than continuously. Thus, they are discrete-time machines. Each basic operation depends on only a finite number of symbols, and affects only a finite number of symbols; the machines are therefore finitary. The sequence of steps, and the final results, are implicitly determined by the initial data. Thus, they are deterministic. The main respect in which finite-state machines and Turing machines differ is in the amount of memory they have The memory of a finite-state machine, as measured by its number of internal states, is fixed and finite. On the other hand, the tape of a Turing machine is available for the storage of intermediate results, and, usually, no a priori upper bound on the amount of tape a Turing machine will use can be given. Since, at any given time, the total amount of tape used is finite, it is misleading to call a Turing machine an infinite automaton; the terms "growing automaton" or "potentially infinite automaton" are preferred. Our main concern in the present report will be with digital, discretetime, finitary, deterministic automata. Accordingly, the main mathematical tools used will be algebra, combinatorial analysis, and one or two results from computability theory, rather than analysis and probability theory. Other types of automata are certainly of interest also. For example, the dynamical systems of classical mechanics are analog, continuous-time, deterministic automata; also, communication channels may be viewed as stochastic automata.

7 Most of the report will deal with finite-state sequential machines. We will develop characterizations of the functions representable by finitestate machines, methods of analysis and classification of such machines, and methods of realizing machines by interconnection of smaller machines We then turn our attention to automata more powerful than finite-state machines, but less powerful than Turing machines. Since we accept Church's thesis, all the automata we use as models of information processing systems will be no more powerful than Turing machines; note, however, that we may find it technically useful to define automata which represent noncomputable functions (infinite-state sequential machines, for instance), but whose operation is not finitary, since it involves look-ups in infinite tables. One simple example of such an intermediate automaton is a finite automaton which can look back and forth through its input tape (unlike a finitestate machine proper, which reads through the input sequence in single-pass fashion), but cannot write or erase symbols on the tape. Another variant is a linear bounded automaton, which operates like a Turing machine, except that it may write in or erase only those squares which originally contained symbols of the input sequence. A third type of machine is the pushdown automaton, which is a finite-state machine together with an infinite tape called the "pushdown" which is used subject to the following restrictions: i) Initially there is only one square on the tape which is not blank; it contains a special symbol i, and is designated as the top of the pushdown.

8 ii) The only square that can be read is the top of the pushdown; upon being read it is erased (made blank), and the square to its left becomes the top. iii) The only square that can be marked (written on) is directly to the right of the top; upon being marked, a square becomes the top. When one studies the capabilities of pushdown automata and related intermediate automata, one discovers strong connections with the phrase structure grammars and associated phrase structure languages of mathematical linguistics. We can indicate the nature of these connections by means of an example. Consider the artificial language whose "sentences" are sequences of ags and bIs such that: i) the total number of b's exceeds the total number of a's by exactly 1; ii) in any suffix of the sentence (obtained by deleting an initial sequence of symbols) the number of b's exceeds the number of a's by at least 1. Some sentences: Some non-sentences: b a abb bab ababb abbba The set of sentences forms a so-called context-free language, which can be generated by the following context-free grammar: A - b A - aAA A derivation in this particular grammar is a sequence cO'Ple fcp-, n of

9 words composed of symbols a,bA such that c0 = A, cpn does not contain A, and, for i=O,l,...,n-l, pi+l is obtained from cpi by replacing an occurrence of A in cpi either by b or by aAA. Example: CPo A CP1 aAA P02 aAaAA cp3 abaAA cp4 ababA (p5 ababaAA (P6 ababaAb Cp7 abababb The set of sentences can also be recognized by a pushdown automaton which reads an input sequence of a's and b's from right to left and: i) if b is read, a 1 is written in the pushdown ii) if a is read, one symbol is read from the pushdown iii) after the entire string has been read, two additional symbols are read from the pushdown, An input sequence is recognized as a sentence if and only if each symbol read from the pushdown is a 1, except for the last symbol, which is a blank. This example gives an indication of the connection between phrasestructure-languages and sentence-recognizing automata. More precise connections will be developed later in the report.

2. THE FUNCTIONS rp(x) AND res(x) In this section we present some pictorial representations of finitestate sequential machines, and define and give some elementary properties of the functions rp(x) and! res(x), which characterize the structure and behavior of a sequential machine. Our development will draw freely on [8] and [10. Definition. A sequential machine is a system ~ = (AI, Q, Ao, q0, X,5) where: A is a finite nonempty set (input symbols) Q is a nonempty set (internal states) A is a finite nonempty set (output symbols) 0 q0 Q (initial state) X is a function X: Q,>A -A Q (transition function) 6 is a function 6: Q -, A (output function) A sequential machine is said to be finite-state if Q is a finite set. Although we are interested mainly in finite-state machines, some of our early results will hold for infinite-state machines as well, The definition we have given is known as the Moore model of a sequential machine. An alternate model, the Mealy model, is the same except that the output function 6 has domain QeAT and range Ao. All 0 results about the Moore model translate into closely analogous results about the Mealy model; the Mealy model will not occur in our formal development. A finite-state sequential machine can be specified by tabulating the functions X and 6, 10

11 Example 1 AI = j0,1] Q {= (qO'ql'q2,q3} A = (a,b}. 0 0 1 q ql q a q0 q1 q0 a 1 q2 q 1 q2 q2 q b q3 q2 q q3 b Table for X Table for 5 An alternate representation employs a rooted, directed, labelled graph called a state diagram. The nodes are in one-to-one correspondence with the states. The node for state q is labelled q/5(q). If X(q,a) = q', then there is a branch labbelled"a" directed from the ~node associated with q to the node for q'. The node associated with q is the root of the graph, and is given a distinguishing mark. The state diagram for the machine of Example 1 is: 2i(

12 From the state diagram, it is possible to visualize the response of the machine to sequences of inputs. For example, the input sequence 0010 takes the machine from the initial state successively to q1,q2'q3, and q2. The output produced at the end of this sequence is 6(q2) = b. In general, for the given machine, a sequence produces the output a if and only if the sequence doe:s not contain two consecutive O's, or it contains two consecutive l's after the last pair of consecutive zeros, Let us introduce some terminology, in order to be able to deal with the response of a machine to input sequences. Let A* denote the set of all finite sequences of elements of AI; including the null sequence, denotedby e. Sometimes AI is referred to as the input alphabet, and the elements of A are called words in the input alphabet. Let AI denote the set of non-null sequences of elements of AI The length of a sequence x, denoted lg(x), is the number of symbols in the sequence x. Then lg(bl,b2 * ) br r lg(e) = 0. The concatenation of x and y, denoteed xy, is the sequence consisting of x followed by y; for example, if x = b and y c c 1- r r s then xy = b.. b c.. c. Note that 1 r 1 s xe = ex = x lg(xy) = lg x + lg y (xy:)z = x(yz) The algebraic system (At, concatenation) is a semigroup, and the

13 system (AI, concatenation) is a semigroup with identity (monoid). Let us recall the definitions: A semigroup is an algebraic system (S,.), where S is closed under the associative binary operation *. The element eES is a (two-sided) identity element if, for all seS, es = se = s. A monoid is a semigroup with a two-sided identity element. Remark. A semigroup has at most one (two-sided) identity element. The system (AI, concatenation) is called the free semigroup generated by AI, and the system (A, concatenation) is the free monoid generated by Al. We can now extend the domain of the transition function X to Q X: The definition of the extended function is recursive/ For x E A a E A; I I X(q,e) = q X(q,xa) = X(X(q,x),a). An alternate recursive definition is: X(q,e) q X(q,ax) = X(X(q,a),x). Lemma 2.1 The two recursive definitions are equivalent. PROOF. We use induction on lg x. Basis. The result certainly holds if lg x < 1. Induction step. Suppose the result holds whenever Ig x < r. Consider x = a 2 ao. a Then

14 \(x(qLal), a2.. ar) x(x(.(q,al), a2. ar1.), ar),\(\.(q, al... ar), ar). Q.E.D. Lemma 2.2 For all q c Q, and for x e Ai, y E Ai, \(q,xy) = \(k(q,x),y). The proof, which we omit, is by induction on lg x. We define the response function rp(x) as follows: for all x E AI, rp(x) = %(q0,x). In other words, rp(x) is the state the machine reaches upon application of the input sequence x. The domain of this function is * AI, and its range is Q. Corollary 2.1 rp(xy) - \(rp(x),y) Corollary 2.2 If rp(x) = rp(y), then, for all z c AI, rp(xz) = rp(yz). Another important function is the result function res(x): A1 - A. This function is defined by: res(x) = 5(rp(x)). This function gives the output obtained upon application of the input sequence x. The function res(x) is referred to as the function represented (defined) by the sequential machine 7/. A basic question concerning the behavior of finite-state sequential machines is: what are necessary and sufficient conditions on a function f in order that it be represented by some finitestate sequential machine?

15 The proviso "finite-state" in this question is important, since every function f: A - Ao is represented by some infinite-state machine. A machine representing a given function f is obtained as follows. The input alphabet is AB, and the output alphabet, Ao. The states are in one-to-one correspondence with the elements of A,. The $tate corresponding to the word x is denoted < x >. The initial state is < e >, and the transition and output functions are defined as follows: (<x >, a) = <x a > (< x > ) = f(x). It is easy to prove by induction that, for all x, rp(x) = <x >. Then res(x) = 5(rp(x)) = (< x > ) = f(x), as desired.

3. THE RESPONSE TREE; ACCESSIBILITY A convenient way to represent A4 pictorially is by means of the free tree generated by A1. This is a rooted infinite directed tree, in which the branches directed out of any node are labelled in 1-1 correspondence with the elements of A,. Part of the free tree for A, = (0,1} is shown below: 0 1 0 1 0 1 0 1 1 There is a one-to-one correspondence between A and the set of nodes; if xeAI, let its corresponding node be n. Then the root is ne, and, if x = ala2.. a, n is reached by following the successive branch n x labels a, a2,...,a up from the root. The response tree for a sequential machined with input alphabet AI is obtained by labelling each node n with the state rp(x). For the infinite-state machine discussed x at the end of the last section, rp(x) = < x >, so n is given the label x < x >. For this particular machine, but not in general, the response tree is a state diagram, except that the outputs are not shown. For the machine of Example 2.1, a part of the response tree is: 16

17 12 q3 ql qo q2 0o ql o ql 2 q 0 21 qo A state q is accessible if, for some x, q = rp(x). If q is accessible, let d(q) = min lg(x). ({ |rp(x)=q} Evidently, q occurs as a label in the response tree for 72 if and only if q is accessible; the label q will appear at the d(q)t "level" of the response tree, and will not appear at any earlier level. In general, the response tree for a machine contains redundant information, since the transitions out of a given accessible state may be exhibited more than once. This redundancy is eliminated in the pruned response tree for A, which is constructed as follows. Starting with the root the tree is generated level-by-level, and from left to right within a given level. Each node already generated is examined, but the branches out of a node, and its successors at the next level, are generated if and only if the label of the node has not been encountered before. A unique left-to-right ordering of the nodes at a given level is induced by putting Ar in some alphabetic order; the branches out of a

18 node are arranged left to right in ascending alphabetic order of their labels. Example 1: The pruned response tree for the finite-state machine of Example 2.1 is: q2 o0 Cp 2O 0 Xq0 X 0 The property that every accessible state is a node label is preserved even in the pruned response tree. Lemma 1 The node label q occurs in the pruned response tree for if and only if q is accessible. If q is accessible, the shortest path in the response tree from the root to a node labelled q is of length d(q).

19 PROOF: If q is not accessible, then there is no x such that q = rp(x), and therefore no node is labelled q. To complete the proof, we show by induction on d(q) that, if q is accessible, there is a node labelled q, and that the shortest path from the root to such a node is of length d(q). Basis. d(q) = O; then q = qO = rp(e). The root is labelled qo. Induction step. Suppose the result holds for d(q) < r. Consider q, with d(q) = r. Then there exists an ordered pair (q',a) such that d(q') = r-l and.(q',a) = q. By the induction hypothesis, there is a node n labelled q' at a distance r-l from y the root. Therefore, the first node labelled q' to be encountered in the construction of the pruned response tree is at a distance r-l from the root (the distance is < r-l, since it is encountered earlier than n in the construction process, and y it cannot be < r-l since, if rp(z) = q', lg(z) > r-l); let this first node be n. Then there are branches out of n in the z z pruned response tree, which therefore contains the node n za r But rp(zar) = X(rp(z),ar) = (q,a) = q, and the node n is r at a distance r from the root. Finally, the by definition of d(q), no node n at a distance < r from the root can be labelled y q. Q.E.D. Lemma 2 Let 7 have. n accessible states. Then, if q is accessible, d(q) < n-l.

20 PROOF: Let x be a word of minimum length such that rp(x) = q. Suppose x can be written as x = u v w where v f e and rp(u) = rp(uv). Then rp(uw) = k(rp(u),w) = \(rp(uv),w) = rp(uvw) = q, contradicting the minimality of x. If this contradiction is to be avoided, all the prefixes of x (including the null sequence) must give different responses, and each response must be an accessible state. Since the number of such states is n, x may have only n prefixes, and lg(x) < n - 1. Q.E.D. The pruned response tree for a finite-state machine can certainly be constructed with a finite number of operations. For, if AT has k elements, and the number of accessible states of a is n, the response tree has kn branches, and, as an easy consequence of Lemma 2, every node is at a distance < n from the root. Thus, we have an algorithm for determining the accessible states of a finite-state machine. Stating this result more precisely, we claim: Corollary 1 There is an effective procedure for determining the accessible states of a finite-state sequential machine. To prove this corollary quite rigorously, we would have to define a Turing machine to carry out this procedure, given a conventional representation of a finite-state machine as input data. We shall not be so fussy, and shall accept"effective procedure"- more intuitively defined. More care will be required, of course, when we prove the nonexistence of algorithms for certain decision problems. Some other results now follow quite easily.

21 Corollary 2 Let 7% be a finite-state sequential machine, and let beA. 0 There is an algorithm to determine whether there exists xc4 such that res(x) = b. PROOF: The algorithm is as follows: list the accessible states of 7k, and determine whether any accessible state q has 5(q) = b. Q.E.D. Corollary 3 Let 7A = (A, Q, Ao, qO,', ) and X~X' = (AT, Q', A, q$, C', 5') be finite-state machines. There is an algorithm to determine whether there exists xA such that res res) = res (x). PROOF: Let >h = u < X't, the direct product of ~y and 1t', be a finite-state machine defined as follows: 7 = (A', Q X Q', A A, (>qOq,), A, ) where f((q,q,'),a) = (x(q,a), X'(q',a)) and (qq') = (5(q), 56(q)).. The desired algorithm consists of determining the accessible states of, and checking whether there is an accessible state (qq') such that 5(q) = 5'(q'). Q.E.D.

22 Since the inaccessible states of a machine play no part in the determination of the functions rp(x) and res(x), they can be ignored for most purposes. In what follows, we shall sometimes restrict our attention to machines in which all states are accessible.

4. SEQUENTIAL CIRCUITS One of the principal reasons for studying sequential machines is their usefulness in the analysis and synthesis of synchronous sequential circuits. In this section we present a moderately realistic model of a certain class of sequential circuits, and establish a connection with sequential machines. A sequential circuit is a finite collection of interconnected elements of two types: gates and latches. A gate with I inputs and m outputs is depicted below. Each input and output is a function of a real variable t (time) x1 xY'' —'' Ym and has 3 possible values: 0, 1/2, 1. Whenever an input is 1/2, we assume that the outputs are undetermined. If we were modelling a physical circuit with voltages as inputs, the interpretation might be as follows: 1: a voltage above a given threshold V1 0: a voltage below a given threshold V0 1/2: a voltage between V0 and V1, for which the response of the circuit is uncertain. 23

24..-' 1., —bD ~~. V1 time x(t)=0; x(t(t)=1/2 x(t)=l time1 A switching function of I variables has as its domain {0,1}) (1-tuples of O's and l's), and, as its range, (0,1). An 2-input m-output gate is specified by: a) m switching functions of i variables, f (x,x, X. )x i=l,...,m. b) a response time A. The behavior of a gate is determined as follows: let (ala2...,a,) E (0,1}(2). If, for t < rT t + A, and for j=l,2,...,2, Xj(T) = aj, then, for all i, Yi(t+A) = fi(al,a2,...,a). In all other cases, the outputs are undetermined. There is a special input signal to the circuit, called 0, which is a clock signal of width e. The graph of the function 0(t) is of the form: 0E I I E I I E tl t' tt ( —-? ~~~ 3 — --

25 In other words, 1 if, for some j, t. < T <t. + 0 (T) = 0 otherwise. A latch with response time pi has e as an input, one additional input x, and an output y. If, for t. < T < t. + e, X(T) = a, where a e (0,1), then, for t. + E + 1 < T < tj+l + c, y(T) = a. In all other cases, y is undetermined. The element E is connected to F (denoted E - F) if some of the outputs of E are identified one-to-one with inputs of F. A cycle is a E ----------— >. F -- sequence of elements E1,E2,..,E such that E1 - E2... E, - E - E1. We make the following interconnection assumption: every cycle includes a latch. With this assumption, a sequential circuit takes the following form:

26 xy... x l pI Yq'LATCHES CYCLE-FREE GATE 0... NETWORK X qI z'W.... z!! *... I z1 1 r A cycle-free gate network is sometimes called a combinational circuit. With regard to its behavior, the entire cycle-free gate network may be represented as a (p+q)-input, (r+q)-output gate, with a response time A which is equal to the maximum, over all directed paths in the gate network, of the sum of response times of the elements in the path: Example 1: x1 \ X Y \Y21 /Y3 X3 \ 4 5 x 7 Wy y 3 ix xf

27 If each element has the same response time A, then all the outputs will be determined if the inputs are held at constant binary values for a period 3A. Let the switching functions associated with the "equivalent" gate be divided into an r-tuple F = (fl, f,..f ), determining Zl,...,,z and a q-tuple G = (qlq2,.,q ) determining w...,w. We shall also make the following timing assumptions: i) there is a constant [ > A such that, for all j, t. - tj > E + i + w, where p is the maximum j j-1 response time of any latch in the circuit. ii) a fixed binary input p-tuple X(j) is applied to the gate network throughout the interval [tj+i - d, tj+ + ]. iii) in the period [t, - q, tl + e], the outputs of the latches are given by a fixed binary q-tuple Y(O). It follows by induction that, for j > 1, the outputs of the latches assume fixed binary values Y(j) in the interval [t. + A, tj+l + E], and that the outputs Zl,...,zr assume fixed binary values given by Z(j) in the interval [t+ - + t + ], where Z(j) = F(X(j), Y(j)) Y(j+l) = W(j) = G(X(j), Y(j)). Perhaps a picture will clarify the timing.

28 C, I, 7 7 0 __III J tj+l,., Y(.i) x(j), Z(.), z(j) * Lw(.j) Thus, because of the interconnection and timing restrictions we have imposed, the essentials of the behavior of the circuit are reduced to difference equations, giving a digital, finitary, discrete-time, deterministic representation. In fact, the circuit can be represented by a Mealy-model sequential machine = ({0o,1}(P), 0,1}(q), {0,1}(r), Y(), G, F). The transition. and output functions are determined by the combinational circuit, and the internal state is "stored" in the outputs of the latches. If the functions F = (fl,...,f ) are independent of xl,...,xp, a Moore model representation is obtained. It is possible to show that, in a sense to be defined later, every sequential machine can be realized by a sequential switching circuit. Later, when we discuss state minimization and decomposition of sequential machines, we shall comment on the implications for sequential circuit design.

5. RIGHT CONGRUENCES AND THE EQUIRESPONSE RELATION In this section we derive necessary and sufficient conditions for a function f: A4 Q to be the function rp(x) for some sequential machine with input alphabet AI and the state set Q. In the course of our development we introduce the concept of a transition system (sequential machine without outputs), define the equiresponse relation of a transition system, and discuss the lattice of equiresponse relations (right congruences) over A. First, we review some important concepts. A relation R over a set S is a subset of S X S. The statement (a,b) E R is sometimes written aRb. A relation R over S is: reflexive if, for all aeS, aRa symmetric if aRb. bRa antisymmetric if aRb and bRa > a=b transitive if aRb and bRc > aRc. A relation which is reflexive, symmetric, and transitive is an equivalence relation. Associated with an equivalence relation - over S is a unique partition of S into equivalence classes, such that a=b if and only if a and b are in the same equivalence class. An example of an equivalence relation over the integers is congruence modulo n; the equivalence classes are the residue classes modulo n. An equivalence relation over the set of American citizens is the relation "A lives in the same state as B;" in this case, there are fifty equivalence classes (if we exclude citizens living abroad, etc.). The rank of an equivalence relation is the number of equivalence classes. An equivalence relation is of finite 29

30 rank if the number of equivalence classes is finite. A relation which is reflexive, antisymmetric, and transitive is a partial ordering relation. An example of a partial ordering relation over the subsets of a set is S1 C_ S. Let (S,') be a semigroup. An equivalence relation E over S is called a right regular equivalence relation or a right congruence if x = y — > for all w e S, xw - yw; left congruence if x - y --- for all u e S, ux uy; two-sided congruence if x - y --- for all uw e S, uxw - uyw. Equivalently, - is a two-sided congruence if and only if it is both a right congruence and a left congruence. Two-sided congruences over semigroups are closely related to the homomorphisms of semigroups. In general, a homomorphism is a mapping from one algebraic system onto (or into) another, such that certain relationships among elements are preserved; i.-e., if the relationship holds for the original elements, it holds for their images. Let (S,~) be a semigroup, and let (T,*) be a multiplicative system (T is closed under the binary operation *.). A homomorphism of S into} onto into T is a function p from S {n} T such that: onto for x c S, y: cS, cp(x.y' =:. cp(x)* p(y)'. Here, the-relationship preserved among x,y, and z is this: if x y = z, then (x)-* c(y) = cp(z).

31 Example: The function c(x) = lg(x) is a homomorphism from (A*, concatenation) onto (nonnegative integers, +). A 1-1, onto homomorphism is an isomorphism; the inverse of an isomorphism is an isomorphism. If a homomorphism exists from a semigroup (S,') onto a multiplicative system (T,*), then (T,*) is also a semigroup (i.e., * is associative): (cp(x)* cp(y))* cp(z) = p(x-y)* cp(z) = cp(x.yz); cp(x)*(cp(y) cp(z)) = cp(x)*(cp(yz))= cp(x.y z). Any two-sided congruence p over S determines a homomorphism of S. For any xeS, let [x] denote the equivalence class of x with respect to p. We define S/p, the factor (or quotient) semigroup of S modulo p as: ([x], xcS), ~), where [x] o [y] = [x.y]. To -show that the operation ~ is well defined, it is necessary to show that its result does not depend on which representations of the equivalence classes [x] and [y] are considered. Thus, suppose x1 C [x] and x2 c [x]; therefore x1 p x2. Also, Y1 e [y] and Y2 E [y]; therefore yl p Y2 Then xlY1 p xlY2, since p is a left congruence, and xlY2 p x2y2, since p is a right congruence. Therefore, by transitivity, xlY1 p x2y2. It is now evident that o is well defined, and that the function x - [x] is a homomorphism of S onto S/p.

32 Conversely, let cp be a homomorphism of (S,') onto (T,*), and let the relation be defined as follows: x - y p cp(x) = p(y). Lemma 1 The relation - is a two-sided congruence, and T is isomorphic with S/I. PROOF: First we show x y I for all u,w, uxw 2 uyw: cp op x - y =>== CP(x) = Cp(y); p(uxw) = p(u)* p(x)*) p(w) = p(u)* p(y)* C(w) = p(uyw); therefore uxw - uyw. The isomorphism from S/I onto T is f: [x] ->cp(x), where [x] is the equivalence class of x with respect to =. We must show that f([x] o [y]) = f([x])* f([y]). But f([x] 0 [y]) = f([x.y]) = p(x.y), and f([x])* f([y]) = (p(x)* cp(y) = p(x.y). Thus we have established the following result.

33 Theorem 1 Every homomorphic image of a semigroup (S,.) is isomorphic to the factor semigroup with respect to some two-sided congruence, and every factor semigroup (S/p,o) is a homomorphic image of (S,.) With this introduction to right, left, and two-sided congruences, we may return to the question of characterizing response functions (i.e., determining the possible ways in which the response function may classify the elements of AJ). The answer to this question will involve the right congruences over A,. Since the function rp(x) does not involve outputs, it will be convenient to consider transition systems (sequential machines without outputs). A transition system is a quadruple T = (A, Q, qo, x), each element of which is defined exactly as in the definition of a sequential machine. Since rp(x) ranges only over accessible states, we shall further restrict attention to transition systems with every state accessible. The equiresponse relation.- of a transition system T is an equivalence relation over A, defined as follows: x J y — > p(x) = rp(y). Thus, _ is of finite rank if and only if the number of (accessible) states of _' is finite.

34 Lemma 2 For any transition system T, the equiresponse relation is a right congruence over A. PROOF: x 1 Y == rp(x) = rp(y) - for all w e A, rp(xw) = rp(yw) ==~ for allw C A,;, x yW > for all ~ Ai,. X w |yW. Lemma 3 Let p be a right congruence over AI. Then p is the equiresponse relation of a transition system. PROOF: For x C AT, let [x] be the equivalence class associated with p containing x. Consider the transition system T(p) = (AI, ([x], x E AI,[e], A), where k([x], a) = [xa], The function k is well defined by this equation since X1 P x2 - xla p x2a; i.e. [xl] = [x2] [xla] = [x2a]. It is easy to prove that, for all x, rp(x) = [x]. Therefore, x _ y < > rp(x) = rp(y) "- > [x] = [y] 4 =, -a>x p y.

35 Theorem 2 Every equiresponse relation is a right congruence over A1, and every right congruence over A, is an equiresponse relation. Thus, we have a complete characterization of equiresponse relations, and, therefore, of the possible functions rp(x). It is interesting to compare right congruences (equiresponse relations) with regard to how finely they classify input sequences. Let R1 and R2 be right congruences over A. Then R1 < R2 x R y - X R2 y. Thus, when R1 < R2, each equivalence class for R1 is contained in an equivalence class for R2. It is easy to verify that < is reflexive, antisymmetric, and transitive. Thus, it is a partial ordering of f, the set of all right congruences over AI; equivalently, we say that (~, <) is a partially ordered set. We shall establish the stronger result that (V, <) is a lattice. Let us review the definition of a lattice. Let (S, <) be a partially ordered set, and let x and y be elements of S. The element z C S is the greatest lower bound of x and y (denoted glb(x,y)) if a) z < x and z < y, and b) if w < x and w < y, then w < z. Similarly, v is the least upper bound of x and y (denoted lub(x,y)) if

36 a) x < v and y < v, and b) if x < u and y < u, then v < u. A lattice is a partially ordered set in which any two elements have a glb and a lub. Example 1: The partially ordered set (S, <) where S is the positive integers, and x < y if x is a divisor of y, is a lattice. In this lattice, lub(x,y) = lcm(x,y), and glb(x,y) = gcd(x,y). Example 2: The partially ordered set (S, <), where S = (a,b,c,d), and the relation < is ((a,c), (a,d), (b,c), (b,d)}, is not a lattice. Theorem 3 The partially ordered set (X/, <) is a lattice. PROOF: If R1 E f and R E /, we shall explicitly define P = glb(R1,R2) and Q = lub(R1,R2), and prove that P and Q have the required properties. We define P as follows: x P y < - x R1 y and x R2 y It is evident that P is an equivalence relation. Also, P is a right congruence:

37 x P y x R1 y and x R2 y V w,'xw R1- yw and xw R2 yw =- *-V w, xw P yw. Also, it is clear that x P y - -x R1 y and x P y > x R2 y, so that P < R and P < R2. Now, suppose R < R1 and R < R2. Then x R y' - x R1 y and x R y x R2 y; thus x R y > x P y. This completes the proof that P = glb(R,R). It is tempting to define Q lub(R1R2) by x Q y = x R1 y or x R2 y < i x(R1 U R2)y. Unfortunately, this is not a transitive relation. For example, take x R1 y if lg(x) - lg(y) mod 2 and x R2 y if lg(x) lg(y) mod 3 2 2 5 Then, if a e AT, e R1 a and a R a, but it is neither true that e R1 a nor that e R2 a5 n (Here,: a denotes the concatenation of n a's..) Instead, we must define Q as the transitive closure of the relation R1 U R2: x Q y < 3 z2,,... zr such that x=zl, Y=Zr, and zl(R1 U R2)zi+l for i=1,2,...,r-1. ZlZ2..,z.ir Then Q is an equivalence relation; to check transitivity,

38 for example, note that, if x is chain connected to y by the sequence Zl z2',..'zr and y to z by the sequence wl,w2,...,ws, then x is chain connected to z by the sequence Zl,z2,...,zrj. w2,..,w. lAlso, Q is a right congruence; for, if x is chain connected to y by z!,z2,...,Zr, then xw is chain connected to yw by zlw, z2w,..., rw. It is further clear that Q is an upper bound for R1 and R2: x R Y. x Q y.:;. x R2 y - x Q y. Finally, suppose R is an upper bound for both R1 and R2; x R x R y and x R2 y _ x R y. Suppose x Q y. Then x and y are chain connected by zl,...,Zr, where, for each i, z. R1 z i 1 i+l or Zi R2 Zi+l; thus z. R zi. Since R is transitive, R Zr; i.e., x R y. Therefore, x Q y; > x R y, and Q < R. This completes the proof of Theorem 3. It may be helpful to discuss the equivalence classes of the right congruences P = glb(R1,R2) and Q = lub(R1,R2). The equivalence classes of P are simply those nonempty sets obtained as the intersection of an equivalence class of R1 with an equivalence class of RH. The determination of the equivalence classes of Q can be illustrated

39 by an example. Suppose R1 and R2 are both of finite rank. Let R have equivalence classes E1,E2,E3,E4, and R2, equivalence classes F1,F2,F3,F4,F5. These classes are indicated below, with lines drawn between the classes which are assumed to have an element in common. Then E1 U E2 = F1 U F2 U F3, and E3 U E4 = F4 U F5. The equivalence classes of Q are E1 U E2 and E3 U E4. The lattice ( <, <) has the universal lower bound 0 and the universal upper bound I, where: Yx Vy, xOy, and xIy~=4 x=y. The sets of left congruences and two-sided congruences over A are lattices, with the same ordering relation and definitions of glb and lub as in (i, <). Each of these lattices is a sublattice of the lattice of equivalence relations over A,.

6. THE LATTICE OF STRUCTURE TYPES OF TRANSITION SYSTEMS It is natural to introduce an ordering -4 of transition systems with input alphabet A, induced by the ordering of their equiresponse relations: T T < --- I' L This is a quasi-ordering (reflexive and transitive), but not a partial ordering (not antisymmetric), since distinct transition systems may have the same equiresponse relation. If we say that two transition systems are of the same structure type if their equiresponse relations are the same, then the structure types form a lattice isomorphic with (~, <). In this section we develop the lattice ordering of structure types of transition systems somewhat more indirectly than in the above discussion, and thereby introduce the important concepts'homomorphism of a transition system,' and'equivalence relation with the substitution property.' Let P = (AI, Q' qO' ) and T = (AI, Q qO be transition systems for which all states are accessible. A homomorphism from P onto T is a function h: Q - Q such that a) h(qo)= =o b) V q c Q, V a C AI, h(\(q,a)) = \(h(q),a). Thus, if \(q,a) = q', \(h(q),a) = h(q'). A one-to-one homomorphism is an isomorphism; the inverse of an isomorphism is an isomorphism. 40

41 There is an immediate application of the concept of a homomorphism to the "series connection" of transition systems. In what follows we use the terms "series connection" and "realization" heuristically without defining them. Let h: Q -, Q be a homomorphism of T onto, let R be a set, and let f be a one-to-one function from Q onto S, where S C Q X R, such that, for all q, f(q) = (h(q k(q), k(). Since f is one-to-one and onto, it has an inverse f-1 Each element of Q thus has a "representation" in S as an ordered pair (h(q), k(q)), and, corresponding to the function k: Q X x - Q, there is a function \i which operates on the representations:': S X A - S. It is required that P(f(q),a) = f(x(q,a)). Thus, if (q,r) e S, K((q,r),a) = F k(f (q,r),a) = (h k(f (q,r),a), k k(f-(q,r),a). Since h is a homomorphism, h %(f-l(q,r),a) = k(h f- (,r),a). But, if f (,r) = q, then h(q*) = q; thus, h f (q,r) = q, and the first component of.(q,r),a) is \(q,a), which is independent of r, and is given by the transition function of rT. The second component, k k(f (q,r),a) will, in general, be dependent on q, r, and a. Let us

42 write this function in the form(cp(q,a),r); if (q,r) $ S, c may be assigned an arbitrary value in R. Then cp can be taken as the transition function of a transition system X = (QX AX, R. k(qO), C Strictly speaking, U is a transition system only if Q if finite; otherwise, u has an infinite number of input symbols. The function j is the transition function of the transition system which is the series connection of T and U: Thus, the homomorphism h has yielded a series realization of r T. T has been decomposed into "simpler" transition systems; note that, if T is finite-state, and h is neither an isomorphism nor the constant function h(q) = qW, then R can be chosen such that Q and R each have fewer elements than Q. Just as semigroup homomorphisms are determined by two-sided congruences, the homomorphisms of a transition system T are determined by the equivalence relations with the substitution property over Q. An equivalence relation p over Q has the substitution property if ql P q2 -= for all a e A,, k(ql,a)p \(q2,a). Associated with p, an equivalence relation with S.Po, is a homomorphism of T onto the factor transition system modulo p, T/p. For q E Q, let [q] denote the equivalence class of q with respect to p. Then

43 v /p = (, ([q], q e Q), [qO], \), where k([q],a) = [x(q,a)]. We omit the easy verification that x is well defined, and that the mapping q - [q] is a homomorphism of T onto T/p. Going in the reverse direction, let h be a homomorphism of T7 onto'. Then the equivalence relation - defined by: ql q2 < h(ql) = h(q2) has S.P., and T is isomorphic with T/-. The verification of this is analogous to the corresponding step in our discussion of semigroup homomorphisms. The following theorem summarizes the discussion so far. Theorem 1 Every homomorphic image of a transition sytem T is isomorphic with T/p, for some equivalence relation with S.P. p, and every factor transition system Tp/p is a homomorphic image of T. Let T and 77 be transition systems with input alphabet A and with all states accessible. Let their equiresponse relations be, respectively, and I. We shall show that there is a homomorphism from 7 onto T if and only if < 1, and that such a homomorphism, if it exists, is unique.

44 Lemma 1 _- ~~~~* Let h be a homomorphism from T onto T. Then, for all x AI, rp (x) = h(rp (x)). PROOF: The proof is by induction on lg(x). Basis. If lg(x) = 0O then x = e, rp (e) =qo and h(rp r (e)) = h(qo) = q Induction step. Assuming that the statement of the lemma holds when lg(x) = r-1, consider y = xa, where a e A, and lg(y) = r. Then rp (xa) = \(rp (x),a) \(h(rp (x)),a) h((rp (x),a)) h(rp (xa))o Let the relations -> and <-> be defined by 77 -, T < >there is a homomorphism from 7F onto 7T7, and 7 t 7 if and' only if there is an isomorphism from T onto. The following corollaries of Lemma 1 are immediate. Corollary 1 x y $ > rp (x) p rp (y) mr/p

45 Corollary 2 If T -e T, then < - Corollary 3 If T - T, then = 1. -T T Corollary 4 If T - T,there is exactly one homomorphism from T to T PROOF: Since all states are accessible, h is completely determined by the equation rp (x) = h(rp (x)). T P We next establish the converse of Lemma 1. Lemma 2 If _ l _, then T - T. T T PROOF: Let the function h from Q onto Q be defined as follows: h(q) = q <-> rp (x) = q - = rpl (x) = q. Since rp (x) = rp(yx)r = rp (y), and since every state q is accessible, the given definition determines h completely. To verify that h is a homomorphism, we note that: a) rpT (e) = q and rp~(e) = qo; therefore, h(q0) = qo'

46 b) Suppose rp (x) = q, so that rp-(x) = h(q). Then rp (xa) = k(q,a), and rp, (xa) = \(rp.(x),a) \(h(q),a). Therefore h(\(q,a)) = \(h(q),a). Summarizing, we have: Theorem 2 T -r r ^1 < 1 r - T Corollary 5 1r Example 1: Let us apply our results so far by constructing a homomorphic image of a finite-state system T' with A = [a,b}, Q = (1,2,3,4,5}, q0 = 1 and X specified by the following table: 1 14 %(q,a) a b 2 1 5 Q 3 5 2 4 5 l 5 2 3 The pruned response tree for T' is:

47 1 3 5 2 a b/ a\ b 2 2 b a b 1 4 a b 1 The equivalence relation p with equivalence classes (1,2) = A, 3(,4) = B, and (5) = C has the substitution property. The response function for 77/p is given by rp (x) = [rp (x) ]. T/P Thus Q = (A,B,C). The pruned response tree for T/p is:

48 A B aa b q fq/ 1 A, A series decomposition of T based on the homomorphism T -4 g/ can easily be constructed. Let R = {<^PL and let f: Q - S C Q X R be as follows'. 1 Aa 2 A,P 5 B,Oa 4 B,B 5 C,, Then p is given by the following table: a b A, P A, Bl, Ba Ca Ap B,P C,a A,c C,a A,p B,a The transition system T7 may be realized by the series connection of 1T/p and 2, where U = (Q X A-, R, C, cp), and cp has the following table, where - indicates that the entry is arbitrary.

49.,a A,b B,a B,b C, a B.b. Cj,b eT... Corollary 5 establishes that the relation an equivalence relaCorollary 5 establishes that the relation k —> is an equivalence relation over the set of transition systems with input alphabet A, for which all states are accessible. Let the equivalence classes associated with this relation be called the structure types of transition systems. Let the structure type of T7 be denoted < 7T>, and let the equiresponse relation common to all the elements of < T > be denoted. The rela<7 > tion - between structure types is defined in the natural way: <T > -< T > if and only if, for all X E < r > and e < T >, Z -.. Then the following is an easy corollary of Theorem 2. Corollary 6 < - > < <? > if and only if _ < I_ < r> < T> The relation - is a partial ordering relation over the set of structure types. To discuss the relationship between the system ((< 7 >),-) and (4',<), we need some further general concepts. The partially ordered systems (s, <) and (T, C) are isomorphic if there is a one-to-one function cp from S onto T such that: x < y < cp(x) C cp(y). If (S, <) and (T, C) are isomorphic, then one is a lattice if and only if

50 the other is. If (S, <) and (T, C) are isomorphic lattices, then cp(glb(x,y)) = glb(c(x), c(y)), and cp(lub y lub( p(x),cp(y)). Theorem 3 The partially ordered systems ({<T>]}, -) and (i, <) are isomorphic lattices. PROOF: The structure type <T> e (<T>} corresponds to the right congruence e.. Since every right congruence f is the equiresponse <I >.. relation of some transition system 7, every right congruence is L_ for some structure type <7 >. It is immediate that the condi<?> tions for isomorphism are satisfied. To complete our discussion of the lattice ((<7r>], -4), we shall explicitly determine the greatest lower bound and least upper bound of two arbitrary structure types <'7P> and </f>, and shall specify universal upper and lower bounds. Let us keep in mind that glb(<T>, <X7>) is the structure type of the "least complex" transition system that has both / and'7 as homomorphic images, and that lub(<7>, <T'>) is the structure type of the "most complex" transition system that is a homomorphic image of both T and T Let us recall that 77 77 the direct product of the transition systems T and 7, is defined as follows: ~~77 <X ^77 = (IA, Q X Q, (q0, q), h),

51 where x((q,q),a) (x(q,a), x(qa)). Since T X r' may have inaccessible states, we define ac( T X 7 ) (the accessible part of'7 X > 7 ) as the transition system (AI, R, (qO'o), I'), where R is the set of accessible states of T X 7, and A' is X restricted to the domain R X A>. Lemma 3 For any two structure types < r > and < 7'>, gib (< C >, < T > ) = <ac ( T7 77 ) >* PROOF: Since all states of ac( 7 X 7T) are accessible, the structure type < ac( 77 X T):> is well defined. Because of the isomorphism between ((<T >}, -) and (i, <), it is sufficient to show that glb(l, 1 ) = <r> <T> < ac(7 x r7)> But glb(L, b ) = glb(l, 1-) < T> <7> and = 1 < ac(t X ) > ac( rxr=). /7 X 77 But, clearly, rp (x) = (rp (x), rp (x)), so that

52 x - y <- > x 1 y and x i_ y. 7r X7 7' T Thus _L = glb(l 1 ). The definition of the least upper bound of two structure types is somewhat more complex than the definition of their greatest lower bound. Let p be the following equivalence relation over R, the set of states of ac(Tx>< ) (p,p)p(qq) if and only if there exists a sequence (qlql) "'..,(<qqr,) such that: i) (ql'ql) = (P,P) ii) (qr1qr) = (q,q) iii) for i=1,2,...,r, (qqi) E R, and iv) for i=l,2,...,r-l, qi = q+l or qi =qi+ If these conditions are met, then (p,p) is said to be chain connected to (q,) by the sequence (qlql),..(qrqr) The equivalence relation p has SoPo; for, if (p,p) is chain connected to (q,q) by (ql,ql),.., (qr,), then X'((p,p),a) is chain connected to's((q,q),a) by the sequence X'((lql'8),a ),'~K(( qrqr),a)o Lemma 4 For any two structure types <T> and <T> lub(<'T>, <ir>) = < ac( f')/p >

53 PROOF: Since every state of ac(T X') is accessible, every state of its homomorphic image ac('T/X><)/p is accessible, therefore, the structure type < ac(T X> )/P > is well defined. We have to show that 1,^ = lub(l, 1 ) < ac(T X>)/p > < T> <T> But =. I < ac(T x -)/P > ac( X )/P and, by Corollary 1, xl ac( X T)/P if and only if rp (x) p rp (y), ac( X 7') ac (r X ) and this is equivalent to rp (x) being chain connected to rp (y) Tx (x)>< by a sequence (q,ql),(q2,q2),,(qrq) meeting the conditions stated above. On the other hand, lub(i, L ) - lub(_L, ), <7t> <T>,' and x y (lub(<T>, <->) if and only if x is chain connected to y by a sequence zl,z2...zr meeting the conditions stated in Theorem 5.3. But the two types of chain connectedness are indeed equivalent. For if rp x is chain (T X )>< connected to rp y by (qlq) (ql2q2)''' (qrq r), then x is chain (Tx ) ~'2 connected to y by Zlz2,...,z, where zl=x, z =y, and, for i=2,3,...,r-1, z. is any element of AI such that 1

54 rp (z) = (q i) 7x 1 1 1 such a z. is guaranteed to exist, since (qi,qi) is an accessible state of PT X. Going the other way, we observe that, if x is chain connected to y by the sequence z-,z2...,zr then rp _(x) is chain connected to rp (y) by the sequence ~ xT rp (z ), rp (z2),.., rp (Zr) T x T T T T xT Example 2: Let t and T be transition systems, where A, = {0,1}, Q = {a,b,c}, Q = {A,B,C}. qo=a, qo=A, and X and \ are specified as follows: 0 1 0 1 a b c A A C b a c B B A c l b | C B A The response tree for ac( ) is as follows The response tree for ac(Tx><) is as follows:

55 (c,B) (b,A) (a A) (c C) d(B) (b,A) 1 0 1 (\A) (c C) 01 (a,A) Thus, every transition system in glb(<T>, <' >) has four states, and is isomorphic to ac(f XT). There is a homomorphism h from ac(r7 X><) onto T, where h(a,A) = a, h(b,A) -.. b, and h(c,B) = h(c,C) = c. There is a homomorphism h from ac(TX T) onto, where h(a,A) = h(b,A) = A, h(c,B) = B, and h(c,C) = C. The equivalence relation p has the equivalence classes a = ((a,A), (b,A)) and = ((c,B), (c,C)}. The pruned transition tree for ac( X T)/p is: 0 I \ 7

56 The transition system ac(/7? >X )/p is the "most complex" transition system which is a homomorphic image of both T and. There is a universal lower bound <F > in the lattice ([{<>}, - ). The transition system is the "free" transition system = (A, A, e, cp), where cp(x,a) = xa. The equivalence relations with S.P. for / are precisely the right congruences over A, and any transition system 77 is isomorphic to F/I. There is also a universal upper bound < Y>, where / is a transition system with only one state.

7. SEQUENTIAL MACHINES WITH INPUT ALPHABET AI Building on the results of the previous section, we shall next develop the lattice of types of sequential machines with input alphabet AI. We shall also derive and apply a necessary and sufficient condition for a function to be representable by some finite-state sequential machine, and shall show that any function with domain AI has a unique "least complex" machine representing it. We shall also provide an algorithm for determining the machine with a minimum number of states which represents the same function as a given finite-state machine. Our development in this chapter as well as Chapters 2, 3, 5 and 6, is largely derived from [ 8 ], and from the exposition of [10] given in [8 ] We have already seen an important equivalence relation over A, which may be associated with a sequential machine V~ = (A,' Q, Ao' qo, ), namely the equiresponse relation 1, determined as follows: x 1 Y = rp (x) - rp.(y). It is natural to introduce a second equivalence relation, the equiresult relation A, as follows: x Ap y < =- res(x) r es) res(y). As we have shown, _ is a right congruence. Also, I < A; i.e. x y = x Ay. This is evident from the definition of the function res (x) as 5(rp (x)); clearly, rp (x) = rp (y) = res (x) = res (y). 57

58 Let R and S be equivalence relations over A1 such that R is a right congruence and R < S. Then there is a natural way to construct a machine such that R = E_ and S = A e For x E Ai, let [x] denote the equivalence class associated with R to which x belongs, and let < x > denote the equivalence class of x in S. Then (AI, [[x], x a A}, (< x >, x e A1}, [e],', 5'), where X'([x],a) = [xa], and 51([x]) = < x >. We omit the simple proofs that \' and b' are well defined, and that 7 has the stipulated properties. Summing up, we have the following theorem. Theorem 1 Let (R,S) be an ordered pair of equivalence relations over A. Then there exists a sequential machine /7 such that R = 1 and S = A-7 if and only if R is a right congruence and R < S. Corollary 1 There exists a finite-state machine' with R = 1 7 and S = A- if and only if R is a right congruence of finite rank, and R < So Corollary 2 Let f be a function from A to Ao. Then f is representable by a finite-state sequential machine if and only if the equivalence relation -f given by x f y < —> f(x) = f(y) has, as a refinement, a right congruence of finite rank (R is a refinement of S if R < S).

59 Later, we shall illustrate the usefulness of Corollary 2. A lattice ordering on the ordered pairs (R,S) satisfying the conditions of Theorem 1 can be introduced in a natural way, as follows: (R,S) < (R',S') <- R < R' and S < S' Let this lattice be denoted (I, <). We could then say that the machines kt and1I are of the same type if (It A AD) = (1 As), and a lattice ordering on machine types would be induced. Following the strategy of the previous section, we prefer to develop the lattice of machine types from the point of view of homomorphisms. Let / =(= (AI, Q, A,o qo', 6) and 7 = (A,(,, A%, qo,,). We assume throughout the following discussion that all states are accessible, and that the functions 5: Q -> A and 5: Q - A are both onto. A homomorphism from a onto X is a pair of functions, onto onto h: Q —--- Q and cp: A --- A 0 0 such that i) h(qO) = 0 ii) h(X(q,a)) = \(h(q),a) iii) S(h(q)) = cp(b(q)). Conditions (i) and (ii) are, of course, familiar from our discussion of transition systems. Condition (iii) states that the output of the homomorphic image of q is the homomorphic image of the output of q. Applying

60o Lemma 6.1, we have the result that, for all x E A-, rp.n(x) = h(rp7(x)). Now, res (x) = (rp (x) = 5(h(rp (x)) = p(5(rp (x)) = p(resW(x)). Thus, for all x e A, res^(x) = cp(res7(x)) (2) Let 8( -~ P denote that there exists a homomorphism from X onto. Then the following lemma is a consequence of equations (1) and (2). Lemma 1 ISf E- - At, then (1_ Ag) < (, AJ). The converse of Lemma 1 is quite easy. Lemma 2 If (, A ) < (I, ), then ~ -X 7. PROOF: The required function h and p are constructed as follows. If there exists xc e such that rp (x) = q and rp^(x) = q, then h(q) = qo If h(q) = q, then Cp(q) = (q) We omit the simple proofs that h and cp are well defined functions, and that they determine a homomorphism from'* onto 7. Q.E.D.

61 Combining Lemmas 1 and 2, we have the following result. Theorem 2 Let 7 and 7 be such that all states are accessible, and the functions 5: Q -, A and 5: Q - A, are onto. Then 0 o A strong homomorphism from'% onto ~7 is a homomorphism such that the function cp is one-to-one. An isomorphism from-'1 onto 1At is a homomorphism such that the functions h and cp are both one-to-one. Let the expression' -+ k> denote the existence of a strong homomorphism from?H onto ~, and let the existence of an isomorphism be denoted by < -.e t. It is evident that t.- 7if if and only if A He o. The methods used to establish Theorem 2 yield the following corollaries. Corollaryl 3 if ad if and only if < and = Corollary 4 ~,-* 7? if and only if (, A ) (J A.t). Two machines 7t and 79C may represent the same function, and therefore have the same equiresult relation, even though their equiresponse relations are incomparable, so that neither is a homomorphic image of the other. The following two machines provide such an example.

62 1 1 -7^ 2 The functions res,(x) and res.-(x) are identical; each function is equal to 1 if and only if the number of l's in the sequence x is odd. It is also true, of course, that two machines may have the same equiresponse relation and have incomparable equiresult relations. Finally, it is possible to have 1 < - and A <J I The relation --- is an equivalence relation. Let us call the equivalence classes associated with this relation machine types, and let the machine type of 17 be denoted < 71>. Following the approach used in dealing with transition systems, we define a partial ordering -- on machine types as follows: <> -> < 7 > if and only if, for all t? e < 7> and -) < -v>, f -- 7W. Then the partially ordered system (([< >], - ) forms a lattice isomorphic with the lattice (t',<). The machine type <7W> corresponds to the ordered pair (I,A7 ). The greatest lower bound and least upper bound of two machine types are easily obtained by analogy with the case of transition systems. Let t = (AI, Q, Ao, q0 \, b), and let 7Y4 = (AI, Q, A, q, ).

65 Then 9 X 7' = (A, Q X Q, Ao X A, (qO, ),, 5') where x'((q,q),a) = ((q,a), 7(qa)), and s'(q,q) = ((q), s(q)). Let ac(72 X >) be the machine obtained by retaining only the accessible states of? ><X 7?, and retaining only those elements of A X A O O which are of the form (b(q), b(q)), where (q,q) is accessible. Lemma 3 Let 7$ and 7f be sequential machines belonging to the machine types < V > and < >. Then glb(<7'>, < i>) = <ac(; X < ) >. The construction of lub(< o>, < W>) is a direct extension of the corresponding construction for transition systems. The equivalence relation with S.P. p over the states of ac(?7X ) is defined exactly as it was defined in the last section: there exists a sequence (p,p) p (q,q) - (ql),.,( ) of states of ae(' X >'.) such that (P,p) = (ql'l),' (qq) = (q)r) and, for i=1,2,...,r-l, qi qi+l or qi = i+l' A second equivalence relation p has the following definition:

64 there exists a sequence (p,p) 1 (q,q.) <-=-> ~ (ltl^)' ^'(grv9) ~of states of ac( X>A) such that (p, ) (Pp) = (qlcl), (qq). = (= r qr)) and, for i=1,2,,,r-1l (ql) = (q+l) or = ~or^~ &5(Rqi) 6(qi+1)~ It is evident that p is a refinement of oi. Let the equivalence class of (q,q) in p be denoted [q,q], and let its equivalence class in p be < q,q >. Let the machine I be defined as follows: ~ -= (I, [[qq]}, (< q,q >}, [%o9O]9 An q) where %([qq],a) = e[(qq),al and %([1qE]): < ]q,1 >o The functions X and 6 are well defined by virtue of the fact that p is an equivalence relation with S.Po and a refinement of ~po Lemma 4 Let i and;? be sequential machines belonging to the machine types <7f > and < 7t >, and let - be derived from'e and f by the construction given above. Then lub(<) > < <7;>) = <-7> We omit the proof of Lemma 4.

65 The lattice ((<I>), -,) has a universal lower bound, namely the "free" machine type of which the following machine r is a representative: F = (AA, A,, e,., 6), where x(x,a) = xa, and S(x) = x. There is also a universal upper bound, the machine type whose elements are all the one state machines. Let A be an equivalence relation over A. Then the set of all elements (R,S) E (7/,<) such that S = A forms a sublattice of (_<): glb((R,A), (R',A)) = (glb(R,R'),A), and lub((R,A), (R',A)) = (lub(R,R'),A). We shall show that this sublattice has a universal upper bound. The element in the lattice ((< k>), -> ) corresponding to this universal upper bound is then the least upper bound of all machine types having the equiresult relation A. Let the equivalence relation RA over A, be defined as follows: x RA y <=== for all w E A, xw A yw. Then RA is a refinement of A, and RA is a right congruence: xzw A*z xzRy. x RA y — Y z E A, Y w E A, xzw A yzw = xz RA yz.

66 Moreover, if R is any right congruence which is a refinement of A, then R is a refinement of RA: x R y > for all w e A, xw R yw = for all w C A, xw A yw = x RA y Therefore RZ is the least upper bound of all right congruences which refine A, and (RA,A) is the universal upper bound of the sublattice of (/,<) consisting of all the ordered pairs (R, A). The right congruence RA is called the right congruence induced by A. The machine type < Jh> corresponding to (RAA) in the lattice ((<7? >), - ) is a strong homomorphic image of every machine type having the equiresult relation A. Although the following result is a direct consequence of Corollary 1, we designate it as a theorem because of its importance. Theorem 3 The equivalence relation A over Al is the equiresult relation of a finite-state sequential machine if and only if RA is of finite rank. Let f be a function from A, to a set A o Then any sequential machine realizing f has the equiresult relation -f: x Yf y f(x) = f(y). Then R_, the right congruence induced by -f has the following definition: -f x R_ y < ==> for all w, f(xw) = f(yw). -f

67 For any element x c Ai, let the function fx: AT Ao' be defined by the equation fx(w) = f(xw). Then x R_ y if and only if the functions f (w) and f (w) are "~~~f ~x y identical. Theorem 4 A function f from AT to A0 is representable by a finitestate machine if and only if the number of distinct functions f (w) as x ranges over AT is finite. PROOF: If f is representable by a finite-state machine then R_ is of f finite rank, which means that the number of distinct functions f is finite. Conversely, if the number of distinct functions f is finite, x then R_ is of finite rank. Let the equivalence class of x in R_ be -f -f denoted [x]:. Then the following finite-state machine realizes f:'i = (AI, ([x], x E A)}, Ao, [e], X, 5), where x([x],a) = [xa] and f(x). Q.E.D. Let us make some applications of Theorem 4. First, we shall prove that, in a certain sense, a finite-state sequential machine cannot

68 recognize perfect squares. Let f: ([a - (0,1 be defined as follows: 1 if n = s2 for some nonnegative integer s f(an) 0 otherwise. Let k and I be integers, with k < oI Then f (a f(a (k)) - i, k a and f (a2k+) 0. a Therefore the function f 2 k=0,l,2,... k a are all distinct, and, by Theorem 4, f is not representable by a finitestate machine. As another example, we shall show that the set of palindromes over an alphabet with more than one letter cannot be recognized by a finite-state machine. For x c Ai_, let x denote the reversal of x: e = e, and, if x=ala2. an la n x = a a - o a2al. n n-l1 21 Then x is a palindrome if and only if x = x. If punctuation and spacing are ignored, the following is a palindrome over the Roman alphabet:

69 A MAN A PLAN A CANAL - PANAMA! Let f: A,- ( 0,1) be defined as follows: 1 if x = f(x) = 0 otherwise. If AI has only the single element a, then every element of AI is a palindrome, and the one-state machine with output 1 recognizes f. Suppose, however, that AI has two distinct elements a and b. Let k and I be nonnegative integers, with k < a. Then f (a) = 1, akb and f k(ak) 0. a b Therefore there is an infinite number of distinct functions f, a\ k=0,1,2,..., and f is not representable by a finite-state machine. Before proceeding to the next application, it will be convenient to derive a variant of Theorem 3. Let A be an equivalence relation over AI. Then TA, the (two-sided) congruence induced by A, is defined as follows: X TA y - >V v V w, vxw a vyw. Clearly, TA is a refinement of A. Also, Ta is a two-sided congruence: x TA y - V v wV v' V w, v'v xw w' A v'vyw w' V v V w, vxw TA vyw.

70 Finally, if T is a two-sided congruence which is a refinement of A, x T y V v V w, vxw T vyw = V v V w, vxw A vyw = x T y. Theorem 5 The equivalence relation A over A is the equiresult relation of a finite-state sequential machine if and only if TA is of finite rank. PROOF: Since TA is a congruence, it is a right congruence, and T <_ RH. If Ti is of finite rank, then so is RH, and, therefore, A is the equiresult relation of a finite-state machine. Going in the other direction, suppose A is the equiresult relation of a finitestate machine' -with n states. For the rest of this chapter, let * the equivalence relation - over AI be defined with reference to the machine W as follows: x y _- > Y q, %(q,x) = x(q,y). With x fixed, the function xl(q) = x(q,x) is a transformation from Q into Q. Since Q has n elements, only n such transformations exist; therefore, is of finite rank. But - is a two-sided congruence: x y =- V q, x(q,x) = (q,y) > V q V v \(q,vx) = x(x(q,v),x) = x(x(q,v),y) = x(q,vy)

71 -== V q V v V w, c(q,uxw) =. (x(q,ux),w) = X(x(q,uy),w) = (q.,uyw). Also, ~ is a refinement of A. Therefore, < T, and, since is of finite rank, so is TA. Corollary 5 A function f from A- to A is representable by a finitestate machine if and only if the congruence relation T_ ~f is of finite rank. * N Let f be a function from AI to Ao, and let the function f be defined as follows: f(x) = f(x). Using Corollary 5 we shall prove that, if f is representable by a finitestate machine, so is f. If f is representable, then T_ is of finite f rank. But x T_ y - V v V w, vxw =_vyw. "f f - AeV w V v, wxV. F/ wyv,. - x T y. Therefore, T is of finite rank, and, by Corollary 5, 5 is representable -f by a finite-state machine. Having shown how results associated with the induced right congruence Ri and the induced congruence Ti are used in determining whether certain

72 functions are representable by finite-state machines, we shall next apply the same ideas to a problem of machine minimization: given a machine, find a "least complex" machine having the same equiresult relation as ~7 More precisely, the problem is this: given IX, find a representative U of the machine type < 7t >.such that A = AL and = R. > A <7> < -,> A71V From our knowledge of the properties of induced right congruences and of the isomorphism between the lattices (',<) and ((<7 >1}, - ), we know the following facts. 1) A machine X with the desired properties exists. 2) There is a strong homomorphism from (/ onto co/, 3) If there is a strong homomorphism from-I onto a machine -,. then 7 and m7 are isomorphic. I4)'77 is a finite-state machine if and only if RA is of -7+ finite rank, and, if A/ is finite-state, it has as many states as R, has equivalence classes. 5) If y is finite-state then, up to isomorphism, ~7 is the unique minimum-state machine having the equiresult relation A 6) If f is a function such that = is A, then7t is isomorphic f 7 to the minimum-state machine realizing f. Our next endeavor is to determine the strong homomorphism from -7l onto f. In the case where 7 = (AI, Q, A, q0',., 5) is finite-state, we will then be able to give an algorithm for the construction of %.

73 Let an equivalence relation over Q be defined as follows: the states ql and q2 are indistinguishable (denoted ql q2) if, for all w e AI 5(%(q.,w)) = 5(X(q2')); q and q are distinguishable if they are not indistinguishable. Lemma 5 Indistinguishability is an equivalence relation with S. P. PROOF: It is evident that is an equivalence relation. Also, q1 q2 == a E A,, V w e AI, k(ql,aw) = (q2,aw) =- V w E A, \(x(qla),w) 2= )((q2,a),w) j h(qla) -a (q 2,a). Therefore - has S.P. Lemma 6 Let x and y be elements of AI. Then x Ra y > rp(x) = rp1(y). PROOF: x RA y ===> V w, res,(xw) = res,,Xyw) <===> V w, ((rp_(x),w): 8(k(rp (y),w) < — rp,(x) -- rp_(y),

74 For q ~ Q. let [q] denote the equivalence class of q in the relation =. We shall prove that the desired machine 7 is the image of the strong homomorphism in which h maps q into [q], and cp is the identity function. Theorem 6 Let = (AI, ([q], q E Q}, Ao, [qo], k, 5), where k([q],a) = [x(qa)], and 5([q]) = 5(q) Then (I n A,) (R ) PROOF: The function \ is well defined, since - has S. P., and 6 is well defined, since ql = q2 = (ql) = 6(q2). Also, it is immediate that /i > A; the functions associated with the strong homomorphism are h, mapping q onto [q], and cp, the identity function. Since the homomorphism is strong, A 7 = A It remains only to prove that | RA 3 By Lemma 6, x R. y r >rp (x) rp (y) 4== [rp (x)] = [rp (y)j < >=rp (x) = rp (y) A sequential machine 7t is said to be reduced if _- R A or, equivalently, if any two states of t' are distinguishable. As an application of the foregoing development, we shall give an algorithm for

75 explicitly determining the equivalence relation =, and thereby constructing 1&, the reduced form of Y7, when ~ is finite-state. This algorithm is useful in the design of sequential circuits, since the number of latches required in a circuit "realizing" an n-state machine is q, where 2-1 < n < 2q; q is therefore a monotonic function of n. The states ql and q2 are k-indistinguishable (denoted ql q2) if, for all w e A, such that lg(w) < k, F( (q2, )) = 5(x(q2.w)). The equivalence relation does not necessarily have S.P. Lemma 7 For ql c Q and q2 c Q i) q1 o q2 == l) -= (q2) ii) for k > O, ql k q2 < > ql k q2 and, for all a AI,;(ql,,a) - x(q2,a). PROOF: q1 0 q2 == 5( &((ql,e)) = 5(%(q2,e)) < 6 (q1) = 5(q2); q1 k-l (q if and only if the following two conditions hold: a) for all w 3 lg(w) <_k, 5(%(ql,w)) = 6(x(q2,w); b) for all w 3 lg(w) < k, for all a C A;, 5(X(ql,aw) = 5(x(q2,aw)). Condition (a) is equivalent to ql k q2. Condition (b) is equivalent to: for all a c AT, and for all w such that Ig(w) < k,

76 s(\(\(q~a),w)) = s5(%((q,a),w)); i.e., for all a c A1, (qla) E (q^,a). Corollary 6 If, for some k > 0, the relations and l are equal, then each of them is equal to the relation Corollary 7 If 2 has n states and p output symbols, then the relations and - are equal. n-p PROOF: As a consequence of the definitions of indistinguishability and k-indistinguishability, - < < <, < <= -n-l -n-2 -1 - 0 If, for some k < n-p, = is equal to then is equal to, and both are equal to - Otherwise, for k=0,l,..,n-p-l,, strictly refines n -p and therefore has at least one more equivalence class than ko Therefore, since - has p equivalence classes, p has at least p + (n-p) = n 0n-p equivalence classes. Then n is the identity relation, and, since the n-p identity relation has no refinement except itself, npis equal to Q.:E D. Corollary 7 has the following interpretation: if the states q and q2 are distinguishable, then there is an "experiment" w c A, of length n-p or less that distinguishes them: i.eo, 65((q1,w)) / 6('(qow)).

77 The algorithm for determining = is based on Lemma 7. The equivalence relations -, =,... are determined successively until, for some k, 0 1 is equal to k-l' By Corollary 7, this procedure must terminate. Example Consider the machine 7T specified by the following two tables... l la |b q 8(q o0* 5 1 0 f 1 4 3 1 l Q 2 2 5 2 2 3.3 3 P 4 1 2 4 & 5 45 5 k 5 The computation is summarized in the following table. Equivalence relation Equivalence classes =0 (0,5), (1,2,3,4} =1 (0,5), (1,4, (2,53 -2 o(0,5}, (1,43, (2,53 The relation- has three equivalence classes: A = (0,5}, B = (1,4), and C = (2,3}. The three-state machine' is specified as follows: a b () AX IA B A a Q B B C B 5 C C A C _'..

8. SEMIGROUPS AND SIMULATION In this section we discuss two important congruence relations over the set of input words of a sequential machine %. Properties of the associated factor monoids (the transition monoid of t and the function monoid of 7' ) are derived, and the Krohn-Rhodes normal form, involving the function monoid of t, is introduced. The main result of the section shows how the capabilities of two machines, possibly with different input alphabets, can be compared by considering the normal forms of the machines. Most of the material of this section is drawn from [13], and from J. Myhill's paper, "Finite Automata, Semigroups and Simulation," University of Michigan Engineering Summer Conference on Automata Theory, 1963[16]. Both of the two-sided congruence relations that we wish to consider occurred in the last section, but we reintroduce them here. Let % = (Ae, Q, Ao, q,, ) be a sequential machine with all states accessible, and with every element of A equal to 6(q) for some q E Q. The equivalence relations and cp over A are defined as follows: x - y <==V v Y w, rp 7X(v x w) = rp7,(v y w). x cp y c > V v V w, res (v x w) = res (v y w). Then, using the notation of the last section, cp = TA It is evident from the definitions that - and cp are two-sided congruences, and that ^j < p. Using the fact that all states of %4 are accessible, we can give an 78

79 alternate characterization of each of these relations. The first lemma shows the equivalence of the alef:intikon: of! -with that on page 78. Lemma 1 Let x and y be elements of A,. Then x - y <-= q, x(q,x) = x(q,y). PROOF: From the definition, x ~y;V v V w, rp (vxw) = rp (vyw). Since rpt(vx) = rp (vy) ==v w, rp (vxw) = rp_(vyw), x.'y p V v, rp,(vx) = rp^(vy) -= =V v, \(rpw(v),x) = \(rp,(v),y), Since all states are accessible, every state q is rp (v) for some v; therefore, x y > yV q, A(q,x) = (q,y) The following lemma relates the relation cp to the indistinguishability relation Lemma 2 Let X and y be elements of A,. Then x cp y = > V q, \(q,x) - ((q,y). PROOF: From the definition,

80 x cp y < > v V w, res7(vxw) = res.z(vyw) #-== Vv Vw b (x(rp-(v), xw)) = 5(x(rp.,(v), yw) < — PV v, %(rp, (v), x) - %(rp,(v), y). Since all states are accessible, every state q is rp (v) for some v; therefore x c y = — > V q, \(q,x) -- (q,y). Corollary 1 If h is reduced, then the congruence relations - and cp are equal. The converse of Corollary 1 is false. The factor monoid A,/ is called the transition monoid of I, and A/cp is called the function monoid of ). Just as every group is isomorphic to a group of permutations, every semigroup is isomorphic to a semigroup of transformations-functions from a set to itself. A semigroup of transformations isomorphic to (S,.) is ((R, a EcS, *), where R is the transformation of S defined by the a a equation Ra(s) = sa, and R * = Rb a a a'b It is immediate that the function cp: a -e R is an isomorphism. The, a, *) i cllaed the reular reresentation of (S semigroup ({R, a ~ S), *) is called the regular representation of (S,o)o

81 The transition monoid of a machine~ 7 has another, more useful, representation as a monoid of functions. For x. A, let the function A: Q - Q be defined by the equation X (q) = A(q,x). x Then, by Lemma 1, the functions Xx(q) and ky(q) are identically equal if and only if x y. Let the monoid of transformations Ta be defined as follows: TAX = ((x,x x E A}, *), where x y = xy Lemma 3 The monoids A/~ and Ta are isomorphic. PROOF: Let [x] be the equivalence class of x in the equivalence relation ~. The function p: /onto given by the equation cp(:[.x) = Xx is well defined and one-to-one, by Lemma 1. Clearly, cp is onto. Finally, c is a homomorphism:

82 x y xy cp([x y]) @C([xy]) = \y. Let (S,o) be a monoid. A set A C S is a set of generators of S if every element of S is a product of elements of A (by convention, the empty product is equal to the identity element). Lemma 4 Any monoid (S,') with a finite set of generators A is isomorphic with both the transition monoid and the function monoid of some sequential machine. PROOF: The required machine is X7(S) = (A, S, S, e, x', 6'), where X'(sa) = sea, and 6'(s) = s. Then the semigroup T is isomorphic with the regular representation (S) of S. The isomorphism p is as follows: if (s!,s2,.,sm} C A, and s1 s2 o** *m = a, then cp(k' ) = R ~ S1 S2... S) Ra Then the semigroups

85 A/~, T t(S) ({Ra, a c S), *), and (S,e) are all isomorphic. Since every state of G9(S) has a different output,'29(S) is a reduced machine, and A/cp is isomorphic with the other semigroups mentioned above. Since A/ can have as many as n elements if 7;f has n states, its primary usefulness is theoretical, rather than computational. Nevertheless, it is instructive to work through the construction of this semigroup, and the isomorphic semigroup T.., in a."toy"' ekample, -It'would be possible to formalize the methods we shall use in this example, but it is doubtful whether such a formalization would serve a useful purpose, in view of the typically unwieldly size of transition monoids. Example 1 We consider a machine 7Z with Q = qO'q1!'q2 }' A] = (a,b), A0 = (0,1), and \ and 5 specified by the following tables,

84 AI ___ a b q ( q0 q~ q0 1 Q qlqO q2 ql 2 q q2 It is easy to check that T is reduced; therefore A/ and A/cp are isomorphic. Since the functions a (q) and D(q) are permutations, every function Xx(q) is a composition of permutations, and, therefore, a permutation. Thus, TV is alsemigroinp of permutations of the finite set Q, and T~, is necessarily a group. By isomorphism, A / ~* A_/~ and AI/cp are also groups. The following diagram shows part of the free tree generated by A,. Each node n is labelled with the 3-tuple x (xx(qo), X x(_, x(q2)) specifying the function x. The label of the node n may be obtnx xa tained conveniently from the labels of n using the identity

85 (q2',,qqO (q2,l'qO) a b (qO' q'q2) (q2'qo' l) (ql'q2,2O) (qq0,1,q2) a b a b (ql, q0,q2) (q0',q 1) b (EO' q1' q2) Each of the six possible permutations of Q appears as a function Xx in the subtree. Therefore, T, is isomorphic to S3, the symmetric group on three letters. Choosing the first (i.e., minimum-length, and leftmost within a given length) representative of each equivalence calss for the equivalence relation a, we find that the equivalence classes are [e], [a], [b], [ab], [ba], and [aba]. Also, we note that the functions aa,`bb' and ke are all identical, and that the functions aba and,ab are identical. Therefore: a a e bb b e b a b ab a. The multiplication table for A/, is as follows:

86 e a ] [b ] [ab] [b a] [aba] [ e ] [ e ] [ a ] [ b ] [a b] [b a] [aba] [ a ] [ a ] [ e ] [a b] [ b ] Laba] [b a] L b ] [ b ] [b a] [ e ] [aba] [ a ] [a b] [a b] [a b] [aba] [ a ] [b a] [ e ] [ b ] [b a] [b a] [ b ] [aba] [ e [a b] [ a ] [aba] [aba] [a b] [b a] [ a ] [ b ] [e ] The construction of this table was carried out with the aid of (1) For example, the computation of [aba] [ba] was carried out as follows: [abaJ Lba] = [ababa] Since bab aba a(bab)a a(aba)a Since aa e (aa)baa e(baa) = baa Since: aa -e b(aa) b(e) = b, Therefore, [aba [ba] = b], Since the set of identities (1) is sufficient to determine each of the entries in the above table, these identities are called defining relations for the semigroup A,/-. The defining relations are sufficient to determine the equivalence class to which a given word belongs. Consider, for example, the word x = aabababb abab abab a x = (aa)baba(bb)(aa)babababa Using the identities aa - e and bb - e,

87 x - babababababa = (bab)a(bab)a(bab)a. Since bab ~ aba x - abaaabaaabaa = ab(aa) ab(aa) ab(aa). Since aa ~ e x - ababab = aba(bab) Since bab ~ aba, x - abaaba = ab(aa)ba Therefore, x a(bb)a x aa x e. Therefore [x] = [e], x = e' so that x is the identity function. This tells us, for example,that x rp(x) = qO. We shall next show how concepts about semigroups and semigroup homomorphisms may be used to compare the capabilities of sequential machines, and in particular, to determine the conditions under which one machine can simulate another. In the following discussion, all the relevant characteristics of a machine are determined by the function which the machine represents. Thus, a machine is specified by a function f: A -B, and we speak interchangeably of the machine f and the function f. We shall not exclude the possibility that A, the set of generators of the

88 free monoid A, may be infinite, Let us recall that the congruence relation cp over A for the machine f is defined as follows: x cp y < V v V w f(vxw) = f(vyw). The factor monoid A /cp is called the function monoid of f; its elements are equivalence classes associated with cp, and its multiplication operation * is as follows: [x] * [y] [xy], where [x] denotes the equivalence class of x. Henceforth, this monoid will be denoted Sf. If [x] E Sf, x1 c [x], and x2 E [x], then, clearly, f(xl) = f(x2). An equivalence relation pf over the elements of Sf may be defined as follows: [x] pf [y] - f f(x) = f(y) The ordered pair (SfPf) is called the normal form of f, and denoted NF(f)o A reflexive, transitive relation over the set of normal forms of machines may be introduced as followso NF(f) I|F(g) ( is read'divides') if S has a subsemigroup S, such that there is a homomorphism g 0 from S onto Sf satisfying the following conditions: for all sl c S, s2 c S, S1 Pg S2 -- (s) p ef (2) It should be noted that this definition involves only the abstract properties of Sf, as given by its multiplication table, and has nothing to do with the fact that the elements of Sf are equivalence classese Also,

89 the definition applies even if the input and output alphabets of f are different from those for g. By contrast, the relation (I j'a AS (I7 A) studied in the last section makes sense only for machines with the same input alphabet, and involves the state-transition structure of a machine as well as the function it represents. Let f: A -*B and g: C - D be machines. We say that f | g (f divides g) if there exists a homomorphism H from (A,.) into (C*,'), where'.' denotes concatenation, and a function h from D to B, such that the functions f(x) and hgH(x) are identical. The mappings involved in this definition may be diagrammed as follows. * gf A --— > B H h C D Since H is a homomorphism, it is completely determined by the values it assumes for the elements of A; for, if x = a 2... ana where each of the a. is an element of A, then H(x) = H(al)H(a2)...H(an) Thus, the statement f | g may be interpreted as meaning that, given an appropriate input encoder and output decoder, the machine g can

90 simulate f. The input encoder replaces each letter a by the string H(a); the output decoder maps the final output symbol d into h(d), an element of B, In digital computer terms, we may say somewhat fancifully that, with suitable data format conversions, the computer g can simulate the computer f. Theorem 1 (Krohn-Rhodes) Let f and g be machines. Then f g if and only if NF(f) [ NF(g) o This theorem states that the capability of g to simulate f is determined by the properties of the function monoids Sf and Sg, and the equiv alence relations pf and pg Before entering into the proof of Theorem 1 let us consider an example. Example 2 Let f be the function represented by the following sequential machine-,/ \A

91 It is clear by inspection that the transition monoid of this machine is C3, the cyclic group of order 3. Since the machine is reduced, Sf = C3. The multiplication table for Sf is as follows: [e] [A [] B] [e] [e] [A] [B] [A] [A] [B] [e] [B] [B] [e] [A] Since f(e) = a, and f(A) = f(B) = D, the equivalence relation pf has the equivalence classes [e], [A], [B] Let g be the function represented by the reduced machine that was considered in Example 1. aa. \~~~~~~~~~a As we found in Example 1, the function monoid S is isomorphic to the g symmetric group S3, which has the subgroup C3. The subgroup S of S isomorphic to C3 has the following multiplication table. [e] [ba] [ab] [e] [e] [ba] [ab] [ba] [ba] ] [ab] [e] [ab] [ab] [e] [ba]

92 Also, g(e) = 1, g(ab) = g(ba) - 0; [ab]p[ba] [ab]g[e]., Therefore, the following isomorphism 0 from S onto S. satisfies the condition Slp =s2 0(s)Pf 0(s2): 0([e]) = [e], 0([ba]) = A, 0([ab]) = B. Therefore, NF(f) INF(g). On the other hand, it is also true that f g. The required homomorphism is given by H(A) = ba, H(B) = ab, and the function h is given by h(l) = a, h(0) = p, In proving Theorem 1, it will be useful to associate with a function fo A - B, another function F~ Sf - Bo The function F is represented by the following sequential machine (which may have an infinite input alphabet): (Sf, Sf, B, [e], L, 6), where \(s,t) = s * t, and, if s = [x], 5(s) = f(x). Then

93 F([e]) = f(e), and, in general, F([x1][x2]..[xn]) = f(xlx.. xn). Note that (Sf,) and (Sf,*) are not the same monoid. Thus, for example, * [e] is the identity element of Sf, but is a word of length 1 in Sf, whose identity element is e. Lemma 5 Let f be a function from A to B, and let F be the associated function from Sf to B. Then f I F and F f. PROOF: First, wze show that f F. The diagram showing the domains and ranges of the required homomorphism and function is as follows: *x f i T A- I F Sf B For all a c A, let H(a) = [a]. Then H(ala2..a) = [a] [a]...[an], and F([a1][a2]...[an]) = f(ala2. a n). If h is the identity function then f(x) is identically equal to hFH(x), as required. It is equally easy to show that F I f. The diagram of mappings for this case is as follows:

94 *g F Sf -> H h V i'h * f A > B The required homomorphism H is determined by the equation H([x]) = x, where x is an arbitrary, but fixed, element of [x]. Then F([Xl][X2]ooo~ [x n]) = f(xlx2 ooxn) = f(H([xl] x2], [xn])). If h is the identity function, then F is identically equal to hfH. This completes the proof of Lemma 5. PROOF: (of Theorem 1) f g NF(f)|NF(g) We begin by showing that f I g NF(f) NF(g). If f I g, then f = hgHo The diagram of mappings is as follows: * f A > B H h * C. > D Let S, a subset of S g be defined as follows: S = ([H(x)], x c A ); note that [ ]P denotes'equivalence class of the relation cp. To prove that S is a semigroup (and therefore a subsemigroup of Sg) we g

95 must show that [H(x)]*[H(y)] c S. But [H(x)]*[H(y)] = [H(x)-H(y)] = [H(xy)]. Therefore, S is closed under *, the multiplication operation in Sf. Now, to show that NF(f) I NF(g), we must construct a suitable homomorphism from S onto Sf. First we show that [H(x)] = [H(y)] -- [x] = [y]: [H(x)] = [H(y)] ==- V r e C V s e C, g(rH(x)s) = g(rH(y)s) ==' V v e A* y w E A*, g(H(v)H(x)H(w)) = g(H(v)H(y)H(w) V v V w, g(H(vxw)) = gH(vyw) >== V v V w,hgH(vxw) = hgH(vyw),=> x cp y =, [x] = [y]. In view of this result, the mapping 0: [H(x)] [x] is well defined, and is a mapping onto Sf. Also 0([H(x)] * [H(y)]) = 0([H(x) H(y)]) = o:([H(xy)]) = [xy], and 0([H(x)]) * 0([H(y)]) = [x] * [y] = [xy]. Therefore, 0 is a homomorphism. Finally, we must show that, for [H(x)] c S and [H(y)] E S,

96 [H(x)]pg[H(y)] —, [x] pf [y]. This is done as follows: [H(x)] p [H(y)] -- sg(H(x)) = g(H(y)) = hgH(x) = hgH(y) = f(x) = f(y) =^ [x] pf [y]. To sum up, S is a subsemigroup of S g and there is a homomorphism 0 from S onto Sf such that Sl Pg 2 == O(s1) Pf O(S2). The second half of the proof requires us to show that NF(f) | NF(g) == fjg. By Lemma 5 and the transitivity of the "divides" relation, an equivalent statement is: NF(f) | NF(g) = ~ F | G, where G is the function from S to D associated with g. The domains g and ranges of the mappings to be constructed are shown in the following diagram, * F S ------ B A H h V * G S.. > D g Since NF(f) INF(g), there is a subsemigroup S of Sg, and a homomorphism e S ontoSf

97 such that 1 Pg S2 == 0(sl) pf O(sr) The required homomorphism H from (Sf,') to (Sg,) is determined by the action of H on the elements of Sf. Let H be chosen so that, for all s c Sf, H(s) = s, where o(s) = s. Such a choice is possible, since 0 is onto. Let s1s2...s be a word of Sf. Then 1 * n f F(sls2...sn) = F(s * s2.. * sn) and GH(slS2..n) = G(SlS2.. sn) G(sl s*... e sn). Since 0 is a homomorphism, 0(s1 *2 ** s.. = ( sn) * 8(s2) ( * * 0(sn) = s s2 * * Thus, h can be constructed if, for all s e S, the value of F(0(s)) can be predicted from the value of G(s). This is equivalent to the statement that G(s) = G(t) == F(0(s)) = F(0(t)). This is true, since

98 G(s) G(t) == s P t == 0(s) pf 0(t) ==FF(O(s)) = F(0(t)). This completes the proof of Theorem 1. and with it our discussion of simulation. The Krohn-Rhodes Normal Form will appear again later, in our discussion of the decomposition of sequential machines,

9. REGULAR EVENTS AND REPRESENTABLE FUNCTIONS In Chapter 7 we proved that a function f is representable by a finite-state sequential machine if and only if the right congruence R_ -f is of finite rank, or, equivalently, if and only if the number of distinct functions f is finite. In the present section we introduce regular events x and regular expressions, and thereby obtain another characterization of representable functions. Much of the material we cover can be found in: J. Ao Brzozowski [ 7] and D. Arden [ 5]. To begin with, we shall consider a restricted class of finite-state machines, the finite automata, or finite-state acceptors. A finite automaton has only two output symbols, A and R. A sequence x is said to be accepted if res(x) = A, and rejected otherwise. It is convenient to specify a finite automaton as a quintuple 6? = (A, Q, q0, F, ) where F C Q, and 5(q) = A if and only if q C F; the set F is called the set of final states. Note that the word x is accepted if and only if rp(x) c F. * * An event over A, is a subset of AL. The finite automaton represents the event P if P is the set of words accepted by 6. An event is representable if and only if some finite automaton represents it. The characterization of representable functions in terms of right congruences yields, as an immediate consequence, a characterization of representable events. For any event P over A, and any word x AI, D (P), the derivative of P with respect to x, is defined as follows: 99

100 D (P) = y | xy e P)} Lemma 1 The event P is representable if and only if the number of distinct derivatives D (P), as x ranges over A, is finite. PROOF~ By definition, P is representable if and only if the following function f(x) is representable by a finite automaton: A if x c P f(x) - R otherwise. But, for x and y E A1 the functions f and f are identical if and only if, for all w, xw C P = —> yw c Pc ioe, if and only if Dx(p) = Dy(P)o Therefore, the number of distinct functions f (w) is finite if and only if x the number of distinct derivatives D (P) is finite. x The main result of the present section is that the representable events over A are precisely the regular events over A, which we now proceed to define In the definition, we shall use the following operations on events:

101 union P U Q = (x |x E P or x e Q) concatenation P ~ Q = (xy Ix e P and y c Q} n-th power Pn, where n is a nonnegative integer, is defined recursively as follows: P~ = (e); for n > 0, = pn-lP. 00 iteration P = = (e P U 2 U n=O * Let A be a finite alphabet. The definition of regular event over A is recursive. i) the empty set cp is regular; ii) the set (e) is regular; iii) for each a e A, (a) is regular; iv) if P and Q are regular, then P U Q and P ~ Q are regular; v) if P is regular, then P* is regular; vi) no event is regular unless its being so follows from (i)... (v). Regular events may be denoted by well-formed formulas constructed from the constant events cp, (e), and (a), a c A, together with the binary operators v and o, and the unary operator *; such formulas are called regular expressions. In writing regular expressions, certain conventions are useful. Any set consisting of a single word may be written without braces, and the'.' between concatenated events may be omitted. Thus, instead of (a).(b}l(c.d) we may write abcd. The operation * has precedence over ~ and U, and ~ has precedence over v; thus: P* U (QoR) may be written as P* U QR.

102 The following identities, valid for all events P, Q, and R, are direct consequences of the definitions: PU P P P UQ = Q UP P U (Q U R) = (P U Q) U R P(QR) = (PQ)R PQ U PR = P(Q U R) PR R U QR (P U Q)R (P U Q)* (P U Q ) = (*p-) Pe = eP = P Pp = cp = p * p = e cp UR = R The associative laws enable us to mit parentheses according to the usual convention: (P U Q) U R = P U Q U R It is useful to acquire facility in translating verbal descriptions of events into regular expression notation, when possible. For example, the event consisting of those sequences of a's and b's which end in b, followed by an odd number of a's, followed by an even number of bIs is (a U b) b(aa) a(bb)O The event consisting of those sequences of a's and b's which either do not include three consecutive a's, or have two consecutive b's since the last three consecutive a's, is (e U (a U b)* bb)(b U ab U aab)*(e U a U aa). The importance of regular events is underscored by the following thecrem, which is the main result of the present section.

103 Theorem 1 An event P over A — is representable if and only if it is regular. As by-products of the proof of Theorem 1, we shall obtain two useful algorithms. The first of these, given a finite automaton representing the event P, produces a regular expression for P. The second, given a regular expression for P, produces a finite automaton representing P. The proof of Theorem 1 consists of the proofs of corollaries 1 and 5. We begin by proving that every representable event is regular. Let P be the event represented by the finite automaton o (= (AI, Q, q0, F, \x, where AI = (al.,a2,...,am,, and Q = ({oqrlq.oqnl - For j=O,l,...,n-l, let j = x rp (x) = j }. Then P = X.. J Ij 1, F) In order to prove that P is regular, it is sufficient to show that, for each j, X. is a regular event. For any two indices i and j in the set (O,l,...,n-1), let Ci.. be the following regular event:

lo4 x e a.. < > x C A xc ij 13 and x(qi x) = qj In other words, a~ij is the set of input symbols which cause a transition 1j of C from qi to qj. Now, if x e A, and a c AI, rp(xa) = \(rp(x),.a) therefore rp(xa) = qj == for some i, rp(x) = qi and a c aijo Accordingly, the events X. are related by the following system of simultaneous equations n-1 X0 = e U U XiaiO X = eU Xca 0 i=O iii (1) n-l Xj U X.ij j=l,2,..,n-l. i=0 We shall prove that the system (1) has a unique solution, and that, in this solution, each event X. is regular. The proof requires the J following lemma. Lemma 2 (Arden) Let A and B be events such that e, A. Then the equation (2) X = XA U B has the unique solution X = BA

105 PROOF: Let X0 be a solution of (2). By induction on n, BAn C X for all n > 0; therefore, BA* C XOO Assuming that X0 / BA*, let w be a shortest word of X0 not in BA*. Then w e Xo A U B, and since w, B, w E XoA; i.e., W = yz, where y E X0 and z E A. Since e $ A, lg(z) > 1, and lg(y) < lg(w). By the minimality of lg(w), y E BA*; but then w c (BA*)A C BA*. This contradiction establishes that X = BA*. Theorem 2 For i=l,2,...,n, and j=1,2,...,n, let a.. be a regular 13 event not including the null sequence e, and let b. be a regular event. Then the system of equations n ^Y = JU Y a.. v b i ij jo i=l has a unique solution, and each of the events Y. in this solution is regular.J this solution is regular.

106 PROOF: The proof is by induction on n. Basis. n=l. By Lemma 2, Y1 = blall, which is regular. Induction step. Suppose the result holds for k=n-l. Using Lemma 2, we may solve for Y in terms of the other variables: n Y = ((U Yiail) U bl)al. (3) i=2 Substituting in the other equations, n Y.= Y (ai. U aaij ) U (b U b a ll ) j U 13 il 11 13 j 1 11 ij = i=2 Each event * a.. U ao alla. ij U ailal ij is regular and does not contain the null sequence, and each event b. U b a a j U blallalj is regular. To see this, one uses the facts that the regular events are closed under o, U, and *, and that e E AB if and only if e E A and e c B. Therefore the induction hypothesis applies, and, for j=2,3,.o,n, each event Y. is a uniquely determined regular event; by (5), Y1 is also a uniquely determined regular event. Corollary 1 Every representable event is regular~ PROOF: Theorem 2 applies to the system (1); therefore, eachevent X. is J regular. The proof of Theorem 2 suggests that a system such as (1) can be solved by elimination. This technique is applied in the following example.

107 Example 1 Consider the following finite automaton: a b b ko 7 <II/A ) (<2A ) )a b a X = e UXb -U Xla X1 = Xa U X2b X2 = Xlb U X2a Solving the first equation for XO: X0 = (e U Xla)b* Substituting in the second equation: X1 = (e U Xla)b*- U X2b = b*a U Xiab*a U X:b. Solving the third equation for X2 X2 = Xlba* Substituting in the equation for X1: 1 = b*a U Xlab*a iJ Xlba*b Solving for X1: X1 = b*a(ab*a U ba*b)* X2 = b*a(ab*a U ba*b)* ba* P = X U X2 = b*a(ab*a u ba*b)(e U ba*) Before going on to prove that every regular event is realizable, let us derive some further applications of Theorem 2. A nondeterministic finite automaton ~ is a quintuple

108 =. (AI, Q, Q0, F, L), where Q = (col,^' oo,%nl), QO C Q, F C Q, and L is a function from Q X A to subsets of Q. The set QO is the set of initial states, and F is the set of final states. A nondeterministic automaton may be interpreted as having a number of choices for its "move". When it is in a given state q and receives an input a, it may go to any state in the set L(q,a); of course, L(qa) may in some cases be the empty set. A word x = a 1 o anl is accepted by / if there exists a sequence of states q() q(1),. q (n) such that q(O) QO, q(n) c F, and, for i=0,l,...,n-l, q(i+l) c L(q i) a We shall prove that the set of words accepted by a nondeterministic finite automaton is regular. For j=0,l,..,n, let X. be the set of input J words that can cause the automaton to reach qj from some state in Q0; then e C X. if and only if qj c Q0c and the word ~J~~ X aaa x = aa..a o o a is an element of X. if and only if there exists a sequence of states (0) q(1)P. o e (n) such that q C QO (n) and for i=0,looon-l, (i+l) L(q) ai). Then

109 P = X. (j j e F) J Let cij = (aB I a, e a and qj e L(qia)]. Then the following system of equations holds: n-l if qj c QO' = e u ) iaij i=0 if qj Qo0 Xj = X.ij. i=O Theorem 2 applies to this system of equations, and asserts that each set X. is regular; therefore, P is regular. Thus, we have proved the following result. Corollary 2 The set of words accepted by a nondeterministic finite automaton is regular. Up to now, we have taken the point of view that a sequential * machine computes a function from AI to A0; i.e., that it maps each input sequence into an output symbol. It is also possible to consider sequential machines as transducers, which map sequences to sequences. This point of view is generally applied to the class of generalized sequential machines, a variant of the Mealy machines. Nevertheless, for the sake of uniformity we shall continue to consider the Moore model. Let = (AI, Q, A,, q0, A, 6) be a finite-state machine, and let the function seq(x) from AI to A0 be defined as follows:

110 seq(e) = res(e) = 5(qO); if x = aOal,..,anl, then seq(x) = res(e)res(al)...res(an_). For j=0,l,...,n-l, let Zj = (seq(x) I rp(x) = qj. Also, let Sj = [qI. 3 a2 (qiaQ) = qj} Then the following system of equations holds: Z0 = seq(e) v ( J Zi) 5(q0) itS0 Z = ( Z ) (qj), j=l,2,...,n-l. iS. Since this system of equations satisfies the conditions of Theorem 2, each of the events Zj is regular. Therefore, n-l ) Z. = (seq(x) x c A) J=o J is regularo Thus, we have proved the following result. Corollary 3 The set of all output sequences that a given finite-state machine can produce is a regular set. Using Corollary 3, we shall now obtain a stronger result;.: Let 7 be the machine introduced above, and let P C A, be the event represented by the finite automaton Q = (A^, Q~, q^, F, \).

111 Let't be the finite-state machine t - (AI, Q X Q, Ao, (qO'qo)' 8'5), where x'((q,q),a) = (X(q,a), 7(q,a)), and',(q,q) = 8(q). Then, for any sequence x, seq (x) = seq,(x) and x C P 3, &rp (x) = (qq), where q e F. Therefore, (seq (x) | x e P) = K (seq (x) rp (x) = (q,)}. M((qA) I-F) ru By the argument given in proving Corollary 3, this set is regular. Therefore, we have proved the following result. Corollary 4 Let 7., be a finite-state machine with input alphabet AI, * and let P C AI be a representable event. Then (se-qt(x) | xEP) is a regular event; i.e., the function seqT(x) maps representable events onto regular events. We shall next complete the proof of Theorem 1 by showing that every regular event is representable. Our method of proof will consist of showing that every regular event has only a finite number of derivatives.

112 It will prove useful to introduce the following operation on events: e if e c P A(P) = cp if e P The following identities, valid for all x and y in A*, all a e A, and all events P and Q over A, are direct consequences of the definitions of the operations A(P), D (P), P U Q, P'Q, and P*. A(P -U Q) = A(P) U t(Q) A(P.Q) = A(P)-A(Q) A(P*) = e e if x c P Z(D (P)) = cp if x ~ P De(P) = P Dx(P) = D (D (P)) D (P U Q) = D (P) Dx(Q) D (PQ) - D (P)QU U A(Dw(P))Dw ( ) ((W1,w2) ww2=x} 1 2 for x. e, D (P*) = D (P)P* {((w1,w2) I w1w2=x and wl P*) w2 Da(P U Q) = D (P) U D (Q) D (PQ) = D(P)Q U A(P)D (Q) D (P) (P)P

113 Most of these identities follow directly from the definitions. We shall prove the two most difficult ones. (1) D (PQ) = D (P)Q IJ A(D (P))D (a). 1 X ~ X c 1) (Q (P))% W( W ((wlw2) J w1w2=x} 1 2 PROOF: y C Dx(PQ) <==> xy e PQ. It is convenient to distinguish the two ways in which xy can be an element of PQ. i) y = vw, where xv c P and w E Q; ii) x = wlw2, where wl c P and w2y e Q. The set of words v such that xv e P is D (P). Therefore, the set of words y satisfying (i) is D (P)Q.: The set of words y such that w2y e Q is D (Q); also, wl c P if 2 and only if A(D (P)) = e. W1 Therefore, the set of words y satisfying (ii) is (D5 (P))D} (Q). ((w1,w2) w[ W2=x} 1 2 Taking the union of the two sets of words, we obtain (1). (2) For x A e, DX(P*) (w=x and w P D (P)P*.

114 PROOF: y c DX(P*) B==- xy C P* < x=wjlW2 and y=z1z2 where wl C P*, z2 c P*, and w2z C P. Hence zl C D (P), and Zlz2 C D (P)P*. Taking the union over all choices of (wl,w2) satisfying the required conditions, we obtain (2). The following two lemmas establish certain properties common to all the regular events over an alphabet A. In both lemmas, the method of proof is to show, for every regular expression R, that the event it denotes has the required property. Each of the proofs is by induction on the number of occurrences of the operators U,', and * in the expression R; this number i(R) is called the index of R. The induction steps of the proofs make use of the fact that, if i(R) = n > 0 then either R = P UQ, or R = PoQ, or R = P where P and Q are regular expressions of index less than n. Lemma 3 If R is regular, then, for all x c A*, D (R) is a regular event.

115 PROOF: Basis. i(R) = 0. Then R = cp, R = e, or R = a2, an: element of A. Clearly, all the derivatives of these expressions are regular. Induction step. Let R be of index n, and take the induction hypothesis that the derivatives of events denoted by expressions of index less than n are regular. Either R = P U Q, R = P.Q, or R = P-. Inspecting the formulas for Dx(P U Q), D (P'Q), and D (P*), we find that, in each case, D (R) can be composed from regular expressions by a finite number of, U, and * operations. Therefore, D (R) is regular. x Lemma 4 If R is regular, then R has only a finite number of distinct derivatives.

116 PROOF: Basis. The result is obvious if i(R) = 0. Induction step. Let R be of index n, and take the induction hypothesis that events denoted by expressions of index less than n have only a finite number of distinct derivativeso If R = P U Q, then, for any x, D (R) = DX(P) U DX(Q). Then, if P has p derivatives, and Q has q derivatives, R has at most pq derivatives. If R = P~Q, then, for any x, Dx(R) = DX(P)Q UU DW (Q), where the union is taken over some of the derivatives of Q. The number of ways of choosing Dx(P) is p, and the number of distinct unions of q q derivatives of Q is at most 2qo Therefore, there are at most p 2 distinct events having the form required of Dx(R). If R = P* then, for x / e, D (R) = D (P)P:* x W2 where the union is taken over some of the derivatives of P. The number of events expressible in this form is at most 2P. Corollary 5 Every regular event is representableo

117 PROOF: By Lemma 1, every event with a finite number of distinct derivatives is representable. This corollary completes the proof of Theorem 1; henceforth, the class of regular events and the class of representable events may be identified with each other. The states of the minimum-state finite automaton representing a given regular event R are in one-to-one correspondence with the distinct derivatives of R. Let < D (R) > denote the state corresponding to Dx(R). x Then the transition function of the reduced machine is as follows'(< D (R) >, a) < D (R) > Also, < D(R) > E F = A(Dx(R)) = e, which means that x e R. It is easy to show that, if < D (R) > is the initial state, then, for all x, rp(x) = < D (R) >, so that x is accepted if and only if x c R. A first step in constructing a finite automaton representing R now suggests itself. Starting at the root of the free tree generated by A, and moving up level by level, label each node n with an expression for the event D (R). The label for n may be obtained from the label for n x xa x by means of the identity DXa(R) = Da(Dx(R)). "xa^ a x^ If D (R) is found to be equal to a derivative previously computed, do not branch out from the node n y The difficulty with this process is that we have no procedure, as yet,

118 for testing whether two regular expressions, corresponding to derivatives DX(R) and Dy(R), denote the same event. Nevertheless, the construction y process can be made to work, provided that care is used concerning some computational details. First, we must assume that R is in such a form that no parentheses implied by associativity are omitted. The derivative Dxa(R) is computed as D (Dx(R)), where the operation D on expresxa a a sions is defined recursively as follows: Da(e) = p, a D (cp) = p, a D (b) = Cp a if b 4 a, D (a) = e,, (Da(P) oQ) if A(P) cp D(PQ) = ((Da(P)'Q) U Da(Q)) if A(P) = e, Da(P*) = (D (P) P*). Let two expressions be called similar, or of the same similarity class, if each can be converted to the other solely by means of the following identities: R UR = R P U (Q U R) = (P U ) U R PU Q =Q U P cpP = Pcp = eP = Pe = P.

119 It is clear that similar expressions denote the same event, and that it is effectively decidable whether two regular expressions are similar. Theorem 3 Let R be regular. Then, among all the expressions for the derivatives of R, only a finite number of similarity classes occur. PROOF: We use the same induction scheme as in Lemmas 3 and 4. Basis. For i(R) = 0, the result is obvious. Induction step. Let R be of index n, and take the induction hypothesis that the theorem holds for all expressions of index less than n. If R = P UQ, then, by induction on lg(x), it follows that each expression D (R) x takes the form Dx(P) -U D(Q). If R = P.Q, then, by induction on lg(x), each expression D (R) is of the x same similarity class as (Dx(P)~Q) U U D (Q)w 2 where the union is taken over all words w2 such that x is of the form wlw2, where wl e P. If R = PP* then, again by induction on lg(x), each expression Dx(R) is

120 similar to D (P) P* 2 where the union is taken over all w1w2 such that x = wlw2 and w C P*. In all three cases, the induction hypothesis is sufficient to show that the derivatives of R fall into a finite number of similarity classes. Since the number of similarity classes of derivatives of R is finite, the process of labelling nodes of the free tree generated by A with derivatives must terminate, provided that no branches out of n are constructed if Dy(R) is similar to some derivation previously computed. Once this construction process has been completed, an automaton representing the event R can easily be constructed. Let [D (R)] denote the similarity class of D (R). The automaton is as follows: ([ = (A,([Dx(R)], x c A*}, [De(R)], F, \), where [D (R)] c F if and only if A(Dx(R)) = e, and x([D(R)], a) = [Dxa(R)] Example 20 Suppose A = (ab}, and R = ((ba*)(ab*a v ba*b)*) The labelled tree showing the calculation of derivatives is as follows:

121 ((a*b)(ab*a U ba*b)*) (abq*a -U ba*b)* ((b*a)(ab*a U ba*b)*) (a*bab*a ba*b))* \ N b/ (ab*a U -5a*b)* (b*a)(ab*a U ba*b)* /< / ((b*a)(ab*a U ba*b)*) A finite automaton representing R has the following state diagram. b [De(R) ]/R [D (R).a]/A b [ b Dab(R) ]/R a

122 In this particular case the algorithm leads to a reduced machine. This is not necessarily true in general, since the computation process may produce derivatives which denote the same event, but are not similar. Now that we have a complete characterization of representable events, it is an easy matter to characterize representable functions. Theorem 4 Let A, and A0 be finite alphabets. The function. f: A, - Ao is representable by a finite-state sequential machine if and only if, for each element b e AO, (x f(x) = b) is regular, PROOF: First we establish the necessity of the given condition. Suppose f is represented by the finite-state machine'Z7, with the set of states Q = {%o90'_~9. }~l Then, by an argument previously given, for each j, the set X. = (x rp (x) = q}. is regular, Therefore, (x f(x) = b) = U Xj {j I (qj)=b} J which is a regular seto To show sufficiency, suppose each set Ix | f(x) = b} is regular. Since A0 is finite, for some sufficiently

123 large integer 2 there is a one-to-one mapping g from AO onto a subset of (0,1)}; b $ (al(b), o2(b),..., * (b)). Then, for 1 < k < 2, (X I (f(x)) =1 = U (x f(x) = b). (b | c(b)=l) This set is therefore a regular event, and by Theorem 1, there exists a finite automaton a(k) such that res a k(X) = Ck(f(x)). Now, consider the finite-state direct product machine ( _ / ) X a,(2) (2) Then, for all x, res? (x) g(f(x)). Replacing each output symbol s of an accessible state of by the symbol g (s) c AO, one obtains a finite-state machine' representing f.

10. PAIR ALGEBRAS AND STATE ASSIGNMENT Let = (Ae, Q, A^ o \, 5) be a reduced machine representing the function f: A - A o 1 o Then if = (A, Q, A, q, \, 5) is a machine with all its states accessible which also represents f, A - A -, and, because -r7is reduced, 1, ^, 1, o Therefore, there is a strong homomorphism from rf onto -. Let the functions associated with this homomorphism be ho Q - Q and p: A - A o0 o Then we have the identity res (x) = c(res J x); and, since 17X and 7X both represent the same function, cp must be the identity functiono Therefore, the function h satisfies the following equations: i) h(qo) = ii) h(X(qa)) = 7(h(q),a) iii) g(h(q)) = 5(q) 124

125 The existence of a function h satisfying these conditions is a necessary and sufficient condition for 7t to represent the same function as'1-. When such a function exists, we say that 7i realizes'7. Although the motivation for this concept depends on the fact that 7t is reduced, the definition makes sense whether 7X is reduced or not. The concept of "realization" is important in connection with the logical design of synchronous sequential circuits. A design specification for such a circuit may be given by a representable function f: A -4A Of course, since the input and output alphabets of sequential circuits are given as binary tuples, the set Al in this case will be of the form 0,l}P), and A will be of the form (0,1](r) Let f be such a design specification, and let -7x be a reduced machine representing the function f. Then any sequential circuit satisfying the design specification can be described by a machine n which realizes o. Moreover, since each internal state of a circuit is represented by a binary s-tuple, it follows that Q, the set of (accessible) states of 7X, is a subset of (0,1} (s) for some s. The problem of specifying V7, which involves giving the "coding" of its states as binary s-tuples, so that the resulting sequential circuit will have a simple structure is called the state assignment problem. In this section we shall discuss the algebraic approach that Hartmanis and Stearns have taken to the state assignment problem [11,12]. A finite-state machine Pt = (, Q, Ao, q, An )

126 is state-assigned if Q C R R2 sX. XR, where s > 1, and each of the sets R, has at least two elements. Then any state q c Q is an s-tuple (rl(q), ooo rs(q)). The transition function k: Q X A -Q induces partial functions (undefined for some domain elements).: R1 X R2 X. X RR X AI -4 Rj,2.2 s where j.(r (q) r2(q),.o r (q), a) = rj((q, a) ) 2 s s A state-assigned machine X7T is said to exhibit "reduced state dependence" if the values of some of the functions \. can be determined from the input a, together with only a proper subset of the set of arguments [rl, r2,..., rs } In the case where'71 is a state-assigned realization of ~h, and each set R. = (0,1), the existence of reduced state dependence means that, in a sequential circuit corresponding to 7Z1, some of the "next state" outputs of the combinational part (inputs to latches) can be determined independently of some of the "present-state" inputs (outputs of latches). When a state assignment is made which exploits the possibilities for reduced dependence, the complexity of the resulting combinational circuit is usually considerably less than it would otherwise be. The concept of reduced dependence in state assignments is closely allied with fundamental ideas about number representations~ Thus, for

127 any integer b > 2, the base b number representation is a "state assignment" of the integers which allows reduced dependence in the formation of sums and products. For example, the k least significant bits of a sum can be determined from the k least significant bits of the two addends. The following example shows a connection between the concept of reduced dependence and another kind of number representation, the residue number system, whose application in computers has been studied by Garner, Svoboda, and others. Example 1. Consider a sequential machine Z71 for which AI = (a,b), Q= A (q0' ql..' q209 and, for all j, k(q9ja) = qj+l(mod 210)' and, (cj) = _q. Then the following state assignment is possible: q = (rl(J), r2(q ), r5(q ), r4(q)) where the four coordinates are the residues of j modulo 2, 3, 5, and 7, respectively. With this assignment, the state transitions of the machine can be determined separately in each coordinate: Xl(rl, r2, r3, r4, a) = r1 + 1 (mod 2) 2(rl, r2, r3, ro, a) = r2 + 1 (mod 3)?3(rl, r2, r3, r, a) = r3 + 1 (mod 5) kj(rl, r2, r5, r4, a) = rh + 1 (mod 7)

128 We shall investigate the properties of a machine ) which govern the existence of state-assigned realizations 75 with reduced dependence. Given such a realization 7, with QCR1 R2 >.., X R each subset S C (l,2,,, s) determines an equivalence relation pS over Q as follows: qlPS q2 ~== for all j e S, rj(ql) = r (q2). In what follows, it will be convenient to speak interchangeably of equivalence relations over a set Q and of the partitions of Q induced by such equivalence relations. An ordered pair (a,T) of equivalence relations over Q is called a partition pair if the following implication holds: qC a a2. = a, a(il8) T (q2 a). Note that (cr,) is a partition pair if and only if a is an equivalence relation with S.Po In terms of partition pairs, the following characterization of reduced state dependence may be given. Theorem 1 Let S and T be subsets of (12,ooos). Then the following conditions are equivalent: 1) for all k c T, the value of the function \k is determined completely by the arguments a and (rj I j c S); 2) (pq p ) is a partition pair~

129 PROOF: Condition (1) means that whenever, for all j C S, rj(ql) = r(q2), ik(q1la) = i(q2a) for all k c T. This is equivalent to the statement that, if qlPS q2 then, for all a, \(ql,a)PT(q2, a); and this means precisely that (PSPT) is a partition pair. This completes the proof. Now we have seen that the existence of partition pairs (PS,PT) determines the existence of reduced state dependence in a state-assigned realization'/- of Z o Let us next consider the properties of`7&S which determine the existence of reduced dependence in its realizations. Suppose that ^7( realizes -776, and that (o,T) is a partition pair of 96. Let the equivalence classes associated with a be B1,B2,...,Bt, and let the equivalence classes associated with T be C1,C2,...,C o Then, for any j E {1,2,oo.,t}, and for any a E A, X(Bj,a) = (X(q,a) I q E Bj is contained in some set C k(j a) Let h be the homomorphism from 7^' onto /, and let h(Bj) = (h(q) | q E BA}. J J Then, for any j C (1,2,.,,t), X(h(B ),a) = {((q,a) Iq C h(B.)} = (X(h(q), a) q e Bj. Because h is a homomorphism, this set is equal to (h(X(q,a)) I q E Bj}, a

130 which is contained in h(Ck(j a)). Thus, the collections (h(Bj)} and (h(Ck)} have the property that the image of each set h(B.) under an input is contained in some set h(Ck). Also, since h is onto U h(B) U h(Ck) = Q J k In the particular case where h is one-to-one, so that it and 7t are isomorphic, each of the collections (h(Bj)j and (h(Ck)) is a partition of Q9 The foregoing discussion suggests that the definition of partition pair be extended to "overlapping partitions", as follows. A collection (D.j j=l,2,o..,t) of subsets of Q is a cover of Q if U D. = Q j J The ordered pair of covers ((Dj], (Ek)) is a cover pair (with respect to ^fJ" ) if, for all j, and for all a e A, x(D.,a) =| q a D:} is a subset of some set Ckd A cover cp is said to have the substitution property (SoPJ) if (cpcp) is a cover pairo The following example indicates the use of covers with S.P. in the construction of state assignmentso

131 Example 2. qb b This machine'-' has the following two covers with S.P.: cp: 012 3 and 0: 03 13 23. The first of these is a partition with S.P., but the second is not a partition By making three "copies" of state 3, we can obtain a machine 4y7 which realizes >, and has partitions with S.P. cp' and 0', such that the homomorphism from t onto fi' maps the blocks of cp' onto the blocks of cp, and the blocks of 0' onto the blocks of 0. b cp' = 012 04 i' = 03 T 25

132 The homomorphism h is as follows~ h(qO) = qo h(ql.) = h(q2) = q h(q) = h(q4) h(q5) 93.' We can now construct a state assignment for 77 in which p' is the partition associated with the equivalence relation p{l, and 0' is associated with Pp2}~ q rl()r, 2(q) qO A a qlA A q2 A 7 q5 B B q4 B f 5 B P Because p{1} and p(2) are equivalence relations with S.P., reduced dependence exists, and as far as its state transitions are concerned, 77V can be represented as a pair of machines operating in parallel, one keeping track of the coordinate rl, and the other keeping track of r2, as follows.

155 a a a \a b bb 2 a We next present a theorem which shows how the existence of partition pairs in the realizations of 7t is dictated by the cover pairs for 1'. For any cover cp, and any state q, let n (q) denote the number of blocks of cp in which q occurs. Theorem 2 Let (cp,) = ((Dj}, (Ek)) be a cover pair for 7t. Then there exists a realization 7?t of 7i with zi max[n (qi), n0(q.)] states, having a partition pair ((A.}, (Bk)) such that, for all j, h(A.) = D. and, for all k, h(Bk) - Ek' PROOF: For each state q c Q,.t will have max(n (q), n((q)) states whose image under the homomorphism h is q. Call these states qA' qB'... Then enough "copies" of each state q are available to permit the construction of partitions (Aj. and (Bk) of Q, the set of states of 7', such that, for all j, h(A^) = D and, for all k, h(Bk) = E.k

134 For each pair (j,a) choose k(j,a) such that X(Dj,a) C Ek(ja)o The transition function of 27' is now constructed as follows: if qp A - then q C Dj. X(q,a) C E(a and, for some R, [X(qa)]R Bk(ja)' then set X(qpsa) = [X(qsa)]R. Then X(Aa) C Bk(j,a)so that ((Aj)] (Bk]) is a partition pair. Also, taking (qo)A as the initial state of ->', and setting 5(qp) = 5(q) for all states qp of M, we find that all the conditions for a homomorphism are met, so that'27X realizes -l Example 3~ Let X be given by the following table:

135 a b 1 4 Q 3 4 3 4 3 3 Then the ordered pair (q,0) = ((123, I, 2 ~), (1, 24)) is a cover pair. Performing the construction given in the proof of Theorem 2, the following realization >7S of 2fZ is obtained. a b A 1A B 2A 4A 2A 5A 4A 3A 3B4B 5^ 2B 4B 2A __ B AC 5 30 4B 3A The following is a partition pair (a,T) for 72~: ((lA 2A 3A' 3B 4 2B 3C 4B' (1A 4A' 2A 2B 3A 5B 3C 4BI3 The homomorphism from'76' onto 7? is as follows: 1A, 2A 2, 2, A 3 3' 3B 3, 3 3 4A - 4, 4 -e 4. The images of the blocks of a are the blocks of cp, and the images of the blocks of T are the blocks of 0.

136 Since the cover pairs of.7~ determine its state-assigned realizations with reduced dependence, it is of interest to consider the algebraic structure of the set of all cover pairs of -29. We shall also be interested in the structure of the set of partition pairs of n-~, which determine the existence of reduced dependence in state-assigned machines isomorphic to.t. Let (Dj. and (Ek) be covers, and let (F } and (Gm be collections of subsets of Q such that each of the sets FR is a subset of some set Dj, and each of the sets G is contained in a set Ek. Then it is easy to m k check that either all of the following are cover pairs, or none are: ((Dj), (Ekl), ((Dj, Ek (E U (Gm) U (FD U F, Ek)), and ((Dj) U (FJ, (Ek) U (Gm})) Thus, in characterizing the cover pairs of -72i, we may as well consider only those covers in which no set is included in another. Accordingly, we are led to the following definitions. A set system is a collection (D1, D2, o oo Dt) of subsets of Q such that t U D. = j=l J and, if D. CDo Y then j = j2~ If (Si) is any collection of subsets of Q, let Max ([Si)) denote the collection consisting of those sets in (Si) that are not properly contained in any other set in (Si). Then, if (Dj. is a cover, Max ({Dj)) is a set system. A systems pair is a cover pair in which both covers are set systemso The set systems associated with Q may be partially ordered as follows: by < if every block of cp is contained in a block of cp2.

137 Lemma 1 The set systems associated withQ form a distributive lattice. PROOF: The lattice operations are defined as follows: glb(cp1'cp2) = P cP2 = Max (B n B' | B e ( and B' E cPr lub(cpl'cp2) = 1 + cP2 = Max (B IB e cplor B E cp}2] We omit the simple verification that these operations define the greatest lower bound and least upper bound of cp and cp2, and that the distributive laws hold: P (C2 + CP3) = ( P' (P) + (P' C3) Pi + (2 ~ P3) = (P + 2)' (Pi + P3)' The universal lower bound 0 for the finite lattice of set systems has each element of Q in a block by itself, and the universal upper bound I = (Q}. Lemma 2 If (OlP cpl) and (cP2 cP) are systems pairs, then ( P1 P2' 2P 3 ~ cp) and (P! + cp2' ci + cp) are systems pairs. PROOF: Any element of c1 C2 is of the form B n C, where B is a set contained in cp1 and C is contained in cp2 Since (c1' cp) and (c2', 2) are systems pairs, it follows that, for any a E A:, there exist sets B' e qcp and C' C cP. such that l(B,a) CB',

138 and \(C,a) C Ct Therefore, \(B n C, a) C B' n Ct, and, since B' n C' is contained in some block of cp1 cp2, the conditions for (cp1 cP2 cp{' cpa) to be a systems pair are satisfied. Any element B cp1 + p2 is either an element of c1 or an element of cp; then (B,a) C B', where Bt is either an element of cp or an element of c2 Accordingly, Bt is a subset of some block of cp + cp, and \(B,a) is contained in this block, so that the conditions for (P1 + cP2 cP1 + cPT) to be a systems pair are satisfiedo Lemma 3 For any set systems cp and cp' (cp, I) and (0, cp') are systems pairs. Hartmanis and Stearns have introduced an abstract algebraic structure which captures the essential properties of the set of all systems pairs. Let L1 and L2 be finite lattices, and let A be a subset of L1 X Lo Then A is a pair algebra on L X L if 1 2 1 2 (a) (xyl) c A and (x2,y2) c A = (Xl o x2 y y2) e A and (x1 + x2, y + y2) e A.

139 (b) If I is the universal upper bound of L2 and 0 is the universal lower bound of L1, then for any x C L1, (x, I) e A, and for any y C L2, (O,y) c A. Theorem 3 Let L be the lattice of set systems of Q, and let A C L X L be the collection of all systems pairs for 2 t Then A is a pair algebra. The proof of this result is a direct consequence of Lemmas 2 and 3. Before going on to study the properties of pair algebras, let us establish that the set of partition pairs associated with a machine 2 is a pair algebra. Let a and a' be partitions of Q. Then a < a' if and only if every block of a is contained in a block of a'. Lemma 4 The partitions of Q form a lattice. PROOF: The lattice operations are defined as follows. The blocks of glb(ala2) = a1 a2 are the set intersections of the blocks of a1 with the blocks of a2 The states ql and q2 are in the same block of lub(al,a2) = a1+ a2 if and only if there is a sequence zzlz2,.. z of elements of Q such that zl=ql, z =q2, and for i=1,2,...,n-l, zi and zi+ are either in

140 the same block of a1 or in the same block of o2. The verification that the operations just defined yield the greatest lower bound and least upper bound of a1 and a2 is essentially the same as the corresponding step in the proof that the right congruences over A form a lattice (Theorem 5-3). The following lemmas will establish that the partition pairs of 7%b form a pair algebra. Lemma 5 If (o19T1) and (c2rI2) are partition pairs, then (d1. 29 T1 T2) and (d1 + 12 I1 + T2) are partition pairs. PROOF: Every block of at ~ 2 is of the form B n C, where B is a block of ol and C is a block of a2. Then %(Ba) C B', and X(C,a) C C, where Bt is a block of T1lock of is a block of T2; thus k(B n C, a) C Bt n ct, and Bt n Ct is a block of T1 ~ T20 This completes the proof that (c1 ~ o2' T1 T) is a partition pair. If ql and q2 are in the same block of al + cr2 then there is a

14l sequence Zlz2,* z n such that Z1 = z = Zn q2' and for i=l,2,...,n-1, zi and z+l are either in the same block of Cl or in the same block of a2. For any given input symbol a, consider the sequence (\(zl,a), k(z2a)..., k(zn,a)). If z. a1 zi, then (zia) 1 %(zi+la), and if zi a2 z+, then (zina) 2 (zi+la), since (r1,T1) and (2rT2) are partition pairs, Since Zi = ql and qn =q2' we have proved that, if ql and q2 are in the same block of a1 + c2, then for any a, k(ql,a) and \(q2,a) are in the same block of T1 + T2. This completes the proof. The universal upper bound I of the lattice of partitions of Q has a single block containing all the elements of Q. and the universal lower bound 0 has each state in a block by itself, Lemma 6 Let a and T be partitions of cp. Then (a, I) and (0, T) are partition pairs. Combining Lemmas 5 and 6, we may state the following theorem.

142 Theorem 4 The partition pairs associated with a machine 79 form a pair algebra. We shall next derive some properties of pair algebras in general. Once this has been done, we shall consider how some of the results apply to the class of systems pairs and the class of partition pairs. In what follows we assume that A is a pair algebra on L1 X L2. Lemma 7 If (x,y) C A and x1 < x, then (x',y) c A; if (xgy) C A and y < y' then (x,y) C A. PROOF: Since (x,y) e A and (xs,I) e A, (x~xX9 yI) - (x,y) c A, Since (x,y) C A and (O,y') C A, (x + 0, y + y) =(x,y) C A. For any set S of lattice elements, let iS denote the greatest lower bound of S, andES, the least upper bound of S. For x C L1, let m(x) be defined as n (y I (x, y) C A), and for y e L2, let M(y) be defined as z2 xi I (xiy) E A)} The interesting properties of pair algebras all seem to involve these operators. In deriving these properties, we can cut our work in half by noting a certain duality~ If L is a lattice with ordering relation

143 <, then a dual lattice L is obtained by taking the elements of L with the ordering relation > defined as follows: y > x x < y. Then the ~ operation in L is the + operation in L, and the + operation in L is the' operation in L. Universal upper and lower bounds in L, if they exist, are respectively universal lower bounds in L. If A is a pair algebra on L1 X L2, then a dual pair algebra A on D D L2 1 is obtained as follows: D (y,x) C A A=# (x,y) E A. D D D Let the operations m and M in A be denoted m and M, to distinguish them from the corresponding operations in A, which we continue to deD note as m and M. Then the functions m and M are identical, and the functions MD and m are identical. Thus, any theorem T about the class of all pair algebras applies to both A and A, and when T, as it apD plies to A, is translated into a statement about A, a dual theorem T is obtained. The properties given in Table 1 are listed in dual pairs.

144 (i) (x,m(x)) c A (M(y),y) C A (ii X () m(x) y 2 = M(yx) < mM(y2) (iii) m(xl+x2) = m(xl) + m(x2) M(y1 y2) M(y) M(y2) (iv) m(xl.x2) <m(xl) m(x2) M(y1) + M(y2) < M(yl + y2) (v () ym(x ) < y < A x < M(y) < = (x,y) e A (vi) x < M(m(x)) m[M(y)] < y (vii) m(M[m(x)]} = m(x) M(m[M(x)]} = M(x) TABLE 1 We shall prove that the first of each dual pair of properties holds. (i) Let y2,Y.y ~, n be the elements Yi of L2 such that (x,Yi) C A. Since (x, y) e A and (x,y2) e A, (x.x, Y1Yy2) = (x, y1y2) C A. Repeating this argument n-1 times, (x,yl. Y2,..' y = (x,m(x)) C A. (ii) Since (x2, m(x2)) E A and x1 < x ^ (x1, m(x2)) e A; since m(xl) < y whenever (xlY) e A, m(xl) < m(x2). (iii) Since (x1, m(x1)) E A and (x2, m(x2)) C A, (X1 + X2, m(xl) + m(x2)) E A; therefore, m(x1 + x2) < m(x ) + m(x2) On the other hand, since x < x1 + x2 and x2 < x + x2, m(x,) < m(x, + x2) and m(x2) < m(x1 + x,);

145 therefore m(xl) + m(x2) < m(x1 + x2). Combining the two results, m(x1) + m(x2) = m(x1 + X2). (iv) Since (x1, m(xl)) C A and (x2, m(x2) e A, (x1 x2, m(xl) m(x2)) c A; therefore m(xl x2) < m(x1) ~ m(x2). (v) If (x,y) E A, then m(x) is a lower bound for y. Conversely, if m(x) < y, then since (x, m(x)) E A, we know by Lemma 7, that (x,y) E A. (vi) Since (x, m(x)) E A, the result (v), in dual form, ensures that x < M(m(x)). (vii) Since x < M(m(x)), m(x) < m(M(m(x)). But, by (vi), in dual form, taking y as m(x), m(M(m(x)) = m(x). It is interesting to note that the functions m and M establish a D Galois connection between the lattices L1 and L2. In general, a Galois connection between two partially ordered sets S and T is a pair of functions into a: S - T into S and T: T -4 S

146 such that: (a) xl < x2 = (x2) < o(x1) (xl,x2 E S) (b) Y1 < Y2 = T(Y2) < (1) (YlY2 c T) (c) x < T a(x) (x C S) (d) y < a T(y) (y c T) It is evident that, with appropriate allowance for dualization of L 2 properties (a) and (b) correspond to our identity (ii), and (c) and (d) correspond to (vi). An element (x,y) c A is called an Mm-pair if y = m(x) and x = M(y). By virtue of (vii), the following statements hold: for any x c L1, (Mm(x), m(x)) is an Mm-pair, and for any y c L2, (M(y), mM(y)) is an Mm-pair. As the following useful lemma shows, the Mm-pairs associated with a pair algebra A determine the structure of A completely. Lemma 8 The ordered pair (x,y) is an element of A if and only if there is an Mm-pair (x',y') such that x < xf and y' < y. PROOF: The sufficiency of the condition is obvious: (x',y ) c A == (x,y) c A. To prove necessity simply note that, if (x,y) c A, either (Mm(x), m(x)) or (M(y), mM(y)) can serve as the required Mm-pair (x',y).

147 If (xl,yl) and (x2,Y2) are Mm-pairs then, clearly, x1 < x2 > Y1 -< Y2 Therefore, the Mm-pairs are partially ordered in a natural way: (x1, Y) < (X2,Y2) if and only if xl < x2 Lemma 9 The set of Mm-pairs forms a lattice in which glb((xlYl), (x2,y2)) = (Xl X2, m(Xlx2)) and lub((xl,yl), (x2,y2)) = (M(Y1 + Y2), Y1 + Y2). PROOF: Since x1 = M(m(xl)) and x2 = M(m(x2) ) x ~ x2 = M(m(x1) o m(x2)) Since m(x1 x2) < m(x1) m~ (x2), M(m(x o x2)) < M(m(xl) m(x2)), so that M(m(xl < x2)) < x1 x2. On the other hand, since (xl o x2, m(xl x2)3 c A, x1 ~ x2 M(m(x x2))o Therefore

148 x1 ~ x2 = M(m(xl "x2)), and (Xl'X2, m(xlox2)) is an Mm-pair, and a lower bound for both (xlvl) and (x2,Y2); it is also the greatest lower bound since (x3,y3) < (xlY1) and (x3,y3) < (x2,Y2) implies x3 < x1 ~ x2. By a dual proof, it is verified that lub((x1,yl), (x2Y2)) = (M(Yl + Y2), y1 + y2). Let A be a pair algebra on L X Lo An element x c L is selfsufficient (with respect to A) if and only if (x,x) C A. Then x is self-sufficient if and only if m(x) < x < M(x), In particular, the elements 0 and I are self-sufficiento A nonempty subset R of a lattice L is a sublattice of L if a,b c R = abb C R and a + b c R, where. and + are the meet and join operations for L. Note that it is possible for R to be a lattice, and yet not be a sublattice of L. For example, the partitions of an n-element set S form a lattice, but for n > 3, the partitions of S do not form a sublattice of the lattice of set systems on S. Lemma 10 The self-sufficient elements with respect to A form a sublattice of L.

149 PROOF: If x and y are self-sufficient then (x,x) c A and (y,y) c A. Therefore, (xoy, x.y) c A, and (x + y, x + y) c A, so that x y and x + y are self-sufficient. Let us now apply some of the foregoing general results to two particular pair algebras::, the pair algebra of parttiton pairs of a finitestate machine, and -, the pair algebra of: systems: pairs of Yo. It is immediate that the self-sufficient elements of H are the partitions with S.P., and, analogously, that the self-sufficient elements for - are the set systems with S.P.: a set system (Bj) has S.P. if, for all j, and for every input symbol a, there is a k such that 3(Bja)c BkO Let us consider how the functions m and M are determined for II The equivalence relation m(a) is to be chosen as "fine" as possible, subject to the condition that (o,m(a)) must be a partition pair. Thus, we are led to the following two-step specification of m(c): (i) Let the symmetric, reflexive relation t(ao) be defined as follows: ql ((o) q2 if and only if';:= - 2.. or ithere exist qi. and qj.C Q, and a e AI, such that qi cr qj (qi,a) = ql, and 7(qja) = q2 (ii) m(o) is the transitive closure of t(oa); i.e. ql m(o) q2 if and only if there is a sequence of states zl, Z2,.,zn such that ql = z l q2 = z, and, for i=l,2,..,n-l, z i( ) i^l

150O It is clear that the equivalence relation specified by this construction is equal to m(a); for two states are equivalent in this relation if and only if the definition of a partition pair, together with the condition of transitivity, implies that these states are equivalent in any equivalence relation T such that (c,T) is a partition pair. The function M(T) associated with the pair algebra I is also easy to specify: ql M(T) q2 if and only if, for all a, \(qla) T K(q2,a). This equivalence relation is clearly the coarsest of all the equivalence relations a such that (a,T) is a partition pair. Example 4, Consider a machine -' with AX = (al,a 2, Q = (1,2,3,4,5,6,7, and the transition function A specified by the following table. A a1 a2 1 6 5 2 6 2 3 3 5 4 1 2 5 2 1 6 4 7 7 4 7

151 Let the equivalence relation a be specified by the following partition: (T 23 5-67). The images of the blocks of this partition under the inputs a1 and a2 form the following set of blocks: ([ 3T 2 -; 2- 2 1-7). Then q i () cqj if and only if qi and qj occur together in one of these blocks. Therefore, m(a), the transitive closure of c(a), is given by the following partition: (17, 245, 3-}. Let us now compute M(m(a)). Denoting the blocks 17, 245, 36 by A, B, and C, we can form a table which specifies, for each state q and input a, the block which includes \(q,a). This table is as follows: A, QT Q2 1 C B 2 C B 3 c B 4 A B 5 B A 6 B A 7 B A Then two states are equivalent in M(m(r)) if and only if they correspond to identical rows in the table. Therefore, M(m(a)) is given by the partition (123 T 57], and the ordered pair ((123 I 56], (T7- -37:) is an Mm-pair. The function m(a) associated with the pair algebra = is quite easy to specify: given a = (Bj}, m(a) = Max ((B(Bj)) U 0). The determination of M(T) is a bit more complicated. Let T = {c,},

152 and suppose that A, = (ala2 *.,,am}; for any m-tuple (kl,k2, o.o,km) of indices of blocks of T, let Bkk2.o k = i)j I ): Ck Then, if (Bk k.. } is the collection of all sets obtained in this Then, if [B~l'k )kl' k2Y... way, M(T) = Max(Bk k 2 ee m Example 5. We continue to consider the machine introduced in Example 4. Let a = (12 -I-467). The images of the blocks of a under the inputs a1 and a2 form the set (1 - 12- 25 127). Therefore m(a) = ( 1 25 127). Let the blocks of m(a) be denoted as follows: v = Ca1 124 = c0, 25 = C39 127 = C40 As a first step in computing M(m(a)), we form a table giving, for each state q and input symbol a, the indices of all blocks of m(c) which include x(q^a)

153 A, a1 a2 1 1 3 2 1 2,3,4 3 1 3 Q 4 1,2,4 2,3,4 5 2,3,4 1,2,4 6 2 4 7 2 4 From the table we see that B1 = 134 1,3 that B 4 = 4567, 2,4 and that every set B k is contained either in B1,3 or in B2 4 Therefore M(m(a)) = {i234, 4567), and the ordered pair ((1234, 4567}, (136, 124, 25, 127)) is an Mm-pair in the pair algebra The next computational problem we consider is that of determining the lattices of Mm-pairs for the pair algebras Tr and - associated with a finite-state machine 7t. For this purpose we shall need a few elementary lattice-theoretic concepts. An element a of a lattice L is meet-irreducible if a cannot be expressed as boc, where b and c are different from a. Similarly, a is join-irreducible if a cannot be expressed as b+c, where b and c are different from a.

154 Example 6. For the finite lattice shown below, the set of join-irreducible elements is (1,2,3,5,6}, and the set of meet-irreducible elements is (4,5,6,7). 7 5 l 6 2 3 Lemma 11 Any element of a finite lattice L is a join: of join-irreducible elementso PROOF: Let a be a least element of L which is not a join of joinirreducible elements. Then a is not join-irreducible, so a = b + c, where b f a and c / a, Clearly, b and c cannot be joins of joinirreducible elements. But if, for example, b is not such a join, then since b < a, the minimality of a is contradicted, Corollary 1 Any element of a finite lattice L is a meet of meet-irreducible elements. Now, suppose that AC L1 x<L is a pair algebra. Let the set of join-irreducible elements of L1 be (al..oooar)} and let the set of meet-irreducible elements of L2

155 be (bl,...,b. Then any element x e L1 is equal to 1 - 1 (ai j a. < x) and m(x) = m( E ai). (a. a. < x} But, from identity (ii) of Table 1 it follows that m( Z ai) = m(ai). (ai a. < x} (a. I a < x) Thus, every element of L2 of the form m(x) is a join of elements of the form m(ai), where each element a. is join-irreducible. By duality, any element y e L2 is equal to (b J|y <b) } aj -aj and M(y) = M(b ). (b y < b} Two algorithms for computing the Mm-pairs associated with a pair algebra A now suggest themselves; the first of these makes use of the join-irreducible elements of L1. The second algorithm, whose application to A is equivalent to the application of the first algorithm to A, uses the meet-irreducible elements of L2. An outline of the first

156 algorithm follows: (1) Determine the join-irreducible elements al,a 2,...,ar of L1; (2) Find m(al), m(a2), o oo m(ar); (3) Form all the distinct joins of elements m(ai); (4) For each element Z m(ai), form the Mm-pair (M(m(Z ai)), m(E ai))o This algorithm yields all the Mm-pairs contained in A. By Lemma 8, the Mm-pairs determine A completelyo To assess the usefulness of the two dual algorithms in determining the Mm-pairs of the pair algebras it and -, let us characterize the join-irreducible and meet-irreducible elements of the lattice of partitions of a finite set Q with n elements, and the lattice of set systems associated with Q; these lattices will be denoted, respectively, P(Q) and S(Q) Let I and 0 denote the universal upper and lower bounds of P(Q). Also, let aab denote the partition with one block equal to (ab), and each other block containing only a single element. Finally, for any proper subset R C Q, let cR denote the two-block partition having R as one block, and the complement of R in Q as the other, Thus, Gab / [a,b). Lemma 12 An element a C P(Q) is join-irreducible if and only if = 0 or a = aab, where a E Q and b e Q.

157 PROOF: 0 is join-irreducible, and the only elements a such that a < aab are 0 and aab, so that aab is join-irreducible. On the other hand, any partition a is equal to I ~ab' ab I ab < C} so that no elements except the specified ones are join-irreducible. Lemma 13 An element a E P(Q) is meet-irreducible if and only if a = I, or a = ao, where R is a proper subset of Q. PROOF: This proof parallels the previous one. I is meet-irreducible, and the only elements a such that aR < a are I and aR, so that aR is meet-irreducible. On the other hand, any partition (cR J a < aR) so that no elements except the specified ones are meet-irreducible. Lemma 14 An element cp C S(Q) is join-irreducible if and only if cp contains at most one block with more than a single element.

158 PROOF: The universal lower bound of S(Q) is join-irreducible, and the join of any two incomparable elements of S(Q) necessarily contains at least two blocks with more than one elemento On the other hand) any element cp of S(Q) is the join of all those elements 0 such that 0 < cp and 0 has only one block with more than one element, Lemma 15 An element cp c S(Q) is meet=irreducible if and only if each block of cp has at least n4l elements. PROOF: Let cp contain a block B. with less than n-l elements, and let J 1 k and k2 be elements of Q not in B.o Let cp be obtained from ep by replacing B. with B, U (kLj and let cp be obtained by replacing BJ J J ~ with B. U (k I Then j 1 2 Therefore, in order for ep to be meet-irred.ucible every block of cp must contain at least n-l elements~ 0n the other hand, let cp and P2 be two incomparable set systems each having n-l elements in every blocklo Let B. be a block of p ci ot contained in c2, and let B. be a block of p2 not in p1 o Then B F B. B is a block of T1 o cp and B. n Bo has n-2 elementso Therefore3 an.y set system n which every J1 02 block has at least n-l elements is not a meet of incomparable set systems, and every such set systeml is therefore meet-irreducibleo This completes the proof,

159 Thus, we see that P(Q) has (2)+1 join-irreducible elements, and 2n- meet-irreducible elements; S(Q) has 2n-n join-irreducible elements, and 2n -n meet-irreducible elements. Thus, it seems practical to obtain the Mm-pairs of the pair algebra iT using the join-irreducible elements of P(Q), but the determination of the Mm-pairs of -, using either the join-irreducible or meet-irreducible elements of S(Q), seems impractical except in the case of machines with a very small number of states. Example 7. Consider a machine mf with Q = (0,1,2,3), A = (al,a2,a3), and the following transition function A1 1 3 0 2 2 2 3 0 3 2 1 2 m(0) = 0 m(ao) = 0- 2 m(o02) = 23' m(ac03) = I m(al3) = O2 m(al3) = 1-23 m(23) = 02 13 No additional partitions are obtained.as joins of the given ones. The

16o Mm-pairs are as follows: (0,0), (51 23, 02 1 3), ( (), (, I), (012 3, 023 T) (2 13, 01 23 ), and (01 23, 02 -13) A state-assigned machine isomorphic with V~, and with the set of states R1 X R2, where R1 = R2 = (a, D), can be so constructed that Pi{} = 301, and {2} = 02 13, using the following encoding of the elements of Q: 0 <-~ (a,a) 1 - (aP) 2 (, a) 3 - (P sP ) Then (P(1)' P (2 ) and ( (2, p (1} ) are both partition pairs. Accordingly, the second coordinate of the next state can be determined from the input symbol and the first coordinate of the present state; also, the first coordinate of the present state can be determined from the input and the second coordinate of the present state. Even for this simple example, the determination of the Mm-pairs of - is arduous, and we omit it. The next problem we consider is that of finding the self-sufficient elements of a pair algebra A C L X Lo For any element x c L, let x = [(x Ix < xK and (xk,xk) c A};

:161 A then x is the least self-sufficient element having x as a lower bound. We shall give an algorithm for determining x give-n x,' on: the: assumption that, given any element y E L, m(y) can be computed. We begin by noting that an element y is self-sufficient if and only if m(y) < y, or, equivalently, y + m(y) y Now, let x be self-sufficient, and suppose z < x. Then m(z) < m(x), so that z + m(z) < x + m(x) = x. Then, unless z is self-sufficient, z < z + m(z) < x. Therefore, x may be computed by constructing a sequence (1) x(2), (h) where (1) X (j+l) (ij) + m(x ) x +m(x() and h is the least value of j such that xi-1) = xi); such a value of h must occur, since X (1) < X (2) < e< (j) < (j+l) < and L is a finite lattice. By a dual argument, y, the greatest self-sufficient element less than or equal to y, may be obtained by applying the following iteration process:

162 y1 = y; forj l, y+) = y M(y()) One special case of this iteration process was useQ in Section 7, where an algorithm was given for determining the indistinguishability relation associated with a sequential machine C9- Problem 7-5 characterizes this relation as the coarsest equivalence relation with S. P. which refines the relation p given by q1pq2' > 6(ql) 6(q2) The algorithm given in Section 7 determines p from p, using the iteration process given above, specialized to the case of the pair algebra 7To Exampl e 8 Let t be a machine with Q = {0, 1,2,3, 4,5,6,7}, AI - {aa2 a3 a a 5}, and the following transition function. AI a a a a a 1 2 a3 4 a5 0 1 0 4 7 2 101 5 6 3 2 3 2 5 5 0 3 2 3 4 4 1 4 4 5 2 3 6 5 5 4 3 2 7 7 6 2 3 05

163 We consider the pair algebra a, and choose x = o06' The computaA tion of x is as follows. j xj) m(x() 3 017; 0167 __ The result of the iteration is that x = 016 2345. For any join-irreducible element a. of L, let a(1) denote ai, and for j > 1, let (3+1) = (j) + (J) a. a +. m(a ); A also, let a. denote the least self-sufficient element greater than or equal to a.. If (i a. < x} then x(l) - i (1) (i a. < x)

164 x(2) x() + m(x(l)) a ) + m(a (i | ai < x} (i ai < x) E, Z a(1) + m(a(1)) {i I a i (< i aI < x} a() + m(a ) { i ai < x}! (2) (i~ a. xa (i ai < x) Similarly, for all j, x(J E a(j); o (i | a < x} therefore, A V~ A x= a.o (i f a, < X} 1 Since every self-sufficient element is equal to x for some x, the lattice of self-sufficient elements may be generated by taking all possible joins of elements a.o 1

165 Example 9. For the machine given in Example 8, the equivalence relations with S.P. of the form.i, where a. is join-irreducible are deter1 mined as follows: a. a. 0 0 12,"34[,c56,c78 12 37 7 ~14,~23' 58,'67 A 3 5'7 17',18', 27' 28,'35',36,f45,'46 12-78 35 13' 24' 57, 6 1234 5678 al'a1l6,'25,'26' 37' 38' a47' 48 In this case, no further equivalence relations with S.P. are obtained by taking joins of the elements a.. The lattice of equivalence relations with S.P. for the given machine is therefore given by the following diagram. I 1278 3 12 3 57 1 23 57 7 0

166 If A is a pair algebra over L X L, then the operations m and M each map L into Lo Let m[ (x) be defined recursively in the usual way: m ((x) m(x); jn]/ = (m[n,1]/ m "(x) = m(m (x)), n=2,3,.oo; let M[n] (x) be defined similarly. Lemma 16 For n=l,2,..oo x <M[n](m n](x)). PROOF: We proceed by induction on n. Property (vi) in Table 1 on page 144 establishes the result for n=l. Assuming as an induction hypothesis that the result is true for n=k, consider M[k+l]m[k+l](x), which is equal to M[ (M(m(m] (x)))). But, again using property (vi), m[k](x) < M(m(m[k] (x))); and, by repeated application of the dual form of property (ii), M[k] ([k] (x)) M[k] (M(m(m[k] (x)))) = M[k+l](m[k+l](x)) By the induction hypothesis, x <M[k] (m[k](x)) <M[k+l] (m[k+l](x)). This completes the proof. The following corollary is obtained by duality.

167 Corollary 2 For n=l,2,o.. mn] (Mn](y)) < y Corollary 3 m[ (I) = 0 if and only if M[] (O) = I PROOF: Suppose m (I) =. Since I < Mn (m (I)), I < Mn] (O). This is possible if M[](O) = I Therefore m[n] (I) = o M[n]() = By duality, the reverse implication also holds. The result given in Corollary 3 is applicable to the problem of testing whether a transition system is n-definiteo A transition system T = (AI, Q, q0o ) is said to be n-definite if, for any two states q and q2, and any word y such that lg(y) > n, \(qJy) = X(q2,y)o For any word y C AT, let

168 L(y) = ({(q,y), q c Q}. (k) For k=0,l,2,..2, Let C be the following set system: C(k) = Max ((L(y) | lg(y) = k) U 0). Then T is n-definite if and only if c(n) _ oO C = 0. Also, C(k+l) = Max ((L(ya) | a E A1 and lg(y) = k) U 0) = Max ((x(L(y),a) | a EA and lg(y) = k) U 0). Therefore, c(k+l) = m((k) where'm' is the m operation in the pair algebra =. Since (O) = I C(n) _ m[n](i) Therefore, T is n-definite if and only if m[n]) = 0 Applying Corollary 3, we now immediately obtain a second characterization of n-definiteness; T is n-definite if and only if M[ ]() = I Let us review what has been accomplished so far in our discussion of the state assignment problem. First, we defined the relation 7"' realizes 72 ", and defined a state-assigned machine 7t/ as one in which the state set Q is a subset of some Cartesian product RI >< R X o X R o We observed that i a state ssigned mchine We observed that, in a state-assigned machine.7, the transition

I69 function \ can be represented by s partial functions \l,'2,'~~'s\ where k(rlr2,...,r,a) has its domain in R1 X R2 X... X R X AI, and has Rk as its range. For any set S C (l,2,...,s}, the equivalence relation PS on Q was defined, and it was established that, given S C 1(,2,...,s) and T C (1,2,..os}, all of the functions k, k e T, are determined by the arguments ri, i E S, together with the argument a, if and only if (pS,PT) is a partition pair. Next, it was shown that, if (a,T) is a partition pair for 72', a realization of 17), then the homomorphism h: Q - Q, maps the blocks of a onto the blocks of a cover cp, and the blocks of T onto the blocks of a cover 0, such that (p,0) is a cover pair for?2 Conversely, it was shown that, given a cover pair for 72, one could construct a realization 72 of 7rt having a partition pair (a,T) such that the homomorphism h maps the blocks of a onto the blocks of cp, and the blocks of T onto the blocks of 0. Thus, the cover pairs of T27 determine, and are determined by, the partition pairs for the realizations of 7 o Systems pairs were then introduced, and the systems pairs for'7

170 were shown to determine the cover pairs for p"1. With appropriate lattice ordering relations for the partitions of Q and the set systems associated with Q, the sets of partition pairs and systems pairs for a machine "2t' were seen to form pair algebras. The properties of pair algebras were then studied. Of particular importance were the functions m(x) and M(y), the lattice of Mm-pairs, and the lattice of self-sufficient elements. An approach to the determination of the Mm-pairs and the selfsufficient elements of a pair algebra was then given. These procedures appear to be fairly satisfactory in the case of the pair algebra n of partition pairs, but unsatisfactory in the case of the pair algebra - of systems pairs. We shall now consider the problem of constructing state assignments with reduced state dependence, once the partition pairs and systems pairs of'7t have been determinedo To simplify matters we shall restrict our attention to state-assigned realizations of -2.'which are isomorphic with ~7 o Thus, if. = (AX Q, A0o to \, 5), where Q CR >< R2 X... X R 1 2' the homomorphism from 7 onto 72 is determined by a function ho Q -Q which is one-to-one and onto. If p is a partition of Q, then h determines a partition h(p) of Q; the blocks of h(p) are the images under h of the blocks of p. Clearly, (P1lP2) is a partition pair for 7~ if and only if (h(p), h(p2)) is a partition pair for 2't

171 In order to obtain a well defined combinatorial problem, we shall further restrict consideration to p-minimal state assignments. For any integer p > 2, a p-minimal state assignment is one in which Q R1 X R2 X... X R where (i) for each i, Ri < p, where'J R. |' denotes the number of elements in Ri; and (ii) s = F logp | Q |1, where Frx denotes the least integer greater than or equal to x. Thus, in particular, 2-minimal state assignments correspond to sequential circuits which have the least number of latches of any circuits associated with realizations of 7to, Given a collection ((0i, N(i)} of partition pairs for 772, let (a, 2,... } = { (i i )L i Then, given p > 2, we may ask whether there exists a p-minimal state assignment which "uses" all the given partition pairs, in the following sense: for each index j, there is a set S. such that a. = h(PS ). where, of course, pS is the partition induced on Q by those coordinates J of the state assignment with indices in S.. J In order to discuss this question, we shall make a few definitions and elementary combinatorial remarks. Given a partition a of a finite set, let b(a) denote the number of blocks of a, and let c(a) denote the

172 largest number of elements in any block of a. If a < p, let p/c denote the following partition of the set of blocks of a: B. and B. are elements 1 J of the same block of p/a if and only if they are subsets of the same block of p. Example 10. Suppose p = 12-34 5 and a = 123 5 Then b(p) = 2, c(p) = 4, b(a) 4, c(a) = 3~ Also, p/a has the following two blocks: (123') and {( ~7}; b(p/c) = 2, c(p/a) = 2. Lemma 17 Let a and p be partitions of Q, with a < po Then the minimum value of b(T), over all partitions T such that pT = a, is c (p/c) o PROOF: Let T be such that pt = a. Then any two blocks of a contained in the same block of p must be in distinct blocks of T; therefore, b(a) > c(p/ac),

173 To construct a partition T such that pT = a and b(T) = c(p/a), order the elements (blocks of a) of each block of p/a. Then, for 1 < i < c(p), let the i-th block of T be the union of all those blocks of a which are i-th elements of blocks of p/a. Corollary 4 Let a be a partition of Q. Then the minimum value of b(T), over all partitions T such that rT = 0, is c(a). We also remark, that, in any state-assigned machine, and for any set S of indices, Ps = j P i} ieS Since, for any two partitions a and T, b(acT) < b(a)b(T), b(ps) <TT (p i}) ieS In the particular case of a p-minimal state assignment, b(pri) _P,

1714 so that b(ps) < p I Also, if S is the complement of S in the set [1,2,...,s), then PS. = P1,2, p.s = } 0 Therefore pP 1 >b(p) > (pS) Accordingly, b(PS) c c(PS) < p S I s +l P < pr lgplIl 1 If a h(Ps)= then b(a) = b(ps) and c(a) =- (ps)o Therefore, unless b(a) c(c) < p rFlog Q 1 a is not used in any p-minimal state assignment. Since b(ps) < p and c(PS) < I F logp b(pS)1 < IS and log c(p ) i < i S We shall show that these inequalities must hold as equations. Assuming

175 that one of the inequalities is:strict, we have the following: f logpl^ l 1 = |S + Sf > Flog b(ps)1 + flog c(ps)1 Then [logp I I j - rlog b(Ps) 1 + [ogp C(ps)1 and Flog1 I - 1 log b(ps)l + flogI c(pS) 1 p > > b(PS) ~ c(PS) > IQK But r logp 1 Q -I < iogp Q so that a contradiction has been reached. Summing up,9 we have the following resulte Lemma 18 If a is a partition of Q and' has a p-minimal state assignment such that a h(PS ) then S| = [flog b(ps)] and rlogp b(ojl + [log c(c) = Q l p ~ + oPc(a

176 Example 1io Consider a sequential machine having the following transition functiono a a2 a3 1 3 6 1 0 2 47 1361 2156 3042 6 l o 4 7og2Q 3 This machine has the partition pairs (CT^) 2 = (1 2 5 537; 0 5I 7 ) and (ar y ) (75 17 7 21; 3W o0-126) Let us determine whether' has a two-minimal state assignment such that, for appropriate sets S,2 h(Ps L i i=1,2,3. Since rlog2IQ = 1log28 = 3, the assignment is such that QC Q X>< R X R, where, for i=121,3, I Ri = 2 To check that the conditions of Lemma 18 are satisfied, we make the following tabulation,

177 7 =s rlog2b(a) 1 rlog2c()1 F | log2b(i. + Flog2c( ai) 1 2 1 3 2 2 1 3 3 1 2 3 The condition that rlog2 b(i) + rlog2 c(ci) = 3 is satisfied for all i, and the quantities |Si I have been determinedo Before continuing this example, we need the following combinatorial lemma, which is a generalization of the well-known principle of inclusion and exclusion. Lemma 19 Let S1,S2,...S be subsets of a finite set U, and let T be a nonempty S-ubset of [1,2,...,m. Let S. denote the comJ plement of S. in U, and let T denote the complement of T in (1,2,.. m}. Then ns n S | = (_l)J R l +1 ) jT je R R C T jcT U R Example 12. Let the sets S1 S2, and S3 be represented on a Venn diagram, as follows.

178 S1 S3 Applying the lemma to the case T = (1,2), we obtain s1 n s2 n3 - I s3 1 + I S1 U S3 1 + I 2 U S3 1 - S u s2 U S3 This may be checked by putting a plus sign in a region of the Venn diagram for every set containing the region which occurs with coefficient +1, and a minus sign for every set containing the region which occurs with coefficient -1. S1 The plus and minus signs cancel out, except in the region S1 n S2 n S3, which has one more plus sign than minus sign. Now we may apply Lemma 19 to the set of partitions introduced in Example 11.

179 Example 13. We have already determined that IS11 = 2, IS2I = 2, and Is3 1. Also cr' 2 =0 h(p ) h(p2 ) S. S2 h (p 1 US Therefore IS S2 = | log2b(0) | = 3. Similarly, it is determined that I -U S31 = 2, IS U S3 = 3, and IS1 U S2 U S = 3. Applying Lemma 19, we find that Ism n s2 n s 31= i Sl n s2 n S3 = s. n s n s3 - 1, and [ s n s2n s 3 I = 1 I S n s2 n s I S n S n s2 I s n s n s3.W o With no loss of generality, we may index the sets R1, R2, and R in any

180 2-minimal state assignment using ola, 2, and a3 so that S I n s2 n s^ins-n s^ - (2), 1ln s2 -n 3 (23, and s n s2.'n 3 {3). Then S1 = (1,3), S2 = (1,23, and S3 = (3). Using the symbol T-. to denote h(p,i), we obtain 1 h(P(1,31) = h(pP)h(p(3)) 2 = T2' and d3 = 3' Thenr1 = T1 3 < T1' and d2 = 1 2 < 1' therefore, 1 + a2 = 0157 23T < T1, and either 1 = 0157 2346 or 1 =I. The latter case is clearly impossible, since the necessary condition of Lemma 18 would be violated. Forming the product T'TU3 we obtain c1;

181 if a1 were not obtained, it would be known that no 2-minimal state assignment using ([l,a 2, 3} exists. Finally, choosing for 2 a two-block partition such that'1' 2 = a2' we obtain 2 = 0546 1237. This determines the state assignment completely. The function h: R1 X R2 X 3 Q is as follows: r1 r2 r3 h(rl,r2,r3) 000 5 O 0 1 0 010 7 0 11 1 100 4 101 6 1 1 0 3 1 1 1 2 The function h determines the transition function for ib, the desired state-assigned realization of 7?t, as follows. AI a1 a2 a 001 111 101 010, 011O 101 011 111 011 000 101 110 001 100 111 100 001 101 110 000 001 111 100 101 011 001 100 010 000 111 101 It is easily verified that (P {,3}' P( 1,2) and (P(1,2' P(3)) are partition pairs for. Thus, the transition fn n function of

182 determines transition functions kl, k2, k3 for the individual coordinates such that k1 = l1(rl,r3,a) 2~ = \2(rlr3,a) 3 = \32(rl,r2, a).

11. LOOP-FREE DECOMPOSITIONS OF SEQUENTIAL MACHINES We shall next turn our attention to a special class of state-assigned realizations called loop-free decompositions. A loop-free decomposition of i7 may be reviewed as a realization of 7 by an interconnected set {^1', 2''''' * s} of machines such that, for any k, the inputs to'Y2k consist of the external inputs, together with the outputs of 1' 2''' 72 k-l The main theorem of this section is due to Krohn and Rhodes [13] and Zeiger [19]. It asserts that every machine has a loop-free decomposition into two-state identity-reset machines and machines whose function monoids are simple groups. Let A = (I, Q, Ao, q0', 5) be a finite-state sequential machine. A loop-free decomposition of Af is a state-assigned realization 7ft of ~ such that -7r =- (AI Q, AO qO', 5), there exist finite sets R1,R2...,,R such that Q C R1 X R2 X... X Rs, and for k=l1,2,o.,s P(1,2,..,k} is a partition with S.P.. Such a decomposition is strict if'7t is isomorphic with I. A loop-free decomposition of ~ is said to be a parallel decomposition if, for each k, Plk) is a partition with S.P.. 1185

.184 It follows that, in a loop-free decomposition, each coordinate transition function k is of the form k(rl,...rk,a). Accordingly, the first k coordinates of a loop-free decomposition 27 determine a sequential machine k = (A' Qk' Qk' (rl(qo)rr2(q0)..,rk(qo)), k' k) where Qk R1 XR2 ><... X R k(( 2rl, r2,a,,rk,a) =), and ck is an identity function. If ~ (9>k) is the transition system derived from 77 by disregarding outputs, and' ( 2k) is similarly derived from k then ~ ( P ) is isomorphic with (7Z)/P (1,2,...,k}' If S7t is a strict realization of 7X, then k ( ik) is also isomorphic with.. i(')/h(pn,2,.~l) It is also convenient to introduce the machines -1, %7 t2' to e 2k' which are the components of the loop-free decomposition; =k (A X >Qk-l' Rk, Rk, rk(qo),'k, pk)' where (r k(a,'rlr2'~'..,rk )) = Xk(rl r2,..,ro a) and pc is an identity function. If the loop-free decomposition is

185 parallel, then 1k is determined solely by rk and a. In this case, we define the parallel component k = (.A, iR, \, rjk(0)O) Pk)'T where O (rk,a) = \k(rka); in this case, <(~I) is isomorphic with ( )/pk]. If the input alphabet of i7 is (0,1 ), the output alphabet is (0,1 (r), and 79 has a loop-free decomposition in which, for each k, (k) Rk = {0,1) then there is a sequential circuit corresponding to 7rt which has the following form. s+l C ---— 1 x-2 -s, _i-C 1E -X1'w' ^i- - L L s~~~~~~~~~~~~~~~~~~~~ 1 ~ ~ ~ ~ ~ ~ -- 2 —---

186 Each component' k is represented by a subcircuit consisting of a k combinational part Ck and a set of latches Lk. The output is computed in the combinational circuit C s The symbol x denotes a "bundle" of s+l p binary signals representing an element of the input alphabet. Similarly, z represents an output r-tuple, and for each k, Yk is an lktuple giving the state of the component 7k5 It follows from the definitions of a loop-free decomposition, and of the machines Ok and 7 k that, for 0< < k < k< s, ~ k2 has a loop-free decomposition with components "2 kl' kl+l' "' k2 Also, decompositions may be iterated, as follows. Suppose that ~7 has a loop-free decomposition with components 1' 2'''' and, for some k, tk has a loop-free decomposition - with components Let the set of states of 1. be S., and let h be the homomorphism from the state set of$f onto Rk, the state set of 7 k. Then -/ has a loop-free decomposition with the ordered set of components 21 2 e'2 Zk-l' l' 2'''t' A A,k+l''' s A The component m is a modification of 2u determined as follows: P P

187 A p (AI > Qk- > S1 X S2 X... X St X Rk+l... p I -112 t k >< Rp Rp Rp rp(O) Up, v p, where A / monoid of A is the same as that of 7', and that ^^ has the same P rP number of states as 17 ~ If Xt is a loop-free decomposition of 7t, then'7/ has the following chain of partitions with S.P.: P(1} s P(1,2) - P1,2,3} -'' P{1,2,..o } s If the set system Max[h(p 1,2,,k)] is denoted by ak, then 7t- has the following chain of set systems with S.Po c 2. >a = 0. ~- 2-~- s The strict loop-free decompositions of ZZ' may be obtained by inspection of its lattice of partitions with S.P. o In the case of a strict loop-free decomposition, each element of the chain ala2...'a s is, of course, a partition with S.P.. Also, if the partition h(P(k)) is denoted Tk, then the following system of equations holds: 1 = 1,

188 and for k=2,...,s rk = k-' k Conversely, given a chain of partitions with S.P., 1 >c >c >.. s = 0, 1- - - s one can clearly construct a sequence of partitions T2,,..'T such that, for all k, k l\i = ck; i=l by Lemma 10-17, Tk can be chosen so that b(k) = C(kk-l/k)' A state-assigned realization 7X- of 7 can then readily be constructed such that ~p' is isomorphic with 7t- and, for all k, h(P (k) = Tk It then follows that, for all k, h(P(1,2,l..,) = k so that P(l,2,o.,kJ is a partition with S.P. for 7.Thus Thu is a strict loop-free decomposition of. Example 1. For the machine 7k- considered in Examples 10-7 and 10-8, the lattice of partitions with S.P, is as follows.

189 I 1278 346 X 1234 5678 1-2 7 5X X35 23 7 7 0 Let us consider the following chain of partitions with S.P.: a = 1278 3456; Or = 12 53 5 7; a = 0. Then 1T = 1278 3456, and we may choose 2 and T3 as follows: Th = i 557., The following state assignment is such that, for k=1,2,3, h(P(k) = Tk. r1 r2 r3 hCr2 h(r!r2,r3)r 0 0 0 1 0 0 1 2 1 0 3 10 1 4 1 1 5 111 6 0 10 7 0 1 1 8

190 The transition function X for the state-assigned machine 72? is determined from the transition function A for 77, using the basic identity characteristic of homomorphisms: \(h(q),a) = h(%(q,a)). The table for the function X is as follows: A^ a1 a2 a2 4 a3 000 001 000 110 oi0 100 001 000 001 111 010 101 100 101 100 111 111 000 101 100 101 110 1 001 110 110 111 100 101 010 111 111 110 101 100 011 010 010 011 101 001 110 011 011 010 100 000 111 It can now be checked that P a1 n and P[l 2] have S.P.. Also, since T has S.Po, P(2) has S.Po, so that 7//p1,2 2 has a parallel decomposition with parallel components isomorphic to -"/p[1] and 2/P[2]. We turn our attention next to the proof of a basic theorem about loop-free decompositions, The statement and proof given here are essentially due to Zeiger [19]; the original statement and proof are due to Krohn and Rhodes [13], A sequential machine -7(A = ( Q, Ao q0', 5) is an identity-reset machine if, for every input a, one of the following holds: (i) X(qa) = q, for all q, (ii) there is a state q' c Q such that, for all q, k(q,a) = q'.

191 Theorem 1 Every finite-state machine has a loop-free decomposition in which each of the components 7tk satisfies one of the following two conditions: a) k is a two-state identity-reset machine; b) the function monoid of k is a simple group. In proving this theorem, we shall use some concepts and results from elementary group theory. Let G be a finite group with multiplication operation. A subset H of G is a subgroup of G if H is closed under the operation. If H is a subgroup of G then H contains 1, the identity -1 element of G; also, if a c H, then a c H. If G is a finite group, the order of G is the number of elements of G. If H is a subgroup of G, then for any element g c G, the set gH = (gh h H) is called the right coset of g relative to H. Left cosets are defined similarly. Any two right cosets associated with H have the same number of elements, and two right cosets are either identical or disjoint. The number of right cosets is called the index of H in G. These statements also hold for left cosets. Clearly, the order of G is the product of the order of H and the index of H in G. A subgroup H is called normal if and only if, for all g, gH = Hg. An equivalent statement is that, for all g, gdHg = (gfig 1h g H} is equal to H. If H is normal, then the equivalence relation gl - g2 < > glH = g2H

192 is a two-sided congruence over G, and determines a homomorphism G - G/H. The factor monoid G/H is a group, and is called the factor group of G modulo H. All two-sided congruences over G are induced by normal subgroups and therefore all homomorphic images of G are groups of the form G/H. A group G is simple if the only normal subgroups of G are (1) and G itself. A composition series for G is a sequence G G1 G D ooo G = (1), where, for i=l,2,..,s-l, Gi+l is a normal subgroup G., and Gi+l is maximal; i.e,, there is no normal subgroup H of G such that Gi D H D G, - - _ i+l G. H, and Gi+1 / H. In a composition series, all the factor groups Gi/Gj+l are simple. The Jordan-Holder Theorem states that, if G has two composition series G = G D G D...D G (1 1- 2- - s and G = Hi H2 D... H = (1 then there is a one-to-one correspondence between the sets {Gi/Gi+l, i 1 2,.@,s-} and (H/Hil+l i=19 2 so,, t-1 } such that corresponding factor groups are isomorphic. This implies, in

193 particular, that all composition series for G are of the same length. The factor groups associated with a composition series for G are called the composition factors of G. A finite-state sequential machine with states S and transition function [ is a permutation-reset machine if, for every input symbol a, the function. defined by the equation, a a (s) = p(s,a) is either a permutation of S or a constant function, independent of s. PROOF: (of Theorem 1): As a first step, we shall construct, for an arbitrary finite-state machine = (A, Q, Ao, q,' \, 5), a loop-free decomposition' = (AI' Q, o A q,, ) in which all components are permutation-reset machines. If Q is n, then 7XI will have n-l components, so that Q c R1 X R2 X... X R-1; also, for k=l,2,...n-1, Rk will be equal to (l,2,.,.,n-k}. The state assignment underlying this construction can be represented schematically by a tree. Part of the tree for the case:n=.4 is shown in the following figure. In the general case, the root of the tree is labelled'123...n' and each vertex at distance k from the root (also called a k-th level vertex) is labelled with a set of n-k distinct elements of (1,2,...,n). Each k th-level vertex v is connected to

194 k (k+l)-th level vertices, labelled with the distinct (n-(k+l))-tuples obtained by deleting an element from the label of v; the edges from v to these vertices are numbered 1,2,...,n-k, in. an arbitrary order. The states of the realization 72? are in one-to-one correspondence with the leaves of the tree (the (n-l)-th level vertices); the name of a state is given by the sequence of labels of edges in the path from the root to the associated leaf, and the image of a state of 77t under the homomorphism h from 72 to 72V is the label of the corresponding leaf. Thus, in the tree of Figure 1, h(l,3,2) = 3. 1 25- ( 2 -2Figu 3 Figure 1

195 Let [al,...,' ] be the block of P(l,,k consisting of all states of 7 whose first k coordinates are A,...,'. Then h([c1,...,c]) is given by the label of the unique k-th level node connected to the root by a path with Cal',2.. ",k as its sequence of edge labels. Consequently, the subsets of Q are obtained by deleting precisely one element from h([a1...,. 0 ]) are the following: h([l,...,aK l]), h([{ o,, a2]),..., h([ol,...,on-k]). The construction of the transition function x of the realization will be recursive. The functions 1(r'a), 2(rl r2'a),..., 2k(rlr2,.,ra) will be defined successively. It will be convenient to recall that the function pk is defined as follows: k(rlr 2..,r,2a) = (\l(rla),X2(rl,r2,a),..., k(rlr,...,rka)). The function x must satisfy the identity (1) h(X(q,a)) =- (h(q),a). From this it follows that, if S C Q, then (2) h(\(S,a)) = \(h(S),a), where X(S,a) = x(q,a) | q e S} and h(S) = (h(q) q C S). But, if lk('...,0'a) = (,a..k) then \(h( [a',aa,..,k] ), a) C [l',..9 k ], so that h[h([a',a,.. o,]),a)] c h([1' l,''' ]) Accordingly, the function J1 must satisfy the following condition.

196 if ik(~lc,2'9~.~',.ka) = ({19,o,'~~' k ), then! (3) x(h([aoa2,. o,o]), a) C h([p,. oo~k]) In the case k = n-l, this condition is equivalent to (1). We shall now construct the functions k1\,2'o' 1 so that (3) holds for all k and each component is a permutation-reset machine. The construction of \k is illustrative of the general method. For a C AI, two cases are possible: (1) k(Qa) is a proper subset of Q; (2) \(Qa) =Q Case o1 In this case, there is a set h([a]) such that (Q, a) C h([a])o For all j c R1, set Xl(j,a) = a. Then (3) is not violated and the input a causes a reset to the state a in the component i 1 Case 20 Because the input a permutes the states of Q, it follows that there is a unique permutation it of R such that, for j=l12,, oOn \(h[j]) = h[H(j)]o For all j c R1, set xl(j,a) = (j)o Then (3) is not violated, and the input a permutes the states of l.

197 Suppose now that k satisfies (3), that the components 1' 2'' j k are permutation-reset, and that k+l is to be constructed. If lk(,'2' *.,0, a) = (PlP,2...' k), then two cases can occur: (1) k(h[c6l,c2,..,ik]) is a proper subset of h([PlP2,...,"k]). (2) k(h({[ala2 2,...,ck]),a) is equal to h( [P,P2..,Pk]). The construction of Lk+l( l,(a 2,,..,Ok,k+l,a) in these two cases is as follows. Case 1. * In this case, there exists 3k- such that k3~l {(h( [%,a2,...,*%]), a) C h[~,p2,..,P^k+l ] For all coordinates 0k+1, set kk+l(l,2, equal to k+l Then (3) is not violated and the action of the component fi under k+l the input (a,cl,. a Ok) is to reset to the state (k+l Case 2, In this case, there is a unique permutation i of Rk+l such that, for all j, \(h([,',2', ~.o,,j]), a) = h(~[l,,e o..,i,h(j)]). Setting kk+l (alC2,..,Ckja) equal to nt(j), we find that condition (3) is not violated, and that the input (jO,,2,...,k,,a) permutes the states of k+l k+l~o

198 Thus, we have given an inductive construction of a transition function X such that all components are permutation-reset, and such that the identity h (h[(4iQa2,..., nil,a) h(= ( X2(h[ (.Q.. ~ ~ n-1 = ['2' n-1) n holds. By appropriate choice of the initial state and the output function of 71-, the remaining conditions for a realization may be satisfied. Thus, we have shown how to produce a loop-free decomposition of'ItZ into permutation-reset components. Example 1. Let`7I be a four-state machine with the following transition function. a b 1 2 2 2 4 3 3 4 4 4 2 1 We shall compute k([l,2,2],a). First, since 2(Q,a) Ch([4]) = (1,2,3, we set Xl(ja) = 4 for all j; in particular, \l(l,a) = 4. Since \(h(,2]),a) = 1,2,4) = h([42]), we set X2(l,j,a) = 2

199 for all j. In particular, k2(1,2,a) = 2. Since (h([l,2],a)) = h([4,2]), for each j there is a unique element JT(j) such that (h([1,2,j]),a) = h([4,2,(j)]) In particular, t(2) = 2, so that x3(1,2,2,a) 2, and \((1,2,2),a) = (4,2,2). Example 2. Let -/ be a three-state machine with the following transition function. a b 1 2 3 Q 2 3 1 3 3 2 The state assignment to be used is exhibited by the following tree.

200 1 2~~~~3~ K(Q,a) = (2,3) = h([31). Therefore.l(j, a) = 3 for all j. x(Q,b) - Q. Therefore, the input b permutes the states of. as follows: 1 -> 2, 2.-> 3, 3 1. The table for X1 is as follows.

201 A, a b 132 R1 2 3 3 3 3 1 The function k2 is then determined as follows. 2 2AI X R1 a1 a2 a3 b b2 b3 _ 22 3 1 12 3 1 1 1 2 2 2 1 2 2 2 2 2 1 1 2 The next step in proving Theorem 1 is to show that any permutationreset machine has a loop-free decomposition into two components: a permutation machine, and an identity-reset machine. A finite-state machine with state set S and transition function pi is a permutation machine if every function pa is a permutation. Let 7Y = (AI' Q, Ao A A, O ) be a permutation-reset machine. Let PC A, be determined as follows: a E P if and only if \ is a permutation of Q a (of course, for x E A, 2x(q) = (qx)). If P is empty, then 7- is a reset machine, and does not have to be decomposed further at this step. Otherwise, let G be the group of permutations of Q generated by (ha | a e P }, and let ~ be the multiplication operation

202 of G. We shall construct a loop-free realization 7- of 7t in which'7,1 the first component, is a permutation machine with the set of states G, and 72, the second component, is an identity-reset machine with Q as its set of states. In specifying A, the transition function of N9, it will be convenient to represent the identity h(A(q,a)) = \(h(q),a) by a commutative diagram, of the following form: h - Q —--- Q h Q --—. Q. In this case, Q = G Q, and the homomorphism h maps the ordered pair (P,q) c G X Q onto the element r (q) c Q. In specifying \, two cases must be distinguished. Case 1. a c P. Then A is a permutation. The specification of. is a a shown in the following commutative diagram. (P,q) —. rF(q) a a (^*,<l -- V r(a

203 Thus, in this case, \((r,iq),a) = (. rq). Case 2. a, P. Then ka is a reset, mapping every state onto some state r. The specification of a is shown in the following diagram. a h a a (r, r (r)) h r. The loop-free decomposition of-~Y7 may now be described as follows. If the first component 7Z] is in the state /, and receives an input a e P it goes to the state k a r; thus, the input a produces a permutaa tion of G, the set of states of Xt 1 If 7t is in the state /, and receives an input a $ P, it remains in the state r; thus, the input a produces the identity permutation of G. If the second component 72 is in the state q and receives the input (a, /), where a c P, it remains in the state q. If -712 is in the state q and receives the input (a, / ), where a / P, it resets to the state r -(r), where r = \(q,a) for all q. Thus, 72 is an identity-reset machine. Example 3. Let 7' have the transition function X given by the following table.

204 Q 2 3 1 1 3 1 3 1 is a permutation-reset machine, and the group of permutations generated by \ and a is S3, the symmetric group of degree 3. Let the elements of this group be assigned names as follows: 123 123 123 a: (123) b: (132) c (213 123 123 123 d: (231) e (312) f: (321). For example, the permutation e maps 1 into 3, 2 into 1, and 3 into 2. The transition function for the state-assigned machine 7~- is as follows. A, al- a2 a3 ____, ___ 3 a,l d,l c,l a,l a,2 d,2 c,2 a,l a,3 d,3 c,3 a,l b,l c,l d,l b,l b,2 c,2 d,2 b,l b,3 c,3 d,3 b,l c,l f,l a,l c,2 c,2 f,2 a,2 c,2 Q c,3 f,3 a,3 c,2 d,l e,l b,l d,3 d,2 e,2 b,2 d,3 d,3 6,3 b,3 d,3 e,l a,l f,l e,2 e,2 a,2 f,2 e,2 e,3 a,3 f,3 e,2 f,l b,l e,l f,3 f,2 b,2 e,2 f,3 f3b,3 e,3. _3_

205 The first component of 77? has the following transition function: A1 1 a1 Ea a3 a d c a b c db c f a c 1 "1. d e b d e a f e f b e f The second component performs the identity mapping under any input (a, r) or (a2, ); for the other inputs, it performs resets, as follows..^:.... a322a a3,b a3,c a3,d a3,e a34f 1 1 1 2 3 2 3 2p R2 2 1 1 2 3 2 3 3 1 1 2 3 2. 3 Suppose a permutation-reset machine'L has been decomposed, in the manner just described, into two components: a permutation machine /~ and an identity-reset machine'- 2 Let z be an element of A, and let x be the subsequence of z obtained by deleting those elements of z not in P (Recall that a c P if and only if a is a permutation.). Then, for all rF G, xl(r,z) = xl(,x), since each deleted input leaves the state of -1 unchanged. Therefore, every function (kl)z(r ) is identical with some function (%l)x(/7), where x C P. If x and y are elements of P, then, for all r E G, \l(/,x) = x.r,

206 and xl(ry) = iyr. Accordingly, the functions (l)x and (l)y are identical if and only if X and k are identical. It follows, then, that the transition monoid x y of /1 is (abstractly) isomorphic to the group G of permutations \x, *x x c P. Since 71 is reduced, the function monoid of V is also isomorphic to G. As we shall see later, further decomposition of this permutation machine will be possible unless G is simple. The second component >2 in the decomposition of 27 is an identityreset machine. It is immediate that, in an identity-reset machine, every partition of the set of states has S.P. Therefore, any stri.ct stateassigned realization of /2 in which the states are binary s-tuples will be a parallel decomposition of'2 into two-state components; it is easy to see that these components are necessarily identity-reset. At this point we have established that every finite-state machine has a loop-free decomposition in which all components are two-state machines or permutation machines. Moreover, each permutation machine is of the form, = (A, G, G, 1, 1, 1 ), where the finite group G is the transition (and function) monoid of H, 1 is the identity element of G, 6 is the identity function, and for each a E A, there is an element /r E G such that for all / G, a t(r,a) r. r. Suppose that G has a proper normal subgroup N (i.e., N ~ G and N ([1)). Then G is the union of the right cosets associated with N;

207 i.e., G N U y2N U.. U YkN, where k is the index of N in G. Let the set (l,y2,...,yk} of coset representatives be denoted by Y. Then F has a strict loop-free decomposition >2- with the two components &1 and p2' such that the function monoid of' is isomorphic to G/N, and the function monoid of i2 is N. The set of states of'^ is Y X N. The homomorphism h: Y X N -- N, and k\ the transition function of D7i, are displayed in the following commutative diagram. h (yn) h _> yon a a -a h a-l. ww r an a y —---- n. Here, w is the representative of the coset containing F'y, so that a C ay = wn1, for some nl E N, and -1 l w a ~y = n1 When' 1 is in the state y and the input is a, the next state is the representative of the coset containing y.a. In other words, the input a, applied to ~' permutes coset representatives exactly as premultiplication by [ r ], the coset of /a, multiplies cosets in a a G/No Also, since the elements of /a a c A, generate G, the cosets [ a], a c A, generate G/No Therefore, the transition monoid (or, equivalently, the function monoid) of ~1 is isomorphic with G/N. 1

208 It remains to be shown that the transition monoid of t 2 is N. We have already seen that the action of the input (y,a) on t2 is to maps each state n to nl n, where -l n = w 7 ay, an element of N. To complete the proof that the transition monoid of X2 is N, we shall show that, for every element n' E N, there is an input sequence to 72t that maps each state n to n'.n. Since the elements a, a c A, generate G, there is a sequence x e A such that, for all g C G. g(g,x) = n'g. Since the homomorphism h from Y > N onto G is one-to-one, it is an isomorphism, and h G - Y > N is also an isomorphism. Since, for all g E G, F(g,x) = n'gg, it follows that %(h (g),x) = h-l(n.g). Setting g=n, where n is an arbitrary element of N, and noting that h (n) = (l,n), we find that \((l,n),x) = (l,nn). Therefore, there exists an input sequence to 2 mapping each state 2 n c N to n'n. This completes the proof that the transition monoid (and function monoid) of!2 is N. 2

209 Example 4. Consider a permutation machine p having the following transition function: A The transition monoid G of f is the cyclic group of order 4, which can be represented by the set of integers (0,1,2,33, with group multiplication the same as addition module 4. Then al and a0 = 2. / Q (1 2 3 Q2 N (0,2 p G i t c 0, 1 The transition r monoi G of e is the cyclic group of order 4, which can be represented by the set of integers (0,1,2,3), with group multiplication the same as addition modulo 4. Then 1 a! and 2. a22 The group G has the normal subgroup N = {0,2), which partitions G into the cosets (0,2} and (1,3). Let Y, the collection of coset representatives, be (0,1). The transition function for ~77 is: (0,0) 1,0 0,2 (0,2) 1,2 0,0 & (1,0) 0,2 1,2 (1,2) 0,0 1,0 The transition functions for the components ~! and W2 are as follows: 1 2

210 2 l a1 a2 1: 0 1 0 1 0 1 2 a1,0 a,, 1 a2,V 0 0 2 2 2 R2 2 2 0 0 0 We have now specified a procedure by which a machine with state set G and function monoid G can be decomposed into two components: t 1, with state set in one-to-one correspondence with G/N and the function monoid G/N, and 1t2' with state set N and function monoid N, where N is a normal subgroup of G. The procedure can be iterated until all the components remaining are permutation machines whose groups are simple. Although various choices of a decomposition are possible at each step of this iterative process, the collection of simple groups ultimately obtained is uniquely determined by G; the simple groups which occur are the composition factors of G. Thus we have completed the proof of Theorem 1. Every finite-state machine has a loop-free decomposition in which every component is either a two-state identity-reset machine or a permutation machine whose group is simple. By arguments that we shall not give, it is possible to specify exactly the simple groups that have to occur in any loop-free decomposition of a given finite-state machine 2A. Let,>/(7) be the set consisting

211 of all the simple groups that are.composition factors of subgroups of the function monoid of 72'. Theorem 2 Let'.W have a loop-free decomposition with components'l" I 2'....'' If G E ( ), then G E /A('7 i), for some i. Theorem 3 t2- has a loop-free decomposition in which each component AfV is either a two-state identity-reset machine, or a permutation machine whose group is an element of (~). Corollary 1 7t?- has a loop-free decomposition in which all components are two-state identity-reset machines if and only if A (i-5) contains only the group with one element. The cyclic groups of prime order form an important class of simple groups. A group G is solvable if all its composition factors are cyclic of prime order. All finite abelian groups are solvable, and it has recently been shown by Feit and Thompson that all finite groups of odd order are solvable. Let p be a prime, and let mtX be a sequential machine whose function monoid is Cp, the cyclic machine -9' which realizes 7%, for which the set of states is 0,1,...,p-1}, and for which each function a (q) = q + c(a)(mod p). a

212 Let such a machine be called a modulo p counter. Corollary 2'7X has a loop-free decomposition in which all components are two-state identity-reset machines and counters of prime modules if and only if every subgroup of the function monoid of & is solvable.

12. TURING MACHINES AND RECURSIVELY UNSOLVABLE PROBLEMS In this chapter we begin the study of classes of automata with greater capabilities than finite automata. Such a class C has the property that, with suitable definitions, every regular event is represented by an automaton in C. On the other hand, it will also be true that, with suitable definitions, every automaton that we consider will be capable of being simulated by a Turing machine; in fact, every digital, discretetime, finitary automaton that has been conceived of up to now can be simulated by a Turing machine, The fact that Turing machines seem to set an upper limit on the capabilities of automata would be sufficient reason for introducing them at this point. A second reason is that, as we begin to study more powerful automata, it will turn out that algorithms do not exist for solving certain problems of analysis. This was not the case in our study of sequential machines, where all the algorithms we sought turned out to exist. For example, we can determine algorithmically whether a given state of a sequential machine is accessible, whether the set of words accepted by a finite automaton is empty or infinite, whether two sequential machines represent the same function, and so forth. More complicated problems, such as deciding whether a sequential machine exists such that the function seqeg (x) maps one given regular set onto another, also turn out to be solvable. So long as the algorithms we sought could always be found, no general concept of "algorithm'" had to be defined precisely; through experience, we have learned to recognize an algorithm when we see one. To show 213

214 that a certain problem is not decidable or not effectively solvable requires a higher degree of precision. The generally accepted definition of decidability hinges on the concept of a recursive set; and this concept can be developed in terms of Turing machines. Turing machines and the concepts associated with recursive solvability are not exclusively the concern of automata theory; important undecidable problems occur in logic, number theory, -and algebra. For this reason, we shall give only the development which is essential for our purposes, and refer the reader elsewhere (Davis [ 3], for example) for more information. We shall follow the formulation of Turing machines in terms of quadruples introduced by Post [34] and used in the book by Davis. Because we are more concerned in automata theory with recursive sets of words than with recursive sets of integers, our approach to the definition of a recursive set of words will be more direct than the usual ones (Davis, for example). A Turing machine T consists of a control unit with a finite set of states Q (the internal states of T), together with a tape. The tape consists of a two-way infinite sequence of squares, each of which can contain any symbol from a finite alphabet A; A has a distinguished element a, the "blank" symbol. Initially, all but a finite number of squares of the tape contain a0, and one square is being scanned by the control unit. The operation of T is specified by a finite list of "commands" called quadruples. Each quadruple is of one of the following three forms: Type 1 q.i aj ak q

215 Type 2 a. R qF Type 3 q. a. L q. —.r1 j For (qi'aj) c Q X A, there is at most one quadruple containing qi and a. in the first two positions. When T is in state qi and the square being scanned contains the symbol aj, the command associated with (qi,aj) is executed. If no such command exists, T halts. The result of executing the command depends on its type, as follows: Type 1 write ak in the square being scanned, and enter state q; Type 2 enter state q, and scan the square to the right of the present one; Type 3 enter state q~, and scan the square to the left of the present one. To make the definition of T quite precise, we introduce the concept of the complete states of T. A complete state of T is a triple (qcixny) c Q X A X At Such a triple corresponds to the situation in which the internal state is qi, the tape contains the word xy, bounded on each side by an infinite sequence of copies of a, the "blank" symbol, and the square containing the first symbol of y is being scanned. Suppose y = ajy' where a. c A. Then, if there is no quadruple with (qi,a.) in the first a'a

216 two positions, the computation stops. If there is a quadruple of the form qiajakq, then the next complete state is (qP,x,aky'). If there is a quadruple of the form q.iaRq~, two cases may occur: 1) if y' p e, then the next complete state is (qJxaj,y'); 2) if y' = e, then the next complete state is (q,xaj,ao). If there is a quadruple of the form qia Lq, two cases may occur: (a) if x = x'ak, ak c A, then the next complete state is (q2 x, x a, y); (b) if x = e, then the next complete state is (q~, e, ady). Thus, the initial complete state of a Turing machine determines whether the machine will ever stop, and what the final complete state will be if the machine does stop. Let P be a subset of Ai. The set P is recursively enumerable if there exists a Turing machine T which, given the initial complete state (q0, e, x), halts if and only if x E P; P is said to be recursive if there exists a Turing machine T having symbols a and a in its alphabet, such that, if T is given the initial complete state (qO, e, x), it will halt with the final complete state (q0, e, al) if x c P, and will halt with the final complete state (q,0 e, a2) if x ~ P. It can be shown that P is recursive if and only if both P and its complement in A* are recursively enumerable. The problem of deciding whether a given word x is an element of P is decidable, or recursively solvable, if P is recursive; otherwise this problem is undecidable, or recursively unsolvable.....

217 We shall often speak about decidable or undecidable problems without introducing all the details that would be needed in order to specify the set P precisely. For example, consider the Post correspondence problem, which is usually stated as follows: Given two equally long ordered lists al,a2,..,an and b1,b2o..,bn of words over an alphabet A, to determine whether there exists a nonempty sequence of indices il,i2,...,ik such that the following two words are identical: a. a... a. and b. b 1 2 k'1 i2, o b. o Post [33] has shown that, if A contains at least two elements, k there~is no algorithm for solving all instances of the Post correspondence problem. This result will prove to be a basic tool for establishing undecidability results in automata theory. As the result is stated, however, it does not appear to have a precise meaning, since it does not seem to involve the recursiveness of a set of words. This can be rectified, for those who are fussy, by defining a format in which the data for any instance of the Post correspondence problem can be presented as an input word to a Turing machine. For example, let a be a symbol not in A, and let the data be presented as a word over the alphabet A U {aj, as follows: a1 a a2 oa. a a a b a b2 a o. a b ob 2n 1 2 n Then the unsolvability of the correspondence problem has the following precise formulation: the set consisting of all words in the above format, such that there exists a nonempty sequence i, i2a ooo~ i for which

218 a. a... a. b b b. 11 12 1k 1 2 k is not recursive. It may appear unsatisfactory that the statement of the result involves a particular data format. To see that this is not a serious objection, one should bear in mind that the result is invariant under any format conversion that can be effected by a Turing machineo So far we have seen the use of Turing machines in formulating the concept of undecidability, or recursive unsolvability. In addition to this, Turing machines form the subject matter for a number of interesting undecidability results. For example, the problem of deciding whether a Turing machine, starting from the initial complete state (qgo e, a ), will halt, is recursively unsolvable. In order to give a precise formulation of a question of decidability concerning Turing machines, we must reformulate that question as a question of the recursiveness of some set of words over a finite alphabet. One standard way of doing this is based on the technique of Godel numbering. A Turing machine T is described by its set of quadruples, involving the symbols R, L, ao0 al,. oo qo ql,... Let a be a function associating distinct positive odd integers greater than or equal to 3 with these symbols. For example, a might be chosen as follows: (R) = 53 a(L) = 5,

219 a(a ) = 7 + 4i, c(qi) = 9 + 4i. Now, with each quadruple S = (sl s2 s3,s4) we may associate an integer gn(S), the Godel1 number, of S, defined as U(s ) a(s ) U(s3) a(s) 2 3 5 7 Since we know how to obtain the unique prime factorization of an integer, and since the finite function a is one-to-one, there is an algorithm for recovering S from gn(S). Now, let the Turing machine T be specified by the quadruples Sl S2,.o, Sn. and let Pr(k) denote the k-th prime. Then a Godel number for T is t-r gn(S) )'1Pr(k) k=l different Godel numbers result from the n different orderings of the quadruples. Clearly, there is an algorithm for recovering T from any of its Godel numbers. The word 1 0 over the alphabet (0,1) is said to be a representation of T if and only if n is a Godel number for T. With this convention, we have a means of associating sets of Turing machines with sets of words, and thereby obtaining precisely stated undecidability questions and results. For example, the result that it is undecidable whether a Turing machine will eventually halt if presented with an

220 initially blank tape has the following precise formulation: (x x is a representation of T and T halts if its initial tape is blank) is not recursive.

13. READ-ONLY AUTOMATA In this section we discuss various classes of automata which can read their tapes, but cannot write on them. Many of the results to be described are due to Rabin and Scott [17]. Other contributions may be found in Minsky [30 ], as well as in Shepherdson [37], Moore [15], pp. 92-97, Rosenberg [36], Elgot and Rutledge [25], and Elgot and Mezei [24]. Within the restriction to read-only automata, there seem to be two ways to introduce a class of automata more powerful than finite automata: i) allow the input tape to be read in either direction, rather than only left-to-right; ii) provide more than one input tape. Two-way automata The first apparent generalization leads to the following definition. A two-way automaton 62 is a 6-tuple 6 = (AI, Q, qO, F, X, D), where the first five elements are defined in the same way as for a finite automaton (cf. Chapter 9), and D is a function from Q > A to (-1, +1]; D(q,a) specifies the direction of tape motion when is in state q and the symbol a is read. Let x = a a 1 *' anl be an element of A,. Then, associated with x, there is a unique sequence (bOnSO), (bls ),.,. (b, r) 221

222 of elements of (-1, O, 1,..., n) X Q. Intuitively, b gives the number of the square being scanned after r steps, and s denotes the state which occurs after r steps. The formal definitions are as follows: (b0' o) = (0, q0) for i=1,2,... (bisi) = (b_1 + D(si,ab), (siab )). i 1 The sequence terminates if, for some r, b = -1 r or b = n; r of course, the sequence may be infinite. The word x = a0 a1.. anl is accepted by ( if the associated sequence terminates in a pair (b,sr), where b = n r and s e F. r Surprisingly, two-way automata are no more powerful than finite automata, as the following theorem shows.

223 Theorem 1 Let a be a two-way automaton. Then T(Q ), the set of words (tapes) accepted by, is regular. The following proof of Theorem 1 is due to Shepherdson. PROOF: It is sufficient to show that T(f/) has only a finite number of derivatives. Let a be a symbol not in Q. We shall introduce two functions, f: Ai -Q U (a), and g: Q X - Q U (a). Let x = a0 a1... an-l and consider the sequence (bol So) 0,** (b, sr) defined above. If the sequence fails to terminate, or if it terminates with b = -1, r then f(x) = a; if the sequence terminates with (br sr), where b = n, r then f(x) = s From the definition, it is clear that, if the word xy e A1 is presented to (, and

224 f(x) = a then xy is rejected without any symbol in the subsequence y ever being scanned; if f(x) = sr then 62 is in the state s when the first symbol of y is scanned for the first time. Let g(q,x) be defined in the same way, except that the starting point of the sequence is (n-l,q) instead of (O,qo). Suppose the word xy is presented to 6; if g(q,x) = a, then, if 2 is in state q scanning the rightmost square of x (i.e., square n-l), then 6 will never again scan the leftmost square of y (i.e., square n); if g(q,x) = r, then, if 4 is in state q scanning square n-l, the next time it scans square n, it will be in state r. Now, suppose x1 and x2 are elements of A such that f(xl) = f(x2) and the functions gx (q) = g(q,xl) and g (q) = g(qx2) x2 2 are identical. Then, for any y, xly c T( )'= x2y T( 6) (I.e., D (T(,)) = D (T(2))). x1 X2

225 To see this, note that the computation of 6 when presented with the word xy alternates between phases in which symbols of x are read, and phases in which symbols of y are read; and, whether x = x1 or x = X2, the results of the phases involving x will be the same. If IQ I = n, n+l then there are only (n+l) ways to specify the functions f(x) and gx(q), so that T(, ) has at most (n+l) distinct derivatives. This completes the proof. Two-tape automata Since the provision of a single two-way tape does not augment the capabilities of finite-state machines (although it may result in the saving of states), we turn to an alternative possibility: a one-way two-tape read-only automaton. Such an automaton operates exactly as a finite automaton does, except that the state determines which of two input tapes is to be read next. Initially, tapes 1 and 2 contain the words xla and x2a, respectively, where x1 C A1, and x2 c AC, and a A; a serves as an end marker which indicates that all the remaining squares on the tape are unmarked. The automaton stops when it tries to read an unmarked square beyond the end symbol a. The ordered pair (xl,x2) is accepted if the automaton stops in a state which is an element of the designated set F of final states.

226 Formally, a two-tape automaton is a 7-tuple 4 = (A, Q q0, F, C1, C2, \), where A, Q, q0, and F have the usual meanings, X is a function \: Q X (AI U (a}) - Q, and (C1,C2] is a partition of Q. A complete state of 2 is a triple (s,y,z) E Q X (e U AiC) X (e U Aia); s gives the internal state, and y and z give the words remaining to be read on the two tapes. The successor (s',y',z') of a complete state (s,y,z) is determined as follows: if s c C1 and y = ay, where a c A U (o), then (s',y',z') = ((sa),,z); if s C C2 and Z = az, where a c A, U (oa, then (s,,z') = (x(s,a),y,z); if s e C and y = e, or s c C2 and z = e, then (sy,z) has no successor.

227 The operation of t when presented with the inputs (xlo,x2r) is specified by a sequence (SOYO, Z ), (sly1,Z), *.., (srYrZr) of complete states such that (a) (SOyOZO) = (q0ox1Ax2a) (b) (Si+l'Yi+.1Zi+l) is the successor of (siYi,zi), i=Ol, o r-l1 (c) (sr Yr zr) has no successor. The pair (xl,x2) is accepted if and only if sr F. Let T2( ) be the set of pairs of tapes accepted by 6. A two-tape automaton may be specified by a state diagram in which those states in C2 are distinguished by being shaded. Example 1. For the following automaton A1 = (0,1}, and T2( ) is the set of all pairs (x,x) such that x E A. F = (q4}

228 Example 2. For the following automaton, A = (0,1), s n m, ~'a and T2(,) is the set of all pairs (010, 010 where, m.and n range over the nonnegative integers. 0,1, These examples should indicate the importance of the end marker as a "warning" that the tape has been exhausted. The first result of Rabin and Scott about two-tape automata establishes that the projection of T2(ci ) on either the first or second coordinate is a regular set. Let E(6 ) be the set of all words x such that, for some y, (x,y) c T2(d ).

229 Theorem 2 For any two-tape automaton 2, E,( ) is a regular event. PROOF: For each state qj, we define the events Wj, Xj, Yj, and Zj, each a subset of -A, as follows: (i) x E W. if there exists y c A, such that for all t c -A and u E A, the following implication holds: if 6 has the initial instantaneous description (qO, xto, yuo), it reaches the instantaneous description (qj, to, uc). a (ii) x c X. if there exists y c AL such that, for all u in A1, the following implication holds: if 6 has the initial instantaneous description (q0, xc, yuo), it reaches the instantaneous description (qj, e, uo). a (iii) x c Y. if there exists y E A, such that, for all t in A, the following implication holds: if 2 has the initial instantaneous description (qO, xta, yo), it reaches the instantaneous description (qj, tc, e). (iv) x c Z. if there exists y c A, such that the following implication holds: if 6a has the initial instantaneous description (qo, xa, yc), it reaches the instantaneous description (qj, e, e). The idea of the proof is to write a system of simultaneous linear event equations from which it will follow, by Theorem 9-2, that each of the events defined above is regular. The desired result will then follow easily. To set up the equations, we need a few more definitions. For q c C1 and a C Q U ([D,

230 let R(q,a) be the set of states qj such that, for some y c AI, \(q,ay) = q, J and, if z is a prefix of y and is unequal to y, %(q,az) E C2. *x2 Finally, let the set Q be defined as follows: q ~ QO if there exists y c A, such that rp(y) = q and, if z is a prefix of y which is unequal to y, rp(y) C C2. With these definitions, we may write the desired equations. If q. Q0 Wi = e U Wi.a acQ (qi I q. R(qi,a)) If qJ / QO = W.aU aeQ {qi I qj R(qi,a)} Xj = U) W. (qi I q j R(qj,o)j Y= w. U Y.ia (qi I qic C2 and acQ (qi qi C1 and %(qi, ) = qij} (qi,a) = qj

231 Z YUi U U Xi qi | qi c C1 and (qi qi C C2 and x(q-i,) = qj} x(qicr) = qj It now follows from Theorem 9-2, together with the closure properties of the class of regular events, that each of the events W., Xj, Y., Zj, for all j, is regular. To complete the proof, we note that E,(Oa) = U i U Yj U Uj jcC1 jcC2 jcQ With Theorem 2 in hand, we can study the closure properties of the class of relations T2( Z) associated with two-tape automata. It is apparent that the class is closed under complementation. For, if 2 = (A, Q, q, F, C1, C2, %) and 62, = (A, Q, qo, Q-F, C1, C2, x), then T2( ) is clearly the complement of T2(i) in the set A1 X A> The class of relations definable by two-tape automata is not closed under set intersection, however. Referring to Examples 1 and 2, we have: T2() = (x,x) (x ) A, T2() = (On 1 0m, O 1 n)}, where ~, m, and n range over the nonnegative integers. Then T2( ) n T2(,) = (On1 0, nO On)}, where n ranges over the nonnegative integers~ But the projection of

232 T2(a) n T2(X) on the first coordinate is (On 1 On, n=l,2,... }, which is easily seen not to be regular. Since the class of relations definable by two-tape automata is closed under complementation, but not intersection, the class cannot be closed under union. For, by the identity A U B = An B, closure under union and complementation implies closure under intersection. Summarizing, we have the following result. Theorem 3 The class of relations definable by two-tape automata is closed under complementation, but not under set union or set intersection. Since the class of events representable by nondeterministic finite automata is the same as the class of events representable by deterministic finite automata, one might expect that the classes of relations representable by deterministic and nondeterministic two-tape automata would be the same. We shall show that this is not the case. A nondeterministic twotape automaton is a 7-tuple -= (AI, Q, Q0, F, C1, C2, L), where A, Q, F, C1, and C2 have the same meanings as for two-tape automata; also, Q0 C Q denotes the set of initial states, and L, the transition function, has domain

253 Q X (AI U (o)), and range 2Q, the set of all subsets of Q. A complete state of f is a triple (x,y,z) c Q X (e U A a) X (e U AI ). If (s,y,z) and (s',y',z') are instantaneous descriptions of %', then (si,y',z') is a successor of (s,y,z) if: (a) s c C1, y = ay', where a c A, U (ca, z1 = z, and s' c L(sa), or (b) s e C2, z = azt, where a e A, U (o}, y' = y, and ss C L(s,a). The ordered pair (x1,x2) C A X A is accepted by ~ if there is a sequence (SoYO, ZO), (slyi Zl)l...o (Sr'Yr,'r) of complete states of <6 such that: (i) (sOyOzO) = (qO,xZcrx2); (ii) for i=O,l,.,r-, (Si+lyi+i zi+l) is a successor of (silyiZi); (iii) either sr C1 and y = e, or sr C2 and zr = e; and (iv) sr c F. Let the set of pairs (xl,x2) accepted by Z be denoted T2(7 ), and referred to as the relation represented by;.

234 Theorem 4 Let - = (Ai Q, QO, F, C1, C2, L) and CZ = (A1, Qt' QO, F', C{, Cf, L') be nondeterministic two-tape automata. Then there exists a nondeterministic two-tape automaton " such that T2 ( ) U T2(') = T2( "). PROOF: We may assume without loss of generality that Q n Q' = cp. Set = (AI, Q U Q U',, F U F, C1 U C U C, L"), where L"(q,a) = L(q,a) if q C Q. and L" (q,a) = L' (q,a) if q c Q'. Then T2 ( U-) U T2(2') = T(" ). Let D2 be the class of relations representable by two-tape automata, and let N2 be the class of relations representable by nondeterministic twotape automata. Corollary 1 D2 is a proper subset of N2

235 PROOF: Clearly, D C N2 since every two-tape automaton is also a nondeterministic two-tape automaton. But D2 N2, since N2 is closed under set union, and D2 is not, By a proof similar to the proof of Theorem 2, it can be shown that, for any nondeterministic two-tape automaton 7J, E1,( ) = (x 1 3 y, (x,y) c T2(YI)} is regularo It then follows, by the same argument used in the deterministic case, that N2 is not closed under intersection. From this it can be inferred that N2 is not closed under complementation; for, if it were closed under complementation as well as union, N2 would have to be closed under intersection. We shall next establish that a certain problem of analysis concerning two-tape automata is recursively unsolvableo In all cases, the strategy we use in proving that a problem R is recursively unsolvable is to show that, if R were solvable, so also would be some other problem P which is known to be recursively unsolvable. The recursively unsolvable P that we shall most frequently resort to in such proofs is the Post correspondence problem. Theorem 5 There is no algorithm which, given any two-tape automata 1 and 6,o determines whether T2(21) n To2( 2) is empty.

236 PROOF: Let AI = (0,1}, and let al,...,a and b...,b be two lists of words in AI. For 1n n I1' I n i=l,2,...,n, let i be the sequence consisting of i l's followed by a zero. Clearly, there exist two-tape automata a and 7 such that T2( ) = [(12... a a...a ) } and T2( 3 ) = ((ili2. i bk, bil...b ) }, where, in each case, the first member of an ordered pair may be any nonempty sequence obtained by concatenating words from the set (1, 2,..., n}. To determine whether T{( ) n T( d ) is empty is equivalent to solving that instance of the Post correspondence problem given by the lists a,...,a and b,...,b. Therefore, there is no general decision procedure for the question "is T2( i ) T2( 3 ) cp, since the existence of such a procedure would imply that the Post correspondence problem was recursively solvable. As a second exercise in the application of the Post correspondence problem, let us consider a question concerning generalized sequential machines. We recall that a generalized sequential machine is specified by a 6-tuple = (, Q, Ao, q, \, ),

237 all of whose elements, except for 6, have the same interpretation as in * Moore machines; the output function 6 has domain Q X> A and range A.o The function rp(x) is defined exactly as it was for Moore machines, and seq(x), denoting the output sequence generated when an input sequence x c At is applied, is defined as follows: if x = a1 a2..e a then seq(x) = 5(q0,al), 5(rp(al),a2), ~ ~o 5(rp(al ~ ooo, anl), an) Theorem 6 It is undecidable whether, for two generalized sequential machines Y4 and 4 with the same input and output alphabets, there exists x c A such that seq (x) seq (x). PROOF: Let 1 and / be one-state generalized sequential machines, each with the input alphabet (1,2,,.,n); let q0 and q0 be the (initial) states of and A, let 5(q0,i) ai) and O(q0,i) = b. i=l,2,.o,no Then seq (ii2,o,ik) = a. a., A^ 11122 k 1 2 ik and seq -(ii2, ooo oik) 1= bi /^ ~1 2 k ib

238 Therefore, the problem of deciding whether, for arbitrary one-state generalized sequential machines ~ and,, there exists x such that seq, (x) = seq, (x), is equivalent to the Post correspondence problem, and is therefore recursively unsolvable. Two-counter automata In this section we discuss a class of read-only automata equipped with two counters of unbounded size, The main result, due to Minsky [30], shows that any Turing machine can be simulated by such an automaton, Using the known result that it is undecidable whether a Turing machine with blank initial tape will halt, a corresponding problem concerning twocounter automata is shown to be undecidable. Finally, a class of two-way two-tape automata is defined by "merging" the properties of the two-way and two-tape automata previously considered, and, two problems of analysis concerning these automata are shown to be recursively unsolvable. A two-counter automaton is equipped with a finite-state control unit, together with two "almost-blank" semi-infinite tapes, denoted 0e and 02o On each tape, square 0 contains an end marker a, and each of the other squares contains the "blank" symbol a o The machine never writes on its 0 tape, so that this state of affairs is never changed. Since two tapes are used, the set Q of internal states is partitioned into two sets, C1 and C2o The quadruples describing the operation of the automaton are of the following types: (qiaj,R,q) and (qi,aj,L,q), where a E (aa, a(o a.0

239 The interpretation of (qiaj.R,q ), in the case where C CO (a -l1,2), is as follows: if the internal state is qi and the symbol in the square of 0 currently being scanned is a., then move one square to the right on 0, and enter state q.o To insure that the automaton never goes beyond its end marker, no quadruple of the form (qi,o,L,q ) occurs. Thus, the tapes may be viewed as unbounded counters, each representing a positive integer by the index of the square being scanned. The complete state of a two-counter automaton is given by a triple (qm,n), where q is an internal state, and m and n are the indices of the squares being scanned on tapes 1 and 2. Minsky has described a sense in which any Turing machine T can be simulated by a two-counter automaton W. In discussing this simulation, we make the restriction (for convenience only) that the alphabet of T is [0,1 ) With any complete state (qx,y) of T, we may associate two nonnegative integers, as follows: I lg(x); if xy = a a1... al a-1 then n-l k = a,-2'. i=O For example, the integers associated with the complete state (q,0110,1010) are ==4, k=86. We can now describe the sense in which W simulates To Let Q be the set of states of T, and Q' the set of states of W. Associated

240 with the simulation is a one-to-one function h from Q into Q', and a one-to-one function cp from the set of complete states of T into the set of complete states of W, as follows: p[(q,x,y)] = (h(q),2k 32 o) The properties of the simulation are as follows: (i) if the initial complete state of T is (qOXOyO), then the initial complete state of W is p[(qo~XOyO)]; (ii) if the complete state (q,x,y) of T is followed by (qs,xl,y') then, if W is in the complete state cp[(qx,y)], it eventually reaches the complete state cp[(q',x',y) ]; however, no uniform bound can be given on the number of steps W requires to pass from cp[(q,x,y)] to p[(q',x',y) ]; (iii) W halts if and only if T does~ When W is in the complete state cp[(q,xy)] it must first perform a sequence of operations to determine which quadruple of T is to be simulated, and it must then carry out the simulation. Both of these processes are composed of'subroutines' which will be designated C(pq), where p and q are relatively prime integers. The subroutine C(pq) may be described as follows: whenever W reaches the instantaneous configuration (q,m,0), where ~ m = p r and p r, then the subroutine C(p,q) produces the final instantaneous configuration (q, q r,0) The subroutine operates as follows:

241 (1) Read 01 backwards, beginning at square m; for every p squares covered, move 02 to the right once; (2) If, when 01 is exhausted, it is found that m was not a multiple of p, restore 01 to its initial condition (using 02 to keep count of the number of times p must be added back) and stop; (3) If it is found that m was a multiple of p, read 02 backwards, and move 01 forward q squares for every square read on 02; return to (1). All the operations of T can be simulated using a finite number of subroutines. For example, moving to the right on the Turing machine tape leaves k unchanged and increases 2. Therefore, if such an operation takes T from (q,xy) to (q',x',y'), where k ~ P[(q,xy)] =(h(q),q.3,0), then k 2+1 p[(q',xsy)] = (h(ql),2 32 ) This can be simulated in W by applying C(355), followed by C(5,9). Similarly, motion left (except in the special case where x=e) can be simulated by applying C(9,5) first, and then C(5,3)o Writing on the tape of T can change the instantaneous configuration in two ways: replacing 0 by 1 in square 2, or replacing 1 by 0 in square lo If the former change replaces the configuration (q,xy) by (qa,x1,y'), where cp[(q,x,y)] = (h(q),2,3,0), then ~[(,i yt] = (hq') 2ak+2 2 0) = (h()2k62' 20) Cp[(q0,xyJ = (h(qg),2 0(,O)

242 Therefore, the required change can be effected in W by applying C(355), followed by C(5,6). The replacement of 1 by 0 is handled similarly. We have now shown that each primitive operation on T can be simulated by a sequence of primitive operations of W. To complete our summary of the simulation process, we must show how W determines which operation of T to simulate. Suppose W has the instantaneous description @p[qxy)] = (h(q),2.3,). The operation of T is determined by q, together with the symbol on the square being scanned. If the word xy is equal to cO,c1..',c n 1- where each symbol c. is zero or one, then the symbol being scanned is c~. We shall give a subroutine for W which enables it to determine the value of c, and thereby to determine the operation of T which is to be simulated next. By definition, n-l n-1 -1 k = ci2i = 2~(Z ci2 ) + c2. i=O i= i=O Therefore, the quotient obtained upon dividing k by 2 is n-1 ci21 i-Q It is apparent that this quotient is odd if c = 1, and even if c = 0. To determine whether this quotient is even or odd, we subtract 2 from k as many times as possible, keeping track of the parity of the number of substitutions performed.

243 The subroutine to perform this process is as follows: (1) The first step is to make a "spare copy" of k available, to be used eventually in restoring 01 to its initial k 2 position. Initially, 0 is at position 2 3. Apply 1I C(2,35), placing 01 at position 5'7 32; then apply C(7,2), obtaining 2k5 3 3 Go to step 2. (2) Whenever (2) is reached, the position on 0 is of the k r 2 form 2"5.3 2 Apply C(15,7); two results are possible: t2 k r-2 2 (a) 2 < r 2.5 r 7 2 k 22-r r (b) 2 > r 2 2 7 The occurrence of result (b) can be tested by a simple subroutine which checks whether the position on 01 is a multiple of 3, and then restoring 01 to its initial position. In case (a), go to (3); in case (b), go to (4). (3) Whenever (3) is reached, the position on 01 is of the k s 2 form 2 ~5 7. Apply C(35,3); two results are possible: 2 ks-2x 2 (a) 2 < s 2ks.2 3 (b) 2 >s 2 7.s3 In case (a), go to (2); in case (b), go to (5). (4) C2 = 0. To restore 01 to its original position, apply c(7,3). (5) C = 1. To restore 01 to its original position, apply c(7,3).

244 This completes our outline of the simulation of T by W; clearly, the number of primitive steps required by W to simulate one step of T is astronomically large. If W is started with the initial instantaneous description (h(q0),3,0) = p(qOeao)], then W halts if and only if T halts when started with an initially blank tape. Since it is not effectively decidable whether a Turing machine with tape initially blank will halt, it is also not effectively decidable whether a two-counter automaton with initial instantaneous description (q,3,0) will halt. We shall show that this undecidability result for two-counter automata yields some undecidability results concerning two-way two-tape automata of the type studied by Rabin and Scott [17]. A two-way two-tape automaton is obtained, loosely speaking, by combining the properties of the two-way automata and the two-tape automata that we have studied previously. Such an automaton is specified by an 8-tuple Q = (AI, Q, qo0 F, C1, C2, \, D). Each tape has a left end marker a1 and a right end marker a2; X is a function \: Q X (A U (c1,a2}) 3 Q, (C1,C2) is a partition of Q, and D is a function D:Q X (A U {(al,2}) (-l,}1) If q c CC, ca e (1,2}, then, upon entering state q,; reads tape c. If the symbol being read is a, then I2 moves tape ca one square in the

245 direction determined by D(q,a), and enters the state \(q,a)o A pair of words (x1,x2) C A X A is accepted by 6 if, upon starting in state q with clxla2 on tape 1 and x2-2 O on tape 2, scanning the left end marker of each tape, ( eventually stops by entering a state q e F n CQ while scanning the square to the right of 2 on tape a, where a may be either zero or one. Now let W be a two-counter automaton which simulates a Turing machine T. We shall define a two-way two-tape automaton W derived from W; the alphabet AI for W will contain a single letter 0. The operation of W, when presented with n n2 (xx2) = (0 l0 2 is as follows. To begin its operation, W moves three squares to the right on tape 1 (corresponding to the fact that W begins its operation scanning square 3 of tape 1). W then simulates W exactly (with o1 corresponding to the end marker for W) until one of the following situations occurs: (a) W reads the symbol 52 on tape a. In that case it moves to the right on tape a, and enters a state q not in F; the nl n pair (0,0 ) is rejected; (b) W reaches a configuration corresponding to a "halting configuration1" for W; W then moves off one of its tapes to the right,

246 while remaining in a state q c F; in this case, the pair 1 2 (0,O ) is accepted. Under what conditions is the set of tapes accepted by W empty? If W does not halt, then no tapes are accepted by W; W either shuttles its tapes back and forth perpetually or moves off a tape to the right in some state q, F. On the other hand, suppose W does halt. Then, in the course of its operation, W passes through a finite sequence of complete states: (qlmlnl),.., (qArmrnr). Let M = max mi, i and N = max n.. i 1 1 1 2 Then, given the tapes (0, ), W can faithfully simulate the entire computationL of W if and only if 1 > M and 2 > N; therefore W accepts I1- 12 (0,O ) if and only if W halts, ~ > M and 2 > N. The set of tapes accepted by W is nonempty (and, in fact, infinite) if and only if W halts. Since there is no general algorithm to decide whether W halts, we maydraw the following conclusion.

247 Theorem 7 It is undecidable whether the relation represented by a twoway two-tape automaton with one symbol in its alphabet is empty; it is also undecidable whether the relation represented by such an automaton is infinite. We have now completed a discussion of the events and relations representable by various automata that are not permitted to write on their tapeso It was found that two-way one-tape automata can do no more than one-way one-tape automata (finite automata) The capabilities of one-way two-tape automata are less clearly understood. We did find, however, that the class of relations representable by nondeterministic automata of this type is larger than the class representable by deterministic automata, and the closure properties of both classes were explored. Using the Post correspondence problem, it was shown that there is no algorithm to determine whether T( ) n T2( ) cP. It was also shown that two-counter automata can simulate Turing machines, although they seem to do so quite inefficiently. Finally, using our result about two-counter automata, we were abe to obtain some undecidability results concerning two-way two-tape automatao

14. GRAMMARS AND LANGUAGES It is of central interest in automata theory to characterize the classes of sets of words, or n-tuples of words, accepted by various classes of automata. When a class of events is defined naturally in some manner apparently unrelated to automata, it is of interest to determine whether this class is the class of events represented by some interesting class of automata. When such a happy coincidence is discovered, the discovery is a contribution to automata theory, and also confirms the original importance of the class of events. In this section we study the events defined by certain types of generative grammars. The most general such grammars, sometimes called unrestricted rewriting systems, first arose in computability theory, where they provide one method (Turing machines are another) for specifying the class of recursively enumerable sets. A particular class of such systems, the context-free grammars, will be our main object of study. Context-free grammars suggest themselves in several connections. They provide a formalization of constituent-structure analysis of sentences in natural languages. They also may be used to specify the well-formed formulas in many mathematical and logical systems, and in many computer programming languages. Finally, the study of noncommutative formal power series leads naturally to the introduction of context-free grammars. Let us introduce generative grammars by means of an example. Suppose we wish to give a finite specification of the set of all well-formed (and fully parenthesized) arithmetic expressions composed of left and right parentheses, the operations +, X, -, -, and the variables x, y, and z. 248

249 Such a specification is given by the following set of rewriting rules: S - (S + S) S - (S X S) S - (s s) S (-S) S -x S -- y S - z. Suppose we start with the string consisting of S alone, and repeatedly replace the left-hand side of some rewriting rule by the corresponding right-hand side, until finally a string is reached in which there are no occurrences of S. The set of all strings that may be derived in this way is precisely the desired set of arithmetic formulas. The following is an example of the derivation of a formula. S (S + S) (S + (S S)) ((-S) + (S - S)) ((-S) + (x S)) ((- (S + S)) + (x S)) ((- (y + S)) + (x S)) ((- (y + x)) + (x S)) ((- (y + x)) + (x ~ z)).

250 It may be well to indicate how the same kind of rewriting rules that we just illustrated suggest themselves in the analysis of natural languages. Consider the sentence "The young lady comes from Brooklyn." This sentence may be viewed as being composed of the noun phrase "The young lady," and the verb phrase, "comes from Brooklyn"; these may be called the immediate constituents of the sentence. These constituents may then be broken down further, with the complete breakdown shown in the following tree diagram, which gives a structural description of the sentence, with appropriate abbreviations defined below. S NP VP ARTICLE NP VERB P The comes Adj. NOUN Prep. NOUN young lady from Brooklyn

251 S = sentence NP = noun phrase VP = verb phrase PP = prepositional phrase Adj. = adjective Prep. = preposition Alternatively, the sentence may be generated by application of the following rewriting rules: S - NP VP VP -> Verb PP NP -> Article NP NP -> Adj, Noun PP -, Prep. Noun Article -T The Adj. - young Noun - lady Verb -4 comes Prep. -4 from Noun -> Brooklyno An important, and largely unsolved, class of linguistic problems is to find sets of rewriting rules that generate all and only the sentences of particular natural languages, or to show that, within certain constraints, such sets do not exist. Let us now enter into the formal development. A grammar is a quadruple

252 A = (V, T, S, P) where V is a finite alphabet called the vocabulary, T is a proper subset of V called the terminal vocabulary, V-T is the nonterminal vocabulary, S is a designated element of V-T called the initial symbol, and P is a set of ordered pairs (xi,yi), where xi E V*, Yi c V*, and x. A T. These ordered pairs are called productions. Sometimes it is convenient to write a production in the form'Xi Yi' The relation - (to be read'directly yields') over V* X V* is defined as follows: x -_ y if there exists words u c V* and v c V*, and a production (xi,yi), such that x = u X. v, and y = u yi v. The relation -- (to be read'yields') over V* X V* is the transitive closure of —. In other words, x > y if there is a sequence ZlZ2'".z such that x = zl y = zn and Z- z i=ly 2y I n-l zi Zi+ll Such a sequence is called a derivation or a zl-derivation of length n. L(), R the language generated by J, is the set of all words x such that x c T* and S=>x. Thus, grammars generate languages, whereas automata may be viewed either as recognizers of languages or, if they have output tapes, as generators of languages. Grammars are very similar to semi-Thue systems (Davis [3 ]), which can represent arbitrary recursively enumerable sets. The following theorem

253 should therefore not be surprising to those familiar with semi-Thue systems. Theorem 1 For any recursively enumerable set R, there is a grammar A? such that R = L(4 ). PROOF: Since R is recursively enumerable, there is a Turing machine T that halts if and only if its initial complete state is (qO, e, x), where x e R. Let A be the alphabet of T, and let Q be its set of internal states. The construction of, is based on the structure of T. The vocabulary V for/ is V = Q U A U (Sart). The terminal vocabulary of 1 is A, and its initial symbol is S. The productions of,- may be divided according to their function, into three subsets, P1, P2, and P3o The set of productions P1 enables the derivation from S of every word of the form u q v a such that (q,u,v) is a complete state in which T halts. The productions in P2 simulate the operation of T running backwards in time. They enable the derivation from every word u q v a corresponding to a final complete state (q,u,v), of all words a qO a such that, starting in the complete state (qO9 e, x), T halts in the complete state (q,u,v). Finally, the productions in P5 generate, from

254 each word a qo x da the word x. The specification of the productions can now be given. P1 S a ara r ra For all a c A, r ->ar. For all pairs qi.,a that are not in the first two positions of any quadruple of T, r - q.a.. P2 For each quadruple of the form q ajakq q9 ak 3 9iae For each quadruple of the form q.iaRq: (a) for all a c A, ajqa - qiaj a; (b) aj-qaoa- qiacr. For each quadruple of the form q iaLq: (a) for all a c A, q aaj - aqiaj; (b) caqaaj a - oqa.

255 P3 aqo -4 t For all a E A, ta -> at to -> e. It is easily verified that R = L(,). Theorem 2 For any grammar /, L() is recursively enumerable. PROOF: We give an informal description of a Turing machine T that halts if and only if its initial complete state is (q,e,x), where x E Lc ) T generates the S-derivations of~ in some effectively calculable order; for example, it might generate the derivations in order of increasing length, using a lexicographic rule to order the derivations of the same length. T halts if and only if it obtains a derivation whose final word is S. Theorem 3 Let Y be a grammar. Then there exists a grammar ~/' such that L( ) L(/): and such that every production of,4' is of the form (uAv, uyv), where A is a nonterminal symbol, lg(uAv) < 2, and lg(uyv) < 2.

256 PROOF: Every nonterminal (terminal) symbol for/Q will also be a nonterminal (terminal)symbol for'. Additional nonterminal symbols for / as well as the productions for (', are determined from the productions for j. For each production (al a2... am, b b2.. bn) of' 1 has the new nonterminal symbols al'2',am and Pl,2,.,r, where r = max(m,n). These sets of new symbols are disjoint for the different productions of A. Corresponding to the production (al a,.. am, b1 b2 *.. ) of, y' has the following productions: al -' a1 1la2 - 5152 I /: m-2 m-1 m-2 m-1 a.a p -3 PP m-l m m-1 m _m-2m-1 -' 3m-2m-1 II ~l12 -,152

257 III if m < n if m = n if m > n K1 - bl 1 - bl 1 - bl b b b b bl2 1 blb2 bl2 - blb2 2bi2 i blb2 bmlm bmm+l bm-l - b-lim n-ln n -ln 1m+l m+l m+2 bnn+l -i bn bnn+2 bn n- 1 bn-lin 6 b I -b b n n n n Of course, if m=l, the sets I and II given above are replaced by al - 1 It is easy to verify that L(/') = L( = ). This completes the proof. We have shown that the class of languages associated with grammars is precisely the class of recursively enumerable sets. Any class of machines which includes a machine to recognize each language must be as "powerful" as the class of Turing machines; in fact, Turing machines may be considered to perform this task only if rejection of a word is equated with failure to halt. It appears, then, that if we wish to

258 derive from the study of grammars information about new classes of automata not equivalent to the class of Turing machines, we must introduce restricted types of grammars. These considerations motivate the following definitions. A Type 1 grammar is a grammar in which, for every production (xi yi), lg(x ) < lg(yi). A context-sensitive grammar is a Type 1 grammar in which every production is of the form (uAv, uyv), where A is a nonterminal symbol. The following corollary is immediate from the proof of Theorem 3. Corollary 1 Let 4 be a Type 1 grammar. There exists a context-sensitive grammar /' that generates the same language as a in which, for each production (xi,Yi), Ig(x ) < Ig(yi) < 2. Theorem 4 If / = (V, T, P, S) is a Type 1 grammar, then L( /') is recursive. PROOF: Let x be an element of T*. If x c L(A/), then there is an S-derivation of x in which no word is repeated. Moreover, since / is of Type 1, every word in such a derivation is of length less than or equal to lg(x). Since the number of words z c V* such that lg(z) < lg(x) is finite, the number of sequences of such words without repetition is finite. An algorithm to determine whether x c L(/ ) is to enumerate all

259 such sequences, and check whether any of them are S-derivations of x. Since there obviously exists a Turing machine that can perform this process, L(. ) is recursive. Corollary 2 There exist languages generated by grammars but not generated by Type 1 grammars. PROOF: It is known (Davis [3 ]) that there are recursively enumerable sets that are not recursiveo Theorem 5 There exist recursive sets that are not Type 1 languages. PROOF: The proof is based on a simple diagonal argument. Let the terminal alphabet T be fixed, and equal to (a, al,.o.,apl)}. It is easy to construct a 1-1 function f from T* onto the nonnegative integers. One way to do so is as follows: set 0(a.) = i, i=0,1l2, o.,p-l. Then f(e) = 0, and n-l n f(b0obl,. o ebnl) = - + 1 + (bi)P p-l 1+ i=0 Clearly, the function f (n) can be computed by a Turing machine. On the other hand, if a language over the terminal alphabet [a, al...,ap. 1 } is generated by a Type 1 grammar, then it is generated by a Type 1 grammar

260 in which the nonterminal symbols are chosen from the infinite sequence Sl'S2Y.aSS n. o. But, by a device similar to Gbodel numbering, we can construct a one-to-many mapping g from the set of all such grammars into the nonnegative integers, as follows: let a(a.) = 3 + 2i, i=0,l,. 0,p-l let a(->) = 3 + 2p let a(Sr) = 3 + 2p + 2rt Any production P = x-Yi can be viewed as a word b b2 o.b over the infinite alphabet 2" s r U (-j} U {S1,S2,` ~}, and its G6del number is a(bi) gn(p) = TT Pr(i) i=1 Then, if y has productions lP2,2 p~r p, one of the n! possible values of g(/ ) is ( gn(P) 7T Pr(i) i=l Clearly, for any integer N, there is at most one grammar A such that g(k) = N, and it can be determined by a Turing machine whether such a grammar exists, and if so, what it is, Now, consider the following set: I -I T*: x c H if either

261 (a) there is no grammar, such that f(x) = gC ) or (b) there is a unique grammar / such that f(x) = g(), and x, L( )o. Clearly, H is a recursive set. To decide whether x E H, compute f(x). If the inverse image g (f(x)) is empty, x c H. If g-(f(x)) = test whether x C L(e ). If not, x C H. On the other hand, H is not Type 1. For, suppose H were equal to L(/), where/k is Type 1. Consider x = f (g(L()); x E H if and only if x,- L(l)o This completes the proof. Even though the Type 1 languages form only a proper subclass of the class of recursive sets, there is a sense in which they are closely related to the class of recursively enumerable sets. Consider a grammar 4/ = (V, T, S, P); /1 is not necessarily Type 1. We shall construct a Type 1 grammar such that L(y') is empty if and only if L(J ) is empty.,' will have the terminal vocabulary T U (r), and nonterminal vocabulary (V- T)U (a}, where a and T are new symbols not in V. The productions of /' are as follows: if (al... am, bl o.. bn) is a production of a1, with m < n, then (a. a, b.. bn) is also a production of'; if then 1 m n (a1... am b1 o.. bn) is a production of /A, with m > n, then (a1.. am, b1... b a ) is a production of Y'; for a c V, (ta, aa) is a production of DA'; finally, (a,T) is a production of.'.

262 For any word y E (T U (T))*, let h(y) be the word obtained by deleting all occurrences of T. Then, if y C Lc( ), h(y) c L(e ); also, if y e L(A)), then = h(y), for some word y in L'). Therefore, L(tY) is empty if and only if Lt ) is empty. Using this fact, we can show that it is undecidable whether Lj(1') is empty. We begin by recalling that there is no algorithm to determine, of a given Turing machine T, whether there exists an initial tape that causes T to halt. But, for every Turing machine T we can construct a grammars/ such that x c L( i) if and only if T halts when it starts in the complete state (qO, e, x) Therefore there is no algorithm to determine whether LQ() (or L(')) is empty. Thus, we have established the following result. Theorem 6 It is undecidable whether the language generated by a Type 1 grammar is emptyo Corollary 3 It is undecidable whether two Type 1 grammars generate the same language. PROOF: Choose one of the grammars so that its set of productions is empty; the result then follows from Theorem 60

263 Later we shall use the Post correspondence problem in the derivation of several undecidability results concerning context-free grammars, which will be studied in the next section. Having reached some understanding of Type 1 languages, we now undertake to characterize the class of automata that recognize such languages. In proving Theorem 1, we showed how to construct, for a given Turing machine T., a grammar ~ that generates the set of words accepted by T; that is, T halts if and only if its initial tape configuration is an element of L( /). Inspecting this construction to determine why 4 fails to be Type 1, we find that the productions of A that do not meet the requirements of a Type 1 grammar include the following: for each quadruple of the form qia.RqV, the production ajqa -o- 4 qiaja; for each quadruple of the form q.iaLq, the production aq a -, oqiai. Both these productions occur because of the possibility that T may read blank squares of tape that did not initially contain data; one might expect that, if an automaton could be defined that had all the capabilities of T except the ability to "grow new tape", then the events recognized by such automata might turn out to be the Type 1 languages. Landweber [28] and Kuroda [27] have shown that this is indeed the case; the desired automata are the nondeterministic linear bounded automata. A (nondeterministic) linear bounded automaton (l.b.a.) 2 is specified by its alphabet A, its set Q of internal states, its set of initial states Q0, its set of final states F, and its commands, in the form of

264 Turing machine quadruples. The set of symbols occurring on a tape is A U [al',a2 where al and a2 are, respectively, left and right end markers. Among the quadruples of the form qiajak,qV, the following conditions must be met: if a. = then if a. =- then k= 2; if a. c A, then ak c A. In other words, end markers are neither created nor destroyed. Since U is nondeterministic, there is no restriction on the number of quadruples having a given pair (qi.aj) in the first two positions. The lob.ao 2 begins its operation in some state q C Q0, with a word of the form alxo2 on its tape, where x c A, and with the square containing c1 being scanned. At each step, 6 executes one of the quadruples associated with its internal state qi, and a., the symbol on the square being scanned. Thus a operates in the same manner as a (nondeterministic) Turing machine, except that it halts upon entering that portion of the tape beyond its end markers; in other words, ( halts upon executing a command of the form qjo2Rq0 or q.cLq.o Thus, the

265 operation of d is similar to that of a two-way one-tape read-only automaton with end markers, except that z is capable of writing on its tape. The word x is accepted by 6 if, starting with the initial tape configuration alXc2, i has some sequence of operation such that it eventually runs off its tape to the right in some state qi e F; in such a sequence, the final quadruple is q.i2RqI. Sequences in which 6 fails to terminate, runs off its tape to the left, stops without running off the tape, or runs off the tape to the right in a state belonging to Q-F, do not contribute to the acceptance of any words. The following theorem is a variant of a theorem due to Landweber [28] whose definition of'linear bounded automaton' differs slightly from ours. Theorem 7 Let B be the set of nonempty words accepted by a nondeterministic linear bounded automaton 6. Then there exists a Type 1 grammar such that B =L( ). PROOF: To clarify the idea behind the proof, we first give a construction of a grammar4" that does not quite satisfy the Type 1 restriction, and then modify it. The first construction is similar to the construction used in showing that every recursively enumerable set is a language. In making this construction, we think of a complete state of e as represented by a word olxqiya2, where xy c A*. This word indicates that the tape contains the word alxyo2, that the internal state is qi, and that the square containing the first symbol of yo2 is being scanned. The productions of I may be partitioned into the three subsets P1, P2, and P3. P1 generates all those complete states that can occur

266 immediately prior to acceptance of a nonempty word; i.e., all sequences of the form olxq.i2 such that x e At and C has a quadruple of the form (qi, 2,Rqq), where q c F. P2 then "runs 1 backwards", and P3 takes all words of the form qialxr2 such that qi c QO, and removes the symbols qial, and a2o It is in this final phase that the Type 1 condition is violated. The productions of XA are as follows. Pi S - a1 r o2 For all a c A, r -- ar. For all a c A, and all qi such that e has a quadruple qia2Rq,, where q CE F, r - a qio P2 For each quadruple of the form q ajakq qak - qiaio For each quadruple of the form qiajRq, a. a/ U2 aj.q -- qiajo For each quadruple of the form q a Lq j a.,l and for all a c A, q~ a aj - a qi aJ. P3 For all qi E Q^0 and for all a C A, qA a - a.

261 For all a c A, a2 -X a. We omit the proof that this grammar generates the set B. In supplying the proof, one may not presuppose that the productions in P1 are applied before those in P2, and the productions in P2, before those in P. Because of the productions in P3, A fails to be Type 1. An equivalent Type 1 grammar can be obtained, however. Note that, in each production of, the number of occurrences of elements of A on the left-hand side is less than or equal to the number on the right-hand side. Therefore, if u -- v in 6, then u has no more occurrences of elements of A than v does. Suppose, then, that u and v are "factored" into subsequences, each containing only one occurrence of an element of A, and the same is done for v; then u will contain no more subsequences than v does. Moreover, only a finite number of distinct subsequences are required in all of the factorizations. This suggests the construction of a new grammar' with a nonterminal element serving as a "coded representation" of each required subsequence. For clarity, we use the following mnemonic device: if x is one of the required subsequences, then (x) denotes the corresponding nonterminal symbols of'. The following is a list of the new nonterminal symbols:

268 For all a c A, for all q c Q: (q a1 a a2) (q, al a) (a aR) (al q a c) (al q a) (a a) (or a aq a) (q a c2) (a q) (ar a q) (q a) (a q 2) We shall not give the full details of the modification of the grammar given above, using these new methods. As an illustration, however, we note that the set of productions P3 is replaced by the following set of productions P'3: P'3 For all a c A, for all qi E QO (qi c1 a a2) -e a; (qi a1 a) - a; (a 2) As a second illustration, we remark that each quadruple of the form (qiajRq ), a. c A, gives rise to the following productions, in which'a' is understood to range over the elements of A. (Clajq. 2) - (a1qiajq2) (alaj) (qaac2) e- (crlqiaj) (a2) (larq) -e ( Lqi a) (ola )( a) -) (5lqaj)a (qakka2) - (qiaj2) (C aq )(ak 2) - (oqla)(aCT) (Iqa) I (i )

269 (claq) ak - (alaqi) a (qak'2) (- (qiaja2) (aq) (aka2) - (a qi)(a o2) (a q2)ak 4 (a qi)a. The truth of the converse of Theorem 7 is quite apparent. It was first pointed out by Kuroda [27]. Theorem 8 Let / = (V, T, S, P) be a Type 1 grammar. Then there exists a linear bounded automaton C for which the set of words accepted is L( ). In specifying the l.b.a. 2, we do not require that the alphabet of / be restricted to the terminal vocabulary of; A finite number of additional symbols may be present in the alphabet of; these symbols may be written in the course of the calculation, as a uses its tape for "scratchwork". PROOF (of Theorem 8): We shall describe informally how 6 determines whether a word x is an element of L(, ). The alphabet of 6 is V U (Op), where cp $ V. The symbol cp serves as a "blank"1 Initially, 6 reads its tape from left to right, checking that all symbols between the end markers are terminal, and that the end markers are not adjacent. 6 then moves back to the left end marker a1, and enters the "search" mode. Again 6 scans the tape from left to right. So long as no symbol read is the first symbol of the right-hand side of a production, the left-to

270 right-'scan is coatinued. Suppose, however, that' a' symbol isencountered that is the first symbol of the right-hand sides of the productions (xi. )' (x'y )'...' (x y )Yi 1 1 2 2 r r Then 6 has r+l possible courses of action (remember that a is nondeterministic!). 2 may either continue searching, or it may enter a state q. which corresponds to the "guess" that the symbol being scanned is the first symbol of y.. Suppose (x.'i ) = (al...,am, bl,'**, bn). 0 S When the state q. is entered, the left-to-right scan is continued, to determine whether the symbols bl,b2,...,b occur successively. (The answer is automatically "yes" if lg(yi.) = ) In this process, occurrences of the symbol c are ignored. If the answer is "no", then a moves off the right end of the tape in a non-final state. If the answer is "yes", then a. is written in place of b., j=l,2,..,m, and p is written in place of bj, j=m+l,...,n. Since A is Type 1, this is always possible: lg(x ) < lg(y ). J J c9 then moves back to the left-hand end marker, and reads the tape from left to right to determine whether the word between end markers is of the form cPSqcp; if so, 6 moves off the right end of the tape in some state q c F. If not, C returns to the left end of the tape, and reenters the search modeo It is easy to check that 6 accepts x if and only if there is an S-derivation of x in.

271 Theorems 7 and 8 establish that a language..is Type 1 if and only if it is the set of all nonempty words accepted by some nondeterministic linear bounded automaton. From this, one can easily see that the following events are Type 1 languages: (1) the set of all words with more zeros than ones; (2) the set of all nonempty words whose length is a square; (3) the set of all words whose length is a prime; (4) the set of all words of the form akb ck, k=l,2,...,n. Deterministic linear bounded automata suffice for the constructions required in these examples. It is not known whether there exists events representable by nondeterministic linear bounded automata, but not by deterministic linear bounded automata. Myhill [ 31] has obtained a result which, when taken together with Theorem 7, shows that the Type 1 languages form a very substantial subclass of~the class of recursive sets. His result is that the class of events represented by deterministic linear bounded automata includes the class of rudimentary events. The rudimentary events (cf. R. M. Smullyan, "Theory of Formal Systems") are those that can be expressed in a vocabulary consisting of: (i) the alphabet A; (ii) the three place predicate uv = w, with the interpretation "the word w is the concatenation of u and v"; (iii) the Boolean connectives'and''or', and'not'; (iv) the bounded existential quantifier 3 v < w, with the interpretation "there exists a word v such that lg(v) < lg(w)".

272 For example, the set of all words of the form xx is expressed by the formula 3 x < y (xx = y). It can be shown then, in any grammar, the set of all derivations is a rudimentary event, and therefore a Type 1 language. Similarly, the set of proofs in any of a large class of formal systems is rudimentary, and therefore Type 1. Theorems 7 and 8 may also be used in studying the closure properties of the class of Type 1 languages. Theorem 9 The intersection of two Type 1 languages is a Type 1 language. PROOF: Let L1 and L2 be Type 1 languages over the alphabet T. By Theorem 8, there exist nondeterministic l.b.a.'s 61 and 2 that represent the events L1 and L2, respectively. Let the vocabularies of these l.b.a.'s be V1 and V2; of course, T CV n V. We shall specify a nondeterministic l.b.a. 63 that represents the event L1 n L. The alphabet of a3 is (V1 >< 2) U T; the operation of 63 is as follows. First, ~ reads its tape from left ro right, checking that each symbol encountered is an element of T, and replacing each symbol a by (a,a). Throughout the rest of the operation of a3b all symbols on the tape are elements of V1 X V2. We may

275 view the tape, therefore, as having two parallel "channels", one for the first components of ordered pairs, and the other for the second components. After its initial reading of the tape, 63 resets the tape to its initial position, and simulates a, using only the first channel (first component of each ordered pair); however, at the point where 1 would run off the tape to the right in a final state, 3 returns to the left-hand end marker and simulates 2' using the second channel. Clearly, a word is accepted by 3 if and only if it is an element of L1 n L2. Therefore, by Theorem 7, L1 n L2 is a Type 1 language. Some of the other closure properties of the Type 1 languages are quite easy to prove. We derive them by means of a sequence of lemmas. Lemma 1 For any Type 1 grammar / = (V, T, S, P), there is a Type 1 grammar A' in which no terminal symbol occurs on the lefthand side of a production, such that L(') = L( ). PROOF: Let T' be a set of symbols not in V, in one-to-one correspondence with T. Let a be a homomorphism from V* into ((V-T) U T')*, defined as follows: if A C V-T, a(A) = A if a c T, a(a) is the corresponding element of T'. Then -' = (V U T', T, S, P'), where the set of productions P' is defined as follows: P' = ( (((xi),a(Yi)) l (xiyi) C P} U ((a(a),a) | a c T).

274 It is now clear that L(') = L( ). Let L1 and L2 be Type 1 languages. With no loss of generality we may assume that L1 = LC ) and L2 = L 2) where / = (VlTl'Sl'Pl), 2 2 (v2,T2,S2,P2), (V1 - T1) n (v2 T2) =, and neither 1 nor R2 has a terminal symbol on the left-hand side of any production. Lemma 2 L U L2 is a Type 1 language. PROOF: Let S be a symbol not in V1 U V2. Then L1 U L2 is generated by the grammar (V1 U V2 U (S}, T1 U T2, S, P1 U P2 U ((S,S), (S,S2)). Lemma 3 L1 = ( x E L) is a Type 1 language. PROOF: L1 is generated by the grammar (Vl9 T S1, ((Xii) (XiYi) P)). Lemma 4 L1'L2 is a Type 1 language.

275 PROOF: Let S be a symbol not in V1 U V2. Then L1 L2 is generated by the grammar (V1 U V2 U (S), T1 U T2, S, P1 U P2 U ((S, Sl S2)). Lemma 5 Lt is a Type 1 language. PROOF: With no loss of generality, we may assume that S1 does not occur on the right-hand side of a production. Let 35 = (V3s Tl, S3, p) be a grammar that generates L1, obtained from /1 by renaming each element of (V1 - T1). Then V1 - T1 V3 -T I and v n V3 = T. Let S and S' be symbols not in V1 U V3. Then the following grammar generates L. (V1 U V3 U (S,S', T1, S, P1 U P3 U ((S,SS'), (S',S S), (S,S), (S',S3)}). It is not known whether the class of Type 1 languages is closed under complementation. Kuroda [27] has shown, however, that the class of events represented by deterministic linear bounded automata is a Boolean algebra of sets. From this we can draw the following inference: if every event representable by a nondeterministic l.b.a. is representable by a deterministic l.b.a., then the Type 1 languages are closed under complementation.

276 Context-free grammars and languages A grammar' = (V, T, S, P) is a context-free grammar if each of its productions is of the form (A,y), where A c V-T. Since there is no requirement that y ~ e, there are context-free grammars that are not Type 1. If the additional requirement y A e is imposed, a context-free grammar is called a 1-context-free grammar. Every 1-context-free grammar is a Type 1 grammar. The language generated by a context-free grammar is called a context-free language. The grammars that were used as examples at the beginning of Chapter 14 are both context-free. The properties of context-free grammars and languages have considerable relevance to the study of natural languages, as well as artificial languages used in programming digital computers. Although natural languages appear not to be context-free (In fact, Postal [35] has proved this for the Mohawk language.), reasonable approximations to natural languages can be generated by context-free grammars. Oettinger [31] and Yngve [38] have made use of this fact in their studies of automatic translation. Also, large portions of ALGOL and other compiler languages for digital computers have been specified by context-free grammars. Our discussion of context-free grammars and languages is based, to a considerable extent, on Bar-Hillel.Perles, and Shamir [20], Landweber [29], and Chomsky [21, 22, 23]. In thinking about properties of context-free languages, it is useful to bear in mind a natural method of representing a derivation in a contextfree grammar by a tree. An example should suffice to explain this representation.

2-77 Example 3 Consider the following S-derivation of the word abac. Word Production used in obtaining word S AB S -- AB abB'A -4 ab abAc B - Ac abaBc A - aB abac B - e. The tree representing this S-derivation is as follows: S //B a b A c a e

278 The length of a longest path from the root to a leaf in a tree representing a derivation is called the depth of the derivation. Theorem 10 Every regular event is a context-free language. PROOF: Let R be the regular event represented by the finite automaton = (AI, Q, q' F, \). The context-free grammar, z = (AI U Q, AI, q P) may easily be seen to generate R: For all (q,a) E Q X AI, q -4 a X(q,a). For all q C F, q -4 e. Bar-Hillel, Perles, and Shamir [20] have shown that, if ~ is a one-way two-tape read-only automaton, and if a is a symbol not in the alphabet of, then (x a | (x,y) E Ta( )} is a context-free language. We shall obtain this result quite easily later, once we have defined the class of automata that accept all, and only, the context-free languages. Using the methods of proof employed in Lemmas 1, 2, 3, and 4, the following closure properties of the context-free languages are easily obtained.

279 Theorem 11 The class of context-free languages is closed under the unary operations t, *, and -, and under the binary operations U and PROOF: Let L1 and L2, respectively, be generated by the context-free grammars 1 = (V1, T1, Si' P1) and 2 (V2,( T2, S2, P2). With no loss of generality, we may assume that the nonterminal vocabularies of i1 and 2 are disjoint, and that neither S1 nor S2 occurs on the right-hand side of a production. Then LI is generated by the grammar / t = (v1, T1, S, P1 U ((Sl,SlS )}) * L1, by the grammar * -= (VlY T1, Sl1 P1 U ((Sl,SlS), (Sle)}), and L1, by the grammar /~~1 = (V1, T1, Sl' P1) where l -= ((A,y) | (A,y) c P1}. Also, L1 U L2 is generated by the grammar (V1 U V2 U (S), T1 U T2, S, P1 U P2 U ((Ssi), (, S2), where S A V1 U V2, and LiL2 is generated by the grammar (V1 U V U (s), T1 U T2, S, P1 U P2 U [(S,SS )}.

280 We omit the proofs that these grammars have the properties that are claimed. It turns out that the class of context-free languages is not closed under set intersection or complementation. In order to prove this, we must first develop a technique for proving that certain events are not context-free languages; the development of this technique will require some preliminary results. Lemma 6 It is effectively decidable whether the null sequence is in the language generated by a context-free grammar Y _= (V, T, S, P). PROOF: Let the following sequence of subsets of V-T be defined V = (AJ A - e) for k=l1,2,.. Vk+ V U (A | there exists (A,y) c P such that y E Vk}. Thus, A Vk if and only if there is an A-derivation of e of depth less than or equal to k. Therefore v1 C V. C Vk C Vk also, if V = Vkl, then V = V for any r greater than or equal to k. k k+1I k r Therefore, if V-T = n, then e c L(,7) if and only if S E V n The required decision procedure, then, is to construct V and determine whether S c V n n

281 Theorem 12 Let 7 = (V, T, S, P) be a context-free grammar. Then one can construct a 1-context-free grammar A(' such that LW'.) = L(.j) nTt PROOF: There is nothing to prove unless the set V defined in the proof n of Lemma 6 is nonempty. If V is nonempty, the productions of 4' are all the pairs (A,z) such that z ~ e, and there is some element (A,y) e P such that z = y or z can be obtained by deleting from y some occurrences of elements of V. Clearly, B' is a 1-context-free grammar. Also, if S - x in / and x e e, then S x in To see this, consider the tree T specifying an S-derivation of x in A. Associated with each internal node n of T is a subtree T(n) containing all the nodes connected to the root through n. Delete from T every leaf labelled e, every node n such that all the leaves of T(n) are labelled e, and every branch connected to a deleted node. Then the resulting tree T' represents an S-derivation of x in A'. The following diagram illustrates the construction of T' from T.

282 S S / A B. A B C a C B A a A ~e~~~ e e b c b c T T' Conversely, to every S-derivation of x in', there corresponds an S-derivation of x in k. Every time a production (A,z) is used in the former derivation, where (A,z) is obtained from some production (A,y) of,, then the latter derivation uses (A,y), followed by additional productions to replace by the null sequence each of the nonterminal elements deleted in passing from y to z. Corollary 4 Every context-free language is a recursive set. PROOF: Every 1-context-free language is Type 1, and therefore recursive. Taking this together with Lemma 6 and Theorem 12, one obtains the desired conclusion.

283 Let us define a 2-context-free grammar as a contest-free grammar in which, for every production (A,y), lg(y) > 2. Theorem 13 Let ( = (V, T, S, P) be a 1-context-free grammar. Then one can construct a 2-context-free grammar /' such that x C L(/') if and only if x E L( ) and lg(x) > 2. PROOF: For each A c V-T, let the following sequence of sets be defined: W1(A) = (A) for k=l,2,... Wk+l(A) = Wk(A) U (B i(B c V and 3 xlx c Wk(A) and (x,B) c P)J. Thus, for any B c V, B e Wk+(A) if and only if there is an A-derivation of B of length less than or equal to k. It also follows that w1(A) CW2(A).. CWk(A) C.. and that, if V-T = n, there is an A-derivation of B if and only if B c W (A). In particular, n+l a c L( /) if and only if a E W,+(S). The productions of the grammar /' are defined as follows: (A',a' a a'), n > 2, is a produc1 2 n tion of /' if and only if b has a production (A, a a2.o a ) such that A c W n (A') and, for i=l,2,,..,n, i Wn+l i We omit the proof that /' has the desired properties. As a substitute, we give the following tree diagrams, which illustrate the correspondence between derivations in 1< and derivations in'.

284 S S A D AI' D a B B A B B A C C b a a b b a c b B pB c a c C ~~~/ \ ~b a b Thus, the class of context-free languages and the class of 2context-free languages differ very little from one another. The relationship can be summed up as follows: an event P is a contextfree language if and only if there exists a 2-context-free language such that L(a) C P and every word in P and not in L(^) is of length less than or equal to 2o

285 The following theorem, due to Bar-Hillel, Perles, and Shamir [20], provides the tool that we require to prove that certain languages are not context free. Theorem 14 Let ~ = (V, T, S, P) be a 2-context-free grammar. Then there exists an integer p such that, if x c L( /() and lg(x) > p, then x can be written in the form x = rstuv, where s and u are not both null and, for all k > 1, kt k C L /A rs tu v E L ). PROOF: Let n = V-T. Then the number of S-derivations of depth less than or equal to n is finite. Let p be the maximum length of a word over the terminal alphabet T generated by an S-derivation of depth less than or equal to n. Let x be an element of LQ) such that lg(x) > p. Then any S-derivation of x is of depth greater than n; therefore its tree includes some path from the root to a leaf which includes two occurrences of some nonterminal symbol B. Let the occurrence of B closer to the root be at node nl, and the second occurrence, at n2. Then the tree for the Sderivation of x has the form indicated in the following diagram.

286 S ~~~~~~~~~~~~~~~~~~~~~~r V. s B^ 2 u,\ t

287 Thus, the following derivations exist in I: S = —- rBv, B = — sBu, B at, where x = rstuvo Since / is a 2-context-free grammar, it follows that s and u are not both null. But, any word rsktukv, k > 1, has an S-derivation in as follows: apply the derivation S === rBv, followed by k iterations of B - sBu, followed by B =- t. This completes the proof of Theorem 14. Using Theorem 14, we can now continue our studies of the closure properties of the class of context-free languages. Theorem 15 The class of context-free languages is not closed under intersection. PROOF: The language L1 = On 1 On, n > 1} is a context-free language generated by the grammar. S - 0S 0,

288 S -4 010; the language L2 = 1 0 * is regular, and is therefore context-free. By Theorem 11, the language L1.L2 [O n 1 n 1 k k is context-free, and so is k On n L ~L = (O 1 O1 n >l, k > 1. Suppose L!L2 n L1'L2 on 1 o n > 1} is context-free. Since it contains no words of length less than 3, it must be the language generated by some 2-context-free grammar Y Theorem 14 therefore applies, and it asserts that, for large enough n, the word 0n 1 On 1 On = rstuv, where s and u are not both empty and, for all k > 1, r s t u v C L(/ ), Since every word in L(/) contains exactly 2 1s, neither s nor t contains a 1. Then rs tu v must be of the form n n~ n 0 1 0 1 0 where max(nn2,nj3) > n1 and Min(n,n2,n ) = n. This contradicts the assumption that L1~L2 n L1~L2

289 is context-free; and this contradiction completes the proof. Corollary 5 The class of context-free languages is not closed under complementation. PROOF: By the identity L n L = L1 U L closure under union and complementation would imply closure under intersection. Corollary 6 There exist Type 1 languages that are not 1-context-free languages. PROOF: The languages L1 L2 and L1'L2 are 1-context-free, and therefore Type 1. Since the intersection of Type 1 languages is Type 1, L1'L2 n L1'L2 is Type 1, but not context-free. The following theorem, first given in [20], is a fundamental one in the study of context-free languages. Theorem 16 The intersection of a context-free language and a regular event is a context-free language.

290 PROOF: Let = (V, T, S, P) be a context-free grammar and let 6 -= (T, Q, q0 \, F) be a finite automaton representing the event P( ). We shall construct a context-free grammar 7' that generates the language L(/) n P(f), The vocabulary of -' will be (S3 U T U (Q X (V-T) X Q), and the productions of' will be as follows: (i) For all q C F, S -4 (qo0sq). (ii) For every production of, of the form A -a alaa2,o..,am and for every sequence of m+l (not necessarily distinct) states q,q i,...,'qi qi 11 1' i m+1 (qg' Aq ) - (i,alq ) (i,a2',q) ~~ (qi a,qi I1 1-m+l 1 i 2 2 3 m M m+l (iii) For every triple (qoaqj) such that a e T and \(qia) = q., (qJaj) a, Then, if q c F. and al,. O^,a is a word over the alphabet V, the grammar k' generates all strings of the form

291 (qI,aliq ) (j,a2,qj )... (,am), (1) j ji jJ 01 1 m-2 where q.,...,q are arbitrary elements of Q, provided that there is J1 0m-l an S-derivation of al,...,am in q; otherwise, no string of the form (1) is generated. Suppose, further, that al,...,am is a string over the alphabet T, and that rPA(alo..,a ) = q Then, in particular, "A' generates the sequence (qO, alrPA(a )),(rA(a l),a2,rpA(a2)),.,. (rPA(al,.o,am_,), am q). And, from this sequence, productions listed above under (iii) yield al.9.,ame On the other hand, if al,..,a eC L(m) but rpA(al,..,am) is not an element of F, there can be no derivation of a1,...,am in,'. Therefore, L( -' ) = L(/) n P(d') By a proof quite similar to the proof of Theorem 16, we shall now establish what Ginsburg [ 9] has called the Machine Mapping Theorem: a (nondeterministic) generalized sequential machine maps languages onto languages. A (nondeterministic) generalized sequential machine (g.s.m.) is a 5-tuple "P = (AI, Q, A0, Q0, R), where the symbols AI, Q, A0, and QO have their usual meanings, and R is a set of quadruples of the form (qiaj.,xq ), where qi and qI are elements of Q, a. E A, and x E Ao The interpretation of the quadruple is as follows: if 2iX is in state qi and receives the input a., it may produce the output sequence x and enter state:q.o The number of quadruples with 0

292 q. and a. in the first two positions may be 0, 1, or greater than 1. 1 J If al,...,a is an element of Ai, then the set 1 m I * seq(al,.oo,am) C A0 is defined as follows: X1X2 OXX m seq(a, o o.,am) if and only if R includes a set of quadruples of the form {(q0,alx, xql) (qla2,x2,q2).o, (qla mXm,)}, where qo E QO. A g.s.m. is sometimes called a transducer. Theorem 17 (Maching Mapping Theorem) Let I = (V, T, S, P) be a context-free grammar, and let - = (T, Q, A0, QO, R) be a g.s.m. with input alphabet T. Then the set L' = seq(x) x c L(@f) is a context-free language, PROOF: We shall exhibit a context-free grammar,' that generates the language L'o The nonterminal vocabulary of X' is (S) U Q > V < Q, its terminal vocabulary is A0, and its initial symbol is S. The productions of' fall into three sets, as follows: (i) For each pair (q,q') QO x Q, S -- (qS9qS') O (ii) For each production of the form A - alaa. o a, 12^ m

293 and for every sequence of m+l (not necessarily distinct) states q.,q.... q. (q0,A,qi ) -4 (qi,al'qi )(qi a2,q )'' (qi,a mq' ) o1 m+1 1 2 2 5 m m+1 (iii) For each quadruple (qi,aj,x,q ) e R, (qi ajq) -4 X. The verification that this grammar generates L will be omitted; it follows the lines of the discussion given in the proof of Theorem 16. Decision problems of context-free grammars In this section we show how the Post Correspondence Theorem may be applied to prove the undecidability of certain questions concerning the analysis of context-free grammars. Let A be a finite alphabet, and let (al,...,an) and (bl,.obn) be ordered lists of words over the alphabet A. Let Z = [l,...,zn be a set of additional symbols not in A. Let L(a) be the language over the terminal vocabulary Z U A generated by the following context-free garmmar J (a). S - Z1 S a i=l2oo S -_ z, ai i=1,2,.,n Then L(a) consists of all nonempty sequences of the form z, Z.... z a.... a.. r r-l 11 11 r Similarly, let L(b) be the language over the terminal vocabulary Z U A generated by the following context-free grammar /(b).

294 S' - z. S' b. i=l,2, n 1 1 S' - z. b. i=l,2,.o.,n. Then L(a) n L(b) is nonempty if and only if the instance of the Post Correspondence Problem given by the lists (al,.,.,an) and (bl,...,bn) has a positive solution (i.e., there exists a nonempty sequence i i2, ~o ik such that a. a. a. b b. 1 2'k l 12 k Also, if L(a) n L(b) is nonempty, it is easily seen to be infinite. Thus, if there were a general procedure which, when given any two grammars of the form /(a) and </ (b), could determine whether L(a) n L(b) is empty (or infinite), then the Post Correspondence Problem would be recursively solvable. Thus, we are led to the following conclusion. Theorem 18 There is no algorithm to decide, given context-free grammars ~>1 and A2 whether L(A i) n L( 2) is empty, nonempty and finite, or infinite, Let = (V, T, S, P) be a context free-grammaro Two S-derivations of a word x c L(1 ) are equivalent if the labelled trees associated with the derivations are identical. The grammar I4 is unambiguous if, for all x c LCA), all S-derivations of x are equivalent; otherwise, A is ambiguous. To

295 illustrate this concept, suppose, has the following three S-derivations of the word abed. S - AB -~ abB -> abcd; S - AB -> Acd -* abcd; S - aC - abD - abcd. The first two derivations are equivalent; their tree is as follows. S A B a c -d The third derivation is not equivalent to the other two; it has the following tree. S a C \ D b gc c d

296 Because abed has two inequivalent derivations the grammar a is ambiguous. The concept of ambiguity is important in the application of contextfree grammars to natural and artificial languages. From the point of view of such applications, an ambiguous context-free grammar is one that does not attach a unique structural description to each of its sentences. Its explanatory value is therefore diminished. It has been shown that there exist inherently ambiguous context-free languages, all of whose grammars are ambiguous. Theorem 19 There is no algorithm to decide whether a context-free grammar is unambiguous. PROOF: It is evident that the grammars, (a) and L (b) defined above are unambiguous. Consider the following grammar 4 (a,b), which generates the language L(a) U L(b): T - S T -> S' S -> z. S a. i=l92ooo.,n S -> zo ao i=l,2 oo n S' — z, S' b. i=192o o on SI — Z b i=12 oo on In this grammar, a word x has two inequivalent T-derivations if and only if it has one derivation in which the nonterminal symbol S occurs, and another in which S' occurs; i.e., if and only if x c L(a) and x c L(b). It follows that this grammar is ambiguous if and only if

297 L(a) n L(b) / P; i.e., if and only if the Correspondence Problem for the lists (al,...,an) and (b,...,b ) has a positive solution. Thus, there can be no general algorithm to decide whether a grammar G(a,b) is ambiguous. Pushdown automata In this section we introduce the class of pushdown automata, and prove that a language is context-free if and only if it is the set of words accepted by some pushdown automatono This result provides a deepened understanding of the technique of "predictive analysis" used in computer processing of natural language text. For, since predictive analysis is equivalent to the use of a nondeterministic pushdown automaton, the result implies that the class of languages analyzable by a predictive analysis is precisely the class of context-free languages; ioe., predictive analysis is equivalent to immediate constituent analysis. A (nondeterministic) pushdown automaton has a one-way input tape, and a one-way infinite storage tape called a pushdowno If we call the right-most square of the pushdown that is not blank the top of the pushdown, then the characteristics of a pushdown may be described as follows: (1) the only square that may be read or erased is the top. (2) the only square that may be written upon is the square to the right of the top. Thus, a pushdown is a kind of last-in first-out memory.

298 Formally, a (nondeterministic) pushdown automaton is a 6-tuple. = (Ap Q, Ap qo, F, I), where A is the input alphabet, Q is the set of internal states, A I P is the pushdown alphabet, q0 C Q is the initial state, F C Q is the set._of final states, and I is the set of instructions, to be defined below. A complete state of 6 is a triple (x,q,z), where x c AI, q e Q, z is of the form ay, where a is an end marker not in AI U A and y c AP; x, of course, denotes the word on the input tape, and ay denotes the word on the pushdown; the rightmost symbol of ay is at the top of the pushdown. The instructions are of the form (a qijb) W) (qj 9) where (i) Either a=e or a c A. (ii) qi c Q and qj c Q. (iii) Either b=e or b E Ap or b = a. (iv) Either w c Ap or w is the special symbol p; if w=p, then b c Apo The action of the pushdown automaton 6 is dictated by its instructions as follows. If 1 is in the complete state (x^qi,z) where X = x1a z = z'b, and has an instruction (acSb) -> (qj w), w / pw then (2 may enter the complete state

299 Thus, this instruction corresponds to the operation of writing on the pushdown. If is in the complete state (x'a, qi, z'b) and has an instruction (a,q,b) -4 (qj,p), then d may enter the complete state (x, qj) zt); thus, the occurrence of p in an instruction indicates that the top of the pushdown is to be erased. A word x is accepted by Z if and only if, starting from the complete state (x, qO' a), aji has a sequence of operations that causes it to move beyond the end marker of the pushdown in a state q. c F, after having read all of the J input tape. More precisely, the requirement is that g should be able to reach a complete state (e, qi a) such that 6 has an instruction (e, qi, o) -) (qj9 p), where qj c F. Example 1. Consider the pushdown automaton 6 having the following instruction set:

300 (a,qo9e) - (ql,a) (aql1e) - (qla) F = (q3) (b,qle) - (q2,e) (aq2,a) -r (q2,p) (e,q2, c) - (q3,p) The set of words accepted by 6 is [(anba, n > 1}. A sequence of operation leading to the acceptance of the word a2ba is as follows. Input tape State Pushdown Instruction a a b a a qO (aIq Oe) - (qla) a a b a q1 ca (aqle) - (ql,a) a a b ql caa (bq1,e) -> (q2,e) a a 2q aaa (a,q2,a) I (q2,p) a q2 aa (a,q2, a) - (q2,p) e q2 c (e,q2,a) - (q3,p). Theorem 20 Let' = (V, T, S, P) be a context-free grammar. Then one can construct a nondeterministic pushdown automaton = (T, Q, V-T, S$ I) for which the set of words accepted is L(, )o PROOF: We shall give the construction of CZ and an indication of why it workso Without loss of generality, we may assume that each of the productions of ~ is of one of the following forms: A - a A - e,

301 or A ->BC, where a ~ T. A c V-T, B e V-T, C E V-T, and no nonterminal symbol X is contained both in productions of the form A - XB and productions of the form C - DXo Corresponding to each element A c V-T, there are two internal states, A and At The elements of I are determined from the productions as follows: Production Instructions A - a (a,A,e) -( (A,e) A - e (e,A,e) - (A,e) A - BC (e,A,e) - (B,A) (e,B],e) - (C,e) (e,C,A) - (A,p). In addition, there is the instruction (e,St,) ~ (Zp) where E is the only element of F.

502 Example 2. The context-free language (anban, a > 1 is generated by the following context-free grammar (with initial symbol S): S BA B - CS A a C -a S -bo The corresponding pushdown automaton has the following instructions: (a,A,,e) ^ (Af,e) (a,CJ,e) - (C t,e) (b,S,e) > (S,e) (e,S,e) - (B$,S) (e,B 1,e) - (A,e) (e,A,S) - (S,p) (e,B,e) - (C,B) (e,C,e) - (S,e) (e,s,B) - (B,p) (e,9St,) -- (,p). The following tree denotes an S-derivation of the word aabaa.

303 s B A C S a a B A C S a a b A sequence of operation corresponding to the acceptance of the word aabaa is as follows:

304 Input tape State Pushdown a ab aa a S a ab a a B aS aabaa C ^ cSB a a b a C cSB a ab a S aSB a ab a B aBS a a b a C SBSB a a b C aSBSB a ab S^ aSBSB a a S aSBSB a a B aSBS a a A c aSBS a A aSBS a S aSB a B aS a A4 aS e A c aS e S. a When the pushdown automaton is in a state A,, it "predicts" what production with A as its left-hand side is being applied. If the predicted production is of the form A -_ a, it checks the input tape to determine whether a is the first symbol; if the predicted production is of the form

305 A -> BC, the automaton stores A on the pushdown for reference in later "backtracking," and goes to the state B +. The states of the form A are used in the later backtracking process. The converse of Theorem 20 is also true. We omit the proof. Theorem 21 Let. = (AI, Q, Ap, qO, F, I). Then the set of words accepted by ( is a context-free language.

References [1] G. Birkhoff and So MacLane, A Survey of Modern Algebra, Revised Edition, Macmillan, New York (1953). [2] A. H. Clifford and G. B. Preston, The Algebraic Theory of Semigroups, vol. I, American Mathematical Society, Providence (1961). [3] M. Davis, Computability and Unsolvability, McGraw-Hill, New York (1958). [4] M. Hall, The Theory of Groups, Macmillan, New York (1959). [5] D. Arden, "Delayed logic and finite state machines," The University of Michigan Summer Session on the Theory of Computing Machine Design (1960). [6] J. Brzozowski, "A survey of regular expressions and their applications," IRE Transactions on Electronic Computers, vol. EC-11, June 1962, pp. 324-335. [7] J. Brzozowski, "Derivatives of regular expressions," Journal of the ACM, vol. l, October 1964, pp. 481-494, [8] J. Ro Buchi, Mathematical Theory of Automata, notes for Communication Sciences 403, given by J. B. Wright and J. R. Buichi, The University of Michigan, fall 1960, [9] S. Ginsburg, An Introduction to Mathematical Machine Theory, Addison-Wesley, Palo Alto (1962). [10] M. Harrison, Notes for Electrical Engineering 667, The University of Michigan, spring 19635 [11] J. Hartmanis'"On the state assignment problem for sequential machines, I," IRE Transactions on Electronic Comnuters, vol. EC-10, June 1961, pp, 157-165. [12] J. Hartmanis and R. E. Stearns, "Pair algebra and its application to automata theory," Information and Control, vol. 7,, December 1964, pp. 485-507. [13] K. Krohn and J. Rhodes, "Algebraic theory of machines, I," to appear in Transactions of the American Mathematical Society. [14] E F. Moore, "Gedanken-experiments on sequential machines," Automata Studies [Annals of Mathematics Studies, No. 34], Princeton University Press (1956), pp. 129-1535. 506

307 [15] E. F. Moore (editor), Sequential Machines: Selected Papers, Addison-Wesley, Palo Alto (1964), [16] J. Myhill, "Finite automata, semigroups and simulation," The University of Michigan Summer Conference on Automata Theory (1963). [17] M. Rabin and D. Scott, "Finite automata and their decision problems," IBM Journal of Research and Development, vol. 3, no. 2, April 1959, pp. 114-125. (Also in [15], pp. 63-91). [18] R. E. Stearns and J. Hartmanis, "On the state assignment problem for sequential machines, II," IRE Transactions on Electronic Computers, vol. EC-10, December 1961, pp. 593-603. [19] H. P. Zeiger, "Loop-free synthesis of finite-state machines," Massachusetts Institute of Technology Research Laboratory for Electronics report, September 1964. [20] Y. Bar-Hillel, M. Perles, and E. Shamir, "On formal properties of simple phrase structure grammars," Zeit. f'ur Phonetik Sprachwissenschaft und Kommunikationsforschung, Band 14, Heft 2 (1961), pp. 143-172. [21] N. Chomsky, "Formal properties of grammars," Handbook of Mathematical Psychology, volo 2, Wiley, New York (1962), pp. 324-418. [22] N. Chomsky, "Context-free grammars and pushdown storage," Massachusetts Institute of Technology Research Laboratory for Electronics report (1962). [23] N. Chomsky and M. Schutzenberger, "The algebraic theory of contextfree languages," in Studies in Logic Computer Programming and Formal Systems, North-Holland, Amsterdam (1963), ppo 118-161, [24] C. C. Elgot and J. Mezei, "Two-sided finite-state transductions," Proceedings of the Fourth Symposium on Switching Circuit Theory and Logical Design, IEEE (1963). [25] C. C. Elgot and J. L. Rutledge, "RS-machines with almost blank tape," Journal of the ACM, July 1964, pp. 313-337. [26] J. Evey, "Application of pushdown-store machines," AFIPS Conference Proceedings, vol. 24 (1963), pp. 215-227. [27] S.-Y. Kuroda, "Classes of languages and linear bounded automata," Information and Control, vol. 7 (1964), pp. 207-223. [28] Po S Landweber, "Three theorems on phrase structure grammars of type I," Information and Control, volo 6 (1963), pp. 131-1537

308 [29] P.S. Landweber, "Dedision problems of phrase-structure grammars," IEEE Transactions on Electronic Computers, vol. EC-13, no. 4, August 1964, pp. 354-362. [30] M. L. Minsky,'Recursive unsolvability of Post's problem of Tag," Ann. of Math., vol. 7., (1961), pp. 437-455. [31] J. Myhill, "Linear bounded automata," WADD Tech. Note 60-165, Wright Air Development Division, Wright-Patterson Air Force Base (1960). [32] A. G. Oettinger, "Automatic analysis and the pushdown stores," Proceedings of the 12th Symposium in Applied Mathematics (R. Jakobson, editor), (1961). [33] E. L. Post, Bull. A.M.S., vol. 52 (1945), pp. 264-268. [34] E. L. Post, J. Symbolic Logic, vol. 12 (1947), pp. 1-11. [55] P. M. Postal, "Constituent analysis," Int. J. Am. Ling., Supplement (1964). [36] A. L. Rosenberg, "On N-tape finite-state acceptors," Proceedings of the Fifth Symposium on Switching Theory and Logical Design, IEEE (1964). 3[7] J. C. Shepherdson, "The reduction of two-way automata to one-way automata," IBM J. of Research and Development, vol. 3 (1959), pp. 198-200. (Also in [15], pp. 92-97.) [38] V~ Yngve, "The depth hypothesis," Proceedings of the 12th Symposium in Applied Mathematics (R. Jakobson, editor (1961)

UNIVERSITY OF MICHIGAN 3 9015 03024 4399