Requests for additional copies by agencies of the Department of Defense, their contractors, and other Government agencies should be directed to:

Defense Documentation Center (DDC)
Cameron Station
Alexandria, Virginia 22314

Department of Defense contractors must be established for DDC services or have their "need-to-know" certified by the cognizant military agency of their project or contract.

THE UNIVERSITY OF MICHIGAN
COLLEGE OF LITERATURE, SCIENCE, AND THE ARTS
Department of Communication Sciences

Technical Report

EQUIVALENCES BETWEEN PROBABILISTIC AND DETERMINISTIC SEQUENTIAL MACHINES

C. V. Page

ORA Projects 03105, 06376, 06689, 06921

under contract with:
U.S. ARMY RESEARCH OFFICE (DURHAM), GRANT NO. DA-ARO(D)-31-124-G558, PROJECT NO. 4049-M, DURHAM, NORTH CAROLINA
and NATIONAL SCIENCE FOUNDATION, GRANT NO. GP-2539, WASHINGTON, D.C.
and U.S. DEPARTMENT OF HEALTH, EDUCATION, AND WELFARE, PUBLIC HEALTH SERVICE, NATIONAL INSTITUTES OF HEALTH, GRANT NO. GM-12236-01, BETHESDA, MARYLAND
and DEPARTMENT OF THE NAVY, OFFICE OF NAVAL RESEARCH, CONTRACT NO. Nonr-1224(21), WASHINGTON, D.C.

administered through:
OFFICE OF RESEARCH ADMINISTRATION, ANN ARBOR

May 1965

RESEARCH PROGRESS REPORT

Title: "Equivalences Between Probabilistic and Deterministic Sequential Machines," C. V. Page, The University of Michigan Technical Report 0310537-T, April 1965; Nonr 1224(21), NR 049-114.

Background: The Logic of Computers Group of the Communication Sciences Department of The University of Michigan is investigating the application of logic and mathematics to the theory of the design of computing automata.

Condensed Report Contents: The concept of probabilistic sequential machines (PSM), a generalization of Rabin's concept of probabilistic automata, is defined. Such diverse devices as unreliable digital computers, slot machines, and chemical cells are presented as examples of PSM. Using the examples as motivation, various kinds of equivalences between machines are discussed. The fundamental question of when a PSM is equivalent in some sense to a deterministic machine, perhaps with random devices attached to output states, is considered. Finally, various tests involving finitely many random variables are devised for each of the kinds of equivalences between PSM and for reduction, if possible, to deterministic machines. One of the tests generalizes the Moore bound for deterministic machines further than has previously appeared in the literature.

For Further Information: The complete report is available in the major Navy technical libraries and can be obtained from the Defense Documentation Center. A few copies are available for distribution by the author.

TABLE OF CONTENTS

Page

0. INTRODUCTION 1
0.1 The Concept of Probabilistic Sequential Machine 1
0.2 Models of Probabilistic Sequential Machines 3
1. DETERMINING WHETHER A PROBABILISTIC SEQUENTIAL MACHINE IS EXPECTATION EQUIVALENT TO A FINITE DETERMINISTIC MACHINE 8
1.1 The Concept of Expectation Equivalence 8
1.2 The Reduction Relation RF 11
1.3 Construction of the Quotient Machine 13
1.4 The Partition of the Set of Accessible State Distributions Induced by RF 16
1.5 Necessary and Sufficient Conditions That Strings Be in the Same RF Class 18
1.6 Necessary and Sufficient Conditions That RF Be Non-Trivial 23
2. DETERMINING WHETHER A PROBABILISTIC SEQUENTIAL MACHINE IS N-MOMENT EQUIVALENT TO A MACHINE WITH DETERMINISTIC SWITCHING AND RANDOM OUTPUTS 29
2.1 Distribution Equivalence: ≡D 29
2.2 Moments of the Output Random Variable 30
2.3 Special Properties of Rabin Probabilistic Automata 31
2.4 The Concept of N-Moment Equivalence: ≡N 32
2.5 The Relationship Between ≡D and ≡N 33
2.6 The N-Reduction Relation 36
3. THE NOTION OF INDISTINGUISHABILITY AS A CRITERION OF BEHAVIORAL EQUIVALENCE 43
3.1 Example of Two Distribution Equivalent Machines Which Perform Differently as Components of a Machine 43
3.2 A More Satisfactory Technical Notion of Indistinguishability 47
3.3 The Relationship Between the Intuitive and Technical Concepts of Indistinguishability 51
4. FINITE COMPLETE SETS OF INVARIANTS FOR THE BEHAVIORAL EQUIVALENCES ≡E, ≡N, AND ≡I AND THE REDUCTION CONGRUENCE RELATIONS RF AND RNF 54
4.1 The Fundamental Lemma 55
4.2 A Bound for Testing for Membership in ≡I 57

TABLE OF CONTENTS (Concluded)

Page

4.3 Equivalence of Distributions in One Machine 58
4.4 Bounds for Testing for Membership in ≡E and RF 59
4.5 Bounds for Testing for Membership in ≡N and RNF 61
4.6 Discussion of the Generalization of the Moore Bound 65
REFERENCES 67

0. INTRODUCTION

The notion of behavioral equivalence is a fundamental part of the study of automata theory. Two definitions of behavioral equivalence occur in the literature for deterministic machines. One, due to Burks [5], calls two machines behaviorally equivalent if they define the same function from input strings to output strings. The other, part of Rabin-Scott automata theory, calls two machines behaviorally equivalent if they accept the same set of tapes. The two definitions can be shown to be the same for deterministic machines by recoding arbitrary output symbols into strings of zeros and ones. Both definitions have been generalized for probabilistic machines. However, for probabilistic machines the resulting generalizations are not equivalent. This paper is concerned with certain kinds of equivalences between probabilistic machines. Two models will be discussed later in this section in order to gain insight into the main kinds of equivalences which will be studied. Of particular interest will be the question of when a probabilistic sequential machine is equivalent in some sense to a finite deterministic machine.

0.1 THE CONCEPT OF PROBABILISTIC SEQUENTIAL MACHINE

By a probabilistic sequential machine is meant a system which satisfies one of the following two definitions:

Definition 0.1: A (Moore-type) probabilistic sequential machine A is a system A = < n, I, S, Σ, A(0),...,A(k-1), F, O > where

n: a natural number, the number of states.
I: an n-dimensional stochastic vector, the initial state vector.
S: the set of state vectors, S = {S1 = (1,0,...,0), ..., Sn = (0,0,...,1)}.
Σ: the alphabet, Σ = {0,1,2,...,k-1}.
A(i): for i = 0,1,...,k-1, an n × n switching matrix for input symbol i. A(i)lm is the probability of a transition from state l to state m via symbol i.
F: the output vector, an n-dimensional column vector whose entries are real numbers.
O: the output function, O(Si) = Si · F = Fi, Si ∈ S.

Definition 0.2: A (Mealy-type) probabilistic sequential machine is a system A = < n, I, S, Σ, A(0),...,A(k-1), W, P > where n, I, S, Σ, A(0),...,A(k-1) are as in Definition 0.1 and where the output function P satisfies P(Si,j) = Wij, Si ∈ S, j ∈ Σ.

It is an easy matter to show that Definitions 0.1 and 0.2 are equivalent in the following sense: for every Moore-type probabilistic sequential machine there is a Mealy-type sequential machine whose output is the same random variable over each input, and vice versa. Consequently, we will be concerned only with the properties of Moore-type probabilistic sequential machines, which from now on will be called "sequential machines."

There seem to be many instances of systems like probabilistic sequential machines in other fields of study not generally thought to be automata theory. Braines and Svechinsky discuss a system like Definition 0.1 in their paper "Matrix Structure in Simulation of Learning" [1]. If one takes the

cartesian product of machines of Definition 0.2, one gets the Markov processes with rewards and alternatives studied in sequential decision theory as presented by Howard [2]. Matrix games as discussed by Thrall [3] can be considered instances of Definition 0.1 in which I and F are strategy vectors and the game matrix A(x) is defined by a string x. A simple correspondence shows that the noisy discrete channel of Shannon [8] is equivalent to the system of Definition 0.2. One would hope that someday probabilistic sequential machines could become a unifying concept, organizing and providing results for these diverse fields.

Probabilistic sequential machines are generalizations of the work of Rabin [4] on probabilistic automata. If one restricts I to elements of S and Fi = 0 or 1 for i = 1,2,...,n, then Definition 0.1 defines probabilistic automata. Following Rabin, we remark that:

Remark 1: Let x = i1...ir, ij ∈ Σ, j = 1,...,r. Then A(x) = A(i1)·...·A(ir), i.e., the switching matrix for a string x is found by multiplying the matrices for the symbols of x together in order.

0.2 MODELS OF PROBABILISTIC SEQUENTIAL MACHINES

We consider here two models, one of which can be considered probabilistic and one of which can be considered deterministic, although both fall within the framework of probabilistic sequential machines.

Example 0.1. Probabilistic internal operation: A slot-machine

A simple model of a probabilistic sequential machine is a slot-machine. The static position of the dials represents the present state of the machine.

Usually there are 20 different positions on each dial and 3 dials, for a total of 8,000 states. The input consists of putting in a coin and pulling a lever, causing the machine to travel transiently through many states until it settles down in one state. An output is associated with each state. Nothing (which is associated with 0) comes out unless the dials all display the same object. In that case, some change tumbles out (which is associated with the corresponding real number), usually dependent only on the kind of object being displayed, i.e., the state of the machine. Such a machine, whose output is controlled by its states, is known as a "Moore machine" [7]. Each state can be associated with a number between 1 and 8,000, and the output for each state can be tabulated in a column vector or 8,000 × 1 matrix. In the formalism, this column vector will be called the "output vector" and designated by the symbol "F". The output for state i will be written as "Fi". The enormous number of distinct ways the lever can be pulled is prevented from significantly influencing the outcome by spring loading. Hence for all practical purposes there is only one kind of transition law associated with pulling the lever. If the randomness of transition of the dials caused by variable factors like dust, friction, humidity, heating, and small vibrations does not change over long periods of time, the probability of a transition from any state of the dials to any other can be determined experimentally to any required precision. This situation is summarized formally in the assumption for probabilistic sequential machines that the transition probabilities are stationary. Symbolizing the usual lever play of the machine by L, the transition probabilities can be tabulated in a matrix

A(L), with the entry in the i'th row and j'th column (written A(L)ij) being the probability of a transition from state i to state j via input L. If there were no other permissible way to affect the rotation of the dials than by a pull of the lever, then the behavior of a slot-machine A could be described as a finite state Markov chain with rewards and transition matrix A(L). However, sudden small external shocks during the rotation of the dials can influence the state transitions of the machine. In order to model completely how such machines are played, we can consider a finite repeatable set of such non-standard inputs to the machine. For instance, one such input might be described as the application of a kick with a prescribed kinetic energy on a certain spot on the machine occurring 1/5 of a second after the lever is released. Symbolizing this manner of playing the machine by K, the transition matrix A(K) could be determined experimentally since the input is repeatable. A finite set of such repeatable inputs could be defined and their effects on the behavior of the machine ascertained.

To find out how strings of L and K inputs to the machine affect its operation, it is sufficient to multiply the matrices A(L) and A(K) together in the order specified by the string; e.g., if a string X is LKKLK then the transition matrix A(X) is the product A(L)·A(K)·A(K)·A(L)·A(K).

Consider how the dials of the machine might be found initially. If the dials can be completely observed, the initial state of the machine is observable. In this case, in the formalism the initial state i is represented by a vector I (or a 1 × 8,000 matrix) with a 1 in the i'th component and zeros elsewhere. On the other hand, the dials may not be completely visible,

and we may wish to specify the average behavior of a large number of machines run simultaneously, or we may wish to consider the average return from playing one machine only when it is left by other players in one of a set of preferred states. In any one of these cases, I can be a stochastic vector (I1,...,I8000) where Ii is the probability of being in state i at time t0. In the general case, the next-state probabilities starting with an initial state vector I and an input string X are given by I·A(X). Hence the expected value of output of a machine A starting with initial state distribution I and output vector F after a string X of inputs has occurred is just

EA(X) = I·A(X)·F

which is a bilinear form in I and F with form matrix A(X). The variance in output and other higher moments can be defined analogously.

Example 0.2. Deterministic internal structure: Chemical production cell

Suppose a chemical tank A is divided into several isolated compartments A1,...,An by partitions which are interconnected by an electronically controlled system of pumps and valves. Suppose that there is a finite set of controls Σ = {0,1,...,k-1} and that for each control c a fixed fraction vcij of the chemical in compartment Ai is pumped into compartment Aj. For all controls c in Σ, the full influence on the redistribution of liquid in the tank can be described by an n × n matrix A(c) with vcij being A(c)ij. Furthermore, suppose that the liquid being pumped between compartments is a catalyst which causes production of a desired end product in each compartment with a different efficiency, i.e., if the mass fraction of catalyst in Ai is Pi and Fi is the efficiency of Ai, then the output of end product is PiFi. Note that it

is assumed that the output of the compartment depends linearly on the catalyst present.

The initial state I is an n-component vector with the i'th component Ii being the mass fraction of catalyst in compartment i. Note that the Ii sum to 1, since the tank is a closed system as far as the catalyst is concerned. The distribution of mass fractions of catalyst over the compartments after a sequence of controls X = i1...im is just

I·A(i1)·...·A(im) = I·A(X)

That is, (I·A(X))i is the mass fraction of catalyst in compartment i after starting with initial distribution I of catalyst fractions over compartments and applying the string of control inputs X = i1...im.

The total end product from the tank is the sum of the outputs from each compartment, Σi (I·A(X))i·Fi, which can be written I·A(X)·F in matrix notation. This expression has the same form as the expectation of output for the probabilistic slot-machine, but there are no overt probabilities involved here. The mass fractions of catalyst play the same role as the probabilities in the first example. However, the output will still be written like an expectation, as EA(X). The total end product accumulated, TX, for the string of controls X from time t0 to time t0 + m is given by adding the output from each successive prefix:

TX = EA(i1) + EA(i1i2) + ... + EA(i1i2...im)
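The formulas above can be sketched directly: the state distribution is propagated through the switching matrices of X (Remark 1), EA(X) = I·A(X)·F is the expected output, and TX accumulates EA over the nonempty prefixes of X. The 2-state matrices, I, and F below are hypothetical illustrations, not data from this report.

```python
def step(dist, m):
    """Propagate a state distribution one symbol: dist * A(symbol)."""
    n = len(dist)
    return [sum(dist[i] * m[i][j] for i in range(n)) for j in range(n)]

def expectation(I, A, x, F):
    """EA(x) = I * A(x) * F, computed without forming A(x) explicitly."""
    dist = list(I)
    for symbol in x:
        dist = step(dist, A[symbol])
    return sum(d * f for d, f in zip(dist, F))

def total_product(I, A, X, F):
    """TX = EA(i1) + EA(i1 i2) + ... + EA(i1...im)."""
    return sum(expectation(I, A, X[:k], F) for k in range(1, len(X) + 1))

A = {"0": [[0.8, 0.2], [0.3, 0.7]],   # hypothetical switching matrix A(0)
     "1": [[0.5, 0.5], [0.0, 1.0]]}   # hypothetical switching matrix A(1)
I = [1.0, 0.0]                        # all mass (catalyst) in state 1
F = [2.0, 1.0]                        # per-state output (efficiency)

e = expectation(I, A, "01", F)    # I * A(0) * A(1) * F = 1.4
T = total_product(I, A, "01", F)  # EA(0) + EA(01) = 1.8 + 1.4 = 3.2
```

Because each row of a switching matrix is stochastic, `step` always maps a distribution to a distribution, so the same routine serves both the probabilistic slot-machine reading and the deterministic catalyst reading of EA.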

1. DETERMINING WHETHER A PROBABILISTIC SEQUENTIAL MACHINE IS EXPECTATION EQUIVALENT TO A FINITE DETERMINISTIC MACHINE

1.1 THE CONCEPT OF EXPECTATION EQUIVALENCE

In the two models discussed in the introduction, the expected value of output, EA(x), played an important role in the physical interpretations. Let us repeat the definition of the expected value of output.

Definition 1.1: The expected value of output for a probabilistic sequential machine A is given by

EA(x) = I·A(x)·F for x in Σ*

Definition 1.2: Machines A and A' are expectation equivalent, written A ≡E A', if

EA(x) = EA'(x) for all x in Σ*

Recall from Example 0.2 that EA(x) was the actual output of the chemical cell and not an expectation. Hence the basic concept of expectation equivalence is analogous to the definition of behavioral equivalence of Burks for the model of Example 0.2. However, for Example 0.1, the slot-machine, expectation equivalence is not the generalization of this kind of behavioral equivalence. The concept of indistinguishability discussed in Chapter 3 seems to be the appropriate generalization of this kind of behavioral equivalence.

Let us now turn to an example to show how proper coding of the outputs could make the concept of expected value of output relevant to an unreliable digital computer.
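Definition 1.2 can be given a small numerical sanity check. Relabeling the states of a machine (permuting I, F, and the rows and columns of each symbol matrix consistently) cannot change EA(x), so the relabeled machine must be expectation equivalent to the original. All machine data below are hypothetical, and the check only samples strings up to length 4.

```python
from itertools import product

def expectation(I, A, x, F):
    """EA(x) = I * A(x) * F."""
    dist = list(I)
    for symbol in x:
        m = A[symbol]
        dist = [sum(dist[i] * m[i][j] for i in range(len(dist)))
                for j in range(len(dist))]
    return sum(d * f for d, f in zip(dist, F))

# Hypothetical 2-state machine A.
I1 = [0.3, 0.7]
F1 = [5.0, 1.0]
A1 = {"0": [[0.9, 0.1], [0.4, 0.6]],
      "1": [[0.2, 0.8], [0.5, 0.5]]}

# The same machine with states 1 and 2 interchanged.
I2 = [0.7, 0.3]
F2 = [1.0, 5.0]
A2 = {"0": [[0.6, 0.4], [0.1, 0.9]],
      "1": [[0.5, 0.5], [0.8, 0.2]]}

equivalent = all(
    abs(expectation(I1, A1, "".join(x), F1)
        - expectation(I2, A2, "".join(x), F2)) < 1e-9
    for k in range(5) for x in product("01", repeat=k))
```

A finite sample of strings can of course only refute equivalence, never prove it; Chapter 4 of the report is devoted to bounds that make such tests conclusive.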

Example 1.1. Proper choice of output code can make the expected value of output relevant to the study of real computers. We encode the output so that EA(x) is approximately the code for the output for string x. Then expectation-equivalent machines have nearly the same input-output behavior when it is averaged over a large number of programs. Suppose for some machine A we have

IA(x) = (.0000, .0625, .8750, .0625, .0000, ...)
IA(y) = (.8750, .0625, .0000, .0000, .0000, .0625, ...)
IA(z) = (.0000, .0000, .0000, .1250, .8750, ...)

and FT = (T, ·, A, A*, *, ...), with the intent that

x causes an "A" as output
y causes a "T" as output
z causes a "*" as output

We can recode the output symbols by the following (F'T is FT recoded):

F'T = (100, 011, 010, 001, 000, ...)

and

EA(x) = 010 in binary, which is the code for A
EA(y) = 100 in binary, which is the code for T
EA(z) = (.001) in binary, which is not a code; but if decoding is used which picks the closest code number, z is associated with output "*".

A more careful choice of code numbers could have made each expectation equal to a code, simplifying the decoding problem.* However, in a practical situation only a sample expectation to be decoded can be obtained, and a more elaborate statistical decision rule than just comparing for equality must be

used in decoding the output symbol.

*Proof: Let Xi be the code weights (Xi = F'i) and I·A(zi) = (pi1, pi2, ..., pin), i = 1,2,...,n. The condition that EA(zi) = Xi, i = 1,2,...,n, implies that

p11X1 + p12X2 + ... + p1nXn = X1
 .
 .
pn1X1 + pn2X2 + ... + pnnXn = Xn

or equivalently P'X = 0, where

P' = ( p11-1  p12    ...  p1n   )
     ( p21    p22-1  ...  p2n   )
     ( ...                      )
     ( pn1    ...         pnn-1 )

which has a non-zero solution iff Determ.(P') = 0. By definition, an eigenvalue of a matrix M is a number λ such that

Determ.( m11-λ  m12    ...  m1n   ) = 0
       ( m21    m22-λ  ...  m2n   )
       ( ...                      )
       ( mn1    ...         mnn-λ )

For any stochastic matrix, 1 is an eigenvalue. Hence Determ.(P') = 0 is always true for any choice of probabilities, and the result follows. In order for the encoding to be unique we also need Xi ≠ Xj for i ≠ j, but conditions on the probabilities for this to occur will not be considered here.

Example 1.2. Machines A and A' which are expectation equivalent: A ≡E A'

IA(x)F = IA'(x)F' ∀x ∈ Σ*

A = < I, A(0), A(1), F > and A' = < I, A'(0), A'(1), F' >

A(0) = ( 1    0    0   )     A(1) = ( 3/5  1/5  1/5 )
       ( 1/2  1/4  1/4 )            ( 1/5  4/5  0   )
       ( 1/4  0    3/4 )            ( 4/5  1/5  0   )

A'(0) = ( 1    0  0   )     A'(1) = ( 7/10  0    3/10 )
        ( 5/8  0  3/8 )             ( 0     1/2  1/2  )
        ( 3/5  0  2/5 )             ( 9/10  0    1/10 )

F = F' = ( 7 )
         ( 5 )
         ( · )

These machines are expectation equivalent from any initial probability distribution, I, over states. The previous example shows that two machines can have very different switching matrices and still be expectation equivalent. Frequently, studies of Markov processes are concerned with the location of the zeros in the transition matrices. The example shows that the locations of zeros in the transition matrices are not the only relevant factor in the study of expectation equivalence. Since the graph theoretic properties of the transition matrices, such as the accessibility of a state, depend on where the zeros are, one would not expect a purely graph theoretic approach to be very fruitful in the study of this problem. Hence some of the tools of linear algebra will be used in addition to the above approaches.

1.2 THE REDUCTION RELATION RF

In this section a congruence relation, RF, will be defined so that a quotient machine can be constructed. States of the quotient machine will correspond to the distinct values of expectation which occur for input strings. If the rank of RF happens to be finite, the machine constructed has a finite

number of states. By attaching a deterministic output device to each state of the constructed machine, the expectation-equivalent deterministic machine is obtained. If the rank of RF is finite, some class of the relation must contain infinitely many strings. A necessary condition for RF to be finite in rank is that it be non-trivial, i.e., that at least two different strings are contained in some class. This necessary condition produces strong constraints on the form of the symbol matrices of such probabilistic machines.

Definition 1.3: The reduction relation RF is given by

x RF y iff EA(x) = EA(y) and EA(xz) = EA(yz) ∀z ∈ Σ*, ∀I ∈ S

If Σ* contains Λ, a semigroup identity, the definition reduces to

x RF y iff EA(xz) = EA(yz) ∀z ∈ Σ*, ∀I ∈ S

RF is a congruence relation on Σ* because of the reflexivity, transitivity, and symmetry of "=" and the substitution property in its definition. In order to discuss congruence relations between stochastic matrices which may not be generated by strings of symbol matrices, a matrix congruence analogous to RF will be defined.

Definition 1.4: The matrix reduction relation RM between n × n stochastic matrices B and B' is given by

B RM B' iff IBF = IB'F' and there exist machines A and A' such that IBA(z)F = IB'A'(z)F' for all I ∈ S, for all z ∈ Σ*

Hence two strings x and y which are in the same class of the relation RF will have equal expectations from any initial state of the machine and

will continue to have equal expectations for any finite input continuation z. As far as expectation of output is concerned, the behavior of the machine A is the same after either string x or string y. Returning to Example 1.1, we can interpret x and y as program segments which produce the same final output code and from which any continuation will give the same output code. If intermediate outputs are suppressed, x and y in RF can be regarded as equivalent microprograms in the machine A.

1.3 CONSTRUCTION OF THE QUOTIENT MACHINE

Definition 1.5: The equivalence class of x' under a congruence relation R is given by

R[x'] = {x: x R x'}

It is a well-known result [10] of automata theory that given a right congruence relation R on Σ*, one can construct a quotient automaton with no output,

T(R) = < σ, S, M >

where σ = R[Λ], S = {R[x]: x ∈ Σ*}, and M is a function from S × Σ into S such that M(R[x],a) = R[xa], x ∈ Σ*, a ∈ Σ.

Definition 1.6: Let β ⊆ Σ*. A congruence R refines β if x R y implies that x ∈ β iff y ∈ β.
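The quotient construction T(R) = < σ, S, M > can be sketched on a toy right congruence over {0,1}*: "same parity of the number of 1s," which has rank 2. This congruence is an illustrative assumption, not one used by the report; the states are the classes, σ = R[Λ], and M(R[x],a) = R[xa] is well defined precisely because R is a right congruence.

```python
def cls(x):
    """Class label of x under the toy right congruence: parity of 1s."""
    return x.count("1") % 2

sigma = cls("")        # sigma = R[Lambda]
S = {0, 1}             # S = {R[x] : x in Sigma*}, here just two classes

def M(s, a):
    """M(R[x], a) = R[xa]."""
    return (s + 1) % 2 if a == "1" else s

def M_star(s, x):
    """Extension of M from symbols to strings."""
    for a in x:
        s = M(s, a)
    return s
```

Running the quotient automaton on a string always lands in that string's class, e.g. `M_star(sigma, "1011")` equals `cls("1011")`; this is the property the proof of Theorem 1.3 below establishes for T(RF).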
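Definition 1.3 quantifies over every continuation z and every I ∈ S. Since S consists of the unit vectors, checking all I ∈ S amounts to comparing the column vectors A(xz)F and A(yz)F entrywise; the sketch below does this for continuations only up to a length bound, so it is a necessary-condition check, not a proof (Chapter 4 derives exact bounds). The 2-state machine used here is hypothetical, chosen so that A(0) is idempotent and hence 0 RF 00 holds exactly.

```python
from itertools import product

def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def string_matrix(x, A):
    """A(x) = A(i1) * ... * A(ir); A(Lambda) is the identity."""
    n = len(A[next(iter(A))])
    m = [[float(i == j) for j in range(n)] for i in range(n)]
    for c in x:
        m = mat_mul(m, A[c])
    return m

def mat_vec(m, v):
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]

def maybe_RF(x, y, A, F, max_len=4, tol=1e-9):
    """True if A(xz)F = A(yz)F for all continuations z with |z| <= max_len."""
    for k in range(max_len + 1):
        for z in product(list(A), repeat=k):
            vx = mat_vec(string_matrix(x + "".join(z), A), F)
            vy = mat_vec(string_matrix(y + "".join(z), A), F)
            if any(abs(a - b) > tol for a, b in zip(vx, vy)):
                return False
    return True

A = {"0": [[0.5, 0.5], [0.5, 0.5]],   # idempotent: A(00) = A(0)
     "1": [[0.0, 1.0], [1.0, 0.0]]}   # state swap
F = [3.0, 1.0]

passes = maybe_RF("0", "00", A, F)   # True: A(00) = A(0) exactly
fails = maybe_RF("0", "1", A, F)     # False already at z = Lambda
```

For the pair ("0", "1") the test fails immediately because A(0)F = (2,2)^T while A(1)F = (1,3)^T.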

Theorem 1.1 (Nerode): Let β be a subset of Σ*. β is the behavior of a finite (deterministic) automaton A = < T(R), G > over Σ, where G = {R[x]: x ∈ β} is the set of designated final states, iff there exists a right congruence relation R of finite rank which refines β.

Theorem 1.2: If the congruence relation RF has finite rank, then for any λ there is a finite deterministic automaton A' such that the tapes accepted by A' are T(A,λ).

Proof: Let β = T(A,λ) = {x: EA(x) > λ}. Note that RF refines β, i.e., x RF y implies that x ∈ T(A,λ) iff y ∈ T(A,λ). If RF has finite rank, by definition {RF[x]} has a finite number of members. Using Theorem 1.1 we construct T(RF) = < σ, S, M > and A' = < σ, S, M, G >, which accepts T(A,λ). Q.E.D.

Definition 1.7: rpA(x) is the response of A to input string x. If A is deterministic, rpA(x) is the state of A after an input of x. If A is probabilistic, rpA(x) is a random variable taking on values which are states.

We use the above construction to give a sufficient condition for the reduction of a probabilistic sequential machine to an expectation-equivalent finite deterministic machine whose output function is either a constant C(s) for each state s or a random device ORA(s) with expectation E(ORA(s)) = C(s).

Theorem 1.3: The reduction relation RF defined by a probabilistic machine A has finite rank if and only if there exists a finite deterministic machine A'

with a deterministic output OA' such that

OA'(rpA'(x)) = EA(x) ∀x ∈ Σ*

Proof: (sufficiency) By Theorem 1.1, let A' = < σ, S, M, ∅ >, where ∅ is the empty set; note that any congruence R refines ∅ vacuously. We attach an output function OA' to elements of S:

OA'(s) = EA(x), s = RF[x]

For a deterministic machine, M is extended to M*, which operates on strings rather than symbols, by

M*(s,a) = M(s,a), s ∈ S, a ∈ Σ
M*(s,ax) = M*(M*(s,a),x), x ∈ Σ*

We note that M*(σ,x) = rpA'(x), so we need show only that rpA'(x) = RF[x]. Let x = i1i2...im for ij ∈ Σ, j = 1,2,...,m. Then

rpA'(x) = M*(σ,x)
        = M*(M*(σ,i1),i2...im)
        = M*(M(RF[Λ],i1),i2...im)
        = M*(RF[i1],i2...im)
        = M*(M(RF[i1],i2),i3...im)
        = ...
        = RF[i1i2...im] = RF[x]

Hence the constructed sequential machine is A' = < σ, S, M, OA' >.

(necessity) Given A' = < σ, S, M, OA' > such that

OA'(rpA'(x)) = EA(x) ∀x ∈ Σ*
OA'(rpA'(xz)) = EA(xz) ∀z ∈ Σ*

let rpA'(x) = Sx, x ∈ Σ*, and define Sx R0 Sy iff x RF y.

Let n' be the cardinality of S, which is finite. Then rank R0 = rank RF and rank R0 ≤ n'. Hence rank RF is finite. Q.E.D.

Instead of the deterministic function OA', a random device ORA'(s) such that E(ORA'(s)) = EA(x) could have been used in the construction.

1.4 THE PARTITION OF THE SET OF ACCESSIBLE STATE DISTRIBUTIONS INDUCED BY RF

Definition 1.8: V(A) = {IA(x): x ∈ Σ*}, the set of all stochastic vectors which can occur as distributions over the states of A. We sometimes call V(A) the "state vectors accessible in A."

Definition 1.9: A set of vectors V = {v1,v2,...} is convex if for any finite set of indices I with ci ≥ 0 and Σi∈I ci = 1, we have Σi∈I civi ∈ V. The convex closure of a set of vectors V is written V+ = {v': v' = Σi∈I civi, where Σi∈I ci = 1, ci ≥ 0, and vi ∈ V}.

Theorem 1.4: If RF has finite rank r, there exists a partition Π = (Π1,...,Πr) of V(A) and an integer-valued function g(l,m) such that

ΠiA(σ) ⊆ Πg(i,σ), i = 1,...,r; σ ∈ Σ

Proof: We use RF to induce an equivalence on the set of stochastic vectors accessible by the machine. Since RF is of finite rank, we form a set of arbitrarily chosen distinct representatives, one from each congruence class, say x1,...,xr, where xi ≠ xj for i ≠ j.

Define

Πj = {IA(x): x ∈ RF[xj]}

We show that (Π1,...,Πr) is a partition of V(A). Let W = ∪i=1..r Πi. Then

IA(x') ∈ W ⇒ IA(x') ∈ V(A)
IA(x') ∈ V(A) ⇒ x' ∈ RF[xk] for some k = 1,...,r
            ⇒ IA(x') ∈ Πk for some k = 1,...,r
            ⇒ IA(x') ∈ W

Hence

W = ∪i=1..r Πi = V(A)

We show Πi ∩ Πj = ∅ for i ≠ j. Suppose that Πi ∩ Πj contains some vy, where vy = IA(y). Then

IA(y) ∈ Πi ⇒ y ∈ RF[xi] ⇒ y RF xi
IA(y) ∈ Πj ⇒ y ∈ RF[xj] ⇒ y RF xj

Hence we get xi RF y by symmetry, and transitivity of RF gives xi RF xj, so xi ∈ RF[xj].

But since xi and xj are representatives and there is only one representative from each class, xi = xj with i ≠ j, which is a contradiction.

Finally we show there exists an integer-valued function g(i,σ) such that ΠiA(σ) ⊆ Πg(i,σ), σ ∈ Σ:

v1 ∈ Πi ⇒ v1 = IA(w1) for some w1 ∈ Σ*
v1A(σ) = IA(w1)A(σ) = IA(w1σ) ∈ Πj for some j, as has been shown above
v2 ∈ Πi ⇒ v2 = IA(w2) for some w2 ∈ Σ*
v2A(σ) = IA(w2σ) ∈ Πj

since elements of RF have the substitution property, i.e.,

w1 RF xi ⇒ w1σ RF xiσ, σ ∈ Σ
w2 RF xi ⇒ w2σ RF xiσ, σ ∈ Σ

xiσ is an element of a class with representative xj for some j which depends only on xi and σ. So there is a function g(l,m) such that g(i,σ) = j, σ ∈ Σ. Q.E.D.

1.5 NECESSARY AND SUFFICIENT CONDITIONS THAT STRINGS BE IN THE SAME RF CLASS

The relation RF has occupied an important place in the development of this theory. We now study the structure of the matrices of strings which are in the same RF class.

Definition 1.10: A relation R is non-trivial if there exist x and y in the domain of R with x ≠ y such that x R y.
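Theorem 1.4's partition can be seen concretely on a hypothetical one-symbol, 2-state machine (not from this report) whose switching matrix is idempotent, so A(0^k) = A(0) for k ≥ 1. Taking F = (3,1)^T, A(0)F = (2,2)^T ≠ F, so Λ is not RF-related to 0, and RF has rank 2 with classes {Λ} and {0, 00, 000, ...}. The accessible distributions V(A) then split into the two blocks Π1 = {I} and Π2 = {IA(0)}, each carried by A(0) into a single block.

```python
A0 = [[0.5, 0.5],
      [0.5, 0.5]]          # idempotent stochastic matrix (hypothetical)
I = [0.25, 0.75]           # initial distribution (hypothetical)
F = [3.0, 1.0]             # A(0)F = (2,2)^T differs from F, so {Lambda} is its own class

def step(dist, m):
    """One application of the symbol matrix: dist * A(0)."""
    return tuple(sum(dist[i] * m[i][j] for i in range(len(dist)))
                 for j in range(len(dist)))

# Accessible distributions for strings of length 0..4: only two distinct
# vectors occur, one per RF class.
accessible = {tuple(I)}
dist = tuple(I)
for _ in range(4):
    dist = step(dist, A0)
    accessible.add(dist)

pi1 = {tuple(I)}
pi2 = accessible - pi1

# Block-into-block images: g(1,0) = 2 and g(2,0) = 2.
image_of_pi1 = {step(v, A0) for v in pi1}
image_of_pi2 = {step(v, A0) for v in pi2}
```

Here `image_of_pi1` and `image_of_pi2` both land inside Π2, which is exactly the function g of the theorem for this toy machine.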

Definition 1.11: The kernel of F is Kern.(F) = {v ∈ Rⁿ: vF = 0}, where R is the set of reals.

Definition 1.12: The span of a set of vectors {v1,...,vr} is denoted by < {v1,...,vr} > = {Σ civi: ci ∈ R}.

Theorem 1.5: A necessary and sufficient condition for x and y to be in the same class of the reduction relation RF is:

(x,y) ∈ RF ⟺ there exists a subspace U of Kern.(F) such that

(i) UA(z) ⊆ Kern.(F), z ∈ Σ*
(ii) A(x) = A(y) + H, where H is the matrix with rows u1,...,un, ui ∈ U, i = 1,...,n

Proof: (necessity) (x,y) ∈ RF ⇒ IA(xz)F = IA(yz)F ∀I ∈ S, ∀z ∈ Σ*. Hence A(x)F = A(y)F, since S = {(1,0,...,0),...,(0,...,0,1)} and Λ ∈ Σ*. By linear algebra,

A(x) = A(y) + H

where H has rows h1,...,hn with hi ∈ Kern.(F), i = 1,...,n. Multiplying by A(z),

A(x)A(z) = A(y)A(z) + H·A(z), ∀z ∈ Σ*

i.e.,

A(xz) = A(yz) + H·A(z)

We can multiply by any initial state distribution I, so

IA(xz)F = IA(yz)F + I·H·A(z)·F, I ∈ S

But since x and y are in RF, IA(xz)F = IA(yz)F ∀z ∈ Σ*, I ∈ S. Hence

I·H·A(z)·F = 0 ⇒ hiA(z) ∈ Kern.(F), i = 1,2,...,n

Let U = < h1,...,hn >. We get UA(z) ⊆ Kern.(F) ∀z ∈ Σ*.

Notation: we denote by Ai the i'th row of the matrix A.

(sufficiency) Let H be the matrix with rows h1,...,hn, where hi ∈ U ⊆ Kern.(F), i = 1,2,...,n, and suppose A(x) = A(y) + H. Multiplying the equality by I on the left and F on the right, we obtain

IA(x)F = IA(y)F + I·(h1·F, ..., hn·F)ᵀ, I ∈ S

but hi·F = 0 since hi ∈ Kern.(F), i = 1,...,n, so

IA(x)F = IA(y)F

Using (ii) again and the same argument, we get

A(xz) = A(yz) + HA(z)
IA(xz)F = IA(yz)F + IHA(z)F = IA(yz)F

since HA(z) has rows ui ∈ Kern.(F). Q.E.D.

We now simplify restriction (i) of Theorem 1.5 to symbol matrices rather than string matrices.

Theorem 1.6: Let U = < {A(x)i - A(y)i: i = 1,...,n} > for x,y such that (x,y) ∈ RF. Then

UA(z) ⊆ Kern.(F) ∀z ∈ Σ* ⟺ there exists a subspace V of Rⁿ such that

(i) UA(i) ⊆ V ∀i ∈ Σ
(ii) VA(i) = V ⊆ Kern.(F) ∀i ∈ Σ

Proof: (⇒) Assume UA(z) ⊆ Kern.(F) ∀z ∈ Σ*. Let V = < {uA(z): u ∈ U, z ∈ Σ*} >. Then

VA(i) = < {uA(z)A(i): u ∈ U, z ∈ Σ*} > = V

Consider an arbitrary v ∈ V. There must be some set of indexes J and constants cj such that:

v = Σj∈J cjujA(zj), by definition of V. Then

v·F = Σj∈J cjujA(zj)·F = Σj∈J cj(ujA(zj)F)

But ujA(zj)F = 0 by UA(z) ⊆ Kern.(F), so v·F = 0. Hence V ⊆ Kern.(F), and UA(z) ⊆ V has already been shown.

(⇐) For z = i1...ir, UA(z) = UA(i1)A(i2)...A(ir) ⊆ VA(i2)...A(ir) = V ⊆ Kern.(F) by (i) and (ii). Q.E.D.

Definition 1.13: A subspace V is invariant under a set of linear transformations {Ti: i = 1,2,...,m} if V·Ti = V, i = 1,2,...,m.

Using Theorems 1.5 and 1.6, we get the following directly:

Theorem 1.7: Strings x and y are in the same class of RF if and only if there exists a subspace V of Kern.(F) such that

(i) V is invariant under {A(i): i ∈ Σ}
(ii) A(x) = A(y) + H, where Hi ∈ V, i = 1,...,n
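Theorems 1.5-1.7 (and Theorem 1.9 below) can be illustrated numerically on a hypothetical 3-state, one-symbol machine that is not taken from this report. With F = (1,1,0)^T, the row vector h = (1,-1,0) spans a subspace V ⊆ Kern.(F) with V·A(0) ⊆ V; adding -h/4 to row 1 of A(0) and +h/4 to row 2 gives a stochastic matrix B whose rows differ from A(0)'s by elements of V, and the resulting expectations agree for every I and every string length. Moving h into the basis then exhibits the block-triangular shape of Theorem 1.9 (in row-vector convention, the conjugation appears as W·A(0)·W⁻¹, with the rows of W as the new basis).

```python
def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expect_after_k(I, M, k, F):
    """I * M^k * F."""
    dist = list(I)
    for _ in range(k):
        dist = [sum(dist[i] * M[i][j] for i in range(len(dist)))
                for j in range(len(dist))]
    return sum(d * f for d, f in zip(dist, F))

F = [1.0, 1.0, 0.0]
h = [1.0, -1.0, 0.0]          # h.F = 0, so h lies in Kern(F)

A0 = [[0.50, 0.25, 0.25],
      [0.25, 0.50, 0.25],
      [0.00, 0.00, 1.00]]     # h.A(0) = h/4, so V = <h> is invariant

B0 = [[0.25, 0.50, 0.25],     # A(0) with -h/4 added to row 1
      [0.50, 0.25, 0.25],     # and +h/4 added to row 2
      [0.00, 0.00, 1.00]]

units = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
same_expectations = all(
    abs(expect_after_k(I, A0, k, F) - expect_after_k(I, B0, k, F)) < 1e-9
    for I in units for k in range(6))

# Theorem 1.9's block form: take h as the first basis vector.
W = [[1.0, -1.0, 0.0],
     [0.0,  1.0, 0.0],
     [0.0,  0.0, 1.0]]
W_inv = [[1.0, 1.0, 0.0],     # inverse of W, checked by hand
         [0.0, 1.0, 0.0],
         [0.0, 0.0, 1.0]]
A0_new = mat_mul(mat_mul(W, A0), W_inv)
# The first row comes out (0.25, 0, 0): zeros to the right of the A1 block.
```

Here `same_expectations` is True, reflecting that the row differences lie in an invariant subspace of Kern.(F), and the transformed matrix has the (A1 0; A2 A3) shape of Theorem 1.9.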

1.6 NECESSARY AND SUFFICIENT CONDITIONS THAT RF BE NON-TRIVIAL

A very weak necessary condition that RF have finite rank is that it at least be non-trivial. From Theorem 1.7 it is immediate that:

Theorem 1.8: The reduction relation RF is non-trivial ⟺ there exists a subspace V of Kern.(F) such that

(i) V is invariant under {A(i): i ∈ Σ}
(ii) A(x) = A(y) + H, where Hi ∈ V, i = 1,...,n
(iii) x ≠ y

Hence we now know that given strings x and y in RF, the difference between the rows of the matrices A(x) and A(y) must be elements of a subspace V which has special properties. Namely, V must be invariant under all symbol matrices and contained in the kernel of the output vector.

Theorem 1.9: A necessary and sufficient condition that RF be non-trivial is that the A(i), i ∈ Σ, be reducible by the same change of basis relative to V. In other words, there exists a linear transformation W of the state vectors, whose leading basis vectors form a basis for V, such that

W⁻¹A(i)W = ( A1ⁱ  0   )
            ( A2ⁱ  A3ⁱ )

where 0 denotes a submatrix of zeros and A1ⁱ, A2ⁱ, and A3ⁱ are submatrices which for all i in Σ have the same numbers of rows and columns.

Proof: By Theorem 1.7 and standard matrix theory [see Jacobson, Lectures in Abstract Algebra, Vol. II: Linear Algebra, Van Nostrand, New York, 1952, pp. 116-117].

Consequently, Theorem 1.9 gives us a matrix reformulation of the statement that RF be non-trivial.

Example 1.3. We now exhibit a probabilistic sequential machine A which illustrates Theorems 1.3 and 1.7. The method by which this example was generated will be discussed in a later report.

A = < I, A(0), A(1), F > where I = (8/10, 1/10, 1/10, 0, 0, 0)

A(0) = ( 0  1  0    0  0    0 )          ( 10 )
       ( 0  1  0    0  0    0 )          (  5 )
       ( 0  0  1/2  0  1/2  0 )      F = (  1 )
       ( 0  0  0    0  0    1 )          (  2 )
       ( 0  0  3/4  0  1/4  0 )          (  1 )
       ( 0  0  0    0  0    1 )          (  2 )

A(1) = ( 0  0  1/8  0  7/8  0 )
       ( 0  0  0    1  0    0 )
       ( 0  0  4/8  0  4/8  0 )
       ( 0  0  3/8  0  5/8  0 )
       ( 0  0  2/8  0  6/8  0 )
       ( 1  0  0    0  0    0 )

The state diagram for A:

(The state diagram itself is not reproducible here.) The following labeling conventions are used: p(K), with p ∈ [0,1] and K ∈ Σ, means a transition of probability p via symbol K; a state label such as 2:F₂ means that output F₂ occurs when the machine is in state 2; and a pair of arcs labeled P₁(K₁) and P₂(K₂) between the same two states is replaced by a single arc labeled P₁(K₁), P₂(K₂).

We note that 00 R_F 0 since:

A(00) = A(0)·A(0) = ( 0   1   0     0   0     0 )
                    ( 0   1   0     0   0     0 )
                    ( 0   0   5/8   0   3/8   0 )
                    ( 0   0   0     0   0     1 )
                    ( 0   0   9/16  0   7/16  0 )
                    ( 0   0   0     0   0     1 )

which gives

(A(00) − A(0))F = ( 0   0   0      0   0      0 ) (10)
                  ( 0   0   0      0   0      0 ) ( 5)
                  ( 0   0   +1/8   0   −1/8   0 ) ( 1)   = (0, 0, 0, 0, 0, 0)^T
                  ( 0   0   0      0   0      0 ) ( 2)
                  ( 0   0   −3/16  0   3/16   0 ) ( 1)
                  ( 0   0   0      0   0      0 ) ( 2)

Hence A(00)F = A(0)F, and so IA(00)F = IA(0)F for all I. Furthermore, for every P ∈ [0,1] there are P', P'' ∈ [0,1] such that

(0, 0, P, 0, 1−P, 0)A(0) = (0, 0, P', 0, 1−P', 0)
(0, 0, P, 0, 1−P, 0)A(1) = (0, 0, P'', 0, 1−P'', 0)

that is, W = <{(0, 0, P, 0, 1−P, 0)}> is invariant under the symbol matrices A(0) and A(1). Moreover V = <{(0, 0, P, 0, −P, 0)}> ⊆ W and VA(0) = V, VA(1) = V. Hence for z ∈ Σ*

(A(00) − A(0))A(z) = C_z ( 0   0   0      0   0      0 )
                         ( 0   0   0      0   0      0 )
                         ( 0   0   +1/8   0   −1/8   0 )
                         ( 0   0   0      0   0      0 )
                         ( 0   0   −3/16  0   3/16   0 )
                         ( 0   0   0      0   0      0 )

where C_z is a constant depending on the string z, and

(A(00) − A(0))A(z)F = 0.

Consequently, for all z ∈ Σ* and all I ∈ S,

IA(00)A(z)F = IA(0)A(z)F,   i.e.,   E_A(00z) = E_A(0z),

which shows 00 R_F 0. By the same method one can show that

10 R_F 1,   011 R_F 11,   01011 R_F 11,   111 R_F 11,   01010 R_F 0

and that all strings are in the classes R_F[Λ], R_F[0], R_F[1], R_F[01], R_F[11], R_F[010], R_F[0101], which means that R_F has finite rank.

We compute the expectations and construct the expectation-equivalent deterministic machine A'. Note that the values of the expectation depend on the initial state I.

E_A(Λ) = IA(Λ)F = IF = 8.6
E_A(0) = (0, 9/10, 1/20, 0, 1/20, 0)F = 4.6
E_A(1) = (0, 0, 3/20, 2/20, 15/20, 0)F = 1.1
E_A(01) = (0, 0, 3/80, 72/80, 5/80, 0)F = 1.9
E_A(10) = (0, 0, 3/20, 0, 15/20, 2/20)F = 1.1 = E_A(1)   (since 10 R_F 1)
E_A(11) = (0, 0, 9/40, 0, 31/40, 0)F = 1.0
E_A(010) = 1.9
E_A(0101) = 9.1

Hence the expectation-equivalent deterministic machine A' follows, with one state per congruence class and outputs 8.6, 4.6, 1.1, 1.9, 1.0, 1.9, 9.1. (The transition diagram itself is not reproducible here.) We note that A' has 7 states while A has just 6 states. The deterministic cycle 0101 appears in both machines.
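The expectations tabulated for Example 1.3 are mechanical to check. The following is a minimal sketch in exact rational arithmetic; the matrix rows are our transcription of Example 1.3 (only as reliable as the printed figures), and the helper names are ours.

```python
from fractions import Fraction as Fr

def row_times_matrix(v, M):
    # v: length-n row vector, M: n x n matrix given as a list of rows
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

I = [Fr(8, 10), Fr(1, 10), Fr(1, 10), 0, 0, 0]
F = [10, 5, 1, 2, 1, 2]
A = {
    '0': [[0, 1, 0, 0, 0, 0],
          [0, 1, 0, 0, 0, 0],
          [0, 0, Fr(1, 2), 0, Fr(1, 2), 0],
          [0, 0, 0, 0, 0, 1],
          [0, 0, Fr(3, 4), 0, Fr(1, 4), 0],
          [0, 0, 0, 0, 0, 1]],
    '1': [[0, 0, Fr(1, 8), 0, Fr(7, 8), 0],
          [0, 0, 0, 1, 0, 0],
          [0, 0, Fr(1, 2), 0, Fr(1, 2), 0],
          [0, 0, Fr(3, 8), 0, Fr(5, 8), 0],
          [0, 0, Fr(1, 4), 0, Fr(3, 4), 0],
          [1, 0, 0, 0, 0, 0]],
}

def E(x):
    # E_A(x) = I A(x) F, applying the symbol matrices left to right
    v = I
    for sym in x:
        v = row_times_matrix(v, A[sym])
    return sum(vi * fi for vi, fi in zip(v, F))

print(E(''), E('0'), E('1'), E('01'))   # 43/5 23/5 11/10 19/10
```

The reductions 00 R_F 0 and 10 R_F 1 show up as E('00') == E('0') and E('10') == E('1'), and E('0101') returns 91/10, the value 9.1 closing the cycle.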

2. DETERMINING WHETHER A PROBABILISTIC SEQUENTIAL MACHINE IS N-MOMENT EQUIVALENT TO A MACHINE WITH DETERMINISTIC SWITCHING AND RANDOM OUTPUTS

In this section the concept of expectation equivalence is generalized to N-moment equivalence. A congruence relation R_F^N is defined which partitions the input strings into classes whose members all produce the same expectation and first N−1 central moments in a given machine. If R_F^N has finite rank, a finite quotient machine can be constructed which is deterministic, with each state corresponding to a congruence class. Each state can be connected to a random device having the same expectation and first N−1 central moments as the class represented by the state, giving a deterministic machine with random outputs. The deterministic machine so constructed is N-moment equivalent to the probabilistic machine.

After the first theorem, concerning a necessary and sufficient condition that two strings be in the same R_F^N class, it is obvious that a simple substitution gives generalizations of the results of Section 1; they are presented in this section without proofs.

2.1 DISTRIBUTION EQUIVALENCE: ≡_D

The random variable structure of probabilistic sequential machines will be investigated in this section.

Definition 2.1: O_A(x) = the output random variable of the machine A after a string x has occurred as input. The distribution of O_A(x) is IA(x) and the values of O_A(x) are the entries of F.
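Definition 2.1 identifies the distribution of O_A(x) with IA(x) and its values with the entries of F; collapsing states that share an output value gives the conventional distribution of the random variable. A small sketch (helper name ours), using the state distribution reached after input 0 in Example 1.3:

```python
from collections import defaultdict
from fractions import Fraction as Fr

def output_distribution(state_dist, F):
    # P(O_A(x) = f) = sum of (IA(x))_j over states j with F_j = f
    d = defaultdict(Fr)
    for p, f in zip(state_dist, F):
        if p:
            d[f] += p
    return dict(d)

F = [10, 5, 1, 2, 1, 2]                              # output vector of Example 1.3
IAx = [0, Fr(9, 10), Fr(1, 20), 0, Fr(1, 20), 0]     # IA(0) in Example 1.3
dist = output_distribution(IAx, F)
```

Here states 3 and 5 both emit the value 1, so their probabilities merge: the output takes value 5 with probability 9/10 and value 1 with probability 1/10.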

Definition 2.2: A and A' are distribution equivalent, written A ≡_D A', if, with J_A = {j : (IA(x))_j F_j ≠ 0}, there is a 1-1 map h between J_A and J_{A'} such that

(IA(x))_j = (I'A'(x))_{h(j)}   j ∈ J_A, x ∈ Σ*
F_j = F'_{h(j)}                j ∈ J_A.

Distribution equivalence corresponds to the conventional definition of equivalence for discrete random variables, except that distinct states may carry equal output values, F_i = F_j for i ≠ j.

Referring back to Example 0.2, note that two chemical cells are distribution equivalent if:

(1) We neglect those partitioned areas which have either zero efficiency or a zero fraction of the catalyst.
(2) Of the remaining partitioned areas, there is a correspondence between the partitioned areas of one cell and those of the other such that corresponding areas have the same fraction of catalyst regardless of the sequence of controls entering the cells.
(3) Corresponding partitioned areas have the same efficiencies.

2.2 MOMENTS OF THE OUTPUT RANDOM VARIABLE

Definition 2.3: Let F = (F₁,...,F_n)^T, F_i ∈ R, i = 1,2,...,n. Then the i'th central moment of O_A(x) is

μ_i^A(x) = E_A[(O_A(x) − E_A(x))^i]   i = 2,3,...

Sometimes μ₁^A(x) = E_A(x) will be used informally.

Theorem 2.1:

μ_i^A(x) = Σ_{k=0}^{i−1} (−1)^k C(i,k) IA(x)(F^{i−k}) E_A(x)^k + (−1)^i E_A(x)^i   i = 2,3,...

where C(i,k) is the binomial coefficient.

Proof: By the binomial theorem,

μ_i^A(x) = E_A[ Σ_{k=0}^{i} (−1)^k C(i,k) O_A(x)^{i−k} E_A(x)^k ].

To compute the expectation of the discrete random variable O_A(x)^{i−k}, note that it has the same distribution as O_A(x) but takes on the values F₁^{i−k},...,F_n^{i−k}, for k ≠ i. Hence

μ_i^A(x) = Σ_{k=0}^{i−1} (−1)^k C(i,k) E_A[O_A(x)^{i−k}] E_A(x)^k + (−1)^i E_A(x)^i
         = Σ_{k=0}^{i−1} (−1)^k C(i,k) IA(x)(F^{i−k}) E_A(x)^k + (−1)^i E_A(x)^i.   Q.E.D.

2.3 SPECIAL PROPERTIES OF RABIN PROBABILISTIC AUTOMATA

Definition 2.4: A Rabin probabilistic automaton [4] is a probabilistic sequential machine such that F_i = 0 or F_i = 1, i = 1,2,...,n.

We now observe that Rabin probabilistic automata have rather special features as far as the random variable of the output is concerned.
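Theorem 2.1 is just the binomial expansion of the central moment about the mean; the sketch below (function names and the sample distribution are ours, chosen arbitrarily) checks the formula against the direct definition.

```python
from fractions import Fraction as Fr
from math import comb

def central_moment_direct(p, F, i):
    # E[(O - E[O])^i] for an output O with P(O = F[j]) = p[j]
    m = sum(pj * fj for pj, fj in zip(p, F))
    return sum(pj * (fj - m) ** i for pj, fj in zip(p, F))

def central_moment_thm21(p, F, i):
    # Theorem 2.1: sum_{k=0}^{i-1} (-1)^k C(i,k) IA(x)(F^{i-k}) E^k + (-1)^i E^i
    m = sum(pj * fj for pj, fj in zip(p, F))
    raw = lambda r: sum(pj * fj ** r for pj, fj in zip(p, F))   # IA(x)(F^r)
    return sum((-1) ** k * comb(i, k) * raw(i - k) * m ** k
               for k in range(i)) + (-1) ** i * m ** i

p = [Fr(1, 2), Fr(3, 10), Fr(1, 5)]   # a hypothetical distribution IA(x)
F = [10, 5, 1]                        # hypothetical output values
```

The two computations agree for every order i, which is all the theorem asserts.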

Corollary 2.1: For a Rabin probabilistic automaton A,

μ_i^A(x) = Σ_{k=0}^{i−1} (−1)^k C(i,k) E_A(x)^{k+1} + (−1)^i E_A(x)^i   i = 2,3,...

Proof: Each F_j is 0 or 1, hence F^{i−k} = F for k ≠ i, so IA(x)(F^{i−k}) = E_A(x), and the result follows from Theorem 2.1.

Corollary 2.2: If E_A(x) = E_A(y) for some Rabin probabilistic automaton A, then all central moments for x and y are equal also, i.e., μ_i^A(x) = μ_i^A(y) for i = 2,3,... Note: for i = 2 we get that the variances of the outputs are equal.

Corollary 2.3: If two Rabin probabilistic automata A and A' are expectation equivalent, then

μ_i^A(x) = μ_i^{A'}(x)   i = 2,3,...,  ∀x ∈ Σ*.

2.4 THE CONCEPT OF N-MOMENT EQUIVALENCE: ≡_N

Even if two machines are expectation equivalent, the statistics of their behavior may be so different that for many purposes we would not want to consider the machines behaviorally equivalent. Returning to Example 0.1, two slot machines can be expectation equivalent, meaning that the average payoff is the same for both, yet one can be much more desirable than the other for a player of limited resources, since such a player might have a far longer average time until "gambler's ruin" on one machine than on the other. Hence, in order to lump into the same class those machines whose

statistics of behavior are somewhat alike, we introduce the notion of N-moment equivalence.

Definition 2.5: Probabilistic sequential machines A and A' are N-moment equivalent, written A ≡_N A', if

E_A(x) = E_{A'}(x)
μ_i^A(x) = μ_i^{A'}(x)   i = 2,...,N

for all x in Σ*.

2.5 THE RELATIONSHIP BETWEEN ≡_D AND ≡_N

Theorem 2.2: For probabilistic sequential machines A and A',

A ≡_D A' ⟹ A ≡_N A' for all finite N.

Proof: Distribution equivalence means there exists an h such that

F'_{h(i)} = F_i,   (IA(x))_i = (I'A'(x))_{h(i)}   ∀x ∈ Σ*

whenever (IA(x))_i F_i ≠ 0. Hence

Σ_{i=1}^{n} (IA(x))_i F_i = Σ_{i=1}^{n} (I'A'(x))_{h(i)} F'_{h(i)},

or E_A(x) = E_{A'}(x), which is expectation equivalence. For any finite N, (F'_{h(i)})^N = (F_i)^N.

The fact that μ_N^A(x) = μ_N^{A'}(x) then comes from inspection of Theorem 2.1. Symbolically, we have shown

A ≡_D A' ⟹ A ≡_N A' for any N.

How close one can come to a converse of Theorem 2.2 depends on the form of the entries of F.

Lemma 2.1 (Gantmacher [11]): Given a sequence S₀, S₁,... of real numbers, if one determines positive numbers r₁ > 0, r₂ > 0,..., r_m > 0 and ∞ > V_m > V_{m−1} > ... > V₁ > 0 such that the following equations hold:

(*)   S_p = Σ_{j=1}^{m} r_j V_j^p   (p = 0,1,2,...)

then the solution to (*) is unique.

We can apply the lemma to get the following partial converse.

Theorem 2.3: If machines A and A' meet the following requirements (letting h(i) = i without loss of generality):

(i) (IA(x))_i F_i = 0 iff (I'A'(x))_i F'_i = 0, i = 1,2,...,n;
(ii) all states in a given machine have distinct output symbols;
(iii) E_A(x) = E_{A'}(x) and μ_i^A(x) = μ_i^{A'}(x), i = 2,3,..., ∀x ∈ Σ*;

then A and A' are distribution equivalent.

Proof: We use Lemma 2.1. Since the central moments and expected values of the output are equal for any string, the moments of O_A(x) and O_{A'}(x) about zero are equal for any string:

S₀ = Σ {(IA(x))_i : F_i ≠ 0}
S₁ = E_A(x) = E_{A'}(x)
S₂ = μ₂^A(x) + E_A(x)² = μ₂^{A'}(x) + E_{A'}(x)²
⋮

We discard those components whose contribution to the moments is zero and relabel the non-zero components by the index j. Let J = {i : (IA(x))_i F_i ≠ 0}. Because of assumption (i) we also have J = {i : (I'A'(x))_i F'_i ≠ 0}. Hence

S_p = Σ_{j∈J} (IA(x))_j (F_j)^p      p = 0,1,2,...
    = Σ_{j∈J} (I'A'(x))_j (F'_j)^p   p = 0,1,2,...

By the lemma the solution is unique:

(IA(x))_j = (I'A'(x))_j   j ∈ J
F_j = F'_j                j ∈ J.

Hence A and A' are distribution equivalent.

Example 2.1: Condition (ii) of Theorem 2.3 is a necessary condition, as shown by the following:

IA(x) = (.5, .3, .2)     I'A'(x) = (.5, .4, .1)     F = F' = (1, 0, 0)^T

E_A(x) = IA(x)F = .5
E_{A'}(x) = I'A'(x)F' = .5

Since A and A' are Rabin automata, by Corollary 2.3,

μ_i^A(x) = μ_i^{A'}(x)   i = 2,3,...

However, A and A' have different distributions over states for the string x, so the componentwise conclusion of the proof of Theorem 2.3 fails when (ii) is dropped.

2.6 THE N-REDUCTION RELATION R_F^N

Definition 2.6: The N-reduction relation R_F^N: x R_F^N y if for all I in S

E_A(xz) = E_A(yz) and μ_i^A(xz) = μ_i^A(yz)   ∀z ∈ Σ*, i = 2,3,...,N.

The relation R_F^N is a congruence relation, and R_F^1 = R_F. Elements in the same congruence class of R_F^N have equal expectations and equal first N−1 central moments. Hence the machine Σ* mod R_F^N can have random devices attached to its states (which are the classes R_F^N[x]) such that the expectation and first N−1 central moments of each device are the same as those of the congruence class represented by the state. The resulting machine has deterministic switching and random output functions and is equivalent by ≡_N to the probabilistic machine defining R_F^N.

Theorem 2.4: The N-reduction relation is non-trivial iff there are strings x ≠ y such that A(x) = A(y) + H, where, with h₁,...,h_n the rows of H,

<{h₁,...,h_n}> ⊆ ∩_{i=1}^{N} Kern(F^i)

and

<{h₁,...,h_n}> A(z) ⊆ ∩_{i=1}^{N} Kern(F^i)   for all z ∈ Σ*.

Proof: Suppose that R_F^N is non-trivial, with x R_F^N y and x ≠ y. Then

E_A(x) = E_A(y) ∀I ∈ S ⟺ IA(x)F = IA(y)F ∀I ∈ S
                       ⟺ A(x) = A(y) + R, where the rows r_i ∈ Kern(F), i = 1,2,...,n.

Likewise, using equality of the variances,

μ₂^A(x) = IA(x)(F²) − E_A(x)² = IA(y)(F²) − E_A(y)² = μ₂^A(y)
   ⟺ IA(x)(F²) = IA(y)(F²) ∀I ∈ S
   ⟺ A(x) = A(y) + R, where the rows r_i ∈ Kern(F²).

For any i, μ_i^A(x) can be written as a recursive function of IA(x)(F^i) and smaller powers of F, i.e.,

μ_i^A(x) = IA(x)(F^i) + Σ_{k=1}^{i−1} (−1)^k C(i,k) IA(x)(F^{i−k}) E_A(x)^k + (−1)^i E_A(x)^i.

Hence by induction we may assume IA(x)(F^k) = IA(y)(F^k), k = 1,2,...,i−1, for all I, so that

μ_i^A(x) = IA(x)(F^i) + β   and   μ_i^A(y) = IA(y)(F^i) + β

for the same number β.

Hence

μ_i^A(x) = μ_i^A(y) ⟺ IA(x)(F^i) = IA(y)(F^i) ∀I
                    ⟺ A(x) = A(y) + R, where the rows r_i ∈ Kern(F^i).

The rest of the proof is analogous to Theorem 1.5. Q.E.D.

If we substitute R_F^N for R_F and ∩_{i=1}^{N} Kern(F^i) for Kern(F), the proofs of Theorems 1.4, 1.6, 1.8, and 1.9 go through exactly as before, and we state the dual theorems which are obtained.

Theorem 1.4D: If R_F^N has finite rank r, there exists a partition π = (π₁,...,π_r) on V(A) and an integer-valued function γ(i,σ) such that

π_i A(σ) = π_{γ(i,σ)}   i = 1,2,...,r,  σ ∈ Σ.

Theorem 1.6D: Let U = <{[A(x)_i − A(y)_i], i = 1,2,...,n}> for (x,y) ∈ R_F^N. Then, for any z ∈ Σ*,

UA(z) ⊆ ∩_{i=1}^{N} Kern(F^i) ⟹ there exists V, a subspace of R^n, such that for any i ∈ Σ:

(i) UA(i) ⊆ V
(ii) VA(i) = V ⊆ ∩_{i=1}^{N} Kern(F^i)

Theorem 1.8D: R_F^N is non-trivial ⟺ (∃V) a subspace of R^n such that

(i) V ⊆ ∩_{i=1}^{N} Kern(F^i)
(ii) V is invariant under {A(i) : i ∈ Σ}
(iii) A(x) = A(y) + H, where H_i ∈ V, some H_i ≠ 0.

Example 2.2: We extend Example 1.3 to illustrate Theorems 1.4D and 1.8D. Since F₃ = F₅ = 1,

<{(0, 0, P, 0, −P, 0)}> ⊆ ∩_{n=1}^{N} Kern(F^n)

for any finite N. Hence we can replace the output from any state with a random device with the same first N central moments as the probabilistic sequential machine. By way of illustration, we compute the variances. Note that here the classes of R_F^N are also the classes of R_F.

μ₂^A(Λ) = (8/10, 1/10, 1/10, 0, 0, 0)(F²) − (8.6)² = 82.6 − 73.96 = 8.64

Likewise, we get

μ₂^A(0) = 1.44
μ₂^A(1) = .09
μ₂^A(01) = .09
μ₂^A(10) = .09
μ₂^A(11) = (0, 0, 9/40, 0, 31/40, 0)(F²) − (1.0)² = 0.0
μ₂^A(010) = (0, 0, 21/320, 0, 11/320, 72/80)(F²) − (1.9)² = .09
μ₂^A(0101) = (72/80, 0, 53/1280, 0, 75/1280, 0)(F²) − (9.1)² = 7.29

Hence a machine A' which has the same expectation and variance for each string can be constructed with deterministic switching and random output devices, each device being symbolized by a state labeled with its expectation and variance.
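Each of these variances is μ₂(x) = IA(x)(F²) − E_A(x)², where (F²) is the componentwise square, and can be verified directly from the state distributions already listed; a minimal sketch (helper name ours):

```python
from fractions import Fraction as Fr

F = [10, 5, 1, 2, 1, 2]   # output vector of Example 1.3

def var(dist):
    # mu_2(x) = IA(x)(F^2) - E_A(x)^2, with dist = IA(x)
    e1 = sum(p * f for p, f in zip(dist, F))
    e2 = sum(p * f * f for p, f in zip(dist, F))
    return e2 - e1 * e1

mu2_empty = var([Fr(8, 10), Fr(1, 10), Fr(1, 10), 0, 0, 0])   # 8.64
mu2_zero = var([0, Fr(9, 10), Fr(1, 20), 0, Fr(1, 20), 0])    # 1.44
```

Exact arithmetic keeps the decimal values honest: 8.64 = 216/25 and 1.44 = 36/25.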

The machine A' is then just the machine of Example 1.3 with the outputs connected to devices such as the above, C being the initial state of A'. (The transition diagram itself is not reproducible here.)

Fig. 2.1. Machine A', which has the same expectation and variance for all strings as the probabilistic machine A of Example 1.3.

Example 2.3: Probabilistic sequential machines A and A', with three states each, such that E_A(x) = E_{A'}(x) for all x. The symbol matrices A(0) and A(1) of the first machine are not legible in the source;

the primed machine has

A'(0) = ( 0    0    1   )      A'(1) = ( 4/5  1/5  0 )
        ( 0    1/4  3/4 )              ( 4/5  1/5  0 )
        ( 0    0    1   )              ( 4/5  1/5  0 )

For both machines F = (F₁, F₂, F₂)^T, with F₁ and F₂ arbitrary real numbers. Note that R_F is non-trivial, since there is an invariant subspace U = <{(0, 1, −1)}> ⊆ Kern(F) such that the rows satisfy [A(0) − A'(0)]_j ∈ U and [A(1) − A'(1)]_j ∈ U, j = 1,2,...,n.

Theorem 1.9D: R_F^N is non-trivial ⟺ the symbol matrices {A(i) : i ∈ Σ} are reducible by the same change of basis (for V); i.e., there exists a linear transformation W from the state basis S to a basis for V such that

W⁻¹A(i)W = ( A1_i   0    )
            ( A2_i   A3_i )

where 0 denotes a block of zeros of the same size for all symbols i, and V ⊆ ∩_{i=1}^{N} Kern(F^i).
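Theorems 1.8D and 1.9D can be checked numerically on Example 2.2: the subspace V spanned by (0, 0, 1, 0, −1, 0) is carried into itself by both symbol matrices of Example 1.3 and is annihilated by every power of F. A sketch, with the matrix rows as we transcribe them from Example 1.3:

```python
from fractions import Fraction as Fr

F = [10, 5, 1, 2, 1, 2]
A0 = [[0, 1, 0, 0, 0, 0],
      [0, 1, 0, 0, 0, 0],
      [0, 0, Fr(1, 2), 0, Fr(1, 2), 0],
      [0, 0, 0, 0, 0, 1],
      [0, 0, Fr(3, 4), 0, Fr(1, 4), 0],
      [0, 0, 0, 0, 0, 1]]
A1 = [[0, 0, Fr(1, 8), 0, Fr(7, 8), 0],
      [0, 0, 0, 1, 0, 0],
      [0, 0, Fr(1, 2), 0, Fr(1, 2), 0],
      [0, 0, Fr(3, 8), 0, Fr(5, 8), 0],
      [0, 0, Fr(1, 4), 0, Fr(3, 4), 0],
      [1, 0, 0, 0, 0, 0]]
v = [0, 0, 1, 0, -1, 0]   # spans V

def vM(v, M):
    # row vector times matrix
    return [sum(v[i] * M[i][j] for i in range(6)) for j in range(6)]
```

Under A(0) the spanning vector is scaled by −1/4 and under A(1) by +1/4, so VA(0) = VA(1) = V, while v·(F^i) = 1^i − 1^i = 0 for every power i, placing V in ∩ Kern(F^i).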

3. THE NOTION OF INDISTINGUISHABILITY AS A CRITERION OF BEHAVIORAL EQUIVALENCE

If probabilistic sequential machines A and A' are behaviorally equivalent in an intuitive sense, taking into consideration how machines are built and repaired, one would expect them to be interchangeable as submachines of any larger machine. Indistinguishability of the two machines in any machine into which they can be plugged is a strong criterion, the ramifications of which we shall investigate. The following example [9] illustrates how the notion of equivalence through accepting the same set of tapes, ≡_T, fails to meet this indistinguishability requirement.

3.1 EXAMPLE OF TWO DISTRIBUTION EQUIVALENT MACHINES WHICH PERFORM DIFFERENTLY AS COMPONENTS OF A MACHINE

A₁ = <I₁, A₁(0), A₁(1), F₁> where

A₁(0) = A₁(1) = ( 0  1/2  1/2  0  0 )
                ( 0  0    0    1  0 )
                ( 0  0    0    0  1 )
                ( 0  0    0    0  1 )
                ( 0  0    0    0  1 )

F₁ = (0, 0, 1, 1, 0)^T     I₁ = (1, 0, 0, 0, 0)

A₂ = <I₂, A₂(0), A₂(1), F₂> where I₂ = I₁, F₂ = F₁ and

A₂(0) = A₂(1) = ( 0  1/2  1/2  0    0   )
                ( 0  0    0    1/2  1/2 )
                ( 0  0    0    1/2  1/2 )
                ( 0  0    0    0    1   )
                ( 0  0    0    0    1   )

Note that machines A₁ and A₂ happen to be independent of the input, as A₁(0) = A₁(1) and A₂(0) = A₂(1), and hence both are Markov processes.

TABLE 3.1
COMPARISON OF MACHINES A₁ AND A₂

x                  E_{A₁}(x)   I₁A₁(x)               E_{A₂}(x)   I₂A₂(x)
Λ                  0           (1, 0, 0, 0, 0)       0           (1, 0, 0, 0, 0)
0 or 1             1/2         (0, 1/2, 1/2, 0, 0)   1/2         (0, 1/2, 1/2, 0, 0)
00, 01, 10 or 11   1/2         (0, 0, 0, 1/2, 1/2)   1/2         (0, 0, 0, 1/2, 1/2)
all x: lg(x) ≥ 3   0           (0, 0, 0, 0, 1)       0           (0, 0, 0, 0, 1)

From the above table we see that A₁ and A₂ are distribution equivalent as well as expectation equivalent. We will later show the existence of a machine which behaves differently with A₁ and A₂ as submachines, despite the fact that the state behaviors of A₁ and A₂ are Markov processes.

Definition 3.1: A + B denotes the machine obtained by plugging the outputs of A into the inputs of B, subject to the provision that the inputs of B be compatible with the outputs of A.

Definition 3.2: A and A' are tape equivalent machines, written A ≡_T A', if for some specified λ₁ and λ₂

T(A, λ₁) = T(A', λ₂).

Definition 3.3: A and A' are tape indistinguishable for a class C of machines if T(A+C, λ) = T(A'+C, λ) for all λ and all C ∈ C. We may sometimes let C be a larger class than the finite deterministic or probabilistic automata.

Theorem 3.1: If probabilistic sequential machines A and A' are distribution equivalent, they are not necessarily tape-indistinguishable for the class of finite deterministic automata.

Proof (by example): Let C be a finite deterministic machine which accepts the tapes 01 and 10 with probability 1 and all other tapes with probability 0. Table 3.2 tabulates the probabilities of the possible output sequences y of A₁ and A₂ for inputs x of length two, from which the expectations of A₁+C and A₂+C follow.

TABLE 3.2
OUTPUT SEQUENCE PROBABILITIES OF A₁ AND A₂, lg(x) = 2

y     P_{A₁}(y/x)   P_{A₂}(y/x)
00    0             1/4
01    1/2           1/4
10    1/2           1/4
11    0             1/4

Hence E_{A₁+C}(x) = 1 and E_{A₂+C}(x) = 1/2 for every x of length two, so that T(A₁+C, λ) ≠ T(A₂+C, λ) for any λ ∈ (1/2, 1). The reason for this difference is that the conditional probabilities of the output random variables differ for A₁ and A₂. For example,

Prob(O_{A₁}(01) = 1, given O_{A₁}(1) = 0) = 1

while

Prob(O_{A₂}(01) = 1, given O_{A₂}(1) = 0) = 1/2.

Theorem 3.2: For probabilistic sequential machines A and A', if T(A+C, λ) = T(A'+C, λ) for all finite deterministic machines C and any cutpoint λ, then A ≡_E A'.

Proof: Suppose E_A(x) ≠ E_{A'}(x) for some tape x of length k. Without loss of generality pick E_A(x) > E_{A'}(x). Let λ_c be a rational number such that E_A(x) > λ_c > E_{A'}(x). Let C be a deterministic machine which, beginning at time k, computes the number i_k − λ_c, where i_k is its input at time k. Since λ_c is rational, C needs only a finite number of states. C accepts the string x iff i_k − λ_c > 0, which can be decided in a finite number of steps. Now

x ∈ T(B+C, λ_c) iff E_{B+C}(x) > λ_c,

but since C is deterministic,

x ∈ T(B+C, λ_c) iff E_B(x) > λ_c.

Letting B = A and B = A' in turn: x ∈ T(A+C, λ_c) and x ∉ T(A'+C, λ_c), so

T(A+C, λ_c) ≠ T(A'+C, λ_c).

By logical equivalence we have shown, for the class C of finite deterministic machines,

(∀λ)(∀C)[T(A+C, λ) = T(A'+C, λ)] ⟹ (∀x)[E_A(x) = E_{A'}(x)].   Q.E.D.

By the example presented in Theorem 3.1 we know the converse is not true.

3.2 A MORE SATISFACTORY TECHNICAL NOTION OF INDISTINGUISHABILITY

The example at the beginning of this section shows that notions of machine equivalence such as ≡_T, and even distribution equivalence ≡_D, break down under composition of machines. In order to get a more satisfactory definition of behavioral equivalence, the conditional probability structure of probabilistic sequential machines will be explored. A stronger concept of equivalence, called indistinguishability, will be formulated, based upon equality, for the two machines, of the probabilities of all possible output strings given all possible input strings. Following the development of Carlyle [6], a bound will be found for the length of the strings needed to decide whether two machines are indistinguishable.

In what follows it is assumed that Σ contains a symbol Λ such that A(Λ) = E, the identity matrix.

Definition 3.4: The conditional probability of a sequence of outputs y = y₁y₂...y_m, given a string of inputs x = σ₁σ₂...σ_m, starting from an initial distribution Π = (Π₁, Π₂,...,Π_n) in a machine A, will be written P_Π^A(y/x), or, if the machine involved is clear from context, just P_Π(y/x).

We note that the symbols of the output alphabet are real numbers which occur as components of the output column vector F; i.e., the output alphabet Y can be written

Y = ∪_{i=1}^{n} {F_i}.

Definition 3.5: The probability of a sequence of transitions S_{i1} → S_{i2} → ... → S_{ij}, with output sequence y because of input sequence x, will be written P_{S_{i1}→...→S_{ij}}(y/x).

Definition 3.6: The conditional probability transition matrix A(y_i/σ) is formed from A(σ) by zeroing out all columns except those corresponding to states with output y_i. More formally, let

J_{y_i} = {j : F_j = y_i},   y_i ∈ Y,

and let Q_{y_i} be the matrix with [Q_{y_i}]_{jj} = 1 for j ∈ J_{y_i} and [Q_{y_i}]_{jk} = 0 otherwise. Then

A(y_i/σ) = A(σ)Q_{y_i},   y_i ∈ Y, σ ∈ Σ.

Note that [A(y_k/σ)]_{ij} is just P_{S_i→S_j}(y_k/σ).

Remark 3.6': Let y ∈ Y*, x ∈ Σ*, y_i ∈ Y, σ ∈ Σ, with lg(y) = lg(x). Then

A(yy_i/xσ) = A(y/x)A(y_i/σ).

By definition, [A(yy_i/xσ)]_{lm} is P_{S_l→S_m}(yy_i/xσ). For any state S_k,

the probability of reaching S_m from S_l through S_k, emitting yy_i on input xσ, is P_{S_l→S_k}(y/x)·P_{S_k→S_m}(y_i/σ), and since transitions through different intermediate states S_k are mutually exclusive events,

P_{S_l→S_m}(yy_i/xσ) = Σ_{k=1}^{n} P_{S_l→S_k}(y/x) P_{S_k→S_m}(y_i/σ).

Using the definitions again,

[A(yy_i/xσ)]_{lm} = Σ_{k=1}^{n} [A(y/x)]_{lk} [A(y_i/σ)]_{km},

or, in matrix form, A(yy_i/xσ) = A(y/x)A(y_i/σ). Hence the conditional probability transition matrices for strings can be generated by the conditional probability transition matrices for symbols, as was the case for the transition matrices A(x).

Remark 3.7: Given an initial distribution Π over states, the probability of getting output string y from input string x is just

P_Π(y/x) = Σ_{j=1}^{n} Σ_{i=1}^{n} Π_i [A(y/x)]_{ij}.

With S the n×1 column vector of ones, we can write

P_Π(y/x) = ΠA(y/x)S.

Remark 3.8: We note the following identity:

P_Π(y/x) = Σ_{y_i∈Y} P_Π(yy_i/xσ)   for all σ ∈ Σ,

since

Σ_{y_i∈Y} P_Π(yy_i/xσ) = Σ_{y_i∈Y} ΠA(y/x)A(y_i/σ)S = ΠA(y/x)A(σ)S.

But for any n×n stochastic row matrix C, CS = S. Hence

ΠA(y/x)A(σ)S = ΠA(y/x)S = P_Π(y/x).

Definition 3.7: The terminal distribution Π*(y/x) for a sequence of outputs y given inputs x is

Π*(y/x) = ΠA(y/x) / P_Π(y/x).

The i'th component of Π*(y/x) is the probability of being in state i after input string x has occurred and output string y has been observed. The following identity holds whenever P_Π(y/x) > 0:

P_Π^A(yy_i/xσ) = P_Π^A(y/x) P_{Π*(y/x)}^A(y_i/σ),   y_i ∈ Y, σ ∈ Σ.

Definition 3.8: Machines A and A' are indistinguishable, written A ≡_I A', if

P_Π^A(y/x) = P_{Π'}^{A'}(y/x)   ∀x ∈ Σ*, ∀y ∈ Y*.

Hence our concept of indistinguishability for machines depends on observable identity when both machines are started from their initial state distributions.
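Definition 3.6 and Remark 3.7 make the conditional output probabilities directly computable: zero out the columns of A(σ) for states whose output is not y_i, multiply along the string, and sum. The sketch below (helper names ours) applies this to the two Markov machines of Section 3.1, with the output assignment — states 3 and 4 emitting 1 — that is consistent with Table 3.1 and the conditional probabilities quoted there.

```python
from fractions import Fraction as Fr

F = [0, 0, 1, 1, 0]       # outputs of A1 and A2
PI = [1, 0, 0, 0, 0]      # initial distribution I1 = I2

A1 = [[0, Fr(1, 2), Fr(1, 2), 0, 0],
      [0, 0, 0, 1, 0],
      [0, 0, 0, 0, 1],
      [0, 0, 0, 0, 1],
      [0, 0, 0, 0, 1]]
A2 = [[0, Fr(1, 2), Fr(1, 2), 0, 0],
      [0, 0, 0, Fr(1, 2), Fr(1, 2)],
      [0, 0, 0, Fr(1, 2), Fr(1, 2)],
      [0, 0, 0, 0, 1],
      [0, 0, 0, 0, 1]]

def cond(M, y):
    # A(y_i/sigma): keep only columns of states whose output equals y_i
    return [[M[i][j] if F[j] == y else 0 for j in range(5)] for i in range(5)]

def P(M, y):
    # P_Pi(y/x) = Pi A(y_1/s)...A(y_m/s) S; the input symbols are irrelevant
    # here because A1 and A2 ignore their inputs
    v = PI
    for yi in y:
        Q = cond(M, int(yi))
        v = [sum(v[i] * Q[i][j] for i in range(5)) for j in range(5)]
    return sum(v)

print(P(A1, '01'), P(A2, '01'))   # 1/2 1/4
```

This recovers the difference exploited in Theorem 3.1: A₁ can only emit 01 or 10 on two-symbol inputs, while A₂ spreads its mass over all four output sequences.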

Definition 3.9: Machines A and A' are k-indistinguishable if

P_Π^A(y/x) = P_{Π'}^{A'}(y/x),   x ∈ Σ^m, y ∈ Y^m, m = 0,1,...,k.

Definition 3.10: In a machine A, two initial state distributions Π and λ are indistinguishable if

P_Π^A(y/x) = P_λ^A(y/x)   ∀y ∈ Y*, ∀x ∈ Σ*.

Definition 3.11: In a machine A, two initial state distributions Π and λ are k-indistinguishable if

P_Π^A(y/x) = P_λ^A(y/x)   ∀x such that lg(x) ≤ k, ∀y such that lg(y) = lg(x).

Checking whether the indistinguishability definition for machines (3.8) or for initial distributions (3.10) holds using only the definitions involves the calculation of an unbounded sequence of conditional probabilities. In the next section a bound is shown for the length of the strings whose probabilities need to be calculated. As in the deterministic machine case, if n is the number of states, then only strings of length n−1 or less need be considered in establishing indistinguishability.

3.3 THE RELATIONSHIP BETWEEN THE INTUITIVE AND TECHNICAL CONCEPTS OF INDISTINGUISHABILITY

We have yet to relate the intuitive notion of indistinguishability to

the technical Definition 3.8. In the next theorem it will be shown that two machines indistinguishable in the technical sense are indeed indistinguishable when plugged into C, any finite state probabilistic or deterministic machine. Since C has a finite number of states, it is assumed that finite strings of Z = C(Y*) — the random variable taking on values which are strings of outputs of C, given input strings drawn from the output random variable of A — depend only on finite strings of Y*.

Theorem 3.4: Let C* be the class of finite state probabilistic and deterministic sequential machines. For any C ∈ C*,

P_Π^{A+C}(Z = C(Y_A)/x) = P_{Π'}^{A'+C}(Z' = C(Y_{A'})/x)

if P_Π^A(y_A/x) = P_{Π'}^{A'}(y_{A'}/x) for Y_A and Y_{A'} having the same range.

Proof: For any fixed value y_A of the output string random variable of A,

P_Π^{A+C}(Z = C(y_A)/x) = P_Π^A(y_A/x) P^C(Z = C(y_A)/y_A),

since the occurrences of different y_A are disjoint events. Summing over all y_A ∈ Y* with lg(y_A) = lg(x),

P_Π^{A+C}(Z = C(Y_A)/x) = Σ_{y_A∈(Y)^{lg(x)}} P_Π^A(y_A/x) P^C(Z = C(y_A)/y_A).

Likewise,

P_{Π'}^{A'+C}(Z' = C(Y_{A'})/x) = Σ_{y_{A'}∈(Y)^{lg(x)}} P_{Π'}^{A'}(y_{A'}/x) P^C(Z' = C(y_{A'})/y_{A'}).

But since Z and Z', and Y_A and Y_{A'}, range over the same sets respectively, the indistinguishability of A and A', i.e.,

P_Π^A(Y_A = y/x) = P_{Π'}^{A'}(Y_{A'} = y/x),

gives

P_Π^{A+C}(Z = C(Y_A)/x) = P_{Π'}^{A'+C}(Z' = C(Y_{A'})/x),

which means A+C and A'+C are indistinguishable. Q.E.D.

Since the machine C might ignore its inputs, it is clear that the converse of Theorem 3.4 does not hold. Hence the criterion of indistinguishability as a submachine has led us to the technical Definition 3.8 as a kind of behavioral equivalence.

4. FINITE COMPLETE SETS OF INVARIANTS FOR THE BEHAVIORAL EQUIVALENCES ≡_E, ≡_N, AND ≡_I AND THE REDUCTION CONGRUENCE RELATIONS R_F AND R_F^N

The results of the previous sections involve relations defined over all finite strings of the input alphabet. In this section bounds are found for the lengths of the strings which it is necessary to consider in order to decide whether two elements of the domains of the relations are in the same class.

Definition 4.1: A set of functions f₁,...,f_m is a complete set of invariants for the relation R if for all x and y in the domain of R,

xRy ⟺ f_i(x) = f_i(y),   i = 1,...,m.

We now show sets of functions which are invariants for the above relations. A set of functions which are invariant over R_F and R_F^N is:

f_{(A,I,z)}(x) = E_A(xz)        for all z: lg(z) ≤ i, for all I ∈ S
f_{(A,I,z)}^N(x) = μ_N^A(xz)

while for the relation ≡_I the set of functions below is a set of invariants:

g_{(x,y)}(A) = P_Π^A(y/x)       for all x and y: lg(x) = lg(y) ≤ i.

Likewise the set

h_{(x,1)}(A) = E_A(x)           for all x: lg(x) ≤ i
h_{(x,r)}(A) = μ_r^A(x)         for r = 2,...,N

is a set of invariants for the relations ≡_E and ≡_N. It is clear that for an unbounded i the above are complete sets of invariants. However, in what follows a finite value of i will be found for each of these cases. In the case of ≡_E the bound will be the same as the

well-known Moore bound for deterministic automata, but in the case of ≡_N it will be lower for most machines. The main tool used in finding the various values of i is the following simple lemma.

4.1 THE FUNDAMENTAL LEMMA

Lemma 4.1: Given an n-dimensional vector space V, a finite set T = {T_i}, where each T_i ∈ V × V is a linear transformation on V, and some finite set of vectors V₀ ⊆ V such that dim <V₀> = r ≥ 1, define

M₀ = V₀
M₁ = {v₀T_i : T_i ∈ T, v₀ ∈ V₀}
⋮
M_k = {v₀T_{i₁}...T_{i_k} : T_{i₁},...,T_{i_k} ∈ T, v₀ ∈ V₀}

and let

L_j = < ∪_{m=0}^{j} M_m >.

Then there exists an integer J(T) such that for any v₀ ∈ V₀, v₀ ≠ 0:

(i) L_{J(T)} = L_{J(T)+1}
(ii) L_{k−1} ⊊ L_k for k ≤ J(T)
(iii) J(T) ≤ n − r.

Proof: L₀ ⊆ L₁ ⊆ ... ⊆ L_j ⊆ ... ⊆ L_k as a consequence of the definition. The sequence (dim L_j)_{j=0}^{∞} is bounded above by n, the dimension of V. Call J(T) the smallest index k such that L_{k+1} = L_k. Showing that the sequence (dim L_j)_{j=0}^{J(T)} is strictly increasing requires that for all j+1 ≤ J(T),

L_{j+1} ≠ L_{j+2} whenever L_{j+1} ≠ L_j, which is logically equivalent to

L_{j+1} = L_j ⟹ L_{j+2} = L_{j+1}.

Hence it is sufficient to show that L_{j+1} ⊆ L_j ⟹ L_{j+2} ⊆ L_{j+1}. Assume L_{j+1} ⊆ L_j, and without loss of generality pick

v = v₀T_{i₁}...T_{i_{j+2}} ∈ L_{j+2},   v = (v₀T_{i₁}...T_{i_{j+1}})T_{i_{j+2}}.

But w = v₀T_{i₁}...T_{i_{j+1}} ∈ L_{j+1} ⊆ L_j, so there is a finite set of indices I = {i} of a spanning set U for L_j,

U = {v₀T_{B₁^i}T_{B₂^i}...T_{B_{r_i}^i} : i ∈ I},

with r_i ≤ j, and constants c_i such that

w = Σ_{i∈I} c_i (v₀T_{B₁^i}...T_{B_{r_i}^i}).

So

v = wT_{i_{j+2}} = Σ_{i∈I} c_i (v₀T_{B₁^i}...T_{B_{r_i}^i}T_{i_{j+2}}) ∈ L_{j+1},

i.e., L_{j+2} ⊆ L_{j+1}. Now consider the sequence of dimensions dim L₀, dim L₁,..., dim L_{J(T)}. Since L_k ⊊ L_{k+1} for k+1 ≤ J(T),

dim L_k < dim L_{k+1}   for k+1 ≤ J(T).

Noting that dim L₀ = r,

dim L₀ + J(T) ≤ dim L_{J(T)} ≤ n,

which gives J(T) ≤ n − r. Q.E.D.

4.2 A BOUND FOR TESTING FOR MEMBERSHIP IN ≡_I

Theorem 4.1: If A is a probabilistic sequential machine with n states, then (n−1)-indistinguishability of initial distributions Π and Π' is sufficient to guarantee indistinguishability of the initial distributions Π and Π'.

Proof: Using Lemma 4.1, let V₀ = {S}, so that dim <V₀> = 1, and let

T = {A(y_i/σ) : y_i ∈ Y, σ ∈ Σ},   v₀T_i = A(y_i/σ)S.

By the lemma, for any string x = σ_{j₁}...σ_{j_{r'}}, with r' finite, A(y/x)S can be expressed as

(*)   A(y/x)S = Σ_{i∈I} c_i A(y^i/x^i)S

with lg(y^i) = lg(x^i) = r_i ≤ n − 1 for i ∈ I, where y^i ∈ Y^{r_i} and x^i ∈ Σ^{r_i}. Hence for initial distributions Π and Π',

P_Π(y/x) = ΠA(y/x)S = Σ_{i∈I} c_i ΠA(y^i/x^i)S = Σ_{i∈I} c_i P_Π(y^i/x^i),

with lg(y^i) = lg(x^i) ≤ n − 1. Multiplying (*) by Π' gives

P_{Π'}(y/x) = Σ_{i∈I} c_i P_{Π'}(y^i/x^i).

By the assumption of (n−1)-indistinguishability of Π and Π',

P_Π(y^i/x^i) = P_{Π'}(y^i/x^i),   lg(x^i) = lg(y^i) ≤ n − 1.

Hence P_Π(y/x) = P_{Π'}(y/x). Q.E.D.

4.3 EQUIVALENCE OF DISTRIBUTIONS IN ONE MACHINE

Using Lemma 4.1, we can now make effective the definitions of the relations R_F and R_F^N of Section 2. A bound will be obtained for the lengths of the strings which must be considered to decide whether x and y are in the same congruence class.

Definition 4.2: Distributions Π and λ are equivalent for a machine A, written Π ≡_A λ, if

ΠA(x)F = λA(x)F   ∀x ∈ Σ*.

Definition 4.3: Distributions Π and λ are K-equivalent for a machine A,

written Π ≡_A^K λ, if

ΠA(x)F = λA(x)F   ∀x ∈ Σ*: lg(x) ≤ K.

Theorem 4.2: If A is a probabilistic sequential machine with n states, and if Π and λ are (n−1)-equivalent in A, then Π ≡_A λ.

Proof: Let x be in Σ* and apply Lemma 4.1 to write

A(x)F = Σ_{i∈I} c_i A(x^i)F,   lg(x^i) ≤ n − 1.

Hence

ΠA(x)F = Σ_{i∈I} c_i ΠA(x^i)F,   λA(x)F = Σ_{i∈I} c_i λA(x^i)F.

(n−1)-equivalence gives ΠA(x^i)F = λA(x^i)F, i ∈ I. So ΠA(x)F = λA(x)F. Q.E.D.

4.4 BOUNDS FOR TESTING FOR MEMBERSHIP IN ≡_E AND R_F

Definition 4.4: The abstract join of probabilistic sequential machines A = <Π, A(0),...,A(k−1), F> with n states and A' = <λ, A'(0),...,A'(k−1), F'> with n' states is the abstract (n+n')-state machine A⊕A' = <A⊕(0),...,A⊕(k−1), F⊕>,

where

A⊕(i) = ( A(i)   0     )
        ( 0      A'(i) )

F⊕ is the column vector formed by stacking F over F', and Π and λ can be embedded in the (n+n')-dimensional space as

Π̂ = (Π, 0,...,0)   (n' zeroes),     λ̂ = (0,...,0, λ)   (n zeroes).

The problem of deciding whether two machines A and A' are expectation equivalent, i.e.,

ΠA(x)F = λA'(x)F'   ∀x ∈ Σ*,

is logically equivalent to deciding whether Π̂ and λ̂ are equivalent in A⊕A', i.e., whether Π̂ ≡_{A⊕A'} λ̂. Hence, following Carlyle [6], we use Theorem 4.2 to state:

Remark 4.1: Π̂ ≡_{A⊕A'}^{n+n'−1} λ̂ ⟺ Π̂ ≡_{A⊕A'} λ̂,

which gives the following theorem.

Theorem 4.3: Let A and A' be probabilistic sequential machines having n and n' states respectively. Then a necessary and sufficient condition that A and A' be expectation equivalent is:

[ΠA(z)F = λA'(z)F' ∀z ∈ Σ*] ⟺ [ΠA(x)F = λA'(x)F' ∀x: lg(x) ≤ n+n'−1].

Theorem 4.3 makes the experimental determination of expectation equivalence possible, provided the numbers of states of both machines are known. Furthermore, it gives a bound on the process of finding whether two strings are in the same equivalence class under the reduction relation R_F of Chapter 1. This result is summarized in the following theorem.

Theorem 4.4: Strings x and y are in the same equivalence class under the reduction relation R_F of an n-state probabilistic sequential machine A ⟺ E_A(xz) = E_A(yz) for all strings z: lg(z) ≤ n−1, and all I ∈ S.

Proof:

x R_F y ⟺ E_A(xz) = E_A(yz) for all z ∈ Σ*, for all I ∈ S
        ⟺ IA(x)A(z)F = IA(y)A(z)F.

Let Π = IA(x) and λ = IA(y); then the condition reads ΠA(z)F = λA(z)F for all z ∈ Σ*. By Theorem 4.2 and its obvious converse, we get

x R_F y ⟺ IA(x) ≡_A^{n−1} IA(y),

which gives the theorem.

4.5 BOUNDS FOR TESTING FOR MEMBERSHIP IN ≡_N AND R_F^N

Definition 4.5: The independence number n_F of an n-state machine A with output vector F is

n_F = dim <{(F^i) : i = 1,2,...,n}>.

It follows from vector space arguments that

n_F = #{distinct values F_k : F_k ≠ 0}.

The independence number is just the dimension of the space generated by the powers of the components of the output vector F. For a Rabin automaton n_F = 1, and all central moments reduce to polynomials in what we may consider the first "central moment," E_A(x). In general, if the independence number is n_F, then for all x in Σ* the (n_F+1)'st central moment μ_{n_F+1}^A(x) reduces to a polynomial in the lower central moments, since

μ_{n_F+1}^A(x) = IA(x)(F^{n_F+1}) + Q(x),

where Q(x) is a polynomial in which the IA(x)(F^i), i = 1,...,n_F, occur. Since n_F is the dimension of the space <{(F^i) : i = 1,2,...,n}>, we have F^{n_F+1} = Σ_{i=1}^{n_F} c_i(F^i), and hence

μ_{n_F+1}^A(x) = Σ_{i=1}^{n_F} c_i IA(x)(F^i) + Q(x).

Theorem 4.5: Let A be a probabilistic sequential machine with output vector F and n states. Then for any r ≤ n_F and strings x and y in Σ*:

[E_A(xz) = E_A(yz) and μ_i^A(xz) = μ_i^A(yz), i = 2,...,r, ∀z ∈ Σ*]
⟺
[E_A(xz') = E_A(yz') and μ_i^A(xz') = μ_i^A(yz'), i = 2,...,r, ∀z': lg(z') ≤ n − r].

Proof: Using Lemma 4.1 with V₀ = {F, (F²),...,(F^r)},

dim <V₀> = r ≤ n_F, and T = {A(i) : i ∈ Σ}. For any v₀ ∈ <V₀>,

v₀T_i = A(i)v₀ = Σ_{k=1}^{r} c_k A(i)(F^k).

Consider any string z with lg(z) = m' finite. Then there exist a spanning set {A(x^i)v₀ : i ∈ I} and constants c_i(v₀) so that

A(z)v₀ = Σ_{i∈I} c_i(v₀) A(x^i)v₀,   lg(x^i) ≤ n − r.

Let v₀ range over the (F^i), i = 1,2,...,r. For any Π and λ there are constant functions c_i((F^i)), depending on (F^i), such that

ΠA(z)(F^i) = Σ_{i∈I} c_i((F^i)) ΠA(x^i)(F^i)
λA(z)(F^i) = Σ_{i∈I} c_i((F^i)) λA(x^i)(F^i).

Hence the moments about zero from Π and λ are equal if they are equal for all strings of length ≤ n−r. Let Π = IA(x) and λ = IA(y). Then we have, for any z and any initial distribution I,

IA(xz)(F^i) = IA(yz)(F^i),   i = 1,2,...,r,

if and only if, for i = 1,2,...,r,

IA(xz')(F^i) = IA(yz')(F^i)

for all strings z' of length less than or equal to n−r. Noting by Theorem 2.1

that any central moment μ_m^A(x) is a function of IA(x)(F), ..., IA(x)(F^m), the result is established. Q.E.D.

Corollary 4.5 (Bound for the relation R_N to hold). Let A be a probabilistic sequential machine with n states and with N ≤ n_F. Then x R_N y if and only if, for all strings z' with lg(z') ≤ n − N and all I ∈ S:

    E_A(xz')   = E_A(yz')
    μ_2^A(xz') = μ_2^A(yz')
      ...
    μ_N^A(xz') = μ_N^A(yz')

Theorem 4.6. Let A and A' be probabilistic sequential machines having n and n' states respectively. Then for all

    r ≤ n_F + n_{F'} − #{y : y ∈ Y ∩ Y' and y ≠ 0}

and for any initial distributions π in A and λ in A':

    E_A(x)   = E_{A'}(x)
    μ_2^A(x) = μ_2^{A'}(x)      for all x ∈ Σ*
      ...
    μ_r^A(x) = μ_r^{A'}(x)

if and only if the same equalities hold for all x' with lg(x') ≤ n + n' − r.

Proof: Construct the direct product machine Ā = A ⊗ A' and let V_0 in Lemma 4.1 be generated by the componentwise products of the powers of F and F'. Then

    dim ⟨V_0⟩ ≤ n_F + n_{F'} − #{y : y ∈ Y ∩ Y' and y ≠ 0}.

Using Lemma 4.1 and an argument like the one in Theorem 4.4 establishes the theorem. Q.E.D.
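The tests in Theorems 4.4 through 4.6 all reduce to comparing finitely many moments over strings of bounded length. The following sketch shows the machinery in Python, under assumed encodings that are not from the report itself: a machine is one stochastic matrix per input symbol, an output vector F, and a single fixed initial distribution I (a complete test of Theorem 4.4 would range over a spanning set of initial distributions, not one); all function names are hypothetical.

```python
from itertools import product

def distribution_after(I, mats, string):
    """Row vector I * A(x): the state distribution after reading the string."""
    row = list(I)
    for sym in string:
        row = [sum(row[i] * mats[sym][i][j] for i in range(len(row)))
               for j in range(len(row))]
    return row

def moment_about_zero(I, mats, F, string, k):
    """IA(x)(F^k): the k-th moment about zero of the output after x."""
    row = distribution_after(I, mats, string)
    return sum(row[i] * F[i] ** k for i in range(len(F)))

def independence_number(F, tol=1e-9):
    """n_F: dimension of the span of the componentwise powers F^1, ..., F^n."""
    basis = []
    for k in range(1, len(F) + 1):
        v = [f ** k for f in F]
        for b in basis:  # eliminate v against the pivot of each basis row
            p = next(i for i, x in enumerate(b) if abs(x) > tol)
            if abs(v[p]) > tol:
                v = [vi - (v[p] / b[p]) * bi for vi, bi in zip(v, b)]
        if any(abs(x) > tol for x in v):
            basis.append(v)
    return len(basis)

def rf_equivalent(I, mats, F, x, y, tol=1e-9):
    """Theorem 4.4's test: E(xz) = E(yz) for every z with lg(z) <= n-1."""
    n = len(F)
    for m in range(n):  # string lengths 0 .. n-1
        for z in product(mats, repeat=m):
            xz, yz = list(x) + list(z), list(y) + list(z)
            if abs(moment_about_zero(I, mats, F, xz, 1)
                   - moment_about_zero(I, mats, F, yz, 1)) > tol:
                return False
    return True
```

Central moments follow from the moments about zero, e.g. μ_2^A(x) = IA(x)(F²) − (IA(x)F)², which is how Theorem 4.5's test extends the same loop to r > 1; exact rational arithmetic would be preferable to the floating-point tolerance used here.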

4.6 DISCUSSION OF THE GENERALIZATION OF THE MOORE BOUND

Corollary 4.6. Let A and A' be n-state deterministic machines with two-valued output alphabet Y = Y' = {1,2}. Then A and A' are indistinguishable for all strings if they are indistinguishable for all strings of length at most 2n−2.

Proof: In Theorem 4.6 we have

    n_F + n_{F'} − #{y : y ∈ Y ∩ Y' and y ≠ 0} = 2 + 2 − 2 = 2,

so that r ≤ 2. For deterministic machines, indistinguishability reduces to

    E_A(x)   = E_{A'}(x)
    μ_2^A(x) = μ_2^{A'}(x)      for all x ∈ Σ*.

Hence the right side of Theorem 4.6 gives the result. Q.E.D.

Theorem 4.6 can be regarded as a generalization of the Moore result [7] to probabilistic machines with arbitrary rather than binary output alphabets. Note that Moore's bound is 2n−1, since he considers the initial output as part of the experiment; we account for the initial outputs when considering strings of length 1, since the symbol Λ has the identity matrix as its symbol matrix. The role of the zero output symbol in Theorem 4.6 is a significant departure from Moore's deterministic results. In order to get the same result as Moore in Corollary 4.6 it was necessary to pick the two-valued output set {1,2} rather than {0,1}, with the implicit assumption that such a recoding of output symbols cannot affect indistinguishability between deterministic machines. Without the recoding, r = 1 and the bound is one higher than the Moore bound. However, in the probabilistic case, a different bound for machines with a zero output symbol than for those with nonzero symbols seems reasonable. A

zero annihilating some probabilities in the expectation and higher moments can mask significant changes in distributions. It is clear from Theorems 1.8 and 1.8D that changes in F_i from zero to nonzero can affect the kernel of F, perhaps to the extreme of making R_F of infinite rather than finite rank and preventing the construction of an N-moment-equivalent finite machine with deterministic switching.
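Corollary 4.6 invites a direct check on small machines: two n-state deterministic machines with outputs in {1,2} agree on every input string exactly when they agree on every string of length at most 2n−2. A minimal sketch, assuming a Python encoding of the report's deterministic machines (transition function and output map as dicts keyed by state; the function names are hypothetical):

```python
from itertools import product

def output_after(delta, out, q0, string):
    """Run a deterministic machine and return the output of the state reached."""
    q = q0
    for sym in string:
        q = delta[(q, sym)]
    return out[q]

def indistinguishable(machine_a, machine_b, n, alphabet):
    """Corollary 4.6's bound: compare outputs on every string of length <= 2n-2."""
    (da, oa, qa), (db, ob, qb) = machine_a, machine_b
    for m in range(2 * n - 1):  # string lengths 0 .. 2n-2
        for x in product(alphabet, repeat=m):
            if output_after(da, oa, qa, x) != output_after(db, ob, qb, x):
                return False
    return True
```

The brute-force enumeration is exponential in 2n−2 and only illustrates the bound; the point of the corollary is that no string longer than 2n−2 ever needs to be examined.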

REFERENCES

[1] Braines, S. N., and Svechinsky, V. B., "Matrix Structure in Simulation of Learning," IRE Transactions on Information Theory, Vol. IT-8, No. 5, September 1962, pp. 186-190.

[2] Howard, Ronald A., Dynamic Programming and Markov Processes, M.I.T. Press and John Wiley and Sons, Inc., New York, 1960 (Chapter 3).

[3] Thrall, Robert M., and Tornheim, Leonard, Vector Spaces and Matrices, John Wiley and Sons, Inc., New York, 1957, pp. 298-300.

[4] Rabin, M. O., "Probabilistic Automata," in Sequential Machines: Selected Papers, E. F. Moore (ed.), Addison-Wesley Publishing Co., Inc., Reading, Mass., 1964, pp. 98-114.

[5] Burks, Arthur W., "Computation, Behavior and Structure in Fixed and Growing Automata," Behavioral Science, Vol. 6, No. 1, January 1961.

[6] Carlyle, J. W., "Equivalent Stochastic Sequential Machines," Institute of Engineering Research Report Series No. 60, Issue No. 415, Electronics Research Laboratory, University of California, Berkeley, 1961.

[7] Moore, E. F., "Gedanken-experiments on Sequential Machines," in Automata Studies, C. E. Shannon and J. McCarthy (eds.), Princeton University Press, 1956.

[8] Shannon, C. E., and Weaver, Warren, The Mathematical Theory of Communication, University of Illinois Press, Urbana, 1948, p. 34.

[9] Arnold, Richard, unpublished communication.

[10] Rabin, M. O., and Scott, D., "Finite Automata and Their Decision Problems," IBM Journal of Research and Development, 3, 1959, pp. 114-125.

[11] Gantmacher, F. R., The Theory of Matrices, Vol. 2, Chelsea Publishing Co., New York, 1959, pp. 236-237.


