Requests for additional copies by agencie's of the Department of Defense, their contractors, and other Government agencies should be directed to the:, Defense Documentation Center (DDC) Cameron Station Alexandria, Virginia 22314 Department of Defense contrxactors must be established for DDC services or, have -their' "need-to-know" certified by the cognizant military agency of their project or contract. All other persons- and organizations should apply'to the: Clearinghouse for Federal Scientific and Technical' Information (CFSTI). Sills Building 5285, Port Royal Road Springfield, Virginia 22151

T HE UN IV ER SIT Y OF MI C H IGAN COLLEGE OF LITERATURE, SCIENCE, AND THE ARTS Communication Sciences Program Technical Report EQUIVALENCES BETWEEN PROBABILISTIC SEQUE.NTIAL MACHINES Carl V. Page ORA Projects 031O05 073639 and 07537 Supported byo DEPARTEENT OF THE NAVY OFFICE OF NAVAL RESEARCH COINTIRACT NO. Nonr-1.224(21) WASHIN2 —TON, DC. and U. S. ARY1iT RESEARCH OFF'ICE (DURHAM) GRANT NO. DA-ARO(D)-3)1-124KcG663 D'iJRHAI, NORTH CAROLINA and NATIONAL INSTITWTES OF HEALTH PUBLIC HEALTH SERVICE GRANT NO, GM-12236-02 BETHESDA, MARYLAND admin:i. stered t h.rou.gh: OFFICE OF RESEARCH ADMINISTRATION ANN ARBOR Novrember 1965

This report was also a dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in The University of Michigan, 1965.

ERRATA Pa ine Correctioans (B-from bottom T- from top) 23 Figure 141 at state'.l 5 should be deleted and replaced. t h i. 26 Figure 1,2 Arrowheads left out (3. >(5 and (x) ) — IA (x) )37 Equation (2) i (x) =(x) 39 1 T ^o all symbols a and o~ 41 Figure 241 Arrowheads left out 1 -- 2 and 79 Figure 301 A: arrowhead 1 -- 145 4 T b 168 8 B Period after also,

ACf (,r) wV D I);..... 1 Wou3id t1i k. to exprs.Ys r. y, r.atitude and apprec i;l.ton to Professo s John, iolland, ArTilh Naylor, Betzalel Peleg, ugenie Lawler, r'id9 i, > ircha$rd Kg -hr V o fo'. tSCr': on:Ty,,',xi,ra com j.i st t. i'[ would like to fthank S-'tec F.Ir tetnie i, and Dr,., RPichard Arnold fo" otheir many important questions during the format ive stage of this worke Mlany aspects of this work would not have been developed without the interdisciplinary nature of the Communifcation Sciences Pro~ramt I would like to express my gratitude to the many members orthe University of Michigan faculty who ha've been responsible for the existence of such a programn lMy wife Gloria deserves special thanks forl h1er loyalty and understanding without which this work might not have been possible, Finally, I would like to acknowledge the aid 0rf the members of the Logic of Computers Group,, especially Kuniko Misawa, who typed the final manuscript and several earier drafts, and the financial support of the Office of Naval Resca'rch (NonrU-1224(221), the U,.S, Azry Research Office, Durham (DA-ARO(D3)).-3-124-G665) and the National Institutes of Health (GM1 12236-02). ii

RESEARCH PROGRESS REPORT Title: "Equivalences Between Probabilistic Sequential Machines," by C.V. Page, The University of Michigan Technical Report Background: The Logic of Computers Group of the Communication Sciences Program of The University of Michigan is investigating the application of logic and mathematics to the theory of the design of computing automata. Condensed Report Contents: This report introduces new definitions of behavioral equivalence and shows relationships among the various notions of behavioral equivalence between probabilistic machines. Four basic problems are discussed for several different behavioral equivalences between machines. In what follows we use the symbol "=" as a variable to denote one of the several behavioral equivalences considered in this report. Given an arbitrary probabilistic sequential machine A: (1) Is there an input-state calculable machine A' (a machine with deterministic switching and random outputs) such that A = A'? (2) What are the machines A' with minimal number of states such that A _ A'? (3) How does one obtain all stable modifications of A, i.e., all machines A' such that the switching probabilities of A' and A differ but A - A'? (4) Is there a finite bound on the length of experiments required to establish whether A - A' holds? The four basic problems are solved completely for some equivalences between machines and are left open for other equivalences. Some applications are made to optimal control problems. For Further Information: The complete report is available in the major Navy technical libraries and can be obtained from the Defense Documentation Center. A few copies are available for distribution by the author.

TAT~;r, f!?,.:T'XNTS Page CHAPTER O INTRODUCTIO(N o 6,,, i I,,,, 4 0,o The (Concept of Probabilistic Seqwtcnitial Machine. 1 O~2 Notational Conventions G.. i i, 3 043,Models of Probabilistic Seslential l Machines *, S CHAPTER 14 D)ETERMINING WHIETHER A PROBABILISTIC SEQUENTIAL MACHINE IS EXPECTATION EQIJVALENT TO A FINITE DETERMF.INISTIC A ACH I NE i * * 4 1l. Tlhe (Concept of Expectation Ecluivalence,,,,, 9 1,12 The Reduction Relation RE 4.,,..,. 10 1,3 Construction of the Quotient Machine,,.,. i. 11 1 4 Construction of an Expectation Equivalent Finite Deterministic iMachine,,.,.,,,, 6.. i 13 145 The Partition of the Set o'f Accessible State Distributions Induced by RE,e 4 * * 6,, 6.., 15 1.:6 Necessary and Sufficient Conditions that Strings Be in the Same RE Class,,,,.., *,, 17 1,7 Invariant Subspace Characterization of RE.,,. 6 21 1,8 Necessa-y and Sufficient Conditions that R be Nontrivial 4 4 4.1 6 * 4 6 4 4 4 21 CHAPTER 2, I)ETERMINING WHlIETHER A PROBABILISTIC SEQUENTIAI, MACIIINE IS N-MIOMENT EQUIVALENT TO AN INPUT-STATE CALCULABLE MACHINE 4,,, e 4i 4 i 6 4. i 4 I6 i 27 -0 Introduct ion a.4 * 6 4, 4., i * *, O 27 2,l I)istrihbution Equivalence: _!), 6,, i,. 6,, 0 * 27 2.,2 Moments of the Output Random Variable, *, 4 i 28 2 3' Sqpeci.al Properties of Rabin Probabilistic Automata,,, 29 2,4 The Concept of N-moment Lquivalence: -N 30 725'rhe Relationship between D nd N 31 2 46 lThe N m-oment Reduction Rel tiDn: 6 4, 4.4. 6, 6. 34 2.7 Inp;utr2state Calculable achlines 0, * 0 4 4 34 2i,8 Cthoiv:terization of Inptit-state Calculable Machines,,, 35 2,9 A,Nc:i::.a:i; and Suffici ent C.ondition for the N-moment. R.ediuc tion a Relaftion to Hold.' 4, 4 4, *, 4, * 0 36 CIHAPTEPR 3, THIE.'";JI"tI.) r()I NDISTINC.GlUISI.IABILIT'FY AS A CRITERION OF BEIIA\VI4(RAL-, I`t/JI\,:VALENCE * * *. * 4 4 4 4-2 3,1 E xTample of Two Distribution Equivalent blachines Wh. ich Are Not. Iutn erThran...a.b4 le as. Componern.t~ of a Machine 4, 6, 6 2 34 2 A.!ire Satisf::."tory Technic;:lI Notion of IndistinguishabilN. t; o 4 c. 4 e 0, 4 4 o 4 o o 4 *,. 46 iv

i'rABLEt OF (C)ONTENTS (Cent inuod) 0)3 The Rc:::A:tionship between thie In*tuitive wand Techn;ical Coneepts of lndistinguishability,... - < t,:: 0 CHAPT R1 4,. FINYITE COMPLETi SETS OF iNVARIANTS FOR TIE B E -HAVIORAL I:JQUIVALENCES E:'-N AND ANI) TIlE RED)UCTION CONGRUENCE RE I,ATiONS 1E AND RN - e e. 53 4,'The Ftmdadrentai Lemma,.. - 53 4,2 A lEp:nd'.. Tresting;or M Iclbership in I5 4,3 Equivalence of Distributions in One Mlachine,.. *. 56 4,4 Bounds for Testing for Mlembershin in' and Pf,. 58 4.5 Bounds fo Testiinr. for blembership in:N ant( RN, t r C 60 4,6 Discussion of the Generalization oaf the Moore Bound 4 ( 4 CHAPTER 5, ON A FINITE COMIPIETE SET OF INVARIANTS FOR TAPE EQU IALENC. =E, 6 5, 0 Intreoduction', 66 PART le ISOLATED CUTPOINT MIACHINIE'S o 66 5.1 The Rabin Reduction Theore *, 0.. C, e. 67 5S.2 A Finite Set of Invariants for for Probabilistic Sequential Machines ilaving Isolated Cutpoints,,.,. 72 PART II: A CLASS OF MACHINES CONTAINING ISOLATED CIJTPOINT IMACIlINESc 74 5, 3 The Stability Congruence Relation 6 6 74 5A4 A Sufficient Condition for a Finite Set of Invariants for C. 74 5. 5 Simultaneous Experiments t 6, f., e r, C......... 75 5,6 The Minority Property.. (C,, 76 5,7 The Balance Property; 6. 77 568 The Concept of Stability near the Cutpoint,,,: 81 5,9 A Finite Test for = for Machines Stable near the Cutpoint and Which Have the Balance Property ~,,, 84 5. 10 Conclusions i,, C. C t. r - 87 CHAP'TER 6, STABILITY PROBLE'.MIS FOR AND 89 -' N -8 6,1 Enul.meerat'ion o the Behavioral E.quivalence Class of a chc ine c t 6 6 89 6,2 The General Stability Problem.., (,,, 90 6, 3 Enumerat tion of Stable Machines for -N and - 93 604 Invariant Error s.. atrices -... 97 6S Stabil.ity Transformations Which Preserve the Behavior of Ei. enss t ft es, ~,,:.,. e,,. t o:, t. 6 102 c',;~~~~~

TABLIE'.OF CONTENTS (Concluded) Page 107'....TER 7 ON TtI- STABIIITY PIOBLE.'1::()R TAPE ErQUIVALENCE =T 107 7o1 Introduction 1,,, 7 f,,,. 6, e,,,, 3, 1'07 7,2 Farkas? Lemma and Tape Acceptance for Nonsingular Mat.i-; hin es.... c,..... 6,.. p,.; h A,,.. 1.07 7:. 3 A Necessary and Sufficient Condition for X-stability for Nonsingular Mlachines eb I 0, e, A.,,; 110 744 A Special Class of Stability Transformations for Nonsingular MFachines, o r,:,. 4, ~ 111 7..5 Polar Sets of Vectors t, t e 117 7.6 Tape Acceptance hExpressed in Polar Set Notation,, 118 7.?7 Stability for Certain Machines Which Have Singular Switching latrices 6,,.,,. o, 120 CHAPTER 86 M1INIMAL PROBABILISTIC SEQUENTIAL MACHtINES,, 121 8,1 State Equivalence of Mlachines *, H o, *,. 5 121 8,42 Equivalent States of a Machine.o.. e,, 121 8~3 State and Distribution Reduction of a MHachine 122 864 Reduction in Sequence 6 6 6b, t, 16 6, 6 6 6, 123 8,5 tlinimal Hachines for E N and 128 E' N 61 8 6 Mlinimal Mlachines for -T o d 6 0 el 6 60 130 8,7 The Critical Set of Tapes 6 o A 6 6 6 6 o 6 6 6 A, 132 868 Geometric Interpretation of S+ 6 t 132 8,49 The Critical Distributions of S+ 6 A t 6 6 133 8o10 Geometric Interpretation of State Mierging 6 A 6 e 134 8, 11 Comparison of State Reduction for I, -N and E with T 136 CHAPTER 9o APPL.,ICATIONS TO OPTIMAL CONTROL PROBLEMS 0 e t 138 9O0 Introduction o 6 6 6 6 6 6 o 6 6o, 138 9,1 Input Events..,,,. 6 o, c 6 138 9,62 Input Event Equivalence: E f 6 6 6 A e 139 9,3 Input=Output Events 6 6, 6,, Q A, 6 6 0,, 6 141 9,4 Input-COutput Event Equivalence: IO =' e.. " -, e, 141 9,5 Optimal Control Problems o e,,,, 6 A. r 142 966 Equivalence of Policies,,,,, A,., 147 9.7 Characterization of Equivalent Proper Policies *,,, 154 9t8 Experimentation on the Expanded Machine 6 6 I, el " e, 156 9,9 Optimal Control Probl.ems with Optimal Policies of Sequences of Inputs 6, o 6,, t,6 t, c O, 157 CHAPTER 106. SUItIMARY AND OPEN PROBLEMS 6 6 0 o 6 6 o 6 6 6 6 6 162 10,1 Summary I I c. 6 6 6 6 6, 6 6 6 6 6, 6 b,6 6 162 10,2 Open Problems t e 6 6 6 h,. a,,,,, 6 6 6 164 APPENDIX c t. c, e;. 6 4 6 6 6 6 C 6 6 6 6 6 167 BIBLIOGRAPHY 6.:e.,, o, o o o 6 6 6 6 6 6 ek 6.,6 169 vi

LIST 0)1 TABLES Table PE 3,1 Comparison of Machines A1 and A2 43 3,2 Expectation of A, *..nd A, -. C: fo:.: Strings x of Lngth 2 nh, d 44 5,l A Sequence of Experiments }:ich Shows the Balance Propert;' t Holds for A and A'l A. 80 6,11 Characterization of the Error Mlatrices for -N and "'A A A F FA A F A F A 95 vii

LIST OF ILLUSTRATIONS FiQure Page 141 State Diagram for the Machine of Example 1,2,,. 23 1,2 Al = Z*/RE for Example 1,41 <..,,, i I.,,, 26 201 Input=state Calculable Machine A' which has the Same Expectation and Variance for All Strings as Probabilistic Machine A of Example 1,2 e,,A,?. t r 41 5,1 Transition Diagrams for Machines A and A' which IHave the Balance Property., 6 t e,. o,, 0 t, 79 5,2 Machines of Example 5.2 A, d e, r e e r,, s * 82 7,.1 K(A(x)'tF) of Example 7,1 is the Positive Orthant with the Rectangle Removed... r, A,. 110 7,2 Polar Set Example 6 o I r. 118 8.1 A Coordinate of a Distribution in the Tetrahedron Isomorphic to S+ for n=4..,. o,,, A, 133 8, 2 I)A; X) is the Area Enclosed by the [)otted Iines, iA e the Triangle (l) r (2)-r (3),,, A,, 134 803 The Set of Di)istributions when S2 is Identified with S i, e S (S2)" 0 e r,, r.., < a:, 134 44(S2)A 134 9,1 Relationships among the Machine Equivalences e., e 142 9,2 lThe Constructed Finite Deterministic Machine with Vector Outputs c e A, G C... e r t 6 6, t f. 158 viii

LIST Of: TSYMIBO'L),S AND TERIS x, ^,, La B' Probabilistic sequential machines. Ati) Thie ith machine in a sequence of machines, A -B denotes the machine formed by plugging the output of A into the input of B, Def, 34,1 43. (o- (Sl< -S2)4- #. Sn) state reduction in sequence. Def, 8.4, 124, A O A': the abstract join of machines A and A', Def, 4.4, 58, A behavioral equivalence variable denoting a set of behavioral equivalences fixed in context, Subscripts denote particular behavioral equivalences, _s() State equivalence of machines variable, Def. 8.1, 121, — stable perturba- Def, 6.1, 900 t.on Expectation equivalence. Def, 1 2, 9. N-moment equivalence, Def, 2,5, 30. Distribution equivalenceo Def. 2,2, 28, Tape acceptance equivalence, Def, 3.3, 43, Indistinguishability. Def. 3.9, 49, Where r is an integer or integer valued variable: - with N = r. XIE X-input event equivalence. Def, 9,2, 139. ix

LIST OF SYIBOLS AND TERMS (Continued) Input event equivalence. Def, 9.3, 139, X-input-output equivalence. Def, 9.5, 142.:I0 Input-output equivalence. Def, 9.6, 142,.o S' An equivalence of distributions (or states) (A variable equivalence associated with the behavioral equivalence variable E,) Def. 8.2, 122. A" I Equivalence of distributions (or states) of machine A associated with -E' Def, 4.2, 56. ok ~A Equivalence of distributions (or states) associated with AI, eff 8,2, 122d A- Equivalence of distributions (or states) associated with -,, Def, 8,2, 122, A'A —T.-.....Equivalence of distributions (or states) forciated Def, 8,6, 1302 Yin'1' Equivalence of distributions (or states) for all strings of length K or less for - Def, 3, 56, A(a) Symbol matrix for machine A and input a, Defo 0i1, 2, A(yi/a) Conditional probability transition matrix for irnput symbol a and output symbol yi. Def 3d,7 47. A[n(Si)] Reduced machine in which state Si is replaced by distribution L, Defo 8.3, 123, e1aning depends on context, i.e. denotes a tuple of things as in machine definition: (1) A = lr, I, S, A(0), oo., A(k-l), I, OA> Def, 0.1, 1, (2) The machine A started in initial distribution I: (a variant of (1)): <I, A> Def, 6,6, 103,

LIST OF SYMBOLS AND TERMS (Continued) (3) <V> where V is a set of vectors means the span of V. Def. 1.11, 17, (4) <A(Z*),*> means the monoid of matrices A(x): x c X* with matrix multiplication, 90. Where' is a finite set; the number of elements of 3. Backward deter- Def, 7.2, 107. minism Balance property Def, 5.8, 77. C(A,X) Set of critical tapes for machine A with cutpoint X. Def, 8,7, 132. Cij (k) Cost of transition from state S. to S. caused by input symbol k, 144. Complete set of Def, 4.1, 52. invariants for an equivalence Canonical distri- Def, 5.10, 85, bution pair D(A,X) Set of critical distributions of machin. A with cutpoint,. Def. 8.8, 133. dim(Lj) Dimension of space L.. 6 Separation of isolated cutpoint, Def, 5.1, 66 or parameter in stability near the cutpoint, Def. 5.9, 81, E Expectation in the sense of probability theory, 29. EA(x) Expectation of output of input sequence x of machine A. Def, 1,1, 9. xi

LIST OF' SYM!B3i.JS AND TERS (C.ontinued) E1lA'(:t) tn, dimcnsional identity matrix. rror matrix of input string x, 91. th ei The i' stochastic vector in n-dimensional space in an indexed set of such stochastic vectors, t.A r. Outnut vector of machine A. Def, 0,.1 2.!(k) Column vector such that the ith entry is just the ith1 entry of F raised to the kth power,!Def. 2,3, 28. 0 -The emrpt,, set, h (i), ~ h#(i) Index ner-irn function I)Def. 8.5, 123^ Errcr matrix of input string x (sometimes written E ), 91l H{ An invariant error matrix for symbol ao Def. 6.2, An eigenerror matrix for symbol a. De f~ 6f3, 98., i V t _...t Initial nrobability distribution over the states A nof mac nce AC I Probability of being in state S. initially f.or machine A..i 1ot)(it\X)~ ~enrztInput-ou event for output seouence y?c machine A with cutpoint X, Def, 9,4, 141. Isolated cutnoint Def, 5,1, 66. K(A,P) Where A is an n x n matrix and F an n dimensional column vector is the set of all column vectors f

LIST OF SYMBOLS AND TERP1MS (Continued) suc> that A > F, 109, Kern.(Tr) Where T is a linear transformation is the set of all vecEors a such that a.Tr O. K*(A,F) The r.olar set of K(A,I-), Def, 7,3, 117, kA The value of the Rabin-Paz bound for machine A, 86. Symbol with identity symbol matrix, ie, A(A) E(n), Input semigroup identity: Ax x x x e Z * 4. 9g. (&x) The length of input sequence x, 4..4 (y) Tihe length of i:-pu'mt (or output) sequence y, If real numb::r, cutpoint. Remark 3, 3. (In Chapters 3 and 4 sometimes used as an initial distribution). M1(S,o) Transition function for a deterministic machine M, Mlinority rronerty Def, 5,7, 77, of pairs of machines Monoid A semigroup with identity. n, nA The number of states of machine A. n The independence number of machine A (with output vector F). Def, 45, 60(), nD Number of states of deterministic machine D. Nonsingular machine pg, 108, xiii

LIST OF SYMBOL3SI AND) TERMfS (Continued) 0~ 0A Output function for machine A, Defo 0,l, 2, *(x Tjhe output random variable for input string x in machine A = 0 (rpA(x))o Defo 2o1) 27, OR(S) Random output attached to machine A depending on tA(S) state SO Ai(x) ith central moment of output for input string x of machine A, Defe 2o3, 29c P( ), Probe( ) The probability of the event in the parenthesis. PA(y/x) The probability that output sequence y will be observed given input sequence x in machine A started from initial distribution o DI)ef, 305, 46 and Remark 3279 48d )p P~Permutation of the integers' Depending on contex either (1) A permutation of the integers, (2) An initial distribution of machine A, (3) A policy (Chapter 9)> pgo 145(13), Permutation matrix depending on string x. Theorem 7c3, 11ll p Partition on the set of input sequences j e T(Si) State,S is replaced by distribution r, 1)efe 8,3, 123..*(y/x) Terminal distribution for input sequence x and output sequence y0 Def o 3o8, 41, T(1), r(2), 9 (3) Ex.treme points of Example 8ol, 134o r(1)), wr(3) Projections of n(1), and r(3) 0 Fig. 8.2, 134. xiv

LIST OF SYMBOLS AND TERMS (Continued) q(y/ux,v) The probability of observing output string y from machine A given that input string u produced string v as output and that x is the next input sequence, 141. qS (x,t) Probability that input string x occurs at time t 1 because of policy ir for a machine (specified in context) with initial state Si, 158. R Depending on context either (1) The set of real numbers. (2) An arbitrary relation (usually a right congruence relation) on input strings. R[Xl] The equivalence class of xl in equivalence relation R. Defo 1,4, 11* (Note that A[ rS1#] means the equivalence class of S1# in equivalence relation E RE Reduction relation for -E' Def. 1.3, 11. RN Reduction relation for =N' Def. 2.5, 34. r, R, JR' Negation of the relation R. RR Where S is a set of input strings: the right congruence relation induced b5. iCe. X1RXa?Xl Z E x2Z E ] Vz e *; x1x2 C Z * RS(A,A') The stability congruence relation (right congruence relation induced by S(A,A'). Def, 5,4, 74, rPA(x) Response of machine A to input x; i.e. the state of the machine after input x, Def. 1.6, 13, Relation R refines Def, 15, 12, a set S S(A,A') Stable set of input tapes for machines A and A'o Def, 5,3, 74~ XV

LiST OF SY'A IBOLSAS AND!T TERIMS (Continued) S, SA Set of state vectors of machine Ao Si State S. = (0, 1,o 0, 1, 0, o, 0) S+ Convex closure of the state set, i~,c all probabilIltv distriut ions for art n state machine, )ef 108, 15,O Z: Input al.phabet set of machine A, 20 ra;An arbitrary input symbol,;* oSet of all finite input sequences, Zk (Zk Set of all input sequences of length k, Stability near DIef 599 81e the cutpoint T(A3X) The set of inPUt ta accepted )bv robabilistic seauential machine A with cutpoint X, Remark 3, 3, T (A;X) y-input event, Deft 9.1, 139. T(D) Set of tapes accepted by deterministic machine D, i.e, those tapes which lead to designated final states T(A) = A" Stability transformation of machine A giving macihine Al, 90 (Also written T A —> Al) T (R)!h'hierc R is a righllt congruence relation. the quotient machine % /R, 11l (1), j. j), ) Experiment on the set of machines A (1) A(r) using the input set TO Def0 5Q5, 76 xvi

LIST OF SYFMBOI.,S AND TERMS (Concluded) Tape indistinguisha- (for a class of machines) Def 3,4, 44, bility u The n dimensional column vector of ones,, or a subspace depending on context, V(A) {IA(x): x C* }, the set of distributions accessible by A, 15, v An cigenstate for symbol a, Def0 6e5, 103. W(S4 (S2)) lWedge shaped volume in S+ affected by mapping S2 onto S4. 135, x Arbitrary input string (x E E*), xi () Excted cost of policy l from initial state Si, 1~~!~145 Xi (Ipt) Expected cost at time t of policy T started from state Si, n Y { f i{Pi} Set of out codes for a machine A. Remark 2, 30 i=l 1m1 Y* Set of all finite sequences of symbols from YO (y)r Set of all sequences of Y of length ro Z(A) Set of singular switching matrices of Ao xvii

CHAPTER 0 INTRODUCTION The notion of behavioral equivalence is a fundamental part of the study of automata theory, Two superficially different definitions of behavioral equivalence occur in the literature for deterministic machineso Ones discussed by Burks [1961]1 which we will write EI, calls two machines behaviorally equivalent if they define the same function from input strings to output strings0 The other, part of Rabin-Scott [1959] automata theory, which we will write =T' calls two machines behaviorally equivalent if they accept the same set of tapes, For any deterministic machines D and DI with the same input alphabets and binary outputs, D E DI holds if and only if D =T D holds. However, for the generalizations of =I (Carlyle [1961]) and =T (Rabin [1964]) to probabilistic machines A and Al, we observe that A -T does not imply A - A9 This thesis is concerned with properties of several behavioral equivalences between probabilistic machines, In order to gain insight into the kinds of equivalences which will be studied, two models of probabilistic sequential machines will be presented later in this chapter0 0,1 THE CONCEPT OF PROBABILISTIC SEQUENTIAL MACHINE By a probabilistic sequential machine is meant a system which satisfies one of the following definitions: Definition 0ol~ A (Moore-type) probabilistic sequential machine A is a system A e= n, I, So, Z A(A) a a E, Fs 0> where n: a natural numbers the number of states

I' an n-d imensional stochastic vector, the initi.al distribution S, set of state vectors = {S1 = (1, 0,.,., 0),,.,) S - (0,,, 0, 1)) asyla1ihabet set,'Jsuall = { 0, 1, 2, k i) A(c): n x n switchirno matrix for irnrut smbol:>l c A(r, ) is the nrobabilitv of a transition rrom state P to state rm via; smrbol 7. We freque7n~4tly call' \:(a) a "syrmbol matrix",.:: out-Tiut vector, a r-dlimensional] colrmn vector whose entries are real lIumlbers.. is the outnut u rcm state S.. 0): sutut f4r;Ycti on ((Si) S. When cl5ar from context, some of t}ho Farts. of th-e fcrmal definirtion may be omitted for the definition of a particular machine~')e finit on t A (M.'ealv-type) procbabljilistic sequential mach ine, A = (l, S Z,'\(C ) c c,, s T where nr, I, S, E, A(a): Fs E. are as in Definition 0,1 and where the cu.ttt f:tncet orn. sv fie;:'(. S ) S, 7. c r ( i j 1 );., i i It is an easy matter to sl1hoi tho.tl Dfefiriitor (f, and I)efinition 0,2 are equivalent in the fol bo;.'ing sensce:? or every "'ooret-type prob'abilistic sequCOential m achifne- t jvre is u'eaitv-pe secuentiA ml machine wlhii ch is indisti il.is'1ahle) i.rn an intuitive.enlse and vice-vets-a. Consequently. w0e will e Coneorncd o0;i-. w ith the rn rtics of "oorc-ty.'e prob)ailistic s e.,nt acii. nc, i which fromr::. on 1.ill be cal led "vlnroba'bili.stic seuentrial i nIac.iri;I'T or just "mac)'i-.cs, There seem to be nanon:n, sstems s.iA1:1c "robabilistic secuential machines in fieltds of study rnol h}.st oricall1 a sc.izitcd wvith autonata theory. Brains and Svechtinshy discuss a syst:n li;e I)efiniticn 0,1 in their paper "M'Iatrix Structure in Smulati: of l.arning" [1962j If one takes tlhle cartesian product of nach!ines of')e:rinition 0,2, onc gets.I..ar:ov processes with rey;ards and alternatives as studied in sequential decisio.n

theory as presented by Howard [1960]1 Matrix games as discussed by Thrall [1957] can be considered as instances of Definition 04l in which I and P are strategy vectors and game matrix A(x) is defined by a string x, A simple correspondence shows that t.he noisy discrete channel of Shannon [1948] is equivalent to the system of Definition 0,20 Someday probabilistic sequential machines may become a unifying concept, organizing and providing results for diverse fieldso Probabilistic sequential machines were devised as slight generalizations of the notion of Rabin [1964] of probabilistic automata, If one restricts I to elements of S and Fi = 0 or 1 for i - 1, 2, *oo, n then Definition 0,1 defines probabilistic automata, Following Rabin, we describe how the transition matrices for sequences of inputs are generated by the symbol matrices, Remark 1. Let x = il i ij, j 19 0oo ro Then A(x) = A(il) ooA(i ) ioe,, the switching matrix for a string x is found by multiplying the matrices for the symbols of x together in order, Remark 20 Sometimes the real numbers which are the outputs of a probabile istic sequential machine will be regarded as codes for symbolic outputs, Remark 3; The exnectation of o for input string x of machine A is just EA(x) = IA(x)F, For any real number XA, the set of taes accted L.A:with cutont X T(A,) - {x C'*k F (x) > X} 0o2 NOTATIONAL CONVENTIONS In what followvi it -,i,,ii:- c c',:s'-,,!*:. ent to use certain notational conventions WVith regard to sibscripts, note. t+hat s;tate Si is identified 1

with the vector 0, o, 19 nS ooo 0), We use A(x)i to mean the i-th row of the matrix A(x), U, tally A(x) will mean the i-th power of the matrix A(x>) On rare. casions when clear in contexts A(x) may mean the i-th column of A(x).'The following notational identities are used frequent lv S A -A KF A(x)i F = (A(x) F)i When necessary9, identifying superscripts will be added to basic symbols9 eg, F is the output vector of machine Bo If r is an alphabet9 all sequences of length r are denoted by rr Following the conventions of automata theory, given an alpahbet r, Fr denotes the set of all finite sequences (or strings or tapes as they are often called) of symbols from r% It will always be assumed that r* contains an identity string Ar so that Arx = x x C r* Furthermore, we will assume that the input alphabet set ~ contains a symbol A such that A(A) = E(n) o the n-dimensional identity matrix, We do not require that Ax x although A(Ax) = A(x), The length of a string x is the length of the sequence of symbols which it denotes and will be written Ig, (x) Contrary to normal conventions we regard A as having length 1 1g. (A) = 1 whereas in general gAr) = O0 Concatenation between strings will be indicated by juxtaposition, Multiplication of matrices will be indicated by either juxtaposition or other customary notation. Hence if x and y are strings lg,,(xy) lgo(x) + lg.(y) The exponential notation on strings will be used to indicate repetition n t ime. itoe x = x so that lgc(xn) = n1lgO(x)o An abstract machine

(a machinmc wit har nto nial. Siatpt seiied)t ti.ll. ie indirated by leavinr a blank in the de'inition of the machine eo~?.,-<, A(%.) 2... 0> 0,3 MIODEL[S 0)1;: PR()BABII.iSTIC SEQUE}JNTIAL MAFI H I INE.1 LT~.o models will be considered, one of which is probabilistic iA-dl one of which is deterministic, although both fall within the axiomatic -framework of probabilistic seouential mach. i nl Exalmdlc i ). Probabilistic internal operation~ A slot-machi ne, A simple model of a probabilistic secuential machinTe is a s lotmachine.. The static nosition of the dials represents the presernt state of the machine, Usually there are 20 different positions oan r.h dial and 3 diaIs for a total of 8 000 states, The input consists of putting in. coin and pulling a lever, causing the machine to travel transiently through many states until it settles down in one state~ An output is associated with each stateo Nothing (which is associated with 0) comes out unless the dials all display the same object. In that casceh somnic change tumbles ouwt (which is associated with the corresponding real number) usually dependent only on the kind:.:,f object bleing displayed, i ei,! the state of the machineo Such a machine whose outiut is controlled by its statres'is kno'[n as a "Mloore machine" [1956]o Each state can be associated with a number between I and 8 000, and the output for each s:t~atec can be tatbulatecd in a column vector or 8,000 x 1 matrix~ IIn the formalisml this collumn vector will be called the "output vector" aznd designated by tihesymbol "F", The output for state i will be written as 1, Thle enormous number of distinct ways the lever can be pulled are pre

vented rramon signilfL:antly inflrencing the outcome by spring loading. lHence a normal pull of the lever L produces only one kind of state transit tion law which. could in principle be determined and tabulated in a "switching" matrix A(LW, The behavior of a slot machine A could be described using a finite state markov chain with rewards and transition matrix A(L) but for the fat. -that various nonstandard but repeatable inputs have been developed by players of such machineso A more complete description requires some finite number of additional transition laws associated with the nonstandlard inputs to the machine, We associate such inputs with additional input symbt)ols, Consider howa the dials of the machine might be found initially, If the dials can be completely observed, the initial state S. is represented by a vector I (or a I x 8,000 matrix) with a 1 in the i'th component and zeroes elsewhere, On the other hand, the dials may not be completely visible, we may wish to specify the average behavior of a large number of machines run simultaneously, or we may wish to consider the average return from playing one machine only when it is left by other players in one of a set of preferred states, In any one of these cases, I can be a stochastic vector (I1~,,., I8 000) where I. is the probability of being in state S. at time tol In the general case, the next state probabilities starting with an initial state vector I and an input string x are given by I*A(x). Hence the expected value of output of a machine A starting with initial state distribution I and output vector F after a string x of inputs has occurred is just In(xX ) IoA(x) F which is a b)ilinlear form in I and F with form matrix A(x)o The variance in output and other higher moments can be defined analogously~

Example 0,2 Deterministic internal structure~ Chemical production cell. Suppose a chemical tank A is divided into several isolated Compartments A1, of An by partitions which are interconnected by an electronically controlled system of pumps and valves0 Suppose that there is a finite set of controls Z m {09 19 o o0 K=l} and that for each control c a fixed fraction of the chemical in compartment Ai, v.C i is pumped into compart= ment Ajo For all controls c in E, the full influence on redistribution j of liquid in the tank can be described in a n x n matrix A(c) with vC. being A(c)ijo Furthermore, suppose that the liquid being pumped between compartments is a catalyst which causes production of a desired end product in each compartment with a different efficiercy ie,,o if the mass fraction of catalyst in Ao is P, and Fo is the efficiency of Ai9 then the output of end product is POFo Note that it is assumed that the output of the compartment depends linearly on the catalyst present, The initial state I is an n comsponent vector with the i9thi component 1. being the mass fraction of catalyst in compartment i0 Note > I = 1 since the tank is a closed system as far as the catalyst is concerned, The distribution of mass fractions of catalyst over the compartments after a sequence of controls x = il' oim is just IoA(il)oo0ooA(im): ICA(x) That is, (IoA(X)i is the mass fraction of catalyst in compartment i after starting with initial d'stribution I of catalyst fractions over compart= ments and the string of control inputs x 1 ilo i The total end product from the tank is the sum of the outputs from each compartment: ~ (IoA (x))iFi which can be written IoA(x) oF in matrix i=l notationr This expression has the same form as the expectation of output

for the probabili:tic slot-machine, but there are no overt probabilities involved here,'Ti"ie mass fractions of catalyst play the same role as the probabilities in the first example, However, the output will still be written like an expectation as EA(X)W The total end product accumulated, Tx, for the string of controls x from time t0 to time t0 + m is given by adding the output from each substring i,,ee. T= a EA(i ) + EA(ili2) + u + LE(il2 m)

CHAPTER 1 DETERMINING WHETHER A PROBABILISTIC SEQUENTIAL MACHINE IS EXPECTATION EQUIVALENT TO A FINITE DETERMlINISTIC ASCHINE 1,1 T'HE CONCEPT OF EXPECTATION EQUIVALENCE In the two models discussed in the introduction, the expected value of output, EA(x), played an important role in the physical interpretations, Let us repeat the definition of the expected value of output. Definition 1.i, The exected value of output for an input string x of a probabilistic sequential machine A is given by EA(x) = I~A(x) F for x in Z* Definition 1o2, Machines A and Al are expectation equivalent, written A F EA4x) A- EA, (x) rfor a1 x in Z* Recall from Example 0,,2 that EA(x) was the actual output of the chemical. cell and not an expectation, Hlence the basic concept of expectation ec!.uiv.l]ence is analogous to the definition of beha:i..oral equivalence -: ~fo~ Example 0,2. However for Exlmiple 0 1, the slot.machine, expectation equivalence is not the genera lizat.,n of this kind of behavioral equivalence, Insteado the concept of indistinguishability discussed in Chapter 3 seems to be the appropriate generalizat ion E~xa?p1e 1..! Machines A and A9 which are expectation equivalents IA(x)F = IQA (x)F9 Vx c Z*

10 A <I. A,(0) A(1), F) and A (I;<I \(0)9 Al(1), F9> A(O) /1 0 0 A(1) - /3/5 1/5 1/5 /12 1/4 1/4)i 1/5 4/5 0!/4 0 3/4 0 /4/5 1/5 0 A (0) / 0 0 t A(1) = 7/10 0 3/10' t5/8 0 3/8 3/5 0 2/5 ) s0 1/2 1/2/ k9/l0 0 1/10 F F1 s 3/ These machines are expectation equivalent from any initial probability distribution9 I. over the states. The previous example shows that two machines can have very different switching matrices and still be expectation equivalent, Some graph theoretic properties of the transition matrices which are important to markov theory, such as the accessibility of a state, depend on the location of the zeroes, This example shows that the location of the zeroes is not the only relevant factor in the study of expectation equivalence, Consideration of the interplay between the state transitions and the real number outputs attached to states requires the use of elementary linear algebra, 1,2 THE REDUCTION RELATION RE In this section a congruence relation on input sequences9 RED will be defined so that a quotient machine can be constructed, If the rank of RE happens to be finite9 the constructed machine has a finite number of states, States of the quotient machine will correspond to values of expect tation which occur for input stringso By attaching a deterministic output device to each state of the constructed machine9 an expectation equivalent

deterministic machine is obtained. If the rank of RE is finite, some class of the relation must contain infinitely many strings. A necessary condition for RE to be finite in rank is that it be non-trivial, 1ie, at least two different strings are contained in some class, This weak necessary condition requires the symbol matrices to satisfy certain strong conditions, Definition 1,30 The reduction relation R is given by x RE Y iff EA(X) = EA(y) & EA(xz) = EA(Yz) Vz E * VI E S If E* contains AD a semigroup identity, the definition reduces to x RE y iff EA(xz) = EA(yz) V- C *, VI c S, RE is a right congruence relation on E* because of the reflexivity, transitivity and symmetry of S= and the substitution property in its definition. It follows that strings x and y which are in the same class of the relation RE will have equal expectations from any initial state of the machine and will continue to have equal expectations for any finite input continuation zo As far as expectation of output is concerned, the behavior of the machine A is the same after either string x or string y, 1,3 CONSTRUCTION OF THE QUOTIENT MACHINE Definition 14: The equivalence class of x' of R, an equivalence relation, is given by R[x9] = {x: x R x'} It is a well known result from Rabin and Scott [1959] that given a right congruence relation R on V*, one can construct a quotient automaton

12 with no output T(R) T(R) = <a, S, M> where a = R[A] S = {R[x]: x c s*} M is a function from S x ~ into S such that MCR[x] a) = R[xa] x e:*; a e z Definition lo5 Let BC ~*o A congruence R refines S if x R y -- x C,B iff y e S Theorem 141 Rabin and Scott [1959] Let S be a subset of E*0 B is the behavior of a finite (deterministic) automaton A = <T(R) 9C> over E where' = {R[x]; x E }1 iff there exists a right congruence relation R of finite rank which refines aB Theorem 1,2 If the congruence relation RE has finite rank, then for any X there is a finite deterministic automaton A' such that the tapes accepted by Al are T(AXa)o Proofo Let B = T(A,X) = {x o EA(x) > A}o Note that RE refines a ioeo x RE y = x E T(A9A) iff y E T(A,X), If RE has finite rank, by definition RE[x] has a finite number of members. Using Theorem 101 we construct T(RE) = <a, S9 M> and A9 = (a, S, Mj> which accepts T(Ak) Qo Eo Do

13 1,4 CONSTRUCTION OF AN EXPECTATION EQUIVALENT FINITE DETERMINISTIC MIACHINE The quotient machine construction will be used to obtain a sufficient condition for the reduct;ion of a probabilistic sequential machine into an expectation equivalent finite deterministic machine whose output function is either a constant C(s) for each state s or a random device R R A (s) with expectation E(OA(S)), C(s)6 A Definition 6. _ rpA(x) is the response of A to in str x. If A is deterministic rpA(x) is the state of A after an input of x, If A is probabilistic g rpA(x) is a random variable taking on values which are states with distribution IoA(x>) Theorem 1,3 The reduction relation RE defined by a probabilistic machine A has finite rank if and only if there exists a finite deterministic machine Al with a deterministic output OAr such that OA,(rPA, (x)) = EA(x) Vx c Z* Proof (sufficiency): By Theorem 11o let Al = <a, S, M~ 4 j where f is the empty set, Note any congruence R refines % vacuously0 We attach an output function 0A9 to elements of So o^, (s) = E A(x) s = RE[x] For a deterministic machine, M is extended to M* which operates on strings rather than symbols by M'(soa) = M(sca) s C S a E M* (s ax) M* * (M*(s, 9)o ) X c E* We note that M*(a,x) = rPA,(x) so we need to show only that rpAI(x) = RE[X]> Let x = i1li20 6m for i. E Z; j = 1, 2, o,60 md rPA9 (x) = M*(Alx) = H**( *( Ca,il) i2 o lim) =s fl"lll R_: <'" i >

= MI*(M(RE[A]i )iioi) = M*(RE[Ail] i2 doim) = M*(C(REtil],i2),i3 o i m) RE1 ii2 ooi2] = RE[x] Hence the constructed sequential machine is A' = <a, S, M, OAt> (necessity) Given A' (a, S MN OAl> such that OA (.rPA (x)) = EA(x) Vx E* OA9(rPA9(xz)) = EA(CxZ) Vz * Let rpA9(x) - Sx x E Z* Define S R0 S iff x RE y Let n' be the cardinality of S - finite~ rank R0 = rank RE rank Ro < n' Hence rank RE is finite. Q, Et Dh Cotollar 1,a The reduction relation RE defined by a probabilistic machine A has finite rank 4= there exists a finite deterministic machine A' such that A -E A Proof: The machine AO of Theorem 1,3 meets the condition of the corollary since EA (X) = OA(rP A (x))= EA() Vx E * Q6 Ee D0 Instead of the deterministic function OA~, a random device OA,(s) such

that E(OR {s)) = EA(x) could have been used in the constructions, 1,5 THE PARTITION OF THE SET OF ACCESSIBLE STATE DISTRIBUTIONS INDUCED BY RE Definition 1o7, V(A) = {IA(x)l x c E*} - the set of all stochastic vectors which can occur as distributions over the states of A, We some= times call V(A) the "state vectors accessible in A"l, Definition 1L8~ A set of vectors V =- tvl v2 } is convex if for any finite set of indices J, real numbers c. > 0 j e J and c. = 1 c.v. ~ VG The convex closure of a set of vectors Vo jcJ J jEjJ J written V = {v'' vo = Z c j.V. Cj - 1 Cj > 0O and v. E V}. It is clear that V(A) C S+, Theorem 1, 4 If RE has finite rank r, there exists a partition TI = (nl1r,,. - D ) on V(A) and an integer valued function g(Ltm) such that riiLA() c o) i = 1,,op r; aF E 1~ g(i~a) Proof, RE induces an equivalence on the set of stochastic vectors E accessible by the machine, Since RE has finite rank, form a set of an arbitrary distinct rep= resentative from each congruence class, say x1,,, o xr where xi Z x. i = 1 2.,;,o ro j < i Define II. {IA(x)} x c RE[XjJ We show that (Tllp 0o rIr) is a partition of V(A) Let W i i= IA(xg) e W t IA(x~) c V(A)

16 IA(x') c V(A). xv RE[Xk] for some k = I, 0 r -— IA(x') Ilk for some k = 1, ~, r IA(x') C W Hence n tW r ril. V(A) FI'1 We show ni n rj = $ i # j suppose that IA(y) Er. ri nI. IA(y) i'Ii = y c RE[Xi] -- y RE xi IA(y) e rl. = y E RE[Xj] YRE x. Hence we get y RE xi x' xi RE y by symmetry and transitivity of RE gives xi RE Xj xi c RE[Xj] But x. ard x. are representatives and there is only one representative from each class x, =x. ipj 1 J which is a contradiction, Finally we show there exists an integer valued function g(i'o) such that n A~a) c_ n o e 2 1- A(ia) a C v1 E 7I I V1 IA(wl) for some w1 E Z* v1A(o) = IA(wl)A(o) = IA(wla) c I-. for some j as has been shown above. v2 E Fri~ v;;;d 2 = IA(w2) for some w2 e Z* v2A(o) = IA(w2a) ~ lq. J

17 since elements of RE have the substitution property ieo w R x i-= W a R xia a E 1 E i 1 E 1 w2 RE xi = 2 RF xa x = C xpo is an element of a class with representative xj for some j and depends only on xi an o,, So there is a function g(%tm) such that g(i D)- C j 3a Ce Q. E6 Dr 1,6 NECESSARY AND SUFFICIENT CONDITIONS THAT STRINGS BE IN THE SAME P CLASS The relation RE has occupied an important place in the development of this theory, The structure of the matrices of strings which are in the same RE class will now be studied~ Definition 1,.9 A relation R is non-trivial if there exist x and y in the domain of R with x # y such that x R yo Definition 110c The kernel of F -- Kern (F) I V CE Rn: vio = 0} where R is the set of' reals~ Definition 1ollo The sIan of a set of vectors {V1' 0oo vr) is denoted by <(vl~ oo vr}>. ctvi ci R} Theorem 1,5 A necessr-;y and sufficiont condition for x and y to be in the same class of the reduction relation RE is~ x RE y~. there exists a subspace U of Kern~ (F) such that

(i) UA(z) C Kern. (F) viz P Z* (ii) A(x) = A(y) +I with u c U i e 1 n \Un Proof: x RE y = IA(xz)F = IA(yz)lF VI C S Vz C* hence A(.x)F = A(y)F since S: { (1O O) o.,0 O))9,,6 (0, oo O 1),} and A c E* By elementary linear algebra the solution of the above is a particular solution and a kernel, A(x) - A(y) + where h.i Kern,(F) i = 1) 2, oo n multiplying by A(z) A(x)A(z) - A(y)A(z) + 6 A(z) Vz ~ z* h hI A(xz) = A(yz) + A(z) Multiplying by an arbitrary distribution I and output vector F IA(xz)F = IA(yz)F + Io |' A(z)F I c S+

19 But since x and y are in RE IA(xz)F = IA(yz)P Vz c ~* I C S+ Ience ihl L Iet A( where hi c UC Kern (F), i = 1, 2, n0, n. &el; it-I h il i A(x) = A(y) + H (1) Multiplying by I on the left and F on the right gives h F IA(x) F P IA(y)F + I; I C S+ ht F but h F = 0 since h. e Kern, (F) i = o o, n IA(x)F = IA (y) F using (1) again and the same argument gives A(xz) = A(yz) + HA(z) IA(xz)P = IA(yz)F + IHA(z)F = IA(yz)F u1 since tHA(z) =, where ui e Kerno (F), i = 1B 2, o,, n o 1 Qo E, Do Part (i) of Theorem 1,5 will be restricted to the finite class of symbols rather than the unbounded class of strings0

20 Theorem 1, 6 Let U < U {A(x) iA(y)i}; i = 1l - n for x y such that x RE Y> X Pe 1 then U-A(z) C Kern, (F)} ["IV a subspace of R': (i) UA(ot C V: Vao e (ii) VA(ao C V C Kern, (F) ta C E] Proof, UA(z) C Kern, (F) Let V = <{uoA(z); u E U, z C ~*}> VA(o) {= uA(z)A(ao) u c U z C E*} - V Consider an arbitrary v c V. There must be some set of indexes J and constants c. such that. v - ~ c.u.A(zj) by definition of V, jcJ J J J voF c.u.A(z) F J J j e) = fhI cj(uj.A(z.) jcJ J But ujA(zj)F = 0 by UA(z) C Kern. (F) so v-fF = 0 Hence V C Kern, (F) UA(z) C V already shown ~ UA(z) C Kern (F) Q. E. D.

21 1, 7 INVARIANT StUBSPACE. CHARACTE.RIZATION OP RE Definition,112. A subspace V is invariant under a set of linear trans= formations {Ti I i = 1, 2, f V Toi. C V i 1, 2.,... m Theorems 1,5 and 1,6 yield the following directly: Theorem 1.7 Strings x and y are in the same class of RE if and only if there exists a subspace V of Kern, (F) such that (0) V is invariant under {A(a); Vo C } (ii) A(x) - A(y)+-I where H.i z V i = 1,,,o, n 18 NECESSARY AND SUFFICIENT CONDITIONS THAT RE BE NON-TRIVIAL A very weak necessary condition that RE have finite rank is that it is at least non-trivialO, PFrom'Theorem 1,7 it is immediate thatCorollary 1,8 The reduction relation RE is non-trivial 4z= there exists a subspace V of Kern., (F) such that (i) V is invariant under {A C); n0 ( Z} (ii) A (x) A(y)1+H where I.. V' - 10.c.. n (L~ ) x P y Hence we now know that given strings x and y in the same class of RE t~he diffe..rence between the rows of the matrices A(x) and A(y) must be elements of a subs.ace V which has special properties, Namely V must be invariant under al s#rmbol matrces and contained in the kernel of the output vector0

22 Theorem 1,9 A necessary condition that RE be nontrivial is that A(a) Va c Z be reducible for the same change of basis, In other words, there exists a subspace V and a linear transformation W of the state vectors S to a basis for V such that basis for V A O0 4 1 W1 A(a)W = A AA Where 0 denotes a submatrix of zeroes and A1, A2, and A3 are submatrices 10 2' 3 which for all a in Z have the same number of columns and rowso Proof: By Theorem 1.7 and standard matrix theory [see Jacobson, Lectures in Abstract Algebra, V. II: Linear Algebra, Van Nostrand, New York, 1952, pp. 116-117]. Theorem 1.9 gives us a strong matrix reformulation of the statement that RE be nontrivial0 xampe_ 1.2. We construct an expectation equivalent finite deterministic machine from a probabilistic sequential machine A illustrating Theorem 1,3, Corollary 1,3 and Theorem 1.o7 A. <I, ACO), A(1), F> where I - (8/10, 1/10, 1/10, O, 0, Q) O 1 0 0 0 0 0 1 0 0 0 0 A.() 0 0 1/2 0 1/2 0 / 0 0 3/4 0 1/4 0 1 0( O 0 0 1/ \

/ 0( 1/8 0 7/8 0 o o / o o / a o 0 0 0 1 0 0 A) (0 0 4/8 0 4/8 0 o o 3/8 0 5s/ o 0 0 2/8 0 6/s 0 1 0 0 0 0 0 The state diagram for A is shown in Figure 1L: 1 1/2(0), 7 /8(1) 0o'4' 5...... t. 5 0~ 1/8(1) ~/8() 1/4(1 \/4(0) 4 V 2 1/4(0), 3/4(1) Figure 1.1 State Diagram for the Machine of Example 1l.2d The following labeling conventions are'ased:e p(Kj: p [0,1] K e Z means rrobability of tralsition of p via syjmbol K,.i. - T Output of F occurs when the machine is in state 9. p1(K 1) I1(K), 1 2(K2) CG..._<_ >O is replaced by 0O.... ~ (' P2(K2) It will now be demonstrated that for machine A )OCR 0

24 A(00) /0 1 0 0 0 0 0 1 0 0 0 0 (0 1 0 0 0 0 0 1 0 0 0 0 O 1 0 ( O O \ 0 0 O O O 0 o o 1/2 o 1/,2 o0 0 0 1/2 0 1/2 0 o0 ) 0 0 0 1 o 0 0 0 0 1 0 0 3/4 0 1/4 0 0 3/4 0 1/4 0 o O o O O 1/ o0 0 o o 1 /0 1 0 0 0 0 \ 0 1 0 0 0 0 o o 5/8 0 3/8 0 0 0 0 0 0 1 O 0 9/16 0 7/16 0 \o 0 0 0 0 1 which gives 0 0 () 0 0 0 1 5 (A(00) A(0))F 1/8 0 4/8 (0,0,0,0,0,0) O 0 -3/16 0 3/16 0 1 0 0 0 0 0 2 Hence A(00)F = A(O)F or IA(00)F = IA(0)F for all I, Furthermore, for all P c [0,1] (0 0, P, 09 1-P, O)A(0O) = (0, 0, P, 0, l-P, 0) (0, 0, P. 0, l-P, )A(1l) = (0, 0, P, 0, l-P, 0) that is W' {(o, o, P, O, -P, ) }> is invariant under the symbol matrices A(O) and A(1), V < (O{(0 0, P, 0, -P, 0)}> C W and VA(O) = V VA(1) = V By Theorem 1d7 we know OOR. E0 but let us verify this fact. For z c Z* 0 0 0 0 0 0 (A(00) - A(O))A(z) = C z +1/8 0 -/8 0 D 0 -3/16 0 3/16 0 (O O C) Q O O

where C is a constant depending on the string z and (A(00) - A(O))A(z)F = DF = 0 consequently Vz; Z* VI E S* IA(00)A(z)F - IA(O)A\(z)F or EA(OOz) a EA(Oz), which shows OOREO. By the same method one can show that 10 RE 1 011 RE 11 01011 RE 11 111 RE 11 01010 RE 0 so all strings are in the classes RE(A), R E[O] R E[i], RE[11], RE[O1], RE[010], RE[0101] which means that RE has finite rank, Following Theorem 1,3, we compute the expectations and construct the expectation equivalent deterministic machine A'l Note that the values of expectation depend on the initial state I, EA(A1) = A(A)F P IF - 8,6 EA(0) - (0, 9/10, 1/20, 0, 1/20, O,)P =- 4 6 EA(1) = (0, 0, 3/20. 2/20, 15/20, 0)F = 1,1 EA(01) (O, 0, 3/80, 72/80, 5/80, 0)F = 1,9 EA(1)) - (,y 0. 3/20, 0, 15/20, 2/2C)T = 1,1 = EA(1) (since 10RE1) EA(11) (=, 9/40p 0, 31/40, 0)F = 1,0 A (010) r9 EA(t0101) =- 9,1

26 The expectation equivalent deterministic machine of Corollary 13 is shown in Figure 1,2, 0 1 -: 10 -)- 110 Figure 102 As -.RE for Example 1,1 We note that Al has 7 states while A has just 6 states, The deterministic cycle 0101 appears in both machineso

CHAPTER 2 DETERMINING WHETHER A PROBABILISTIC SEQUENTIAL MACHINE IS N-MOMENT EQUIVALENT TO AN INPUT-STATE CALCULABLE MACHINE 2,O INTRODUCTION In this chapter the concept of expectation equivalence is generalized to N-moment equivalence, A congruence relation RN is defined which partitions the set of input strings into classes. All members of a particular class produce the same expectation and first N-1 central moments for the machine defining RNd If RN has finite rank, a finite quotient machine can be constructed which is deterministic with each state corresponding to a congruence class. Each state can be connected to a random device having the same expectation and N-1 moments as the class represented by the state, giving a deterministic machine with random outputs' The constructed input-state calculable machine is N-moment equivalent to the probabilistic machine, After the first theorem concerning a necessary and sufficient condil tion that two strings be in the same RN class, a simple substitution gives generalizations of some results of Chapter 1, Hence the generalizations are presented in this chapter without proofso 2,l DISTRIBUTION EQUIVALENCE: =D The random variable structure of probabilistic sequential machines will be investigated in this section, Definition 201: 0 (x): the output random variable of the machine A -~ cA - - after a string x has occurred as input0 Using Definition 1.6 we note that OA(x) = O(rPA(x))

28 Definition 2,20 A and A. are distribution equialent, written A A' if for JA { (IA(x)Fj O} there is a 1-1 map h between JA, and JA such that IA(x)h(j) = IA(x)j j E JA x hO ~j) i JAl Distritution equivalence corresponds to the conventional definition of equivalence for discrete random variables except for random variables F. # Fj for i # ja Referring back to Examnle 0Q2, two chemical cells are distribution equivalent if (1) We neglect those partitioned areas which have either zero efficiency or a zero fraction of the catalyst. (2) Of the remaining partitioned areas there is a correspondence between the partitioned areas of one cell and the other such that corresponding areas have the same fraction of catalyst regardless of the sequence of controls entering the cellst (3) Corresponding partioned areas have the same efficiencies, 2,2 MOMENTS OF TIE OUTPUTI RANDOMQ VARIABLE [efinition 2o3;, Let p ('1Fi c R i = 1, 2, p n ni/ call Fk I:.

29 Then the:iJth central moment of 0* (x) is A L["[ (O*jx) t(x) ) i] = 2, 3,* Theorem 2, 1 7i (X ) / g:) (4) TIA x (x) )EA(x) i = 2, 3, #., k-O Proof: By the binominal theorem A,,)eEC I1 k *xi-k k x'(.) = E[>E (l) (xiA k=0 To compute the exnectation of the discrete random variable O(x)i-k note that it has the same distribution as OA(x) but takes on values i=p'k for i f k i-l AA f[0*(x) E li4k k )E (x)i ( k-i i!k k i =: N. ( (k).k IA(x) (F )EA(x) + (1) A() k=0 Qo E. DI 2,3 SPECIAL PROPERTIES OF RABIN PROBABILISTIC AUTOMATA Definition 2,4; A Rabin probabilistic automaton [1964] is a probabilistic sequential machine such that I c S and F1 = 0 or Fi = 1 i = 1, 2,,,, n. Rabin probabilistic automata have rather special features as far as the random variable of the outnut is concernedO Corilarv 24.: For a Rabin probabilistic automaton A!~~l~m- r(g!____~~ ~.~UV

30 x 2 > )h(x l (1lkj'EA(X) 2 3' k k0 Proof, F. -0 or 1 hence 2(F>'=k) s. fo r i# k and the result from Theorem 2o1, Corollary 2.,2: If EA(X) - E (y) for some Rabin probabilistic automaton A A (x) (y) for i = 2 3 Note- for i -- 2 we gt the variances of the outputs are equal, Corollary 2.,3:; If two Rabin probabilistic automaton A and Al are expecta= t ion equivalent then a )) ~.... (x'i 2, 39 VX z* 2,.4 THE CONCEPT OF N-MOMEtNT EOUIVATLENCE -N Even if two machines are expectation equivalent, the statistics of their behavSost nay be so d.fferent that for many purposes we would not want to consider the machines behaviorally equivalent, Returning to Examplh 01 1, two sl!ot-machines can be expectation equivalent, mneaning that tht aveLtage P1Yo.f is the same for both, but one can be much more desirable than the other for a player of limited resources, For a player with limrSted resources might have a far longer average time until "gambler's ruin"' on one machSine 1'-hn tvhe other, 1en1ce- in order to associate mac-hines in i:he.-ae c.t- ss:,ho:c st, tistics of behavior are soinewhat alike, the notion of N-moment equivalence will be introduced, Definition 205. Probabilistic sequential machines A and A" are N-moment

31 eo uival ent wI e it A A A ifl Llx)' X) i - 2, N for all x in. E* Examle 21. Probabilistic sequential machines A and A' such that A N for any initial distribution I, i e, EA(x) EAv(x) and A A'i'(X) A( Vx ( X* i: 2i 3, I, I S+ A(0) / 1 0 0 A(1) / 3/5 1/5 1/5 1/2 1/4 1/4 1/5 4/5 0 1/4 0 3/4) 4/5 1/5 0 A (0) 0 0 1 A (1) = 4/5 1/5 0 0 1/4 3/4 0 4/5 1/5 o 0 1 4/5 1/5 0 For both machines. F for F1, F2 arbitrary real numbers, 2,5 THE RELATIONSHIP BETWEEN. AN) A N Theorem 2,2 For prolbabilistic sequential machines A and A~ A n A A -N A for all finite N D1) Proof, Distribution equivalence means there exists an h such that''h (i) 1 i twhen (I Ao (x))iFi $ 0

32 Hence n n 7e (IA(x))h( )h(.i) E (I'A'(x))iFi or EA(X). EA (x) which is expectation equivalence, For any finite N FN (Fl)N h(i) {F) The fact that A At uN(x) = PN(X) comes from inspection of Theorem 2,1, Symbolically, we have shown A ED Aq > A N Al for any N0 How close one can come to a converse to Theorem 2,2 depends on the form of the entries of F, Lemma 2,1 (Gantmacher [1959]) Given a sequence So S1,.. of real numbers S, if one determines positive numbers >1 0 r2 > 0' rn > 0 00 > V > V V > O0 such that the following equations hold S rjV (p 1 2 (*) then the solution to (*) is unique, Wte can apply the lemma to get the following partial converse, Theorem 2,.3 If machinc. A and A~ meet the following requirements (Letting hs i w f WI e1 e * jF)

33 (i (IA(x))F 0 iff (I'A'(x)) iF = O i = 1, 2, o.d, n. (ii) All states in a given machine have distinct output symbols, (iii) EA(X) = EA (x) Vx ~ ~* A' Ai(X) ) pi (x) i A 2, 3, oo Then A, and AM are distribution equivalent. Proo f, We use Lemma 21.l Since the central moments and expected values of output are equal for any stri.ny, tire moments of 0(x) and OA (x) about zero are equal for any stringV S0 i= [(IA(x))i such that F 0] SI = EA(x) = EA, (x) A 2 A' 2 52 =.(x) + EA(x) = ~2 (x) + EA,(x) We discard those components whose contribution to the moment is zero and relabel thie non-zero components by the index j. Let J = {i: IA(x)iFi F 0} Because of assumption (i) we also have Ti J = {i: IvA9(x)iFi f0} Hence s (IA(x))j(Fj) P () 1, 2, 2> (I A' (x))j (Fj) P = 0 1, 2, 2 d jcJ By the lemma the solution is unique. eA(x))j: (I A^(x))j j E J F. = F'I J J

34 Therefore A and Al are distribution equivalent. Example 2,2 The condition (ii) of Theorem 2.3 is necessary as shown by the following: IA(x) = (45,.3,.2) F = F' 1 I'A'(x) = (.5, 049.1) EA(x) = IA(x)F = o5 EA,(x) a I'A'(x)F' =,S Since A and A' are Rabin automata, by Corollary 2,3 A Av; (x) P= Pi Cx) i = 2, 3,.o. However, A and Al have different distributions over states for the string x. 2,6 THE N-MOMENT REDUCTION RELATION Definition 2,5: The N-moment reduction relation R: xRNy if for all I in S EA A EA(xz) = EnAyz) and Pi(xz) * pi(Yz) Vz e E*, i = 2, 3,.,,, N The relation RN is a congruence relation and RE = RN for N = 1. 2,7 INPUT-STATE CALCULABLE MACHINES A probabilistic sequential machine has randomness associated with its switching, or state transitions, and a deterministic output function 0, For some problems it is convenient to view a randomly behaving machine as having deterministic switching but a random output device, We study now connections between these two viewpoints. Definition 2,6: (Carlyle [1965]) A machine A is calculable _~~~~~~~~~~~~~ _tsaeclual

35 if knowint the state at time t and input at time t. the state at time t+1 can be calculated by a deterministic lfunction A3 trisvlc has pointed out. the class of finite inyutstate calculable m,-chine.s cons:.ists of exactly those machines which have finite determinst V sw tching and random out puts, Definition 2,7" A random ou depending on the state S. with parameters of r - r will be writtena Si' rld rP~ I 2, 8 CHARACTERIZAT ON OF INPUJTSTATE CALCULABLE MACHINES EQUIVALENT BY N TO PROBABILISTIC SEQUENTIAL MIACHINES We obtain a generalization of Theorem 1,3. Theorem 2 4 Let RIN be the N-moment reduction relation defined by a probabilistic sequential machine A. Rank IRNI - r finite = there exists an input-state calculable machine A' such that A -N Al ProofL Using the quotient construction of Theoremn 1.I. obtein an Al" = Z*/RN where A" < RN[i1! RN[X], M][RN[X], ] > and MI is analogous to the function!M in Theorem -.3, Elements.in the same congruence class of RN have expectations and the first N-1 central moments equal. Hience the machine A" can have random devices attached to the states (which are RN[x]) such that the first N-l central moments and expectation of each device is the same as the congruence class represented by the state, The resugting machine A1 has deterministic switc:hing and random output functions and is equivalent by EN to the probabilistic machine defining RN.

36 The detal1 of thi-e pxroof parallel the proof of Theorem 1.3:, Q0 E, D, 2,9 A NECESSARY AND SUFFICIENT CONDITION FOR': T!-HE NM.IOFMENT REDUCTION RELATION TO HOLD In t*1he PTCVzioXu s et'-on we have seen the importarnce of the N-moment reduction relation RN in characterizing those probabilistic sequential machines for which there is an input-state calculable machine equivalent by -N" Let lus now obtain invariant subspace conditions for strings to be in the same class, analogous to those of the theorems of Chapter 1,.; Theorem 2,5 x RN y A A(x) A(y) i: N where < }> C II Kern, (F' ) N and <(-1. h}> Az), " Kcrn..(Fi) Proof; Supfpo;ec that RN holds for x and y'A(x) = EA(y) 3 IA (x) F IA (y) F VI I A'(x) A (y) + r. Kern() j 1= 2, n rn:v2(x) I IA(x)(FT) E )A(X) IA:x)) r IA(y~ts) ( 7) VI, (y) t ^ ( x ) ( t:, ~, -t3 IAty) (F") VlT -:. 5

37 A(x) = A(y) + o rj c Kern,(F2) j = 1, 2,,, n rn For any i, PiA(x) can be written as a recursive function of IA(x)(Fi) and smaller powers of F, ie,, i(x) = IlA(x)(F: ) + (-1) kIA(x)(F k)EA(x)k + (-l) E 1k=Ax (l)EA(x) Hence by induction we assume IA(x)(Fk) = IA(y)(Pk ) k = 1, 2, 0, i-l; VI e S (1) Hlence A(x) = IA(x)(Fi) + ) A i 1J(y) = IA(y) (F ) + (x) IA(x) (Fj) = IA(y) (j) VI C S+ j < i (2) dicx) =i.(y) A A (,'P.(X) = >t(y)P: AI(x) = I\(y) where r.j Kern,(` ) j = 1, 2, ^0 n rn! which completes the induction, The rest of the proof is analogous to Theorem 1,5, Q, Iji nD N If we substitute RN for RE and (|' Kern, (F ) for Kern, (F) the i=l p)roofs of Theorems 1,4, L6, 1,8 1 and 109 go through exactly as before and we state the dual theorems which are obtained, Theorem 14DI) If RN has finite rank r there exists a partition r = (T=71, t~ iT )

38 on V(A) and an integer valued function g(i,m) such that niA(o) C (i ) i = 1, 2,,., r a c Theoren 1, 6D Let U = K {-A(x) -A(y i = 1 2,.., n and x RNy> then for any z z s* N UA(z) {C Kern. (F ) and there exists V a subspace of R i=l such that for any f. s (i) IJ.^Aa) C v N (ii) VA(a) C V C V Kern,(Fi) i=l Theorem 1,8D R, is non-trivia' - (3V) a subspac.: of Rn such that N i=l (ii) V is invariant under A(a) ac e, (iii) A(x) = A(y) + H where I. z V some 0i. 1 () Theorem. 9D RN is non-trivia':- there exists a sub-': such that the symbol matrices A(c) o a e I be reducible for the same change of basis for V i oe, there exists a linear transformation W from the state basis S to a basis for V such that basis for V -SA1 0 W A (t)W =

39 whecre denotes d iDlock of at.?~ex:.oes t's.'Xf z:s': ize for all strylwbols i and N V Icr Res. OFi VC Kern (F ) i I,ix.-pn?1e 2.3 We e-xtend Example 1,2 to illustrate Theorems 2,,4, 2.5 and 1,.8D,, 10n (O 07," p ~, p, ~j }> j Kern. \ n for any finite N. NotW that in this case that the classe. of RE are also the classes of R N:H eceW we can ireplace the output from any state of the machine in Figure 1.2 with a random device nosscssing; the same first N central moments as the probabilistic sequential machine, Let us compute the var:.>:.nces,, A 2 fl..iA fl/O -1/10.. 1/10, 0, 0), (100o (8) 6 8 84 Lt.kewiie2 We get (? 1,: A44 l2 () 0 Ap;(i0, < z(

40 p2(11) (0, 0, 9/40, 09, 31/40, 0) 1O0 0. (1:0)2 25 = Oi0 V2(010) - (0, 09 21/320 09 11/320, 72/80) 100\ 3,6, 25 25 0X0 A (0101) = (72/80p, 0 53/12809 0, 75/1280, 0) (100) (94)2: 7,29 A machine AV which has the same expectation and variance for each string and deterministic switching will be constructed using random output devices symbolized by attached to states S' which supply random numbers with mean e an:d variance V., The machine A'," shown in Figure 2,1, is the machine of Example 1.2 with the outputs connected to random devices such as the above rather than deterministic outputs,

41. 0) o I( 4]. I 0O~~~~~~09i./ 1O 0.1 Q1X is the initial state of Al re 2.1 Input-state calculable machine Al which has the samen expectation and variance for all strings as probabilistic machine A of' Example 1I2,

CHAPTER 3 THiE NOTION OF INDISTINGUISHIABILITY AS A CRITEPION OF BEHAVIORAL EQUIVAIENCE Suppose probabilisitic sequential machines A and A' are behaviorally equivalent in an intuitive sense, Taking into consideration how machines are built and repaired, one would expect them to be interchangeable as submachines of any larger machine. Indistinguishability of two machines in any machine into which they can be plugged is a strong criterion, the ramifications of which will be investigated. The following example of Arnold [1964] illustrates how the notion of distribution equivalence, ) fails to meet the interchangeability requirement, 3,1 EXAMPLE OF TWO) DISTRIBUTION EQUIVALENT MACHINES WH1ICHI ARE NOT INTERCHANGEABLE AS 1COMIPONENTS OF A MACHINE A1 - <I1, A1(0), A1(1), F1 > A I2= 2 A2(0) A2(l), F2) where I = I1,5 F2 F 0 1/2 1/2 0 0 0 0 0 1 0 A1(0) A1(1) 0 0 0 0 1 O 0 00 1 O. 00 I (l, 0f 02 10) 0 0 0 0 1 F1 = I1 = (1, 0, O, 0 0) (0 1/2 1/2 0 0 42

43 -lachines A1 and A2 happen to be independent of the input ieo are Mlarkov provesses since AL(0) = A1(1) and A2(0) a A2(1)o TABLE 3 1 COMPARISON OF MACH.INES A1 AND A2 x EA (x) I1A1(x) EA(x) I2A2(X) 0 (1, 0, 0, 0, 0) 0 (1, 0, 0, 0, 0) O or 1 1/2 (0o 1/2, 1/2, 0, 0) 1/2 (0 1/2, 1/2, 0, 0) 00, 01, 10 or 11 1/2 (0, 0, 0, 1/2, 1/2) 1/2 (0, 0, 0, 1/2, 1/2) all x: Zg(x) > 3 O (0, 0, 0, 0, 0) 0 (0, 0, 0, 0, 0) Table 341 establishes that A D A2 Later a machine will be shown which behaves differently with Al as a submachine than it does with A2 as a submachine even though the state behaviors of A. and Al are Markov processes' Definition 381: A >B denotes the machine obtained from plugging the output of A into the input of B, subject to the provision that the input symbols of B include the output symbols of A, Definition 3S,2 The set of t, accepted by machine A with cutpoint X written T(A,X); T(AX) = {x: EA(x) > X} Definition 3,3: A and Al are tae equivalentmachines, written A T A' if for some specified X1 and X2 T(AX1) = T(A',X2)

44 Definition 3A4~ A and A' are istinuishable for a class of machines if T (A-CX) T(AB+CX) for all ), and C z 2,, The class' could be something more special than finite deterministic or probabilistic automata, eog, the class of definite automata6 Theorem 3o,i If probabilistic sequential machines A and Al are distribution.equivalent they are not necessarily tape-indistinguishable for the class of finite deterministic automata6 Proof- (by example)~ Let C be a finite deterministic machine which accepts 01, 10 with probability 1 and all other tapes with probability 0,.. We tabulate the expectations of A1 -C and A2 + C in Table 3626 TABLE 36 2 EXPECTATION OF A! + C AND A2 + C FOR STRINGS x OF LENGTH 2, IlIr~aL 1 — i2 2 00 0 O 1/4 0 01 1/2 1/2 1/4 1/4 10 1/2 1/2 1/4 1/4 11 0 0 1/4 0 ttence T(Ai-C,.X) - T(1y2'Cx ) for any X E (L/26 1/4) The reason for this

45 difference i,; that the condit ional nrobabi Ities of ontrv't;::... variables differ for A~ and A2' For example, Prob, fo* (01) = 1} 1 given 0 (1) = 0 1 1 Wh1 i 1 e Prob, {0 (01)= 1} = 1/2 given OA (1) = 0 A2 A Theorem 3,2 For probabilistic sequential machines A and At if for all finite deterministic machines C and any cutpoint!1 T(AC,X) = T(A'+CX) A -E A' Proof: Suppose EA(X) Z EA (x) for some tape x of length k, Without loss of generality pick EA(x) > EAQ(x)e Since the rationals are dense in the reals, let I. be a rational such that EA(x) > X > EiA(x) Let C be a deterministic machine which beginning at time k computes the number ik'c where i is the input at time k, Since X is rational C needs only a finite number of states, C accepts the string x iff i k > 0, which can be done in a finite number of steps. x E T(B+CA c) iff EB (x) > X but since C is deterministic x s T(B-CX c) iff EB(x) > X hence let B = A and B = A9o x e T(ACI,X ) and x T(A'C, c ) Sc T(A-+C X c) T(A9+C9X );C C By logical equivalence we have shown for the class C of finite deterministic machines (X)(C)[T(A-CX) =- T(A'+CX)] ~> (x) [EA(x) 0 EA,(x)] QC FEli 1)A

46 By the example presented in Theorem 3,1 we know the converse is not true, 3,2 A MORE SATISFACTORY TECHNICAL NOTION OF INDISTINGUISHIABILITY The example at the beginning of this section shows that machine equivalences such as ET equivalence and even distribution equivalence, -DV break down under composition of machines. To obtain a more satisfactory definition of behavioral equivalence, the conditional probability structure of probabilistic sequential machines must be explored, A stronger concept o- aquiva1lencc, called indistinguishability, based upon equality for the two machines of the probabili*:.ies of all possible output strings given all possible input strings will be formulated, following the development of Carlyle [1961]. In what follows it is assumed that Z contains a string of one symbol A so that A(A) = E(n),the n-dimensional matrix identity, Definition 3_.5. The conditional probability for a sequence of outputs y = YlY20ooYm given a sequence of inputs x ola.,,am starting from an initial distribution 11 = (111, N12,,, Nn) of a machine A will be written Pn (y/x) or if the machine involved is clear from context, just Pn(y/x), Table 3~2 shows how machines Al and A2 differ with respect to Definition 3,5. The symbols of the output alphabet are real numbers which occur as components of the output column vector F, ie, the output alphabet Y can be written n Y -= {Fi} As usual, the set of all finite sequences of symbols from Y will be denoted by y.

47 Definition 3 6;. The probability of a sequence of transitions S. +S - S. ~Jy~,;r~"~-",u rc~.~~IC~.~ ~ -'~ "1 2 j with output sequence y because of input setiuence x will be written PS (,,(/x) S1. e 11 1 /J Definition 3,7 T`he?onditional mrob lt transition matrix A(yi/a) is formed from A(a) by zeroing out all columns except those corresponding to states with output Yi,. More formally, Let J = {j F. = y. yi c Y Yi i j Yi Yi Y and let QYi be the matrix with [Q i]jj = 1 for j C J and [Y]k j = 0 otherwise. Then A(Yi/O) = A(o)QQ Yi c Y, a c E. Note that [A(Yk/a)]iij is just PSi+Sj (k/a) 1 J Remark 3 6; Let y c Y*, x EC *, y. c Y, a c Z such that 2g,(y) = Zg,(x) Then A(yyi/xa) - A(y/x)A(yi/a) By definition [A(yyi/Xo)].m is PS S (YYi/xa) For any state Sk PS (YYi/X") PSQ-Sk (Y/X) PS;S (Yi/a) Since transitions to different states Sk are mutually exclusive events PS mS i S (y/x)PS S (Yi/a) Z m k=l R k k m using the definitions again [A(yyi/x)] m = [Ay/x)] k[A(i/)]k m 1 2R9m kl

48 or in matrt.ix for;, A (yyi/x ) A(y/x)A(yi/c;) Heencc tihe conditional probability transition matrices for output strings liven input strings can be generated by the conditional probability transition matrT9ccs for output symbols given input symbols, analogous to the case for thie transition matrices A(x) Remark 3.7': Given initial distribution over states nt the probability of errting output string y from input string x is Just.n n Y/x) = _ — _ HiT[! A(y/x) ji..~l i? with U j we can write A P (y/x) iIA(y/x)U Rcemark 3.$,,8 We note the following ic,. ntity ~I{Y"/x -- /.'y)')/2i).;/ for all l o: sinace j y~' ~Yyi/xe) I " f-l nA(y/x)j'( vi/ )U.(y/x).__ A(y/o)U, ITA(y/x)A(ci)U V.?Y * 1 But for a.ny n x n stochastic row matrix C CU Ui 1enc.~

49 I)ef:itK tion 3,, 8 The terminal distribution T*(y/x) for a sequence ot outputs y given inputs x (assuming PA(y/x) > 0) * (Y//x) AX The i"t.h component of ll*(y/x) is the probability of being in state i after input string x has occurred and output string y has been observed, The following identity holds whenever PA(y/X) > 0() PA(yy i/) = Pi(y/x)P.y/x) (Ya) Y9 a,EC x c s*9 y c Y* Definition 3,9, Mlachines A and Al are indistinguishable written A A# if A (yx) A~ P(y/x) = Pn1 (y/x) Vx ~ E*, Vy E Y* The concept of indistinguishability for machines depends on observable identity when both machines are started from their initial state distributions,, Definition 3.,10 Mlachines A and Al are k-indistinguishable if (y/x) = P 0(y/x) x (z)mC y E (Y) for m = 00 19,: 9 k., Definition 3Jile In a machine A, two initial state distributions Ii and x are indist in=uishable if PA(Y//X) P (Y/X) Vy E Y*, Vx C * D)efinition 3,_12:. In a machine A, two initial state distributions n and X are k-indinstinguishable if AI kA P(hk/iiin) w (Y/xh) xth n)kty d t~ (Y ) f Checking, whether the indistinguishability definition (3,9) for machines or for initial distributions (3,11) holds using only the definitions involves calculation of an unbounded sequence of conditional

50 probabilitiesj In the next section is shown a bound for the length of strings whose probabilities need to be calculated~ If n is the number of states, then only strings of length n-l or less need be considered in establishing indistinguishability, 3-3 THIE RELATIONSHIP BETWEEN THE INTUITIVE AND TECHNICAL CONCEPTS OF INDI STINGU ISHABI LITY We have yet to relate the intuitive notion of indistinguishability to the technical Definition 3,9, The next theorem shows that two machines indistinguishable in the technical sense are indeed indistinguishable when plugged into CO any finite state probabilistic or deterministic machine; Since C has a finite number of states, it is assumed that finite strings of Z = CC(Y*A the random variable taking on values of strings of outputs of C given strings of inputs from the random variable Y, depend only on finite strings Yo Theorem 3, 4 Let Ca be the class of finite state probabilistic and deterministic sequential mazhines, For any C C C* tIf A: AI then A + C Ag * C Proofs For any fixed value y of the output string random variable of A, YA A*-C A C PA (z a C(yyx) a P (Y/x)pC(z = C(Y)/Y) since the occurence of different y are disjoint events, for all y c Y* Ag) C Zg(:xy) ) pA(z a cy/x ~x) PnA(y/x)pC C(y)/y) Since Z and ZV range over the same set and the indistinguishability

of A and A? A A0 Pn(Y/x) = R, (y/x) So for all x e Z* and all z e Y* C AC AA-+C, P11 (z C ((y)/x) P (z = C(y)/x) which means A-+C and A`+C are indistinguishable,, Q. E6 Dt Since the machine C might ignore its inputs, it is clear that the converse to Theorem 3~4 does not hold, The criterion of interchangeability as a submachine has lead us to -I as a behavioral equivalence for probabilistic sequential machines, The equivalence'! is well known as an equivalence between communication channels, The other kinds of equivalences discussed are equally valid for channels with numerically coded outputs, The relationship between the equivalences -T= -E= -N D9 and =I can be sunmlarized in the following schematic wayA A0 A -N A E Az -— E A - T A A A~ As we have seen in previous chapters, the concepts of behavioral equivalence for probabilistic machines analogous to those of deterministic machine theory depend on the device being modeled, Consequently, applications of probabilistic sequential machines to new domains are likely to suggest new kinds of behavisoral equivalences,

Ct-IHAP\ ITEP; 4 TFINITE COMPINYIETH SETS [0' INVARIANTS FOR TIHE BEHAVIOtRAAI. EQUJIVALE-,NCElS E ANl )D I AND T-IE REDUCTION CONGRUENCE RELATIONS RE AND RN The results of the previous chanters involve relations defined over all finite strings of the input alphabet. In this section are found bounds for the length of strings necessary to consider in order to decide whether two elements of the domains of the relations are in the same class, Definition 4i41 A set of functions I = {fl, a, f } is a set of 1l m - invariants for the relation R if for all x and y in the domain of R xRy-> f (x) = f (y) i = 1,.o0 m The set of functions is a comnlete set of invariants if xRy,,, ff (x) fi (y) i = 10 o. ~ m Wa;, exhibit sets of functions which are invariants for the above relations, A set of functions which are invariant over RE and RN areo f-X)(x EA (xz) I z)xJ A for all z: 2g(z) < i, for all I ~ S (, x Nz While for the relation i the set of functions below is a set of invariants' g (xy)(A) = P (y/x) for all x and y,; g(x) =,g(y) 5 i LT I K wise the t. hex l)(A) EA(x) for all x Zg(x) s i A h(x r) (A) ~ Vr(x) for r = 2, 0, N is a set. of invariants for the relations andI N It is clecar that for an unbounded i, the above are complete sets of

53 invariants, However, in what follows a finite value of i will be found for each of these cases, In the case of -E the bound will be the same as the well known Moore bound for deterministic automata but in the case of =N it will be lower for most machines, The main tool used in finding the various values of i is the following simple lemmad 4 1 TIIE FUNDAMENTAL LEMMA Lemma 4,1 Given an n-dimensional vector space V, a finite set T = {Ti} where each T. ~ V x V is a linear transformation on V and some finite set of 1 vectors V) C V such that dim <VO> = r > 1. Define M = V0 M1 {vOTi: Ti c T, Vv0 C V0} Oi 1 Mk = v0 T.: T,dooT. T., T. e T, v E V0} k 011 ik ilk 0 and let L i U Mj> ==0 Then there exists an integer J(T) such that (i) LJ(T) = LJ(T)+l (ii) Lkl C Lk for k.< J(T) (iii) J(T) < n-r Proof: L0 C L1 C,0o C Li C,0 C Lk as a consequence of the definitiono The sequence {dim Ljj=O is bounded above by n, the dimension of V.;j j=O

CaT i) JT) the smallest index k such that Lk+l = Lk Showinc that; u9 mz CJiv L. T is strictly increasing requires that for all j+1 < J(T) L+ 1 I. L. - > L- J Lj which is logically equivalent to Li+I = Lj - Lj2 j+l i.l 3 ~ >+2 J+1 iience it is sufficient to show ij* C Lj -== Lj C L As sunie Lj1 C j \W,L,C-G, pick v'( l, 0.1..T. L = (VOTi..OOT. ).T 0 1j-1 >+1 i +2 But W = VOTi T L. I 1,j+1 l~+ l So there is a finite set of indices I = {i} of a spanning set U' for L. U' = {VOTi i C I } 0 1 2 r. such that r.i j and constants c. W = ~> Ci (vOTB io o'T i ) ~t B Bri iCI 1 ri so v w.T. = f (__i(v()TB.,j.T1i.-T ) c T4 J+2 1 r. j+l o~~~~~ic C.s iCO2

55 Now cons'; e ls-th 7:sueCnce o. d.1 imens ions dim,..~ 4i dim Tn.5 i pl r c:-. i, I J r, 0, d:;i,' E>~~~~~ ~+-~~~~~~ r~ k k+ 1 fo k,' J (T). dim..i c d inL +'"r.' +m i.< J (T) Not i.n: that dip L( r, dim L0 + J(T) < dim L(T) Whi chil v.ss J(T) _ n.r 4, 2 A lBOUN) I()R TiOST IN; W1 Ii' H I I N._ Theorem 4m 1 If i j -s ri8 "Tabilistic sequential machine with n states, t -_n (n-l)-indistinguishability of initial distributions 1r and -rr is suff.cient to guarantee ixndtstinguisnuabil1ity of initial d-istributions -a and =. Proo'f Using Lemma 441 let v 0 { j} an d cdim <Y)> - 1 T = {A(Yi/a) y c Y G C V) T A(yj/i)U )by the. iiemima. For alny string x - if L.ri f for r' finite, A(y/x)ll can be expressed as i(y) Clii(yiill Ri /ji. ji ) )

56 wi th r. < n - 1 for i ~ I YBi C Y aj i F E (for k- 1 9 ri) k k Ilence for initial distributions 1! and -m9'v-" PAi(Y/x) =liA(y/x)lJ L ciTA(YBi yBi /o iI 1 r. 1 r Let i 1i 1 d 1 A' B Ai Ai P 1(y/x) = ciP(y /xi) with Qg(yi) = Lg(xi) < n 1 multiplying (*) by 7 gives A A i PA' By the assumption of (n l)>indistin guishability for ir and f pA y/i L A ii <i y~ /) P(y /x ) p g (xl) = zg(y ) < n 1 lien co A A (y/x, P1,1 (y/x) Q, E, D. 4,,3 EQUiVALENCE OF DISTRIBUTIONS IN ON}E MIACHIINE Using Lemma 4J, we can make effective the definition of the relations RE antd RN of Chapter 2o A bound will be obtained on the lengths of strinhs needed for deciding whether x and y are in the same congruence class., D?:finition 4.42: Distributions n and A are expctation e uivalent for A a machine A0 written ~ r X~ if it A(x)F = AA(x)F x c Z* Definition 4,30 Distributions i and A are K-exPectation equivalcnt for C=:^==G_~~~~~~~~~~~~~~~~~~~~irn~ta~~~P~?~lm, 16-~T

57 A a ma.chlne A,,:, written IT, if IlA(x)F = \A(x)F x ~ z* 0n < k g(x) < K Theorem 4.2 (Generalization of the result of P!az [1964]) If A is a probabilistic sequcntial machine with n states and if? and X are n=2 eauivalent distributions of A then A r AtE A Proof~~ We use the elementary fact that for any constant c nA(x)F= XA(x) F H iA(x)PF + c = XA(x)F + c Vx c Z* Since IA(x) and XA(x) are stochastic 4-, nHA(x)[F + c) ] = XA(x)[F + c( ] Vx C z If all the entries of F are equal, then all distributions are expectation equivalent, If at least two of the entries of F are not equal, then <F> # <K + c (i> Let Al be a machine differing from A only in that 1 F' _ 1: 4 c( |, Suppose we experiment with A and A' simultaneously, No new information is obtained, i,e, distributions are expectation equivalent for A iff they are expectation equivalent for A' - However, if we comp:.te a bound for the two machine experiment using Lemma 4,1, the bound will be lower than would have been obtained from an experiment on A alone, Since the results of the two experiments are identical, the lower bound applies to A also, V = {F F + c( I} dim <V0> = 2 T - =Ala) a 0 S} Vo0Ti A(i) oV0

58 By Lemna 4,1, there is a finite set of indices,J of vectors A(xj)V with VJtVO with Zg(x ) ~ n 2 such that for an arbitrary x E Z* there 0 are constants c. so that J A(xl)} - =-' cA (x)Vj which reduces to =, c A(xj)F + c" where c' is a constant, Multiplying by the initial distributions gives: fA(x)F = ~ c.I A(x)F + c' jcJ J \A(x)F -- c.XA(x.)F + c' jEJ J (n-2) -expectation equivalence gives HlA(xj)F = XA(xj)F so IIA(x)F = XA(x)F QO Eo Do 4,4 BOUNDS FOR TESTING FOR bM~.EMBERSHIP IN =E AND RE Definition 4o4,,__. The abstract Jo of probabilistic sequential machines A = <KtA(O)0,,,A(k-1l)F> with n states and A' = <X,A'(O),0ooA'(k-l) F'> with n states is the abstract n + nQ state machine AO written Ae' = < A"(O),00A*(k-l),F> where AQ(i) - and T and ~ can be embedded in the n + n' dimensional space as

59 n, zeroes n zeroes no ace'0 0 ) \ (09 Q O ~ D 09;4) The problem of deciding whether two machines A and Al are expectation equivalent ~:TrA(x)F A- XAA(x)F Vx e * is logically equivalent to deciding when ark and orr are equivalent in A*A 9 i e0 whether AeeAX E Hence following Caryle [1961], we use Theorem 402 to state Remark 4031; A*A D A~A O 7T* __. - =*.. X where K = n + n' - 2 E KE which gives the following theorem, Theorem 4,3 Let A and Al be probabilistic sequential machines having n and n' states respectively, Then a necessary and sufficient condition that A and Al are expectation equivalent; [IA(z)F = XA9(z)F9 Vz C E*J ve= [nA(x)F = XA9(x)F9 Vx: Zg(x) < n+n',2] Theorem 4,3 makes the experimental determination of expectation equivap lence possible provided the number of states of each machine is known, Furthermore, it gives a bound on the process of finding whether two strings are in the same equivalence class under the reduction relation RE of Chapter 1, This result is summarized in the following theorem. Theorem 4,4 Strings x and y are in the same equivalence class under the reduction relation RE of an n state probabilistic sequential machine A if and only if EA(xz) - EA(yz) for all strings z: [g(z) 5 n a 2 and all I e So

P roo f x RE y< EA(xz) = E (yz) for all z c E* -for all I E S z IA (x) A (z) F = IA (y) A (z) F Let T = IA(x) and X - IA(y) -*,- A(.z)' XA(z)?: Vz C Z* By TFiteorm 4, 2 and its obvious converse, we get TA (x) IA (y) which gives the th}eorem4 4,5 BOUNDS FOR TESTING FOR MEMBE3RSHIP IN - AND PN Definition 4,5 F = the independence number of an n state machine A with output vector F: n =dim <{(Fi): i = 1, 2,..., n}> It follows from vector space arguments that n = {Fk: Fk 0} where # is the cardinality operator on sets. The independence number is just the dimension of the space generat&e by powers of the components of the output vector F. For a Rabin automata n = 1 and all central moments reduce to polynominals in what we may consider the first "central moment" EA(x), In general, if the independence number is nF, then for all x in Z*, the (nF+l)'st central moment Pn 1(x) reduces to a polynomial in the lower central moments since Pn (x) = IA(x)(FnF+ ) + Q(x) n.+l where Q(x) is a polynomial in which IA(x)(Fi), i = 1,..., nr occur. H{ence F+l(X) = IA(x) 7 ci(-) () i=l

61 since nF is the dimension of the space <(Fi):i = 1, 2,., n> nF = I ciIA(x)(Fi) + Q(x) ifl Theorem 4.5 Let A be a probabilistic sequential machine with output vector F an'i n states, Then for any r s n F and strings x and y in Z*: iA(xz) EA(yz) EA(xz') = E^(yz') A A A A,2 (x Z) =j (yz),(xz (yz): g ) nrp Vz C V* Vz: g(z') n-rlr(xz) =A (yz) KrAXZ') A= Ur(YZ) where p = 0 if (4) c <{F, (F2),., (F )}>, p = 1 otherwise, Proof: Let VI = {F, (F2), oeo, (Fr)} Vo if) V0 IV' o otherwise The reason for the inclusion of i() in V0 here is the same as in Theorem 4,2, ie, for i = 1, 2,.o,j r IA(x)(Fi) = IA(y)(Fi)<===IA(x)[(Fi) + c() ] IA(y)[(Fi) + c1 ] Hence the dimension of <V0> is either r or r+l. We will call it

62 r+p where p is defined in the statement of the theorem dim (V0 >- r+p where r <nF (Ti} = {A(i): i E 1} For any v0 C <VO > there exist constants ck such that (defining (F0) = 1 vooT. = A(i)v0 = v CkA(i) (F) Consider any string z: Zg(z) = ml finite Then there exists a spanning set A(xj)v0 with j c J and constants cj(v0) so that A(z)v0 =, cj(vo) A(x.)v0 o Zg(x) < n - r p Let v0 range over the (F') i = p, ooo0 r and multiply by n and X xA(z)(Fi) cj((Fi ))rA(x.)(F) nnj r jej jEJ - J - Since n. n 1A(xj)(F ) A(xj)(F ) by assumption rA(x) (Fi) =,(y)(Fi) That is the moments about zero from i and X are equal if they are equal for all strings of length < n-r-po Let -r = IA(x) and X = IA(y). Then we have for any z and any initial distribution IA(xz)(Fi) = IA(yz)(Fi) i = p o o o r holds if and only if for i P o oo r IA(xz ) (Fi) = IA(yz ) (Fi) for all strings z0 of length less than or equal to n~rspo Noting by Theorem 2,4 (equation (2)) that any central moment Ur(x) is a recursive

63 function of IA(x) (F), I, IA(x)(Fm) the result is establishecd Coroll 4 (Bound for testing the-relation RN) Lct A be a probabilistic sequential machine with n states and with N < nFy x RN Y f-> for all strings z' t Qg(z') < n-N-p VEA(xz) - EA(yz') A A for all I E S A A 1N(xz) = N(yzl) LN Where p () if { E {F (F2) ooo, (c ), = 1 otherwiseo Theorem 4,6 (Finite set of invariants for EN) _ N Suppose A and Ae are probabilistic sequential machines havirg n and n' states respectively 2 r2 if! c <{F, (F ), o (F ) Let p = where r c nF + IFP - # {y ~ y c Yr'Y' and y 0 ()} 1 otherwise For any initial distribution 7 of A and any initial distribution Xa of. EA(x') = EA' (x') A L AA _ A 2i( (x') Vx'' g(x) n+nt-r p roof Construct A let in 4 Proo~ Costrut A= A9anid let V0 in Lemma 4ol be

64 VV P (IFpflfl oE or else VF I - dim (VI > np - tyo @heY Or EY"I at1 y a O and y#Y ~'YB) nF+n p=#{Y chY~ and'l # O} Using Lemma 4,1 and an argument like the one in Theorem 4e4 establishes the theorem., Q( Ec De 4,6 DISCUSSION OF TIlE GENERALIZATION OF THE MOORE BOUND Corollary 4.6 Lit A and A' be n-state deterministic machines with two-valued output alphabet Y = Y? - {12}, Then A and A' are indistinguishable for all strings if they are indistinguishable for all strings of length at most L i:-. - 2 Proof" In Theorem 4e6 we have nF 2+22 = 2 and p = so that r S 20 For deterministic machiness, indistinguishability reduces to EA(x) = EA,(x) for all x g E* and also EA(x) EA(X) () = 2(X)"(x Hence the right. side of Theorem 4~6 gives the result., Qe E, D, Theorem 4,6 can be rearded as a generalization of the Moore result [1956] to probabilistic machines with arbitrary rather than binary output alphabets. Note that P.oore' s bound is 2n-l since he considers the initial output as part of the experiment, We consider the init'ial outputs when considering strings of length 1 since the symbol A has identity symbol

65 matrix, The?ole of thre zero outpiut symrbol in,Theerem 4u,6'.: a s i,i:'fi.cant eie>:!rtu-e <om.....r.s r deten:inti. sti c r::uls In order to get p 0 in Corollar? 4,,6 weC usC a to,-vliuseld outlu>t seet {1,,2} ra.thler than,0,1} w'ith the implcit assuiptiorn tha t suclx reo Cdiih o~ ou, 0 t symbols c.flanot affect indist. ngui shability between deterministic machines, Without the recodi. nrg p, 1 and the boutnd is st ill the Moore bound~

CHAPTER 5 ON A FINITE COMPLETE SET OF INVARIANTS FOR TAPE EQUIVALENCE =T 5, 0 INTRODUCTION Rabin [1964] and Paz [1.9641 have shown that some probabilistic automata can accept nonregular sets, Consequently an experiment unbounded in length may be needed to determine whether two arbitrary probabilistic sequential machines accept the same set of tapes, Only for certain classes of machines do the results of a bounded experiment provide a finite complete set of invariants for T') In this chapter will be found finite experiments for determining whether ET holds and hence a finite complete set of invariants for two special ciasses of machines,, Part I deals with machines with isolated cutpointso Rabin has proven that an isolated cutpoint machine accepts the same set of tapes as some finite deterministic machine, This property makes such machines very special and in Part II we seek a more generally defined class of machines Part I contains a sufficient condition for the existence of a finite bound on experiments for deciding whther T holdso A class of probabilistic machines will be presented which have a larger bound on experimentation than isolated cutpoint machines,, PART I, ISOLATED CUTPOINT MACHINES Definition 5,1n For a probabilistic sequential machine A, a cutpoint X is isolated if there exists a 6 > 00 called the separation such that IEA(X) o= ~'! z 6 for all x in Z* The concept of isolated cutpoint was developed by Rabin [1964) to resolve the problem of being able to decide statistically whether a machine 6f,

67 with random switching accepts a given tapes The number of sample runs necessary to determine whether a tape x is accepted (for any fixed probability of being correct) depends on the difference between the expectation of the tape EA(x) and the cutpoint X.- But the expectations EA(x) are not known beforehand, However the assumption of a uniform bound for the difference between the expectation of any tape and the cutpoint provides a bound on the number of trial runs dependent only on the probability that the decision be correct, No finite characterization of isolated cutpoint machines has appeared in the literature, However it is extremely easy to construct examples of such machines, 5.1 THE RABIN REDUCTION THEOREM Rabin proved [1964] that any set of tapes accepted by an n-state probabilistic automaton with isolated cutpoint X and separation 6 > 0 also can be accepted by some deterministic machine D with no more than nD states where ( 1 n-l nD < (1 ~+ )n1 Paz [1964] sharpened the bound to n < (1 + )nA similar situation holds for probabilistic sequential machines, In order to prepare results for the next sections the bound will be established using two theorems, The proof of the first theorem is exactly analogous to a portion of the proof of the Rabin reduction theorem, (Theorem 39 Rabin [1964]) and the improvement made by Paz [1964]1 Theorem 5o1 Let A be a probabilistic sequential machine with n states., If

68 E:-: {el'. - y.1,,,. k} is zany st t of dh.stributions of A such that there exist strings z. so that i!eJA (z.i)F A(zij)FI s for all i $ j; i, i s 1.,.. k then d n-1 k -1 d - 1 where d = ax(Fi) - min(F ) Proof~. Let c- and e' be in E,. Write:r1 i. A(zij)F = hence ii! ~|(eJ-e )Azi j) eI)r (t i)rl + j-en)rn (1) Follo',i ng Pa- [19641 ], we rewrite the sum as two parts: <,:'-,)'rk. (ek)ek)rk + j (-ek )r k k k;- kk ek where the sets of i'.dices are Vht {k ~ (ej,,) O k'i- < 0i W'ccs, te tke () -;S J J l r, / (ek.c: ) *. (eke ) = ek - ek KS~~K*' kr~k l k=l But since ej and c- are stochastic'/ j. k (2) /ek-,K) k~t (ek-ek. (2)

69 Since all terms on the leoft are positive and all on the right are negativce. k`K+ kerK From which we get le k l k >I l k k k=l ktK+ Using (1) and the above rewriting we get 6a!/_; j (ee ),rk + Z (e-ek)rk keK+ k k k cKkLetting i max(F ), F min(F ) max z i mill since A(z:i ) is a stochastic matrix F >r > F. max k min >, (eek1)F + >, (e ek)Fminl kK+k max kcK kk m using (2) ~ IL (e}-ek)(F IF ) i k | (ek) e f max min keK+ k k max min - cK' k kl (max min so we get (excluding the trivial case where FPma Fmin) kcK9 k -ekl max mmn k;-"_K max mi By (3) ie-e-I k6- > (4) + ek=l max min Froen this elint on the proof of Rabin [1964] wvill be paraphrased Note that the "6" used in (1) is the same as "26" for Rabin so our results are superficially different~ Using an argunment invojv ng volumes of sets

70 in n-dimensional space,, a bound on k will be deduced, Let d = F min max min Define the set:. i - = l1: k by - i (e1, en e., j n; j=n (e-e:) = Eacih ao is the set, translIaed by e ~ i l 1, o0, k where: - 93o,a e J 0~ < e~ j = 19 o n ej = nn d The set o is an n-l dimensional simplex contained in the hyperplane xI + +, xn = d~ The n-l dimensional volume, V1 (a) of a is dn nn1(o) = c(T)n-l where c is a constant with respect to 6, i nd Since the e C i = l~, ioI k are all stochastic, we get Hlence each;. c-' T where T {(ell ~ en") I le" = 1 + d0 < e' i- 19,uc n} 1 9 nd We show that the oi have no interior points in common. A point e = (el9... e n)eOi is an interior point of ai (in the topology of the hyperplane xl + + xn = 1 + ) if and only if i 0 < e -= e p 1p ooo n P P Suppose o. and a. have a point e in common: 1. 3 0 < e p e%. je e e p = 1, n P 0 P p enp r p ep p 1 - ~ ~ n

71p 1p 6 ] p'r.~. which cont radicts eqtatoicn (i-..Icce (J,:knd o,e. (i. # j) have no intcrionr points in cor~;:mon v (+) * J'nA (r) lien ce c n-I 6 n ck(dnl c(1 + )n So k (1 + d.nl Q, E, DF, We are now ready to prove the adaptation of the Rabin reduction theorem to probabilistic sequential machines.3 Theorem 5~2 (Reduction theorem for probabilistic sequential machines) Let A be an n state probabilistic machine having isolated cutpoint X with separation 6, Then there exists a deterministic machine D with nD states such that T(A,,A) = T(D) and d nn i<D (1 + F) where d = (max(Fi) - min(Fi)) Proof. We define RT. the congruence relation induced by T(A,X) by: x RT y if xz c T(A,X) -,-> yz e T(AX) Vz E * Let xi Xr. x be a set of distinct representatives of the classes of RT, Since xi Ij. xj i $ j (wlhere I is the negation of the relation RT) there exists z E Z* (W..LC.,) such that

72 IA(xi )F X, IA(x jz)F < A Becaus); is a1 ilsolated cutpoint of A IA(xiz)F t.> A + 6, IA(xjz)F < X 6 Silr e.A a mact, ine IA(xi)A(z)F k> x 6o IA(xj)A(z)F < X -- 6 hence for each pair of classes i and j there is a z so that (IA(xi) - IA(xj))A(z)Fj | 26 By Theorem 5,1, there are at most k distinct distributions IA(xi) d.I, n: I whe-e k <, (1, r1,; o If two representatives of classes of RT have the same distribution it follows immediately that the representatives d n-l are in the same RT class, Consequently there are at most (1 + m,;) c^asses of RTr Since, has finite rank and is almost by definition a right congruence relations. Th;-orem 2 of Rabin & Scott [1959] applies and there exist.s a finite deterministic machine I) such that rT(t)) ='r(A, A) Q, Eo D, 5,.2 A FTTT1rP:i SET, ()OF INVAPTANTS:OR -(O)lT oR PROBABILISTIC SEQUIENTIAL MACHINES HAVING ISOI( ATEL) CUTPOINTS. It is now possible to vwrite down a bound on the length of strings required for deteramining whether T holds between two probabilistic sequential mazhines having isolated cutpointso Theorem 5, 3 Let A and A^' be probabilistic sequentiali machines having n and n' states, isolated cutpoints A and AX, and separations 6 and 6 respectively, then A T ^A0 i.eo,

73 T(A:,A) = T(A'A) if and only if x e T(A,X) - x e T(A',X) VX ~ Z* Q: g(X) < p where d' n-l 11 n.~ p < (1 + d)n — (1 d 2 -)n with d = ImaxF - minFI 2- 2r i i i i and d' = ImaxF! - minF I i 1 i 1 Proof: By Theorem 5,2 there are deterministic machines D and D' such that T(A,X) = T(D) T(A',X') = T(D') and d n-l nD < (1 + — ) d' n'-1 D, < (1 + 26,) By Theorem 10 of Rabin and Scott [1959], two deterministic automata are inequivalent by -T if and only if there is a tape x of length less than the product of the number of internal states of the two automata accepted by one machine but not by the other, Hen ce T(D) = T(D') iff for all x in ~* such that Qg(x)! nT) n), x c T(D) -x s T(D') Substitution of T(AX) and T(A',XA) into the above gives the theorem, Q, EL L).

PAPT I I, A CLASS OPF rMACHINES (CONATAIINCG ISOLATED CUTPOINT MIAACHINES 5~3 THE STABILITY CONGRUENCE RELATION Definition 5,2: An input string x is stable for a pair of machines A and Al if x c T(A,X) c- x c T(A'PX) Definition 3S 3, The'l stable set for A and AU, written S(A,A~): S(AA) = {x o x:is a stable tape for A and A'} Note that S(;\A) ): T(A,A "T(Al TA); [ * T(A,X)] Q [Z* T(A,X)]o Lemma 5,I S(AQ5A) A) A = A' Proofr Suppose S(A,A9) Z*o Then for all x e Z* x e T(ADX) - x T(AaX) which is just the definition of A AO If A -T Al, for all x cE Z* x = T(AX) —, x E T(A',X) which implies x E S(A,A') Definition 5o4. The stabli raht con ruence relation R (A AQ): S x R (A,AU) y iff z ~ *~ xz ~ S(SA,A')<- yz ~ S(AA') We remark that R (A,A") is the right congruence relation induced by S(AAU)o If it is clear in context, R (A,A") will be abbreviated to R s R has only one class if A -T A., since by Lemma 5,l the stable set consists of all possible finite input sequences and is hence zlosed under extensions. HIowever, if A =T Al does not hold, Rs may have several congruence classes which represent, intuitively, distinct ways that the machines A and A' can be unstableo 5,4 A SUFFPICIENT CONDITION FOR A rINITE COMIPLETE SET OF INVARIANTS: OR =T Theorem 5.4 (Sufficient condition for a finite experiment for deciding whether -T holds,) If the rank of RS(AA9) = r, finite, then A -T AV g- [Vx ~ Z*~ Lgo(x) < r; x C S(A,A")] Proof~ We first show that R refines the set Z* S(A,A'), Since 5 A E Z* we set z = A which gives 74

75 x Rs y — [x S(A;A ) y c S(A,A9)] Logical equivalence gives x R y _. [x b S(AA) A,-' —'y ~S(AAt)] which means that R refines Z* - S(AA"), ieo, x Rs y =. [x ~ z' - S(A A') A- y Z* D SAA' )] Using Theorem 2 of Rabin and Scott [1959], E*/Rs = D is a finite deterministic machine such that T(D), the set of tapes accepted by D, is just the set Z - S(A,.A') i e, D = Rs[A], {R [x]A}J,, s s~xi > lWhhere Rs [A] is the initial state of D R [x] is the state set of D, #{R [x]} - Rank Rs r M1 is the transition function of D: I(R s[x],ca) = Rs[xa] J is the set of final states of D: = {Rs[x]: x c ~* - S(A,AQ)} Note that x T (D)) -c- x e( S(A, ) (,) Theorem 7 of Rabin and Scott [1959] tells us that T(I)) is empty iff D rejects all strings whose lengths are at most the number of states of D, Symbolically we have: T(D) = * <-[ Vx c Z* d Zg(x) < r; x t T(D)] Using (*) we obtain T(D) = = [Vx Z* o Zg (x) r; x E S(AA')] (**) But T(D) - A* - S(AA') -' ~Z* (AA')= Hence by (**) * = S(A,AI) =-,[Vx e V*o gg (x) < r, x e S(A,A')] From Lemma 5,1 and the above equivalence A = AI = —['Vx E Z*: Qga (x) < r; x c S(AA')]o Q A Ea D, The converse to Theorem 5~4 does not holdo Even though the rank of Rs is infinite, there might be a finite string which shows that A -T A' does not hold, Let us now seek properties of pairs of machines which cause the rank of Rs(AAB) to be finiteA 5 o5 EXPERIMENTS WITH M1ACHIINES Let us formalize what is usually meant by experiments with

76 a machine to make later definitions more precise, Definition S S: An experiment on the r machines A (1) A(r) using the input set T = {x1,. xk, X, } written T(A(), A (r) X) is a table: X1 X2 o N ~ Xk o.* AG) b1 1 1 1 2 k A(2) b2 b2 bk T 3. 2 k A (r) b b, b 1 2 k where bs = 1 if I) A (x )F(S) > X x T = 0 if I(S)A(S) s< if(x.)F < X s = 1 o, r An experiment is finite if T is finite, Definition 5 6~ A simultaneous experiment on the machines A(1), A(r) using the input sets of tapes T1, o, Tm, written r x T xr. (A(~1) A(r) T1 X T2 x a x T0(A ). A( ) is a table of m experiments T(A1) 0 A (r) X) In what follows we wil not make use of the concept of simultaneous In what follows we will not make use of the concept of simultaneous experiment, but have included it here for the sake of generality, In particular, we will consider an experiment with two tapes xz and yz on the two machines A and A9 which will be written according to the notation of Definition 5o5 as {xzyz}(AA99X)0

77 5.6 THE MINORITY PROPERTY Definition 5~,7 A machine A is in the minority for an experiment {xz,yz}(A,A',A) if the experiment yields xz yz xz yz A-o 0 A: 1 1 or A'o 0 0 Ag: 0 0 or else a complement event of the above occurs~ xz yz xz yz A~ 1 0 A: 0 1 or AR 1 1 At: 1 1 5,o7 THE BALANCE PROPERTY A property of two machines will be defined which leads to a finite test for =T Definition So8: Machines A and A' have the balance property if for all x e Z* and those y C Z* such that vx RS(A,A') y either: (i) There are strings z and z' such that A is in the minority for the experiment {xz,yz}(A,A',X) and A' is in the minority for the experiment {xzl yz'}(A,A' X). or else (ii) There is an experiment {xz2,yz2 }(AA' ) which yields xz2 yz2 A: 0 1 Ag: 1 0 or the complement xz2 YZ2 A: 1 0 A': 0 1

7 8 Not, that. Kioditi o (' i i spec..fically rules out the case when only one machine ever is in the minority; ibe,, there are at least two distinct zVs such that xz C S(AA',),..; yz E S(AA9)>, Condition (ii) allows just one Zk to cause A and Ai to accept one continuation and reject the other for both x and y, Example 5ol' Machines A and Al which have the balance property, Suppose we consider a mod 2 counter and a mod 3 counter connected together by means of a 3 state buffer, Two of the states of the buffer serve as a channel for carrying information between the counters. However9 one of the states in the buffer is a sink, Each time the buffer is entered or left there is probability c; that the machine will enter the sink state which has output equal to the cutpoint, Consequently we view the entry to *he sink as a breakdown of the machine which causes it to accept all subsequent tapes, In general the cutpoints for the machines will not be isolated, Figure 5t1 shows the machines A and Al which are nearly isomorphic as far as state transitions are concerned, The 0 symbol of the counters and most other loops have been deleted from the diagram, A and A' a.re each started in an initial state rather than a distri= bution over.- the states, so that input strings cause distributions with only two positive components One component will be the probability of being in the sink or absorbing state and the other (initially much larger) will be the probability of being in some other state, Tape acceptance depends only on this:lcater state since we will assume FPsink) = 0 and the cutpoint X = 0,, Hence in this case it is sufficient to consider all pairs of s': tcs rather than all pairs of accessible distributib - in an experiment to find out whether the balance property holds, Witi a minor change in the notation for an experimenta to point out this simpl:fi. Nations we summarize a sequence of experiments in Table 51

79 (b) A':1 DU 1 111 I_ 2 1I (l1C) () outputs greater than c- equal to the cutpoint; states on the I:; Figure Sl" Transition diagrams',:-:~ A?~:..~., a and Al which ha4Vc the balance property, St,.cs left oi: the center line have outputs greater than cr equal to the cutpoint; states on the right have outputs lss than the cutpointt,ost. loons are omitted from the aboJe graphs to simplify them; hence if~ the graph leaves the transition of a state undefined for a particular symbol the reader may assume a loop has been deleted. The symbols are read as~ Ul-uwT D —down 1 —one, Note that "up" ihas a different orientation for the two machines:

80 Table 5ol A sequence of experiments which shows that the balance property holds for A and A,, The strings x and y of the definition of the balance propcrty are replaced by states (which are the responses to the inuts x and y other than the sink state), Accepted Strings' Machine states Z _ _ _ A I DDI 1 1 U U (1) 1 ] 1 (2) 0 1 E 1~ ") -1 __ _ -X __'=.0. A, (1 ) i 1' O (2w) 1 t 1 A 1DD 1 1U IJ 1UU (1) 1 1 (2) 0. 1 2 A (1') 1 i 1 (3") 1 0 o A D D 1 U 1 I U (1) 1 O 1 (2) 0 1 E 0 A) (20) 1' O (3) 1 1 A (3) using substrings of using same strings those for (1) and (2) as for (1) and (2) or with the initial (4) i symbol removed A' Any accepted states Rejected strings; Z A 1 LU I D 1 U U (6 o i (7) 1 ] Au (7v) 0 1 ~~~~~(8~ w~~~) (' 0 A U D D 1 1 UU (6) 1 1 (8) 0 1 A6 (7') 0 0 0 (8~) 0 1 M.L~Ic.I~~~.ldI~m ~ ze~~~U I*l-EcnwrCrr~~-Ci B=) Z -CC.W?~~Bj5. wC~~~l ts

81 5,8 THE CONCEPT OF STABILITY NEAR THE CUTPOINT As we have seen above, it is a straightforward task to find a finite bound on the length of experiments needed to determine whether two machines are equivalent by -T when the machines both have isolated cutpoints, There is an apparently large class of machines which may have non-isolated cutpoints for which a bound can be found, Although the cutpoints are not necessarily isolated for this class of machines, special properties will be required of them. Definition 5,9: Machines A and A' are stable when closer than 6 to the cut oint X if for any x in X*: Either EA(x) or EA, (x) in the open interval (X-66,X+6) x c T(A,X) x c T(A',X). When Definition 5.9 holds, we sometimes will just say that "A and Al are stable near the cutpoint" with the 6 understood in context, Clearly isolated cutpoint machines are special cases of those machines stable close to the cutpoint. But what properties of isolated cutpoint machines would we expect machines stable close to the cutpoint to have? In general it is not true that machines stable near the cutpoint are tape equivalent to deterministic machines, nor even that there is a finite set of invariants for =T' However, for the class of machines stable near the cutpoint and with the balance property, there is a bound on experimentation for ET analogous to the isolated cutpoint case. ExamIle 5,2: A class of machines stable near the cutpoint but with non-isolated cutpoints. We shall assume that the state descriptions of machines M and MI can be decomposed into two parts B1 and B2 and BI and B' respectively

82 0 B while in 2 B (PR) 0 M(R) = 0 SO 1. 0 0 o 0o The analogous situation holds for MIO Figure 5o2 shows the state decomposition' / / Min i all B ~ Z M~ c 1 2 SO.i Sl S' in 0 2 Figure 5,2 Machines of Example 5,20 Special properties will be imposed on the submachines so that M1 and Mi~ will be stable when near the cutpointo Any state s of a probabilis tic sequential machine has an output weight PsO However0 this fact

83 will be ignored temporarily in order to define a particular property of M and MIt, (i) Let Sin and Sin be the initial states of the machines under consideration, (ii) Let B1 and BI be deterministic~ (iii) The only way state transitions can occur from B1 to B2 is via state SO to S2o Likewise, the only way state transitions can occur from B` to B2 is from SI to S2I We will require B2 and B2 to be started simultaneously in what follows, To obtain simultaneous arrivals at S2 and SI29 we ignore the outputs and regard B1 and B; as Rabin=Scott automata with final states SO and SO respectively: Then we need only T(B1) = T(Bj) Note that we are not requiring that T(B19A) = T(BifX) because we have not used the normal outputst (iv) The submachines B2 and B2 satisfy EB (x) = Ec C(X) for all x not containing an R That is, B; B'2 for the alnThalet {R (R. Furthermore, B2 and B 2 are selected so that cutpoint X is not isolated, (v) Lastly, the entries of F,I and FM1 have the following property: There is a 6 > 0 so that Fo > A + 6 or Fi < X 6 where Si is in B1 F} >;A + 6 or F? < A 6 where S? is in BO 1a l 1 A - 6! Fi < - 6 all Si in B:

84 6 8 F. < X - 6 all S' in Bf 1 1 2 Whenever the state activity of IH is in B1 the state activity of M' is in BK The expectations for the submachines B1 and B' are just the weights attached to the states since these submachines are deter. ministic; hence E (X) i (Xs X+6) EMle(X) (X-6X+6) The only other possible mode of operation is for the state activity of H to be in B2 and the state activity of M' to be in B2 for which we have: (X-6) < E. (X) = EM (x) (X+6) Hence M and W19 are stable when close to the non-isolated cutpoint X, More elaborate examples can be constructed by weakening the acove restrictions, In particular (ii) and (iii) can be modified easily, 5~9 A FINITE TEST FOR =T FOR MIACHINES STABLE NEAR THE CUTPOINT AND WHIICH HAVE THlE BALANCE PROPERTY Theorem 5,5 Let A and Al be probabilistic sequential machines having nA and nA, states respectivelye If A and Al are stable near the cutpoint X (when closer than 6) and have the balance property then A -T A' if Vx C (Z)P [x c T(AX)= x e T(A' X)] where d nA-1 d' nA, 1 (1 +. ~ (1 + and d = (max(Fi) - min(Fi)) d) an i i d -(max(F') - min(F')) i 1 Proof: Let us pick a set of representatives xl D oo xr, o, from

85 the congruence classes of the stability congruence relation RS, Then *x. R x. for i f j i = 1, 2, d I. S 3 Definition 5o10o Given a representative x. of a congruence class of Rs(A,A'), the pair of distributions (IA(xi),I'A'(xi)) will be called a canonical distribution pair Of course, there is at least one canonical distribution pair for each representative of a class of Rso Subject to the hypothesis of the theorem, it will be shown that there are only a finite number of canonical pairs0 The definition of the balance property (Definition 5,8) leads us to consider two cases, Case (i): Suppose part (i) of Definition 5,8 is satisfied, Then for any representatives of the classes of Rs, xi and x;, there are strings z and z" such that A is in the minority for the experiment {xiz,xjz}(A,A?,X) and Al is in the minority for the experiment {xiz',xjz'}(AA',X), Without loss of generality, assume that xiz g S(A,A') in the experiment in which A is in the minority~ The expectations EA(Xiz) and EA, (XiZ) must lie on opposite sides of the interval (X-6,X+6) since A and A' are stable near the cutpoint. Furthermore, EA(x.z) must lie on the opposite side of X from EA(xiz) because A is in the minorityo The value of EA(xjz) may approach X but the value of EA(xiz) must be no closer than 6 to X. Hence IEA(XiZ) - EA(Xij ) 6 which gives by definition IIoA(xiz)oF - IoA(xjz)oFI Z 6 Using the property of machines jijA(xi)oA(z)oF - IoA(xj).A(z)~FI > 6 (5)

86 a similar argument gives~ IIHAR(xi)A9(z9)F? - IA' (xj)A'(z')FT'I 6 (6) Case (ii): Without loss of generality, assume that IA(xiz2) AXo Then we must have by definition IA(xjz2)F < A, IA'I(xiz2)FI < A and I'A (xjz2)Fp Z> So Since A and A' are stable near the cutpoint, all expectations above are outside the interval (X-6,A+6), Hence IIA(xiz2)F - IA(xjz2)FI 26 > 6 IIAu(xiz2)9F - I'A'(xjz2)FPI F > 26 > 6 (6) Consequently. either (5) and (6) hold or (5)9 and (6)" hold, Letting el = IA(xi) ej = IA(xj) and z..ij = z or z we observe that Theorem 5.1 applies and the maximal number of distinct distributions IA(xi) of A is nai1 dA ~ (1 + k A kA and letting ei = I'A (xi) e = I A (xj) and zij =z or z 2 Using Theorem 5,1 we get nA Consequently the number of distinct canonical distribution pairs is finite and equal to dAdA, o We note that dAd, d < k Ak, To complete the proof it will be shown that xi and x. distinct representatives implies that (IA(xi)9 IPA'(xi))

87 and (IA(xji),I'A (x)) are distinct pairs of distributionso Assume to the contrary that IA(xi) = IA(xj) and I'A'(xi) = IA'(xj)o Multiplying the former equality by A(z)F and the latter by A'(z)F1 for an arbitrary z gives: IoA(xi)0A(z)F = IA(j.)oA(z)F IA' (xi) 0A' (z)F' = I'A' (xj) oA' (z)F' Since A and A' are machines we obtain IA(xiz)F = IA(xjz)F I'A'(xiz)F' = I'A'(xjz)F' Suppose IA(xiz)F > I'A'(xiz)F, > Substituting equals for equals we get IA(xjz)3F > X IA'(xjz)F' > X Which means we have shown that [IA(xiz)F > X = =I'A'(xiz)F3 > X] = [IA(xjz)F > X=I'A' (xjz)F' > X] Using Definition 5,3 we get Xiz ~ S(A,A"- =- x.z e S(A,A') (*) Reversing the roles of xi and x. in the previous argument gives the converse of (*) so we have for an arbitrary z x z ~ S(A,A')=1) x z e S(A,Al) Hence by Definition 5~4 Xi Rs(A,A')xj which is a contradiction. Consequently there are dA.dA, classes of R (A,A'), Theorem 5.4 applies: A =T A' == [Vx (Z)dA dA' x e S(A,A')] But dA dA < AkA k and transitivity of implication yields the theorem~ QG Eo Do

88 5 L0 CONCLUISIONS If the properties of being stable near the cutpoint and having the balance rroperty imlply that a machine has an isolated cutpoint, then the bound of Theorem 5.5 is not as good as the bound of Theorem 5,3, If this were the case, instead of a finite set of iT:variants for for certain probabilistic machines, we would have obtained a very peculiar behavioral characterization of finite deterministic machines, However the great richness of the class of probabilistic machines leads the author to believe that there probably are many such probabilistic machines whose cutpoints are non-isolated. The lack of an adequate characterization of isolated cutpoint machines adds to the difficulty of proving whether the class of machines of Part I is property contained in the class of Part II,

CHAPTER 6 STABI3ITY PROBLEMS FOR I AND E STA -N -E 6.1 ENUMERATION OF TIE BEHIAVIORAL EQIUIVALENCE CLASS OF A MACHINE In general a machine may be behaviorally equivalent to a machine with fewer states," with the sanme number of statest or with more states, Some machines behaviorally equivalent to machines with fewer states will be discussed in Chapter 8,1 The systematic expansion of a machine into behaviorally equivalent machines with more states does not. appear in the literature of automata theory, Although the expansions of a machine are relevant to certain problems such as the decomposition of machines9 they will not be discussed here,. This chapter and Chapter 7 will deal with the remaining case, machines behaviorally equivalent to a given machine and having the same number of states as the given machine, In order to a-void large differences between machines masked by the initial distributions, only those eqluivalences defined for all initial states will be considered, To justify this strong restriction consider digital computers to be probabilistic machines, An automaton state model may compress all the intricacies of a stored program int-z a single initial state, eCgE,, different stored programs correspond to different initial states. Using such a model and defining behavioral equivalence for two machines from particular initial states makes heterogeneous kinds of computers behaviorally equivalents For exanmpTe, a computer could be in a behavior class with all other computers for which the.re are stored programs to do some particular computation. But behavioral equivalence from any initial states requires that the behavior class conlains ony those machines which have sets of stored peograms making them inf erchaangeab e0

90 6c2 TIHE GENERAL STABILITY PROBLEM Our goal in this section is to study changes in the switching of a machine which do not change the behavior class of the machine. Therefore the output vector F will be kept fixed, Usually we will think of A as a satisfactory machine and Al as the machine which A becomes after some permanent perturbation in switching, Of interest are those A' which are stable changes, ie,, behaviorally equivalent to A, Definition 6ol Al is a =-stable Perturbation of A if (i) A and Al have the same number of states,o (ii) F F', ie. A and Al have identical output vectors, (iii) A - Al for all I C S (where - is a machine equivalence). Rabin [1964] formulated a stability problem for =To However, the Rabin stability problem requires equivalence from fixed initial states rather than all states, Very few results concerning this problem have appeared in the literature since it was published more than two years ago, Progress has been made by Paz [1964] for certain special cases, Some of the results of Chapter 7 apply to the Rabin problemo Some of the general properties of the transformations T which map the machine A into a stable perturbation Al will be studied, In particular, those transformations T which map the monoid <A(Z*)o~> into the monoid (AI(E*).9> such that A = Al for some machine equivalence - will be called stabilit transformations When a stability transformation T is applied to symbol matrices, we obtain by definition~ T(A(i)) = A'(i) i C Z Let x = i 1 oo i i E j = 1, 2, oo, p 1 p

91 But A'(x) j A (il A" (i2), A (ip) since Al is a machine, I-ence T must be a monoid homorphism preserving the monoid operation matrix multiplicationt TCA(il:. ip) - T(A(il),Ai )) T (A (iT))T (A ri2))( t O T (A(in)) Theorem 6.1 The most general form of stability transformation is T'(A(x)) - A(xl + E x ~ s* whlere E is an n x n "error matrix" with (E) 0 and X, IEk=l x ik I A (X)ik + (Ex)ik 0 Proof~ Given a machine A and any stable perturbation An M T(A)= start each machine -n the states S1 (1 O0, 0) S = C0. O0 1) 1 (n Let Sl: A(x) - SI(TA(x)) ~(ellx n) x x S A(x) - S (TA(x)) -(e l e ) n n'n11 n " C. nnl Any initial state I is a linear combination o. the above basis state vectors, Let us call EB i- ex I which is uniquely determined= The error pattern for distributions over the states from any initial distribution I can be obtained by I(-E ) = IA(x) - IT(A(x)), Both T(A(x)) and A(x) are stoc'hastic, so summing over the rows r [T(A(x))]ij = [A(x)]ij + [E ij j~~L~ j-ij 1.e I-l~2:~X 1 1+'~'.

92 n n i-l j Since 1 > [A'(x)]iJ = A(x)ij + ex > ( the other part is established. Q. Ee D, Computation of the error matrix for a string x from the error matrices of the symbols of x is given by the next result. Theorem 6o2 Let T. A —->A be a stability transformation and let x = il1ik i* E; j = 1,.., k k E = n (A(i.) + E. ) A(x) x j J j=l j Proof: E = A (x) A(x) = A^(ilo oik) - A(x) = Av(il)o. oA(ik) - A(x) k = n (A(i.) + E ) A(x) j=l J j Corollary 6,2 Let T: A —->A be a stability transformationo Then the error matrix for xa is specified by the error matrices for x and a by E = ExA(a) + A(x)Ea + E Ea x C.* a E Z The previous two theorems taken together tell us that the most general form of stability transformation is uniquely determined by the error matrices of the symbols, Error matrices for some of the equivalences defined in previous chapters will be characterized0

93 6, 3 ENUMERATION OF STABLE MIAC;HINES FORP =I AND N' -NAN I.ot us first consider indistinguishability rI As one might expect, those machines which are indistinguishable from every initial state have closely related symbol matrices. Theorem 6,3 A' is a =-Istable perturbation of A iff there exists a subspace 11 VC Kern, such that (i) A' (y/a) = A(y/a) + It(y/) where (H y/))i V, i = 1 2 6 n (ii) V A(y/a)C V for all y a Y, a c Proof:O (necessity) i,/( S A(y/) = S, A (y/a) y c Y, a E C i = 1, 2 n lten ce A(y/a) ( ) = A (y/a) y Ec Y, a z Z The solution of the above equation is a particular solution plus a kernelo A (y/a) = A(y/a) + Hi(y/) where H(y/a) 1) =( For strings of length 2 we have A(y1y2/ala2) = A( 2/L2) / y1y2 c 1 2 Hence A9 (y1Y2/ala2) = A(yY2/O1y2) l2+ 12

94 where ~12 i I~o/ 11 (:) But \'iY 1y 2ji 1r 2) i [Ay (Y/)+ (t(yl/l)][ A(y2/02) t+(y2/o2)1 + I1(/) ~A(yl/ + 11' (Y1 / 1 2 (Y) /a 1) 2 where H' A(Yl/Ol)H/ + H yl/ HI /a 2 702) (Y1/c1) (y2/o2) Hence 112 ) (yl/l)A(Y2/02) I' H! (y /a g(Y21C-93 i eY1Y2 E Y a1" 2 1? 0 H H(/ /o )A(Y2/~'2)' K) Let y -,y1r y Y Y be any sequence o0i: outputs and x x r A xy/x) = I [A (y1 /x) + Hi /1) + -A i e v Y il Aiy/x). A~l/xel)H(y/x)l/+lHA(/2/2)AeYl2 Y~4 r/12X "X

95 - A(y/x) + H' where (H9)ic Kernj) i Hence A'(y/x)( = A(y/x)(if By substituting A(x) for A(y/x) and Kern,(Fr), r = 1,.., N, for Kern ( in the proof of Theorem 6.3, a proof (similar to the proof of Theorem 2Q5) of an analogous result is obtained for -N~ Table 6 1 summarizes these resultsd Stability Class Condition on Error Matrices I A'(y/) = A(y/) + H(y/) V II i = 1, 2, oo,p n, ycY, 5cE}> A"(y/ CT) A~/ (y/) i V oA(y/O) V C Kern.. -N A'(o) = A(o) + H V D { () i = 1 2,., n; o N V A(a) C V C Q)Kern( (Pr) r=l E'() = A(a) + 1i V O <{(H): i = 1, 2,., n; V A(a) c V c Kern.(P) Table 6.1. Characterization of the error matrices for =Is -N= and =E We take the following example from the literature of decomposition

96 of probabilistic machines5. Bacon [1964] has a notion of decomposition whlich i.s rejlate( to the eror mnatrIices of-I The following example is due to Bacon but is expressed in our notation, J = <4, B {Ul' u2}~ M(Ul), M(u2), Ff1 0M > whiere FF' F3 F3 mI(ul) = / 2 2 3 3 m(u = 3 0,7 0 0 c6 (0 e3 0,7,4 61 A4 41 \08 o12 032 r48 >225 5 225:25 4,25 06 14 624;56 Hence by definition Y = {F1,F3} M(F1/u1) = | 2 0 0\ 0M(F3/ul) = 0. 3,3 O,4 o 0) 0 0 0,6.4 1 0 0 4 61 >25 r25 0 0 \O 0 625 625 M(F1/u1) / 3 0 0 0 \ m(F3/u2) = /O 0 7 0 \,0 3 0 0 10 0 0 7 0o8. 12 0 0! 0 32 A48 \, 06,14 0 0/ \ 0 0 24 56 The subspace <{(1, 1, 0( 0), (0, 0, 1, 1)} > = V is invariant under all M(FP/uj) i 1, 2; j = 1, 2, since: (1 41 0, 0 O) l1(Pl/u1) =.2 (19 -1, 0, 0) (0, 0 1, ) M(F1/ul) = 15(1, -1, 0, 0) (1, 1. 0, o) M,(F3/ul) =,3 (o, 0, 1, -1) o(0 O, 1, -) (F3/u1 ) -= 15(o, 0, 1 -1) (1, -1, O, O) M(F1/u2) =.3 (1, -1, O, 0) (0, O 9 1, 1) M(F1/U2):,15(1, 41 b O 0) 1, 0, O) M(F3/u2) = 67 (0, 0,: 1,.1) (0, O l, 1 1) M(F3/2) = o08(0o 0, 1, -1)

97 Furthermore we note that V C Kern, (F) (1 l, 0, O) F\ F1 I;= 0 F3 (D, 09 1I 1) IF!. F\ 3 There seems to be a close relationship between Bacon's definition of lumpability of a matrix and ours of error matrices of a matrix, We shall take up this matter again in Chapter 10, 6~4 INVARIANT ERROR MATRICES Table 6,1 shows that the rovws of each error matrix come from a particular invariant subspace. In the next section, some general results will be proved about such error matrices, Hence we make the following definition in order to discuss error matrices without considering the kernel in which the invariant subspace is contained, Definition 6~2: H is an invariant error matrix for symbol a of machine A if there is a stochastic matrix A' (a) such that (i) A'(a) = A(a) + Ha (ii) There is a subspace V such that V A(a) C V Vo Ce and (fHi)i c V i = 1, 2, oo, n If there is some symbol a' such that H, 0 then there is a machine Al with symbol matrices {A9(a) ~a C C } such that A' A A. The simplest invariant subspaces are those of dimension one0 Mlost of tae results oaf tehis section will deal vith such ione dimensional int variant subspaces, Recall that a vector v is an cigenvector of matrix

98 A with eigenvalue A if voA = Xvo It is clear that a one dimensional subspace invariant under a linear transformation is just all the multiples of an eigenvector of the transformation, Definition 6035 H is an eigenerror matrix for s a of machine A if Definition 602 holds with the subspace V of part (ii) being one dimensional. Some symbol matrices give rise to non-zero eigenerrors while other symbol matrices have only the trivial zero eigenerror. A general theory which characterizes the existence or nonexistence of nontrivial eigenerrors from elementary properties of the symbol matrices does not exist. The next few results deal with certain types of machines which help to show the scope required of a general theory, Theorem 6a4 Probabilistic sequential machine A has nontrivial eigenerror matrices for every symbol a if there is a vector v $ 0 such that (i) vA(a) = X v Vo C n (ii) V. = O (iii) There exist constants K(i,ao) not all equal to zero for any a such that 1 > A(a)ij + K i,o) vj > 0 Proof: Construct II K)v l Ktn,a) Aa)>i + (Ho') ij = A(a)ij AU(a) is a stochastic matrix because of (ii) and (iii) above~

99 Furthermore - X 1 ~ I({1~2)v (lal 2) H oA(a1) = l all a2,a1 C E Xl i K(n,a2)v ioe,s (Ha )iEV = <v> and [(H )A(al)]iE i = 1, 2, ),. n a2 a. Cr2 Consequently, by (iii) H f 0 for any a, All H a e Z are invariant error matrices and are one dimensional by construction0 Q, E. D. Theorem 6.4 provides a simple method for constructing stable perturbations of some machines, Indeed, it was this result which was used to construct most of the examples of Chapter 1 and Chapter 2, Cycles are one of the most important features of the state behavior of machine, There is a relationship for deterministic machines between cycles in the state behavior and invariant subspaces of the transition matrices, Hence machines with cycles will now be considered, The next theorem, with the aid of the following lemma, shows that the only eigen. error of a symbol which causes a deterministic cycle of all the states is the trivial zero eigenerror, On the other hand, Example 6.2 shows that symbols causing deterministic cycles which do not involve all the states may have nontrivial eigenerrors, Lemma 601~ Let P be an n x n permutation matrix of the cyclic permutation 1 of the indices of the states. The only real eigenvectors of P are v = (v1,.oo, v1) if Xp = +1 and v = (v1 v1, ea v1] -v1) if Xp = -1 and n is even, Proof: Since P is a permutation matrix, the real eigenvalues of

100 P are'1 and -4. Case (i) A P +1 v(P) = v r = 1, 20 oo, n+l which gives v. V(i) = V(i) o3 - Vni) i = 1l 2~, o0o n For each k = 1 29,o, n there is some j such that nj(i) = k hence v1 v 2 v Case (ii) Ap = -I Vpr ( ), vr r= 1 2g, o~, n+l i s, vl-v,i) v (in): vno ( i) o()n_ V- o V n = )v. If n is odd we get v. = -v 0 But if n is even v = (v,) -v1,, Va, -v1) Theorem 6,5 Let A be a machine with 3 states or more such that there is a symbol a so that A(o) is a cyclic permutation of all the states, Then A a) has no (nontrivial) eigenerror matrix., ProofQ. Suppose to the contrary t at A(ao) has eige>'raror rat.ix }t3 The rows of H must be multiples of some e:.:,. n,::r, L,.r', ieL, I. O / v ~ I d Iv In order for H to be nonzero9 we require some di 0 and v (0,o0) Since (Ho)i is invariant under each synnbol matrix, it is also invariant under A(a) EH )iA(a) = di(vA(a)) = d o\v O11.

101 That is, v is an eigenvector of A(o), But by Lemma 6.1, the only nontrivial real eigenvectors of A(a) are v = (Vld V1) if X = 1 v = (V1-V1,$,~*v10lvI) if Xa= -1 and n is even. The vector v can not serve as an eigenerror since for constant c Zcv. = cnvl A O for vl O j=l ] The vector di,v' has n/2 negative components if vl Z 0. Hence adding this nonzerco vector to row i' of A(c) causes at least n/2-1 negative entries in the corresponding row of A'(a) so that A'(O) is not stochastic, which is a contradiction~ Qo Eo Do A symbol which causes cycles whose lengths are shorter than the number of states may possess non-trivial eigenerrors, O 0 1 Let A(K) = 0 1 0 I 0 0 O 1 =, 2 = 1, X3 = -1 Let the subspace associated with 1 be U1, with -1 be U 1. U1 = {V' v = (v1,-v2+V 1)} U 1 = {v V (V1, 0,o-1)} For rows 1 and 3 there are no eigenerrors from the space UlO However, there can be eigenerrors from the space U1~

102.L > ta 1 1 p1 ~~ i3 by dcfrei: ~ son a~" z:<l'1:1.:. 2'.'.. i:[0-C 3 a. fl eisc6tll-'r-;;':f o.:-OW 3 for; < c 1. i~,~.:,.?..:I'' -...' is an ei;cnerror nattern for row', I wIvw:v. U1 - 0, 0)} at.; t- J e case in general for the spaces; associ ated.d it eigenvectors of different ei enwtvalues. tience S;; cans h-ave eicenr'ror natterns of either A'(K) - A(K) + n 0 0 1 or 0 o oi A'(K) K) - 2 1P2 2> n > 0 Netc Chat th,. fi.x:d point S, has the eigcenerror pattec-n o-' therI oi.c,'z n: o >r;a.! ue!t it i. pw:sibic'h:It th is is true in Teeral, but:ceitk h:- eneral proof nor- a counter-examnle has been found. 6,v ifs.'~. S I xIY I I "TfN'LSVOW R I" \INS WC RTR IIRS`iE TIII. BEIIAEHVIOR LF EO lIiENSTATES SupF-;s. T T.",- A' is a sta, i"ity traln.formation which maps the ito atic ne vctot v ~ A( it v. i.e., v T(A(c-)). v' T'lih'; thii machin.te A't stays i, di.sti.t.tiltint).) for input seouenw c$ o',r;-, 2,.,, macchin Ak' will tnvra l t,rough a trajectory of' distributicrns v (TA;I) ), r 1,,,,. iianc; only for certain typ;es of eCquii.l-.ce:I lc. It'W:,eCCn machines can At -nrd A' be equivalent; i.e., the bcitavior of t'le trajectory vA' (or):: = 1, 2,., must be cquivalent to that of the distribution v\ Let tls formalize this concept, C

103 Definition 6A4: A stochastic vector v is an eigenstate for input a of machine A if v A(a) = v for some a c E Definition 6.5: The machine A started in initial distribution I will be written IA>, Definition 6,6: A stability transformation T: A -k A' is eigenstate behavior preserving for behavioral equivalence if for all eigenstates v of A <va, A'> E <Va A'(ar), A'> r = 1, 2, Theorem 6 o 6 If any stability transformation T: A — A' preserves N' ie. T(A(a)) = A' (a) = A(a) + Ht a C z where H = {(H).i i = 1, 2.., n: Ta C Z} and there is a space V such that N k H C V C (Kern,(F ) k=l V*A(a) C V then T is eigenstate behavior preserving for =N Proof:va A(a) = v. v A(a)r = v0. v A(ar) By Table 60l and Theorem 6,2 vR A'(orx) r v [A(arx) + l(orx)] where.[t(arx)]' Vi V= 1, 2, 0.4, n = v A(orx) + V.11 r a a (arx)

104 multiplying by (Fji % j < k i NR gives v A (rx) (Fj) = v oA(orX)(Fj) + v (I (rx) (F)) since II C V c ('Kerno (F') k=l = v( (orx) (Fj) = (VC A(or))A(x) (Pj) v.- A()(7j) Sin A k(X) i; (x) by Table 611, we get for any j.< k = v A'l (x) (Fj) we have (v AP (oT ))A (x)() v A' (x) (P) r = 1, 2, j < k < N which gives <v A(aor), A'> v, A> Q^ E. Do Theorem e67 If c.1y -;t.-:bility transformation T ~, - A' preserves -I, i6~ T(A(y/o)) = A' (y/a) A(y/a) + H(y/) a y C Y where H = <{{(11y/ ))i:i i= 1 2, o0, n; Vo E Z, Vy E Y > and there is a space V such that IiC V C Kernj i V A(y/a) C V Then 1 is e'genstate behavior preserving for rI~ Proof: Supnnose v A(e) = v Let y range over all output sequences of length r; ioeo y C (y)r

105 v A I(yw/oax) = v [A(yw/oarx) + =H- for all w c y*, x E * and some H' a a depending on yw and arxo Summing over each side 211 v A'(yw/aorx) = r[v A(yw/arx) + v HI] yE(Y) r cy VC r a y~(Y)r yo(Y) y. va A (y/ar)A (w/x) = [v A(y/ar)A(w/x) + v H'] yc yr oF y(y)r o Noting that A(y/or) = A(a r) yc(y)r since if y = YpYi Yi c Y yp E (Y)rl then >1 A(YpYi/ar) = 1 A(yp/rai) A(Yi/a) YpYi (Y)r yp (Y)rl Y.Y = /, A(p/ a )A(a) yp (y)r=l P Continuing the process (or using a formal induction) we get the result, HIence we obtain (va A'(or))A'(w/x) = va A(or)A(w/x) + v H" where H" = 21 H~ yc(y)r - V A(a)rA(w/x) + v H" = v A(w/x) + v H" Ila t a a multiplying by 1( and noting that H is closed under addition we get 0Ar (ar))Ar (w/x)(| = v A(w/x)( ) +v V10 |( a~~~~~~~~~~~~~~~~~~

106 = vc A(w/x):) 1By dcfers. n:.t. o (.,W/ tP (v:/x)'v,,\ (or)) (w/Ax).,(- /x But the trwinsforentinl T is a stability transformation for any initial state.- which here implies an)y initial distribution. A' A P (w/x) = PA (w/x) V V So by transitivity of equality we get p.AI P(v - A~ (r)) \l (wlx) wit ch means <V A'(.'), ^A'- <v, A'> whi~ch shows tr+.-t tfhal stability transformat'ir T' Ti s eig"enst>-te tc i:; Preserx~ n, 1.,,e Eo I)o

CHAPTER 7 ON THE STABILITY PROBLEM FOR TAPE EQUIVALENCE - T 7 1 INTRODUCTION Suppose malfunctions cause permanent changes in the symbo] matrices of a machine AD producing a machine Al where AQ (a) = A(o) + Ea a s Z As in Chapter 69 E. will be called the "'error matrix" or "Ierror pattern" for the symbol a, The stability problem for =T considered here involves characterizing those Ea so that for any initial state distribu= tion9 the set of tapes accepted is unchangedo Results which are true for any initial state are certainly true for particular initial states; hence the results of this chapter imply certain facts relevant to the Rabin problem, Usually such facts will be presented as corollaries0 Definition 7o01 Al is a X=stable erturbation of A if Definition 661 holds for - and A = 90o Although we are interested in X-stable perturbations of a machine, when a result holds for unequal cutpoints, i0e,, X A X A it will be proved for the general cutpoint case. 7,2 FARKAS' LE-MBiA AND TAPE ACCEPTANCE FOR NONSINGULAR MACHINES Definition 7,2: A string x of inputs to a probabilistic snquential machine A is backward deterministic if knowing x begins at time toZg(x) and the distribution at time t9 then the distribution at time totg(x) can be deducedo The notion of backward determinism has been discussed by Burks [1]957] for the deterministi c case0 This definition is equivalent to requiring the matrices of the symbols which occur in x to be nonsingular1 107

108 If all a C E are backward determlinistic, the machine A will be called'backward deterministiic or more frequently "nonsingularl"0 Farkas' lemma is widely used in the study of convex sets and linear programming0 For backward deterministic input strings for probabilistic sequential machines; it gives a necessary and sufficient condition that they will be accepted, A nonhomogeneous form of Farkas' lemma (Thrall and Tornheim, Vector Saces and Matrices. pp 291l292, Theorems 11.4 C and D) will be used, The theorem will be quoted from Thrall and Tornheim and then restricted to our application using notation appropriate to the stability problem. "Theorem 1194 D, Let A be an m-by-n matrix, let 6 be a vector in Vc m (m-dimensional column space), let k be a scalar, and suppose that there is at least one vector' in Vn (n-dimensional row space) such that AX 6 (1) Then a vector a in Vr will satisfy the condition a' z k for all' that satisfy (1) if and only if there exists a vector y > 0 in Vn such that a = yA and y6 8 k.," To apply Farkas' lemma to probabilistic machines, the above notation will be changed and simplifications made, Restricing a to the set of n-dimensional stochastic vectors, we replace it by I, F replaces 6 while X replaces ko Only matrices A whose inverses are stochastic matrices describing the state transitions of a machine will be considered, hence A is replaced by A(x)~lo Note that A(x)-l1p > F has the solution' = A(x)F which permits simplification of the statement of the lemma. In addition, y = IA(x) is stochastic since I and A(x) are stochastic~ Farkas' lemma can now be restated about nonsingular switching matrices of probabilistic

109 machines which gives a necessary and sufficient condition for tape acceptance o Theorem 7o 1 Let A(x) be an n-by~n nonsingular stochastic matrix, Let F and ~ be n-dimensional column vectors, let X be a scalar (such that X < max(Fi) to avoid trivial cases), Then an n-dimensional stochastic vector I will satisfy the condition I >~ X for all ~ such that A(x) + > F if and only if there exists a stochastic vector y such that y = IA(x) and IA(x) F _ X, The theorem can be restated in the following symbolic form: Is En{I: Iq > X} ) IA(x)F > X z < x c T(A,X)'.cK(A(x)1i 17) where K(A(x) PU) = {~~: A(x) 1 > P:} which is a convex set, Examle 7oi Thle convex set K(A(x) PF) Let A(x) - 1/2 1/2 0 F F 0 1/2 1/2 F2 0 0 1 F3 A\(x) = 2:2 1 0 2 41 0 0 1 K[A(x),l p] = all (10 m 3) such that 2~~1 - 2o 2 + 43 > F1 {3 > F3

110 Solving the above ineqt alit iLe; gi.ves> F1 + F2 F2 + F3 ~2 2 3 F3 Fi.-',re'71 shows K(A(x) -F) for this case. 2 3 F1+F2 F2 3 igulte 7, 1 K(A\(x),F) of lixample 741 is the positive orthant withl the rectangular solid removed, 7.3 A NECESSARY AND SUFFICIENT CON)ITION FOR X-STABILITY FOR NONSINGUI.LA MACHINES Theorem 7e2 (Stability result) Let A and AN be nonsingular probabilistic sequential machines both,V.;ith ini-i.al distribution It [TA<i (t T(AI" ( ) VI ~ S+I:if- ){I If XI = J {If I{ >. X'} Vx e pK(A(xfiF) 7 EK(A((x))l) P' t-'';rtaint tith styih'rjil c foarmulatiri of Theorem 7 1 ~EK((x)1 ) K 1 i' P~~)~:r:. ~s~a, t~.yI:b'.l~~cfomuatonof Teoem7~

111 qaK(A(x) 1FQ) Assuming that T(AX) = T(A" X) VI ~ S+ gives the necessary and sufficient condition Vx E z*: r' I: I~ > X} 7 >{I: I >9 XV}'K(A(x) F) K(A (x) Ft) Q~ E D1 Corollarl r72 (Application to the Rabin stability problem) Let A and A* be nonsingular probabilistic sequential machines with initial distributions I and I' respectivelye T(fAX) = T(At X){ [Isn {I ~ Iq > X}I"';(f{I: Iq > X} for all x c a*] 4aK(A(x) 9F) qcK(A' (x),F') 7,F4 A SPECIAL CLASS QF STABILITY TRANSFORM'ATIONS FOR NONSINGUITA? M11AChlINES Theorem 7e2 is identically true when K(A(x),F) = K(AQ(x) OF') Let us investigate the constraints imposed upon the stability transformation T: A + A' by this equality, assuming as well that F = F, Theorem 7 3 K(A(x).F): K(A' (x) aF) iff A (x): A(x)rx where (i) TX is a permutation matrix (ii) (nx)F = F Proof; We establish the theorem by showing a series of lemmas, Lemma 7, 1 Let A and B be n x n matrices K(AF): {4: Aq > F} K(B9Fj = {q: B4 > F}

112 K(AqF) = K(B9F) iff there exist non-negative matrices rA and rB such that (i) A = rBB B= rAA (ii) rAF > F? F > F Proof: If K(AF) C K(B,F) then for all 4 in K(A,F): [AO ~ F = Bo > F] Let B.i i= th row of B F. = i'th row of F i = 1, 2,,,e, n Then the above becomes: [A4 > F =- BiO > Fi] for all O e K(A,F) i = 1, 2, #e,9 n We use FarkasI lemma for each i getting: YiVn: i > 0 and (1) yiF > F.'~~~3Yi' n 2 1 (2) B yiA Rewriting the above in matrix notation where Yi = the i'th row of rA we obtain: There is a matrix rA such that (rA)ij. 0 (i)' B -= rAA (ii)' rAF > F Now assume K(BPF) C K(AF), By the same argument we get there is a matrix rB with (rB)ij > 0 such that (i)" A = rBB (ii)" IBF > F By (i)', (ii),* (i)", and (ii)" we obtain the necessary part, To obtain the sufficient part of the lemma, multiply the defining inequality for K(A,F) by rA, for K(B,F) by rB and use (2), QO Ee Dt

113 Lemma 7,2 If A and B are nonsingular fATrB = En the n-dimensional matrix identity. Proo f. B - rAA - rA(rBB) rArB = En Lemma 7.3 For any n x n nonsingular matrix A such that A. = i = 1 2,..0, n kl k=1 then _ (A) 1/z i 1, 2,.,,p n k=l Proof: E = AA _n.A (A-1 = t(A )ik(A)kj k=l i:= (En)ij = ~ (A) (A)kj n i( ik t (A)kj = 2 (Ai) ik (A~kj (A" 1z hence n k _(Ai1)ik 1/Z Lemma 7,4 Let A A(x)1 B: A' (x)

114 then -1 (i) rA rB (ii) rA and rB are stochastic matrices Proof~ By Lemmn 7el Ur > () so tlat A (x) = rAA(x) [A" (x)]i = ( [A(x)]k [A0(x)' l = fl (rA)ik[A(x )' ]k. ___ 1i j=1 k=l Z E rA, ik > [A(x) kj k=1 j=1 By the preceding Lemma 7 3 1 / [Al(x) ]kj = 1 since A'(x) is stochastic j=1 hence 1 = ~,_~ FrA)ik i - 1 2,,66, n k=l since rA > ~0 by definition rA is a stochastic matrix6 We know by Lemma 71l that there exists r B> 0 such that A(x)Vl = FBA,(x)'The same argument gives that FB is stochastic, By Lemma 702 we conclude that rr = E or r r= A n A B Qo E6 D6

115 Lemma 7, 5 The only stochastic matrices whose inverses are stochastic are permutation matri ces Proof: Let X be an n x n matrix which is stochastic such that X1 is stochastic6 If X is not a permutation matrix, then some state vector Si must be mapped by X into a vector with at least two nonzero components. Assume that (WLaGoa) (1 0, 6, O)X = (al, a2, o'o, 0) Where al + a2 1 O a2 > 0 (al a2p 0,,s O)X1 = (1, O, ao, 0) (*) Let X Hx'.H The above equation (*) gives alxl1 + a2x21 = 1 a Xi + a i 0 i 5 2p o, n li 2 2i - But 1 > x9. z 0 because x is stochasticd Hence the only solution to the previous equations Xl = Xl = 1, x xi X i = 2p oo, n X2j II J Li 2i But this requires that X be of the form 1 () a o o o 0 1 0 o o o 0 Xl =1a x X/ 31 32 0 0 jXV X X / \ r\X n2 nn which is singular since it has two rows eoual, contradicting the nonr singularity of Xa Hence X must be a permutation matrix~ It is well known also that the inverse of a perrmutation matrix is a permutation matrixD QQ Ea D,

116 We are rnow able to prove tle thleoren:. Proof: (of Theorem 7.3) By Lemma 7,1 there exists rA and rB such that A(x) - rAA' (x) -1A'(x) = rBA(x) and rBF > F rAF > F By Lemma 7.4 r = r1 and r and rA are stochastic, Mlultiplication by B A B A a stochastic matrix preserves a vector inequality, hence rA(rBF) 2_ rAF (rArB)F =(r r )F = F F > rAF but F < rAF consequently F = rAF By the same argument F = r3F Qe Ee D. Corollara 7,3a If A and Al are nonsingular probabilistic sequential machines then x C E* K(A(x) 1,F) = K(A'(x),F) - for all I e S+: A -D A' Proof: Immediate from Theorem 7,3 and the definition of distribution equivalence, Coroll 7,3b Let A and A' be probabilistic sequential machines with string x backward deterministic for both machines K(A(x),F) = K(A'(x),F) = A'(x) = A(x) + tH where tlI. Kern, (F) i - l, 2,,,eo n 21

117 Proofe = rF F F = E + HII where H! ~ Kern,(F) i = 1, 2#, ~o n A n 1 A (x) = rAA' (x) A A'(x) = A(x)= A A'(x) = A(x) (E + H') A' (x) = A(x) + A(x)tH' But (A(x)H')F = A(x)(H'F) = 0 H-lence A(x)I' = 11 where it.c Kern, (F), QO Eo Do Hence the result that Vx c Z*: K(A(x),F) = K(A' (x) FE) guarantees stability implies the result that A and \' are X-stable if they are distribution equivalent, Let us try to fired more general conditions on nonsingular machines A and A' such that A -T A' 7e5 P)LAR SETS OF VECTORS If u and v are vectors in the same space, thr inr dr product of u and v wil1. be written (u,v). Of course, regarding u and v as i x n matrices we have T T (u,v) = uv = vtou Definition 7.3' Let K be a convex set. The'ar set of Ks written K*, is defin.ed as: K* {u; (v,u) _ 0 for all v c K} In some definitions of the polar set, is used in place of > and the constant may be one rather than zero (Radstrom [19491), The follow, ing identity holds for any of these definitions of the polar set6 Lemma 7 6 If K1 and K2 are sets of points in a vector space V, then

118 1( K2 (K1 K2) lProof: (1VLoG,0 assume K1 and K are sets of column vectors)'>] re_1 2 ac(K1 U 2)*: aek > (0 for all k c K1 U K2 a k?0 for k c K1 and aek > 0 for k c K2 acK1 ~ IK, The argument reverses to give the opposite containment0 QO Eo Do lxample 7 2 Let K be the triangular region shown in Figure 7,2 S2 K* f i,'il (oiso!0(, ) S1 Figure 7,2, Polar set example, The set K is the region with extreme points (-1,+1), (-1,0) and (02>i) K.* is the infinite region bounded by the S2 axis and the line thrcou.gh l-1,+1) and (0,0)> 7,6 TAPE ACCEPTANCE EXPRESSEL) IN?OLAR SET NOTATION In this section we will show how Theorem 7,2 can be rephrased in the notation of convex sets using the polar set concept0 Theorem 704 Let x be a backward deterministic string in both A and A':

119 [Xe;~ ei- A ) v~ Ic].t>[K[A(x) l~x'- ].*1. SI = K[lA(x),F'AX~ |]*n Proof x c T(AX) < IA(x)F t X since TA(x) is stochastic for all x IA(x) oX ) = A.IA(x)( ) X Hence IA(x)F > X, IA(x) [F ( ) ] > O which gives x c T(AtX): — IA(x) F {)X ] > 0 Likewise we get for Al x - T(A')X')=) I9A'(x)[PF aI j) ] 0 In order for the above equations to hold for any initial distribution I, we use Theorem 7e2 getting /~{I [ I~ > }: >fl i ~ i 2~ 0 o1 p~K[A(x)::- [AX T ]x) But if we regard I and ~ as vectors in the same space 0 >( 0 (I,~) > 0 Hience translation of the above gives K[A(x) F xj) ]*( s = K[A'(x) l ]* s Qd En D4

120 7, 7 STABITI-TY FOR CERTAIN IHACHtINES l-lICIl tHAVE SINGl.lAR, SWITCHING AI1" KiCES D)efinition 7,4 r The set of singular switching matrices of a machine A will be written Z(A) = {A(o)~ a s Z and determ,(A(c)) = 0} For some probabilistic machines A with singular switching matrices Z(A), the methods of Chapter 6 can be used to obtain a machine A' such that Z(A') = ( while A =_ A' so that x c T(AX) < > x c T(A' X) Indeed, Example 1.1 is of this type if we interchange A and A' i,e., the machine A' has a singular matrix A'(l) while A(l) is nonsingular.. liHence ap)plying Farkas' lemma to the nonsingular machine A' we obtai: x e T(AZ X) - I~C/1 ~ Io >,} OeK(A'(x) F) TIIfor,A 2A' (), p1lays the role )of the inverse of the sin,;ular matrix A(x), Unfortunately, finding- conditions for the existence of a nonsin.ular machine A' which is expectation equivalent to an a.rbitrary machine A contains subproblems concerning invariant error patterns for A left open in Chapter 6,

C(IAPTERi 8 MIINIMAL PROBABILISTIC SEQUENTIAL IMACHINES This chapter will contrast the properties of E, and = with those of VT, The former equivalences lead to minimal machine results which are similar to those of deterministic machine theory, On the other hand, the later equivalence requires such disciplines as convex set theory to discuss minimality0 Consequently, such results which can be obtained represent a departure from deterministic machine theory, 8,1 STATE EQUIVALENCE OF MACHINES In previous chapters, the sanme symbolism was used to represent both abstract, or arbitrary initial distribution machines, and machines with a fixed initial distri:bution. In what follows we use K<IA> to mean the machine A started in initial distribution I, We let - be a machine equivalence variable ranging over the above equivalences, Definition 8,.1o: Machines A and A' are state euivalent for written A:(s) A'. If there is a (possibly multi-valued) function h such that: <1si,9 h(i)' = A -1, 2, n and KSh - A > i> = 1, 2, 9.c. nQ Definition 8. 1 requires that for every initial state of A there is an initial state of Al suc)h that the machines are equivalent and vicevers a 8e2 EQUIVALENT STATES OF A M-\CIIINE In order to discuss reduction of a machine to a smaller machine, an equivalence relation on states is needed~ They symbol "'" will 121

122 be used as a state equivalence variable, When "I-" is used in a context we will mean that any one of a designated class of state equivalences can be substituted for'"'" in the context, Definition 8.2: States S. and S. of a machine A are equivalent by N if <Si A> E <S.,A> Hence for ET, N,' and -I we have states Si and S. are equivalent for: (a) -E: Si' Sj SiA(z)F = S.jAz)F Vz s E* (b) SN: S. S.A(z) ( S. SAC(z) (k) Vz ) Z* k 1, 2, e,, N N N 1 (c) S Si SIAi I SA(y/x) Vy C Y Vx C Z 8,3 STATE AND DISTRIBUTION REDUCTION OF A MACHINE Merging some of the states of a machine to get a new (perhaps nondeterministic) machine is a notion from automata theory, For probabilistic machines, Carlyle [1961] has noted that a state S. can be "merged" with a distribution T = (010 dd, i1, 09 ls i+lp 5Aro In) over the other states of the machine, That is, when the machine attempts to enter state Si, it can be routed to the other states with distribution n, If the i'th row of every symbol matrix happens to be the same convex sum of the other rows, this possibility of reduction holds for every state equivalence, ide., [SiA(a(c) i _ E) ] <Si,A> E <A> jii As Carlyle has pointed out, a probabilistic sequential machine with a minimal number of states by state reduction may be reduced fur

123 ther by the use of such equivalent distributions: For any real model of a probabilistic machine, such minimizing of states would cause the machine to operate more slowly, Hence the results will be concerned only with equivalent state reduction although the definitions will be general enough for equivalent distribution reduction, Definition 8.34 The reduced machine Al = A[T(S.)] where distribution -r replaces state So of machine A is obtained by. (i) IJ = Ik + Ii k k = 1, 2, 9,,e n; k f i (ii) The i-th row is deleted in F (iii) The symbol matrices of A(r(Si)) are given by: A[r(Si)](o)jk = A(o)jk + A(a)j,ik k = 1, 2, ~, n k ~ i =: 1, 2,,ret n j P i This notation for the reduced machine has been selected for intuitive rather than esthetic reasons, If r is activated when Si is specified, then r depends on Si, ice e, (Si), 8o4 REDUCTION IN SEQUENCE Let a1 a be distributions and S1,, o. S be states. The re~oo,0 ~m m placement of the states by the corresponding distributions will be formalized Implicit in what follows is the assumption that (a ) = 0~ Definition 8.4: The sequentially reduced machine A[,,i [a1(S1)]a2(S2)] am(Sm)] is defined recursively by: A[ t [a1(S1)]( 2(S2)] t am(Sm)] = A [am(Sm)] where l A r 1[a 2 A = A[ar(S )] r = 1~ 2,,.~ mel and A = A r r

124 If al9 a2t coo am happen to be states and ar = Srl r = 1, tCe, m-l then the above defines state reduction in sequence, For this special case we use the notation A[ tot [S1 + S2] + S3] -+ + ] + Sm] = A 1[S (Sm 1)] Definition 8.,5 The index mergin function h (i) for state reduction in sequence r if i < r h (i) ri otherwise More generally, if Sli is a representative of a - class: 1# if Si C [S1#] hl# (i) = 1 (i) i otherwise Let P be a permutation on the integers 1,,,,, m, Tlhen we obtain the following results, Theorem 8, 1 Let, range over'Ei, PN and ~IV If S1iv S2 S2., S2 Sin. - SS then A (s) A [ S2] -+,,] + Sm] E(s) At. [S,' S(2)] *,; ] + Proof: The proof will branch into parts (a), (b) and (c) corresponding to -and — I For r - 1 A =(s),-u, Suppose by induction that Ar = ArA [s(S,)] (s) A for r < m-1l Then A =A [rl Sr) ] and

125 rfr Ar()j + Ar () for k = r; rj J j r+l j $r (A rs (Sr) I () k = j= 19 2o ~ nAr = Ar(a)j k for k ~ r, k $ r+l (a) We show that for any initial state Si of A, there is a state Sh L (i) of A [Sr (S) ] such that r+l 1i) r+l EA(a) E 7 +5 l(r)]( QJa e ~ Let k hr+l(i) Cal, i Ar[SS (S )](x) = A~(x) for x E ~* r+l r Ar [S (Sr)](o) = A(oc) for a z Z r+l r SrSr)]() = lbijl i = 1 2,,,o0 nAr = n' Then bll 1:~ + b bb + b ) b A2 (app = b rll r \ b 1 r+111 F111 ( k k; kl r kr+l r:l j =1 r rA1 S A(A)F S A(A)P r~ rF r r+l1 r Fr+1

126; bk FjF b F + b F (A~(~)F")k k kr r k,r+l r+l AbkF =F (A [Sr(Sr,-1)] (a))h [()' Fr r+ 1 F n By the induction hypothesis on states // F1 A(o)F = (A()) F = AI= (A [Sr(S )] ())h (i) r+ 1 r+l (Ar[S (sr)] F o)' )hfo rr+ l r((i) which shows that there are states such that A and Ar[Sr+l (S)] have rr 1r the same expectation for any string of length one, lHence we have set up the initial condition for an induction on strings x c ~* to prove that A and At[Sr+l(Sr)] are state equivalent for expectation equivalence, Suppose (A(x)F)i a (Ar[Sr,,l(Sr)](x)F)h (i) for all x e (.)d for d finite. Let F" = A'(x)F' and call h (i) = k5 By the induction hypothesis on r+1 strings Fk (A' (x)F')k - (A(X)F)i (*) (Aiojulk kj j k,ral r k,r r jPrra

127 1 r r+l F / Fr+1 we obtainO (A (oa)A (x)F9t)k = (A (ax)F" )k n (Ar-[Sr(S )] (X)) F*)k r r-~ k Noting that F is the output vector of Arl[S (S )] and using r r-l the induction hypothesis on states (Ar (S(Sr ) (ox) F*k = (A(ox)F1 HIence (A (ox)F) k = (A [Sr+ (Sr) ](ox) Fk' (A (x) f and the length of ax is d+l, which completes the induction on strings, Hence (Ar [S l(S)](Z))h lijF' (Aiz)PF) for all i and all z' Z* r+rl1 By definition Ar+l -E(s) A which completes the induction on states, Consequently, A[, [S1 S2] + S ] Es) A Given any permutation iP of the integers {1, 2, 0oo m} by the transiA tivity and symmetry of- 4we carl obtain (subject to the hypothesis of the theorem) S E'Sl si) S spj)

128 Letting Si1 = S pi)o the proof proceeds exactly as before so that: A[, [S1 S2] +:m S] E (s) A[,,[ S!1) + S(2) + ) (b) N-moment equivalence =N' It must be shown that for each initial state S. of A there is r a state h f Ali) [Sr+l(Sr)] such that for any string x (A (x)F)i = (Ar[Sr+l(Sr)]hr (i)()F N r+l AN) (A(X)(F ))i = (A [S r+ (S)] r lh (x)(F rl) 1 rxicNl rh1(i) Al By substituting (F k) k = 1; 2,,,. N for F in part (a), the proof proceeds in the same way, (c) Indistinguishability =I Define a new alphabet El = {y/a: y c Y, a eE } with the string operation o defined by Y1/al 0 Y2/a2 = Y1 o Y2/al o a2. Let F be replaced by ( ) and the proof proceeds as part (a), In all cases we have A =(s) Am Htence for any initial distribution A I, the iith equality defining,-Nfor equivalent states can be multiplied by ~ I. and adding over i gives A E Am for any initial distribution, A j Si] Q, (, I), 8, 5 M1INIM.AL nACIHINF.S FOR N ANDI The formalism necessary for proving a fundamental result has now been established, In the theorem which follows - will be restricted to {E - E" N} and' to ({".-1g -N}~ The reader may suspect that the result will hold for any equivalence relation on states which also

129 has the prooerties of the classes of the partition of V(A) discussed in Theorem I 1,4 eHowever, as we shall see in the next section, for /T/ Theorem 8i1 does not hold, Consequently the basic result which follows is proved only for the above equivalences, Theorem 8, 2 Let A be a probabilistic sequential machine, There is a machine Al with rank { states such that A -(s) Al and any machine B, such that B =(s) A has at least rank |A states. Proof~ Partition the states of A by ^. into classes wl, ~,i, w having representatives S 1# S2#, e,,,, Sr# Use Theorem 841 to reduce A to Al by collapsing all states in the class w1 = A[S1#] By Theorem 8,,1, regardless of the order of collapsing, we have A' -(S) A where we let hl#(i) be the index merging function for A' and A, i,e,, (AI (x) )hl (i) _ (A(x)F)i for all S. e S, for all x %*. Proceed to the class "[S2#]4 Suppose Si ~e[S ] and Sj A[S2 ] Clearly Si and Sj are not in A[S1#] so hl#(i) = i and hl#(j) = j, Hence we get S A I(x)F = S.A(x)F Vx E E*.inA(x) F = S.A(x) P Vx E Z* A _ A" But S, S S Sx Sf S Therefore we use Theorem 8,1 on Al A~ A (noting that \S 2 ] has the same number of members as~[S2#]), By substituting k# and (k+l) for I# and 2# above, an inductive argument is obtained4 A machine with rank i/}|l states is obtained at the termination of the indluction Thae artiCtular mnchblue obtained. de-t~endc. O 1 n the9r r'er~rescntiarv.eVs selected from the classes of A, and is hence not utniutleo

130 However if A" and A"' are two minimal state machines then A" -(s) A"' since each is individually equivalent by =(s) to A, If there were some machine B with fewer states than rank IAj then for some i and all x c E* Sh(i)B(x)FB = S.A(x)FA S (i) B(x)FB SkA(x)FA A A for Si #Sk, i,e,, Si and Sk in different classes under,V Therefore there must be a x' such that SiA(x')FA ~ SjA(x')FA which gives the contradiction Sh(i)B(x')FB Sh(i)B(x')FB Q. EA Do The previous two theorems are formalizations and generalizations of remarks made by Carlyle [1961] concerning =i Let us turn attention now to =- to see why it does not satisfy Theorem 8,1e 8 6 MINIbMAL MACHIINES FOR = We make a definition of T analogous to 8,2 (a), (b) and (c) T Definition 8,6: Equivalence of states for T Si AT Sj if SiA(z)F > X 4= SjA(z)F > X As in the definition of -ET the above depends on the cutpoint X, And as before it will be assumed that X is defined with A and remains A fixed in a context, The relation AT is an equivalence relation on states Furthermore SiiA(x) T Sj SAx A(x). However while multiples of equalities can be added, expressions involving > and < cannot be manipulated in this way~ This elementary fact leads to the conclusion that neither equivalence in Theorem 8.1 holds. Remark 8,3: There is a machine A such that for Si A S -,, l., —~~ TnmS j

131 (a) A[Si(Sj)] -T A while A ~T A[Sj(Si)] Furthermore9 there are classes of machines such that~ (b) Si -T S.j ==A[Si(Sj)] - T AT Demonstration (of the remark) Suppose Definition 8,6 is satisfied, It is not necessary for Fi Fj1 e So specify that Fi # Fi. In particular let A = <3, (o1) o3,, o6), Z A(s), F =(2n1,)D 0A> and X = 20 Let O 0 1 Z = {A1i} and A(1) O 0 1 IA(A)F = (49 o3, 66) (21 )= 3,13 so A c T(A,X) 4 A For all z e s* a S2A(z)F > 2 S3A(z)F > 2 so S2 T S A' = A[S2(S3)] = <2, (1,9, 9)9 D; A'(1)D (24 OA> IeA (A)F = IF = (1 9) 2)= 99 so A T(A' X) Hence A[S2(S3] T A On the other hand A" = A[S3 (S2)I = <2A (o,1 09)9 E A"(l1) (4) OA,,> I" A (A)F1F (,Io 09)() = 3,7 so A s T(Ao X) A For all z" e, I"A"(z")F" > 2 so T(A,2) = T(A",2), i e, A[S3(S2) ] A, So even though S3 lT S2 A[S3(S2) fT A[S2(S3) ] Of course much more elaborate machines can be constructed to show that this property is not dependent only on the identity but occurs for a large class of probabilistic machines~ Following in the same reasoning, a few other observations are immediate~

132 8,7 THE CRITICAL SET OF TAPES Definition 8~7: The sct of all tapes x such that IA(x)F a X will be called the eritical tapes for A, ided, C(A,X) = {xC: IA(x )F = X} c Theorem 8,3 If A[Si(Sj)] -T A and A[Sj(Si)] -T A while F. > Fi. and if the cutpoint X is not isolated for A, ide., C(A,X) # for all xc ~ C(A9X) o IA(x ) = 0 Proof: By assumption Fj. > Fi, then EA ) ( > E (x) Vx C I* i A[S.(S.)] A since the probabilities of S. will be shifted to state S. whose output 1 3 weight is greater, Equality can occur only when IA(x)i = 0O Likewise EA[x] ~ EA[Si(Sj)] (x) Vx * with equality only when IA(x)j = 0. Suppose the cutpoint is not isolated. Then for Xc s C(A,X) we have E A[Sj (Si)] (xc) EA[Si(sj)] C Suppose IA(x)j A 0; then Xc C T(A[ S(Si)],X) and XC T(A[SI(Sj)]X) Hence A[Si(Sj)] IT A[Sj(Si)] contrary to the assumption, Qo Ea Do 8,8 GEOMETRIC INTERPRETATION OF S+ The set of distributions over the states of an n-state machine, S+, is isomorphic to a set in n-l dimensional space because only n-l of the probabilities can be specified independently. For four state machines S+ can be depicted using a tetrahedron which has unit altitudes. Each vertex of the tetrahedron represents a state. Label the vertices S1i S2, S3~ S4 and let ai be the altitude from Si; i = 1, oo, 4. The distribution v = (P1, P2 P3, P4) is represented by

133 coordinates on the altitudes measured from the base, Figure 8,1 shows this convention for the first component, S 4 S2 S3 igure 8J1/, A coordinate of a distribution in the tetrahedron isomorphic to S+ for n=4 8,9 THE CRITICAL DISTRIBUTIONS OF S+ Let us generalize the notion of critical tapes (Definition 8,7) to distributions~ Definition 8e8~ A distribution vector ir is critical for machine A if nF = X, The set of critical distributions D(A,X) = {r e S+: *F -= X From the definitions it is clear that {IEA(xc) o X C C(A^X)}C D(AoX) Definition 8,9; A distribution rV is said to be ac ed if rF > X, A distribution a is reected if nF < X. The set of critical distributions D(AX) is a plane in S+ which separates the accepted distributions from the rejected distributions, The following example illustrates these notions, Examle 8,1 4 Let F: /2 and X = 2, The solution set of the elquation PF = 2 where r is stochastic is a convex set determined by its extremne points~ Using the well known

134 method of Kemeny et al [1959], we find the extreme points,i() = (3/7D 4/7, O0 O) r(2) = (0o 0, l, 0) - S3 r(3) = (1/39, 0 OD 2/3) The plane D(A,X) is shown in Figure 802, S2 S1.- S3 = Tr(2) S4 Figure 8, 2. D(A,X) is the area enclosed by the dotted lines, ie, the triangle i(1)-r(2),-(3)> 84,10 GEOMETRIC INTERPRETATION OF STATE MERGING All the distributions on the "S1 side" of D(A4X) in Figure 8,2 are accepted, while all distributions on the "S 2-S side" are rejected6 Clearly, identifying S1 with either S2 or S4 would change the acceptance of the empty string (and probably many other strings) into rejection and hence fox any machine A with output vector F ( 1/2) T(A.2) # T(A[S4(S1)]b2) On the other hands suppose S2 is identified with S4o Figure 8,3 shows the resulting homomorphism of the tetrahedron, S1. 3 =- (2)'( r 3) S4 (S2) Figure 803 The set of distributions when S is identified with S4D i,e S4(S2)

135 3From Figure 8,.3 we observe that 7/(1) (the projection of i(l) on S1 S4) lies inside the accepted region and is no longer an extreme point. Furthermore, all the distributions in the wedge-shaped volume (of Figure 8,2) bounded by'(3) (l)>(3)-(r(2) will be called the S4(S2) wedge and written W(S4(S2)) Wh~ile the distributions of W(S4(S2)) are rejected, their images after merging are accepted, Ilence if any accessible distribution I.A(x) c V(A) for any machine A with outpurt F and cutpoint 2 falls in W(S4(S2)) we have T(A,2) ~ T(A[S4(S2)] 2) Likewise we find that if I A(x) e V(A) and IoA(x) e IW(S2(S4)) then T(AQ2) T(lA[S2(S4)],2) We summarize these remarks in the following theorem, The first part of which will not be proved since the substitution of general variables in the above remarks sketchles a proof, Theorem 8. 4 Let A be a four state probabilistic sequential machine and let siT j A -T AL[S(SJ)] I WS(S)(Sj)) V(A) = { or W(Si(Sj)) = D(AX) Proof~ Sufficiency is sketched in the example above, For necessity we note that W(Si(S)) ( V(A) = V(A) C(S = W(Si(Sj))) But any distribution in S+ - W(Si(Sj)) does not have its acceptance changed by the identification of S. with S., Likewise if W(Si(Sj)) - D(A,,X) D(AX) is parallel to S.iS. and merging Si with S. does not change the acceptance of any distribution~ Hence if x E TsAk) then x T(A[Si(Sj)]bX) and vice versa5

1 36 Corollarj 84 If F. F. for a four state machine A and Si - Sj then 1 iT T A[Si(Sj)] 5 A -T A[Sj(Si)] Proof; For F. = F. and four states the extreme points on the lines Si-Sk and Sj-Sk (where Sk is adjacent to Si and Sj) are permutations of each other, and the line joining them is hence parallel to Si.Sj. 1 3 Therefore, identifying Si and S. causes D(A,X) to be compressed without sweeping out any volume; i,e W(Si (Sj.)) D(AX) and W(Si(Si)) = D(AX),4 Theorem 8.4 hence gives the result, Qc E0 D1) The proofs of this section concerning the four state machine admittedly make use of many implicit facts of the geometry of three dimensions. It seems likely that analogous arguments (except for Corollary 8A4) hold in n-dimensions, but the author has no proofs for the general case, 8,11 COMPARISON OF STATE REDUCTION FOR -ID9 -N AND E WITH -T The last paragraph of Theorem 8,1 shows the essential difference A A A between the former equivalences and -T: If Si T Sj0 Si - S or S. Sj then simple multiplication by constants and addition of equalities shows that for any initial distribution I and distribution I' (which is I with Si and S. merged) we have A - (S)A[Si(Sj)] [<ISA> - <I' A[Si(Sj)]>] However for = T(S) a system of inequalities is obtained which may not be able to be added and Si Sj is not enough to guarantee this property, A -T (S)A[Si(Sj)] = [<IA> -T (Ic, A[Si(Sj)])]

137 The results of the previous sections concerning convex sets show that merging of equivalent states does not preserve tape equivalence. The status of the vectors ItA(x) e x e ZE in the set SI relative to D(A,X) must be considered,

CHAPTER 9 APPLICATIONS TO OPTIMAL CONTROL PROBLEMS 9 0 INTRODUCTION The close connection between the theory of sequential machines and the theory of optimal control has been pointed out by various authors, Arbib [1964] has attempted to make a "rapprochement" between the two areas of study, The purpose of this chapter is to show how some of the notions of previous chapters apply to the optimal control problem as defined by Eaton and Zadeh [1962], Before applying some of the previous theorems to certain optimal control problems, let us consider some basic observable events of a probabilistic system, 9 1 INPUT EVENTS The most fundamental conditional probability of a machine is the probability which relates sequences of outputs to sequences of inputsd As has been observed in Chapter 3, such a joint probability is defined recursively from the joint probability of output symbol y and states S. given input symbol a and initial state So, The "Moore model" which has been used throughout requires y to be a function only of Sj. Hence this basic probability can be expressed as (using the notation of Chapter 3) P(y~Sj/~Sil) = A(y/a)ij = A(o)ij if O(Sj) = y Sy otherwise From the above conditional probabilities the system probabilities or stochastic inputsoutput relation can be computed, As in previous chapters0 the probability of observing the output sequence Ylo,6yr if the machine A is started in initial distribution i and the input 158

139 sequence l~ ~Oor is applied will be designated PA(y1 OOyr/Cl,oOar) Control engineers frequently need to use all the information available in order to control a process, The following concepts, systematized by Carlyle [1965], formalize other conditional probabilities which can be determined by experiments on a probabilistic system., Definition 901O A yinu t event for a machine A, written T (AX), is the set of input strings which produce y e Y* as the last sequence of outputs with probability greater than or equal to X0 T (A X) = {x e Z* o I' A(ypy/x) i ~ > X} y v y (y] Mg (x) -,g (Y) 1 p We note the connection between y-input events and the concept of tape acceptance: T1(AOX)'= T(AAX) T0(AgX) = Z* - T(AAX) 9,2 INPUT EVENT EQUIVALENCE~ =IE Definition 902: Machines A and Al are eX —nput event e uivalento written A -XIE A if for a fixed real number A and all y in the output alphabet Y~, T (ADX) = T (A OX) y' y Following Carlyle [1965] and Paz [1964] (who calls it "strong equivalence"), let us define a very strong equivalence for input events* Definition 9o3- Machines A and AA are ei n v event e uivalen t, written A IE A if for all X and all y in the output alphabet YO T (AA) = Ty(A,A) Y y

140 Note that IE is just the universal quantification over A of XIE It has been shown (Carlyle [1965], Theorem 7) that A =I A' - A E A' -I -I E Let us relate -IE to the other previously defined machine equivalences, Theorem 9,1 A A-Dt AA A A = A A A' Proof.~ A E AM Prob{OA(x) = y} = Prob{OA;(x) = y} Vy. Y, Vx C E* By definition'Prob {OA(): y} = I I'A(x)i A~~~ i) i 0(Si)=y Distribution equivalence implies,E IA(x).i q= ItA(x), for y p 0 and not with probability io i"1 O(Si)=Y ~'(S.i3)=Y zero of occurringe Hence taking sums over those y f 0 >,. ItA(x) i A=.II A' X)iA(x)i J IOAAl (x)i i'0 i': i' i~~ O(si )=Yo 0'(S 0 Si)=Y 0(Si)=O 09 (Si )= which gives the first part For the second implication, regrouping of the terms of Theorem 241 shows that AN(X) depends only on _ I"A(x)i and (Fk), k = 1,.,o, N, O(Si)=Y But from the above and noting that F = F', we see that the arguments of the function PN(x) are all equal for machines A and A'l Q, Ee D, Immcdiately from the definition we obtain: -IE A I A XIE A

141 In general, there is no relationship between -:IE and -T H -loweverP for the special case of Rabin automata, thhe inequalities involved are degenerate s> tlat A AlE A" - A JIE 9 3 INPUT -OUTPUT EVENTS 1We now consider events which require information about the outcome of initial portions of the experiment, Generalizing the definition of Carlyle [19651, we let q((y/uxv) be the probability of observing output string y as a consequent of input string x, having already observed v as output because of thel. input string u, The fundamental definition of conditional probabilities gives us thato I >A(vy/ux) ( I A(v/u) ) Definition 9,4: The input output event for y e Y* of machine A and cutpoint X?, written I0 (A, ) i.s given by~ 0I n(^ x) - {(uxv) O qcy/uxpv) > A Here we let x and y be arbitrary strings while Carlyle restricts them to symbols, Let us interpret one kind of input-output event for a computer, Suppose the input sequences are programs, the output sequences are results and X is a reliability, The input-output event for a particular result y consists of all programs ux and "intermediate results" v which guarantee that y will occur as an output sequence with reliability at least as high as X~ 9 4 INPUJTsOUTPUT EVENT EQUIVALENCE o - Analogous with the concept of tape equivalence, a fixed cutpoint

142 equivalence will be defined for input-output events6 Definition 9~5: Machines A and A' are X-input ou t event equivalent, written A xio A', if: I0 (A,X) = I0 (A,X) for all y c Y y y By universally quantifying the cutpoint of = IO, the definition of Carlyle [1965] is obtained, Definition 906: Machines A and Al are input-output event equivalent, written A = A if: A = Al for all real X'I0 The equivalence =IO amounts to the same thing as -I; ie, as has been noted by Carlyle, the definitions yield: Theorem 9,2 A =I AA = I0 A' Let us show the relationships among the machine equivalences defined thus far, A D A'A E A!4,A - i =m'EN A' -AEE A =~Im mT ll I IE N E IT ~antQ L Paz -iA A' i Rabin a#~ A~~-XIE A = A $ A A IO1 I XI0 Carlylei ie 9.1 Relationships among the machine equivalences, 9o5 OPTIMAL CONTROL PROBLEMS An optimal control problem is essentially a probabilistic sequential machine with costs associated with the action of inputs0 A certain state, or set of states (usually absorbing) is designated as the "target" and the problem involves getting the state activity to the target with

143 minimal costs- Usually information obtained from the outcome of an input or seqluence of inputs is used to evaluate some function which pec': i cfs the enext inmut or sequence of inputs T:O.1 llowin-f Eaton and Zadeh [1962], we shall assume that after each inp)ut the resulting state:i.s Xiown..Th -ie kinds of nText l.nput functions or polic's -hicih virZll be considersd here usually doepend only on the state., The o0utputs sf the probabilistic machine will not be used as outruts in the usu"mal einse but will be used to express the costs of the actions on the control system,. HIence the expectation of "output" EA(x) will actually be the expected cost at the time step after the input of the string x into control system A. it is helpful to think of the oultnut vector F as being expanded into a n x 2 matrix, the first column b.having arbitrary but unequal components which are used to identify the state- The second colunmn functions as the output in the sense of the formal definition but c o ntants the costs associated with the states. A minor diff-ic.ulty occurs bTecause zmlany optimal control problems have costs associated with the state transitions rather than states, However, we shall see in Leuna 9 1 that an expanded machine can be built for any such control problem with the costs associated only with the states, Lemma 9,1 also translates between the terminology of probabilistic machines and that of the previously mentioned paper of Eaton and ZadehW Lemma 9.I l'r.-aslation between probabilistic sequential machines and optimal 1 nt ro 1. p rob lerns; The following makes a correspondense between the terminology and assumptions of Eaton and Zadeh [1962] in "'Optimal Pursuit Strategies in Discrete State Probabilistic Systems,' and those of probabilistic sequen.

144 tial machincs$ Eaton & Zadeh First level Second level translation: translation: Machine A expanded Mlachine B (1) Transitions take Same Same place at integral values of t starting at t = 0 (2) Input ut takes on {alia2t, 60o am = {la2s"xam} values {al, e am} sometimes abbreviated 1,2, em (3) Inputs referred to Inputs called "input Inputs called "input as "conmands" or symbols" symbols" "'actions" (4) States States States ql.,t',dqn pqn+l S = {S l6s Sn9n } (sipsj k) where si,sjcS and keE (5) The state qn+l is sn+l is a sink or If the state the target state absorbing state sn+l k) is accessible then S. = S j n (6) P.ij (k) prob, of A(k)ij B(k)(miu)(,jv) one step transition A(k).j where from i. to q when v = k and i 1l j v= k and i = command ak is = 0 otherwise applied (7) IP (k) is stochas A(k)i j is stochas- all ( BJ v)B(k)( ic ttic = EA(k)i j = 1 j * so B(k) is stochastic (8) Associated with C(k) is the cost FB =. (k) each non-zero tran- matrix for A(k) (no sition s. + s. by way to express this ector of B contains 1 j vector of B contains ^command N is a tas a single output all costse com k ivector in machine A, positive cost but can be expressed C. (k) as the output vectors of a set of machines having the same transitions as A)0

145 (9) The cost of going C(k): F ~ i,n+l (i,n+l,k) to the target is 0 (10) By an P-'st) transii Same as E, &F Z, A seu ence of -states tion se -iCuce is a b - (s O.s,) sequence of states (a0~al.a~.,..i;s0~,,,':$9)F r~bl =- (sls2"a1),'' b 1 = (S S,S a.l) A = (11) A transition,. if F e if 1 j B)Bl sequence is ossiblej 0 ( B( if the conditonal f x 0 1 1' 2bl prob, of observing p > 0 the sequence given 0O (- b 0 Z the initial state C T > 0 qO is positive (12) Cost of the Same as L, F& Z,. but sequence of length outside normal PoS,M, C = ( ) R is the sum of the framework, r=( rS r+l r one step costs, C = c. (kr) (13) A strategy or a Same as E, 7Z. ft r is a projective jpoli;c is a function function defined by _which assigns Tr (S t st9 u ) a command to every state ut 7 f(q ) -' (st) write write - (a )'rr- (fl..~IT'' + t ) - CT D U IT tn 1 n (14) The policy matrix s A(rr) \ (7r) 10 1 A (7T).i is the optimal A(T)i. 1 s A(r >l cointrol for state Jl+ 1 lnl = O otherwise qli (15) The cost matrix for C(r Costs are embedded in pol icy r ( the output vector FB (c(IT, (Zi) | CC() C ( ) ij F B (Tr3CIT. (16) Xi (T) is the expect- Sanle as -, t Z, A\nalogous to E. & Z, ed cost if e olicy() ie. X' ((r) I 4 d V cost: o tVyy A(m.io ) from initial state i (17) A policy nf is pro- Same as.,;j X i T per if xi (It):oi i. i=1.2,..,r ivl2,. ~ ~,n

146 Lemma 9,.2 X'I() = Z B(x)iFB if X is a proper policy. i=l Proof: The basic recursive equation of Eaton and Zadeh will be formulated using the expanded machine B Xi(T): E{Cij (Ti) + Xj(T)} n+l = __ P. ij'w)[c..('r) + X. (')] j=1 1ij i ij i i,e, X(r) = C (7) + P(Xr) X X(r) (1) where the one step average cost from state i is: C()rri: c.ij (i)Pij (.i) where Piji) IPiji) i j=l If the target can be reached from any state with positive probability by the policy then P(r) is not stochastic, i,e., the matrix P(nr) has all eigenvalues less than unity in modulus and (1) can be solved as X(Tr) = (E - P(n)) C (r) equivalently 00 X (w) = p(,) C(wr) i=O Equation (1) can be expressed using machine B and definition (16) of Lemma 94., X'(I) = B(w)F + B(n)X'(n) which gives, if a unique solution exists (i ert, r is proper): X'(wr) = 2 B()i FB (2) i=l which is a sum of expectations of B, Q, En De

147 Lemma 9 3 x(:) = X(.O) 4# X'(~) = X~ (r) Proof From (16) of Lemma 9.1 we see that the elements of X(7n) are rewritten men times in Xv(nr) 9,6 EQUIVALENCE OF POLICIES Theorem 9,1 Let = and -1T be policies of an optimal control problem, Let A = <n+l,~ S = {Eit,r }~ A(n)o A(-n), F, 0> be a probabilistic sequential machine having input symbols' and Tr and switching matrices P(fr) = A(7r) and P(rr9) = A(w')o Let the cost matrix C be indeendent of the input,, Furthermore let the column space of C be generated by powers of the components of F: (CT)ic <{(F)o (F2) o (FN)}> Then for RN(A), the N-moment reduction relation defined by A irRN(A)Tr -r — X(rr) = X( ) Proof-: There must be a set of constants {dk} so i k j=l dk( ) - jj dssA() i h d (Fk) k i ()k k l dk"Si.A(T)j (F since iRNt' and A(r) and Arr") are assumed to be distinct; Theorem 1.8D gives:

148 A(r) = A(n') + H where Hi E V i = 1, 2t,... n and some Hi 0 while VA(a) C V oe{, On'} N V cj ()Kern, (Fi) i=l HIence A(r)j = A(lr')j + IIA(')j 1 + + = AC(')j + H' +.. + H " where Hti and H'i are in V for i = 1, 2',.,,, n 1 1 multiplying by (Fk) 1 < k < N A(n)j(Fk) = A( I')(Fk) + t1' (Fk) + ek + I"(Fk) = A()j(Fk) + 0 + + 0 Substitution in (*) gives for i = 1, 2,,,e n ~ i j j k) x () dks iA(r')j (F ) X()i 1 k i j=1 k=l Qe Ee D, We illustrate Theorem 9 1 with the following optimal control problem, Using the notation of Eaton and Zadeh: Let Q = {q1' qc2, q3} where q3 is the target state a = {0s 1} input alphabet, 0 3/8 5/8 A) = 0 3/4 1/4 (1/4 1/4 1/2 A(1) = 1/2 1/2 0 0 0 1

149 The cost matrix C is independent of input: 1 2 ~! C(1) t C(O) = C = 3 6 0 0 0 0 Let F ( 2 and N = 1 Claim:. The policies of i (1C, 0) and 1Tf = (0, 1) are equivalent, i.e., x(7) - X(n) d Note that = and i' are proper: 1/4 1/4 1/2 A(') = 0 3/4 1/4 0 0 10 0 o 3/8 5/8 = 1/2 1/2 0 \0 0 1 For N - 1 RN(A) - RE(A) From Thebrem 4a4: iRER nR — A(Trz)F = A('raz)F Vz C {rr,7r'}* such that Zg.(z) < 1 Htence 3/4 A(t)F = A(=r')F = (3/2 Likewise the reader may verify that A(aT')F = A(n'7T')F A(wTr) = A(Tr'i)F w'hich shows that iREr'. By Theorem 941 we know X(ir) o X(')

150 However, in the next example we will construct the expanded machine B from A in order to make this fact more transparent, Exame 9,2 Construction of the expanded machine B from machine A6 In writing down the matrices the following ordering is used for the states of B; i,e,, the r'th row or column of the matrix corresponds to the r'th state in the ordering: S110 Sl119 S120, S121J S130J S131' S210 I S211J S220J S221P S230J S231J S310 S311f S320t S321' S330, S331 Hence 1 1 2 2 0 0 3 3 B 6 6 0 0 0 0 0 0 while B(r) = 110 111 120 121 130 131 210 211 220 221 230 231 210 311 320 321 330 331 110 1/4 1/4 1/2 111 1/4 1/4 1/2 120' 0 3/4 1/4 121 1 0 3/4 1/4 130 1 131 210. 1/4 1/4 1/2 211! 1/4 1/4 1/2 220 0 3/4 1/4 221 0 3/4 1/4 2301 231 1 310, 1/4 1/4 1/2 311 i 1/4 1/4 1/2 320 0 3/4 1/4 321 0 3/4 1/4 i 330 1 331 " i

151 and B(1T1'110 111 120 121 130 131 210 211 220 221 230 231 310 311 320 321 330 331 11C 0 3/8 5/8 a Il 0 3/8 S:; 120 1/2 1/2 0 121 1/2 1/2 0 13C 13i 2 ( 0 3/8 5/8 2111 0 3/8 5/8 2201 1/2 1/2 0 221i 1/2 1/2 0 230! 1 2310 1 3101 0 3/8 5/8 311\ 0 3/8 5/8 320 1/2 1/2 0 321 1/2 1/2 0 330 \ 1 331 \ 1 We note that 3/4 3/4 9/2 9/2 3/4 3/4 B(TFB B (n) F B 9/2 0 0 3/4 3/4 9/2 9/2 0 0 which mean:. B(rT) - B(Tr) H } (1) B where t.i[ - 0 i = 1, 20, r, n (2) Call {lli' i Z 1, 2,,,,, 18}> = vRB

152 It follows that VB = <{1/8(Oj2,-3,2,-5,4,0,0,0,,O,,O,O,O,O,O,OO), 1/4(0,O0,OOO 0,0,O-2,3/4,-1/2,3/4,-1/2,1/4,OOOOO,,OO)}> It can be easily verified that the subspace VB is invariant under both B(r) and B(n') iet, VBeB(w) C VB (3) VBeB(-C') C VB From (2) we see that VBFB = 0 i.e,, VB C Kern.(FB) Hlence' for any r (using (2) and (3)) B(I)r a (B(i) + H)r U B()r + H' where H. C Kern,(FB) i = 1, 2,.., nB Consequently, B(')rFB = B(w) rFB Summing over r and making use of the fact that n and r' are proper policies X'('T) = E (i B(') FB BC ) X'-() ral r=X which means by Lemma 9,3 X(rr), X (') It is clear from the example that the simple expression for the expected cost provided by machine B is obtained at the expense of the large size and high redundancy of the matrices of B. Each row of the policy matrix A(w) is copied (n+l)*m times in the matrix B(r) which has order (n+l) 2m One can imagine very obscure situations in which this redundancy could be used for error correction, For instance, the formal construct of expanded machine B might be useful in studying certain high reliability biological systems for which the large number of states is not a prohibitive restriction.

St.il censidering those ontimal control nroblerms for which the cost matrix is independent of the input symbols we obtain a more general and easily testable version of Theorem 91.1e Theorem 9.2 Let C(frj C (,in) for proper policies 7r and x;' If A(T)i - A(')ieVC. Kern (CT)i and VA({') C V i = 1., 2, o... then X(rr) = X(r') Proof: A(')i = A(-')i + H. where I- t V i = 1, 2,,, n.... 1 1 1 TFor any j A(.)j (A(VfJa)+lt) (*) But LH,\(.g) \T i 1 2; n Ilence A( -'' A(IT.)1 + P11 where llt V = 1, 2,.., n Multiplying by S. and (C )i S-A()j(CT) i SiA( T)(CT)i + t'(CT)i = S1A() j(CT) i = 1., 2. *, n Summing over j we obtain X(s,) X(49) Q~, E, D, We note that Theorem 9 2 can not be generalized by the substitution of V. for V in the statement of the theorems Inasmuch as the terms 1 of (*) of the form H.l i- need not be in Kern (CT),i the result (oos not follow, TPh c-t. 3-n 9 ^ 3 Gitven a.n o pt nial control problem with n+l states, actions Ctl t..... and cost matrices C(.X1) ) e, C(am). Suppose the expected costs at times t = 1, 2,, 2n for policies. and r' are equal, ioe~,

154 X('nt) = X(t',t) t = 1, 2, D, 2n Then X(-) = x(I') Proof: We construct a family of probabilistic sequential machines and test them in pairs for expectation equivalence, Consider the machines Ai <n+l, Si, S, {a}, A(w), (C(r)i)T, OAi > At <nal, Si. S, {r'}, A(n'"), (C(~'t)i)T0 O i 1 129~~~~A i = 1, 2, ase, n+l From the definitions it follows that A- E AOi X(r)i - X(r')i i = 1 2,.o, n+l From Theorem 4,,3 we get that [SiA(n)t(C(7)i)T SiA( T)t(C('r)i)T t -, 2,..., 2(n+1)-2]-= Ai E A'i But by definition SeA(f) =(C(T)i) - Xi (Trt) Hence by the transitivity of implication X(an,t) a X(a' t) x(~) X(l') t = 1, 2,,,, 2n, E D1) The assumption that costs are independent of input does not yiel tta lower botund than Theorem 9,3, 907 CIARACTERIZATION OF EQUIVALENT PROPER POLICIES Theorem 9,4 Let r and ir be proper policies with reduced (target omitted) matrices B(r) and B(r') X(ir) = X(lr')(= D[E- B(iQ)] FB =t5 where D = B(w) - B(r')

Proof: By Lemma 9e,3 we have X(1T) = X(r') (-=> X'(r) = X'(rr) By Lemma 9,2 X'(n) = x(nl') = [E I B(Tr)]' B(r)F = [E B(Tr')]' B(7rT)F( X(7r) = X(-r) if and only if [E - B(rr)] B(nr) = [E B(T)] B(r') + 11 where 1-.e Kernl (F B)a But since =n is proper [E B(R)] B(Tr) ) = B(Tr)[E - (n()] B(=)[E B(iT)] = [E - B(')] B(f(t') + H Clearing this expression of inverses we get [1 - B(rr')]B(rr) 1 B(?r)[E - B(r)] + [E - B (iT')]H[ - B(r)] Hence B(n-) B(')I + [F. B(r' + [E )][E - B(r)] or D = [B(Tr) - B(')] = [E -B(i')]HI[E - B(r)] D)[E, B(T)]i [- ()]} Multiplying by FB D[E - B(Tr)] FB = [E - 3(rr')] (IHFB) 10 which establishes sufficiency, Necessity follows from the fact that D[E - B(w)] F _= 0 D [E - B()] = H' where H!e Kern6 (F ) i = 1p 2,.. nB Write H' = [E - B(w')] ([E -- B l')H') [E - B(,')]I

156 and since B -1 B HIF = [E B(7r')]. 1-FP _ 0 i = 1, 2, 4.,^ nB Reversing the previous argument for sufficiency gives the theorem, Q E,. D* Theorem 9.4 show that two policies are equivalent if and only if one can be interchanged on the first time step for the other without affecting the total expected cost, i.e,, B B()B(W')iFB = X(.') (*) 1=0 and X(X) -= E B(')B(') iFB i=O Reversing the role of X and n' in Theorem 9f4 was required to get (*) above, 9,8 EXPERIMENTATION ON THIE EXPANDED MACHINE Given an n+l state optimal control problemthe expanded machine B has (n+l)2m states where m is the number of possible input actions. Were we to use the methods of Theorem 9,3 to obtain a bound on t* the length of the sequence of costs {Xi(,t)}t the bound would be 2(n+l)2m-2, IHowever, Theorem 9,3 can be translated directly into machine B and we get: Theorem 9oS If for the expanded machine B associated with an n+l state optimal control problem with proper policies = and a' SiB(=)tF = SiB(a')tFB for t = 1~ 2,,e 2n i = (jl' 1, n1), o~o~ (jn' n an) for arbitrary jl Ova~ jn

157 then X r)= X(W) The difference between the bound 2(n+l) m-2 for arbitrary matrices and 2n for the matrices of B clearly illustrates the special character of the matrices of the expanded machines, 9.9 OPTIMAL CONTROL PROBLEMS WITH OPTIMAL POLICIES OF SEQUENCES OF INPUTS This section differs from previous sections in two ways, All costs will depend only on the stateso But more important, the concept of policy will be extended to a function which assigns to states strings of elementary actions) i.ee D strings of symbols rather than individual symbols, We might have done this in Lemma 901 when establishing the correspondence between optimal control problems and probabilistic sequential machines, Ilowever, treating the elementary actions as symbols seemed to be the most natural procedure, The consequence of this difference in viewpoint is that the states that the machine passes through while the string comes in are not used in the policy function, Theorem 9,6 Let B = <n+l, EZ B(o) o Va c E, B 0B > be an abstract probabilistic sequential machine constructed from the n+l1 state optimal control problem with set of actions Z; and costs FB associated only with the states. StLppOSe iB has a proper policy,*:.Let R1.(B) be the expectation equivalence reduction relation defined by B, If rank 1RE(B) I = r (finite) there is a deterministic policy of strings ) % (X1-, Xn) such that X(rnD 5 X(r) for any stationary policy <

158 Proof: We modify Theorem 1,3 by defining A, (RE(x)) to be a vector of outputs A(x)F, In fact, Example 1,3 illustrates this point, The initial state I is not used in showing that RE has finite rank, but only in computing the expectation, IA(x)F, serving as output for the "state" RE[X]I Instead, the vector A(x)F could have been used as the output, Hence if R1d has finite rank r there is a deterministic machine */RE = A' with r states such that OA,(RE(X)) = A(x)F, Schematically we have (for E = {0,1} W6LG,): RE[O] O O C A(O)(A A' L - A(A)P 1: e a A(xO)F ~~1 [x~ 0 4t A(')" ~ A(x)F RE[1], 1 Figure 9,2,o The constructed finite deterministic machine with vector outputs, Any stationary policy a is defined by a stationary stochastic process, Started from initial state Si at time t, the process causes a distribution of probabilities qT (xt) over all strings x of length t, We attach the stochastic process which provides the set of distributions {(, (xst) } over input sequences to the input of'S. t=O 1 B' = E*/RE(B) which has vectorial outputs OB,(RE[x]) = B(x)F Let Xl, 0ooz xr be arbitrary representatives of the class of REt Every string x produced by the stochastic process falls into some class of REb indeed a probability distribution is induced over the classes of RF which are just the states of B'6 Call this probability PI(R[xj],t)

for the class with representative x. The occurrence of different strings of the same length are disjoint events. IIenceo t:r(R e - X >jqst) s(X t) X 1 xz(z)tD RE[X] We now show that tih, expected cost of the t th sten is givenl by Xi (.:tt) ~ P(RE [Sj ] t)~B(xj)i g j=l Using the notation of 241 we havco (where S(O) is the state at time t = O) Xi(It) E{ 7 0*(x)/S(O) = Si =-? q (x~t)S B(x)FB x (z)t s i Since each x hts a representative x. P' (RE [xl]D B( )t)tl) B + ) + PR TE[X] t) ) B which establishes the above., Consequently, thI-e expected cost of the t th step for policy'n can be viewed as an expectation of the distribution (PlR'E[x1Lt). ^ Pi(RE[X ] r t)) over the states of B I Since we have assumed the existence of a proper policy- T* r X(7T*) = Xi(7*,t) must be finite for all 1 = 1, 2, 0, n t=O Xi( r ri B (/_,,- [x-i]t)),B(x~l ix + ( Since T P: (xint) n 1 and the costs are non-negative, at least ont of the terms, say the x: diverges for any station:-ry stochastic process~ In order for Xi (n*) to be

160 finite, it follows that at least (WLG,) B(x*) FB o 0 Consider the class of stings G*(i): G*(i) = {x. x c RE[x*]} Given the machine BV = E*/RE(B); G*(i) consists of all paths from RE[A] to RE[xi] The expected total cost of the deterministic sequence x = ili is just the sum of the i'th components of the outputs of the states of B0 E(total cost of x) = ( B(ioi j)i B The shortest path from RE[A] to RE[x;] is a good candidate for minimality of cost, Of course, depending on the values attached to states, a longer one might have lower cost, Suppose x~ is a deterministic string with minimal total cost of 1 all strings in RE[x'i]. We show any stationary policy is no less costly than the policy rrD = (Xl,,6 Xn) Let a be a stationary proper policy. As we have shown Xi ct t) j=l P(RE [Xj]t)B(xj)iPB In order for Xi (i) to be finite lim P. i(RE[xi]t) = 1 t -> 0 That is, all strings produced with positive probability by the stochastic process defining w are ultimately in RE[xi], If a string z has occurred, then its substrings must have occurred also, Ience the total cost of the policy T from state Si at time t is just: X'i(rt) _ Prob,(z occurs).E(total cost of z) all zsCZ* Lg(z)=t

161 1Hen ce X(it") li n Prob.(z occurs) E(total cost of z) Q, (z) —, F ze*, (=) =t Since all z with positive probability of occurring are in RE[xI]: Xi(n) U li r qs (Z, g(z))oE(total cost of z) 9, g(z) -.oo all z'. ZE K[] i But by (defi nit ion XXi(n) E (total cost of xi)! E(total cost of z) for all zeRE[x] Substituting we get Xi(,~) -_ lira (. _ q;s (ZEg.(z))),Xi(D) Qg(z) —->o zeR [xi ] i But since all z which occur with positive nrobability are in RE[xi] as Zg(z) — co lim() z( Ex s. (z,Zg(z))) = 1 Ro z [X] 1 which gives Xi (r) > Xi(nD) Q, E. De Derman [1962] has shown that the approach of Eaton and Zadeh cannot be improved upon by collecting added information, That is, a policy consistin7 of detcrrninistic selection of input symbols based upon knowledge of the state alone has as low a cost as any other type o.'C p.lticy~' IP.r Theoremi 9:6 iwe have shown that for the class of machines which have rank of RE finite, t~he state information is not always needed, An optinal policy consisting of sequences of inputs can be found which disre...-(.s intelrnmediate states.

CIIAPTER 10 SUMMARY AND OPEN PROBLEMS 10l 1 SUMMARY Let us summarize the most important original aspects of the work~ Chapter 1 contains a method of constructing a finite deterministic machine (if one exists) which is expectation equivalent to a given probabilistic sequential machine (Theorem 1,3 and Theorem 1,7), In Chapter 2 the method is extended giving invariant subspace conditions for the existence of an inputsstate calculable (deterministic switching but random outputs) machine which is N-moment equivalent to a given probabilistic sequential machine (Theorems 2,4 and 25S)> Chapter 3 raises the criterion of interchangeability as submachines as a condition of behavioral equivalence, Indistinguishability is the only equivalence which meets the interchangeability condition (Theorem 384), It is also proven that the class of finite deterministic machines can distinguish between two tape equivalent machines which are not expectation equivalent (Theorem 3,2)> Chapter 4 provides bounds on the length of strings necessary for deciding AE (Theorem 4o3)9 EN (Theorem 485), the reduction relations RE (Theorem 4,4) and RN (Corollary 45)> Chapter 5 extends the Rabin reduction theorem for probabilistic automata to probabilistic sequential machines (Theorem 5o2) A finite set of invariants exists for =T for isolated cutpoint machines (Theorem 5.3)> Using the notion of stability congruence relation, a sufficient condition is found for the existence of a finite set of invariants for -T (Theorem 5 4)8 Lastly, two new properties are defined, the balance property and stability near the cutpoint, which lead to a class 162

163 of machines with a finite set of invariants for -T (Theorem 5.5)0 Chapter 6 considers stability problems for EE -N' and simultaneously, Properties of general stability transformations are obtained (Theorem 601 and Theorem 602) in terms of error matrices, Properties of error matrices for the three cases above are summarized in Table 6010 The notions of invariant error matrices and eigenerrors are introduced to allow the three equivalences to be studied together. A sufficient condition for the existence of eigenerrors is found (Theorem 604). A symbol matrix of a cyclic permutation of all the states (n > 3) has no nontrivial eigenerrors (Theorem 6~5), Finally, it is shown that stability transformations associated with invariant error matrices preserve the behavior of eigenstates the stochastic vectors mapped into themselves by the symbol matrices (Theorem 6~6 and Theorem 6~7)> Chapter 7 brings out the essential difference between stability problems for aT and the machine equivalences of Chapter 6 by translating the tape acceptance of a machine into the notation of convex sets, For nonsingular machines, necessary and sufficient conditions are obtained (Theorem 7~2 and Theorem 7~4), A sufficient condition for nonsingular machines to be equivalent by ED is found as a by-product (Theorem 703)~ Extensions to certain singular cases are possible~ Chapter 8 shows the essential difference between minimization of states for T and the equivalences EE, EN, and io It is proven formally that a minimal state machine can be obtained via state reduction for and I (Theorem 8l1 and Theorem 8~ 2)o Examples are presented showing the failure of state reduction methods for ET (Remark 803)0 A necessary condition is provided for state reduction to be commutative for.T (Theorem 8,3). Finally, machines with four states are discussed

(Teonfl. ri "1 i]. Iv {C u;' c-l i t, ttimEd rt;t ics, f i ct-;s r-dCt i -' i for T (Tcoheorcm 8 4:t Chapter 9 connects probabilistic sequcntial machines with oTptimal control theory, Several ecuivalences hav, i L; experimental intcrpreta. tions are defined for i';i.put a-nd outtput eventse.A\I machine equivalences in this thesis are related by Figure 9,1, A transiation i.s made between the terminology of probabilistic machines and that of optimal control (Lemma 9,1). The subject of equivalent policies is considerctd for certain special cases (Theorems 9,1, 9,2 and 9,3)> A characterization of equivalent proper nolicies:For machines with costs associated with the states is obtained (Theoremr 964), The reduction relation R.E of Chapter 1 is used to obtain a sufficient condition for the existence of optimal policies consisting of strings rather than symbols (Theorem 9,6), 1002 OPEN PROBLEIS:t gret m::rj'y o)0enl Problems have been suggested by this researclh, There are a rather large number of new concepts contained in th}is thesis which seem to deserve further study, In Chapter 2, there is no bound relating the number of states of an input-state calculable machine to the number of states of an N-moment equivalent probabilistic sequential machine, Perhaps the method of Theorem. 5,1 could be used to obtain such a bound, In addition. the methods of (hapter 2 may be able to be extended to given an inputstate calculable machine equivalent by -T to a probabilistic sequential mach ine It would be interesting to know tbounds similar to those of Chapter 4 for d(eclidinr' whether the new eqoulivalences of Chapter 9 hold between arbitrary machines~

165 O'rtO. hi thie princirTal results of Chapter 5 is not satisfactory in that it is either a characterization of certain isolated cutpoint machines or else is a characterization of a broader class than isolated cutpoint machines (Theorem 5.5)., Finding out. which is the case involves either finding a very fortunate examrsle or additional results concerning the basic concepts involved, The discovery of any classes of nonisolated cutpoint machines which have a finite set of invariants for -T would be very enlightening, The stability problems of Chapter 6 and Chapter 7 seem to provide a very rich area of study, The concept of invariant error matrix, when fully developed theoretically, may provide insight into decompositions of probabilistic machines,. The notion of lumpability used by Bacon leads to one kind of invariant error matrix and to a decomposition which is equivalent to the original machine by =-I Other kinds of invariant error matrices may lead to decomposicLions eqAuivalent by different machine equivalences, The theorey of convex sets seems especially relevant to the equivalences E and =i The machine interpretation of additional basic notions of convex sets and the convex set interpretation of additional machine concepts would make the connec~ tion shown in Chapter 7 more important, An interesting class of problems arises if the probabilities of the probabilistic sequential machines are identified with physical quantities, Suppose we construct a voltage analog of the expectation of a probabilistic sequential machine using n voltage storage devices, The output values i" correspond to amplification factors, Let us pass its analog output (corresponding to EA(x)) through a threshold device which emits a 1 when the voltage is greater than or equal to X

1 06 anrd a () othe-rwise, Consequently, the constructed device is able to compute T(A,X) by analog means, But how is the size of such a device related to the size of a deterministic machaine which also recognizes the set T(A,A)? If we have a deterministic machine D which recognizes a set T(D) how do we find the smallest (by number of states which are voltage storage devices in this example) probabilistic automata;\' which recognizes T(A',X) = T(D)? A partial answer to the first question is provided by the generalization of the Rabin-Paz bound (Thecrem 5,2)> We let nD be the number of states of the deterministic machine, n the number of voltage storage devices of the analog tape recognizing machine: n (1 + )n-1 where d =F F VD 26max min The term 26 can be interpreted as the minimum separation in expectation that can be allowed between inequivalent input sequences. In this example, 6 is related to the precision of voltage measurement, Unfortunately no characterization of the class of machines for which the bound is met exactly is known, The Appendix contains probabilistic machines for which no tape equivalent deterministic machine meets the bound. It is easy to show that the following relationship holds between two state "cores" of D and the voltage storage devices, Number of voltage storage devices > 6 Number of "cores" * 1, log2 (1 + 2 The answer to the second question, providirg an algorithm for going from a set of tapes accepted to a minimal state probabilistic machine which accepts the same set of tapes would provide interesting results for the representation theory of abstract algebras6

A^;A PP ENi)I X We exhibit a class of machines for which the Rabin-Paz bound is non-minimal. Each member of the class will accept the same set of tapes as some finite detcxrministic machine, but the bound will be too high by a factor of at most ea Let the class be C = {A): p F [0,1]} where A(p) <29 (1, 0), {0}, A(0), t)' 0A> and A(O)- 1-p pl [IA(Or)]1 = [(10) P Pr r E or'r.Ax (P) =' (l'p) pick some finite r' and let r_ rf+l (lp) - l-p) 26 Hence 26 (l-p) p Pick = (-) - 6 which makes it an isolated cutpoint with separation 6, It is clear that T(A(P) X) = {Ot 0 t 5 r'} Consequently there is a deterministic machine D with rt+2 states (and no less) such that T(A() X) = T(C)) The Rabin-Paz bound gives D ( 1 1 (Nap) p 167

1i68 N,"oItc that,:;, r:, h-te utnctt on (i-p)" p is con)tinulous with respect to n and has i "inimuin vai ue ot r-r r= -. Hence r'*l f (p) > f(P) = As we have seen above, nD = r'+2, Substituting the values in the bound we get r'+2 < 1 + i r We observe that l imn " = e,= 2,71828,,, rt -- (1o Furtherrmore, by checking a few terns, it can be seen that -] ~ T- r"~;"< <e for any finite r' ( -r )iewever, for these probl)abilistic machines, a good approximation for nD is obtained by dividing the Rabin-Paz bound by e, For the following case the bound can be divided by e also It is clear that if the n state "probabilistic" machine of the bo.und is actually a deterministic automaton, we obtaini 1 nIl.. <....'or nr S 5 It is not known whether there are other classes of machines for which the Rabin-Paz bound is too high by the factor of e,

B I BL IOGRAPI IY lo Arbib, l, [1964], "Automata Theory and Control Theory -— A Rapprochement' (unpublished manuscript to appear in SIAM), 2. Arnold, R, Fo [1964], unpublished communication, 3, Bacon, CG C, [1964], "'The Decomposition of Stochastic Automata," Information and Control, September 1964, ppo 320-329~ 4, Braines, S, Na and Svechincky, V, B, [1962], "'Matrix Structure in Simulation of Learning" IoR,Eo Transactions on Information Theory Vol ITSS, Nod 5, Sept,, 1962, ppa 186-190O 5, Burks, Ad W, [1961], "Computation, Behavior and Structure in Fixed and Growing Automata," Behavioral Science, Vol, 6, No, 1, January 1961, 6, Burks, A, W0 and Wang, H, [1957], "The Logic of Automata -Part II," JACM, Vold 4, No, 3, July 1957, pO 286, 7, Carlyle, J, W, [1961], "Equivalent Stochastic Sequential Machines," Institute of Engineering Research Report Series, No, 60, Issue No, 415, Electronics Research Laboratory, University of California, Berkeley, Californian 8, Carlyle, J, W, [1965], "State-Calculable Stochastic Sequential Machines, Equivalences, and Events," 1965 IEEE Conference Record on Switching Theory and Logical Design, pp, 258-263, 9, Derman, C, [1962], "On Sequential Decisions and Markov Chains," Management Sci, Vol, 9, p, 16G 10, Eaton, J, oH and Zadeh, Ld A, [1962], "Optimal Pursuit Strategies in DisCrete State Probabilistic Systems,"' Trans, ASME (Journal of Basic Engineering), March 1962, pp, 23-29e 11, Gantmacher, F, R, [1959], The Theory of Matrices, Vol, 2, Chelsea Publishing Co,, N, Y,, N, Y,, pp. 236-237, 12, Howard, R, Ao [1960], Dynamic Programming and Markov Processes M,1I,T, Press and John Wiley and Sons, Inc,, New York, (Chapter 3) 13, Kemeny, J, Go t al [1959], Finite Mathematical Structures, Prentice Hall, Inc,, En glewood Cliffs, NJ 1959,-pp, 345 346, 14o Moore, E, FG [1956], "Gedanken-experiments on Sequential Machines", Automata Studies, C, E, Shannon and J, McCarthy eds,, Princeton University Press, pp, 129-153, 169

170 15, Paz, Ao [1964], "Probabilistic Automata, Part I and Part II,'" (to appear in Information and Control), 16. Rabin, M4 0 [1964]1'"Probabilistic Automata," Se uentia Machines, Selected Papers, Edited by E Fr, Moore, Addison-Wesley Publishing Co,, Reading, Mass,,, ppd 98-114a 17, Rabini, MN1 0,, and Scott; D [1959], "Finite Automata and Their Decision Problems," IBM Journal Res,, and Dev,, 3, 1959, pp, 114-125, 18, Radstrom, II, (19491K "Polar Reciprocity," Institute for Advanced Study Seminar on Convex Sets, 1949-1950, (U, of Michigan Library QA-248aP95), ppo 27-29, 19e Shannon, C, E, and Weaver, W. [1948], The Mathematical Theory of Communication, University of Illinois Press, Urbana, pp. 346 20, Thrall, R, M, and Tornheim, L, [1957], Vector Spaces and Matrices, John Wiley and Sons, Inc,, New York, pp., 298-300,

DISTRIBUTION LIST (One copy unless otherwise noted) Technical Library Naval Electronics Laboratory Director Defense Res. & Eng. San Diego 52, California Room 3C-128, The Pentagon Attn: Technical Library Washington, D.C. 20301 Dr. Daniel Alpert, Director Defense Documentation Center 20 Coordinated Science Laboratory Cameron Station University of Illinois Alexandria, Virginia 22314 Urbana, Illinois Chief of Naval Research 2 Air Force Cambridge Research Labs Department of the Navy Laurence C. Hanscom Field Washington 25, D.C. Bedford, Massachusetts Attn: Code 437, Information Attn: Research Library, CRMXL R Systems Branch U. S. Naval Weapons Laboratory Director, Naval Research Laboratory 6 Dahlgren, Virginia 22448 Technical Information Officer Attn: G. H. Gleissner, Code K4 Washington 25, D.C. Asst. Dir. for Computation Attention: Code 2000 National Bureau of Standards Commanding Officer 10 Data Processing Systems Division Office of Naval Research Room 239, Building 10 Navy 100, Fleet Post Office Box 39 Washington 25, D.C. New York, New York 09599 Attn: A. K. Smilow Commanding Officer George C. Francis ONR Branch Office Computing Laboratory, BRL 207 West 24th Street Aberdeen Proving Ground, Maryland New York 11, New York Office of Naval Research Office of Naval Research Branch Branch Office Chicago Office 219 South Dearborn Street 495 Summer Street Chicago, Illinois 60604 Boston, Massachusetts 02110 Commanding Officer Naval Ordnance Laboratory ONR Branch Office White Oaks, Silver Spring 19 1030 E. Green Street Maryland Pasadena, California Attn: Technical Library Commanding Officer David Taylor Model Basin ONR Branch Office WashingtonD.C. 20007 1000 Geary Street Attn: Code 042, Technical Library San Francisco 9, California

DISTRIBUTION LIST (Concluded) The University of Michigan Lincoln Laboratory Department of Philosophy Massachusetts Institute of Technology Attn: Professor A. W. Burks Lexington 73, Massachusetts Attn: Library National Physical Laboratory Teddington, Middlesex, England Office of Naval Research Attn: Dr. A. M. Uttley, Supt. Washington 25, D.C. Autonomics Division Attn: Code 432 Commanding Officer Kenneth Krohn Harry Diamond Laboratories 6001 Dunham Springs Road Washington, D.C. 20438 Nashville, Tennessee Attn: Library Mr. Laurence J. Fogel Commanding Officer and Director General Dynamics/Astronautics U. S. Naval Training Device Center Division of General Dynamics Corp. Port Washington San Diego, California Long Island, New York Attn: Technical Library Department of the Army Office of the Chief of Research and Development Pentagon, Room 3D442 Washington 25, D.C. Attn: Mr. L. H. Geiger National Security Agency Fort George G. Meade, Maryland Attn: Librarian, C-332

UNCLASSIFIED Security Classification DOCUMENT CONTROL DATA - R&D (Security claealltcation of title. body of abetract and indexing annotation must be entered when the overall report ia claaltlJd) I. ORIGINATIN G ACTIVITY (Corporate author) a. REPORT SECURITY C LASSIFICATION Logic of Computers Group UNCLASSIFIED The University of Michigan ab. GROUP Ann Arbor, Michigan 48104 3. REPORT TITLE EQUIVALENCES BETWEEN PROBABILISTIC SEQUENTIAL MACHINES 4. DESCRIPTIVE NOTES (Type of report and Inctusive datee) Technical Report S. AUTHOR(S) (Loet name. firat name, initial) Page, Carl V. 6. RNEPORT DATE r7. TOTAL NO. OF PAGES 7b. NO. OFr NRe November, 1965 178 20 -a. CONTRACT OR GRANT NO. 9a. ORIGINATOR'S REPORT NUMBIR(S) Nonr 1224(21) 03105-41-T b. PROJECT NO. c. A.6. OTM.IR R PORT NO(S) (AnY other numbre that ny be asbened thia report) d. 10. A V A IL ABILITY/LIMITAtION NOTICES Qualified requesters may obtain copies of this report from DDC. 11. SUPPL EMENTARY NOTES 12. SPONSORING MILITARY ACTIVITY Office of Naval Research Department of the Navy Washington 25, D.C. 3 BSTRACT This report introduces new definitions of behavioral equivalence and shows relationships among the various notions of behavioral equivalence between probabilistic machines. Four basic problems are discussed for several different behavioral equivalences between machines. In what follows we use the symbol "-" as a variable to denote one of the several behavioral equivalences considered in this report. Given an arbitrary probabilistic sequential machine A: (1) Is there an input-state calculable machine A' (a machine with deterministic switching and random outputs) such that A _ A'? (2) What are the machines A' with minimal number of states such that A - A'? (3) How does one obtain all stable modifications of A, i.e., all machines A' such that the switching probabilities of A' and A differ but A - A'? (4) Is there a finite bound on the length of experiments required to establish whether A A' holds? (U) The four basic problems are solved completely for some equivalences between machines and are left open for other equivalences. Some applications are made to optimal control problems. (U) DD *J 1473 UNCLASSFIED Security Classification

UNCLASS IF IED Security Classification 14. LINK A LINK B LINK C KEY WORDS ~KEY WORDS ROLE WT ROLE WT ROLE WT 1. sequential machine 2. automata theory 3. communications channel 4. stochastic machines 5. discrete stochastic processes 6. markov processes with rewards 7. congruence relations 8. probabilistic automata INSTRUCTIONS 1. ORIGINATING ACTIVITY: Enter the name and address imposed by security classification, using standard statements of the contractor, subcontractor, grantee, Department of De- such as: fense activity or other organization (corporate author) issuing (1) "Quaed requesters may obtain copies of this the report. (1) "Qualified requesters may obtain copies of this the report. report from DDC." 2a. REPORT SECURITY CLASSIFICATION: Enter the, over (2) "Foreign announcement and dissemination of this all security classification of the report. Indicate whether "Restricted Data" is included. Marking is to be in accordance with appropriate security regulations. (3) "U. S. Government agencies may obtain copies of this report directly from DDC. Other qualified DDC 2b. GROUP: Automatic downgrading is specified in DoD Di- users shall request through rective 5200. 10 and Armed Forcet Industrial Manual, Enter the group number. Also, when applicable, show that optional." markings have been used for Group 3 and Group 4 as author- (4) "U. S. military agencies may obtain copies of this ized. report directly from DDC. Other qualified users 3. REPORT TITLE: Enter the complete report title in all shall request through capital letters. Titles in all cases should be unclassified.." If a meaningful title cannot be selected without classification, show title classification in all capitals in parenthesis (5) "All distribution of this report is controlled Qualimmediately following the title. ified DDC users shall request through 4. DESCRIPTIVE NOTES: If appropriate, enter the type of.. report, e.g., interim, progress, summary, annual, or final. If the report has been furnished to the Office of Technical Give the inclusive dates when a specific reporting period is Services, Department of Commerce, for sale to the public, indicovered. cate this fact and enter the price, if known. 5. AUTHOR(S): Enter the name(s) of author(s) as shown on 11. SUPPLEMENTARY NOTES: Use for additional explanaor in the report. Enter last name, first name, middle initial. tory notes. If:r.litary, show rank and branch of service. The name of the principal auithor iM an absolute minimum requirement. 12. SPONSORING MILITARY ACTIVITY: Enter the name of the departmental project office or laboratory sponsoring (pay6. REPORT DATE. Enter the date of the report as day, ing for) the research and development. Include address. month, year; or month, year. If more than one date appears on th report use date of publication13. ABSTRACT: Enter an abstract giving a brief and factual summary of the document indicative of the report, even though 7a. TOTAL NUMBER OF PAGES: The total page count it may also appear elsewhere in the body of the technical reshould follow normal pagination procedures, i.e., enter the port. If additional space is required, a continuation sheet shall number of pages containing information. be attached. 7b. NUMBER OF REFERENCES: Enter the total number of It is highly desirable that the abstract of classified reports references cited in the report. be unclassified. Each paragraph of the abstract shall end with Ba. CONTRACT OR GRANT NUMBER: If appropriate, enter an indication of the military security classification of the inthe applicable number of the contract or grant under which formation in the paragraph, represented as (TS), (S), (C), or (U) the report was written There is no limitation on the length of the abstract. How8b, 8c, & 8d. PROJECT NUMBER: Enter the appropriate ever, the suggested length is from 150 to 225 words. military department identification, such as project number, 14. KEY WORDS: Key words are technically meaningful terms subproject number, system numbers, task number, etc. or short phrases that characterize a report and may be used as 9a. ORIGINATOR'S REPORT NUMBER(S): Enter the offi- index entries for cataloging the report. Key words must be cial report number by which the document will be identified selected so that no security classification is required. Identiand controlled by the originating activity, This number must fiers, such as equipment model designation, trade name, military be unique to this report. project code name, geographic location, may be used as key 9b. OTHER REPORT NUMBER(S): If the report has been words but will be followed by an indication of technical conassigned any other repcrt numbers either by the oriinato text. The assignment of links, rules, and weights is optional. or by the sponsor), also enter this number(s). 10. AVAILABILITY/LIMITATION NOTICES: Enter any limitations on further dissemination of the report. other than those UNCLASSIF ED Security Classification

UNIVERSITY OF MICHIGAN 11111103695 2292 IIlll Illlt ifillltil4im