LOGIC OF COMPUTERS GROUP Department of Computer and Communication Sciences 2080 Frieze Building The University of Michigan Ann Arbor, Michigan 48104

THE UNIVERSITY OF MICHIGAN COLLEGE OF LITERATURE, SCIENCE, AND THE ARTS Computer and Communication Sciences Department ERGODIC MACHINES —PROBABILISTIC AND APPROXIMATE HOMOMORPHIC SIMPLIFICATIONS Sudhir Aggarwal August 1975 Technical Report No. 184 with assistance from: National Science Foundation Grant No. DCR71-01997 Washington, D.C.

CHAPTER I INTRODUCTION Since the age of the electronic computer, modelling and simulation has become an extremely important and popular activity in the investigation of complex problems. We find computer simulation models being used in every imaginable discipline. These simulation models usually have the common feature of being simplifications of a more complex, hypothesized "base" model. In the study of the brain or the economy, for example, we do not have the capability of simulating the base model that we imagine represents reality, and instead, we are forced to simulate a more tractable "lumped" model. The models that we simplify in order to simulate often appear conceptually as probabilistic models. This thesis investigates methods to study probabilistic models as well as the simplification of probabilistic models. Before elaborating on this, we discuss the formal framework that we shall use and we make clear what we mean by a model. 1.1 The Formal Framework Recent work by Zeigler [41-45] has provided a formal mathematical foundation for modelling and simulation. In this approach, modelling and simulation is viewed as the study of the interrelationship among the quintuple of Real World, Base Model, Lumped Model, Computer, and Experimental Frame. The Real World is considered to be a set of input-output data. The modeller conceptualizes his structural beliefs

about the real world by constructing a base model that is capable of accounting for the real world input-output data. This base model is in general very complex and cannot be studied by simulations on a computer. Thus, of necessity, the modeller constructs a lumped model that can be simulated and that accounts for some input-output behavior of interest. Since the lumped model is a simplification, we do not expect it to be able to explain all the input-output behavior. The role of the experimental frame is to provide a formal method to restrict the input-output behavior that a lumped model must match. Different experimental frames of course yield different possible valid lumped models. Finally, the conceptual lumped model, which is considered as simply a mathematical entity, must be translated into a program that can be run on the computer. If we let R be the Real World, B the Base Model, L the Lumped Model, and C the Computer, we can visualize the process as one involving the following transformations: R → B → L → C. The transformation B → L is the simplification stage, and it is here that model validation should also be found. We shall be studying this stage of the transformation in the case of a probabilistic base model. We study simplifications to both probabilistic as well as deterministic lumped models. The stage L → C is a verification stage that checks that the computer model in fact does what it is supposed to. This approach may be compared with a more typical approach. In the more classical approach, one considers essentially R → L → C. The validation stage now involves comparing R to C. Unfortunately, since R is the real system and C the computer model, the criteria for validation appear imprecise and ad hoc, often depending totally on statistical tests.

ERGODIC MACHINES —PROBABILISTIC AND APPROXIMATE HOMOMORPHIC SIMPLIFICATIONS by Sudhir Aggarwal A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Computer and Communication Sciences) in The University of Michigan 1975 Doctoral Committee: Associate Professor Bernard P. Zeigler, Chairman Professor Frederick J. Beutler Professor Arthur W. Burks Professor John H. Holland Associate Professor John F. Meyer

ACKNOWLEDGEMENTS I express my appreciation to Bernard Zeigler, Frederick Beutler, Arthur Burks, John Holland, and John Meyer for their faithful service as members of my committee. John Holland first sparked my interest in this area of computer science, and Arthur Burks first initiated me into research. Most of all, I thank Bernard Zeigler for giving so generously of his time, his ideas and his experience. He was always ready to discuss any problem, and invariably successful in suggesting new leads. The Logic of Computers Group provided me with partial financial support as well as a congenial atmosphere in which to pursue research. I am grateful to the present and past members of this group for their many kindnesses. In particular, a sincere thank you to Albert Bethke with whom I spent long hours developing and testing the simulation package. The moral support that I needed was derived from my wife, Mridula. My parents and my brother provided encouragement. The excellent and speedy job of typing was done by Alice Gantt to whom I extend my thanks and my appreciation. ii

TABLE OF CONTENTS

ACKNOWLEDGEMENTS ... ii
LIST OF TABLES ... v
LIST OF ILLUSTRATIONS ... vi
LIST OF APPENDICES ... viii

Chapter
I. INTRODUCTION ... 1
   1.1 The Formal Framework ... 1
   1.2 Homomorphisms ... 3
   1.3 The Context ... 4
   1.4 Thesis Organization ... 6
II. ERGODIC MACHINES ... 8
   2.1 Ergodic Transformations ... 8
   2.2 The Ergodic Theorem ... 10
   2.3 An Ergodic Machine ... 11
   2.4 Probabilistic Automata ... 12
   2.5 Ergodic Machines and Probabilistic Automata ... 15
   2.6 Information ... 22
   2.7 Canonical Ergodic Machines and Multiplicative Generators ... 32
   2.8 Universally Faithful Transformations ... 38
   2.9 Pseudo-Random Number Generators ... 54
   2.10 Parallel Seeds and Permutation Independence ... 56
   2.11 Nonautonomous Ergodic Machines ... 64
   2.12 Conclusion ... 65
III. APPROXIMATE HOMOMORPHISMS — SIMPLIFICATIONS IN A NETWORK OF NEURONS ... 67
   3.1 Approximate Homomorphisms ... 69
   3.2 The Neuron Model ... 70
   3.3 The Computer Simulation ... 79
   3.4 Single Block — No Connections ... 81
   3.5 Single Block — Positive Feedback ... 93
   3.6 Two Blocks — A Flip-Flop ... 115
   3.7 Complexity ... 127
   3.8 Conclusion ... 127

IV. PROBABILISTIC HOMOMORPHISMS AND SIMPLIFICATIONS ... 131
   4.1 Probabilistic Homomorphisms ... 131
   4.2 Simplification of the Ideal Base Model ... 135
   4.3 Finite Number of Components in the Base Model ... 142
   4.4 A Probabilistic Lumped Model ... 154
   4.5 Finite Number of Neighbors ... 165
   4.6 Conclusion ... 179
V. FUTURE DIRECTIONS ... 180
APPENDIX ... 184
REFERENCES ... 188

LIST OF TABLES

Table
3.4.1  χ² Statistics of Base Model vs. Lumped Model (Part 1) ... 87
3.4.2  χ² Statistics of Base Model vs. Lumped Model (Part 2) ... 88
3.4.3  Second Level Chi-Square Analysis ... 89
3.6.1  Base Model Unset Times from the Set State ... 124
3.6.2  Minimum Input to Lumped Model to Unset It ... 125
3.6.3  Reset Behavior of the Base Model from the Set State ... 126
3.7.1  Complexity Analysis of Base Model vs. Lumped Model ... 128
4.4.1  Probabilistic Lumped Stability vs. Base Model Stability ... 163
4.4.2  Probabilistic Lumped Reset vs. Base Model Reset ... 164
5.1.1  Synopsis of Results for Neural Network Example ... 181

LIST OF ILLUSTRATIONS

Figure
2.8.1  How S(t) Monotonically Decreases ... 43
2.8.2  An Oscillating Function ... 51
3.4.1  Lumped Model Steady State Distribution a = 10 ... 84
3.4.2  Lumped Model Steady State Distribution a = 20 ... 85
3.4.3  Error Propagation ... 91
3.5.1  Lumped Model Output Behavior ... 94
3.5.2  Heuristic Calculation of Lumped Steady State ... 101
3.5.3  Lumped Model Two Dimensional Force Field ... 103
3.5.4  Base Model Run 400 Neurons 50 Neighbors ... 104
3.5.5a Base Model Two Dimensional Run 400 Neurons 50 Neighbors ... 106
3.5.5b Base Model Two Dimensional Run 400 Neurons 50 Neighbors ... 107
3.5.6  Base Model Run 400 Neurons 100 Neighbors ... 108
3.5.7  Base Model Run 1000 Neurons 20 Neighbors ... 109
3.5.8  Base Model Run 1000 Neurons 30 Neighbors ... 110
3.5.9  Base Model Run 1000 Neurons 50 Neighbors ... 111
3.5.10 Base Model Run 400 Neurons 60 Neighbors ... 112
3.5.11 Base Model Run 800 Neurons 60 Neighbors ... 113
3.5.12 Base Model Run 50 Neurons 50 Neighbors ... 114
3.6.1  Partial Region of Stability Graph ... 119
3.6.2  Entire Region of Stability Graph ... 120
3.6.3  Lumped Model Two Dimensional Force Field for Flip-Flop ... 122
3.7.1  Running Time Complexity (1) ... 129
3.7.2  Running Time Complexity (2) ... 130
4.4.1  Probabilistic Lumped Model Run 50 Neurons ... 156

4.4.2  Probabilistic Lumped Model Run 100 Neurons ... 157
4.4.3  Probabilistic Lumped Model Run 200 Neurons ... 158
4.4.4  Probabilistic Lumped Model Run 400 Neurons ... 159
4.4.5  Base Model Run 100 Neurons 100 Neighbors ... 160
4.4.6  Base Model Run 200 Neurons 100 Neighbors ... 161

LIST OF APPENDICES

I.  MATHEMATICAL NOTATION ... 182
II. THE SIMULATION ... 184

ABSTRACT ERGODIC MACHINES —PROBABILISTIC AND APPROXIMATE HOMOMORPHIC SIMPLIFICATIONS by Sudhir Aggarwal Chairman: Bernard P. Zeigler Computer simulation models have been widely used to analyze complex problems. However, it is only recently that there has been an attempt to provide a formal foundation for valid modeling and simulating. One promising approach is to consider the quintuple of real world, base model, lumped model, computer, and experimental frame. The base model incorporates all structural beliefs about the "real world", whereas the lumped model is a simplified version that can be simulated on the computer. In this dissertation, we introduce the idea of a formal deterministic model, which we have called an Ergodic Machine, to study probabilistic systems. An ergodic machine represents probabilistic features by using an ergodic transformation on a "hidden" state space. An ergodic transformation can be thought of as an ideal random number generator. The concepts of ergodic machines are developed and connections are made to Markov chains and to random number generators. Next, we consider a particular example of an idealized network of neurons in a formal framework. We experimentally study computer simulations of base and lumped models of this network of neurons, using

a series of computer programs developed for this purpose. We show how a deterministic lumped model can preserve information, although approximately, of a probabilistic base model. Finally, the concepts of ergodic machines are shown to be useful in considering valid simplifications of probabilistic models. The particular example of the network of neurons is used to illustrate a valid probabilistic lumped model. This lumped model is formalized as an ergodic machine that is a homomorphic image of the ergodic machine representing the base model. Thus, precise meaning is given to valid simplifications of probabilistic models. Hopefully, work such as this will provide the foundations necessary to allow modellers in such varied fields as business, economics, and biology to more easily construct meaningful and valid computer models.

In the approach adopted by Zeigler, the problem of validation appears quite clearly in the B → L transformation. A lumped model is valid if it generates the same input-output behavior as the base model under some experimental frame. Validation thus involves studying two formal models and how they relate. This brings us to the question of the types of models that we wish to study. We are considering formal mathematical models as defined in Mathematical Systems Theory [Kalman, Falb and Arbib (22)] or Automata Theory [Hartmanis and Stearns (21)]. The probabilistic models that we consider are stochastic automata such as in the development by Paz [31]. In the systems theory and automata theory approach, the concept of a state variable is fundamental. Given two different models with well-defined state spaces, one can study maps between such models that preserve properties of interest. Two particular types of maps are those that preserve input-output behavior and those that preserve state transition behavior. A map from the state space of one deterministic model to the state space of the other model that preserves state transition behavior is said to be a homomorphism. Homomorphisms are natural simplifications since they preserve input-output behavior. A homomorphism is really a technique that allows us to aggregate variables and still preserve this input-output behavior within an experimental frame. This concept of a homomorphism is one that arose in the study of deterministic systems and is well-defined in such a context. 1.2 Homomorphisms Let A and B be two deterministic systems with state sets S_A, S_B and state transition functions δ_A, δ_B. Let λ_A, λ_B be the output functions.

Then a mapping h: S_A → S_B is said to be a homomorphism if it satisfies two conditions: (1) h∘δ_A = δ_B∘h, (2) λ_A = λ_B∘h. These two conditions allow us to show that if a homomorphism exists from A to B, then the input-output behaviors of A and B are also the same. Consequently, when we consider simplifications of deterministic base models B to lumped models L, we shall attempt to find a homomorphism from B to L. Unfortunately, although the idea of homomorphisms has been extensively studied for the case of deterministic systems, the homomorphism idea has not really been applied to probabilistic systems. 1.3 The Context Much of the literature on probabilistic automata has concentrated on analyzing probabilistic systems in isolation, analogously to studying deterministic machines in isolation. The question that is not discussed is to what extent a deterministic system can serve as a model of a probabilistic system. Paz [31] provides a good account of the work on probabilistic automata. We find that much of the work is oriented to problems such as state minimization and problems of state equivalence. Work along such lines has been done by Carlyle [10] and Bacon [6]. Other investigations are also directly related to this, such as the work on state splitting by Santos [36]. Page [29] does consider deterministic and probabilistic sequential machines but from a different aspect. He considers the expected value of output of a probabilistic machine given a specific input string, and then defines two machines as equivalent if they have identical

expected values of output. Again, approximate homomorphic behavior is not investigated. We analyze these problems by taking a totally different approach. Since we know much more about deterministic systems than we do about probabilistic systems, why not find a way to make use of this knowledge? We do this by introducing the concept of an ergodic machine that is a deterministic representative of a probabilistic automaton. It turns out that it is more natural to consider structural simplification of ergodic machines than it is to directly consider simplified probabilistic systems. The translation into ergodic machine terms also gives more insight into the structural behavior of probabilistic automata. The approach allows us to use existing concepts of homomorphisms and apply them directly to the ergodic machine representations. We also discuss the simplification of a probabilistic system to a deterministic lumped model. We analyze this problem in the context of a neural network model proposed by Zeigler [44]. In this analysis, we consider an aggregation of neurons into blocks and study the relationship between this new component, the block, and the original pool of neurons. Although we study the dynamics of collections of neurons, the aim is to study the lumping process. Investigations along these lines are also being done at Syracuse University. Harth, et al. [20] and Anninos, et al. [3] have explored cooperative phenomena in neural networks and have proposed an approach they call Netlet theory. Netlets are simply collections of neurons that have statistically identical properties and are connected in a uniform (though probabilistic) manner. Although the netlet model is a deterministic one, the assumptions involved in the way neurons are interconnected allow one to probabilisti

cally predict the firing level at time t+1 given the firing activity of the net at time t. The approach adopted by this group is a detailed analysis of the netlet dynamics using the assumptions they make about the connection pattern of the neurons. The model that we use is a probabilistic one (because of assumed noise) and reduces to their model when noise is removed. We also use the lumped model explicitly to predict base model behavior. We are in general interested in aggregating variables and forming lumped models that are more amenable to computer study than the original conceptual base model. The problem of such modelling is really a question of interest to general systems theorists. General Systems Theory [Klir (24)], however, is more interested in unifying structural and behavioral questions rather than the validity questions in the framework of Zeigler [45]. Foo [16] has also considered validity questions in a topological investigation using the Zeigler paradigm. Corynen [11] discusses modelling and simulation in a more abstract setting than we are interested in here. The aggregation of variables to a lumped model has implicitly been done in many interesting computer simulations, among which are the investigation of impulse conduction in heart muscle [Foy (17)] and the models of Mortimer [28] and Plum [32]. 1.4 Thesis Organization Chapter II introduces the concept of an ergodic machine. As mentioned before, an ergodic machine is a method to make a probabilistic system deterministic and thus make it more amenable to study. The investigation also points out the role of random number generators under simplification and what one might consider the ideal properties of such

7 generators. We develop the mathematical machinery that allows us to aptly represent probabilistic automata by ergodic machines. Next, Chapter III is an investigation of the idea of using a deterministic lumped model to study a probabilistic base model. In a sense, we are studying an approximate homomorphism. The investigation is in the context of an example of a network of neurons. It is shown exactly how a great deal of information is retained by the lumped model when one investigates it properly. Limitations of the deterministic lumped model are also pointed out. Chapter IV undertakes to apply the ergodic machine ideas to simplifications of systems. We see how naturally we are led to consider specific ergodic models that are simplified models of the base system. The simplification is presented in the context of probabilistic homomorphisms which is the concept in the probabilistic case that corresponds to the ordinary notion of homomorphism for deterministic systems. Finally, we make some comments about future directions in Chapter V.

CHAPTER II ERGODIC MACHINES In this chapter, we formally give the definition of an Ergodic Machine and develop related concepts. The terminology that we are using is that of Mathematical Systems Theory, and further information on the concepts of a machine and a system may be found in Kalman et al. [22], Arbib [4] or Zadeh [40]. Intuitively, an ergodic machine is a specification for a deterministic system that has probabilistic features when viewed properly. Probabilistic features are introduced by considering part of the state space as unknown or hidden. This part of the state space will be called the seed space, and one can imagine drawing elements or seeds from this seed space as analogous to choosing a pseudo random number. A new seed is chosen by applying a transformation on the seed space that defines the next element to be chosen. We shall require this transformation to be an ergodic transformation, so, before defining an ergodic machine, we first define this concept and give some examples. Billingsley [7] gives a good development of ergodic theory and the following examples are drawn from there. 2.1 Ergodic Transformations Let Π = (Ω, F, P) be a probability space (see Appendix I). A measurable transformation ζ: Ω → Ω is called an ergodic transformation if it satisfies two properties: (1) ζ is measure preserving, that is, for

any A ∈ F, P(ζ⁻¹(A)) = P(A), and (2) the only sets that are ζ-invariant have measure 0 or 1. A set A is ζ-invariant if ζ⁻¹(A) = A. An ergodic transformation thus determines a next point of the space Ω, given a point ω ∈ Ω. Much of our development will actually depend only on ζ being measure preserving. The reasons for desiring ζ to be ergodic are partly due to the ergodic theorem described below. Example 2.1.1 Let Ω = [0,1), F = B[0,1), i.e. the σ-algebra of Borel sets, also called the Borel field, and P = dx, where dx is Lebesgue measure. Given λ ∈ [0,1), let ζ(x) = (x+λ) [mod 1]. (A number x [mod 1] is simply the fractional part of the number.) We shall call a transformation of this type an additive generator. It can be shown that ζ is ergodic iff λ is irrational. Example 2.1.2 (the m-adic shift) Ω = [0,1), F = B[0,1), P = dx. Let ζ(x) = mx [mod 1] where m is a positive integer. It can be shown that this transformation is ergodic for any such m. Notice that if we consider the base m expansion of x, that is x = .w₁w₂w₃..., then ζ(x) is simply ζ(x) = .w₂w₃w₄.... This is the reason it is called the m-adic shift. When m = 2, we notice that it is a shift of the binary representation of x. The 2-adic shift is also called the dyadic shift. A transformation of this type will also be called a multiplicative generator. Example 2.1.3 Ω = {a,b,c,d,e}, the five point space. Let F be the power set of Ω, that is, all subsets of Ω. For each point x ∈ Ω let P(x) = 1/5.

Let ζ be the cyclic permutation (a,b,c,d,e). That is, ζ(a) = b, ζ(b) = c, ..., ζ(e) = a. It is easily checked that ζ is measure preserving and that the only invariant sets are ∅ and Ω. This example illustrates how simple an ergodic transformation may be. 2.2 The Ergodic Theorem One of the reasons for considering ergodic transformations is the following powerful theorem. There are several versions of what is called an ergodic theorem and we present only a simplified version of what is better known as the pointwise ergodic theorem. Theorem 2.2.1 Let I_A be the indicator function of the set A, that is, I_A(x) = 1 if x ∈ A and I_A(x) = 0 otherwise. Then if ζ is an ergodic transformation on (Ω, F, P) and ω ∈ Ω,

lim_{n→∞} (1/n) Σ_{k=0}^{n−1} I_A(ζᵏ(ω)) = P(A)   a.e. (almost everywhere)

Intuitively, the ergodic theorem simply says that if we have an ergodic transformation on a space Ω and look at a small portion A of that space, then the fraction of times that we observe the orbit of ω in A approaches P(A). The orbit of ω is defined to be the set of points ω, ζ(ω), ζ²(ω), .... If we consider the analogy to choosing pseudo random numbers, we would certainly wish that the long run frequency has this property. That is, if we are choosing pseudo random numbers from some uniform distribution, say on [0,1), then we would wish that for large N, the number of times a pseudo random number comes from some subinterval [a,b) is approximately N×P{[a,b)} = N×(b−a) if we choose N random numbers. We are now in a position to define an ergodic machine.
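Before doing so, here is a concrete illustration of the pointwise ergodic theorem. The following sketch (Python, added for illustration; the function and variable names are ours, not part of the original development) estimates P([a,b)) by the long run fraction of time an orbit of the additive generator of Example 2.1.1 spends in [a,b). With an irrational λ the time average should approach the interval length b−a.

```python
import math

def additive_generator(lam):
    """zeta(x) = (x + lam) mod 1, the additive generator of Example 2.1.1."""
    return lambda x: (x + lam) % 1.0

def orbit_fraction(zeta, omega, a, b, n):
    """Fraction of the first n orbit points omega, zeta(omega), zeta^2(omega), ... lying in [a, b)."""
    count, x = 0, omega
    for _ in range(n):
        if a <= x < b:
            count += 1
        x = zeta(x)
    return count / n

zeta = additive_generator(math.sqrt(2) - 1.0)     # irrational lam, so zeta is ergodic
a, b = 0.25, 0.40                                 # P([a, b)) = b - a = 0.15
for n in (100, 10_000, 1_000_000):
    print(n, orbit_fraction(zeta, 0.0, a, b, n))  # the printed fractions tend to 0.15
```

The m-adic shift of Example 2.1.2 satisfies the same theorem, but it is awkward to iterate in floating point (each step discards low-order bits), which is why the additive generator is used for this numerical check.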

2.3 An Ergodic Machine In the following, we assume that the time base is the integers. Definition 2.3.1 An Ergodic Machine is a 7-tuple E = <X, S, Π, Y, ζ, δ₁, λ₁> where (a) X is the set of inputs. (b) S is a set called the Pseudo State Set. [We shall let S be at most countably infinite.] (c) Π = (R, F, P) is a probability space. R will be called the seed space and an element of R will be called a seed. (d) Y is the output space. (e) ζ is an ergodic transformation on R. (f) δ₁ is the one step transition, δ₁: X×S×R → S. δ₁ defines the next pseudo state, given the current input, the current pseudo state and the current seed. (g) λ₁ is the output function λ₁: S×R → Y. The above definition of an ergodic machine specifies a deterministic sequential machine M = <X, S×R, δ, Y, λ> where X is the input space, Y is the output space and S×R is the state space. δ and λ are the transition and output functions respectively, with δ(x,s,r) = [δ₁(x,s,r), ζ(r)] and λ(s,r) = λ₁(s,r). The deterministic sequential machine of course yields a system S_M when we consider an input string composed of elements of X and extend the one step transition and output functions. Whenever we now talk

of an ergodic machine, we shall be referring to the deterministic sequential machine that the specification of an ergodic machine defines. Thus, we in particular will be talking about transformations on part of the state space of this deterministic machine. Although we have defined the output function to be any mapping on the state space, we shall in general use only the following map: λ: S×R → Y = S, λ(s,r) = s. That is, the output map observes as much as possible, since the seed space is considered to be hidden. Notice that although the seeds determine the next pseudo state, the next seed depends only on the present seed. The seed space essentially works autonomously, generating new values of seeds which can be used the way pseudo random numbers are. 2.4 Probabilistic Automata The importance of ergodic machines derives from the fact that they can be used to deterministically model probabilistic automata. Probabilistic automata arose as a generalization of the concept of a sequential machine, and Paz [31] provides an excellent reference for many aspects of this theory. Aside from giving the definition of a probabilistic automaton and considering the concept of strong lumpability, we shall make no other use of the theory of probabilistic automata. Definition 2.4.1 A probabilistic automaton is a five-tuple <X, S, {Z(x)}, Y, λ> where (a) X is a set of inputs.

(b) S is a set of states, with |S| representing the cardinality of S, or the number of elements of S. We shall consider S to be at most countably infinite. (c) {Z(x)} is a set of |S| × |S| matrices, one for each input element of X. The i,jth entry Z_ij(x) represents the probability of the automaton moving to state j, given that it is in state i, under the input x. It is clear that for a consistent definition we must have Σ_j Z_ij(x) = 1 for all i. (d) Y is the output set. (e) λ is the output map λ: S → Y. The generalized feature of a probabilistic automaton is that instead of a definite transition from one state to another, there is only a probability of transition. One way to study probabilistic automata is to consider starting in an initial distribution over states, say π(0) = <p₁, p₂, ...> where p_i represents the probability of being in state i, and observe with what probability one is in a particular state at time t. Notice that if we consider an autonomous probabilistic automaton, that is, one with no inputs, then we have exactly the structure of a Markov Chain (see Feller [15]). One of the concepts that naturally arises when one considers simplifications of probabilistic automata is the concept of strong lumpability. The concept of strong lumpability was first introduced in connection with finite Markov Chains by Kemeny and Snell [23]. Definition 2.4.2 A probabilistic automaton M = <X, S, {Z(x)}, Y, λ> is strongly lumpable if there exists a nontrivial partition on the state set S such

that for any two states i and j in the same block, the sums of the row entries of i and j corresponding to any block are the same. Example 2.4.1 Consider an autonomous probabilistic automaton M with transition matrix

        1/8    1/8    1/3    0      5/12
        1/4    0      0      1/3    5/12
Z  =    1/16   1/16   1/6    2/3    1/24
        0      1/8    1/3    1/2    1/24
        1/4    1/4    0      1/3    1/6

We see that the following is a partition that satisfies the definition: <{1,2}, {3,4}, {5}>. The lumped transition matrix, that is, the one with state set the blocks of the original partition and the entries as the row entry sums, is just

        1/4    1/3    5/12
Z' =    1/8    5/6    1/24
        1/2    1/3    1/6

Intuitively, all we are doing is keeping track of the transitions from block to block. If, for any two elements of a block, the probability of transition to another block is the same, then the block to block transitions are well defined. The condition necessary to insure this is strong lumpability. We shall call the probabilistic automaton that has the blocks as states and the appropriate transition probabilities a lumped version of the original automaton. Thus, in the above example, if M' = <X', S', Z', Y', λ'> where X' has one element and S' has three elements, then M' is a strongly lumped version of M. For the concept of strong lumpability, the output set and output transition function play no part.
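The block-sum test of Definition 2.4.2 is easy to mechanize. The following sketch (Python, written for this report, with exact rational arithmetic so the row sums can be compared exactly; the function names are ours) checks the partition <{1,2},{3,4},{5}> of Example 2.4.1 and produces the lumped matrix Z'.

```python
from fractions import Fraction as F

# Transition matrix of Example 2.4.1 and the partition <{1,2},{3,4},{5}> (0-based indices).
Z = [[F(1,8),  F(1,8),  F(1,3), F(0),   F(5,12)],
     [F(1,4),  F(0),    F(0),   F(1,3), F(5,12)],
     [F(1,16), F(1,16), F(1,6), F(2,3), F(1,24)],
     [F(0),    F(1,8),  F(1,3), F(1,2), F(1,24)],
     [F(1,4),  F(1,4),  F(0),   F(1,3), F(1,6)]]
blocks = [[0, 1], [2, 3], [4]]

def block_row_sums(i):
    """Row i of Z summed over each block: probability of moving from state i into each block."""
    return [sum(Z[i][j] for j in B) for B in blocks]

def strongly_lumpable():
    """Definition 2.4.2: states in the same block must have identical block row sums."""
    return all(block_row_sums(i) == block_row_sums(B[0]) for B in blocks for i in B)

def lumped_matrix():
    """Z': one row per block, computed from any representative state of the block."""
    return [[str(p) for p in block_row_sums(B[0])] for B in blocks]

print(strongly_lumpable())   # True
print(lumped_matrix())       # [['1/4', '1/3', '5/12'], ['1/8', '5/6', '1/24'], ['1/2', '1/3', '1/6']]
```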

2.5 Ergodic Machines and Probabilistic Automata We have mentioned before that we shall use ergodic machines to model probabilistic automata. We explore some aspects of the connection between these two concepts in this section. The next definitions are based on the following informal discussion. Suppose we are given an ergodic machine M. Then, if we are told that we are in a particular pseudo state s₁ and the input is x, and given no other information, what can we say about the next pseudo state that would occur? Clearly, the next pseudo state would depend on the actual state that we are in, that is, on the hidden seed value that we happen to have as the second and hidden component of the state (i.e., we are actually in some (s₁, r₁), but do not know this). Suppose instead that we consider all the seed values that take us from s₁ to s₁ again at the next time step. That is, let A = {r ∈ R | δ₁(x, s₁, r) = s₁}. Then, the most that we could say is that the chances of observing s₁ again at the next time step depend on the chances of being in A. But it might be appropriate to think of the chances of being in A as just the probability of the set A. This might be the case if, for example, we had initially chosen our initial seed with just the probability distribution induced by P on R. (Recall that R is the sample space of Π = (R, F, P).) Then it would be reasonable to say that the probability of going from s₁ to s₁ is just P(A). With these informal remarks as background, we shall define a probabilistic automaton that corresponds to a given ergodic machine. We also note at this point that we are only considering ergodic machines with an at most countable pseudo state space, for convenience. Definition 2.5.1 Given an ergodic machine M = <X, S, Π = (R, F, P), Y, ζ, δ₁, λ₁>,

for any two pseudo states s_i, s_j, the s_i, s_j, x transition seed space A_ij(x) is defined to be A_ij(x) = {r ∈ R | δ₁(x, s_i, r) = s_j}. Notice that for all s_i, s_j and x, A_ij(x) is just a subset of R. Since we shall be interested in the probability of such sets, we make the following definition. Definition 2.5.2 An ergodic machine is normal if for all s_i, s_j, and x, A_ij(x) ∈ F or, equivalently, A_ij(x) is P-measurable. From now on, we only consider normal ergodic machines and shall assume that ergodic machine means a normal ergodic machine. Definition 2.5.3 Given an ergodic machine M = <X, S, Π, Y, ζ, δ₁, λ₁>, the associated automaton C_M is the probabilistic automaton C_M = <X, S, Y, {Z(x)}, λ> where Z(x) is the matrix whose i,jth entry is Z_ij(x) = P(A_ij(x)). λ is just the same as for the sequential machine that M represents. Observe that we have made no mention of the ergodic transformation ζ. Thus, it is apparent that the same probabilistic automaton C is associated with many ergodic machines. In fact, if M has the associated machine C_M, then any ergodic machine with simply a different ζ has the same associated machine C_M. From the definition then, the mapping from the class of ergodic machines to the class of probabilistic automata is many to one. Furthermore, it is an easy exercise to check that this mapping is in fact surjective (onto), since given any probabilistic automaton C it is quite easy to find an ergodic machine M such that C = C_M.
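When the seed space R is finite, the associated automaton of Definition 2.5.3 can be computed directly by enumerating seeds. The sketch below (Python, ours; the small two-state machine is hypothetical, invented only to exercise the function, and is not one of the examples in the text) tabulates Z_ij(x) = P(A_ij(x)).

```python
from fractions import Fraction as F

def associated_automaton(states, seeds, prob, delta1, inputs):
    """Definition 2.5.3 for a finite seed space: Z_ij(x) = P(A_ij(x)) = P{r : delta1(x, s_i, r) = s_j}."""
    return {x: [[sum(prob[r] for r in seeds if delta1(x, si, r) == sj)
                 for sj in states] for si in states]
            for x in inputs}

# A small hypothetical autonomous machine: two pseudo states, four equally likely seeds.
states = [1, 2]
seeds = ['a', 'b', 'c', 'd']
prob = {r: F(1, 4) for r in seeds}
def delta1(x, s, r):                 # next pseudo state; the single input x is ignored
    return 1 if (s, r) in {(1, 'a'), (1, 'b'), (1, 'c'), (2, 'a')} else 2

Z = associated_automaton(states, seeds, prob, delta1, inputs=[None])
print([[str(p) for p in row] for row in Z[None]])   # [['3/4', '1/4'], ['1/4', '3/4']]
# The same Z is obtained no matter which ergodic transformation acts on the seeds,
# which is exactly the many-to-one behavior remarked on above.
```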

Example 2.5.1 Consider an autonomous ergodic machine M with (a) X a singleton input set. (b) S = {1,2,3}; there are three pseudo states. (c) R = {a,b,c,d,e}, with F the set of all subsets of R. Let P({a}) = P({b}) = ... = P({e}) = 1/5. (d) Let ζ be the cyclic permutation on R, that is, (a,b,c,d,e). (See Example 2.1.3.) (e) Let Y = S and λ be the pseudo state output. (f) Since M is autonomous, we can ignore the input in the definition of the transition. δ₁ is defined as follows:

δ₁(1, r) = 1 for r = a,b
         = 2 for r = c,d,e
δ₁(2, r) = 1 for r = a,b
         = 2 for r = c,d
         = 3 for r = e
δ₁(3, r) = 1 for r = e
         = 2 for r = a,d
         = 3 for r = b,c

Since M is autonomous, it is apparent that the associated automaton is also autonomous and therefore simply a Markov Chain. The associated automaton has transition matrix

        2/5    3/5    0
Z  =    2/5    2/5    1/5
        1/5    2/5    2/5

Notice that for the ergodic machine, if we start in any of the three states, there are only five different trajectories possible. (A

trajectory will be the sequence of outputs of the sequential machine that the ergodic machine represents.) For example, starting in state 1, we can calculate the first few elements of the five possible sequences. If the initial state was actually (1,a), then δ[(1,a)] = (1,b), δ[(1,b)] = (1,c), δ[(1,c)] = (2,d). Thus the sequence of outputs (in this case just the pseudo states) is 1, 1, 1, 2. Doing this for every possible initial state, we get the five possibilities, indexed by the initial seed.

(a) 1, 1, 1, 2, 2, 3, 2, ...
(b) 1, 1, 2, 2, 3, 2, 1, ...
(c) 1, 2, 2, 3, 2, 1, 2, ...
(d) 1, 2, 3, 2, 1, 2, 2, ...
(e) 1, 2, 1, 1, 2, 2, 3, ...

The sequences beginning with the other initial states can be similarly calculated. If we did not know the initial seed but knew only that it was one of the five seeds with equal probability, then the associated Markov Chain would represent the probability of going from state i to j at the first time step. Since we are interested in using homomorphisms as simplification procedures, it is important to understand how homomorphisms of ergodic machines relate to the associated probabilistic automata. Definition 2.5.4 Assume M, M' have the same input and output sets. An ergodic machine M' is a homomorphic image of the ergodic machine M if there exists a homomorphism h (surjective) from the sequential machine that M represents to the sequential machine that M' represents. That is, h: S×R → S'×R' such that (1) h∘δ(x,s,r) = δ'(x, h(s,r)) and (2) λ'[h(s,r)] = λ(s,r). The definition of homomorphism is just the standard automata

theoretic definition for sequential machines. We now see what happens when the homomorphism h can be represented as a cross-product.

Theorem 2.5.1 Let h be a homomorphism from the ergodic machine M = <X, S, Π, Y, ζ, δ₁, λ₁> to the ergodic machine M' = <X, S', Π', Y, ζ', δ₁', λ₁'>. If h = k₁ × k₂, where k₁: S → S' and k₂: R → R' with h(s,r) = (k₁(s), k₂(r)), then C_M' is a strongly lumped version of C_M.

The proof of the theorem depends on the following lemma. We shall define δ₁(x,B,r) = {s ∈ S | there exists b ∈ B with δ₁(x,b,r) = s}. Assume the same hypotheses as for the theorem.

Lemma 2.5.1 For all i, j ∈ S' and input x, δ₁(x, k₁⁻¹(i), r) ⊆ k₁⁻¹(j) iff r ∈ k₂⁻¹[A'_ij(x)], where A'_ij(x) is just the i,j,x transition seed space of the ergodic machine M'.

Proof: (⇐) Let r ∈ k₂⁻¹[A'_ij(x)]. Then, since h(k₁⁻¹(i), r) = (i, k₂(r)) with k₂(r) ∈ A'_ij(x), we have δ'∘h(x, k₁⁻¹(i), r) = δ'(x, i, k₂(r)) = [j, ζ'(k₂(r))] = h∘δ(x, k₁⁻¹(i), r). Thus, k₁∘δ₁(x, k₁⁻¹(i), r) = j, that is, δ₁(x, k₁⁻¹(i), r) ⊆ k₁⁻¹(j). (⇒) The forward implication follows just as readily. Simply use r ∉ k₂⁻¹[A'_ij(x)] and show that δ₁(x, k₁⁻¹(i), r) ⊄ k₁⁻¹(j).

Although the above lemma is notationally complicated, it is really rather trivial and simply says that if one considers a seed r that takes you from pseudo state i to pseudo state j in the image

machine M', then the set of seeds which are inverse images of the seed r take you from the inverse image set of i to the inverse image set of j. Since h is a homomorphism, this clearly has to hold. We now present a lemma due to Corynen [11] that we shall also use in the proof of the theorem.

Lemma 2.5.2 Let ζ and ζ' be ergodic transformations on the spaces (Ω, F, P) and (Ω', F', P') respectively. Suppose k is a surjective mapping k: Ω → Ω' with k∘ζ = ζ'∘k. Then, P'(A') = P(k⁻¹(A')) for all A' ∈ F'.

Proof: The proof is straightforward using the ergodic theorem we have presented and the fact that k∘ζʲ = (ζ')ʲ∘k.

Proof (Theorem 2.5.1): To follow the proof of the theorem, the following commutative diagram should be kept in mind (the diagram commutes because h = k₁ × k₂ is a homomorphism).

    X × S × R   --- δ --->   S × R
        |                      |
  id × k₁ × k₂              k₁ × k₂
        ↓                      ↓
    X × S' × R' --- δ' --->  S' × R'

Let u ∈ k₁⁻¹(i) and v ∈ k₁⁻¹(j) for i,j ∈ S'. Then, if we let B = k₂⁻¹(A'_ij(x)), it is easy to see that A_uv(x) ⊆ B. This follows simply from the commutativity of the above diagram, or from an application of Lemma 2.5.1. Lemma 2.5.1 further asserts that δ₁(x,u,B) ⊆ k₁⁻¹(j). Thus, we also have

    ∪_{v ∈ k₁⁻¹(j)} A_uv(x) = B

We notice now that since δ₁ is a function, the above union is, in fact,

a disjoint union. That is, A_uv(x) ∩ A_uv'(x) = ∅ for v ≠ v' and v, v' ∈ k₁⁻¹(j). Now, for u, u' ∈ k₁⁻¹(i) we clearly have

    ∪_{v ∈ k₁⁻¹(j)} A_uv(x) = ∪_{v ∈ k₁⁻¹(j)} A_u'v(x) = B

    Σ_{v ∈ k₁⁻¹(j)} P[A_uv(x)] = Σ_{v ∈ k₁⁻¹(j)} P[A_u'v(x)]   (1)

for any u, u' ∈ k₁⁻¹(i). Furthermore, since h is a homomorphism, k₂∘ζ = ζ'∘k₂. Since ζ, ζ' are both ergodic transformations, we can apply Lemma 2.5.2 and get

    P'[A'_ij(x)] = P[B] = Σ_{v ∈ k₁⁻¹(j)} P[A_uv(x)]  for u ∈ k₁⁻¹(i)   (2)

Now, if we let the partition of the associated automaton C_M be just the disjoint sets k₁⁻¹(i) over i ∈ S', it is easy to see that conditions (1) and (2) are just the strong lumpability condition that was previously defined. Thus, as asserted, C_M' is a strongly lumped version of C_M. So far, we have considered the concept of the associated automaton of an ergodic machine, without worrying about the transformation ζ and probability space Π. However, from Example 2.5.1, it is apparent that with a simple ζ and small sample space, we may in fact be poorly representing the associated automaton. It is clear from examining any of the fifteen possible output sequences of that example that each sequence will soon start to repeat. (It might have an initially distinct subsequence.) This is so because the sequential machine is a finite state sequential machine, and it is easy to see that within a length twice the size of the state space, a cyclic subsequence can be found. We, however, wish to be able to consider more general seed spaces,

although eventually we shall of course discuss the situation for the only practical case of finite seed spaces. In the next section, we consider the concept of information related to an ergodic machine, which will lead us to a better representative for the associated probabilistic automaton. 2.6 Information We are interested in finding a Π and a ζ that will allow us to aptly represent a probabilistic automaton C. Recall that the mapping from the class of ergodic machines M to the class of probabilistic automata C was a surjective and not injective mapping. Given the probabilistic automaton C, we wish to explore the subclass of ergodic machines mapped onto C, that is, the inverse image of C under this mapping. Hopefully, some elements of this subclass will be such that any sequence from a proper representative ergodic machine, say E, would never lead one to doubt that the ergodic machine is in fact a probabilistic automaton, if it was not known in advance that E was in fact an ergodic machine. To simplify concepts, we consider the case of autonomous ergodic machines for the next few sections. The generalization to non-autonomous ergodic machines is quite simple and we indicate how this is done later on. Thus, from now on, ergodic machine will mean an autonomous ergodic machine, that is, one that has only a single input value. Because we consider only autonomous ergodic machines, the associated automaton will of course be simply a Markov Chain. Thus, we are searching for a proper representative of a given Markov Chain. Suppose we are given an ergodic machine and we observe the outputs for n+1 time steps. That is, we observe the pseudo states s(0), s(1), s(2), ..., s(n) where each s(i) ∈ S. We shall assume that the initial seed (unknown by us) was chosen with the distribution induced by the

measure P. Thus, if B ⊆ R, then the probability that the initial seed r(0) ∈ B is just P(B). For example, if R = [0,1) with Lebesgue measure, then the probability of the initial seed being in any interval is simply the length of the interval. We know that the ergodic machine should have trajectories that match the associated automaton. But the fundamental feature of the associated automaton, which is a Markov Chain, is simply that P{X(n+1) = s(n+1) | s(0),..., s(n)} = P{X(n+1) = s(n+1) | s(n)}, where X(0), X(1), ... is a sequence of random variables whose distribution defines the Markov process. Can we find something analogous to this for our ergodic machine? We shall see that we can, but first we need the following. Definition 2.6.1 Let s(0),..., s(n) be a sequence of observed pseudo states. The information gain (IG) of this sequence is the set of seeds r ∈ R at time step 0 that could have given rise to the observed trajectory. That is, recalling that A_ij is the i,j transition seed space, we have

    IG[s(0),..., s(n)] = ∩_{i=0}^{n−1} ζ⁻ⁱ(A_{s(i),s(i+1)})

We shall abbreviate the above to IG(s) when it is clear that s means the sequence s(0),..., s(n). The reason for calling IG(s) the information gain is that we initially know only that the initial seed is an element of R, but after observing the output trajectory we know that the initial seed was an element of IG(s) ⊆ R. Notice that the IG is a monotonically decreasing set as we observe more and more time steps. That is, if r is a subsequence of s then IG(s) ⊆ IG(r). Having observed the sequence s(0),..., s(n), we are interested

in what the next pseudo state could possibly be. The next pseudo state of course depends on the actual seed at time step n, that is, on r(n). Thus, although we know the possible initial seeds we could have started from at time step 0, we need to know the possible seeds that the ergodic machine might be in at time step n. This motivates the concept of the information loss. Definition 2.6.2 Let s(0),..., s(n) be a sequence of observed pseudo states. Then, the information loss (IL) of this sequence is the set of possible seeds at time step n, given the observed trajectory. That is,

    IL[s(0),..., s(n)] = ζⁿ(IG[s(0),..., s(n)]) = ζⁿ(∩_{i=0}^{n−1} ζ⁻ⁱ(A_{s(i),s(i+1)}))

We have mentioned before that we shall assume that the chances of an initial seed r(0) being in a set B are just P(B) at time step 0. Because ζ is a measure preserving transformation, we have the following. Proposition 2.6.1 Given a measurable set K ⊆ R at time step n, the probability of r(n), the seed at time step n, being in K is just P(K). Proof: Prob{r(n) ∈ K} = Prob{r(0) ∈ ζ⁻ⁿ(K)} = P(ζ⁻ⁿ(K)), by our definition of the chances of the initial seed being in a particular set, = P(K), since ζ is measure preserving. We are now in a position to derive conditions so that the ergodic

machine appropriately models a Markov Chain. Given the observed sequence s(0),..., s(n), we wish to investigate what we would mean by Prob{s(n+1) | s(0),..., s(n)}. Clearly we simply mean the chances of the seed at time step n being a seed that gives rise to s(n+1) at the next time step, knowing that the seed must be consistent with the observed trajectory. Thus, we first calculate the set of seeds at time step n that has this property. Clearly it is just IL(s(0),..., s(n)) ∩ A_{s(n),s(n+1)}. Similarly, having only observed the output at time step n to be s(n), the set of seeds that gives rise to s(n+1) at the next time step is of course A_{s(n),s(n+1)}. Thus, we are justified in claiming that if s represents the sequence s(0),..., s(n),

(1) Prob{s(n+1) | s(0),..., s(n)} = P[IL(s) ∩ A_{s(n),s(n+1)}] / P[IL(s)]
(2) Prob{s(n+1) | s(n)} = P[A_{s(n),s(n+1)}]

The above two statements are the informal justification for the following definitions. Definition 2.6.3 Given the sequence s = s(0),..., s(n), the information loss is strong at time n if for all j ∈ S,

    P[IL(s) ∩ A_{s(n),j}] = P[IL(s)] · P[A_{s(n),j}]

The information loss is weak at time step n if there is some j for which the above does not hold. Definition 2.6.4 Given an (autonomous) ergodic machine, ζ is a faithful transformation if the information loss is strong for every possible sequence at every time step.
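For a finite seed space, the sets IG and IL of Definitions 2.6.1 and 2.6.2 and the strong/weak test of Definition 2.6.3 can be computed by brute force. The sketch below (Python, ours; it uses the machine of Example 2.5.1, whose transformation is a permutation, so preimages are computed pointwise) does this for the observed sequence 1, 1. The loss turns out to be weak, which is one way of seeing that the five-seed cyclic permutation cannot be a faithful transformation in the sense of Definition 2.6.4.

```python
from fractions import Fraction as F

# The autonomous ergodic machine of Example 2.5.1: pseudo states {1,2,3}, five equally
# likely seeds, zeta the cyclic permutation a -> b -> c -> d -> e -> a.
seeds = set('abcde')
P = {r: F(1, 5) for r in seeds}
zeta = dict(zip('abcde', 'bcdea'))
zeta_inv = {v: k for k, v in zeta.items()}
A = {(1, 1): set('ab'), (1, 2): set('cde'), (1, 3): set(),
     (2, 1): set('ab'), (2, 2): set('cd'),  (2, 3): set('e'),
     (3, 1): set('e'),  (3, 2): set('ad'),  (3, 3): set('bc')}

prob = lambda B: sum(P[r] for r in B)

def information_gain(s):
    """Definition 2.6.1: seeds at time 0 consistent with the observed pseudo states s."""
    ig = set(seeds)
    for i in range(len(s) - 1):
        pre = A[(s[i], s[i + 1])]
        for _ in range(i):                       # pull the seed space back i steps: zeta^-i(A)
            pre = {zeta_inv[r] for r in pre}
        ig &= pre
    return ig

def information_loss(s):
    """Definition 2.6.2: IL = zeta^n(IG), the possible seeds at the last observed time step."""
    il = information_gain(s)
    for _ in range(len(s) - 1):
        il = {zeta[r] for r in il}
    return il

def loss_is_strong(s):
    """Definition 2.6.3: P[IL & A_{s(n),j}] = P[IL] * P[A_{s(n),j}] for every state j."""
    il = information_loss(s)
    return all(prob(il & A[(s[-1], j)]) == prob(il) * prob(A[(s[-1], j)]) for j in (1, 2, 3))

print(information_loss([1, 1]))   # {'b', 'c'}
print(loss_is_strong([1, 1]))     # False: the loss is weak, so this zeta is not faithful
```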

Definition 2.6.5 Let E be an (autonomous) ergodic machine for which C_E is the associated probabilistic automaton, in this case a Markov Chain. Then E faithfully represents C_E if ζ is a faithful transformation. The discussion preceding the last three definitions gives ample justification for our definition of an ergodic machine faithfully representing a probabilistic automaton. Intuitively, we see that the expected trajectories of the probabilistic automaton are just the same as the expected trajectories of the ergodic machine if we assume an initially unknown seed. We further see that no more useful information about the next transition can be gotten from observing a particular sequence than can be obtained from just observing the last output. It is clear that if one were to run a faithful representative ergodic machine, an observer would never suspect that the ergodic machine is deterministic. Proposition 2.6.2 Let E be an (autonomous) ergodic machine with ζ a transformation such that for every sequence s, IL(s) = R. Then ζ is a faithful transformation. Proof: It is immediate that the information loss is always strong. Proposition 2.6.3 If the information loss of a sequence s is null, that is, IL(s) = ∅, then the sequence s could never have arisen. Proof: IL(s) = ∅ ⟹ IG(s) = ∅ ⟹ no possible initial seed could give rise to the sequence s.

Before continuing on, it will be helpful to get a feel for the concepts of information loss and information gain by considering two examples. Example 2.6.1 Suppose we wish to model a four-state Markov process with state set S = {1,2,3,4} and Prob{i → j} = 1/4 for i,j = 1,..., 4. Let us try to model this by a generator that is a dyadic shift on [0,1). That is, R = [0,1), F = B[0,1), P = dx, ζ(x) = 2x (mod 1). (See Example 2.1.2.) Furthermore, let the transition function be such that A_i1 = [0,1/4), A_i2 = [1/4,1/2), A_i3 = [1/2,3/4), A_i4 = [3/4,1) for i = 1,..., 4. Assume that we could have observed the following output sequence through time step 3: s(0) = 1, s(1) = 3, s(2) = 2, s(3) = 1. If we calculate the information gain at each time step for this sequence we get:

IG₀ = [0,1)
IG₁ = A_{s(0),s(1)} = A₁₃ = [1/2,3/4)
IG₂ = ζ⁻¹[A_{s(1),s(2)}] ∩ A₁₃ = {[1/8,1/4) ∪ [5/8,3/4)} ∩ [1/2,3/4) = [5/8,3/4)
IG₃ = ζ⁻¹[ζ⁻¹[A_{s(2),s(3)}] ∩ A₃₂] ∩ A₁₃ = ζ⁻¹[ζ⁻¹[0,1/4) ∩ [1/4,1/2)] ∩ [1/2,3/4) = ζ⁻¹[∅] ∩ [1/2,3/4) = ∅
IGᵢ = ∅ for i > 3, since IG is monotone decreasing.

Because the information gain at time step 3 is null, we know that the observed sequence is impossible and that ζ cannot be faithful. This will become even clearer when we examine the information loss.

IL₀ = [0,1)
IL₁ = ζ[1/2,3/4) = [0,1/2)

IL₂ = ζ²(IG₂) = ζ²[5/8,3/4) = [1/2,1)
IL₃ = ∅

Notice that even at time step 1, the sequence is weak. For we have P[IL₁ ∩ A₃₄] = 0 ≠ 1/8 = P[IL₁] · P[A₃₄]. What this says is that having observed a 1 and a 3 as the output, since the possible seeds at time step 1 are only [0,1/2), there is no way that we could possibly observe a 3 or a 4 at the next time step. Clearly, this is more than sufficient for a sequence to be weak. Of course, since IL₃ = ∅, it is immediately obvious that ζ cannot be faithful. Example 2.6.2 Let everything be the same as in the previous example except that we use ζ(x) = 4x (mod 1) as the generator. We examine the same output s(0) = 1, s(1) = 3, s(2) = 2, s(3) = 1. Let us again calculate the information gain and the information loss.

IG₀ = [0,1)
IG₁ = A₁₃ = [1/2,3/4)
IG₂ = ζ⁻¹[1/4,1/2) ∩ [1/2,3/4) = [9/16,10/16)
IG₃ = ζ⁻²[0,1/4) ∩ [9/16,10/16) = [36/64,37/64)

Similarly, through three time steps we can calculate

IL₀ = [0,1)
IL₁ = ζ[1/2,3/4) = [0,1)
IL₂ = ζ²[9/16,10/16) = [0,1)
IL₃ = ζ³[36/64,37/64) = [0,1)

We notice that for this sequence, the information loss is R = [0,1) at each time step and therefore strong. If it could be shown that

for any possible sequence, the information loss was similarly [0,1), then by Proposition 2.6.2 we would know that ζ was faithful. We come back to this question of showing the existence of faithful transformations in a later section. For completeness, we present the following proposition. Proposition 2.6.4 Suppose U is an infinite sequence with U_n being the subsequence through time step n. Assume m ≤ n. Then,

(1) IG(U_n) ⊆ IG(U_m)
(2) IL(U_n) ⊆ ζ^{n−m}[IL(U_m)]
(3) IG(U_n) ⊆ ζ⁻ⁿ[IL(U_n)]
(4) IL(U_n) ⊆ ζ[A_{u(n−1),u(n)}]

We have previously defined the information loss function as

    IL(s) = ζᵏ ∩_{i=0}^{k−1} ζ⁻ⁱ[A_{s(i),s(i+1)}]

for s the sequence s(0),..., s(k). Although this definition arose naturally, it is not very convenient to work with. Fortunately, there exists an alternative definition that is often more convenient. Let us call the original definition the backward information loss (BIL) and use an argument to indicate the time step through which we are considering it for a hypothesized sequence s. Thus,

    BIL(k) = ζᵏ ∩_{i=0}^{k−1} ζ⁻ⁱ[A_{s(i),s(i+1)}]

Now, we can recursively define a new concept which we shall call the forward information loss (FIL). Definition 2.6.6 Given a sequence s(0), s(1),..., s(n), the forward information

loss (FIL) is defined recursively as (1) FIL(0) = R, (2) FIL(k) = ζ[FIL(k−1) ∩ A_{s(k−1),s(k)}] for 0 < k ≤ n. Notice that FIL(1) = ζ[R ∩ A_{s(0),s(1)}] = ζ[A_{s(0),s(1)}]. Furthermore, we can represent the forward information loss explicitly as

    FIL(k) = ζ[... ζ[ζ[A_{s(0),s(1)}] ∩ A_{s(1),s(2)}] ∩ ... ∩ A_{s(k−1),s(k)}]   (k applications of ζ)

We need to show that the two concepts of backward information loss and forward information loss are equivalent, and to do this we need the following lemma. Notice that ζ is not 1-1 in general.

Lemma 2.6.1 For any sets A, B and any transformation ζ, ζ[ζ[A] ∩ B] ⊆ ζ²[A ∩ ζ⁻¹(B)].

Proof: Let y ∈ ζ[ζ[A] ∩ B]. Then there exists x ∈ ζ[A] ∩ B such that ζ(x) = y, with x ∈ ζ[A] and x ∈ B. Now, there exists z ∈ A such that ζ(z) = x. Since x ∈ B, we have z ∈ ζ⁻¹(B), and thus z ∈ A ∩ ζ⁻¹(B). Hence ζ²(z) ∈ ζ²[A ∩ ζ⁻¹(B)], and y = ζ²(z) ∈ ζ²[A ∩ ζ⁻¹(B)].

The lemma can be thought of as allowing us to factor the inner ζ through the parentheses. Theorem 2.6.1 For any infinite sequence s(0), s(1), ...,

FIL(k) = BIL(k).

Proof: (a) We first show that FIL(k) ⊆ BIL(k). The proof is by induction. It can trivially be shown to be true for k = 0, 1. Assume that it is true for all sequences of length k−1. Then,

FIL(k) = ζ[FIL(k−1) ∩ A_{s(k−1),s(k)}]
       = ζ[BIL(k−1) ∩ A_{s(k−1),s(k)}]
       = ζ[{ζ^{k−1} ∩_{i=0}^{k−2} ζ⁻ⁱ[A_{s(i),s(i+1)}]} ∩ A_{s(k−1),s(k)}]
       ⊆ ζ²[{ζ^{k−2} ∩_{i=0}^{k−2} ζ⁻ⁱ[A_{s(i),s(i+1)}]} ∩ ζ⁻¹[A_{s(k−1),s(k)}]]   (by the previous lemma)
       ⊆ ζᵏ[{∩_{i=0}^{k−2} ζ⁻ⁱ[A_{s(i),s(i+1)}]} ∩ ζ^{−(k−1)}[A_{s(k−1),s(k)}]]   (by repeated application of the lemma)
       = ζᵏ ∩_{i=0}^{k−1} ζ⁻ⁱ[A_{s(i),s(i+1)}]
       = BIL(k)

∴ FIL(k) ⊆ BIL(k).

(b) We now show that BIL(k) ⊆ FIL(k). The proof in this direction is much easier. We simply use the fact that ζ(A ∩ B) ⊆ ζ(A) ∩ ζ(B).

BIL(k) = ζᵏ ∩_{i=0}^{k−1} ζ⁻ⁱ[A_{s(i),s(i+1)}]
       ⊆ ζ^{k−1}[ζ[A_{s(0),s(1)}] ∩ ∩_{i=1}^{k−1} ζ^{−i+1}[A_{s(i),s(i+1)}]]

       = ζ^{k−1}[ζ[A_{s(0),s(1)}] ∩ A_{s(1),s(2)} ∩ ∩_{i=2}^{k−1} ζ^{−i+1}[A_{s(i),s(i+1)}]]
       ⊆ ζ^{k−2}[ζ[ζ[A_{s(0),s(1)}] ∩ A_{s(1),s(2)}] ∩ ∩_{i=2}^{k−1} ζ^{−i+2}[A_{s(i),s(i+1)}]]
       ⊆ ...
       ⊆ ζ[... ζ[ζ[A_{s(0),s(1)}] ∩ A_{s(1),s(2)}] ∩ ... ∩ A_{s(k−1),s(k)}]
       = FIL(k)

Because of this theorem, we shall feel free to use the recursive definition as our definition of information loss. We now turn to a consideration of conditions that tell us when we have a faithful transformation. 2.7 Canonical Ergodic Machines and Multiplicative Generators When we actually start defining an autonomous ergodic machine to represent a Markov Chain, it seems most natural to define the transition seed spaces as contiguous to one another if we use the space R = [0,1) as the seed space. Also, multiplicative generators seem a natural class of transformations to consider, especially when we realize that most pseudo random number generators that are implemented are either of the multiplicative type or possibly of the mixed congruential type. Knuth [25] provides a good source for information on types of pseudo random number generators. We continue to consider autonomous ergodic machines, although, as mentioned before, the generalization to nonautonomous ergodic

machines is quite simple and will be done later. Although we have allowed the pseudo state space to be countable so far, we now wish to consider ergodic machines for which the pseudo state space is finite. This would, of course, correspond to finite Markov Chains as the associated probabilistic automaton. Definition 2.7.1 An ergodic machine E is semi-finite if the pseudo state space is a finite set. Definition 2.7.2 An (autonomous) ergodic machine E is a canonical autonomous machine if (1) R = [0,1), F = the Borel field, P = dx, and (2) if A_ij is the i,j transition seed space with P[A_ij] = p_ij, then, defining A_i0 = ∅ (and taking p_i0 = 0),

    A_ij = [Σ_{k=0}^{j−1} p_ik, Σ_{k=0}^{j} p_ik)

Notice that we are simply requiring that the transition from pseudo state 1 to pseudo state 1 be represented by the first interval, from 1 to 2 be represented by the second interval, etc. In particular, a transition from i to j is not the union of two or more noncontiguous intervals. The first result that we present tells us under what conditions we have a faithful multiplicative generator for semi-finite canonical ergodic machines. The result is useful because it establishes the fact that faithful transformations exist.
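Definition 2.7.2 amounts to laying the entries of row i of the transition matrix end to end as subintervals of [0,1). A small sketch (Python, ours; the example matrix is the Markov chain associated with Example 2.5.1, used only for concreteness) constructs these canonical transition seed spaces.

```python
from fractions import Fraction as F
from itertools import accumulate

def canonical_seed_spaces(P_matrix):
    """Definition 2.7.2: A_ij is the contiguous interval [p_i1+...+p_i(j-1), p_i1+...+p_ij) of [0,1)."""
    A = {}
    for i, row in enumerate(P_matrix, start=1):
        right = list(accumulate(row))            # partial sums of row i
        left = [F(0)] + right[:-1]
        for j, (a, b) in enumerate(zip(left, right), start=1):
            A[(i, j)] = (a, b)                   # the half-open interval [a, b); empty when a == b
    return A

# Transition matrix of the Markov chain associated with Example 2.5.1.
Z = [[F(2,5), F(3,5), F(0)],
     [F(2,5), F(2,5), F(1,5)],
     [F(1,5), F(2,5), F(2,5)]]
for ij, (a, b) in sorted(canonical_seed_spaces(Z).items()):
    print(ij, f"[{a}, {b})")
# e.g. (1, 1) [0, 2/5)   (1, 2) [2/5, 1)   (1, 3) [1, 1)  <- empty, since p_13 = 0
```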

Theorem 2.7.1 Let E be a semi-finite canonical ergodic machine with pseudo state set S. Suppose ζ is an m-adic multiplicative generator, that is, ζ(x) = mx (mod 1). Let p = min_{i,j∈S} {P[A_ij]} for P[A_ij] ≠ 0. Then if m ≥ 1/p, ζ is a faithful transformation for E.

Proof: By definition of a canonical ergodic machine, given a non-null A_ij, we know that A_ij = [a,b), that is, A_ij is some interval. Now, ζ[a,b) = {[ma,mb)} (mod 1). We can consider mod 1 as being equivalent to wrapping the interval [ma,mb) around [0,1) starting at ma (mod 1). Since m(b−a) ≥ (1/p)(b−a) ≥ 1 by hypothesis, we see immediately that ζ[a,b) = [0,1). Thus, if we now calculate the information loss for any sequence we have IL(0) = [0,1), IL(1) = ζ[[0,1) ∩ A_ij] = ζ[A_ij] = [0,1). By induction, IL(k) = [0,1) for any k. Notice that we have been using the forward information loss. By Proposition 2.6.2, ζ is a faithful transformation.

Theorem 2.7.2 Given any finite Markov Chain C, there exists an ergodic machine E such that E faithfully represents C.

Proof: Let C be a finite Markov Chain with transition matrix [p_ij]. Let p = min{p_ij} for p_ij ≠ 0, and let m be a positive integer with m ≥ 1/p. Define an ergodic machine E with pseudo state set S the same as the state set of the Markov Chain. Let E be the canoni

cal ergodic machine with generator ζ(x) = mx (mod 1). (Since we defined E to be canonical, the transition function is implicitly defined.) By the previous theorem, ζ is faithful for E and thus E faithfully represents C. The importance of a canonical representation should be emphasized in Theorem 2.7.1. The following example illustrates that if we consider non-canonical machines, the theorem does not necessarily hold. Example 2.7.1 Consider a very simple two-state Markov Chain as follows:

            1/2    1/2
[p_ij]  =   1/2    1/2

Suppose we try to faithfully represent this by a dyadic multiplicative generator ζ(x) = 2x (mod 1), but that we use a non-canonical factorization of the seed space. That is, we let A₁₁ = [0,1/4) ∪ [1/2,3/4), A₁₂ = [1/4,1/2) ∪ [3/4,1), A₂₁ = [0,1/2), A₂₂ = [1/2,1). Let us calculate the information loss function for the following potential sequence: s(0) = 1, s(1) = 1, s(2) = 2, s(3) = 1. Then we have

IL(0) = [0,1)
IL(1) = ζ[[0,1) ∩ A₁₁] = [0,1/2)
IL(2) = ζ[[0,1/2) ∩ A₁₂] = [1/2,1)
IL(3) = ζ[[1/2,1) ∩ A₂₁] = ∅

We see that the information loss for this sequence is weak, since at time step 3 we find that this sequence cannot occur. Thus, although for the canonical ergodic machine associated with the Markov Chain the dyadic

Thus, although the dyadic transformation is faithful for the canonical ergodic machine associated with this Markov chain, it is not faithful for the particular non-canonical ergodic machine chosen. The example suggests that if we are using an unknown multiplicative generator, it might be wise to model a Markov chain by the canonical ergodic machine.

Theorem 2.7.3
Let E be a semi-finite canonical ergodic machine with pseudo state set S, |S| = n. Suppose that for each pseudo state i, the partition of [0,1) by the transition seed spaces A_i1, ..., A_in contains at least three non-null seed spaces. Let φ be an m-adic multiplicative generator, that is, φ(x) = mx (mod 1), and let p = min{P[A_ij]} over P[A_ij] ≠ 0. Then, if m < 1/p, φ is not a faithful transformation.

Proof: Suppose P[A_1j] = p. We can represent A_1j by the interval [x,y), since E is a canonical machine. If we consider a potential sequence beginning with s(0) = 1, s(1) = j, we see that IL(1) = φ[[x,y)] by definition. Now φ[[x,y)] can have two forms:
    (1) φ[[x,y)] = [a,b) with b-a < 1, or
    (2) φ[[x,y)] = [0,c) ∪ [d,1) with c + (1-d) < 1.
The reason that the length b-a or c + (1-d) is less than one is the hypothesis that m < 1/p. Now, since A_j1, ..., A_jn has at least three non-null sets, let the first one be A_jf, a central one be A_jc, and the last one be A_je. In case (1), [a,b) must intersect A_jf, otherwise the information loss is weak, and [a,b) must intersect A_je, otherwise the information loss is weak. But this implies that [a,b) contains A_jc.

We thus have
    P[[a,b) ∩ A_jc] = P[A_jc] ≠ P[A_jc]·(b-a),
since b-a < 1. Thus the information loss is weak. The argument for case (2) is similar. In this case we find that [0,c) cannot contain all of A_jf and [d,1) cannot contain all of A_je (otherwise the information loss is weak, exactly as in case (1)). But this implies that {[0,c) ∪ [d,1)} ∩ A_jc = ∅. Again, the information loss is weak.

The previous theorem is almost a converse of Theorem 2.7.1. Notice that any time we have a situation such that for every pseudo state i there is positive probability of going to at least three different states, the condition m ≥ 1/p is a necessary and sufficient condition for the multiplicative generator to be a faithful transformation. We next present an example that shows why Theorem 2.7.3 is not a true converse of Theorem 2.7.1.

Example 2.7.2
Suppose we consider the Markov chain

             | 2/5  1/5  2/5 |
    [p_ij] = | 1/2  1/2   0  |
             | 1/3  1/3  1/3 |

We model this as a canonical ergodic machine with
    A_11 = [0,2/5)    A_12 = [2/5,3/5)   A_13 = [3/5,1)
    A_21 = [0,1/2)    A_22 = [1/2,1)     A_23 = ∅
    A_31 = [0,1/3)    A_32 = [1/3,2/3)   A_33 = [2/3,1)
Although for this example p = 1/5, it turns out that even with m = 4 the multiplicative generator φ(x) = 4x (mod 1) is a faithful transformation. Let us attempt to construct a sequence that is weak and see where this fails.

It can easily be checked that until a 1 occurs in the sequence, the information loss is always [0,1) and thus trivially strong, since φ[0,1/2) = φ[1/2,1) = φ[0,1/3) = φ[1/3,2/3) = φ[2/3,1) = [0,1). Suppose then that a 1 occurs; without loss of generality, assume it occurs at time step 0. If a 1 or a 3 occurs after the 1 at time step 0, we see that IL(1) = [0,1) and the information loss is still strong. Thus the only possibility for a weak sequence is to have a 1 followed by a 2. Let s(0) = 1 and s(1) = 2 without loss of generality. Then IL(1) can be seen to be φ[2/5,3/5) = [0,2/5) ∪ [3/5,1). Notice that the information loss is not all of [0,1). However, it can easily be checked that
    P[IL(1) ∩ A_21] = P[[0,2/5)] = 2/5 = P[IL(1)]·P[A_21], and
    P[IL(1) ∩ A_22] = P[[3/5,1)] = 2/5 = P[IL(1)]·P[A_22].
The case of A_23 is trivially satisfied. We see that the information loss is still strong. Now consider the possible 1 or 2 at time step 2. In either case the information loss is [0,1), since we have either IL(2) = φ[0,2/5) or IL(2) = φ[3/5,1). Thus the information loss is strong and, in particular, is [0,1) again. Clearly this implies that for all sequences the information loss is always strong, and thus φ is a faithful transformation.

In the above example it is worth noting that the information loss is not always the entire seed space R = [0,1). Thus, although that condition is sufficient for a faithful transformation, it is not a necessary one. We now turn to the question of whether or not there exist "ideal" generators.

2.8 Universally Faithful Transformations

We are now interested in investigating whether or not there are transformations that are ideal in a particular sense.

For example, it would be nice if we could find a transformation φ such that, for any ergodic machine, φ was a faithful transformation. Unfortunately, there is no such transformation.

Proposition 2.8.1
There is no "ideally" faithful φ. That is, given φ, we can always find an ergodic machine for which φ is not faithful.

Proof: Choose a set B ⊂ R with 0 < P[B] < 1. Let A_11 = B and let A_21 = φ⁻¹(B). It is clear that the information loss is weak for the sequence s(0) = 2, s(1) = 1, s(2) = 2.

To ask for such an ideal or universal transformation, then, is asking too much. Perhaps what we really need is to say that, given any probabilistic automaton C and given a "universal" transformation φ, there exists some ergodic machine E (not necessarily canonical) such that E faithfully represents C using the transformation φ.

Definition 2.8.1
A transformation φ is universally faithful if, given any probabilistic automaton C, there exists an ergodic machine E such that φ is faithful for E and E faithfully represents C.

Although the above definition is nice, in a way it is too general. For, in general, one only builds ergodic machines for which the transition seed spaces are composed of finite unions of intervals and for which the transformation has some reasonable properties, such as being bounded or continuous. We shall show that transformations that are multiplicative and additive, among others, are not universal, but first we need to develop some machinery.

Lemma 2.8.1
Consider an ergodic machine with a positive measure transition seed space A_11. Then, if we consider the information loss function corresponding to the sequence 1,1,1,1,..., we have IL(k) ⊇ IL(k+1) for k = 0,1,2,....

Proof: The proof is trivial by induction. Clearly IL(1) ⊆ IL(0) = R. Assuming the result true for k-1, we get
    IL(k+1) = φ[IL(k) ∩ A_11] ⊆ φ[IL(k-1) ∩ A_11] = IL(k).

Definition 2.8.2
A transformation φ is bounded if there exists some M such that P[φ(A)] ≤ M·P[A] for all sets A ⊆ R. M is the bound of the transformation.

Lemma 2.8.2
Let φ be a bounded transformation with bound M, and consider an ergodic machine E for which P[A_11] > 0. Consider the information loss for the sequence 1,1,1,1,.... Then either
    (1) lim_{k→∞} P[IL(k)] = 0, or
    (2) P[A_11] ≥ 1/M.

Proof: Since the information loss is monotonically decreasing by the previous lemma, either condition (1) has to hold or there exists a set K with P[K] > 0 such that lim_{k→∞} IL(k) = K. Assuming there exists such a set K, we see that it must be that K = φ[K ∩ A_11]; that is, if we let B = K ∩ A_11, then B is a subset of A_11 that expands under φ to the entire set K.

Now, for the information loss to be strong, we must have
    P[K ∩ A_11] / P[K] = P[A_11].
Now, since φ is bounded by M,
    1/M ≤ P[A] / P[φ(A)]
for any set A with P[A] > 0. Since B = K ∩ A_11 and φ(B) = K, we have
    1/M ≤ P[B] / P[φ(B)] = P[K ∩ A_11] / P[K] = P[A_11].
Thus, the second condition must be satisfied.

Theorem 2.8.1
Let φ be a bounded universally faithful generator. Then lim_{k→∞} P[IL(k)] = 0.

Proof: By the previous lemma, either (1) lim_{k→∞} P[IL(k)] = 0, or (2) P[A_11] ≥ 1/M. The second case cannot be true for a universally faithful generator, since then we could not model a probabilistic automaton whose probability of transition from state 1 to state 1, p_11, is less than 1/M. Thus, the first case must hold.

Notice that this theorem does not assert the existence of universally faithful bounded generators. It only says that if one does exist, it must satisfy lim_{k→∞} P[IL(k)] = 0.

We now consider a sequence of lemmas that investigates what happens when the probability of the information loss of a sequence 1,1,1,... approaches 0. We have already shown that the second condition in the previous lemma excludes universal transformations.

From now on, we shall consider the case where the seed space is R = [0,1) and where every transition seed space is composed of a finite union of intervals. We first present some notation that we shall find convenient to use.

Definition 2.8.3
Given the sequence 1,1,1,..., we let S(t) = IL(t) ∩ A_11. That is, S(t) is the part of IL(t) that is a subset of A_11. Notice that S(t) is monotonically decreasing and that IL(t+1) = φ[S(t)]. Also, S(0) = A_11.

Definition 2.8.4
Let Y be some interval in R = [0,1); Y will be called the target region. Let X be a union of intervals {X_i} that contains φ⁻¹(Y); X will be called the initial region. Furthermore, define
    IR(t) = X ∩ S(t),   t ≥ 0
    TR(t) = Y ∩ S(t),   t ≥ 0.
IR(t), the initial region at time step t, is the part of the initial region that is also part of S(t); the interpretation of TR(t) is similar. In the application of the following two lemmas it turns out that we only use the entire interval [0,1) as the target region. However, since this special case does not particularly simplify the proofs, we consider an arbitrary target region in the two lemmas. Figure 2.8.1 below is a pictorial representation of how S(t) decreases with time. Notice that, at least initially, the number of intervals of S(t) in the target region may increase.

Definition 2.8.5
Let a set A = ∪_{i=1}^{n} I_i, where each I_i is an interval. Then define #: {finite unions of intervals} → integers by #(A) = n; that is, # counts the number of intervals that compose the set A.

[Figure 2.8.1. How S(t) monotonically decreases: successive rows sketch S(0) = A_11, S(1), and S(2) as unions of intervals in [0,1); the hatched segments represent the parts of S(t).]

It can easily be checked that #(∪_{i=1}^{n} A_i) ≤ Σ_{i=1}^{n} #(A_i).

Lemma 2.8.3
TR(t+1) = φ[IR(t)] ∩ TR(t).

Proof:
    TR(t+1) = TR(t) ∩ Y ∩ S(t+1)            {TR(t) is monotonically decreasing}
            = TR(t) ∩ Y ∩ φ[S(t)] ∩ A_11    {using the definition of S(t)}
            = TR(t) ∩ φ[S(t)]               {since TR(t) ⊆ TR(0) = Y ∩ A_11}
Now, by the definition of the initial region X, we have
    S(t) = IR(t) ∪ [X̄ ∩ S(t)]              {the bar represents set complement}
but φ[X̄] ∩ Y = ∅, so φ[X̄ ∩ S(t)] ∩ TR(t) = ∅. Therefore
    φ[S(t)] ∩ TR(t) = {φ[IR(t)] ∩ TR(t)} ∪ {φ[X̄ ∩ S(t)] ∩ TR(t)} = φ[IR(t)] ∩ TR(t),
and hence TR(t+1) = φ[IR(t)] ∩ TR(t).

We now come to the crucial lemma. This lemma will be used to show the limitations of transformations for which P[IL(k)] → 0. An intuitive interpretation of the following lemma is in order. Suppose we consider the transition seed spaces A_11, ..., A_1n. Then, if we consider the information loss for a sequence 1,1,1,1,..., we know by hypothesis that the probability of the information loss decreases to 0. However, for the information loss to be strong, we see that φ[IL(t) ∩ A_11] must always intersect each of A_11, ..., A_1n non-trivially. We have defined IL(t) ∩ A_11 to be just S(t), so this says that φ[S(t)] must always intersect each of A_11, ..., A_1n non-trivially. Clearly, since the size of the intervals is shrinking, we are getting smaller and smaller image intervals in any region (see Fig. 2.8.1). The following lemma in a sense puts a limit on how many intervals can be spread throughout the space [0,1) as images of S(t).

Lemma 2.8.4
Let φ be a bounded generator, with bound M, on [0,1) that maps intervals into intervals (allowing wrapping around [0,1)). Suppose that lim_{t→∞} P[S(t)] = 0. Then, given any target region Y, there exists t_1 such that for all t_2 ≥ t_1,
    #[IR(t_2)] = j  ⟹  #[TR(t_2+1)] ≤ j.

Proof: Suppose that A_11 = ∪_{i=1}^{n} I_i, where I_i = [a_i, b_i). Consider
    ε = (1/M) · min[ {a_{i+1} - b_i}_{i=1}^{n-1}, a_1 + 1 - b_n ];
that is, we consider 1/M times the minimum distance from one interval to the next, assuming we are wrapping around [0,1). We also assume that all such unions of intervals, such as A_11 = ∪_{i=1}^{n} I_i, are proper; that is, for no i, j is I_i ∪ I_j an interval.

Let t_1 be such that P[S(t)] < ε for all t ≥ t_1. In particular, then, an interval of S(t) can never span the region between I_i and I_{i+1}. Now, since IR(t) and TR(t) are always composed of finite unions of intervals, we let
    IR(t_2) = ∪_{i=1}^{j} J_i,    TR(t_2) = ∪_{i'=1}^{ℓ} L_{i'}.
We claim that φ[J_i] cannot intersect non-trivially with more than one L_{i'}. [Remember that φ[J_i] cannot span two intervals of the original A_11.] For suppose φ[J_i] did intersect with more than one L_{i'}. Then, since φ[J_i] is an interval, we must have, for some i',
    L_{i'} = [x_1, x_2),   L_{i'+1} = [x_3, x_4)   with x_3 > x_2,
    φ[J_i] = [y_1, y_2)   with y_1 < x_2 and y_2 ≥ x_3.
But now consider the region [x_2, x_3), which is not part of TR(t_2). Since φ[J_i] spans this region, and since IR(t) is monotonically decreasing, the image φ[IR(t)] must have spanned [x_2, x_3) for every t ≤ t_2. Furthermore, since [x_2, x_3) must have been part of A_11 [we have already eliminated spanning the region between two intervals of A_11], we see that [x_2, x_3) is contained in TR(t_2). But this is a contradiction, since we assumed that ∪ L_{i'} is a proper union. Thus each φ[J_i] intersects non-trivially with at most one L_{i'} of TR(t_2).

Now, by Lemma 2.8.3, we have
    #[TR(t_2+1)] = #[ φ[IR(t_2)] ∩ TR(t_2) ]
                 = #[ ∪_{i=1}^{j} ( φ[J_i] ∩ ∪_{i'=1}^{ℓ} L_{i'} ) ]
                 ≤ Σ_{i=1}^{j} #[ φ[J_i] ∩ ∪_{i'=1}^{ℓ} L_{i'} ]
                 ≤ j.
This completes the proof of the lemma.

We have seen how the lemma limits the number of image intervals of S(t). The following theorem shows that we have in fact succeeded in limiting the number of intervals that can arise in the target region [0,1). Recall from Fig. 2.8.1 that this limits the number of hatched regions that can develop.

Theorem 2.8.2
Given the hypotheses of the previous lemma, there is an upper bound to the number of intervals of S(t). That is, for some L, #[S(t)] ≤ L for all sufficiently large t.

Proof: Let us consider the special case in which the target region is [0,1). Then we can take [0,1) to be the initial region as well. By the previous lemma, there exists t_1 such that for t_2 ≥ t_1,
    #[IR(t_2)] = j  ⟹  #[TR(t_2+1)] ≤ j.
Let #[IR(t_1)] = L. Since IR(t) = S(t) and TR(t) = S(t) in this case, we have #[S(t_1+1)] = #[TR(t_1+1)] ≤ #[IR(t_1)] = L. By induction it follows that for any k ≥ 0, #[S(t_1+k)] ≤ L.

Notice that any generator that is continuous on a metric space maps intervals into intervals. Furthermore, both multiplicative and additive generators are bounded. In particular, the hypotheses of the previous lemma and theorem apply to generators of the form φ(x) = mx + b (mod 1) for a positive integer m.

We now know that if P[S(t)] → 0, then the number of intervals of which S(t) can be composed is at most L for some L. We shall use this fact to help us in showing that the information loss in such cases can never be strong for all sequences. Notice, of course, that L depends on the initial partition of the transition seed spaces. Thus, we cannot simply say that for an ergodic machine with a large number of states the information loss is weak because φ[S(t)] can never span all of the transition seed spaces. However, we do notice that S(t) is eventually located at very few places. That is:

Lemma 2.8.5
Let φ be a bounded generator with bound M on [0,1), and let lim_{t→∞} P[S(t)] = 0. Then lim_{t→∞} S(t) = K, where K is a set of at most L distinct points, and L is as in Theorem 2.8.2.

Proof: Each interval of S(t) decreases in length. Since, by Theorem 2.8.2, there are eventually at most L intervals, each interval contains at most one point within it as t → ∞; otherwise the interval would have to divide into two, creating more than L intervals. Thus there can be at most L points which are in S(t) for every t.

Definition 2.8.6
If lim_{t→∞} S(t) = K, where K has at most L points, then a point of K will be called a final point of S(t).

We are now in a position to give an easy proof of the fact that an additive generator cannot be universally faithful for ergodic machines whose transition seed spaces are composed of finite unions of intervals and whose seed space is [0,1).

Later on we shall prove a more general theorem for which the additive generator will be a special case, but the special proof for the additive case is illuminating.

Lemma 2.8.6
Let φ be a faithful additive generator on [0,1) and let 0 < P[A_11] < 1. Then, for the sequence 1,1,1,..., S(t) has L final points for some L.

Proof: By definition, φ is measure preserving. Also, since φ is additive, φ is 1-1. Thus it is easy to see that P[φ(A)] = P[A] for any set A. Now, for φ to be faithful, we require that
    P[IL(k) ∩ A_11] = P[IL(k)]·P[A_11].
Thus, corresponding to the sequence 1,1,1,..., we have
    P[IL(k)] = P[φ[IL(k-1) ∩ A_11]] = P[IL(k-1) ∩ A_11] = P[IL(k-1)]·P[A_11].
Since P[IL(1)] = P[A_11], we have
    lim_{k→∞} P[IL(k)] = lim_{k→∞} (P[A_11])^k = 0.
Since S(t) = IL(t) ∩ A_11, we have lim_{t→∞} P[S(t)] = 0, and by Lemma 2.8.5 the conclusion follows.

Lemma 2.8.7
Let φ be a faithful additive generator on [0,1) and let 0 < P[A_11] < 1. Then the L final points of S(t) can be partitioned into disjoint sets such that, within each set, the points can be ordered so that x_{i+1} - x_i = a (mod 1), where a is the additive constant of the generator. (The disjoint sets will be called cycles.)

Proof: Suppose that for some final point, which without loss of generality we call x_L, there is no final point x_i with x_i + a = x_L (mod 1). In fact, let x_i + β_i = x_L (mod 1) for i = 1, ..., L-1, with each β_i ≠ a.

Since P[S(t)] → 0, we can choose t_0 so that each interval of S(t_0) has length less than ε = (1/2)·min_{1≤i≤L-1} |β_i - a|. It is clear that if I_i(t_0) is the interval of S(t_0) containing x_i, then φ[I_i(t_0)] is disjoint from I_L(t_0). Consequently, φ[S(t_0)] is disjoint from I_L(t_0), and so x_L cannot be a final point of S(t).

Proposition 2.8.2
If there are L > 0 final points corresponding to an additive generator on [0,1), then the additive constant a is rational.

Proof: Since the L final points occur in cycles by the previous lemma, for some integer I we have I·a ≡ 0 (mod 1), and hence a is rational.

Theorem 2.8.3
An additive transformation φ(x) = x + a (mod 1) on R = [0,1) is not universally faithful for ergodic machines with R = [0,1) whose transition seed spaces are finite unions of intervals.

Proof: We can choose a probabilistic automaton to model such that 0 < P[A_11] < 1. Now we know that P[S(t)] → 0, and thus there are at most L final points of S(t). But since φ is an ergodic transformation, a must be irrational, which forces L = 0. Thus, after some time, IL(t) = ∅. But IL(t) = ∅ means the information loss is weak for the sequence 1,1,1,..., so φ is not faithful.

The previous theorem actually says more than just that an additive generator is not universally faithful given a few reasonable restrictions. It in fact shows that for any ergodic machine with 0 < P[A_11] < 1, φ is not faithful.

Thus, this in effect shows that additive transformations are very poor transformations, since they can only trivially faithfully represent probabilistic automata. We can now proceed to prove a more general theorem that will show, in particular, that multiplicative transformations on [0,1) are not universally faithful.

Proposition 2.8.3
If P[S(t)] → 0 for a bounded transformation φ that maps intervals to intervals, then the L final points of S(t) can be partitioned into disjoint cycles such that within each cycle there is an ordering of the points, say x_1, ..., x_k, with φ(x_i) = x_{i+1} for i = 1, ..., k-1 and φ(x_k) = x_1.

Proof: It is immediate that for any final point x_j there must be at least one final point x_i such that φ(x_i) = x_j; the proof of this fact is exactly like the proof of Lemma 2.8.7. Call x_i the predecessor of x_j and x_j the successor of x_i. The proof will be complete when we show there is exactly one predecessor. But this is trivial, for we know there are L successors and thus exactly L predecessors; since each final point must have at least one predecessor, it can have at most one predecessor.

Lemma 2.8.8
Let φ be a bounded transformation on [0,1) that maps intervals to intervals. Let x_1, ..., x_L be the L final points of S(t), and assume P[S(t)] → 0. Then, if no x_i is a point of closure of A_ij, we have φ[S(t_0)] ∩ A_ij = ∅ for some t_0.

Proof: Since x_i is not a point of closure of A_ij for any i, min_i {d(x_i, A_ij)} = ε > 0, where d(x_i, p) = |x_i - p| for any point p and the distance to a set is the natural generalization (remember that [0,1) is a metric space).

Now choose t_0 such that the length of each interval of S(t_0) is less than ε/M, where M is the bound of the generator, and such that there are only L intervals of S(t_0). Now φ[S(t_0)] can only cover the points x_1, ..., x_L, and each interval covering an x_i has length less than ε, so φ[S(t_0)] is disjoint from A_ij. Thus φ[S(t_0)] ∩ A_ij = ∅.

Definition 2.8.7
A transformation φ oscillates if there exists some point x such that (1) for all h, φ on (h,x] is not monotonically increasing or decreasing, or (2) for all h, φ on [x,h) is not monotonically increasing or decreasing.

Notice that if a transformation does not oscillate, this is equivalent to saying that for h sufficiently close to any point x, if the image of x is x', then φ[[x,h)] = [x',h') or (h',x'] for some h', and similarly for (h,x]. An example of a function that oscillates is φ(x) = x·sin(1/x), with φ(0) = 0; from Fig. 2.8.2 below we see that the function oscillates at 0.

[Figure 2.8.2. An Oscillating Function: the graph of φ(x) = x·sin(1/x) near x = 0.]

It is clear that most generators one would think of are non-oscillating. It is certainly true that multiplicative and additive generators are non-oscillating.

Theorem 2.8.4
Let φ be a bounded transformation that maps intervals into intervals, and suppose that φ is non-oscillating. Assume 0 < P[A_11], P[A_12] < 1 and that lim_{t→∞} P[S(t)] = 0. Then φ is not a faithful transformation.

Proof: By a previous lemma, there are L final points of S(t). Furthermore, these points are on disjoint cycles. Since P[A_12] > 0, we see that some final point must be a point of closure of A_12; otherwise the information loss is immediately weak for an appropriate sequence. Now let x be a final point that is a point of closure of A_12. Let the sequence of final points on the cycle of x be x = x_1, x_2, ..., x_n, where x_n is the predecessor of x. Since P[S(t)] → 0, we assume we look at S(t) for t such that any interval of S(t) containing an x_i satisfies the non-oscillating hypothesis; that is, in particular, the image of [x_i, h) is either (h_1, x_{i+1}] or [x_{i+1}, h_2). The following diagram will be helpful.

[Diagram: the final point x on [0,1), with a part of A_11 immediately to its left and a part of A_12 immediately to its right.]

Since x is a point of closure of A_12 and a final point, we have two possible cases:
    (1) (z,x] ⊆ A_11 and (x,y] ⊆ A_12 for some z < x < y, or
    (2) [y,x) ⊆ A_12 and [x,z) ⊆ A_11 for some y < x < z.
We assume case (1), as in the diagram above; the other case is handled similarly. Let us consider the part of S(t) that contains x. Suppose it is an interval (a,b) that properly contains x.

It can be seen that, since we assume S(t) is such that φ is monotonically increasing or decreasing on each interval containing an x_i, after at most n time steps the part of S(t+n) that contains x must be either [x, h_n) or (h_n, x] for some h_n. In either case, the image of this set, call it I, must either lie entirely in A_11 or be the single point x. If it is the single point x, then after n more time steps S(t) is composed of at most n points, and thus the information loss will be weak. In the contrary case, φ[I] ∩ A_12 = ∅. Consequently, since this can be done for every interval of A_12 (recall that there are only a finite number), for some t_0 we have IL(t_0) ∩ A_12 = ∅, so φ is not faithful.

Corollary 2.8.1
Suppose φ is a bounded, non-oscillating transformation on [0,1) that maps intervals to intervals. Then φ is not universally faithful for ergodic machines with R = [0,1) whose transition seed spaces are finite unions of intervals.

Proof: Immediate from Theorem 2.8.1 and Theorem 2.8.4.

Corollary 2.8.2
A transformation φ(x) = mx + b (mod 1) is not universally faithful for ergodic machines with R = [0,1) whose transition seed spaces are finite unions of intervals.

Proof: It is easy to check that φ satisfies the hypotheses of Corollary 2.8.1.
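As a concrete check of the additive case, the short sketch below (our own construction, not from the report) tracks the forward information loss exactly, as a list of intervals, for the sequence 1,1,1,... under φ(x) = x + a (mod 1). The choices A_11 = [0,1/2) and a = 0.3 are arbitrary illustrative values; the run shows the information loss shrinking and then becoming empty, so φ cannot be faithful for this machine, in line with Theorem 2.8.3.

```python
# Illustrative sketch only: exact interval bookkeeping of the forward information
# loss IL(k+1) = phi[IL(k) ∩ A_11] for the all-ones sequence under the additive
# generator phi(x) = x + a (mod 1).  A_11 and a are arbitrary illustrative choices.

A11 = [(0.0, 0.5)]
a = 0.3

def shift(intervals, a):
    """Image of a union of intervals under x + a (mod 1)."""
    out = []
    for lo, hi in intervals:
        length = hi - lo
        lo = (lo + a) % 1.0
        if lo + length <= 1.0:
            out.append((lo, lo + length))
        else:                                   # interval wraps around 1
            out.append((lo, 1.0))
            out.append((0.0, lo + length - 1.0))
    return out

def intersect(xs, ys):
    """Intersection of two unions of intervals."""
    out = []
    for a1, b1 in xs:
        for a2, b2 in ys:
            lo, hi = max(a1, a2), min(b1, b2)
            if lo < hi:
                out.append((lo, hi))
    return out

IL = [(0.0, 1.0)]                               # IL(0) = [0,1)
for k in range(1, 6):
    IL = shift(intersect(IL, A11), a)           # IL(k) = phi[IL(k-1) ∩ A_11]
    print(k, IL, "measure", round(sum(h - l for l, h in IL), 3))
# The measure drops from 0.5 to 0.2 and then to 0: the sequence 1,1,1,...
# eventually cannot occur, so this additive generator is not faithful here.
```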

Although further results on universally faithful transformations could be derived, the results of this section are adequate for our purposes. We now turn to a consideration of the relationship between pseudo-random number generators and representative ergodic machines.

2.9 Pseudo-Random Number Generators

We have investigated ergodic transformations on [0,1) of the multiplicative and additive type. These transformations are similar to the pseudo-random number generators in common use. For example, a mixed congruential method is one in which
    n_{i+1} = (a·n_i + b) (mod m),
where n_0 is the initial seed, a is the multiplier, b is the additive constant, and m is often taken to be the word size of the computer. The sequence {n_i} is, of course, the sequence of pseudo-random numbers generated.
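A minimal sketch of such a mixed congruential generator (the particular constants shown are illustrative choices of ours, not a recommendation from the report) might look as follows; dividing each term by m rescales the stream to [0,1), anticipating the isomorphism established in Proposition 2.9.1 below.

```python
# Minimal mixed congruential (linear congruential) generator, for illustration only.
# n_{i+1} = (a*n_i + b) mod m; dividing by m gives a stream in [0,1).
# The constants a, b, m below are arbitrary illustrative choices.

def mixed_congruential(n0, a, b, m):
    n = n0
    while True:
        n = (a * n + b) % m
        yield n / m          # rescaled to [0,1), as in Proposition 2.9.1

gen = mixed_congruential(n0=12345, a=69069, b=1, m=2**32)
print([round(next(gen), 6) for _ in range(5)])
```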

Since we have been working mainly on the interval [0,1), it would be nice to have some way of translating results on [0,1) to the more general interval [0,m). The following result is useful along these lines.

Proposition 2.9.1
The transformation φ(x) = [ax + b] (mod m) on the space ([0,m), B[0,m), P = (1/m)dx) is isomorphic to the transformation φ'(x') = [ax' + b/m] (mod 1) on the space ([0,1), B[0,1), P = dx).

Proof: Consider the mapping h: [x] (mod m) → [x/m] (mod 1). For convenience we will use [x] to mean [x] (mod m). It is easy to check that h is well defined; furthermore, h is bijective (1-1 and onto). We need only check that the following diagram commutes:

    [x]      --φ-->   φ([x])
     |                  |
     h                  h
     v                  v
    h([x])   --φ'-->  φ'(h([x]))

Now
    h[φ([x])] = h([ax + b]) = [(ax + b)/m] (mod 1)
and
    φ'[h([x])] = φ'([x/m]) = [a(x/m) + b/m] (mod 1) = [(ax + b)/m] (mod 1),
so the diagram commutes.

For example, a commonly used generator is the multiplicative generator φ(x) = 5^13·x (mod 2^35). One way of investigating what our results yield in the case of actual generators is to consider the interval [0, 2^35). If we had a generator on the entire interval, we know from our previous work that for canonical ergodic machines the generator would be faithful as long as the probability of transition from one state to another was at least 1/5^13. Clearly, this result on the interval [0, 2^35) would still make sense if we considered only the integral points in [0, 2^35) as the only possible points that could arise. Thus, limitations of generators on the entire interval [0, 2^35) would still hold if we considered only integral points in this interval. We would thus make the observation that to have a successful multiplicative generator the multiplicative constant should be fairly large. This is a theoretical result that has been seen to be true experimentally.

The main problem in studying actual pseudo-random number generators is that the seed space in this case is finite. For a finite seed space, however, it is easy to see that an ergodic transformation for an ergodic machine is never faithful if the ergodic machine is nontrivial. There are two ways to see this.

First, observe that if the seed space is finite and the pseudo state space is finite, we have a finite deterministic automaton, and one can always find a repeating cycle within a length twice the number of states of the automaton (we are looking at the autonomous case). Secondly, assuming the transition from 1 to 1 is nontrivial, that is, 0 < P[A_11] < 1, for a finite seed space it is straightforward to see that the information loss of the sequence 1,1,1,... becomes ∅ after some time, and thus the information loss is not strong. Consequently, φ on a finite seed space cannot be faithful. These remarks give us the following.

Proposition 2.9.2
If the seed space is finite, then the ergodic machine E will not faithfully represent its associated automaton unless the associated automaton is trivial (for example, has one state).

The information that we can directly apply to pseudo-random number generators is thus somewhat limited. We now wish to turn to the question of transformations that act on R^N, extrapolating from transformations operating on R.

2.10 Parallel Seeds and Permutation Independence

So far, we have been considering a pseudo state space S and a probability space Π = (R, F, P) with an ergodic transformation φ acting on this space. The situation can be thought of as investigating a single stream of pseudo-random numbers used to effect certain probabilistic transitions. However, it often happens that we wish to consider not S but an N-fold product of S, which we call S^N, and rather than considering R we wish to consider R^N. Naturally, since both these spaces can be used for the pseudo state space and seed space respectively, the previous theory applies to an ergodic transformation on this space.

But suppose the transformation on R^N is to be just an N-fold product of the transformation φ. That is, given a seed on R^N, which will now of course be a vector <r_1, ..., r_N>, assume that the transformation Φ on R^N is just
    Φ<r_1, ..., r_N> = <φ(r_1), φ(r_2), ..., φ(r_N)>,
where φ is a transformation on R. What can we now say about φ, and what can we say about the resulting machine whose pseudo state is a vector <s_1, ..., s_N>? The first consideration that we must handle is what is meant by the cross-product of two ergodic machines.

Definition 2.10.1
Let E and E' be two ergodic machines, E = <X, S, Π, Y, φ, δ, λ> and E' = <X', S', Π', Y', φ', δ', λ'>. Then the cross-product machine is defined to be
    <X×X', S×S', Π×Π', Y×Y', φ×φ', δ×δ', λ×λ'>,
where Π×Π' is the product space (R×R', F, P) with P the product measure such that for measurable sets A and A' in R, R' respectively, P[A×A'] = P[A]·P'[A'], and F is the smallest σ-algebra that contains all sets of the form A×A' where A and A' are in F and F' respectively. The transformations φ×φ', δ×δ', and λ×λ' of course act componentwise. An N-fold cross-product machine is the obvious generalization.

Notice that although an N-fold cross-product machine is specified just like an ergodic machine, the necessary condition for it to be an ergodic machine is that ×_{i=1}^{N} φ_i be an ergodic transformation on R^N. This is not true in general even if the φ_i are ergodic transformations, as the following example shows.

Example 2.10.1
Let φ be the ergodic transformation on R = {a,b,c,d,e} that is cyclic (see Example 2.1.3). Consider φ×φ on R².

It is obvious that φ×φ is not ergodic, since it cycles through only a proper subset of R². Since we shall only be concerned with generalizing to the case of using Φ = φ×φ×...×φ (N times) on R^N, we would like sufficient conditions that allow us to tell when ×_{i=1}^{N} φ is ergodic. We first need some classical notions from ergodic theory. The following definitions and theorem are found in Halmos [19].

Definition 2.10.2
A measure preserving transformation φ is strongly mixing if
    lim_{n→∞} P(φ^{-n}F ∩ G) = P(F)·P(G)
for every pair of measurable sets F and G.

Definition 2.10.3
A measure preserving transformation φ is weakly mixing if
    lim_{n→∞} (1/n) Σ_{j=0}^{n-1} | P(φ^{-j}F ∩ G) - P(F)·P(G) | = 0
for every pair of measurable sets F and G.

Since ergodicity can be shown to be equivalent to
    lim_{n→∞} (1/n) Σ_{j=0}^{n-1} P(φ^{-j}F ∩ G) = P(F)·P(G),
it is easy to show that strong mixing ⟹ weak mixing ⟹ ergodicity. We have the following result.

Theorem 2.10.1 (Halmos)
A transformation φ on R is weakly mixing iff φ×φ on R² is ergodic.

Proof: The proof can be found in Halmos [19], page 39.

The reason that this theorem is useful is now clear.

It can readily be shown that multiplicative transformations are strongly mixing, and thus the product machine is an ergodic machine for such generators. For more information on the concepts of mixing and ergodicity, the reader is referred to Billingsley [7] or Halmos [19], where it is also shown that a large variety of transformations are strongly mixing. This, in fact, turns out to be an easier condition to verify than ergodicity. A generalization of the previous theorem shows that if φ is a multiplicative generator on R, then ×_{i=1}^{N} φ is ergodic on R^N.

We now consider the problem of faithful transformations in this general setting. In discussing the concept of a faithful transformation, we assumed that the initial value of the seed at time 0 could have been any seed in R. Similarly, we shall now assume that when we work with N parallel transformations, each of the N initial seeds is independently chosen and can be any of the seeds in R. That is, the initial seed vector is composed of independently chosen seeds from R with the distribution induced by the measure P on R.

Lemma 2.10.1
Let E be an ergodic machine and consider the cross-product machine E×E. Then, given the vector sequence <s(0), s'(0)>, ..., <s(n), s'(n)>, the information loss IL(s,s') on the cross-product space R² is just IL(s) × IL(s'), where s is the sequence s(0), ..., s(n) and similarly for s'.

Proof: The proof is by induction; assume it is true for all vector sequences of length ≤ t.

Then we have
    IL[s(t+1), s'(t+1)] = (φ×φ)[ IL(s(t), s'(t)) ∩ {A_s(t),s(t+1) × A'_s'(t),s'(t+1)} ]
        = (φ×φ)[ {IL(s(t)) ∩ A_s(t),s(t+1)} × {IL(s'(t)) ∩ A'_s'(t),s'(t+1)} ]
        = φ[IL(s(t)) ∩ A_s(t),s(t+1)] × φ[IL(s'(t)) ∩ A'_s'(t),s'(t+1)]
        = IL(s(t+1)) × IL(s'(t+1)).
The proof for the initial case is trivial.

Theorem 2.10.2
Suppose φ is a faithful, weakly mixing transformation for the ergodic machine E. Then ×_{i=1}^{N} φ is a faithful transformation on E^N.

Proof: We first prove the case N = 2. Let <s(0),s'(0)>, ..., <s(n),s'(n)> be any pseudo state vector sequence. We must show that the information loss for this sequence is strong. That is, we require that
    P[ IL(s,s') ∩ {A_s(n),i × A'_s'(n),j} ] = P[IL(s,s')]·P[ A_s(n),i × A'_s'(n),j ]
for any pseudo state <i,j>. Now,
    P[ IL(s,s') ∩ {A_s(n),i × A'_s'(n),j} ]
        = P[ (IL(s) ∩ A_s(n),i) × (IL(s') ∩ A'_s'(n),j) ]       (by the previous lemma)
        = P[IL(s) ∩ A_s(n),i]·P[IL(s') ∩ A'_s'(n),j]            (by definition of the probability on the cross-product space)
        = P[IL(s)]·P[A_s(n),i]·P[IL(s')]·P[A'_s'(n),j]          (since φ is faithful)
        = P[IL(s,s')]·P[ A_s(n),i × A'_s'(n),j ].
Since <s,s'> was an arbitrary sequence, this proves that the information loss on R² is always strong, and thus φ×φ is faithful. It is clear that a simple induction argument allows us to generalize this result to the case of a transformation ×_{i=1}^{N} φ on E^N.
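The construction in Theorem 2.10.2 is easy to picture operationally. The following sketch (our own illustration, with arbitrary parameter choices) runs N copies of a canonical two-state machine in parallel: the seed vector is advanced componentwise by the same multiplicative generator, and the initial seeds are drawn independently, as assumed above. With p = 1/2 and m = 2 ≥ 1/p, the single-component generator is faithful in the sense of Theorem 2.7.1; whether the seed is consumed before or after the transformation step is our own convention here.

```python
# Illustrative sketch (ours): an N-fold product machine.  Each component keeps its
# own seed in [0,1); the product transformation acts componentwise, and each
# component's pseudo state moves according to the canonical intervals of a common
# transition matrix (Definition 2.7.2).  Constants below are arbitrary choices.
import random

P = [[0.5, 0.5], [0.5, 0.5]]          # two-state chain; rows sum to 1
m = 2                                 # generator phi(x) = m*x (mod 1)

def next_state(state, seed):
    """Canonical machine: pick the interval of [0,1) in which the seed falls."""
    left = 0.0
    for j, p in enumerate(P[state]):
        if left <= seed < left + p:
            return j
        left += p
    return len(P[state]) - 1          # guard against round-off at the right end

def step(states, seeds):
    states = [next_state(s, r) for s, r in zip(states, seeds)]
    seeds = [(m * r) % 1.0 for r in seeds]     # product transformation, componentwise
    return states, seeds

random.seed(0)
N = 4
states = [0] * N
seeds = [random.random() for _ in range(N)]    # independently chosen initial seeds
for t in range(5):
    states, seeds = step(states, seeds)
    print(t, states)
```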

The situation for N seeds operating in parallel with the same generator has thus been fairly easy to handle. This corresponds to having N pseudo-random number generators working in parallel, starting with independent initial seeds. In the case we have been investigating so far, the ith vector pseudo state component uses the iterates of the ith initial seed. However, intuitively it seems that it should not matter which seed stream one uses, since the next seed element just depends on the generator φ and, in a sense, is completely random. In a case that we actually consider later on, it turns out that we do in fact want to effect a permutation of the seed vector before using it on a particular pseudo state vector. More formally, let <r_1, ..., r_N> be a seed vector. A permutation of this vector is a rearrangement of the seeds: let p be a bijective mapping from {1,2,...,N} onto itself; then the permutation p of the seed vector <r_1, ..., r_N> is the new seed vector <r_p(1), ..., r_p(N)>. Although in the actual application of permutations there will be a particular method determining which permutation is applied at a particular time step, we investigate the more general question of applying an arbitrary permutation at each time step. Since we are allowing arbitrary permutations of the seeds at each time step, the transformation Φ = ×_{i=1}^{N} φ does not by itself represent the transition of the seed vector at time t; rather, we first apply the permutation at time step t, that is, apply p(t), and then apply Φ to get the effective transition. That is, we have Φ∘p(t) as the seed space transition at time step t. Now, we are still interested in the fundamental question of information loss for sequences. Even with the notion of permutations, this concept is the same. What we want is transformations that are faithful even when permutations are applied.

Definition 2.10.4
A faithful transformation Φ = ×_{i=1}^{N} φ is permutation independent if the information loss function is strong for every vector sequence, allowing an arbitrary permutation of the seed vector <r_1, ..., r_N> at each time step. We say that φ is permutation independent if, for every N, ×_{i=1}^{N} φ is permutation independent.

We now consider the information loss assuming we first have a permutation at time step t.

Proposition 2.10.1
Given Φ = ×_{i=1}^{N} φ on E^N, let p be a permutation of {1, ..., N} that occurs at time step t, assuming no permutations occurred previously. Let s_i(j) be the ith component of the pseudo state at time step j. Then
    IL(s(0), ..., s(t+1)) = ×_{i=1}^{N} IL(s_p(i)(0), ..., s_p(i)(t+1)).

Proof: We notice that, since p is a permutation, Φ∘p = p∘Φ. Thus,
    IL(s(0), ..., s(t+1)) = Φ∘p[ IL(s(0), ..., s(t)) ∩ A_s(t),s(t+1) ]
        = p∘( ×_{i=1}^{N} φ[ IL(s_i(0), ..., s_i(t)) ∩ A_{s_i(t),s_i(t+1)} ] )
        = p( ×_{i=1}^{N} IL(s_i(0), ..., s_i(t+1)) )
        = ×_{i=1}^{N} IL(s_p(i)(0), ..., s_p(i)(t+1)).

We see that the information loss is simply a rearrangement of the cross-product terms in the information loss function without permutations. This observation will give us sufficient conditions under which a transformation can be permutation independent.

Proposition 2.10.2
For the information loss to be strong at time step t+1, given a permutation p at time step t and not before, the following must hold for each component i and all pseudo states j:
    P[ IL(s_p(i)(0), ..., s_p(i)(t+1)) ∩ A_{s_i(t+1),j} ] = P[ IL(s_p(i)(0), ..., s_p(i)(t+1)) ] · P[ A_{s_i(t+1),j} ].

Proof: This is simply the definition of the information loss being strong, together with the previous proposition.

Notice that in the previous proposition the intersection is with the non-permuted ith pseudo-state component. Thus the intersection of the information loss must be strong with every transition from state i to state j in general. For our purposes these two propositions give us the sufficient conditions we need, for we get the following theorem.

Theorem 2.10.3
If the information loss is R^N at every time step for a faithful transformation ×_{i=1}^{N} φ on E^N, then ×_{i=1}^{N} φ is permutation independent.

Proof: By Proposition 2.10.2 and the fact that each component of the information loss is R at time step t, we see that the information loss is R at time step t+1 for each component. Thus we are effectively starting at time step 0 after the permutation.

Consequently, with permutations at each time step, we still have information loss R for each component at every time step, and thus it is strong.

We see that for multiplicative generators, if they are faithful with information loss R at each time step, then permutation does not alter the information loss. We conclude this chapter with remarks about ergodic machines that are not autonomous.

2.11 Nonautonomous Ergodic Machines

Most of our results up to now have been derived for the case of autonomous ergodic machines. The definitions of information loss and gain and of the faithfulness of a transformation φ have also been geared to autonomous machines. However, this was only for the sake of understandability and convenience of notation. It is clear that the generalization to nonautonomous ergodic machines is quite easy. For the case of autonomous ergodic machines, the seed transition space was the set A_ij from state i to state j. For the general case, this seed space must be defined for each x ∈ X, where X is the input space. Thus, given the input x, there is a set A_ij(x) ⊆ R such that if the ergodic machine is in pseudo state i and the seed is in A_ij(x), then the next pseudo state is j. Clearly, we can still consider the concepts of information loss and information gain, except that now we shall have to consider the loss and gain of a sequence s(0), ..., s(n) as also indexed by the input sequence x(0), ..., x(n). Similarly, we would have to generalize the concept of a sequence being strong to include every possible input at the next time step. For example, the following would be the definition of the forward information loss.

Definition 2.11.1
The FIL of a sequence s(0), s(1), ..., s(n) with input x(0), ..., x(n) is defined recursively as follows:
    (1) FIL(0) = R
    (2) FIL(k) = φ[ FIL(k-1) ∩ A_{s(k-1),s(k)}(x(k-1)) ].
Similarly, we would define the concept of the information loss being strong as:

Definition 2.11.2
Given the pseudo state sequence s(0), ..., s(n) and the input sequence x(0), ..., x(n), the information loss is strong at time n for this paired sequence if, for all j ∈ S,
    P[ IL(s,x) ∩ A_{s(n),j}(x(n)) ] = P[ IL(s,x) ]·P[ A_{s(n),j}(x(n)) ].
The rest of the definitions are essentially based on these concepts and generalize easily. Care must be taken, however, in realizing that we are considering X to be an arbitrary set in general. Thus, when we take the minimum measure of the transition seed spaces, min_{i,j,x} P[A_ij(x)], we must realize that we are now minimizing over a possibly uncountable number of seed spaces. Clearly, the faithfulness of generators will depend on how complicated the resulting seed spaces A_ij(x) are allowed to be. Keeping these considerations in mind, the generalization to arbitrary ergodic machines is straightforward.

2.12 Conclusion

The theoretical investigations of this chapter have provided us with enough machinery to aptly represent probabilistic models in a deterministic framework. The development has shown that faithful representative models exist. We have also shown that one should be careful in trying to use the same generator for different ergodic machines.

We shall return to the theoretical development in Chapter IV, where we introduce the concept of a probabilistic homomorphism. In the next chapter, however, we experimentally investigate a neuron network model. It will be apparent that the model can in fact be represented in ergodic machine terms. We now turn to this neuron network model, which is essentially an investigation of how an approximate homomorphism can preserve information.

CHAPTER III
APPROXIMATE HOMOMORPHISMS — SIMPLIFICATIONS IN A NETWORK OF NEURONS

In this chapter, we study the simplification of a network of interconnected neurons. The idealized neuron, which will be the atomic unit or component of the network, can be formally represented by a probabilistic automaton and thus by an ergodic machine. The network of neurons will be a base model, and we investigate the relationship of this model with that of a specific lumped model that we shall consider. The atomic unit of the lumped model will be called a block, and will represent an aggregation of similar neurons in the base model. The neurons that form the block will be required to satisfy certain regularity conditions in their connection pattern. Also, each neuron in a block will have exactly the same parameters, that is, will be represented by the same ergodic machine, although the neurons will have varying behavior since they will be perturbed independently by "random" noise. The number of neurons in a block and the number of neighbors per neuron will be a measure of the "size" of the block. Although this notion can be formalized, we shall be content with observing that a block with more neurons and more neighbors per neuron has a greater size than one with fewer neurons and fewer neighbors per neuron. In actuality, then, rather than considering a single base model, we are really considering a class of base models.

Two base models can differ either by having different parameters for their components or by being of different sizes. It will be the case that we shall use the same lumped model for base models that differ only in size.

Although we are carrying out a detailed investigation of a network of idealized neurons, the fact that they represent neurons is not fundamental. We are actually studying a method for reducing complexity in a network of ergodic machines. For example, in studying an ecological system, one might model an individual by an ergodic machine and consider a species to be the aggregation of similar units. Thus, a block would represent a species. Similar translations can be undertaken in analyzing networks in other disciplines. However, for simplicity, we shall use neurons as our continuing example.

It is useful to explicitly make some comments on the type of simplification that we are considering. We are considering aggregations of identical components (e.g., neurons) into blocks whose output we wish to preserve under the lumping. For each block, the output will be the percentage of neurons in the specific neuron states. This viewpoint influences what we consider important behavior. Thus, there may be micro-oscillations within a block of the base model that we would not be aware of if the percentage vector for that block remained the same. For example, suppose two neurons in the same block simply alternated states. This would be a micro-oscillation that we would not be aware of, since the percentage vector for that block could not discriminate such behavior. On the other hand, suppose each block represented a cell assembly, and activity of an assembly corresponded to percent activity of the block. Then we could recognize oscillations between two such blocks. We consider such an example later on.

Whitehead [38], for example, is studying groups of cell assemblies represented by blocks, where an assembly (block) is dominant over another if the percentage firing of that assembly is greater than that of the other assembly. We now turn to the concept of an approximate homomorphism.

3.1 Approximate Homomorphisms

The base model-lumped model pair that we investigate consists of a probabilistic base model and a deterministic lumped model. The lumped model is, in fact, derived from a consideration of the limiting case of the base model. We take the idealized case of a base model that has a countably infinite number of neurons and a countably infinite number of neuron neighbors per neuron. This base model, call it B, can be thought of as the limit, as the size becomes infinite, of models B_n, where n indexes a particular base model of a specific size; that is, B = lim_{n→∞} B_n. In the next chapter, we show that the lumped model, call it L, is a homomorphic image of the model B. Actually, since the base model is probabilistic, we introduce the concept of a p-homomorphism between two ergodic machines; this allows us to say that the homomorphism relation exists between B and L. However, in this chapter we will only investigate the relationship between B_n and L for various values of n. It turns out that much information is still preserved and that analyzing L gives us a reasonable idea of the behavior of B_n. We shall call the relationship between B_n and L an approximate homomorphism. That is, L is an approximate homomorphic image of B_n if there exists B such that lim_{n→∞} B_n = B and L is a homomorphic image of B. We shall not bother to be more formal about the concept of approximate homomorphism since we do not use it formally in any manner.

3.2 The Neuron Model

The base model-lumped model pair that we describe in detail in this section was first considered by Zeigler [44]. Both lumped and base models operate on a discrete time scale. This is natural in our context, since we are studying simplification in the context of ergodic machines, which also operate on a discrete time scale. Since the time scale can be made arbitrarily fine, this is really no restriction in modeling such processes as neuron behavior. Furthermore, most actual models of neural nets are formulated on a discrete time scale.

3.2.1 The Base Model

Components
NEURONS — elements modeling basic behavior assumed to be characteristic of real brain cells.
INPUT-WIRES — sources of external net excitation.

Descriptive Variables
For each NEURON:
RECOVERY-STATE — with range the non-negative integers; RECOVERY-STATE = i indicates that i time units have elapsed since the NEURON last fired; thus i = 0 means the NEURON is now firing.
NOISE — with range the real numbers; when NOISE = r, the actual threshold for firing of the NEURON is THRESHOLD(i) + r, where THRESHOLD(i) is the threshold value characteristic of RECOVERY-STATE i. NOISE is a random variable generated for each NEURON.
STRENGTH — with range the real numbers; STRENGTH = x indicates that the sum total of all NEURON inputs is x.

FIRING-STATE — with range {0,1}; 1 means the NEURON is firing (emitting a pulse), 0 that it is not firing.

For each INPUT-WIRE:
INPUT-LEVEL — with range the real numbers; INPUT-LEVEL = x indicates that the excitation level on the INPUT-WIRE is x (measured in the same units as STRENGTH).

For each BLOCK (a designated set of NEURONS) and for each RECOVERY-STATE i:
%-IN-STATE-i — with range the rationals in [0,100]; the percentage of NEURONS in the BLOCK in RECOVERY-STATE i.

PARAMETERS:
For each BLOCK:
NOISE-DISTRIBUTION — the probability distribution for the random variable NOISE.
THRESHOLD — a real-valued function with RECOVERY-STATE as argument, giving the minimum strength needed to fire a NEURON in the absence of NOISE.

For each NEURON:
NEURON-NEIGHBORS — with range the subsets of NEURONS; indicates the NEURONS whose outputs impinge on the NEURON.
EXTERNAL-NEIGHBORS — with range the subsets of INPUT-WIRES; indicates the external inputs impinging on the NEURON.

And for each NEIGHBOR (NEURON or EXTERNAL) of the NEURON:
SYNAPS — with range the reals; determines the influence that a NEIGHBOR has on the NEURON: positive indicates facilitatory, negative indicates inhibitory.

Component Interaction
Postulate 1. Each NEURON receives inputs from other NEURONS, called NEURON-NEIGHBORS, as well as from external sources, called EXTERNAL-NEIGHBORS. We refer simply to NEIGHBORS when the context is clear.
Postulate 2. The NEURONS are grouped into disjoint sets called BLOCKS. This partitioning of the NEURONS places the following restrictions on the possible NEURON-NEIGHBORS of a NEURON. Suppose that a NEURON a belongs to BLOCK i.
a) If a has some NEIGHBORS in BLOCK j (which may equal i), then every NEURON in BLOCK i must have at least one NEIGHBOR in BLOCK j.
b) The SYNAPS weight is the same for the outputs of all NEIGHBORS of a from the same BLOCK. The NEIGHBORS of a coming from BLOCK j all have the same "influence" on it, but this may be different from that of the NEIGHBORS of a coming from BLOCK k (k ≠ j).
c) BLOCK j exerts the same "influence" on each NEURON in BLOCK i. We measure the "influence" of BLOCK j on a by the product of the number of a's NEIGHBORS from BLOCK j and their common SYNAPS value. This product is to take the same value for all NEURONS in BLOCK i, so that the "influence" of BLOCK j on each NEURON in BLOCK i is the same.
d) The characteristics of all NEURONS in a BLOCK, namely the THRESHOLD function and NOISE probability distribution, are the same.

e) The EXTERNAL-NEIGHBORS and their SYNAPS weights are the same for all NEURONS in a BLOCK.
Postulate 3. Each NEURON operates at each time step as follows:
a) The STRENGTH of the input volley is computed by adding together the outputs (which may be 0 or 1) of all the EXTERNAL-NEIGHBORS and NEURON-NEIGHBORS of the NEURON, weighted by their respective SYNAPS values.
b) If this STRENGTH is great enough, the NEURON will fire, i.e., will put out a 1 for the next time step; otherwise it will put out a 0. The 1/0 output can be thought of as a pulse/no pulse sent out by the NEURON, with only the pulse being able to influence the activities of another NEURON.
c) The minimum STRENGTH required to fire depends on the RECOVERY-STATE of the NEURON and on NOISE. Generally, the longer it has been since the NEURON last fired, the easier it is to fire; indeed, if it has just fired it may be impossible to fire for some time (this period is called the absolute refractory period). This fact is formalized in the shape of the THRESHOLD function, which determines for each RECOVERY-STATE value i (time since last fired) the value THRESHOLD(i) that would be required to fire if there were no noise. It is also assumed that the NEURON is perturbed by additive noise, independently generated at each step for each NEURON from the NOISE probability distribution.

Thus, if the NEURON is now in RECOVERY-STATE i, the minimum STRENGTH required to fire it is THRESHOLD(i) plus the sample value of NOISE.
d) If the NEURON fires, its RECOVERY-STATE is set to 0; otherwise it is incremented by 1. Thus, as implied, it records the time elapsed since the last firing of this NEURON.
This completes the description of the base model. It can easily be seen that the basic component, the neuron, can be represented formally as an ergodic machine.

3.2.2 The Lumped Model

Components
BLOCK's — each BLOCK' represents in lumped form the behavior of a corresponding BLOCK of base model NEURONS. (The prime indicates the correspondence.)
INPUT-WIRE's — each INPUT-WIRE' of the lumped model represents a corresponding INPUT-WIRE of the base model.

Descriptive Variables
For each BLOCK':
For each i = 0,1,2,...:
%-IN-STATE-i' — with range [0,100]; %-IN-STATE-i' = q predicts that q/100 of the base model NEURONS in the corresponding BLOCK are in RECOVERY-STATE i. Thus %-IN-STATE-i' of BLOCK' corresponds to %-IN-STATE-i of BLOCK. We use the term PROBABILITY-IN-STATE-i interchangeably with %-IN-STATE-i'.
STRENGTH' — with range the real numbers; STRENGTH' = x' indicates that the sum total of inputs to BLOCK' is x'.

For each INPUT-WIRE':
INPUT-LEVEL' — with range the real numbers.

PARAMETERS:
For each BLOCK':
BLOCK'-NEIGHBORS — with range the subsets of BLOCK's; specifies the BLOCK's which send input to the BLOCK'.
EXTERNAL'-NEIGHBORS — with range the subsets of INPUT-WIRE's.
And for each NEIGHBOR (BLOCK' or EXTERNAL'):
WEIGHT' — with range the real numbers; determines the influence of the NEIGHBOR on the BLOCK'.
NOISE-DISTRIBUTION' — a cumulative distribution function (c.d.f.); it will be the c.d.f. of NOISE.
THRESHOLD' — a function from the non-negative integers to the reals (to be identical with THRESHOLD).

Component Interaction
1. Each BLOCK' receives input from other BLOCK's (called BLOCK'-NEIGHBORS) and from external sources (called EXTERNAL'-NEIGHBORS).
2. Each BLOCK' operates at each time step as follows:
a) The STRENGTH' of the input to the BLOCK' is computed as the weighted sum of all NEIGHBOR (BLOCK' and EXTERNAL') outputs.
b) Using this STRENGTH' and the current values of the %-IN-STATE-i' variables (one for each RECOVERY-STATE i), the next values of the %-IN-STATE-i' variables are computed. For each i, the current value of %-IN-STATE-i' is multiplied by 1 - F(x - THRESHOLD'(i)) to obtain the new value of %-IN-STATE-(i+1)'.

Here, F is the NOISE-DISTRIBUTION' (i.e., the cumulative distribution function), and x is the value of STRENGTH'. This accounts for the fraction of base model NEURONS in RECOVERY-STATE i which have not fired (hence go to i+1); the remainder, then, are those that do fire (hence go to RECOVERY-STATE 0), and the lumped model reflects this as a sum of all such contributions to obtain the new %-IN-STATE-0'. We will discuss the derivation of this lumped transition rule after we define the correspondence between the base and lumped structures.
c) The output of BLOCK' is just %-IN-STATE-0', which, if the lumped model is valid, is the percentage of NEURONS in the BLOCK which are firing, i.e., outputting a pulse (1). (The rest are in the various RECOVERY-STATES and, since they output a 0, will not influence any activity at the next time step.)

CORRESPONDENCES BETWEEN LUMPED AND BASE MODEL

Components
Each BLOCK' (lumped model component) corresponds to a BLOCK (partition class) of NEURONS (base model components).

Descriptive Variables
For each BLOCK':
For each i = 0,1,2,...:
%-IN-STATE-i' = %-IN-STATE-i
STRENGTH' = STRENGTH of any NEURON in the corresponding BLOCK (this STRENGTH is the same for each NEURON in a BLOCK by base model Postulate 2).

PARAMETERS:
INPUT-WIRE's = INPUT-WIRES
For each BLOCK':
BLOCK'-NEIGHBORS = the set of BLOCK's whose corresponding BLOCKs influence NEURONS in the BLOCK (corresponding to BLOCK').
EXTERNAL'-NEIGHBORS = EXTERNAL-NEIGHBORS of any NEURON in the BLOCK (a BLOCK constant according to Postulate 2).
THRESHOLD' = THRESHOLD
NOISE-DISTRIBUTION' = NOISE-DISTRIBUTION for any NEURON in the BLOCK (Postulate 2 again).
And for each NEIGHBOR (BLOCK' or EXTERNAL'):
WEIGHT' = the product of the number of corresponding NEIGHBORS of any NEURON in the BLOCK and their common SYNAPS value (Postulate 2).
For each INPUT-WIRE':
INPUT-LEVEL' = INPUT-LEVEL

Component Interaction, Continued
We are now interested in deriving the lumped model transition function. The state of the lumped model is described by a vector of %-IN-STATE-i' values for each BLOCK'. We wish to calculate how this vector changes at each time step. Suppose that at time step t this state vector for a specific BLOCK' is represented by s(t) = <s_0(t), s_1(t), ..., s_n(t)>, where s_i(t) represents %-IN-STATE-i'; we are assuming that the recovery states are 0, 1, ..., n.

Consider the information that we have for each NEURON in the BLOCK corresponding to BLOCK'. We know that it is perturbed independently by noise at each time step and that its threshold is a function of the recovery state. Also, for this BLOCK, the THRESHOLD function and NOISE-DISTRIBUTION function are the same for each NEURON in the BLOCK and the same as for BLOCK'. Consequently, we can compute p_{i,i+1}(x(t)), the probability that the NEURON goes next to RECOVERY-STATE i+1 (does not fire), given its present RECOVERY-STATE i and the input STRENGTH = x(t). Assume that F(z) is the cumulative noise distribution function for the BLOCK under consideration. We also assume that a neuron that does not fire in recovery state n remains in state n, and thus p_{n,n+1}(x(t)) = p_{n,n}(x(t)) is the probability that a neuron in state n remains in state n. We thus have
    p_{i,i+1}(x(t)) = Prob{ x(t) < THRESHOLD(i) + z } = 1 - F(x(t) - THRESHOLD(i)).
Also, since a NEURON can only go to RECOVERY-STATE 0 if it does not go to i+1, we have
    p_{i,0}(x(t)) = 1 - p_{i,i+1}(x(t)) = F(x(t) - THRESHOLD(i)).
We can present these conditional probabilities in the form of a matrix where each entry is p_ij = p_ij(x(t)):

                | p_00   p_01    0      0    ...    0        |
                | p_10    0     p_12    0    ...    0        |
    P(x(t)) =   |  .                    .                    |
                |  .                          p_{n-1,n}      |
                | p_n0    0      0      0    ...   p_nn      |

Given the matrix P(x(t)) and the <%-IN-STATE-i'> (row) vector s(t), we shall define the transition behavior as
    s(t+1) = s(t) P(x(t)).
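The lumped transition rule above is straightforward to implement. The following sketch is an illustration written by us, not the report's Fortran package: it builds P(x) from a threshold function and a Gaussian noise c.d.f. and advances the <%-IN-STATE-i'> vector one step. The exponential threshold 27e^{-s}, the Gaussian noise, the use of recovery states 0-6, and the initial state all mirror the settings of Sections 3.3-3.4; the constant input level chosen below is an arbitrary value within the range explored there.

```python
# Illustrative implementation (ours) of the lumped-model transition rule
# s(t+1) = s(t) P(x(t)).  THRESHOLD'(i) = 27*exp(-i) and Gaussian NOISE with
# standard deviation 20 follow the parameter settings of Section 3.4.
import math

N_STATES = 7                                  # recovery states 0..6, as in the runs

def threshold(i):
    return 27.0 * math.exp(-i)

def noise_cdf(z, sd=20.0):
    """Cumulative distribution function of Gaussian noise with mean 0."""
    return 0.5 * (1.0 + math.erf(z / (sd * math.sqrt(2.0))))

def transition_matrix(x):
    """P(x): row i sends F(x - THRESHOLD(i)) to state 0 and the rest to i+1
    (a neuron in the highest state that does not fire stays there)."""
    P = [[0.0] * N_STATES for _ in range(N_STATES)]
    for i in range(N_STATES):
        fire = noise_cdf(x - threshold(i))
        P[i][0] = fire
        P[i][min(i + 1, N_STATES - 1)] += 1.0 - fire
    return P

def step(s, x):
    """One lumped-model step: s(t+1) = s(t) P(x(t)) for a row vector s."""
    P = transition_matrix(x)
    return [sum(s[i] * P[i][j] for i in range(N_STATES)) for j in range(N_STATES)]

# All neurons initially in recovery state 6; constant input level of 20 (illustrative).
s = [0.0] * N_STATES
s[6] = 100.0
for t in range(3):
    s = step(s, 20.0)
    print(t, [round(v, 1) for v in s])
```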

It turns out that if we let this vector s(t) = <%-IN-STATE'-i> be the corresponding vector to the <%-IN-STATE-i> of the corresponding BLOCK in the base model, then in a model B that is the idealized version, with an infinite number of neurons and an infinite number of neighbors per neuron, the correspondence is preserved over time using the above transition matrix. We show this formally in the next chapter. Intuitively, however, the reason it works is that since each neuron has an infinite number of neighbors, we can assume that each neuron receives the same identical input. Furthermore, the fact that there are an infinite number of neurons in a block means that exactly the predicted percentage will fail to fire in any RECOVERY-STATE-i, given the noise distribution. In the rest of this chapter, we experimentally investigate the correspondence between realizable base models B and their approximate homomorphic image L.

We shall find it convenient to consider the above matrix a Markov matrix for purposes of analysis, since the entries can be thought of as transition probabilities. If the strength x(t) is a constant at each time step, then the matrix can be thought of as a homogeneous Markov matrix. This conceptualization of the matrix as a homogeneous or inhomogeneous Markov matrix is only for the purpose of using Markov theory to study the limiting forms of the <%-IN-STATE'-i> vector, since %-IN-STATE'-i does not mean the probability of the BLOCK being in a particular recovery state. The lumped model is in fact deterministic and there are no probabilities associated with it.

3.3 The Computer Simulation

In order to investigate the approximate homomorphism between the base and lumped model, we carried out a computer simulation of the two models. The simulation consisted of a package of Fortran programs that allowed great flexibility in specifying connection patterns, initial states, and parameter settings.

Certain statistics were automatically compiled. A pseudo-random number generator was used to create a stream of "random" numbers to be used for NOISE. Further information on the simulation programs can be found in Appendix II. Two restrictions on the base model were imposed in order to simplify the computer implementation. (1) There is a maximum of 20 recovery states possible. A neuron in the highest recovery state that does not fire remains in the same recovery state. In our test runs we only use recovery states 0-6. (2) For all i, j, all neurons in BLOCK i have the same number of neighbors in BLOCK j with equal synapse weights, and these neighbors are randomly chosen from the neurons in BLOCK j.

In the following sections, we will consistently use "run" to mean a single simulation of the base or lumped model with a fixed set of parameters and initial random number seed. An "experiment" will refer to several runs with identical parameters but different random number seeds. A set of related experiments will be called a "series of experiments" or simply a "series". Since the EXTERNAL-NEIGHBORS and their SYNAPS weights are the same for all neurons in a BLOCK, we will use the term "external input" for the external influence. The next three sections are three different series of experiments.

For the noise in each neuron, we always used a Gaussian distribution. There seemed to be no reason to consider other noise distributions, although the computer implementation allowed them. Also, to simulate an absolute refractory period and a relative refractory period, it was found convenient to use an exponential decay function for the threshold, although again, an arbitrary threshold function could have been used.

It was sometimes convenient to always have a fixed background level of input to each neuron, and this is reflected in the parameter setting background. This can either be considered as part of the external input or as a modification to the threshold. The first series of experiments simply investigates the situation where there are no connections but just aggregations of neurons, each of which works autonomously.

3.4 Single Block —No Connections

We first describe the basic structure of this series of experiments.

Connection Pattern

The base model consisted of a single block with no connections between neurons in the block. We varied the size of the block using 25, 50, 100, 500 and 1000 neurons.

Parameter Settings

1. Background: 0
2. Noise: Gaussian, with mean 0 and standard deviation of 10, 20 and 40.
3. Threshold: Exponential decay 27e^-s, where s is the recovery state.

Initial State

All neurons were initially in recovery state 6.

Input Segment

Input levels of -100, -80, ..., 80, 100 were tried. The length of the input segment was sufficient to allow a "steady state" to be reached.

Not all possible combinations of experiments were conducted. The following shows the experiments actually done (x's).

                          Size
 St. Dev.    25    50    75    100    500    1000
    10        x                 x
    20        x     x     x     x      x       x
    40        x

Since there are no connections, the only input that a neuron receives is external input, and consequently each neuron receives the same total input. The only difference is that each neuron samples independently from a noise distribution. Let us assume that there is exact agreement between the percent-in-state vector of recovery states in the lumped model and the percent-in-state vector of the base model at time t. Then, at time t+1, the actual percent-in-state vector of the base model can be considered to be simply a sample from the distribution predicted by the lumped model. Whether or not the observed differences between the actual percent-in-state vectors and the predicted percent-in-state vector are reasonable can be tested using the chi-square statistic. In effect, we would be testing the random number generator. Such a chi-square test would surely have given good results if the lumped model was reset at each time step. However, the aim of our investigations is to show that the lumped model can be used to estimate behavior of the base model without, of course, knowing the base model behavior. Resetting the lumped model correctly would clearly be possible only if base model behavior is known. Without resetting the lumped model, error propagation becomes an added consideration and is discussed later.

Before giving the results of the chi-square tests, it will be instructive to examine the behavior of the lumped model. Because there are no connections between blocks, for a fixed level of external input the lumped model can be analyzed as if it were a homogeneous Markov chain with the following transition matrix:

      [ p_00   1-p_00   0        0        0        0        0      ]
      [ p_10   0        1-p_10   0        0        0        0      ]
      [ p_20   0        0        1-p_20   0        0        0      ]
P =   [ p_30   0        0        0        1-p_30   0        0      ]
      [ p_40   0        0        0        0        1-p_40   0      ]
      [ p_50   0        0        0        0        0        1-p_50 ]
      [ p_60   0        0        0        0        0        1-p_60 ]

The actual values of the entries can be easily calculated. For a fixed level of external input x (note that this is the same at each time step t), if F is the cumulative distribution function (c.d.f.) that is Gaussian with mean 0 and standard deviation σ, i.e., N(0,σ), we have

p_i0 = F(x - 27e^-i)

Using the terminology of Feller [15], we see that viewed as a Markov chain, the transition matrix is irreducible, aperiodic and positive recurrent. Consequently, it possesses a long run distribution. That is, regardless of the initial percent-in-state vector of recovery states, the lumped model will approach the same steady state distribution in the limit. The limiting distributions for various input levels are shown in figs. 3.4.1 and 3.4.2.
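Since the chain is homogeneous for a fixed external input, the limiting distributions of figs. 3.4.1 and 3.4.2 can be approximated simply by iterating the chain until it settles. The sketch below does this in Python; the parameters (threshold 27e^-i, noise standard deviation 10) follow this series, and the iteration count is an arbitrary choice.

# Approximate the long run distribution of the no-connection lumped model by
# iterating the homogeneous chain for a fixed external input level x.
from math import erf, exp, sqrt

def F(z, sigma):
    return 0.5 * (1.0 + erf(z / (sigma * sqrt(2.0))))

def limiting_distribution(x, sigma=10.0, n_states=7, iters=2000):
    p_fire = [F(x - 27.0 * exp(-i), sigma) for i in range(n_states)]   # p_i0
    s = [1.0 / n_states] * n_states            # any start state works (ergodic chain)
    for _ in range(iters):
        nxt = [sum(s[i] * p_fire[i] for i in range(n_states))] + [0.0] * (n_states - 1)
        for i in range(n_states):
            nxt[min(i + 1, n_states - 1)] += s[i] * (1.0 - p_fire[i])
        s = nxt
    return s

for x in (-100, -50, 0, 50, 100):              # a few of the input levels tried
    print(x, [round(100 * v, 1) for v in limiting_distribution(x)])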

[Figure 3.4.1. Lumped Model Steady State Distribution, σ = 10. % in state vs. external input level (-100 to 100).]

[Figure 3.4.2. Lumped Model Steady State Distribution, σ = 20. % in state vs. external input level (-80 to 100).]

The behavior of the base model was qualitatively similar to that of the lumped model, although the base model percentages tended to fluctuate about a "steady state." A quantitative analysis was obtained using the standard chi-square test. For each time step we calculated a chi-square value assuming that the base model percentages were a sample from the theoretical distribution represented by the lumped model percent-in-state vector. If the expected number of neurons in a particular recovery state was less than five, then this recovery state was aggregated with another recovery state. Thus, the number of degrees of freedom ranged from 0 to 6. For a given number of degrees of freedom, the chi-square statistics obtained should be χ² distributed. Tables 3.4.1 and 3.4.2 show the distributions that were actually obtained for each experimental run. The values do in fact appear to be χ² distributed. A second level chi-square analysis was conducted assuming the hypothesis that we were sampling from a chi-square distribution. Table 3.4.3 shows the results of this analysis.

One would not expect the chi-square tests between the lumped and base models to be extremely good because of the problem of error propagation. The chi-square test assumes that sampling is independent from time step to time step. Thus, if at each time step we were to reset the lumped model, we would expect good chi-square results. One way to eliminate most of the correlation from one time step to the next due to not resetting the lumped model is not to sample at every time step. Rather, one might sample, say, at every fifth time step. We did an analysis sampling at every fifth time step for the case of 100 neurons with a standard deviation of 20. The results obtained were extremely good and are also shown in tables 3.4.1 and 3.4.3.
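The first-level test itself is routine; the following sketch shows one way to compute it (Python), pooling recovery states whose expected count falls below five as described above. The observed counts used here are made-up numbers for illustration only.

# First-level chi-square statistic: observed base-model counts per recovery
# state vs. the distribution predicted by the lumped model percent-in-state vector.
def chi_square(observed_counts, predicted_fractions):
    n = sum(observed_counts)
    expected = [n * p for p in predicted_fractions]
    pooled = []                                 # (observed, expected) after pooling
    o_acc = e_acc = 0.0
    for o, e in zip(observed_counts, expected):
        o_acc += o
        e_acc += e
        if e_acc >= 5.0:                        # pool cells until the expected count is >= 5
            pooled.append((o_acc, e_acc))
            o_acc = e_acc = 0.0
    if e_acc > 0.0 and pooled:                  # fold any remainder into the last cell
        o_last, e_last = pooled[-1]
        pooled[-1] = (o_last + o_acc, e_last + e_acc)
    chi2 = sum((o - e) ** 2 / e for o, e in pooled)
    dof = len(pooled) - 1
    return chi2, dof

# Illustrative (made-up) data for a 100-neuron block.
observed = [12, 9, 10, 11, 13, 15, 30]
predicted = [0.10, 0.10, 0.10, 0.12, 0.13, 0.15, 0.30]
print(chi_square(observed, predicted))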

TABLE 3.4.1
χ² STATISTICS OF BASE MODEL VS. LUMPED MODEL (PART 1)

 Size   St. Dev.   Sample Size   Deg. Freedom   Distribution of Chi-Square Values
   25      20           93            2          99  96  88  74  60   8
   50      20           30            2         100  97  90  73  37   3
   75      20           30            2         100  97  97  80  47   3
  100      20           44            2          98  95  91  73  36   7
  500      20           42            2          95  93  88  74  62   2
 1000      20           42            2         100  98  90  83  48   5
   25      10           12            2         100 100  92  83  58   8
  100      10           36            2         100  97  94  80  50   3
   25      40           96            2         100  95  90  79  49   4
  100      20          174            2          99  95  91  74  48   6

(Last row uses samples from every 5th time step.)
The theoretical distribution is: 99 95 90 75 50 5

TABLE 3.4.2
χ² STATISTICS OF BASE MODEL VS. LUMPED MODEL (PART 2)

 Size   St. Dev.   Sample Size   Deg. Freedom   Distribution of Chi-Square Values
   25      20           66            1         100  97  88  73  53   0
   50      20           66            4         100  97  92  79  48   0
   75      20           51            4         100 100 100  84  61  10
  100      20           54            5          96  94  90  68  39   7
  500      20          105            6         100  97  93  76  40   1
 1000      20          105            6          98  94  88  73  57   7
   25      10           57            3         100 100  95  82  60  16
  100      10           48            5          98  96  90  77  54   4
   25      40           32            3         100  94  78  78  59   3

The theoretical distribution is: 99 95 90 75 50 5

TABLE 3.4.3
SECOND LEVEL CHI-SQUARE ANALYSIS

                    1st Level Deg.   Sample    2nd Level    Deg. of    % Confidence
 Size   St. Dev.     of Freedom       Size     Chi-Square   Freedom    of Acceptance
   25      20             2            93         8.27         5            10
   50      20             2            30         2.65         3            30
   75      20             2            30         2.30         3            50
  100      20             2            44         4.93         3            10
  500      20             2            42         4.88         3            10
 1000      20             2            42         3.72                      25
   25      10             2            12         0.17         2            90
  100      10             2            36         1.58         3            50
   25      40             2            96         2.54         5            75
   25      20             1            66         2.05         3            50
   50      20             4            66         3.43         3            30
   75      20             4            51         1.38         3            70
  100      20             5            54         4.63         3            20
  500      20             6           105         7.67         5            10
 1000      20             6           105         5.27         5            30
   25      10             3            57         1.80         3            50
  100      10             5            48         0.56         3            90
   25      40             3            32         8.33         3            2.5
  100      20             2           174                      5            90

(Last row uses samples from every 5th time step.)

We shall now consider the question of error propagation. Zeigler [45] introduced the concept of considering the error at a time step as composed of two parts: (1) due to sampling error and (2) due to not resetting the lumped model. The second component of the error can be analyzed solely in terms of lumped model behavior. Fig. 3.4.3 is a schematic representation of error propagation. Because of the structure of the approximate homomorphism h, ε1 and ε2 are due to sampling errors. The additional error kε1 is due to not resetting the lumped model. To investigate this additional error, we will describe what happens to two states of the lumped model that differ by a distance ε.

Assume that r and s are two different percent-in-state vectors (states) of the lumped model. That is, r = <r_0,...,r_6> and s = <s_0,...,s_6>. Let distance be measured by the metric d defined as:

d<r,s> = max_i |r_i - s_i|

Let a = r - s. Then a will be called the error vector. Notice that Σ_{i=0}^{6} a_i = 0. We have

d<rP,sP> = max_i |(aP)_i|

where

aP = < Σ_{i=0}^{6} a_i p_i0,  a_0(1-p_00),  a_1(1-p_10), ..., a_4(1-p_40),  a_5(1-p_50) + a_6(1-p_60) >

Now, if there exists k < 1 such that for all pairs of lumped model states r,s,

d<rP,sP> ≤ k d<r,s>

then the error propagation can be shown to be ultimately bounded (Zeigler [45]). Unfortunately, a simple example shows that such a condition cannot be guaranteed. Consider the two percent-in-state vectors r = <.8, 0, 0, 0, 0, .1, .1> and s = <.9, .1, 0, 0, 0, 0, 0>.

[Figure 3.4.3. Error Propagation. Schematic of the base model and lumped model trajectories related by the approximate homomorphism h, with the sampling errors ε1, ε2 and the additional error due to not resetting the lumped model.]

Let the external input be -20 and the standard deviation of the noise be 20. Then p_00 = .01, p_10 = .07, p_20 = .12, p_30 = .14, p_40 = .15, p_50 = .16, p_60 = .16.

a = <-.1, -.1, 0, 0, 0, .1, .1>
aP = <.024, -.099, -.093, 0, 0, 0, .168>

Thus, d<r,s> = .1 but d<rP,sP> = .168. We see that k must be greater than 1. In one transition, then, we cannot be assured that two lumped model states will converge. In this case, bounded error propagation does not apply.

An interesting situation is the case of only three different recovery states. Such a situation arises quite naturally if one considers, for example, firing, absolutely refractory, and quiescent as the only three recovery states possible. It can easily be shown that if

      [ p_00   p_01   p_02 ]
P =   [ p_10   p_11   p_12 ]
      [ p_20   p_21   p_22 ]

with 0 < p_ij < 1, i,j = 0,1,2, there exists a k < 1 such that d<rP,sP> ≤ k d<r,s> for all vectors r,s. That is, in one transition the error will be reduced. For a matrix of the type that we have been considering,

      [ p_00   1-p_00   0      ]
P =   [ p_10   0        1-p_10 ]
      [ p_20   0        1-p_20 ]

this still holds.
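The counterexample above is easy to check numerically. The following sketch (Python) forms a = r - s, multiplies by the transition matrix implied by the quoted p_i0 values, and recovers d<r,s> = .1 and d<rP,sP> = .168.

# Numerical check of the error-propagation counterexample.
p0 = [0.01, 0.07, 0.12, 0.14, 0.15, 0.16, 0.16]    # p_i0 for input -20, st. dev. 20

def apply_P(a):
    # Column 0 collects everything that fires; the rest advances one recovery
    # state (state 6 stays in state 6 when it does not fire).
    out = [sum(a[i] * p0[i] for i in range(7))] + [0.0] * 6
    for i in range(7):
        out[min(i + 1, 6)] += a[i] * (1.0 - p0[i])
    return out

r = [0.8, 0, 0, 0, 0, 0.1, 0.1]
s = [0.9, 0.1, 0, 0, 0, 0, 0]
a = [ri - si for ri, si in zip(r, s)]
aP = apply_P(a)
print([round(v, 3) for v in aP])                        # <.024, -.099, -.093, 0, 0, 0, .168>
print(max(abs(v) for v in a), max(abs(v) for v in aP))  # d<r,s> = .1, d<rP,sP> = .168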

This first series of tests has shown that analyzing the behavior of the lumped model seems to be a reasonable way to study expected base model behavior. We now turn to another series of experiments that has positive feedback between neurons. This series of experiments will continue to investigate a single block.

3.5 Single Block —Positive Feedback

Connection Pattern

The base model has 1 block. The number of neurons ranged from 50 to 1000 and the number of neighbors varied from 20 to 100. [See below.] The total input weight [synapse weight times number of neighbors] was held at a constant value of +100.

Parameter Settings

1. Background: -20
2. Noise: Gaussian with mean 0 and standard deviation 10.
3. Threshold: 200e^-s

Initial State

All neurons start in state 0.

Input Segment

The external input was 0.

Experiments (x's indicate runs; neighborhood sizes tried: 100, 80, 60, 50, 30, 20)

Size 50: x x
Size 100: x x x
Size 200: x x
Size 400: x x x x
Size 800: x x
Size 1000: x x x

The purpose of this series of experiments is to analyze the effect of feedback within a block of neurons. We wish to show that the lumped model can still explain the behavior of the base model. The first step is to explain the observed behavior of the lumped model. In our experiments, we used a starting state of all neurons in recovery state 0 (i.e., all firing). The subsequent behavior of the lumped model is shown in fig. 3.5.1. We see that the lumped model firing very quickly reaches a constant value, and from the output, viewed over many time steps, it was observed that the lumped model does reach steady state. Experimentally, we tried other initial starting states, but in all cases, the identical long term steady state was reached.

[Figure 3.5.1. Lumped Model Output Behavior (positive feedback). Percent firing vs. time step.]
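The steady state behavior of fig. 3.5.1 is easy to reproduce by iterating the lumped model with the parameters of this series (total feedback weight K = 100, background -20, N(0,10) noise, threshold 200e^-s). A minimal Python sketch follows; it is a restatement of the lumped transition rule above, not the original Fortran program.

# Lumped model of the positive-feedback series: input at each step is the total
# feedback K*s0 plus the background.  Parameters follow this series' settings.
from math import erf, exp, sqrt

def F(z, sigma=10.0):
    return 0.5 * (1.0 + erf(z / (sigma * sqrt(2.0))))

K, BACKGROUND = 100.0, -20.0

def step(s):
    x = K * s[0] + BACKGROUND                  # x(t) determined by the last % firing
    p_fire = [F(x - 200.0 * exp(-i)) for i in range(7)]
    nxt = [sum(s[i] * p_fire[i] for i in range(7))] + [0.0] * 6
    for i in range(7):
        nxt[min(i + 1, 6)] += s[i] * (1.0 - p_fire[i])
    return nxt

s = [1.0] + [0.0] * 6                          # all neurons initially firing (state 0)
for t in range(30):
    s = step(s)
    print(t + 1, round(100 * s[0], 2))         # % firing vs. time, cf. fig. 3.5.1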

One way to analyze the behavior of the lumped model is to view the transition matrices at each time step as representing a non-homogeneous Markov chain. That is, at each time step, the transition matrix is a Markov matrix, but an external input, in this case simply the last percent-in-state vector, determines the Markov matrix at the next time step. It is easily seen that the following recurrence relations define a non-homogeneous Markov chain.

Definition 3.5.1

Let s(t) be the row vector representing the percent-in-state vector of the lumped model at time step t, and let P(t) be the transition matrix at time step t. Then, by the recurrence relations we shall mean the following equations.

(1) s(t) = s(t-1) P(t-1), i.e., s_k(t) = Σ_{i=0}^{6} s_i(t-1) p_ik(t-1), k = 0,1,...,6

(2) x(t) = K s_0(t), where K is the total feedback and x(t) is the input at time t

(3) p_i0(t) = F(x(t) + β - θ_i), which we denote F_i(x(t)), where β is the background level and θ_i is the threshold in recovery state i

From equations (1), (2) and (3) we get

(4) p_i0(t) = F_i[K Σ_j s_j(t-1) p_j0(t-1)]

Recall that

(5)  P(t) =
  [ p_00(t)   1-p_00(t)   0           0     ...   0          ]
  [ p_10(t)   0           1-p_10(t)   0     ...   0          ]
  [ p_20(t)   0           0       1-p_20(t) ...   0          ]
  [   .                                                       ]
  [ p_60(t)   0           0           0     ...   1-p_60(t)  ]

In the following analysis, we shall use the theory of non-homogeneous Markov chains as developed in Paz [31]. We will use the terms "weakly" and "strongly ergodic", which are used by Paz, only in this section, and they should not be confused with our previous use of the concept of ergodicity. The following definitions and propositions can be found in Paz [31].

Definition 3.5.2

Let P = [p_ij] be a stochastic matrix. Then define

δ(P) = sup_{u,v} Σ_j (p_uj - p_vj)^+,   where k^+ = max{0,k}

||P|| = sup_i Σ_j |p_ij|

Proposition 3.5.1

0 ≤ δ(P) ≤ 1

Proof: The proof can be found in Paz [31], page 68.

Definition 3.5.3

A non-homogeneous Markov chain is an infinite sequence of matrices {P_i}, i ≥ 0, such that P_i is the matrix of transition probabilities at time t = i.

Definition 3.5.4

Let H_mn = Π_{i=m}^{n} P_i. Then we say that

(a) A non-homogeneous Markov chain is weakly ergodic if lim_{n→∞} δ(H_mn) = 0 for every m = 0,1,2,...

(b) A non-homogeneous Markov chain is strongly ergodic if there exists a constant matrix Q [i.e., all rows equal] such that lim_{n→∞} ||H_mn - Q|| = 0.

Theorem 3.5.1 (Paz)

A Markov chain is weakly ergodic iff there exists a subdivision of the chain into blocks of matrices {H_{n_j, n_{j+1}}} such that Σ_{j=0}^{∞} (1 - δ(H_{n_j, n_{j+1}})) diverges.

Proof: The proof can be found in Paz [31], page 74.

Proposition 3.5.2

Suppose F is a strictly monotone increasing c.d.f. (for example, the Gaussian distribution). Then the lumped model sequence of transition matrices is weakly ergodic.

Proof: Given the transition matrix, we have

δ[P(t)] = max_{0≤k≤6} [1 - p_k0(t)]

From equation (3) we see that

p_k0(t) = F(x(t) + β - θ_k) = F(K s_0(t) + β - θ_k)

Now since F is strictly monotone increasing, we have

0 < F(β - θ_k) ≤ p_k0(t) ≤ F(K + β - θ_k) < 1

Thus, p_k0(t) is uniformly bounded away from 0, and if we let the subdivisions in the previous theorem be just H_{j,j+1} = P(j), then we have 1 - δ(P(t)) uniformly bounded away from 0.

Unfortunately, weak ergodicity is not very useful. It tells us only that there is a sequence of constant matrices {E_mn} such that

lim_{n→∞} ||H_mn - E_mn|| = 0

We would like to say that there exists {E_m} such that

lim_{n→∞} ||H_mn - E_m|| = 0

i.e., that the chain is strongly ergodic. [It can be shown that E_m is independent of m.] Strong ergodicity would say that the non-homogeneous process reaches a long run distribution (i.e., steady state). Note, however, that even strong ergodicity would not allow us to say that the same long run distribution was reached independent of the starting state. This is because our non-homogeneous chain is itself a function of the starting state. We have not been able to show that the sequence of transition matrices is strongly ergodic.

We now turn to an analysis of the recurrence relations in an attempt to get some results for the matrix P(t) as t → ∞. Although we cannot be sure that there is a limiting matrix, we nevertheless try to get bounds on the entries p_ik(t) as t → ∞. The following theorem provides us with a method to calculate these bounds (although fairly crude).

Theorem 3.5.2

Assume the threshold is a monotone decreasing function. Then,

lim_{t→∞} F_i[(KF_0)^t(Ks_0(0))] ≤ lim inf_{t→∞} p_i0(t) ≤ lim sup_{t→∞} p_i0(t) ≤ lim_{t→∞} F_i[(KF_6)^t(Ks_0(0))]

Proof: Since we assume the threshold function is monotone decreasing, we have 0 < F_i(x) ≤ F_{i+1}(x) ≤ 1, i = 0,1,...,5.

Consequently,

p_i0(0) = F_i(Ks_0(0))

p_i0(1) = F_i[K Σ_j s_j(0) p_j0(0)] ≥ F_i[K F_0(Ks_0(0)) Σ_j s_j(0)] = F_i[KF_0(Ks_0(0))]

p_i0(2) = F_i[K Σ_j s_j(1) p_j0(1)] ≥ F_i[K F_0(KF_0(Ks_0(0)))] = F_i[(KF_0)²(Ks_0(0))]

Inductively,

p_i0(t) ≥ F_i[(KF_0)^t(Ks_0(0))]

Similarly, we can get an upper bound

p_i0(t) ≤ F_i[(KF_6)^t(Ks_0(0))]

Since F_i is monotone, we have either

KF_i(x) ≥ x, in which case (KF_i)^t(x) ≥ (KF_i)^{t-1}(x) ≥ ... ≥ KF_i(x) ≥ x,

or

KF_i(x) ≤ x, in which case (KF_i)^t(x) ≤ (KF_i)^{t-1}(x) ≤ ... ≤ KF_i(x) ≤ x.

In either case, since we have a bounded monotone sequence, the limit exists as t → ∞ [and may depend on Ks_0(0)]. Thus, the result follows.

We now present an example of calculating such bounds. (Note that the limit points are fixed points of KF_0 or KF_6.) Let K = 100, σ = 10, θ_i = 200e^-i, μ = 0, β = -20. Then

lim_{t→∞} (KF_0)^t(Ks_0(0)) = 0

since 0 is the only fixed point of KF_0. Also,

lim_{t→∞} (KF_6)^t(Ks_0(0)) = 100.

Thus, F_1(0) = F[-20 - 200/e] ≈ 0 and F_1(100) = .74. So 0 ≤ p_10 ≤ .74 in the limit. Other bounds for this situation yield 0 ≤ p_00 ≤ 0 and .02 ≤ p_60 ≤ 1. It is clear that such bounds do not provide too much information.

We have seen that analytic results for long run distributions appear to be very difficult to obtain in the general case. Rather than continuing to attack this problem we now turn to more informal "proofs." We now consider a heuristic "proof" that the lumped model with the parameters of this series can in fact reach one and only one steady state independent of the initial state, if it reaches a steady state at all. We assume that in the steady state, the first six recovery states 0,...,5 have approximately the same percent-in-state values. Moreover, nearly all the neurons that do fire will in fact have been in recovery state 6. Therefore, if s represents the steady state vector,

s = <s_0, s_0, s_0, s_0, s_0, s_0, s_6>   and   s_6 = 1 - 6s_0

Assuming that p_60 s_6 = s_0, we get

p_60 = s_0/s_6 = s_0/(1 - 6s_0)

Now,

p_60 = F(Ks_0 + β - θ_6) = Φ( (100 s_0 - 20.5) / 10 )

[Note: Φ is the standard normal c.d.f.] So

Φ( (100 s_0 - 20.5) / 10 ) = s_0/(1 - 6s_0)

must be satisfied in the steady state. Graphing each side of the above equation as a function of s_0 (see fig. 3.5.2), we find that the only possible solution s_0 is very close to the observed value. The informal approach has thus given us useful information.
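The intersection in fig. 3.5.2 can also be located numerically. A small sketch (Python) follows; it solves Φ((100 s_0 - 20.5)/10) = s_0/(1 - 6s_0) by bisection over the range plotted in the figure.

# Heuristic steady-state firing level: find s0 with
#   Phi((100*s0 - 20.5)/10) = s0 / (1 - 6*s0).
from math import erf, sqrt

def Phi(z):                                    # standard normal c.d.f.
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def gap(s0):
    return Phi((100.0 * s0 - 20.5) / 10.0) - s0 / (1.0 - 6.0 * s0)

lo, hi = 0.0, 0.15                             # range covered by fig. 3.5.2
for _ in range(60):                            # bisection (gap changes sign once here)
    mid = 0.5 * (lo + hi)
    if gap(lo) * gap(mid) <= 0.0:
        hi = mid
    else:
        lo = mid
print("steady-state firing level s0 =", round(0.5 * (lo + hi), 4))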

[Figure 3.5.2. Heuristic Calculation of Lumped Steady State. Both sides of the equation are plotted against the % in state 0 (firing); the intersection of the two curves is the steady state firing level.]

Since we wish to use the deterministic lumped model to predict the behavior of the probabilistic base model, we now informally analyze how error might propagate. Recalling that what might happen is that the true base model percent-in-state vector is a slightly displaced version of the predicted percent-in-state vector, we examine what happens to two lumped model states that are close to each other. Instead of using the seven-dimensional vector that we actually have, we make the problem more tractable by just using the two recovery states 0 and 6. Thus, we consider the next state transition of the lumped model, starting with various values in states 0 and 6. If the entries do not add up to 1, we uniformly spread the remaining percentage over states 1 through 5. This gives us the "force field" in fig. 3.5.3. Surprisingly, an analysis of this force field is a mine of information.

Although the force field is not quite accurate, since we do not have a homomorphism from the seven-dimensional state space to the two-dimensional one, nevertheless the force field gives us a good understanding of the behavior of the lumped model. The steady state point is clearly locatable. Also, we observe that there are some states which diverge quite significantly. Consequently, if a base model received enough noise to push it far enough away from the steady state, then the trajectory of the base model should follow the force field trajectory defined by the lumped model. To see whether this was actually true, we graphed several base model runs. We first simply plotted the firing level of runs with 400 neurons and 50 neighbors [fig. 3.5.4]. On the surface, this does not appear to match in any way the lumped model run [fig. 3.5.1], which is also a plot of the firing level vs. time. However, we decided to plot the same base runs using the same two-dimensional "state space" as the "force field" of the lumped model.

[Figure 3.5.3. Lumped Model Two Dimensional Force Field (positive feedback).]

[Figure 3.5.4. Base Model Runs, 400 Neurons, 50 Neighbors (4 different runs). Firing level vs. time step.]

This plot [fig. 3.5.5] reveals quite clearly that once the base model is displaced from equilibrium (by noise) it tends to follow the "force field" of the lumped model. We plotted other base model runs using the same two-dimensional "state space" [figs. 3.5.6-3.5.12]. They show the qualitative similarity between base model runs and lumped model runs. We see that when the neighborhood size is reasonably large [figs. 3.5.6, 3.5.10, 3.5.11] the base model spends much of its time near equilibrium. When it is displaced by noise it follows a lumped model trajectory that is not too large. Also, it seems that there is almost a critical neighborhood size, in the sense that even with 1000 neurons, if the neighborhood size is small, for example 20, 30 or 50 neighbors [figs. 3.5.7-3.5.9], then there is always sufficient noise to kick the base model away from equilibrium.

The concept of a "force field" has provided us with a convenient way to use the deterministic lumped model. It is seen that the lumped model can be used for general predictions of long term behavior. The force field also provides a qualitative conditional prediction of future base model behavior in the short term. We also see that for more accurate behavior matching we would have to consider a probabilistic lumped model that is, in an appropriate sense, a homomorphic image of the base model. We shall be able to do this in the next chapter when we apply the concepts of ergodic machines to homomorphic simplifications and consider the concept of a p-homomorphism. In the next section, however, we continue the analysis with a deterministic lumped model, relying heavily on the idea of investigating the "force field" behavior of the lumped model.
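The two-dimensional force field itself is straightforward to tabulate: pick a reduced state (% in state 0, % in state 6), spread the remaining percentage uniformly over states 1 through 5, apply one lumped step, and record where the reduced state moves. A sketch in Python, with the parameters of this series, follows.

# Tabulate the two-dimensional "force field" of the positive-feedback lumped model.
from math import erf, exp, sqrt

def F(z, sigma=10.0):
    return 0.5 * (1.0 + erf(z / (sigma * sqrt(2.0))))

def lumped_step(s, K=100.0, background=-20.0):
    x = K * s[0] + background
    p_fire = [F(x - 200.0 * exp(-i)) for i in range(7)]
    nxt = [sum(s[i] * p_fire[i] for i in range(7))] + [0.0] * 6
    for i in range(7):
        nxt[min(i + 1, 6)] += s[i] * (1.0 - p_fire[i])
    return nxt

def arrow(pct0, pct6):
    rest = (1.0 - pct0 - pct6) / 5.0           # spread the remainder over states 1-5
    s = [pct0] + [rest] * 5 + [pct6]
    nxt = lumped_step(s)
    return round(100 * nxt[0], 1), round(100 * nxt[6], 1)

for pct0 in (0.05, 0.10, 0.20, 0.30):
    for pct6 in (0.30, 0.50, 0.70):
        print((100 * pct0, 100 * pct6), "->", arrow(pct0, pct6))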

[Figure 3.5.5a. Base Model Two Dimensional Runs, 400 Neurons, 50 Neighbors. Horizontal axis: % in state 0. (Each point represents a time step; two different runs shown.)]

[Figure 3.5.5b. Base Model Runs, 400 Neurons, 50 Neighbors (2 runs). Horizontal axis: % in state 0.]

[Figure 3.5.6. Base Model Runs, 400 Neurons, 100 Neighbors (2 runs). Horizontal axis: % in state 0.]

[Figure 3.5.7. Base Model Run, 1000 Neurons, 20 Neighbors. Horizontal axis: % in state 0.]

[Figure 3.5.8. Base Model Run, 1000 Neurons, 30 Neighbors. Horizontal axis: % in state 0.]

[Figure 3.5.9. Base Model Run, 1000 Neurons, 50 Neighbors. Horizontal axis: % in state 0.]

[Figure 3.5.10. Base Model Run, 400 Neurons, 60 Neighbors. Horizontal axis: % in state 0.]

[Figure 3.5.11. Base Model Run, 800 Neurons, 60 Neighbors. Horizontal axis: % in state 0.]

[Figure 3.5.12. Base Model Runs, 50 Neurons, 50 Neighbors (2 runs). Horizontal axis: % in state 0.]

3.6 Two Blocks —A Flip-Flop

Connection Pattern

The base model had two blocks. The number of neurons ranged from 50 to 500 per block and the number of neighbors ranged from 20 to 50. [See below for exact values for various runs.] The total input weight to each block from the other block was held at a constant value of -100. There was no feedback from a block to itself.

Parameter Settings (for each block)

1. Background: 50
2. Noise: Gaussian with mean 0 and standard deviation 20.
3. Threshold: 27e^-s

Initial State

The initial state varied depending on the purpose of the run.

Input Segment

The input segment varied depending on the run.

Experiments

Experiment 1 (setting the Flip-Flop)

          Neighbors
 Size    50   40   20   10
   50     x    x    x    x

Experiment 2 (stability)

          Neighbors
 Size    20   50
   50     x    x
  100     x    x
  500     x    x

Experiment 3 (reset)

          Neighbors
 Size    20   50
   50     x    x
  100     x    x
  500     x

The purpose of this series of experiments was to examine the base-lumped modelling pair in a more complex setting than has been previously considered. We have seen that analytic results are extremely difficult to obtain even in the case of a single block. Thus, when we now consider two interconnected blocks, we shall rely on experimental studies of the lumped model behavior to give us insight into predicting behavior of the base model. The two blocks were interconnected, and the parameters chosen, so that a "flip-flop" behavior was obtained. Each block negatively inhibited the other block with a total input weight of -100, and had no effect on itself. Thus, for example, when Block 1 was firing at a high level and Block 2 was firing at a low level, this situation would theoretically persist until there was some external input. This firing condition constitutes the set state of the flip-flop. If, subsequently, a high level of external input was supplied to the second block, then the reverse situation should be obtained, with Block 2 firing at a high level and Block 1 firing at a low level. This state of the flip-flop is called the reset state. Clearly, the flip-flop is able to remember the last high level of external input that it received. Given the structure of the flip-flop there is one additional state that occurs. This is when both blocks alternately fire at a high level and then at a low level, in synchrony. We shall term this flip-flop state lockstep.
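A minimal sketch of the two-block lumped model (Python, with this series' parameters: background 50, N(0,20) noise, threshold 27e^-s, cross-inhibition weight -100) shows both the lockstep and the set regimes, depending only on the initial condition.

# Two-block flip-flop lumped model: each block inhibits the other with total
# weight -100 and has no self-feedback.
from math import erf, exp, sqrt

def F(z, sigma=20.0):
    return 0.5 * (1.0 + erf(z / (sigma * sqrt(2.0))))

def block_step(s, x):
    # One lumped-model step for a single block given its total input x.
    p_fire = [F(x - 27.0 * exp(-i)) for i in range(7)]
    nxt = [sum(s[i] * p_fire[i] for i in range(7))] + [0.0] * 6
    for i in range(7):
        nxt[min(i + 1, 6)] += s[i] * (1.0 - p_fire[i])
    return nxt

def step(b1, b2, background=50.0, cross=-100.0):
    x1 = background + cross * b2[0]            # block 2 inhibits block 1
    x2 = background + cross * b1[0]            # block 1 inhibits block 2
    return block_step(b1, x1), block_step(b2, x2)

def run(b1, b2, label, steps=20):
    for _ in range(steps):
        b1, b2 = step(b1, b2)
    print(label, round(100 * b1[0], 1), round(100 * b2[0], 1))

firing = [1.0] + [0.0] * 6                     # all neurons in state 0
quiet = [0.0] * 6 + [1.0]                      # all neurons in state 6
run(firing, firing, "lockstep (one phase):")   # both blocks alternate in synchrony
run(firing, quiet, "set state:")               # block 1 high, block 2 suppressed

Starting both blocks together gives the lockstep alternation, while starting one block firing against a quiescent partner settles near the set state described below.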

We found that the lumped model indeed exhibited the three stable states of set, reset, and lockstep. Reset, of course, is simply a complementary state to the set state. Lockstep, in the lumped model, seemed to be an extremely stable state, since it took a fairly high level of input to get out of this state. A qualitative examination of base model behavior revealed no surprising effects. The base model exhibited a stable lockstep state and also could be put into the states set and reset. For the base model, however, the set and reset states were only conditionally stable. That is, after several time steps, depending on the number of neurons and neighbors, and varying with a particular probabilistic run, the base model would eventually go into the lockstep state. We shall see the reasons for this behavior, as well as other lumped model behavior, as we examine the base model using detailed considerations of lumped model behavior.

The first consideration of interest is the level of continuous external input necessary to set the flip-flop. That is, starting from the lockstep state, which was Block 1 and Block 2 synchronously alternating with the percentage vectors <97.77, 0, 2.2, 0, .01, 0, 0> and <.01, 97.77, 0, 2.19, 0, .01, 0>, what level of external input is necessary to set the flip-flop? We found that the minimum continuous external input necessary to Block 1 was 47. Now, the base model lockstep state is predicted to be quite stable (more on this later), and so we would expect that essentially the same amount of external input would be necessary to set the base model. At most, with a small number of neurons and neighbors, it might be slightly easier to set because of the random noise generated by such a configuration. A detailed analysis of this situation was not deemed necessary. Experiment 1 shows the actual runs considered.

For the case of 50 or 40 neighbors, we found that an input level of 50 was necessary to set the base model, and for the case of 20 and 10 neighbors, we found that an input level of 45 was sufficient. Since we considered increments of 5 for the external input, the results are exactly what one might predict.

A more interesting investigation using the lumped model was termed exploring the regions of stability. As has been previously mentioned, the lumped model exhibits the flip-flop states of set and lockstep. The set state occurs with Block 1 having the percentage vector <85.38, 14.13, .48, 0, 0, 0, 0> and Block 2 having the percentage vector <3.53, 3.53, 3.49, 3.40, 3.28, 3.16, 79.58>, in the steady state. Now, it seems that in the set state, most of the information is captured by considering the percentages in neuron states 0 and 1 for Block 1 and the percentages in neuron states 0 and 6 for Block 2. Effectively, we are considering an approximate lumping of the original state sets to one where there are only two neuron states in Block 1 and Block 2. Let us call this lumped version of the original lumped model the approximate lumped model. Suppose we now put the approximate lumped model in various initial states, that is, have the original lumped model starting with various percentages firing (i.e., in neuron state 0) and the rest of the percentage in neuron state 1 for Block 1 and neuron state 6 for Block 2. Without any external input, does the approximate lumped model go to the flip-flop state of set or reset or lockstep? Fig. 3.6.1 is a partial summary of the results. Fig. 3.6.2 shows the entire region considered, calculated on the basis of symmetry. The advantage of essentially only looking at neuron states 0 and 1 for Block 1 and 0 and 6 for Block 2 is that the behavior is now much easier to visualize and yet captures the essential features.

[Figure 3.6.1. Partial Region of Stability Graph. Grid of outcomes over % in state 0 for block 1 (rest initially in state 1), 50 to 100, vs. % in state 0 for block 2 (rest initially in state 6), 0 to 35. L = Lockstep is final result; S = Set or Reset is final result.]

[Figure 3.6.2. Entire Region of Stability Graph. % in state 0 for block 1 vs. % in state 0 for block 2, each 0 to 100, with the Set, Reset, and Lockstep regions marked.]

It is apparent that the region of stability, that is, those initial states leading to a set state (or reset state), is quite small compared to the lockstep region. Since the lumped model, starting in an initial state as indicated, essentially goes in one time step to a state that can be similarly aggregated, that is, most of the percentages are in neuron states 0 and 1 for Block 1 and neuron states 0 and 6 for Block 2, we can consider that an "approximate homomorphism" exists between the lumped model and the approximate lumped model, and consequently we can get a "force field" as in fig. 3.6.3. This "force field" gives an approximate representation of how the lumped model behaves as it goes from an initial state to lockstep or from an initial state to set. Although the "force field" is only approximate, nevertheless it shows why the lockstep state is so stable. We see that once the lumped model is in lockstep, it has extreme changes of neuron states. Consequently, there is a very strong vector from one of the lockstep phases to the other. Recall that in lockstep, both blocks synchronously alternate with the approximate percentage vectors of <97.77, 0> and <.01, 97.77> for Block 1 and <97.77, 0> and <.01, 0> for Block 2. Of course, our stability diagram of fig. 3.6.1 assumes even less information, for the percentages not firing are assumed to be in neuron states 1 and 6 for Blocks 1 and 2 respectively. In any case, since we know that the base model would be expected to follow the lumped model trajectories, it is apparent that the chances of the base model deviating from lockstep sufficiently to become set are extremely remote. It would take a tremendous level of noise in just the right amounts over several time steps to get the base model into the set or reset region. This is the justification for considering lockstep to be a very stable base model flip-flop state, and, in fact, the base model was never observed to have become set autonomously, once it was in lockstep.

[Figure 3.6.3. Lumped Model Two Dimensional Force Field for Flip-Flop (arrow represents movement at next time step). % in state 0 for block 1 vs. % in state 0 for block 2.]

On the other hand, it is clear that if the base model is in a set state, the probabilistic nature of the base model would cause the trajectories to "wander around" the set point, or the point of equilibrium, under the effective force field as shown previously in fig. 3.6.3. From figs. 3.6.2 and 3.6.3, it would seem that it would be fairly easy for the noise of the probabilistic base model to throw it into a region of lockstep, after which it would inexorably go to the very stable lockstep state. Thus, we would predict that the base model could only be conditionally stable in the set flip-flop state and that after several time steps, the base model would slip into the lockstep state. This is, in fact, what does occur. Table 3.6.1 is a summary of the information gathered from the runs indicated in Experiment 2. The base model was run in cycles of 24, so this is the longest time period of stability that was considered. As can be seen, the effect of a small number of neuron neighbors is quite pronounced, since even with 500 neurons in the base model, with only 20 neighbors per neuron, the base model is rarely stable for the full 24 time steps possible. On the other hand, with 50 neighbors per neuron, even 100 neurons is fairly stable. The case of 500 neurons and 50 neighbors indicates that once there is a reasonable neighborhood size, with enough neurons, the base model very quickly begins to be extremely stable.

To get a further idea as to how the base model decays, we decided to find the minimum input level to the lumped model that would unset it, once the lumped model was set. We decided to try inputs of various lengths rather than a continuous input. Thus, we used inputs of lengths 1, 2, 3, 4, and 5.

TABLE 3.6.1
BASE MODEL UNSET TIMES FROM THE SET STATE

                              Number of Runs that Unset After x-y Time Steps
 Number of    Number of
  Neurons     Neighbors     1-4   5-8   9-12   13-16   17-20   21-24   24+
     50          20          3     4     2      0       0       1      3
     50          50          1     1     1      0       1       0      6
    100          20          3     9     2      1       1       1      1
    100          50          2     1     1      1       0       1      4
    500          20          0     0     1      3       1       1      2
    500          50          0     0     0      0       1       0      8

NOTE: The base model was run in periods of 24, so this is the longest possible time to stay set.

After the appropriate input was applied for the proper number of time steps (1-5), it was removed, and the lumped model was observed to see whether it went into lockstep or back to the set state. Table 3.6.2 shows the minimum input of the various lengths needed to unset the lumped model. Thus, although an external input of 16 for one time step (length one) unsets the lumped model, if we apply the external input for three time steps, we need only apply an external input of magnitude 11. It may seem surprising that if the input is applied for two time steps, then an input level of 18 is necessary, but this is reasonable when we reexamine the force field of fig. 3.6.3. Since the first input displaces the lumped model away from equilibrium, the lumped model, following the force field, would react by moving past the equilibrium point in the other direction. A hit at this time step, then, would only tend to knock the lumped model closer to equilibrium rather than farther away.

TABLE 3.6.2
MINIMUM INPUT NECESSARY TO THE LUMPED MODEL TO UNSET IT

 Input Length    Minimum Input
      1               16
      2               18
      3               11
      4               11
      5                9

Given the data gathered, it might be desirable to attempt to analytically predict when the base model would unset. Unfortunately, because of the many complex trajectories that the base model might follow during unsetting, an analytical analysis is not feasible. One might expect that it would be possible to predict how frequently a single input due to noise might unset the base model, but even this is intractable analytically. We shall see in the next chapter that the way to proceed is to study a probabilistic lumped model which can properly predict the base model behavior. Experiments with such a probabilistic lumped model will yield the type of information that is extremely difficult to obtain analytically.

The final set of experiments associated with the flip-flop model, Experiment 3, investigated the "reset" behavior of the base-lumped model pair. Once the lumped model was set, it was found that a continuous input of 35 was sufficient to reset it. Thus, although a level of 47 is necessary to set the lumped model from lockstep, a lower level of external input suffices for putting it into the reset state once it is set. This is plausible, because the state trajectory followed during resetting need never actually go through the true lockstep states of both blocks synchronously having the percentage vectors <97.77, 0, 2.2, 0, .01, 0, 0> and <.01, 97.77, 0, 2.19, 0, .01, 0>.

Upon observing the region of stability of fig. 3.6.2 and the force field of fig. 3.6.3, it appears that the specific trajectory that the lumped model would follow in resetting would be quite critical, and that small deviations might in fact lead to the lockstep state. Thus, when trying to predict base model behavior, it would seem that there would be a well-defined fraction of runs for which the base model would go into lockstep instead of getting set, depending on the input level. Table 3.6.3 summarizes the information obtained for reset tests with input levels of 25 and 30. We see that resetting occurs consistently at an input level of 30, which is a little less than the level of 35 predicted by the lumped model. This again would be expected, since it was found to be somewhat easier to set the base model from lockstep also. The input level of 25 shows that the effect of a small number of neighbors is again to make it easier for the flip-flop to reset, just as it was easier to initially set the flip-flop from lockstep.

TABLE 3.6.3
RESET BEHAVIOR OF THE BASE MODEL FROM THE SET STATE

                                  Input = 25            Input = 30
 Number of    Number of
  Neurons     Neighbors       Reset   Lockstep       Reset   Lockstep
     50          20            37        23            49        6
     50          50            20        21            43        7
    100          20            47        18            62        2
    100          50            14         6            26        0
    500          20            31         0            20        0

The analysis of the flip-flop case has shown that a detailed analysis of the lumped model can reveal much important information about base model behavior. Furthermore, the concept of looking at the force field of the lumped model seems to have general usefulness, as we also used this technique in analyzing the single block case. However, it appears that more fruitful analysis can only be obtained with a more representative lumped model than simply the deterministic lumped model that we have been considering.

3.7 Complexity

It is apparent that the lumped model that we have been considering as the approximate homomorphism of a base model is much less complex than the base model. The simplicity of the lumped model relative to the base model was apparent in the computer implementation. The simulation programs for the lumped model were far simpler and shorter than those for the base model. The lumped model simulation used less computer memory and also required less computer time per simulated time step. A comparison between running times for the lumped and base model can be found in table 3.7.1 and figures 3.7.1 and 3.7.2.

3.8 Conclusion

The experimental investigations show the usefulness of the concept of approximate homomorphisms, although they demonstrate the need for longer runs in the case of positive feedback to distinguish between cyclic and aperiodic behavior. In the next chapter, we return to the formalism and show the derivation of p-homomorphic lumped models, and give some results with probabilistic lumped models for comparison. In chapter V, we look at the experimental results again in the context of chapter IV.

TABLE 3.7.1
COMPLEXITY ANALYSIS OF BASE MODEL VS. LUMPED MODEL
(Running times for the base model in the second series of experiments)

                                 Neurons
 Neighbors     25      50     100     200     400     800
      1       3.7     4.2     5.1     6.4     9.6    16.15
     10       3.75    4.25    5.55    8.2    13.4    23.75
     20       3.85    5.05    6.75   10.4    17.8    32.55
     40        --     5.55    8.8    14.9    26.5    49.55
     80        --      --    12.95   23.6    42.95

NOTES: Values given are the average length of time required per simulated time step (in seconds). For comparison, the lumped model requires only 2.75 seconds per time step, regardless of the number of neurons or neighbors. We would expect the running time of the base model to be of the form T = K + N(an + b), since the main loop of the program is:

  Loop over all neurons (N)
    Compute threshold and external input
    Loop over neighbors of this neuron (n)
      Sum up inputs
    Compare input and threshold

In fact, K = 3.2, a = .001, b = .015 gives a rather good fit. See figs. 3.7.1 and 3.7.2.
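The fit quoted in the note is easy to verify; a small sketch (Python) compares the model T = K + N(an + b) with a few of the measured entries of Table 3.7.1.

# Running-time model of the base model simulation: T = K + N*(a*n + b) seconds
# per simulated time step, with K = 3.2, a = 0.001, b = 0.015.
def predicted_time(neurons, neighbors, K=3.2, a=0.001, b=0.015):
    return K + neurons * (a * neighbors + b)

samples = [                                    # (neurons, neighbors, measured seconds)
    (25, 1, 3.7), (100, 20, 6.75), (400, 40, 26.5), (800, 20, 32.55),
]
for N, n, measured in samples:
    print(N, n, measured, round(predicted_time(N, n), 2))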

[Figure 3.7.1. Running Time Complexity. Seconds per simulated time step vs. number of neighbors.]

[Figure 3.7.2. Running Time Complexity. Seconds per simulated time step vs. number of neurons.]

CHAPTER IV
PROBABILISTIC HOMOMORPHISMS AND SIMPLIFICATIONS

In the previous chapter, we have shown how a probabilistic base model could be modelled by a deterministic lumped model. We claimed that the lumped model that was considered was an "homomorphic" image of an ideal base model. The ideal base model had an infinite number of components and an infinite number of neighbors per component. It was also suggested that for a finite number of components in the base model, an appropriate probabilistic lumped model could correctly simulate base model behavior. We now attempt to formalize these concepts, using the idea of an ergodic machine that has already been developed. Recall that an ergodic machine is a way to transform a probabilistic model into a deterministic one. Since we know what we mean by simplification in the deterministic context, we shall now be able to use the same idea in the probabilistic context. We shall also see that treating the base model as an ergodic machine will give us insight into how to simplify. The first point that we investigate is the concept of a probabilistic homomorphism.

4.1 Probabilistic Homomorphisms

The intuitive idea of a probabilistic homomorphism is very simple. Suppose we have a class of deterministic models corresponding to any probabilistic model A. Call this class C(A). Then, if there is a homomorphism (the usual deterministic idea) from an element of C(A) to an element of C(B) for some probabilistic model B, then B will be considered to be a probabilistic homomorphism of A.

The first thing that we must do is to delineate the class of models that we are talking about. Therefore, since the probabilistic models we are using are probabilistic automata, we shall let A be a probabilistic automaton. Now, we have shown in Chapter II that under very general conditions, there is a class of ergodic machines that faithfully represents A. In fact, if A has a finite state space, all that is necessary is that, for each external input x, if μ(x) represents the minimum non-zero transition probability of the probabilistic transition matrix [p_ij(x)], then the minimum over x of μ(x) > 0. In Chapter II, we also investigated cross-products of ergodic machines and permutations of the seeds of the cross-product machine. We wish to expand the representative class of an automaton A by including machines with permutations. We are thus led to the following definition.

Definition 4.1.1

Let E be an ergodic machine that is the cross-product of ergodic machines E_i, that is, E = X_{i∈N} E_i, where N is an at most countable set. Let {π(t)} be a time-indexed set of permutations, with π(t) a permutation (bijective map) on N. Then a generalized ergodic machine is E together with {π(t)}, and the deterministic machine that is represented is as before except that at each time step t, π(t) is applied to the vector of seeds.

Notice that we are essentially allowing any arbitrary permutation at each time step. The permutation may in fact be determined by the state of E at time step t. For convenience we repeat definition 2.10.4 at this point.

Definition 4.1.2 (also 2.10.4)

A faithful transformation ζ_N = X_{i=1}^{N} ζ is permutation independent if the information loss function is strong for every vector sequence allowing an arbitrary permutation of the seed vector <r_1,...,r_N> at each time step. We say that ζ is permutation independent if for every N, ζ_N is permutation independent.

Since the information loss function is strong even with permutations, it is clear that a faithful ergodic machine with permutations should also be faithful (see Definition 2.6.5). We thus have the following.

Definition 4.1.3

Let an ergodic machine E with transformation ζ_N = X_{i∈N} ζ for some N faithfully represent an automaton A. Then, if ζ is permutation independent, the generalized ergodic machine <E, π(t)> also faithfully represents A.

Definition 4.1.4

The representative class C(A) of a probabilistic automaton A is the class of generalized ergodic machines that faithfully represent A.

We are now in a position to define what we mean by a probabilistic homomorphism.

Definition 4.1.5

A probabilistic automaton B is a probabilistic homomorphic image of a probabilistic automaton A if there is a homomorphism h (the classical deterministic homomorphism) from an element of C(A) to an element of C(B). We shall also say that h is a probabilistic homomorphism (p-homomorphism) from A to B.

Our definition has captured the intuitive notion. We can effectively regard any faithful representative of the automaton A as just as good as any other. Thus, if we can simplify one representative A' homomorphically to get B' and then choose a more convenient representative B, we would want to say that B is a simplification of A. We now consider simplifying base models using these ideas. Given a base model-lumped model pairing, what we need to do is to represent each of them as (possibly generalized) ergodic machines and find a (deterministic) homomorphism between the two deterministic representatives.

We shall develop the simplification results in a general context, but it will be useful to keep the neural network model in mind. We shall use that model as a continuing example to clarify the exposition. Thus, the simplifications that we shall consider will involve the percentage of components in a particular component state. For example, in the neural network model, we only aim to preserve the percent-in-state vector representing the recovery states. We begin by considering the infinite case base model and formalizing it as an ergodic machine. We do the same with the lumped model and then exhibit a map h, which is a homomorphism from the state space of the base model to the state space of the lumped model. The homomorphism of course allows us to say that the base model and lumped model are behaviorally equivalent. We shall at first assume that each component of the base model receives the same input as every other component. Such an hypothesis is justified in the neural network case, if we have an infinite number of neighbors, by an application of the law of large numbers. In the case of a finite number of components per aggregation, this hypothesis would be correct under some connection patterns and would be a good approximation if the number of neighbors is sufficiently large. We shall investigate in a later section what happens when this hypothesis is removed.

4.2 Simplification of the Ideal Base Model

We model an individual component as an ergodic machine M_i = <X_i, S_i, I_i, Y_i, ζ_i, δ_1i, λ_1i>, where each component is identical. Of course, the initial seeds of the components will be chosen independently from the distribution represented by I_i. We can represent an infinite base model pool as a countable cross-product of the individual ergodic machines. (We only deal with countably infinite numbers of components.) Because of our assumption about inputs, we do not have to specify the connection pattern. Instead, we assume that the input to each neuron is some function f of the state of the cross-product machine at the previous time step.

Let the infinite pool be represented by M = X_{i=1}^{∞} M_i. A typical state of M is thus (s_1,r_1) x (s_2,r_2) x ..., and the transition function is defined by componentwise operation on the M_i. Alternatively, we may represent the state space by the cross-product of two infinite vectors <s_1,s_2,s_3,...> x <r_1,r_2,r_3,...>. The elements r_i are of course the seeds. The initial values of the seeds are unknown to us, and different configurations of initial seeds will give rise to different pseudo state trajectories. The output function for the ideal base model is what we are interested in preserving under simplification. We have already decided that we wish to preserve the percentage of components in each component state. Consequently, to define the output function λ, we must decide what we mean by proportions of components in a component pseudo-state s.

We proceed as follows. Given the pseudo-state vector <s_1,s_2,s_3,...> we define the indicator functions I_k for each component pseudo-state k by

I_k(s_i) = 1 if s_i = k
         = 0 otherwise

for k = 1,...,n, where n is the number of pseudo-states of a component. I_k indicates whether or not its argument is in pseudo-state k. Now, let

Definition 4.2.1

q_k = lim_{N→∞} (1/N) Σ_{i=1}^{N} I_k(s_i),   k = 1,...,n

We claim that the idea of proportions of components in a particular pseudo-state k makes sense precisely when this limit exists. We shall assume that this limit exists initially for the base model ergodic machine for every k. We will show that the limit then still exists at the next time step. The definition of λ is now apparent. Given the state <s_1,s_2,...> x <r_1,r_2,...> we have

λ[<s_1,s_2,...> x <r_1,r_2,...>] = <q_1,...,q_n>

where q_k is defined above. This completes the definition of the ideal base model. In the neural network case we have:

Example 4.2.1

We represent a neuron as an ergodic machine. Suppose we have the threshold function 27e^-s and noise that is N(0,10), that is, Gaussian with mean 0 and standard deviation 10. Then we can represent the neuron as NEURON = <X,S,I,Y,ζ,δ_1,λ_1> where

(a) X = some set of real numbers. We assume that the maximum and minimum values of the input are bounded, that is, that the input is normalized. Thus X has a finite maximal element and minimal element.

(b) S = {0,1,2,3,4,5,6}. 0 represents the neuron firing or emitting an output pulse, 1 represents not having fired for 1 time step, etc.

(c) I = <R,B,P> where R is the unit interval [0,1), B is the set of Borel sets on [0,1), and P = dx, the Lebesgue measure on R.

(d) Y = S as the output space.

(e) ζ is a weakly mixing transformation on R that is faithful.

(f) Let F be the cumulative distribution function for a N(0,10) random variable. Then,

δ_1(x,s,r) = 0 if F^-1(r) < x - 27e^-s
           = s+1 otherwise, unless s = 6, whence δ_1(x,s,r) = s

(g) λ_1(s,r) = s

Notice that since we assume a maximum possible input (and a minimum), the seed space is properly divided up so that there is a minimum measure over all possible transition seed spaces over all input values. Thus, we can find a faithful weakly mixing transformation of the multiplicative type by Theorem 2.7.1. Corresponding to an input value x, there is a well defined set A_{s1,s2}(x) which is the transition seed space from pseudo-state s1 to pseudo-state s2. A_{s,0}(x) will be especially important. The base model neuron block can now be represented by an infinite cross-product of the NEURON defined above. Several blocks can be similarly represented, but for convenience, we consider the single block case only. This completes the example.
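A small sketch of the NEURON ergodic machine follows (Python). The seed r in [0,1) supplies the noise sample through F^-1(r), and the seed is advanced by a measure preserving map on [0,1); the doubling map and the crude quantile approximation used here are stand-ins, not the faithful weakly mixing transformation of the multiplicative type or the exact Gaussian inverse.

# Sketch of the NEURON ergodic machine: state (s, r), s a recovery state, r a seed.
from math import exp, log

def F_inv(r, sigma=10.0):
    # Rough approximation to the N(0, sigma) quantile via a matched logistic
    # quantile (scale sqrt(3)/pi); adequate for a sketch, not for real use.
    r = min(max(r, 1e-9), 1.0 - 1e-9)
    return sigma * 0.5513 * log(r / (1.0 - r))

def zeta(r):
    # Stand-in measure preserving seed transformation: the doubling map on [0,1).
    return (2.0 * r) % 1.0

def delta1(x, s, r):
    # delta_1(x, s, r): fire (go to 0) if F^-1(r) < x - 27*e**-s, else advance;
    # a neuron in state 6 that does not fire stays in state 6.
    if F_inv(r) < x - 27.0 * exp(-s):
        return 0
    return s if s == 6 else s + 1

# One pseudo-state trajectory: deterministic once the initial seed is fixed.
s, r = 6, 0.3172
for t in range(10):
    s, r = delta1(20.0, s, r), zeta(r)
    print(t, s, round(r, 4))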

We now turn to an examination of the formal definition of the lumped model of a single block. The lumped model will be represented by the ergodic machine M' = <X',S',I',Y',ζ',δ',λ'>.

(a) X' represents the input set. We shall assume that since the input was identical to each component it can in fact be determined from the percent of components in a particular component pseudo-state. That is, the lumped input is precisely the identical component input of the base model.

(b) S' = I^n where I is the real interval [0,1) and n is simply the number of base model component pseudo-states, assumed to be finite. A typical lumped model pseudo-state is thus <p_1,p_2,...,p_n> where each p_i represents the percent of base model components in base component pseudo-state i.

(c) I' is a probability space on a trivial one-point space, say {0}. Thus, P[{0}] = 1.

(d) Y' = S' and λ'[<p_1,...,p_n> x 0] = <p_1,...,p_n>.

(e) ζ' is the trivial mapping ζ'(0) = 0. Notice that ζ' is still an ergodic transformation, although trivially so.

(f) We need only define the transition function. In the base model, for each component, given an input x, for all pairs of states i,j, the transition seed space (Def. 2.5.1) A_ij(x) is a well defined set. Since the space R of the base model component is a probability space, we let p_ij(x) = Prob[A_ij(x)]. Let P(x) be the matrix [p_ij(x)]. Then, we define δ'[x,<p_1,...,p_n>] = <p_1,...,p_n> P(x).

Notice that we have really defined a deterministic machine without any hidden components, since the pseudo state is equivalent to the actual state. Since we do not use any property of the lumped model related to ergodic machines, we shall not be concerned that the pseudo state space is not countable.

Notice that we have really defined a deterministic machine without any hidden components, since the pseudo state is equivalent to the actual state. Since we do not use any property of the lumped model related to ergodic machines, we shall not be concerned that the pseudo state space is not countable. Before we define a homomorphism from the base model to the lumped model, we consider the lumped neural network model.

Example 4.2.2

We continue Example 4.2.1 to the lumped case. Thus, X', I', ζ', Y', and λ' are as in the general case. S' = I^7 and a typical state is <p0,...,p6>. Finally, the transition function is defined using the familiar matrix of Chapter III to be

           | p00(x)  1-p00(x)     0         0         0         0         0      |
           | p10(x)     0      1-p10(x)     0         0         0         0      |
           | p20(x)     0         0      1-p20(x)     0         0         0      |
    P(x) = | p30(x)     0         0         0      1-p30(x)     0         0      |
           | p40(x)     0         0         0         0      1-p40(x)     0      |
           | p50(x)     0         0         0         0         0      1-p50(x)  |
           | p60(x)     0         0         0         0         0      1-p60(x)  |

In this case p_{i0}(x) is the probability of the transition seed space A_{i,0}(x). We notice that this is the set of seeds r such that F^(-1)(r) ≤ x - 27e^(-i), where F is as in Example 4.2.1. The probability of this set is clearly F(x - 27e^(-i)). (See the matrix in section 3.4.) This concludes the example.

We shall now consider the homomorphism from the ideal base model to the lumped model, both of which are represented by deterministic machines.
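Before doing so, a brief computational aside may be useful. The sketch below (hypothetical names, not part of the formal development) builds the matrix P(x) of Example 4.2.2 from the firing probabilities p_{i0}(x) = F(x - 27e^(-i)) and performs one lumped transition <p0,...,p6> → <p0,...,p6>·P(x).

    import math

    def F(t, sigma=10.0):
        # c.d.f. of N(0, sigma); sigma = 10 as in Example 4.2.1
        return 0.5 * (1.0 + math.erf(t / (sigma * math.sqrt(2.0))))

    def transition_matrix(x):
        # 7 x 7 matrix P(x): from recovery state i a neuron fires (column 0) with
        # probability p_i0(x) = F(x - 27 e^{-i}); otherwise it drops to state i+1
        # (state 6 simply stays at 6).
        P = [[0.0] * 7 for _ in range(7)]
        for i in range(7):
            p_fire = F(x - 27.0 * math.exp(-i))
            P[i][0] = p_fire
            P[i][min(i + 1, 6)] = 1.0 - p_fire
        return P

    def lumped_step(p, x):
        # One deterministic lumped transition: new proportions = p . P(x)
        P = transition_matrix(x)
        return [sum(p[i] * P[i][j] for i in range(7)) for j in range(7)]

    # e.g. starting from all neurons having just fired:
    # p = [1.0] + [0.0] * 6; p = lumped_step(p, x=20.0)

Iterating lumped_step traces out the deterministic lumped trajectory whose components are the proportions <p0,...,p6>.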

Definition 4.2.2

Let ×_{i=1}^{∞} (s_i,r_i) be a state of the base model. Define the mapping h to be

    h[×_{i=1}^{∞} (s_i,r_i)] = [<q1,...,qn>, 0]

where q_i is as in Definition 4.2.1.

The proof that h is in fact a homomorphism is quite simple. First, however, we make precise our hypothesis on the initial values of the seeds <r1,r2,...>. Recall that we initially chose them independently and identically distributed. Equivalently, we can say that for any subsequence r_{i_1}, r_{i_2}, r_{i_3},...

    lim_{N→∞} (1/N) Σ_{k=1}^{N} I_A(r_{i_k}) = P(A)    a.e.

I_A is of course the indicator function of the set A. We assume that the above condition holds initially. We wish to show that this condition is preserved over time.

Lemma 4.2.1

Suppose that the above hypothesis holds at time t. Then, if <r1,r2,...> is the seed vector at time t, the above holds also for <ζ(r1),ζ(r2),...>, that is, for the seed vector at time t+1.

Proof: We wish to show that for any subsequence ζ(r_{i_1}), ζ(r_{i_2}),...

    lim_{N→∞} (1/N) Σ_{k=1}^{N} I_A[ζ(r_{i_k})] = P(A)    a.e.

Now,

    lim_{N→∞} (1/N) Σ_{k=1}^{N} I_A[ζ(r_{i_k})] = lim_{N→∞} (1/N) Σ_{k=1}^{N} I_{ζ^(-1)(A)}(r_{i_k})
                                                = P[ζ^(-1)(A)]    (by hypothesis)
                                                = P[A]            (ζ is measure preserving)
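The lemma is easy to observe numerically. In the toy sketch below (not part of the argument, names hypothetical) the doubling map r → 2r mod 1, a standard Lebesgue-measure-preserving transformation, stands in for the faithful transformations of Chapter II; the empirical frequency of a set A among the transformed seeds remains close to P(A).

    import random

    def zeta(r):
        # Doubling map: measure preserving on [0,1); a stand-in for illustration.
        return (2.0 * r) % 1.0

    def frequency_in_A(seeds, a, b):
        # Empirical frequency of the set A = [a, b) among the given seeds.
        return sum(1 for r in seeds if a <= r < b) / len(seeds)

    random.seed(0)
    seeds = [random.random() for _ in range(100000)]      # i.i.d. uniform initial seeds
    a, b = 0.2, 0.5                                       # P(A) = 0.3
    print(frequency_in_A(seeds, a, b))                    # close to 0.3
    print(frequency_in_A([zeta(r) for r in seeds], a, b)) # still close to 0.3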

Notice that we only needed the measure preserving property of the transformation for the proof.

Theorem 4.2.1

The mapping h of Definition 4.2.2 is a homomorphism.

Proof: Let δ be the base model transition and δ' the lumped model transition. We must first show that δ'∘h = h∘δ. Given <s1,s2,...> × <r1,r2,...>, by hypothesis, if r_{1,1}, r_{1,2}, r_{1,3},... are the seeds corresponding to pseudo state 1,

    lim_{N→∞} (1/N) Σ_{i=1}^{N} I_{A_{1,1}(x)}(r_{1,i}) = P[A_{1,1}(x)].

That is, the proportion of neurons in pseudo state 1 that go to 1 at the next time step is just P[A_{1,1}(x)]. Similarly, the proportion in pseudo state k that go to pseudo state 1 is P[A_{k,1}(x)]. Consequently, the proportion in pseudo state 1 at the next step is well defined and is just Σ_{k=1}^{n} q_k P[A_{k,1}(x)] = q1', where q1' represents the proportion in pseudo state 1 at time step t+1. But this is just what the first component of the lumped model state is at t+1 under the lumped model transition function. That is, the first component is just

    Σ_{k=1}^{n} q_k p_{k,1}(x) = Σ_{k=1}^{n} q_k P[A_{k,1}(x)] = q1'.

The other components are similarly checked. Since the output function λ gives just the proportions in the pseudo states, and λ' gives just the pseudo state, trivially λ = λ'∘h. Also, all the seeds <r1,r2,...> can be mapped to 0, and thus h is a homomorphism that splits into two parts, h = k1 × k2, with k1: S → S' and k2: R → R'. By the previous lemma, the

hypothesis on the seeds is preserved at the next time step, thus completing the proof.

The proof for the infinite ideal base model case was quite easy. We went into detail only because the same techniques can be used to discuss the case of a finite number of components per block. We continue to assume that the input to each component is the same, for example, if each component gets input from every other component. However, now we will be able to exhibit a lumped model that is probabilistic and is a p-homomorphic image of a finite component base model.

4.3 Finite Number of Components in the Base Model

We begin with a base model that is a finite cross-product of ergodic machines and reduce to an appropriate lumped model in steps. The first formal lumped model turns out to be a generalized ergodic machine. The permutations of this generalized machine depend on the state of the ergodic machine.

Thus, let the base model be represented by M = ×_{i=1}^{N} M_a, where a typical state is <s1,...,sN> × <r1,...,rN>. We assume the input space X is some bounded set, as in the ideal base model case. The transition function is of course a componentwise operation. We shall assume that the transformation ζ is a weakly mixing, faithful, permutation independent transformation. By the results of Chapter II, we know that, in fact, a multiplicative generator can be found that will satisfy these requirements. [See Theorems 2.7.1, 2.10.2, 2.10.3, Def. 4.1.3.] The output function λ will simply calculate the proportion of components in each of the component pseudo states. We proceed to define a lumped model M'.

(a) X' = X as in the base model.

(b) S' = Ra^n where Ra is the set of rationals (note that S' is countable; n is the number of component pseudo states).

(c) I' = I as in the base model.

(d) Y' = S' and λ' is just the pseudo state output.

(e) ζ'. Actually, we wish to define a lumped model that has state set [S' × R'] mod E, where E is an equivalence relation. A typical state representative will be <q1,...,qn> × <r1',...,rN'> modulo an equivalence relation that we proceed to characterize. First, it will always be the case that Nq_i is an integer, since q_i represents the proportion of components in pseudo state i. Now, the equivalence relation is such that a permutation of the first Nq1 seeds gives us the same equivalence class, as does a permutation of the next Nq2 seeds, etc. There is clearly no problem in defining the probabilities on the quotient space.

Before completing the definition of the lumped model, we attempt to give some insight into our simplification. In the base model, since we know that seed r_i is associated with pseudo state s_i, we know which seeds correspond to the various component pseudo states. In the lumped model, however, we intend to keep only the information <q1,...,qn>, which represents the proportions in the component pseudo states. Now, since we are keeping all the original seeds, we will want to reorder them so we know the set of seeds that correspond to the different component pseudo states. In fact, we shall order them so that the first Nq1 correspond to component pseudo state 1, the second Nq2 correspond to component pseudo state 2, etc. However, once we know the group that corresponds, we shall be indifferent to the ordering of the seeds within

the group. Thus, we have the equivalence relation.

Rather than complete the definition of the lumped model, we first define a mapping h from the base state space to the lumped state space. We will then complete the definition of the lumped model transition in a way that will clearly show that h is in fact a homomorphism.

Definition 4.3.1

Given the base state <s1,...,sN> × <r1,...,rN>, a seed r_j is a class k seed if s_j is in component pseudo state k.

Definition 4.3.2

Let the mapping h be as follows.

    h(<s1,...,sN> × <r1,...,rN>) = (<q1,...,qn> × <r1',...,rN'>)

where

    q_i = (1/N) Σ_{j=1}^{N} I_i(s_j)

and <r1',...,rN'> is a permutation (reordering) of <r1,...,rN> such that r1',...,r'_{Nq1} are the Nq1 class 1 seeds, with r1' being the first class 1 seed occurring in <r1,...,rN>, r2' being the second, etc. Also, r'_{Nq1+1},...,r'_{Nq1+Nq2} are the next Nq2 class 2 seeds. We proceed similarly, with the last Nq_n seeds in <r1',...,rN'> being class n seeds. The mapping h is clearly well defined.

We now complete the definition of the lumped model. We first describe the transition on the pseudo state part <q1,...,qn> and then on the seed part <r1',...,rN'>. Recall that the infinite case base model used a matrix [p_ij] for the transition, where P[A_ij(x)] = p_ij, and thus p_ij was the proportion of components that one expected to go from pseudo state i to pseudo state j.

145 In the finite case, we can actually find out how many components do this. Clearly, since we know which set are class i seeds, that is, the ith group of <ri,...,rn>, it is an easy matter to check how many of these fall in the set A.. (x). Remember that Ai j(x) is a well-defined set and is known, a priori. Definition 4.3.3 Let the lumped model state at time t be q(t) x r' (t) = <ql(t),...,q (t)> x <r(t),...,r.(t)>. Suppose the input is x(t). Let ni~i I. (x(t)) Ir] Nqi(t) t C class i i,j We let Q(t) = [qi,j[x(t)] Then define the next pseudo state q(t+l) to be q(t+1) = q(t)-Q(t) Notice that if the initial seeds are chosen i.i.d. then lim q = p.. a.e. N-o 1)J 1J where Pij is the ideal base model matrix entry. In the above definition, if qi(t) = 0 then qi,j[x(t)] is not well defined. In this case, it is easy to check that we can define qi j[x(t)] = 0 since from the transition function, any fixed value will 1,3 result in the same q(t+l). It remains to define the transformation of the seed space. Given the seed vector r' (t) = <rl(t),...,r (t)>, we wish to define r'(t+l). Ordinarily, we would simply apply the transformation r componentwise. However, in this case, we wish to make sure that at the next time step t+l, the first Nql(t+l) seeds correspond to class 1 seeds, the next Nq2(t+l) correspond to class 2, etc. The way to do this simply involves permuting the seeds before applying the transformation e.
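One time step of this lumped model can be sketched computationally as follows; the update of Definition 4.3.3 is combined with the regrouping of seeds just described (formalized in Definitions 4.3.4 and 4.3.5 below). Names are hypothetical, and for each class i the sets A_{i,j}(x), j = 1,...,n, are assumed to partition the seed space and to be supplied as membership tests.

    def lumped_step(q, seeds, x, in_A, zeta, n):
        # q:     proportions <q_1,...,q_n>, with N*q_i an integer
        # seeds: list of N seeds, grouped so the first N*q_1 are class 1, etc.
        #        (classes are 0-indexed here)
        # in_A:  in_A(i, j, x, r) is True iff seed r lies in A_{i,j}(x)
        # zeta:  the seed transformation, applied componentwise after regrouping
        N = len(seeds)
        # split the seed list into its classes
        groups, start = [], 0
        for i in range(n):
            size = round(q[i] * N)
            groups.append(seeds[start:start + size])
            start += size
        # Definition 4.3.3: Q[i][j] = fraction of class-i seeds falling in A_{i,j}(x)
        Q = [[0.0] * n for _ in range(n)]
        for i in range(n):
            if groups[i]:
                for j in range(n):
                    Q[i][j] = sum(in_A(i, j, x, r) for r in groups[i]) / len(groups[i])
        q_next = [sum(q[i] * Q[i][j] for i in range(n)) for j in range(n)]
        # regroup: class j at time t+1 collects the class 1-to-j, 2-to-j, ... seeds
        new_seeds = []
        for j in range(n):
            for i in range(n):
                new_seeds.extend(r for r in groups[i] if in_A(i, j, x, r))
        return q_next, [zeta(r) for r in new_seeds]

For the neuron of Example 4.2.1, in_A(i, 0, x, r) is just the test r ≤ F(x - 27e^(-i)), with the complementary seeds assigned to the next recovery state.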

Definition 4.3.4

A seed r is a class m to n seed if it is a class m seed and I_{A_{m,n}(x)}[r] = 1. (Notice that a class m to n seed should actually be a function of the input x; we assume that the definition determines when a seed r(t) is a class m to n seed.)

The above definition simply says that at the next time step, the component corresponding to seed r will be in component pseudo state n.

Definition 4.3.5

Let the lumped model state at time t be q(t) × r'(t) = <q1(t),...,qn(t)> × <r1'(t),...,rN'(t)>. From the vector r'(t), choose the class 1 to 1 seeds and place them at the head of a list. Next choose the class 2 to 1 seeds and place them after the class 1 to 1 seeds. Continue this way until the class n to 1 seeds have been put on the list. Next, choose the class 1 to 2 seeds, class 2 to 2 seeds,..., class n to 2 seeds,..., class n to n seeds. Apply ζ componentwise to this list to get r'(t+1).

Definitions 4.3.3 and 4.3.5 constitute the transition function of the lumped model. Notice that since we are applying a permutation π(t) at each time step to the seed vector, the lumped model is a generalized ergodic machine. The following simple example in the neural network case should clarify any source of confusion.

Example 4.3.1

In the neural network case, there were seven recovery states 0,...,6 and a neuron either went to recovery state 0 (fired) or it dropped to the next recovery state. Suppose we have a seed vector [r1',...,rN']. Suppose r5' and r6' are class 0 seeds that fire and r11' is a class 3 seed

that fires and no other seeds fire. Then, since the remaining seeds trivially drop to the next recovery state, the permutation places r5', r6', and r11' at the head of the list, followed by the remaining seeds in their class order. Finally, we apply ζ componentwise to get the seed vector at the next time step, [ζ(r5'), ζ(r6'), ζ(r11'),...].

We now show that the mapping h of Definition 4.3.2 is in fact a homomorphism from the base model to the lumped model.

Theorem 4.3.1

Let the base model state be s(t) × r(t) = <s1(t),...,sN(t)> × <r1(t),...,rN(t)> and the corresponding lumped model state be q(t) × r'(t) = <q1(t),...,qn(t)> × <r1'(t),...,rN'(t)> under the mapping h as in Definition 4.3.2. Then, h is a homomorphism.

Proof: Given the input x(t), we must show that δ'∘h = h∘δ and λ = λ'∘h. Consider δ'[h[s(t) × r(t)], x(t)] = q(t+1) × r'(t+1). We have

    q_i(t+1) = Σ_k q_k(t) q_{k,i}(x(t))                                        (Def. 4.3.3)
             = Σ_k q_k(t) · (1/(Nq_k(t))) Σ_{r' ∈ class k} I_{A_{k,i}(x(t))}[r']
             = (1/N) Σ_k (number of class k to i seeds at time t)
             = (1/N) (number of class i seeds at time t+1)
             = (1/N) Σ_{j=1}^{N} I_i(s_j(t+1)).

Thus, we see that by Definition 4.3.2, h is a homomorphism on the pseudo state space.

We now check that h is a homomorphism for the seed vector. Consider r(t+1). Some elements of this vector are class i seeds. We

wish to ascertain that in h[r(t+1)] these seeds appear as the ith group. Now, let r_{i_1}(t+1),...,r_{i_L}(t+1) be the class i seeds. Under the base model transition, this means that r_{i_1}(t),...,r_{i_L}(t) is a union of class j to i seeds, where j ranges over all component pseudo states. Now, r'(t) is just r(t) permuted, so these r_{i_1}(t),...,r_{i_L}(t) appear in r'(t) also, although possibly reordered. By the lumped model transition, however, this set of ∪_j (class j to i seeds) is put in the ith group at the next time step. [See Def. 4.3.5.] Thus, the same seeds appear in the ith group, although possibly permuted. Since the lumped model has an equivalence relation that identifies such permutation classes, we see that h is a homomorphism on the seed space.

Finally, it can immediately be checked that h preserves the output function. Thus, h is, in fact, a homomorphism.

Notice that although we have shown that h is a homomorphism, we cannot in this case factor h as h = k1 × k2, since we need to know the pseudo state space to decide how the seeds are to be mapped.

Example 4.3.2

Consider a neural network example with four recovery states 0, 1, 2, and 3. Let the base model have six neurons with the state at time t being

    [0,1,2,0,0,2] × [r1,r2,r3,r4,r5,r6].

The corresponding lumped model state is clearly

    [1/2,1/6,1/3,0] × [r1,r4,r5,r2,r3,r6].

Suppose that the input causes neurons 2, 4, and 6 to fire (go to state 0) in the base model. Then, the next state of the base model is

    [1,0,3,0,1,0] × [ζ(r1),ζ(r2),ζ(r3),ζ(r4),ζ(r5),ζ(r6)].

Under the mapping h, the lumped model state that corresponds to this state is

    [1/2,1/3,0,1/6] × [ζ(r2),ζ(r4),ζ(r6),ζ(r1),ζ(r5),ζ(r3)].

Under the lumped model transition, however, we get

    [1/2,1/3,0,1/6] × [ζ(r4),ζ(r2),ζ(r6),ζ(r1),ζ(r5),ζ(r3)].

We see that h is a homomorphism because of the equivalence relation we have defined on the seed space. Notice that there is in fact a permutation of the seed space.

Call the base model and lumped model that we have so far considered B and L respectively. It is clear that the complexity of L has hardly been reduced. Part of the problem is that we are required to permute the seed list before applying the transformation ζ. Clearly, this is something that intuitively we feel we need not do, since the seeds should not be affected by permutations. Suppose, then, that we consider a lumped model L' for which we do not permute but only apply ζ componentwise.

Theorem 4.3.2

Let L' be the lumped model that uses Definition 4.3.3 for the transition on the pseudo state space and, for a seed vector r(t), the transition is simply ζ(r(t)) = r(t+1) = <ζ(r1(t)),ζ(r2(t)),...,ζ(rN(t))>. That is, the transition is a componentwise operation of ζ without permutations. Then, if B is the base model as above, L' is a probabilistic homomorphic image of B.

Proof: It is clear from the definitions that both B and L' are ergodic machines. Furthermore, the representative class of L' is the same as

the representative class of L. This follows since the difference between L and L' is only that one has a set of time indexed permutations {π(t)} operating at each step. Since ζ is assumed to be permutation independent, both faithfully represent the associated automaton. We have shown in Theorem 4.3.1 that there exists a homomorphism h from B to L. Consequently, h is a p-homomorphism from B to L'. (See Definitions 4.1.3-4.1.5.)

We now consider a further simplification. So far, we have a lumped model L' that uses N seeds, where N is the number of components. We shall see that we can find a lumped model L" that is in the same representative class as L but that uses at most n^2 seeds, where n is the number of component pseudo states.

To make things easier, we recall the transition function of L'. A typical state is <q1,...,qn> × <r1,...,rN> and the transition function is defined as follows. The new pseudo state vector is

    q(t+1) = q(t)·Q(t)

where the matrix Q is determined by

    q_{i,j}(t) = (1/(Nq_i(t))) Σ_{r ∈ class i} I_{A_{i,j}(x(t))}[r].

The new seed vector is simply <ζ(r1),...,ζ(rN)>.

The complexity of this model is due to the fact that we are using N seeds operating in parallel, plus the fact that we must actually determine to which class a seed transitions. We seek a model that also faithfully represents the same associated automaton as L' does but that has fewer seeds in general.

To clarify the derivation, we first consider an extension of the

lumped model L' by including a redundant matrix as part of the state space. Let

    K_{i,j}(t) = Σ_{r ∈ class i} I_{A_{i,j}(x(t))}[r].

Define the matrix K(t) to be [K_{i,j}(t)]. Then, we can define the extended state space as

    <q1,...,qn> × K × <r1,...,rN>.

Using this information, it is easy to see that the matrix Q(t) can be defined as

           | K_{1,1}/(Nq1)   K_{1,2}/(Nq1)   ...   K_{1,n}/(Nq1) |
    Q(t) = | K_{2,1}/(Nq2)   K_{2,2}/(Nq2)   ...   K_{2,n}/(Nq2) |
           |      ...             ...        ...        ...      |
           | K_{n,1}/(Nqn)   K_{n,2}/(Nqn)   ...   K_{n,n}/(Nqn) |

Furthermore, it is easy to see that

    q_j(t+1) = (1/N) Σ_i K_{i,j}(t).

Note that K_{i,j} clearly depends on both the vector <q1,...,qn> as well as <r1,...,rN>. The interesting point is that we can view the method of calculating K_{i,j} as simply one way of implementing the calculation of a random variable with a specific distribution. In fact, it is apparent that since the transition seed space A_{i,j}(x(t)) has a well defined probability, say p_{i,j}(t), we can view the calculation of the vector <K_{i,1},K_{i,2},...,K_{i,n}> as defining a multinomial distribution. That is,

    Prob[K_{i,j} = a_j for j = 1,...,n] = (Nq_i)!/(a_1!···a_n!) · p_{i,1}^{a_1} ··· p_{i,n}^{a_n}.

The reason

this is true is that L' was represented faithfully, so this is the only information we retain about the pseudo states and seeds, since we originally chose the initial seeds i.i.d.

Since the vectors <K_{i,1},...,K_{i,n}> are generated multinomially, there is no reason not to use some other method for the calculation if this is convenient, as long as the probabilities of the K_{i,j} remain the same. Thus, we have shown that we can use the n^2 values K_{i,j} instead of the original N seeds in our calculations. Notice that the K_{i,j} are not all independent, and thus we would use fewer than n^2 values in general. The following example should clarify the situation.

Example 4.3.3

In the neural network model we see that the matrix K is simply

        | K_0   Nq0 - K_0       0        ...        0       |
        | K_1       0       Nq1 - K_1    ...        0       |
    K = | ...                                                |
        | K_5       0           0        ...    Nq5 - K_5   |
        | K_6       0           0        ...    Nq6 - K_6   |

(writing K_i for K_{i,0}, the number of class i seeds that fire). We also have

    q_0(t+1) = (1/N) Σ_{i=0}^{6} K_i

    q_1(t+1) = q_0(t) - K_0/N

    ...

    q_5(t+1) = q_4(t) - K_4/N

    q_6(t+1) = q_5(t) - K_5/N + q_6(t) - K_6/N.

Clearly, we need only generate the seven seed values <K_0,...,K_6>. Furthermore, K_i is Binomial(Nq_i, p_{i0}), where p_{i0} = F(x - 27e^(-i)) and F is the c.d.f. of a Normal(0,10) random variable. K_i being Binomial(Nq_i, p_{i0}) means that

    Prob[K_i = j] = C(Nq_i, j) · p_{i0}^j · (1 - p_{i0})^(Nq_i - j).

Notice that p_{i0} is simply the theoretical entry in the transition matrix for the ideal, infinite number of neurons case. We can in fact consider the count K_i as using a rejection method for generating the value of a random variable that is distributed Binomial(Nq_i, p_{i0}). Recall that a rejection method for calculating a binomially distributed, Binomial(N,p), random variable X says to take N uniformly distributed random variables on [0,1] and count the number that are less than or equal to p.

In this example, we have found a lumped model L" that represents the same associated automaton that L' does. Since we can find a faithful transformation ζ" on L", we see that L' and L" are in the same representation class. Notice the simplicity of the final lumped model that we have obtained. Given a pseudo state <q0,...,q6>, the next values of the pseudo state are

    (1/N)·(Σ_{i=0}^{6} K_i, Nq0 - K_0,..., Nq4 - K_4, Nq5 + Nq6 - K_5 - K_6).

It can also be shown that the expected value of this next pseudo state is exactly the next pseudo state of the lumped model corresponding to the ideal base model case.

The exposition of this section and this last example have made clear that, in fact, there exists a probabilistic homomorphism from B

to L". We consequently have the next theorem.

Theorem 4.3.3

Given the base model B, there exists a lumped model L" that is a probabilistic homomorphic image of B. L" has at most n^2 seeds in parallel in the seed space vector, where n is the number of component states.

Proof: The proof was in the derivation of this section. For an appropriate ζ", L" is a faithful ergodic machine in the same representation class as L'. Since a p-homomorphism from B to L' exists (Theorem 4.3.2), a p-homomorphism from B to L" also exists.

Our simplification has yielded a lumped model that can be implemented on the computer. Since, for the neuron example, we need only seven seeds operating in parallel, the model is much simpler than the original base model. Since the model as implemented is really the corresponding probabilistic automaton, we call it the probabilistic lumped model.

4.4 A Probabilistic Lumped Model

In the previous section (Example 4.3.3), we derived a probabilistic lumped model that should exhibit the same trajectory behavior as the base model of Chapter III. The probabilistic lumped model of course assumes that each neuron receives identical input, and consequently, the model is more accurate wherever the number of neighbors per neuron approaches the number of neurons in a block. In fact, whenever we consider a base model such as 50 neurons and 50 neighbors per neuron in the single block studies, the probabilistic lumped model exactly models the base model.
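As implemented, one time step of the probabilistic lumped model for the neuron block can be sketched as follows. The names are hypothetical; the binomial variates are generated here by the counting (rejection) method recalled in Example 4.3.3 rather than by any particular library routine.

    import math, random

    def F(t, sigma=10.0):
        # c.d.f. of N(0, sigma); sigma = 10 for the neuron example
        return 0.5 * (1.0 + math.erf(t / (sigma * math.sqrt(2.0))))

    def binomial(n, p):
        # rejection method of Example 4.3.3: draw n uniforms on [0,1] and
        # count how many fall at or below p
        return sum(1 for _ in range(n) if random.random() <= p)

    def prob_lumped_step(q, x, N):
        # One step of the probabilistic lumped model for the 7-state neuron block.
        # q = <q_0,...,q_6> (proportions, with N*q_i integral), x = common input.
        K = [binomial(round(N * q[i]), F(x - 27.0 * math.exp(-i))) for i in range(7)]
        q_next = [0.0] * 7
        q_next[0] = sum(K) / N                       # all neurons that fired
        for i in range(5):
            q_next[i + 1] = (N * q[i] - K[i]) / N    # state i neurons that did not fire
        q_next[6] = (N * q[5] - K[5] + N * q[6] - K[6]) / N
        return q_next

Note that the rejection method as written still draws on the order of N uniform values per step; the normal approximation mentioned below avoids this for large N.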

In order to compare base model and probabilistic lumped model runs, we expanded the simulation package of Chapter III. In the actual computer implementation, it turned out to be easier to in fact use a rejection method to generate the binomial distributions. However, when generating Binomial(N,p) for large values of N, it is possible to use the normal approximation

    K = Z·√(Np(1-p)) + Np

where Z is standard normal N(0,1). This was done to speed up the computations of the binomially distributed values.

The first consideration of interest was to examine the behavior of the probabilistic lumped model when there was Positive Feedback and we were considering a single block. The comparison is with the base model runs of section 3.5 and, in particular, with figs. 3.5.5-3.5.12. In those figures, we saw how the base model less frequently deviated along the "force field" as the number of neurons and number of neighbors increased. Figs. 4.4.1-4.4.4 are probabilistic lumped model runs of 50, 100, 200, and 400 neurons respectively. Because of the assumption of identical input, these correspond really to 50 neurons 50 neighbors, 100 neurons 100 neighbors, etc. Of course, once the base model has a sufficient number of neighbors, the identical input assumption is fairly good anyway. We also made two additional base model runs for further comparisons, and these are shown in figs. 4.4.5 and 4.4.6. The similarity between the base model and probabilistic lumped model is quite striking, especially comparing fig. 3.5.12 with fig. 4.4.1 and comparing fig. 4.4.2 with fig. 4.4.5.

The next situation of interest was to consider the flip-flop

[Figure 4.4.1. Probabilistic Lumped Model Run, 50 Neurons (2 runs). Horizontal axis: % in state 0.]

[Figure 4.4.2. Probabilistic Lumped Model Run, 100 Neurons. Horizontal axis: % in state 0.]

[Figure 4.4.3. Probabilistic Lumped Model Run, 200 Neurons. Horizontal axis: % in state 0.]

[Figure 4.4.4. Probabilistic Lumped Model Run, 400 Neurons. Horizontal axis: % in state 0.]

[Figure 4.4.5. Base Model Run, 100 Neurons, 100 Neighbors. Horizontal axis: % in state 0.]

[Figure 4.4.6. Base Model Run, 200 Neurons, 100 Neighbors. Horizontal axis: % in state 0.]

example of section 3.6. In that section we considered three experiments with the flip-flop: (1) Setting the Flip-Flop from Lockstep, (2) Stability of the Set State, and (3) Resetting the Flip-Flop.

The behavior of the probabilistic lumped model in setting from the lockstep mode was as predicted. An input level of 50 would set the flip-flop whereas an input level of 40 would not. An input level of 45 also would not set it in general, although occasionally it might. This is the same behavior that resulted from our tests with the base model with 50 neurons per block and 50 or 40 neighbors per neuron in the other block. For the probabilistic lumped model, we used 50 neurons per block.

In Chapter III, table 3.6.1 gives the information on the stability of the base model flip-flop state. Table 4.4.1 gives the results of the two different lumped model experiments done for this case. It also includes the base model run as well as an additional base model run for comparison.

Finally, we consider the case of resetting the flip-flop. Table 3.6.3 shows the reset behavior of the base model. The corresponding table for the probabilistic lumped model behavior is table 4.4.2. Although there are not really enough probabilistic lumped model runs, we see that the correspondence is fairly good.

The statistics of this section have shown that the probabilistic lumped model does reflect base model behavior as it should. However, the statistics also point out the importance of a theoretical derivation of the lumped model correspondence, since, for small numbers of runs, particular statistics could be misleading.

TABLE 4.4.1

PROBABILISTIC LUMPED STABILITY VS. BASE MODEL STABILITY

Probabilistic Lumped Model

    Block Size             Number of Runs that Unset After x-y Time Steps
    (Number of Neurons)     1-4   5-8   9-12  13-16  17-20  21-24   24+
           50                0     1     1      1      0      0      3
          100                0     1     1      1      0      0     18

Base Model

    Block Size   Number of   Number of Runs that Unset After x-y Time Steps
    (Neurons)    Neighbors    1-4   5-8   9-12  13-16  17-20  21-24   24+
        50          20         3     4     2      0      0      1      3
        50          50         1     1     1      0      1      0      6
       100          20         3     9     2      1      1      1      1
       100          50         2     1     1      1      0      1      4
       100         100         0     3     4      1      1      1     22
       500          20         0     0     1      3      1      1      2
       500          50         0     0     0      0      1      0      8

TABLE 4.4.2

PROBABILISTIC LUMPED RESET VS. BASE MODEL RESET

Probabilistic Lumped Model

    Block Size                 Input = 25            Input = 30
    (Number of Neurons)     Reset   Lockstep      Reset   Lockstep
           50                 13        7           14        5
          100                 23        6            5        1
          500                  9        1            -        -

Base Model

    Block Size   Number of      Input = 25            Input = 30
    (Neurons)    Neighbors   Reset   Lockstep      Reset   Lockstep
        50          20         37       23           49        6
        50          50         20       21           43        7
       100          20         47       18           62        2
       100          50         14        6           26        0
       100         100         18        8            -        -
       500          20         31        0           20        0

4.5 Finite Number of Neighbors

We now turn to a consideration of the situation that obtains when each component of a block of the base model cannot be expected to get the identical value of input. That is, we are considering the case when we have a connection pattern such that the different base model components get differing input. We shall strive for understandability of the simplification ideas in this section and will not attempt to be totally precise.

Again, we examine the case of an aggregation to a single block and consider the input to be from this block also. The generalization to the many block case is not difficult. Since each component will get input as a function of its neighborhood set, we shall explicitly represent the connection pattern. First, however, we represent the state space of our base model. Suppose N components are in the base model. Then, the state of the base model can be represented by ×_{i=1}^{N} (s_i,r_i), where s_i is the component state and r_i is the noise, as we have previously described these values. The input can be determined knowing the function of the state that defines the input and knowing the connection pattern. However, we can be redundant and include it explicitly as part of the state set. Thus, the state will be ×_{i=1}^{N} (s_i,r_i,x_i), where the vector <x1,...,xN> will be called the input vector.

For simplicity, we only consider the situation in which certain component states can be thought of as emitting a pulse and the input to a component is the sum of the connection weights of the neighbors that are emitting a pulse. This corresponds to considering the input to a neuron in a neural net example as being determined by the sum of the synapse weights of its neighbors that are firing.
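Written out, this input rule is just x = M·f(s), where M is the connection matrix introduced below and f marks the pulsing components. A small sketch (hypothetical names, using the connection pattern of Example 4.5.2 below) computes it:

    def input_vector(M, s, firing_states={0}):
        # x_i = sum of connection weights m_ij over neighbors j that are emitting
        # a pulse, i.e. x = M . f(s) where f(s_j) = 1 iff s_j is a firing state
        f = [1 if sj in firing_states else 0 for sj in s]
        return [sum(M[i][j] * f[j] for j in range(len(s))) for i in range(len(s))]

    # Connection pattern of Example 4.5.2 below: neuron 1's neighbors are 4 and 5,
    # neuron 3's neighbor is 1, all synapse weights equal to 3.
    M = [[0, 0, 0, 3, 3],
         [0, 0, 0, 0, 0],
         [3, 0, 0, 0, 0],
         [0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0]]
    print(input_vector(M, [1, 0, 2, 0, 2]))   # [3, 0, 0, 0, 0]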

Definition 4.5.1

Given the state ×_{i=1}^{N} (s_i,r_i,x_i), and thus the input vector x = <x1,...,xN>, a vector x' is a strong permutation of x if it is a permutation of x such that the values corresponding to any fixed component state are also permuted.

Example 4.5.1

Consider an example with two component states 0 and 1, and 5 components in the base model. Let the pseudo state vector be <0,1,0,0,1> with input values x = <5,10,15,20,25> (notice that we have both states emitting a pulse and varying connection weights). Then, x' = <20,25,15,5,10> would be a strong permutation, since the subvectors <5,15,20> and <10,25> are permuted. However, <10,5,15,20,25> would not be a strong permutation.

The idea of a strong permutation is that the distribution of values corresponding to each component state remains the same under a strong permutation. Now, if we generate a strong permutation of x, rather than x itself, we can claim that the lumped model percent in state i vector is the same for both inputs.

Proposition 4.5.1

Let B be an ergodic machine with state ×_{i=1}^{N} (s_i,r_i,x_i) and suppose at time step t0, instead of using x(t0), we use the input vector x'(t0) = π(t0)∘x(t0), where π is a strong permutation. Then, there is a generalized ergodic machine B' that uses x' at time t0 and that has an identical percent in state vector through time step t0+1.

Proof: B' is just B through time step t0-1. At t0, apply the

permutation π(t0) to the seed vector <r1,...,rN>. Now, because x' is a strong permutation, corresponding to any component state i we have identical seeds and distribution of inputs, and thus the next percent in state vector is the same for B and B'.

It is important to note that the above works for one time step only. The problem at the next time step is that although we have the same percent in state vector, the pseudo state vector has been permuted. Since the connection pattern is fixed, that is, the input wires are the same and do not change, we have no idea of how the input vector corresponding to the permuted pseudo state compares with the input vector corresponding to the original pseudo state. We shall see that if we hypothesize that, regardless of the permutation of the pseudo state, the input vector has an identical distribution in corresponding component states, then we can effect a simplification that is preserved over all time steps. First, however, we investigate the explicit representation of the connection pattern.

We shall find it convenient to represent the connection pattern as a matrix M, where M = [m_ij] is an N×N matrix and m_ij represents the connection strength from the jth component to the ith component. Given the pseudo state vector s = <s1,...,sN>, let f be the mapping that determines whether or not s_i is emitting a pulse; if it is, we let f(s_i) = 1, and otherwise f(s_i) = 0. It is clear that the vector f(s) is a vector composed of 1's and 0's and has length N. Furthermore, if x is the input vector, we see that we can represent x as follows:

    x = M·f(s)

Example 4.5.2

Consider a neuron net example with recovery states 0, 1, and 2,

and let 0 be the only firing state. Suppose there are five neurons in the base model, with the neighbors of neuron one being four and five, the neighbors of neuron three being one, and no other connections. Suppose the synapse weights are identically 3. Then, the connection matrix is

        | 0  0  0  3  3 |
        | 0  0  0  0  0 |
    M = | 3  0  0  0  0 |
        | 0  0  0  0  0 |
        | 0  0  0  0  0 |

Consider a typical pseudo state s = <1,0,2,0,2>. Then f(s) = <0,1,0,1,0> and the input vector x is just <3,0,0,0,0>.

We are now ready to give a sufficient condition for a lumped model that is a p-homomorphic image of the base model. Essentially, what we need is that the connection matrix M be such that a permutation π1(s) of the pseudo state vector should give an input vector that is a strong permutation of the input vector π1(x), for all permutations π1.

Proposition 4.5.2

Suppose that for every permutation π1, there exists a strong permutation π2 such that the following diagram commutes for every pseudo state s.

                 M∘f
           s ----------> x
           |             |
        π1 |             | π2∘π1
           v             v
                 M∘f
           s' ---------> x'

Then there exists a p-homomorphic lumped model with the pseudo state as the percent in state vector.

Proof: We have really provided sufficient conditions for the generalization of Proposition 4.5.1. Since at each time step, knowing the percent in state vector, we can discover the distribution of the inputs, we can appropriately permute the seeds for a homomorphism. The hypotheses of this proposition are the sufficient conditions for the input distributions to be determined by the percent in state vector only, since we can in fact choose any pseudo state vector that has the same percent in state values and apply it to M to get the inputs.

The formalism of the above should not obscure the essential simplicity of the concepts. We want to be able to say that after one time step, as in Proposition 4.5.1, even if we have a permuted pseudo state vector, the input distributions are still the same for each component state. Thus, we need only retain the percent in state information to calculate these input values. Much of the rest of our discussion will be an attempt to show under what conditions the matrix M can satisfy the hypotheses of Proposition 4.5.2.

Example 4.5.3

Consider a connection pattern such that each neuron is connected to every other neuron but is not connected to itself, and the connection weight is identically a. Then, the resulting matrix M for the neural network examples satisfies Proposition 4.5.2, as can easily be checked. Notice that in this case, although the values of the input are identical for all neurons in a particular component state, for different component states the values may not be the same. We show this for a simple case. Suppose we have recovery states 0, 1, and 2, and we consider four neurons.

Let the pseudo state be s = <0,0,1,2>. Then f(s) = <1,1,0,0>. Since

        | 0  a  a  a |
    M = | a  0  a  a |
        | a  a  0  a |
        | a  a  a  0 |

we see that M·f(s) = x = <a,a,2a,2a>. Similarly, if we consider the permutation π1 of the state vector to <1,2,0,0>, we have M·f(π1(s)) = <2a,2a,a,a>. Since π1∘x = M·f(π1(s)), we see that the strong permutation for the state s and permutation π1 is simply π2 = identity. Notice that for the lumped model, all that we would need is the percent in state vector and the matrix M. The matrix M is important since it allows us to calculate the distribution of the input values.

Although the hypotheses seem strong, we shall see that they actually obtain in the limiting case of an infinite number of components, as long as the connection matrix is sufficiently "random". We of course continue to have only a finite number of neighbors per component. We shall now consider the following matrix as an example of a "random" connection matrix. Suppose there are three neighbors per component. Let the connection weight be identically a. Consider the following infinite matrix.

        | 0  a  a  a  0  0  0  0  0  0  ... |
    M = | 0  0  0  0  a  a  a  0  0  0  ... |
        | 0  0  0  0  0  0  0  a  a  a  ... |
        | ...                                |

Essentially, each row, which represents the connections to other neurons, takes input from neurons that are the next three farther on. Clearly, this form only works when the matrix is infinite. Given any pseudo

state s, this determines an input x as before. Later on, we shall make some comments on the "randomness" of this matrix.

Let us attempt to intuitively clarify what we shall show more formally below. We assume that we have a base model with a large number of components and a finite number of neighbors per component. Assume that the connection pattern is fixed and is in a sense "random". Then, we would say that although we are running the base model with fixed connections, most runs can be treated as if the connections were redefined at each time step, and consequently, a lumped model that ignores the connection pattern is sufficient to represent this base model for most base model runs. That is, we can imagine the sampling of neighbors at each time step, randomly and independently, as giving the same results as we get by the sampling determined by our fixed connection pattern.

There will be two steps in the argument. First, we will want to say that most initial pseudo states will give rise to the same distribution of inputs. Secondly, we want to say that the transition to the next state will result in a pseudo state that is still of this type. The distribution of inputs will of course be calculable knowing only the percentage in state vector, that is, the distribution of the components over the various component states.

We now consider the more formal arguments. Let n be the number of component states. Let k be the number of neighbors per component. Let N be the number of components. We shall let X^Y be the function space of all mappings from Y to X. Suppose for every value N, we let S_N be the set of all pseudo states corresponding to N components, and we let M_N correspond to some fixed connection pattern.

We wish to define a mapping φ from the set of pseudo states S_N to

the n-dimensional function space ([0,1]^I)^n, where I is the set of integers.

Definition 4.5.2

Let S_N be the set of pseudo states, with a typical pseudo state <s1,...,sN> and a typical input vector <x1,...,xN> as determined by M_N. Then, we can define a mapping φ by

    φ: S_N → ([0,1]^I)^n,    φ(<s1,...,sN>) = (φ1,...,φn),    φi: I → [0,1],

where φi calculates the percentage of input values, corresponding to component state i, that have input values 0,1,2,...,k. (We assume that the input can be normalized to 1, so that these are the only possible input values.) φi is 0 for all other integers.

The above definition simply describes how the input values are distributed corresponding to each component state. We give an example that will clarify the definition.

Example 4.5.4

Let us consider a neuron net example with recovery states 0, 1, and 2. Suppose there are six neurons in the base model and three neighbors per neuron. Then, we may assume that the possible input values are 0, 1, 2, or 3. Let a pseudo state be s = <0,1,2,0,0,2> with a corresponding input vector x = <1,2,3,1,2,0>. We can easily calculate the distribution φi for each component state:

    φ0(0) = 0      φ0(1) = 2/3    φ0(2) = 1/3    φ0(3) = 0
    φ1(0) = 0      φ1(1) = 0      φ1(2) = 1      φ1(3) = 0
    φ2(0) = 1/2    φ2(1) = 0      φ2(2) = 0      φ2(3) = 1/2

Before showing how the concept of the φ map relates to our previous concept of strong permutations, we generalize the notion of a strong permutation to the pseudo states. (See Definition 4.5.1.)
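The calculation in Example 4.5.4 is mechanical; the following sketch (hypothetical names; the input vector is taken as given rather than computed from M_N) reproduces it:

    def phi(s, x, n, k):
        # phi[i][m] = fraction of the components in state i whose input value is m,
        # for m = 0,...,k (Definition 4.5.2); phi[i] is taken to be 0 elsewhere.
        dist = [[0.0] * (k + 1) for _ in range(n)]
        for i in range(n):
            inputs = [xj for sj, xj in zip(s, x) if sj == i]
            for m in range(k + 1):
                if inputs:
                    dist[i][m] = inputs.count(m) / len(inputs)
        return dist

    # Example 4.5.4: s = <0,1,2,0,0,2>, x = <1,2,3,1,2,0>
    print(phi([0, 1, 2, 0, 0, 2], [1, 2, 3, 1, 2, 0], n=3, k=3))
    # -> [[0.0, 0.666..., 0.333..., 0.0], [0.0, 0.0, 1.0, 0.0], [0.5, 0.0, 0.0, 0.5]]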

Definition 4.5.3

Two pseudo states s and s' are strong permutations of each other if there exists a permutation π1 such that s' = π1∘s and there exists a strong permutation π2 such that the diagram of Proposition 4.5.2 commutes.

The above definition simply says that the corresponding input vectors x and x' have the property that π1∘x and x' are strong permutations of each other.

Proposition 4.5.3

Let s and s' be two pseudo states that are permutations of each other. Then s and s' are strong permutations if and only if φ(s) = φ(s').

Proof: (⟹) Let s and s' be strong permutations. That is, s' = π1∘s and x' and π1∘x are strong permutations. But this implies that x' and π1∘x have the same distribution. Thus, s and s' have the same input distributions in corresponding component states, or φ(s) = φ(s').

(⟸) Assume φ(s) = φ(s'). Since we know s' is a permutation of s, s' = π1∘s, we need only show that it is a strong permutation. But this is obvious, since the two input distributions are the same for any component state, that is, x' and π1∘x are strong permutations.

What we now intend to show is that almost all states will map to a particular subset of points under the φ mapping. We will call this subset of image points the ideal points.

Definition 4.5.4

Let <q1,...,qn> be a pseudo state vector of the lumped model, that is, it corresponds to a percentage in state vector of the base

model. Then S_N/<q1,...,qn> is the set of pseudo states of S_N that have <q1,...,qn> as their percent in state vector.

Note that this partitions the space S_N, and each partition class is a class of permutations. We now wish to associate a specific point in the function space with the lumped pseudo state vector <q1,...,qn>. For this purpose, we actually need the percentage of components emitting a pulse. We shall assume that the function f takes the vector <q1,...,qn> and calculates a value p, which is the percent emitting a pulse, or the fraction of components firing. Then,

Definition 4.5.5

Let <q1,...,qn> be a pseudo state vector of the lumped model and let p be the percent emitting a pulse. Then, the ideal point corresponding to <q1,...,qn> is simply φ̂ = <φ̂1,...,φ̂n> with

    φ̂i(m) = C(k,m) p^m (1-p)^(k-m)    for m = 0,...,k and i = 1,...,n.

Notice that φ̂ = <φ̂1,...,φ̂n> has all entries equal to the same function when φ̂ is the ideal point. The ideal point simply says that we are looking at the distribution of inputs when we have an infinite number of components and truly a random connection pattern. For such a situation, the ideal point is the distribution that one would expect, knowing the percent emitting a pulse, knowing that there are k neighbors per neuron, and assuming random sampling.

We now want to let the number of components approach infinity and see what happens. Suppose we let S_∞ be the set of countably infinite sequences <s1,s2,s3,...> and we let M = <V1,V2,...> be the random matrix as before, whose first few rows are

    V1 = 0 1 1 ... 1 0 0 0 ...           (a single 0, then k 1's)
    V2 = 0 0 ... 0 1 1 ... 1 0 0 ...     (k+1 0's, then k 1's)
    ...
    Vm = [k(m-1)+1] 0's, followed by k 1's, followed by 0's.

Notice that this is the generalization to k neighbors of the matrix that we previously defined, where we have normalized the input to 1.

Now, what can we mean by the class of permutations corresponding to a lumped vector <q1,...,qn>? We proceed as follows. We assume that the set S_∞ is generated by choosing each component state i.i.d. from the distribution <q1,...,qn>. By the law of large numbers, the set of infinite sequences that can be said to have the distribution <q1,...,qn> as their percentage vector has measure 1. That is, we have converted the set of pseudo states into a probability space. If the set of pseudo states is represented by Ω, then by choosing the state of each component i.i.d. from the distribution <q1,...,qn>, we have imposed a probability measure P on subsets of Ω. Notice that a point ω∈Ω is just a pseudo state ω = <s1,s2,s3,...>.

Since the set of pseudo states is just a probability space, we can proceed to define a sequence of random variables on Ω corresponding to each component state i.

Definition 4.5.6

For each component state i, we define the sequence of random variables W1^i, W2^i, W3^i,... on S_∞ as follows. Given an ω = <s1,s2,...>,

W1^i finds the first component in component state i, say it is n1, and determines the input that corresponds by applying V_{n1} to the vector f(ω) [see Example 4.5.2]. That is, W1^i(ω) = V_{n1}·f(ω). The V's are of course from the connection matrix M. Similarly, W2^i finds the second component in component state i, say n2, and does the same thing. In general, W_j^i(ω) = V_{nj}·f(ω).

Lemma 4.5.1

Let M be the matrix M = <V1,V2,...> as defined above and consider the random variables W1^i, W2^i, W3^i,... defined on Ω as above. Then, these random variables are independent and identically distributed.

Proof: It is clear that the W_j^i are identically distributed, since they each examine a set of k neighbors, or components, of the pseudo state ω∈Ω. Since we assume that the component states are chosen i.i.d., it is clear that any k neighbors will give the same distribution. Furthermore, if we take any subset W_{j1}^i, W_{j2}^i, W_{j3}^i,..., W_{jn}^i of the random variables, since they examine disjoint neighborhoods (recall the definition of M), they are independent.

Theorem 4.5.1

If we choose the pseudo state by choosing each component i.i.d. with the distribution <q1,...,qn> and we assume the connection matrix M = <V1,V2,...> as above, then the set of pseudo states that map to the ideal point corresponding to <q1,...,qn> has measure 1.

Proof: For the proof, we need to show that the distribution of input values corresponding to any component state for any pseudo state

ω = <s1,s2,s3,...> is identical almost everywhere. Translated, we need only show that the values W1^i(ω), W2^i(ω),... are identically distributed. But this is not difficult. Since the random variables W1^i, W2^i, W3^i,... are i.i.d., we can assume that they are generated by some ergodic process [Breiman (8), p. 107]. That is, W_j^i(ω) = W1^i(ξ^(j-1)(ω)) for some ergodic transformation ξ. Now, by the ergodic theorem [Breiman (8), p. 111],

    lim_{N→∞} (1/N) Σ_{j=0}^{N-1} W1^i(ξ^j(ω)) = ∫ W1^i dP    a.e.

That is to say, the average of the elements W_j^i(ω) is the same for all ω almost surely. Similarly, the higher moments are the same, which proves that for any ω it is almost surely true that we get the same distribution of inputs. It is easy enough to check that the distribution of W1^i is just the ideal point. Thus, the theorem is proved.

Definition 4.5.7

Let the subset of permutations of S_∞/<q1,...,qn> that map into the ideal point corresponding to <q1,...,qn> under the mapping φ be called proper permutations.

Theorem 4.5.2

If we look at the permutation classes as N→∞, most permutations corresponding to a lumped model pseudo state are strong permutations of each other.

Proof: By the previous theorem, most permutations corresponding to a lumped model pseudo state are proper. By Proposition 4.5.3, it is clear that proper permutations are strong.
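The conclusion of Theorem 4.5.1 is easy to observe numerically. The following toy check (hypothetical names; component states drawn i.i.d. and the disjoint-block connection pattern <V1,V2,...> with k = 3) compares the empirical input distribution for one component state with the binomial ideal point of Definition 4.5.5.

    import math, random

    def ideal_point(m, k, p):
        # phi-hat(m) = C(k,m) p^m (1-p)^(k-m), the ideal point of Def. 4.5.5
        return math.comb(k, m) * p**m * (1 - p)**(k - m)

    random.seed(1)
    n, k, N = 3, 3, 30000                 # 3 component states, 3 neighbors, N components
    q = [0.4, 0.35, 0.25]                 # lumped pseudo state; state 0 is the firing state
    p = q[0]                              # percent emitting a pulse
    s = random.choices(range(n), weights=q, k=N)

    def inp(m):
        # input to component m (0-indexed): number of firing neighbors among the
        # k components in its disjoint block, as in the matrix <V1,V2,...>
        return sum(1 for j in range(k * m + 1, k * m + k + 1) if s[j] == 0)

    state = 1                             # examine the inputs of components in state 1
    inputs = [inp(m) for m in range((N - k) // k) if s[m] == state]
    for v in range(k + 1):
        print(v, inputs.count(v) / len(inputs), ideal_point(v, k, p))

The empirical frequencies approach C(k,m) p^m (1-p)^(k-m) as N grows, as the theorem asserts for the infinite case.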

We can now use Proposition 4.5.2 to see that the lumped model is a p-homomorphic image of the base model. We need only check that the condition that we perceive the pseudo state as being generated by i.i.d. choice of component states still holds at the next time step. But this is fairly easy to see. It is apparent that the values generated by the inputs are not only identically distributed, but can also be considered to be generated i.i.d. Thus, the next pseudo states can also be considered to be generated i.i.d. Essentially, we can look at this more precisely by saying that the ensemble of states at time step t+1 cannot have a strange distribution without there being a high probability of generating this strange distribution.

Notice that the form of the matrix mattered only through the fact that the random variables W1^i, W2^i,... were i.i.d. If we can guarantee this result using some other connection matrix M, then this matrix would also satisfy our concept of being random. We see that we are basically requiring the neighborhood sets to be disjoint and to not contain the component itself as a neighbor.

For large numbers of components, we can thus confidently use the lumped model. Rather than most permutations mapping exactly to the ideal point, they would map close to the ideal point. Since our original base model has independent noise generated at each time step, the fact that they do not map exactly to the ideal point can be subsumed as part of the noise. Thus, we would expect that a probabilistic lumped model, with some additional noise as generated due to the small number of neighbors, would appropriately model the base model. The runs of Chapter III essentially show that nothing unexpected happens due to finite neighbors. However, to more precisely represent such a probabilistic lumped model,

one would have to go into a discussion of the topology on the function space. This concludes our discussion of the case of a finite number of neighbors.

4.6 Conclusion

This ends the discussion of applying ergodic machines to simplifications. We saw how the concept could clarify the relationship between two probabilistic systems and also how one could use the structural properties to gain an understanding of the simplification procedures. In the next chapter, we conclude with some remarks about future directions.

CHAPTER V

FUTURE DIRECTIONS

In the foregoing chapters, we have developed new methods for studying the simplification of probabilistic systems. In particular, the usefulness of the concept of an Ergodic Machine is apparent, for this provides a structural handle with which to study probabilistic models.

In studying ergodic machines, we saw the importance of the concept of a faithful transformation. We characterized the faithfulness of multiplicative generators and showed that universally faithful transformations do not exist. We made connections to pseudo random number generators and considered the concepts of permutation independence and probabilistic homomorphisms.

We also showed the theoretical derivations of lumped models for (1) an infinite number of components and neighbors, (2) an infinite number of components and a finite number of neighbors, and (3) a finite number of components and an infinite number of neighbors. In this last case, we really mean a complete neighborhood set, so that the input is the same to each component. We experimentally studied a neuron network example for the case of a finite number of components and neighbors. In this case, we called the lumped model an approximate homomorphic image. Table 5.1.1 gives a synopsis of the results of this analysis. Notice that in the case of positive feedback we cannot distinguish between aperiodic behavior and cyclic behavior, whenever we do not have a steady state. The data that

TABLE 5.1.1

SYNOPSIS OF RESULTS FOR NEURAL NETWORK EXAMPLE

    Type of Model      Number of   Number of    Result or           Method Used
                       Neurons     Neighbors    Prediction

    No Connections
      Lumped                                    Steady State        Theoretical
      Base             25-1000     0            Fluctuates about    Simulation
                                                Steady State

    Positive Feedback
      Lumped           infinite    infinite     Steady State        Heuristic and
                                                                    Simulation
      Base             50-1000     20-60        Aperiodic or        Simulation
                                                Cyclic
      Lumped           finite      finite       Non Steady State    Heuristic and
      (Force Field)                                                 Simulation
      Probabilistic    finite                   Aperiodic or        Theoretical and
      Lumped                                    Cyclic              Simulation

    Flip-Flop
      Lumped           infinite    infinite     3 States            Simulation
      Base             50-500      20-50        3 States            Simulation
                                                (2 unstable)
      Lumped           finite      finite       3 States            Heuristic and
      (Force Field)                             (2 unstable)        Simulation
      Probabilistic    finite                   3 States            Theoretical and
      Lumped                                    (2 unstable)        Simulation

we have gathered would be consistent with both modes of activity, and only further data with longer runs could distinguish between these two possibilities.

As is generally the case, any research work not only answers some questions but, in addition, suggests new ones. We now discuss a few open questions that arise. We have shown the usefulness of using deterministic models as simplified versions of probabilistic models. Can this simplification, which we have called an approximate homomorphism, be formalized? In particular, it might be useful to introduce topological ideas into the questions of approximate homomorphisms in order to make this concept precise. That is, the question of how good an approximation we obtain would be a topic of further study.

In our discussions of the modelling paradigm, we have focussed on two aspects of a four way classification. We have studied a probabilistic base model that has a corresponding simplified lumped model that is either deterministic or probabilistic. What about the case of a deterministic base model? The case of a corresponding deterministic lumped model to represent this type of base model is classical systems theory, but how about a probabilistic lumped model? Ergodic machines could provide an ideal vehicle to study this fourth classification of a deterministic base and probabilistic lumped pairing. Such a modelling situation is, in fact, quite common, since the introduction of probabilistic elements often provides a way to examine intractable deterministic base models. This type of situation frequently occurs, for example, in Political Science and Economics.

The study of Ergodic Machines also suggests possible renewed investigation into the concepts of pseudo random number generators. We

have shown that certain properties are sufficient on our ergodic transformations to allow faithful representation of the associated probabilistic automata. Perhaps we can tailor pseudo random number generators for particular problems of interest rather than attempt to find "ideal" pseudo random number generators. These questions, and others that may suggest themselves to the reader, will hopefully be considered in future research work.

NOTE: A preliminary version [2] of parts of Chapter III appeared as a technical report.

APPENDIX I

MATHEMATICAL NOTATION

In this appendix, we describe some of the mathematical notation and notions that we have used in the text. A standard reference for the real analysis is Royden [35]. References for the probability are Breiman [8] and Feller [15].

1. A probability space (Ω,F,P) is a set Ω, a σ-algebra F on Ω, and a probability measure P on Ω.

2. An algebra F is a set of sets that is closed under union, intersection, and complementation. If the algebra is closed under countable unions also, it is said to be a σ-algebra.

3. Let F be a σ-algebra on Ω. Then P is a probability measure if P(Ω) = 1 and P is a measure. For the concept of a measure, see Royden [35].

4. Given a probability space (Ω,F,P), a statement holds almost everywhere (a.e.) if it holds for all ω∈Ω, except possibly for a set A of measure 0, i.e., P[A] = 0.

5. Let f: X → Y and g: Y → Z. Then g∘f is function composition. That is, g∘f: X → Z such that (g∘f)(x) = g[f(x)]. id: X → X is the mapping such that id(x) = x for all x.

6. [a,b] is the closed interval that includes the two endpoints a and b. [a,b) excludes the point b and (a,b] excludes the point a.

7. A set A ⊂ B means that x∈A ⟹ x∈B. Notice that A ⊂ A is a true statement.

8. A function f is monotonically decreasing if for y > x, f(y) ≤ f(x). It is strictly monotone decreasing if for y > x, f(y) < f(x).

9. An overbarred lim denotes lim sup; an underbarred lim denotes lim inf. See Royden [35], page 36.

10. i.i.d. means independent and identically distributed. The concept refers to a set of random variables on a probability space.

APPENDIX II

THE SIMULATION

The simulation was a package of Fortran programs written to run on the Logic of Computers Group's system. The Logic of Computers Group is a research facility within the Department of Computer and Communication Sciences at The University of Michigan. Its computer facilities include an IBM 1800 with disk bulk storage interfaced to a DEC PDP7 with graphic CRT display.

The simulation package was modularized. There were separate initialization routines for both the base and lumped models. A list of the various subroutines and main programs follows.

BASE:  Main program for the base model run
BINOM: Subroutine to calculate a binomial distribution
CNNCT: Helps in setting up the connection pattern
DIST:  Computes distribution function for NOISE
GPFIL: Used in file manipulation
INEXT: Allows for various types of external input
INIT:  Base Model Initialization
INITL: Lumped Model Initialization
LUMPD: Main program for lumped model run (both deterministic and probabilistic, depending on flags set)
NOISE: Generates random noise
PRNT:  Print routines (also PRNT1, PRNT2)

RSETB: Used to reset base model state during a run
RSETL: Used to reset lumped model state during a run
THOLD: Generates the threshold function
CHI:   Uses output from base and lumped model to calculate chi-square statistics
FSKIP: Used in file manipulations
SIGNF: Computes approximate significance level for chi-square statistics

Some routines called from these programs were input/output support routines. For further information, contact the author.

REFERENCES

1. Aggarwal, S., and Zeigler, B. P., "Canonical Realizations of I-O Relations," presented at the 8th Annual Princeton Conf. on Inform. Sciences and Systems, 1974.

2. Aggarwal, S., Bethke, A. D., and Zeigler, B. P., "Analysis of a Base Model and a Lumped Model of a Neural Network," The University of Michigan, Technical Report LCG-165, January 1975.

3. Anninos, P. A., Beek, B., Csermely, T. J., Harth, E. M., and Pertile, G., "Dynamics of Neural Structures," J. Theor. Biol. 26 (1970), 121-148.

4. Arbib, M. A., "Automata Theory and Control Theory: A Rapprochement," Automatica 3 (1966), 161-189.

5. _____, "Decomposition Theory for Automata and Biological Systems," JACC, August 1971. Published in Systems Structure, IEEE, pub. no. 71C61-CSS.

6. Bacon, G. C., "The Decomposition of Stochastic Automata," Inform. and Control 7 (1964), 320-329.

7. Billingsley, P., Ergodic Theory and Information, John Wiley and Sons, Inc., 1965.

8. Breiman, L., Probability, Addison-Wesley, 1968.

9. Burks, A. W. (ed.), Essays on Cellular Automata, The University of Illinois Press, 1970.

10. Carlyle, J. W., "Reduced Forms for Stochastic Sequential Machines," J. Math. Anal. Appl. 7, 167-175.

11. Corynen, G. C., "A Mathematical Theory of Modeling and Simulation," Ph.D. thesis, Computer, Information, and Control Engineering, The University of Michigan, 1974.

12. Csermely, T. J., Harth, E., and Lewis, N. S., "The Netlet Theory and Cooperative Phenomena in Neural Networks," J. Dynamic Systems, Measurement and Control, Sept. 1973.

13. Evans, G. W., Wallace, G. F., and Sutherland, G. L., Simulation Using Digital Computers, Prentice-Hall Inc., 1967.

14. Ewenczyk, D., "Formalization of the Boyd and Epley Three Stage

189 Model of a Multiple Resource-Allocation Operating System," Master's Thesis, Israel Institute of Technology, Haifa (1973). 15, Feller, W., An Introduction to Probability Theory, John Wiley and Sons, 1968. 16. Foo, N. Y., "Homomorphic Simplification of Systems," The University of Michigan, Technical Report LGG-156, July 1974. 17. Foy, J. L., "A Computer Simulation of Impulse Conduction in Cardiac Muscle," The University of Michigan, Technical Report 166, December 1974. 18. Gill, A., Introduction to the Theory of Finite State Machines, McGraw-Hill (1962). 19. Halmos, P. R., Lectures on Ergodic Theory, Tokyo: The Mathematical Society of Japan (1956). 20. Harth, E. M., Csermely, T. J., and Lindsay, R. D., "Brain Functions and Neural Dynamics," J. Theor. Biol., 26 (1970), 93-120. 21. Hartmanis, J. and Steams, R. E., Algebraic Structure Theory of Sequential Machines, Prentice-Hall, 1969. 22. Kalman, R. E., Falb, P. L., and Arbib, M. A., Topics In Mathematical System Theory, McGraw-Hill, 1969. 23. Kemeny, J. G., and Snell, L. L., Finite Markov Chains, D. Van Nostrand, 1960. 24. Klir, G. J., Trends in General Systems Theory, John Wiley, New York, 1972. 25. Knuth, D. E., The Art of Computer Programming, vol. 2, AddisonWesley, 1971. 26. Laski, J. G., "On Time Structure in (Monte Carlo) Simulation," Operational Research Quarterly, Vole 16-3. 27. Mihram, G. A., Simulation: Statistical Foundations and Methodology, Academic Press, 1972. 28. Mortimer, J. A., "A Cellular Model for Mammalian Cerebellar Cortex," The University of Michigan, Technical Report 03296-7-t October 1970 29. Page, C. V., "Equivalences Between Probabilistic and Deterministic Sequential Machines," Inform. and Control (1966), pp. 469520. 30. Paz, A., "Homomorphisms Between Stochastic Sequential Machines and Related Problems," Math. Systems Theory, Vols. 2, 3 (1968).

31. ___, Introduction to Probabilistic Automata, Academic Press (1971).
32. Plum, T., "Simulation of a Cell-Assembly Model," Ph.D. dissertation, The University of Michigan, 1972.
33. Rapoport, A., Bull. Math. Biophys. 14 (1952), 365.
34. Rochester, N., Holland, J. H., Haibt, L. H., and Duda, W. L., IRE Trans. Inform. Theory 2 (1956), 80.
35. Royden, H., Real Analysis, New York: Macmillan, 1968.
36. Santos, E. S., "Algebraic Structure Theory of Stochastic Machines," Math. Systems Theory, Vols. 6, 3 (1972).
37. Weinberg, R., and Zeigler, B. P., "Simulation of a Living Cell: Multilevel Control System," J. of the American Society for Cybernetics, 1970.
38. Whitehead, B., personal communication, 1975.
39. Zadeh, L. A., "The Concepts of System, Aggregate, and State in System Theory," in System Theory (Zadeh, L. A., and Polak, E., eds.), McGraw-Hill (1969), pp. 3-42.
40. Zadeh, L. A., and Desoer, C. A., Linear System Theory: The State Space Approach, McGraw-Hill, 1963.
41. Zeigler, B. P., "Automaton Structure Preserving Morphisms with Applications to Decompositions and Simulation," in Theory of Machines and Computations, Academic Press, N. Y. (1971).
42. ___, "Modelling and Simulation: Structure Preserving Morphisms for Discrete and Continuous Systems," in Computers and Automata, Proc. 21st International Conf., Polytechnic Institute of Brooklyn, Brooklyn, N. Y., 1972.
43. ___, "On the Formulation of Problems in Modelling and Simulation within the Framework of Mathematical Systems Theory," Proc. Sixth International Congr. on Cybernetics, Namur, Belgium, 1972.
44. ___, "Statistical Simplification of Neural Nets," Int. Journal of Man-Machine Studies (to appear, 1975).
45. ___, Theory of Modelling and Simulation, John Wiley and Sons (forthcoming).