THE UNIVERSITY OF MICHIGAN
INFORMATION SYSTEMS LABORATORY / SENSORY INTELLIGENCE LABORATORY
Department of Electrical Engineering, College of Engineering

Technical Report ISL-65-2

MONOTONE CONGRUENCE ALGORITHMS

Richard F. Arnold
Donald L. Richards

April 1965

This work was supported in part by Grants No. AF-AFOSR-367-63 and AF-AFOSR-367-64 of the U.S. Air Force Office of Scientific Research, Directorate of Information Sciences, Office of Aerospace Research, in part by Contract No. AF 30(602)-3546, Rome Air Development Center, and in part by NSF Grant GP-2778. Some of the material was presented at the ICMCI Conference in Tokyo, Japan under the title "Monotone Reduction Algorithms".

Nomenclature

Σ            finite set (alphabet)
s (or s_i)   arbitrary element of Σ (letter)
n            number of elements of Σ
Σ*           set of all expressions on Σ (words)
λ            empty word
k, k'        length of a word
m            index on the letters of a word
t, u, v, w, x, y, z   arbitrary word (or t_i, u_i, etc.)
∈            is a member of
⊆            is a subset of
<            is less than (relation of total order on Σ*)
>            is greater than
≅            is congruent to (two-sided congruence relation on Σ*)
◁            is included in (relation of partial order on Σ*)
x → y        simple substitution
x →· y       concluding substitution
A, B, C      arbitrary algorithm
A(w)         the word which results when A is applied to w
≡            is equivalent to (equivalence relation on algorithms)
A_L          set of words appearing at the left of substitutions of A
A_R          set of words appearing at the right of substitutions of A
A'           core algorithm for A
Σ_A          auxiliary alphabet of A
α            class of algorithms
a(w)         number of applications of A to w
P, Q, S, S_s, T, U   specific sets of words

Abstract

Given an associative system in which each element is assigned a cost and in which an equivalence relation obtains between elements, it is often of interest to ask the question: what is the least costly word equivalent to a given word? The case in which the equivalence relation is a two-sided congruence relation is studied here, one example being the problem of optimizing a type of computer program. Such optimizing processes may be formalized as Markov normal algorithms. A general result concerning order relations on finite alphabets is established first. Then, the properties of a class of Markov normal algorithms are investigated. It is shown that each such algorithm must always terminate, and that every class of mutually equivalent algorithms contains a unique minimal algorithm which can be obtained by applying any algorithm of the class to itself.

1. Motivation

Given an associative system (semi-group) in which each element is assigned a cost and in which an equivalence relation obtains between elements, it is often of interest to ask the question: what is the least costly word equivalent to a given word? Three examples of such problems are the travelling salesman problem studied by Dantzig, et al. (1954), the optimum control problem (cf. Pontryagin, 1962), and the problem of finding the least word which performs a given mapping upon the states of a finite automaton. Algorithms do not exist for all such problems, since the solution may in general require the solution of the word problem for semi-groups, which is known to be unsolvable (Davis, 1958).

The problems mentioned above involve natural two-sided congruence relations. Two input words x and y to a finite automaton may be regarded as equivalent if they perform the same mapping from the set of states of the automaton to itself, i.e., if M(s,x) = M(s,y) for all states s, where M is the transition function of the automaton. A simplified model of computer programming can be obtained by reinterpreting the input words as programs, and the initial state as data upon which the programs act; here the programs are not self-modifying and contain no instructions which transfer control. Programs P and Q are then equivalent if they reach the same result for each set of data; in that case the programs RPS and RQS must also be equivalent, R being run immediately before P (or Q) and S being run immediately after. Thus the equivalence relation satisfies the definition of a two-sided congruence.

The programming and automaton problems can be solved by enumeration if necessary; however, the solution algorithms are often impractical and may give no intuitive insight into the structure of the optimizing process itself. It is hoped that greater insight and more effective algorithms can be found by studying the general class of associative systems with two-sided congruence relations, a class which contains problems of all degrees of difficulty. Where such a two-sided congruence relation exists and, in addition, there is a total order relation on the costs of the words which has certain natural properties, the theory of monotone congruence algorithms applies.

The properties of this class of algorithms are investigated below. A general result concerning order relations on finite alphabets is established first. Then it is shown that each monotone congruence algorithm must always terminate, and that every class of mutually equivalent monotone congruence algorithms contains a unique minimal algorithm which can be obtained by applying any algorithm of the class to itself.

2. Monotone Congruence Algorithms

Let Σ be a finite set called the alphabet, whose elements are called letters. Let Σ* denote the set of all expressions or words on the letters of Σ, including the empty word λ. (The letters may be interpreted as computer instructions or subroutines; the words, as programs or as systems of subroutines. The word uv represents the program formed when the program u is followed by the program v.)

Let < ("less than") be a total order relation on Σ*. The symbols ≤, >, and ≥ will be used in the usual way. (In the interpretation, "<" will often be derived from a cost function on the letters, so that the cost of uv is the cost of u plus the cost of v. Within classes of words having the same cost, the total order can be completed lexicographically as follows. Assume u and v have the same cost. If u is shorter than v, let u < v. If u and v have the same length, with u = w s_1 u_1 and v = w s_2 v_1 for s_1, s_2 ∈ Σ, s_1 ≠ s_2, and w, u_1, v_1 ∈ Σ*, define u < v if s_1 < s_2, and v < u otherwise. The total order thus defined can be shown to satisfy assumptions (1) and (2) below. The cost of a computer program might reflect its execution time, the space required, etc.)

Assume that:

(1) λ < w for every non-empty w ∈ Σ*;

(2) if x < y, then uxv < uyv, for all u, v, x, and y ∈ Σ*.

(The first assumption implies that the "empty program" is less costly than any other program; the second assumption holds wherever the cost of uv is the cost of u plus the cost of v, and in many other cases as well.)
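The completed total order can be sketched as a comparison procedure: compare total cost, then length, then the first differing letter, with letters ordered by their costs. This is an illustrative sketch, not part of the text's formalism; the letter costs are the ones used in the Example later in this section, and distinct letters are assumed to have distinct costs.

```python
# Sketch of the completed total order of Section 2: compare cost, then
# length, then the first differing letter (letters ordered by cost).
# Assumed letter costs taken from the Example of this section.
COST = {"1": 7, "2": 2, "3": 4}

def cost(word):
    """Cost of a word = sum of its letter costs (the empty word costs 0)."""
    return sum(COST[s] for s in word)

def less_than(u, v):
    """u < v in a total order satisfying assumptions (1) and (2)."""
    if u == v:
        return False
    if cost(u) != cost(v):
        return cost(u) < cost(v)
    if len(u) != len(v):
        return len(u) < len(v)
    for a, b in zip(u, v):
        if a != b:                       # first letter where u and v differ
            return COST[a] < COST[b]
    return False
```

Since the cost of uv is the sum of the costs of u and v, assumption (2) holds; assumption (1) holds because the empty word has cost 0.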

Let ≅ ("is congruent to") be an equivalence relation on Σ* such that:

(3) if x ≅ y, then uxv ≅ uyv for all u, v, x, and y ∈ Σ*.

(x is congruent to y if these two programs compute the same function upon the internal states of the computer.)

Definition. If v ≅ w implies v ≥ w for all v ∈ Σ*, w is a minimal word (over < and ≅). If v ≅ w and w is a minimal word, then w is the minimal equivalent word for v.

Definition. If there exist u, v ∈ Σ* such that uxv = y, then x is a subword of y. If u ≠ λ or v ≠ λ, then x is a proper subword of y.

Theorem 1. Any subword of a minimal word is a minimal word.

PROOF: Suppose, for purposes of indirect proof, that w = uxv where w is minimal and x is not. There must be a word y such that y ≅ x and y < x; but then uyv ≅ uxv = w and uyv < w by (2) and (3), which contradicts the minimality of w.

Theorem 1 is a natural generalization of the Principle of Optimality of Bellman (1957) to two-sided congruence relations. Note that it does not depend upon the introduction of a particular algorithm.

A general notion of algorithm is now defined, following Markov (1961).

Definition. Let "→" and "→·" be symbols not in Σ. Then "x → y" is a simple substitution of Σ* and "x →· y" is a concluding substitution of Σ* if x, y ∈ Σ*.

Definition. A Markov normal algorithm on Σ* is a finite ordered list of substitutions of Σ*. The letters A, B, and C will be used to denote algorithms. The set of words which appear to the left of the arrows in the substitutions of A will be denoted A_L ("A-left"), while those which appear to the right of the arrows will be denoted A_R ("A-right").

Definition. The substitution "x → y" (or "x →· y") applies to w ∈ Σ* if there exist u, v ∈ Σ* such that w = uxv. An algorithm applies to w if any of its substitutions applies to w.

By convention, the operation of a Markov normal algorithm upon a word w is interpreted as follows. If "x → y" (or "x →· y") is the first substitution of the algorithm and w = uxv, where u' is at least as long as u whenever w = u'xv' (that is, the leftmost occurrence of x is chosen), the first application consists of replacing uxv by uyv. The same procedure is then repeated for uyv. If the first substitution does not apply to w, the second substitution is considered instead; if neither applies, the third substitution is considered, etc. Whenever a concluding substitution applies, the process stops immediately after it has been applied. Otherwise, the process stops only when a word is reached to which none of the substitutions applies.

Since the process is deterministic, it must produce a unique sequence of words w, w_1, w_2, ... of length 1 or more. The sequence is of length 1 only if the algorithm does not apply to w. The number of applications a(w) of A to w is one less than the length of the sequence. The algorithm A produces x from w if x is a member of the sequence. If z is the last word in the sequence, define A(w) = z.
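The convention above can be sketched as a small interpreter: rules are tried in their listed order, the chosen rule rewrites the leftmost occurrence of its left-hand word, and a concluding substitution stops the process. The rule format (a triple) and the step limit are implementation choices here, not part of the definition.

```python
# Minimal sketch of a Markov normal algorithm interpreter.  A rule is a
# triple (x, y, concluding); rules are tried in listed order, and the chosen
# rule rewrites the LEFTMOST occurrence of x in w.  The step limit is only
# a guard against non-terminating algorithms.
def run(rules, w, max_steps=10_000):
    """Return (A(w), a(w)): the final word and the number of applications."""
    steps = 0
    while steps < max_steps:
        for x, y, concluding in rules:
            i = w.find(x)                    # leftmost occurrence, if any
            if i >= 0:
                w = w[:i] + y + w[i + len(x):]
                steps += 1
                if concluding:
                    return w, steps          # stop right after a concluding step
                break                        # restart from the first rule
        else:
            return w, steps                  # no substitution applies
    raise RuntimeError("step limit exceeded")
```

For instance, the single simple substitution "ab → λ" reduces "aabb" to the empty word in two applications, while a concluding substitution stops the process after its first use.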

Definition. A ≡ B ("A is equivalent to B") if A and B are algorithms and A(w) = B(w) whenever either A(w) or B(w) exists.

The algorithms which form the main subject of investigation can now be defined.

Definition. Let Σ be an alphabet, < be an order relation satisfying (1) and (2), and ≅ be an equivalence relation satisfying (3). A monotone congruence algorithm (MCA) over < and ≅ is a Markov normal algorithm A for which: (a) A(w) is minimal for all w ∈ Σ* for which A(w) exists; and (b) for each substitution "x → y" (or "x →· y") of A, x > y and x ≅ y.

(In terms of computer programs, each substitution gives a way of replacing a subprogram by a less costly program which computes the same function; if the algorithm terminates, it must produce the least costly program which computes the same function.)

Example. For Σ = {1, 2, 3}, the following MCA maps x, y ∈ Σ* to the same word if and only if the sum of the digits of x is equal to the sum of the digits of y modulo 4. Let the letters 1, 2, and 3 cost 7, 2, and 4 units, respectively, and let "<" be induced by these costs. The minimal words are 22, 23, 2, and 3. The algorithm is:

222 → 2
32 → 23
1 → 32
33 → 2
223 → 3

(If the letters 1, 2, and 3 are interpreted as shift instructions for a four-bit circular shift register, the algorithm maps each program of shift instructions to the least costly equivalent program.)
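The Example can be exercised with a simple loop; all of its substitutions are simple, so no concluding case is needed. The rule list below is the substitution list of the Example as read from the text, applied Markov-style (first listed rule, leftmost occurrence).

```python
# Sketch: reducing words with the Example MCA over Σ = {1, 2, 3}.
# The first listed rule that occurs in the word rewrites its leftmost
# occurrence; all rules here are simple substitutions.
RULES = [("222", "2"), ("32", "23"), ("1", "32"), ("33", "2"), ("223", "3")]

def reduce_word(w):
    """Apply the MCA until no substitution applies; return the final word."""
    while True:
        for x, y in RULES:
            i = w.find(x)
            if i >= 0:
                w = w[:i] + y + w[i + len(x):]
                break
        else:
            return w
```

The digit sums check out: 11 has digit sum 2 and reduces to 2; 223 has digit sum 7 ≡ 3 (mod 4) and reduces to 3; 1 has digit sum 1 and reduces to 23.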

3. Termination

The first principal result to be established is that for any MCA A and any word w, A(w) exists; i.e., that every MCA always terminates. The proof depends on a quite general result about total order relations on Σ*.

Definition. For x, y distinct words of Σ*, x ◁ y ("x is included in y") if x = s_1 s_2 ... s_m for s_i ∈ Σ (i = 1, 2, ..., m) and there exist words v_0, v_1, ..., v_m ∈ Σ* such that y = v_0 s_1 v_1 s_2 v_2 ... s_m v_m. The partial ordering thus defined is called the inclusion partial ordering.

Observe that every total ordering of Σ* satisfying (1) and (2) is a refinement of the inclusion partial ordering, in the sense that x ◁ y implies x < y.

Theorem 2. Every total order which is a refinement of the inclusion partial ordering is a well-ordering.

A proof of Theorem 2 is given in the Appendix.

Theorem 3. For any MCA A and any w ∈ Σ*, A(w) exists.

PROOF: A(w) must exist unless A produces an infinite descending sequence of words w > w_1 > w_2 > ..., which contradicts the fact that "<" is a well-ordering.

Markov normal algorithms may in general involve the use of auxiliary letters. That is, the alphabet of A may consist of disjoint subsets Σ and Σ_A such that for all w ∈ Σ*, A(w) ∈ Σ*. Where attention is concentrated on the words of Σ*, the letters of Σ_A may then be called auxiliary letters, since

they appear only in the intermediate stages of the application of A to the words of Σ*. The words with auxiliary letters are required to obey the usual conditions on the ordering and congruence relations. The next two theorems show that every monotone congruence algorithm is equivalent to one which has neither auxiliary letters nor concluding substitutions.

Theorem 4. If A is an MCA such that A(w) ∈ Σ* for every word w ∈ Σ*, then A is equivalent on Σ* to an MCA whose substitutions involve only words of Σ*.

PROOF: Given A, form the algorithm B by deleting each substitution "x → y" (or "x →· y") for which x ∉ Σ*, and then replacing each of the remaining words of A_R by its minimal equivalent word. B_L and B_R then contain only words of Σ*. To show that A(w) = B(w) for all w ∈ Σ*, note that if w is minimal, A(w) = B(w) = w. If w is not minimal, some substitution "x → y" (or "x →· y") of A must apply to w, with w = uxv and u, x, v ∈ Σ*. Then, by the choice of B, the substitution "x → z" (or "x →· z") must be in B, where z is the minimal equivalent word for x. It follows from the hypothesis of the theorem that z is also in Σ*, whence uzv ∈ Σ* as well. Thus, after one application, B has replaced w by a lesser equivalent word from Σ*. The argument can be repeated for the new word, etc.; after a finite number of repetitions, B will produce the minimal equivalent word for w. Thus B(w) = A(w) for every word w ∈ Σ*, which was to be shown.

Theorem 5. Each MCA is equivalent to an MCA which has no concluding substitutions.

PROOF: Given an MCA A, form the MCA B by replacing each concluding substitution "x →· y" by the simple substitution "x → y". If B ≢ A, there must be a word w such that B(w) ≠ A(w), and the application of A to w must involve at least one concluding substitution. By the nature of concluding substitutions, only one can have been applied, and it must have been applied only at the last step, producing A(w). Since B has the same substitutions as A, the sequence of words produced by B from w must be identical up to and including the step which produces A(w); since B(w) ≠ A(w), some substitution of B must apply to A(w) also, reducing it to a lesser word which is congruent to A(w). Then A(w) cannot be a minimal word, and A cannot be an MCA. The contradiction establishes that A ≡ B.

Note that if the substitutions of an MCA which has no concluding substitutions are reordered, the resulting algorithm is equivalent.
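The inclusion partial ordering that drives Theorems 2 and 3 amounts to a scattered-subword test: x ◁ y exactly when the letters of x occur in y, in order, with arbitrary words interleaved. A sketch:

```python
# Sketch of the inclusion partial ordering of Section 3: x ◁ y holds when
# the letters of x occur in y in order (a "scattered subword" test) and
# x and y are distinct.
def included_in(x, y):
    """True if x ◁ y."""
    if x == y:
        return False
    it = iter(y)
    return all(s in it for s in x)   # `in` consumes the iterator left to right
```

The empty word is included in every other word, matching assumption (1)'s requirement that λ precede every non-empty word in any refinement of this ordering.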

4. The Core Algorithm

The second main result is that every class of mutually equivalent monotone congruence algorithms contains a unique minimal algorithm which can be obtained by applying any member of the class to itself. The algorithm is unique up to the order of the substitutions and minimal with respect to the number of substitutions.

Definition. For any MCA A, the core algorithm A' is formed by: (1) replacing all concluding substitutions by the identical simple substitutions; (2) where two or more substitutions have the same left-hand word, deleting all but one of them; (3) deleting every substitution "x → y" for which some proper subword of x is in A_L; and (4) replacing each remaining word of A_R by its minimal equivalent word.

Theorem 6. For any MCA A: (1) A' is an MCA; (2) A' ≡ A; (3) if B ≡ A, then A'_L ⊆ B_L.

PROOF: A' is chosen so that x > y and x ≅ y for each of its substitutions "x → y". Thus, to show that A' is an MCA and that A' ≡ A, it is only necessary to show that A' applies to every word to which A applies. If A applies to w, some word v ∈ A_L must be a subword of w; but then, by the choice of A', either v ∈ A'_L or some proper subword of v is in both A_L and A'_L. In either

case, A' applies to w also. Thus A' successively reduces each word to lesser words to which it is congruent until, eventually, the minimal equivalent word is reached; A' ≡ A.

Observe that A' was chosen in such a way that each proper subword of each word x ∈ A'_L is a minimal word; for otherwise, the substitution "x → y" would have been deleted in forming A'. Now suppose that B ≡ A. B must apply to each word of A'_L, since none of them is minimal. At the same time, B cannot apply to any proper subword of any of the words of A'_L, since all the proper subwords are minimal. Thus each word of A'_L must appear as the left side of a substitution in B. The words of A'_L being distinct, it follows that A'_L ⊆ B_L. This completes the proof.

For any class of equivalent MCA containing A, A'_L is the unique set of non-minimal words whose proper subwords are all minimal, each word of A'_R is the unique minimal equivalent word for the corresponding word of A'_L, and there are no concluding substitutions. Thus A' is unique except for the order in which the elements of A'_L appear.
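The four steps in the definition of the core algorithm can be sketched as follows. Here `minimal_equiv` stands for a problem-specific procedure returning the minimal equivalent word (it exists by Theorem 3 but depends on the congruence at hand), and a rule is a triple (x, y, concluding) as in the earlier interpreter sketch.

```python
# Sketch of forming the core algorithm A' (steps (1)-(4) of the Definition).
# `minimal_equiv` is an assumed, problem-specific procedure.
def proper_subwords(x):
    """All contiguous subwords of x other than x itself (including λ)."""
    subs = {x[i:j] for i in range(len(x) + 1) for j in range(i, len(x) + 1)}
    subs.discard(x)
    return subs

def core(rules, minimal_equiv):
    by_left = {}
    for x, y, _concluding in rules:      # (1) concluding -> simple
        by_left.setdefault(x, y)         # (2) keep one rule per left word
    lefts = set(by_left)
    kept = {x: y for x, y in by_left.items()
            if not (lefts & proper_subwords(x))}      # (3) drop covered rules
    return [(x, minimal_equiv(y)) for x, y in kept.items()]   # (4)
```

For instance, given the rules "aa → a", "aaa →· a", and "ab → b", step (3) deletes "aaa → a" because its left side has the proper subword "aa" in A_L.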

5. Uniform Optimality

The core algorithm is minimal with respect to the number of substitutions. A further kind of minimality, based upon the number of steps required to reduce a word, is now considered.

Definition. An algorithm C is uniformly optimal within a class α of equivalent algorithms if, for all A ∈ α and all w ∈ Σ*, c(w) ≤ a(w). Note that a(w) (defined earlier as the number of applications of A to w) usually depends on the order of the substitutions of A.

Theorem 7. The only MCA which is uniformly optimal within a complete class of equivalent MCA is the algorithm which has no substitutions.

PROOF: The algorithm which has no substitutions is not equivalent to any other MCA. If, on the other hand, A applies to any word w, it must also apply to each word of the sequence ww, www, .... Let z be any sufficiently long member of this sequence which is not in the finite set A_L; then a(z) > 1, since the word produced by a single application of A to z still contains the non-minimal word w as a subword. Form B from A by adding the substitution "z → y" at the beginning, where y is the minimal equivalent word for z. Then b(z) = 1 < a(z), and A is not uniformly optimal within its complete class of equivalent algorithms.

Theorem 8. If C is uniformly optimal within the class α of equivalent MCA such that A_L = C_L for every A ∈ α, and no two substitutions of C have the same left-hand word, then each word of C_R is a minimal word.

PROOF: Suppose that in the substitution "x → y" of an MCA A, y is not a minimal word. Then a(x) > 1. If y is replaced by its minimal equivalent word to form the algorithm B, then B ≡ A and b(x) = 1 < a(x), so A is not uniformly optimal within the class.

It was shown in Theorem 6 that any algorithm with a minimum number of substitutions must have the same set of left-hand words as a core algorithm. Theorem 8 shows that, among those with the same set of left-hand words, only the ones which have the same set of right-hand words as the core algorithm can be uniformly optimal. Thus the study of optimality reduces to the study of the algorithms which can be produced by reordering the substitutions of the core algorithm. It will be shown in conclusion that a uniformly optimal ordering of the substitutions does not always exist, and that more than one uniformly optimal ordering may exist.

Example. It can be shown that

01 → λ
10 → λ
00 → 1
11 → 0

is a core algorithm which is uniformly optimal for the class of equivalent MCA which have the same left sides. Other uniformly optimal orderings of the same core algorithm can be obtained by interchanging the first and second substitutions, or the third and fourth substitutions. Note that the left-hand sides are not ordered according to the < relation for any of these optimal orderings; for if 0 < 1, then 00 < 01. Finally, there exist core algorithms which cannot be reordered to produce an algorithm which is uniformly optimal within the class of equivalent algorithms with the same left sides.

Example.

01 → 2
10 → 2
02 → 20
12 → 21

Let u = 010 and v = 101. If A is any ordering of these substitutions such that "01 → 2" precedes "10 → 2", then a(u) = 1 and a(v) = 2. If, on the other hand, B is any ordering of the substitutions such that "10 → 2" precedes "01 → 2", then b(u) = 2 and b(v) = 1. Since a(u) < b(u) and a(v) > b(v), neither A nor B can be uniformly optimal.
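The order-dependence in this Example can be checked with a small interpreter sketch: all four substitutions are simple, the first listed rule that occurs in the word rewrites its leftmost occurrence, and the two orderings A and B below differ only in which of the first two rules is listed first.

```python
# Sketch: step counts for two orderings of the Example's substitutions.
# These particular rule sets always terminate, so no step limit is used.
def run(rules, w):
    """Return (final word, number of applications)."""
    steps = 0
    while True:
        for x, y in rules:
            i = w.find(x)                    # leftmost occurrence, if any
            if i >= 0:
                w = w[:i] + y + w[i + len(x):]
                steps += 1
                break
        else:
            return w, steps

A = [("01", "2"), ("10", "2"), ("02", "20"), ("12", "21")]  # 01 -> 2 first
B = [("10", "2"), ("01", "2"), ("02", "20"), ("12", "21")]  # 10 -> 2 first
```

Under ordering A the word 010 reduces in one step but 101 takes two; under B the counts are exchanged, so neither ordering dominates the other.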

Appendix

Included here are the proofs of Theorem 2 and of a lemma concerning the inclusion partial order; the proof of the latter is the principal step in proving the theorem. The definition of the partial order ◁ is restated here for reference.

Definition. For x, y distinct words of Σ*, x ◁ y ("x is included in y") if x = s_1 s_2 ... s_m for s_i ∈ Σ (i = 1, 2, ..., m) and there exist words v_0, v_1, ..., v_m ∈ Σ* such that y = v_0 s_1 v_1 s_2 v_2 ... s_m v_m.

Lemma 1. Every infinite subset of Σ* contains two words v and w such that v ◁ w.

PROOF: A set of words such that no word in the set is included in any other word of the set will be called independent. An infinite independent set of words whose alphabet has at most n letters and whose shortest word is of length k will be called an (n, k) i.i. set. The proof that no infinite independent set exists proceeds by an indirect proof involving simultaneous backward induction on n and k; i.e., it is shown that if an (n, k) i.i. set exists, then either an (n, k-1) i.i. set exists or an (n-1, k') i.i. set exists for some k'. Repeated application of the argument must yield a (1, 1) i.i. set. Since such a set obviously does not exist, the hypothesis that an infinite independent subset exists must also be false.

Suppose, then, that an (n, k) i.i. set S is given. If the alphabet has less than n letters, the induction step is immediate. Otherwise, pick any word w from S which is of length k and let s denote the first letter of w.

Throughout the proof it is frequently convenient to divide the infinite set under consideration into two subsets and to prove for one of them that, if it is infinite, either an (n, k-1) i.i. set or an (n-1, k') i.i. set exists. Since the induction step is completed in that event, the subset can be discarded, and the other of the two subsets can be assumed infinite.

The process just outlined can be applied to the set S_s of all words of S which start with the letter s. The set of words formed from S_s by removing the initial letter of each word of S_s contains a word of length k-1 (namely, the word formed by deleting the initial letter of w) and is an independent set (or else, trivially, S is not independent). Thus if S_s is infinite, an (n, k-1) i.i. set exists. Similarly, the set of words of S which do not include the letter s is either finite or it is an (n-1, k) i.i. set. When both this set and all of S_s except w are eliminated from S, there remains a set T = {t_i} which can be assumed to be infinite. Thus T is an (n, k) i.i. set in which each word contains at least one occurrence of the letter s, and no word except w begins with s.

The word t_1 = w is of the form s y_0 for some word y_0; each other word t_i of T is of the form x_i s y_i, where x_i does not contain the letter s and each x_i is of length 1 or more. Let X denote the set {x_1, x_2, ...}, reordered so that if x_i ◁ x_j, then i < j. It will now be shown that X contains either an infinite ascending chain P = {p_1, p_2, ...}, where p_i ◁ p_{i+1} for all i, or else an infinite independent sequence Q.

The two sequences P and Q of elements of X can be chosen as follows, beginning with the element x_1. At each stage, ask whether the element x_i then under consideration is included in (or equal to) an

infinite number of elements of X. If so, include x_i in P and go on to consider the next element in the order which includes (or is equal to) x_i. If not, include x_i in Q and go on to consider the next element in the order which includes no earlier element of Q (and which is not already in Q). At each stage only a finite number of elements of Q have been chosen, and each has been chosen so that it is included in (or is equal to) only a finite number of elements of X. Therefore an infinite number of elements which include no earlier element of Q (and which are not already in Q) must remain at each stage, and the process can be continued indefinitely.

Q is an independent set on an alphabet which does not contain the letter s; it is therefore either an (n-1, k') i.i. set or finite. Accordingly, P can be assumed infinite.

The elements t_i = x_i s y_i of T for which x_i is not a member of P may be discarded. The remaining members of T can be ordered according to the order of the elements in P; then if t_i = x_i s y_i appears before t_j = x_j s y_j, either x_i = x_j or x_i ◁ x_j. Here the words y_i must all be distinct, for otherwise x_i s y_i ◁ x_j s y_j for some i, j, which contradicts the independence of T.

Wherever y_j ◁ y_i, j > i, in the above ordering, remove t_j, leaving a set U. U is independent because it is a subset of the independent set T. Now suppose that U is finite. The set of t_j just removed must then be infinite, and each removed y_j must be included in one of the y_i which remains. The number of words included in each remaining y_i must be finite, since none can be longer than y_i itself. Since the number of remaining y_i is also finite, the infinite number of y_j included in them cannot all be distinct, which is a contradiction. Hence U is infinite.

So far it has been shown that either an (n, k-1) i.i. set, an (n-1, k') i.i. set, or the infinite independent set U constructed above must exist. In conclusion, it is shown that the set Y of all the y_i which appear as tails of elements of U is an (n, k-1) i.i. set. The word y_0 of Y is of length k-1, since w = s y_0 and w was assumed to have length k. If y_i ◁ y_j, i < j, then x_i s y_i ◁ x_j s y_j, which contradicts the independence of T, since already x_i = x_j or x_i ◁ x_j. On the other hand, U was defined so as to exclude the case in which y_i ◁ y_j for i > j. Thus Y is an (n, k-1) i.i. set, and the lemma is proved.

Theorem 2. Every total order which is a refinement of the inclusion partial ordering is a well-ordering.

PROOF: Let < be a total order on Σ* such that x ◁ y implies x < y. If < is not a well-ordering, there exists an infinite descending sequence of words w_1 > w_2 > .... An infinite independent set V = {v_1, v_2, ...} can be selected from the sequence as follows. Let v_1 = w_1. For i ≥ 1, let v_{i+1} be the first word of the sequence which is not included in v_1, v_2, ..., or v_i. Since only a finite number of words is included in any given word, such a v_{i+1} always exists and must precede an infinite number of words of the sequence. No earlier v_i can be included in a later v_j, since v_i ◁ v_j would imply v_i < v_j, whereas v_j appears later in the descending sequence. Thus V is an infinite independent set, contradicting the lemma above, and the hypothesis that < is not a well-ordering must be false.

References

Aris, R. (1963), Discrete Dynamic Programming, Blaisdell, New York.

Bellman, R. E. (1957), Dynamic Programming, Princeton University Press, Princeton, New Jersey.

Dantzig, G. B., Fulkerson, D. R., and Johnson, S. (1954), "Solution of a Large-Scale Travelling-Salesman Problem," Journal of the Operations Research Society of America, Vol. 2, No. 4, pp. 393-410.

Davis, M. D. (1958), Computability and Unsolvability, McGraw-Hill, New York.

Markov, A. A. (1961), Theory of Algorithms, The Israel Program for Scientific Translations, Jerusalem.

Pontryagin, L. S. (1962), The Mathematical Theory of Optimal Processes, Interscience Publishers, New York.
