THE UNIVERSIT Y OF MI C HIGA N COLLEGE O' LITERATURE, SCIENCE, AND THE ARTS Department of Philosophy Technical Note THE STAR-HEIGHT OF REGULAR EXPRESSIONS L. C. Eggan ORA Projects 03105 and 04422 under contract with: DEPARTMENT OF THE NAVY OFFICE OF NAVAL RESEARCH CONTRACT NO. Nonr 1224(21) WASHINGTON, D.C. and OFFICE OF THE CHIEF SIGNAL OFFICER RESEARCH AND DEVELOPMENT DIVISION CONTRACT NO. DA-SIG-36-039-61-G4 WASHINGTON, D.C. administered through: OFFICE OF RESEARCH ADMINISTRATION ANN ARBOR March 1962

-e n C -. - A rA (A ^ Q(,^

lo INTRODUCTION* Kleene3 was the first to introduce the concept of regularity for expressions and sets of wordso Others, including Copi, Elgot and Wright, Myhill5 and McNaughton and Yamada4 have discussed this topic and its relation to finite automata. We refer the reader especially to Refo 1 for a presentation similar to ours. Let g be a finite set of objects, say q = (A1,A2,.o, An). Let ( be the free semigroup with identity generated by ~ where we write the operation multiplicatively and denote it by juxtaposition. We denote by Am the element.' Let 9 denote the identity element, so eAi = Aie = Ai for all m times i = 1,.o.n. We will also call 67 the set of words on the alphabet 6, and hence an element of< a word (on the alphabet A). Let S=4U ( A,O }, where A and ~ are distinct objects not in A. Let "V" and "*" be associative binary operations and "*" a unary operation and define = (,V9,,*) to be the free algebra generated by zwith operations V, and *o Thus aJ, and aC wAe implies oVE, a -c and a*e (. We call any element of G a regular expression (on 8). As usual, we shall write a( for acO-. We next define the concept of regular set (or regular event). Define the mapping from ( into 2 inductively as follows: (we read icI as "the set denoted by a") *This research was supported by funds supplied by the Office of Naval Research through Contract No. Nonr-1224(21) and the Office of the Chief Signal Officer through Contract Noo DA-SIG-56-039-61-G4c 1

(i) Ai = A=, i =,..., n, IA = ~, 1l = (e). (ii) For a,EGo, ivWi1 1a IU iX1, Ia=0 = (xy; xe al1, yE 13, a*1 = lo-IL UaUl lljaaIc0LJ... Thus6 = (?* I = I(A1VA2V...VAn)*I, so we generally write (1* for the semigroup with identity generated by the set Ci. We define any subset of 0( to be an event. We say that an event E is regular in case there exists a regular expression ae ( such that I I = E. Thus the class of regular events (or regular sets) is the range of the function I just defined. It is well known ([1, 3, 4, 51) that the class of regular events is a Boolean Algebra of sets. Finally we define the concept of *-height (read: star-height). We define the function h: 6( Z, (where Z denotes the integers) inductively as follows: (1) For sEi, h(s) = 0 (2) For a,, o, h(cro) = h(oVco) = max (h(o), h(LD)) h(o*) = h(a)+l. For example: h(Aj*(A2VA3*)) = 2, h(Ai*V(AL*A2VA3)*) = 2. We call h(o) the *-height of a. If S is a regular event, we define the *-height of Z by h(Z) = min h(a): 1(7 = Z). 2

It is clear that for each integer n there is a regular expression of *-height n. Our main result is that for each integer n, there is a regular event of *-height n. We, in fact, show somewhat more; namely, that if a regular event Z contains words of a certain type then h (Z) n. It should be mentioned that there is some hope of relating the *-height of the behavior of a finite automata or nerve net to the number and complexity of its feedback cycles (cf., Ref. 1 for terminology, and also cf., Ref. 2). 2. PRELIMINARIES In this section we introduce some notation and conventions and prove some preliminary results Let A1, A2,..., An, be an infinite sequence of distinct letters. Let Jk = (Ai, A2,..., Aj] where j = 2k-1, and let Tfk = (AJ+l,..o, A2j} so ^k^U iJ k+l \ = wk+-1~ Xwill denote a finite alphabet, but we shall presume thatSk( whenever we speak of. Thus although Xmay become arbitrarily large, at any point in our discussion it will be finite. All sets and expressions considered will be presumed to be regular subsets of 2* or regular expressions on. (Recall J* denotes the semigroup with identity e generated by ao) Small Greek letters will denote regular expressions, capital Greek letters the corresponding regular set, lower case Roman letters will denote words and upper case Roman letters will denote letters of the alphabet. If ZEcx, let Z = fae2*, sxyxayez) 3

i.e., Z is the set of all subwords of words of Z. Let h(a) and h Z be the *-heights of the expression a and the set Z, respectively. We define the properties k, k = 1, 2,.., on the class of regular sets as follows~ Let'be a regular seto Z has property $1 (relative to the letter A-) in case (Vn)( (H >n) ( ). Thus Z has property )i on the letter Al if Z contains arbitrarily long strings of the letter A1 as subwords. If a word a contains A1 for some m >n, then we say that a is a 1-word for exponent n (on the letter Al). We proceed by inductiono Suppose that u is a (k-l) word for exponent n on -Jk-i and v is a (k-l) word for exponent n on ck-1 and let j = 2 -1. Then any word of the for.m (uvAj)r for m >n is a k-word for exponent n (on )k). We say that Z has property k relative to k in case (i) Z contains k-words for arbitrarily large exponent (on/k), (ii) Z has property Ok-1 relative to each of the setski andlk1_ and (iii) for Ap kE l Aqek fk ApAq4.. Thus, for example, Z has property $2 on 2 in case (Vn)(mil, m2, m3 > n) mi m2 ms m Ai A2 A) A C and A2Ai. )C We remark here that it is clear that if Z contains no k-subword for exponent m, then Z = 1o'2;ac12Ej contains no k-subword for exponent 2m. We next state two lemmas, the first of which follows from the fact that concatenation distributes over union. Lemma 1. If h () = n, then there is a regular expression a which denotes Z such that = 7Vy7V... VY, 12 m

where each yi is of the form xl x2 2 o- Xs s Xs+ls (1) where xieJ* and 0< h(ai)< n-l, for each i. Lemma 2. If 1ylVy2V.o Vm I has property ok' then for some j, \7j has property ~k. If * * has property k. If Ixl a.. x s s Xs+ll has property Ok' then so does some I* * Proof. The first statement is clear. Let y = XiOl.o. xsasxs+1. If we suppose that y I has $k, then clearly each IalI satisfies (ii) and (iii) for the definition of (k. Now each of the x.'s is a word and hence of finite length. Thus, since 171 contains arbitrarily long subwords of a certain type, i.e., IY! contains k-subwords for arbitrarily large n, at least one of the ICaj must also have this property. Definition. Let Pk be the following proposition: If Q is regular and if Q satisfies Ok, then h(s) >k. We shall show by induction in the next section that Pk is true for all k. To simplify this proof, we now prove a lemma. Lemma 3. Let k,1 be an integer and suppose that proposition Pk is true. If h(L) k-l and if Z* has Ok relative to k' then Z contains a word on the letters of /k. Proof. Since Pk is true9 there clearly exists an N such that Z contains no k-subword for exponent no (Note that all k-words are relative to k' ) Recall that by a previous remark E contains no k-subword for exponent 2N. Suppose now that E contains no word onko. Then any word of Z, and hence 2, which contains the k-subword t is of the form xty where at least one of x and y contains a letter not iniXk. If further xtyeZ and t is for exponent N, 5

then both x and y contain a letter not in k. This follows since xty must be of the form.x xtlty' where x"t tm tyL and x' and y' contain letters not in kd. Thuss also contains nc k-subwor fQor exponent 2N. By induction we obtain that, for all n, ZL contains no k-subword for exponent 2N so A* also has this property. But this contradicts the fact that A* has.k' 3. THE MAIN RESULT In this section we provee the main result and give some examples. Theorem. P. is true for all positive integers k. Proof The proof is by indauction If Q satisfies $~, then clearly Q is infinite so h(Q) >1 and this establishes P1o Suppose then that Pk is trueo Let Q have property k+l relative to k+l-, but suppose h(Q) < k. Since Q also has $k' we have that h(Q) = k. Let a decnote the set Q. 3y Lermna 1 we may assume CD = y V.. Vm where each 7. has the form (1) L By Lemma 2, some y., say y1, has k+io Let yl = xlCl.. x a% xsl where each x.e) and h(. i)< k-l. By Lemma 2, some;,* say dr, has property Ok+ l Hence ar has k relative to k and also $k relative to Ck. Thus by Lemma 39 since h.( jat ) h(c )< k-], we have that la r contains a word, say a, on the letters of. and jar also contains a word, say b, on the letters of Tk' Then bae c Ia so there exists Ape k and A k such that ApAq jal i But this contradicts condition (iii) of property k+l' Therefore h(Q) >k+lo This completes the proof of the theorem. Corollary 1. For any positive integer k, there exists a regular set of *-height ko 6

Proof. Let 1 = Al, let 2 = (A1AA3), and let'y = A2, 72 = (AA~A6). Then define Pk and 7k inductively by Pk (ok-l 7k-l A2k-1) and k Tk is obtained from 3k by adding 2 -1 to each subscript. Thus k is on the letters of Tk. The sets 13kl clearly satisfy "k. We would next like to note one obvious generalization of property Ok for which Pk is also true.* We may replace the occurrence of the Aj in the definition of a k-word by a word, say bk, on the letters of Af, provided of course condition (iii) is not violated. Thus, for example, we would have the following 3-word for exponent 5 (A(A 2(A3AlA3 ))6(A4AA6A4A6AsA6 1oo(AA6A7AiA2) and a typical expression would be (AA2b2) (A4Ac2) 3) Note that (iii) requires the words bk and ck to contain the letters Aj and A2j, respectively, where j = 2k-1. Before proving the final corollary, we prove a lemma which may be of some independent interest. Recall that regular events form a Boolean algebra. We use ~ and () to denote relative complement and intersection, respectively. Lemma 4. Let a and 3 be regular expressions. Then I((OvA)* l-Io*il = 1(c0*+0*)*{ (2) if and only if I(a*ctac*)*nli10 tI = t. (3) *It is also clear that by an appropriate coding one can obtain a regular set of arbitrarily high *-height on the two letter alphabet (0,1). 7

Proof. (2) -(3) is trivial. Suppose that (3) holds. Then clearly I(0*3*)* I-I — I(~V)* Ic C* 1. Let the word a be in I(aV3)* 1~lcO* I. Then a must be of the form alba2 where al, a2e I(caO)* I and be I. But clearly such a word alba2e I( *a * )* |. Corollary. If a and. are regular expressions, if j lo and if no element of I13 is a subword of a word of acl, i.e., if Ijrflac = $, then I(ov)* l *laa\ = We shall call an application of a star to a regular expression a (or n'* regular event 7) non-trivial in case for all positive integers n, Z f Z, where = la(. We now know by Corollary 1 that there are expressions for which the application of a star is non-trivial. However, as the next corollary shows, an application of the star may be non-trivial and yet not raise the *-height of the regular set. Corollary 2. For every positive integer k, there is a regular event E, of star height k for which the application of the star is non-trivial, but such that Lk also has height k. Proof. For k = 1, let al = A1A2 in which case I1(l = I(A1VA)*A2Vel. For k > 1, let Pk and. k be as in Corollary 1, and let j = 2 -1. Define F+1 = *k+l I - * and let =k = k3kAj VkYkAjV kPk7k7kAj so that h(Ek+l) k+l, by the theorem, and h(3tk) = k. Now I3k+l = 1kykAj I = IAjV VkIkAjY kAjVkkkkAj = IAjVTk I (simply replace Ok and yk by (EV^kk) and (OV7k7k) and multiply out). Thus by applying the Corollary to Lemma 4, 8

we obtain lik+l IA^AjJ: I = (Aj )kj e Hence =k+1 I K |AAI = I(AjkAj) v'Aj I and h (AJAj) VA) = k+l so h'l ) k+l. Note that the example KL given in this Corollary has the properties: (1) Z j for i j and (2) For any regular event ~, h( <) < h(Zi)_h # io The first follows simply because every word in TJ contains exactly j occurrences of the letter Ao2 The second follows because 21 satisfies O1 for the letters Al and A2 so that if h(2) = 0 and Q = ZL- then 2 must contain a word on the letter Al, by Lemma 3, which is a contradiction~ Thus we have an example of an event Z of height 1 which satisfies (2). I do not know of any examples of sets of greater height which also have this property. 9

REFERENCES 1. Irving M. Copi, Calvin C. Elgot and Jesse B. Wright, "Realization of Events by Logical Nets," J. Assoc. Comp. Mach., Vol. 5 (1958) pp. 181-196. 2. John Holland, "Cycles in Logical Nets," J. of the Franklin Institute, Vol. 270, No. 3, September 1960, pp. 202-226. 3. S. C. Kleene, "Representation of Events in Nerve Nets and Finite Automata," Automata Studies, (edited by C. E. Shannon and J. McCarthy), Princeton University Press, Princeton, New Jersey, 1956, pp. 3-41. 4. Robert McNaughton, and H. Yamada, "Regular Expressions and State Graphs for Automata," IRE Transactions on Electronic Computers, March 1960, pp. 39-47, Vol. EC 9, No. 1. 5. John Myhill, "Section V: Finite Automata and the Representation of Events," from: Fundamental Concepts in the Theory of Systems, WADC Technical Report WADC TR 57-625, May 1957. 10

DISTRIBUTION LIST (One copy unless otherwise noted) Asst. Sec. of Def. for Res. 2 Bureau of Ships and Eng. Department of the Navy Information Office Library Br. Washington 25, D-C. Pentagon Building Attn: Code 671 NTDS Washington 25, D. C. Bureau of Naval Weapons Armed Services Tech. Infor. Agcy. 10 Department of the Navy Arlington Hall Station Washington 25, D.C. Arlington 12, Virginia Attn: RAAV Avionics Division Chief of Naval Research 2 Bureau of Ships Department of the Navy Department of the Navy Washington 25, D.C. Washington 25, D.C. Attn: Code 437, Inf. Syst. Br. Attn: Communications Br. Code 686 Chief of Naval Operations Naval Ordnance Laboratory OP-07T-12 White Oaks Navy Department Silver Spring 19, Maryland Washington 25, D.C. Attn: Technical Library Director, Naval Res. Lab. 6 David Taylor Model Basin Tech. Inf. Officer/Code 2000 Washington 7, D.C. Washington 25, D.C. Attn: Technical Library Commanding Officer, Officer of 10 Naval Electronics Laboratory Naval Research San Diego 52, California Navy. No. 100, Fleet P. O. Attn: Technical Library New York, New York University of Illinois Commanding Officer, ONR Br. Office Control Systems Laboratory 346 Broadway Urbana, Illinois New York 13, New York Attn: D. Alpert Commanding Officer, ONR Br. Office University of Illinois 495 Summer St. Digital Computer Laboratory Boston 10, Massachusetts Urbana, Illinois Attn: Dr. J. E. Robertson Office of Technical Services Technical Reports Section National Security Agency 3 Department of Commerce Fort Geo. G. Meade, Maryland Washington 25, D.C. Attn: Howard Campaigne

Technical Information Officer Massachusetts Insto of Technology US Army Signal Research and Dev. Lab. Cambridge, Massachusetts Fort Monmouth, New Jersey Attn: D. W. Baumann Attn: Data Equipment Branch Dynamic Analysis and Control Lab. U. S. Naval Weapons Laboratory Burroughs Corporation Dahlgren, Virginia Research Center Attn: Head, Compution Div., Paoli, Pennsylvania G. H. Gleissner Attn: R A Tracey National Bureau of Standards Hermes Incorporated Washington 25, D.C. 75 Cambridge Parkway Attn: Dr. S. N. Alexander Cambridge 42, Massachusetts Attn: Mr. Reuben Wasserman Aberdeen Proving Ground, BRL Aberdeen Proving Ground, Maryland Lockheed Missiles and Space Div. Attn: J. H. Giese, Chief Compution Lab. 3251 Hanover Street Palo Alto, California Office of Naval Research Attn D. G. Willis Resident Representative University of Michigan University of Michigan 146-B Temporary Classroom Bldg. Ann Arbor, Michigan Ann Arbor, Michigan Attn: Depto of Philosophy, Prof. A. W. Burks Commanding Officer ONR, Branch Office Census Bureau John Crerar Library Bldg. Washington 25, D.C. 86 East Randolph St. Attn: Office of Asst. Director for Chicago 1, Illinois Statistical Services, Mr. J. L. McPherson Commanding Officer ONR, Branch Office National Science Foundation 1030 E. Green Street Program Dir. for Documentation Res. Pasadena, California Washington 25, D.C. Attn: Helen L. Brownson Commanding Officer ONR, Branch Office University of California 1000 Geary Street Los Angeles 24, California San Francisco 9, California Attn: Dept. of Eng.,. Prof. Gerald Estrin National Bureau of Standards Washington 25, D.C. Columbia University Attn: Mr. R. D. Elbourn New York 27, New York Attn: Dept. of Physics, Naval Ordnance Laboratory Profo Lo Brillouin Corona, California Attn: H. H. Weider

Hebrew University Wright Air Development Division Jerusalem, Israel Electronic Technology Laboratory Attn- Profo Y. Bar-Hillel Wright Patterson Air Force Base, Ohio Attn: WWRNEB Computer and Bionics Br. Massachusetts Insto of Tech. Cal iiLc].ge, MassachuZetts Laboratory for Electronics, Inc. Attn~ Prof. Wo McCulloch 1079 Commonwealth Ave. Boston 15, Massachusetts Atomic Energy Commission Attn, Dr. H. Fuller Washington 25, D Ca Attn: Div of Research, John Ro Pasta Stanford Research Institute Computer Laboratory Naval Research Laboratory Menlo Park, California Washington 25, D C. Attn: H. D. Crane Attn: Security Systems Code 5266, Mr G. Abraham General Electric Company Schnectady 5, New York Cornell University Attn: Library, L.M.E. Dept., Dept. of Mathematics Bldgo 28-501 Ithaca, New York Attn' Profo Mark Kac The Rand Corporation 1700 Main Street Dro Ac M. Uttley Santa Monica, California National Physical Laboratory Attn: Numerical Analysis Dept. Teddington, Middlesex Willis H. Ware England Hunter College Diamond Ordnance Fuze Laboratory New York 21, New York Connecticut Ave- and Van Ness St. Attns Dean Mina Rees. Washington 25, D oC ORDTL-012, Eo W Channel General Electric Research Laboratory P. O. Box 1088 U.So Army Signal Res. and Dev. Lab. Schenectady, New York Fort Monmouth, New Jersey Attn, Information Studies Section Attn: Mo Tenzer R. L. Shuey, Manager Harvard University Radio Corporation of America Cambridge, Massachusetts Moorestown, New Jersey Attno School of Applied Science Attn: Missile and Surface Radar Div. Dean Harvey Brook Sidney, Caplan The University of Chicago University of Pennsylvania Institute for Computer Research Institute of Cooperative Research Chicago 37, Illinois Philadelphia, Pennsylvania Attn: Mro Nicholas Co Metropolis, Diro Attno Dr~ John O'Conner

Stanford Research Institute National Biomedical Res. Found. Inc. Menlo Park, California 8600 16th St., Suite 310 Attn: Dro Charles Rosen Silver Spring, Maryland Applied Physics Group Attn: Dr. R. S. Ledley Northeastern University National Bureau of Standards 360 Huntington Avenue Washington 25, D. C. Boston, Massachusetts Attn: Mrs. Frances Neeland Attn: Prof. L. 0. Dolansky University of Pennsylvania Marquardt Aircraft Company Moore School of Elec. Eng. 16555 Saticoy Street 200 South 33rd Street P. 0. Box 2013 - South Annex Philadelphia 4, Pennsylvania Van Nuys, California Attn: Miss Anna Louise Campion Attn: Dr. Basun Chang, Res. Sci. Texas Technological College Varo Manufacturing Company Lubbock, Texas 2201 Walnut Street Attn: Paul G. Griffith Garland, Texas Dept. of Electrical Engineering Attn: Fred P. Granger, Jr. IBM Corporation Data Processing Systems Staff Military Products Division Department of State Owego, New York Washington 25, D.C. Attn: Dr. S. Winkler Attn: F. P. Diblasi Post Office Department Dr. Saul Gorn, Director Office of Res. and Engo Computer Center 12th and Pennsylvania Avenue University of Pennsylvania Washington 25, D. C. Philadelphia 4, Pennsylvania Attn: Mr. R. Kopp, Res. and Develo Div. Applied Physics Laboratory Air Force Research Division, ARDC Johns Hopkins University Computer and Mathemathb;::1al Sci. Lab., ARCRC 8621 Georgia Avenue L. G. Hanscom Field, CRBB, Bedford, Mass. Silver Spring, Maryland Attn: Dro H. H. Zschirnt Attn: Supervisor of Tech. Rpts. Office of Chief Signal Officer Chief, Bureau of Supplies and Department of the Army Accounts, Navy Department Washington D.C. Washington, DoC. Attn; R and D. Div. SIGRO-6D Attn: CDR. J. C. Busby, Code W3 Mr L. H. Geiger Auerbach Electronics Corporation Bell Telephone Laboratories 1634 Arch St. Murray Hill Laboratory Philadelphia 3, Pennsylvania Murray Hill, New Jersey Attn: Dro Edward Fo Moore

National Aeronautics and Space Lincoln Laboratory Administration Massachusetts Institute of Technology Goddard Space Flight Center Lexington 73, Massachusetts Greenbelt, Maryland Attn: Library Attn: Chief, Data Systems, Dive C.V.L. Smith Dr. C. R. Porter Psychology Department Federal Aviation Agency Howard University Bureau of Research and Development Washington 1, D.C. Washington 25, D oC Attn: RD-375, Mro Harry Hayman Electronics Research Laboratory University of California Mr. Donald F. Wilson Berkeley 4, California Code 5144 Attn: Director Naval Research Laboratory Washington 25, D.C. Institute for Defense Analysis Communications Research Division David Taylor Model Basin Von Neumann Hall Washington 7, D.Co Princeton, New Jersey Attn: Aerodynamics Laboratory, Code 628, Miss Cravens George Washington University Washington, DoC. Attn: Prof. N. Grisamore

UNIVERSITY OF MICHIGAN 3 9015 02539 6824