THE UNIVERSITY OF MICHIGAN COMPUTING RESEARCH LABORATORY1 AN EFFICIENT ALGORITHM FOR CONTAINMENT PROBLEM Vaclav RaJlch CRL-TR-9-84 JANUARY 1984 Room 1079, East Engineering Building Ann Arbor, Michigan 48109 USA Tel: (313) 763-8000'Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the author.

An Efficient Algorithm for Containment Problem Vaclav Rajlich Department of Computer and Communication Science University of Michigan Ann Arbor, MI 48109

Abstract Containment problems are important problems, because verification of specifications can be formally stated as a containment problem. The paper introduces a necessary and sufficient condition for containment of two languages. Disjoint-future automata are defined, and a one-pass algorithm for the containment problem is introduced.

1. INTRODUCTION The containment problems can be characterized in the following way: Let P and Q be automata, and L(P) and L(Q) be languages they accept, respectively, then does L(P) C L(Q)? There is considerable literature which deals with containment problems; for overview, see [1,3]. It should be noted that the containment problems play a very important role in the context of the so-called specifications and top-down program design. In that situation, the behavior of data structures, subsystems, modules, etc. is specified by a specification language. The language specifies what is a correct sequence of data structure access, or what is a correct sequence of procedure calls, etc. This is particularly important in topdown program design [5,6], where the only thing known about lower modules may be the specifications, i.e. the sequencing of the procedure calls. When we want to debugg the design, then we have to rely on specifications as the only aid. For an overview of regular languages as specification languages, see [2,9]. To prove that a program satisfies the specifications means to prove that the set of all traces of that program is a subset of the specification language, i.e. it means to solve the containment problem. However in their full generality, containment problems are difficult problems; even for nondeterministic finite-state automata, the problem is PSPACE complete and hence it is intractable. Therefore it is imporant to find special cases which make the problem easier to solve. One of these cascs was investigated in [7], where it was proved that containment of regular languages in deterministic context-free languages is solvable in polynomial time. Results of this kind are important because the specifications are usually given as deterministic automata, while the analyzed program is given in its uninterpreted form, formally defined as a nondeterministic automaton.[8] 3

In this paper, we shall deal with a containment problem in situations where specifications are given as deterministic automata, and derive a necessary and sufficient condition for containment in the situations. In cases of disjoint-future automata, this condition leads to a linear one-pass algorithm for the containment problem Section 2 of the paper contains all relevant definitions and observations. Section 3 contains the main result of the paper. Section 4 contains corollaries, including the onepass algorithm for disjoint-future automata. 2. DEFINITIONS Let us start this chapter with a definition of automaton: Definition 2.1 Automaton is a 5-tuple P-= < A, K, f, 8, M>, where A is an alphabet, i.e. a set of symbol8, K is a (possibly infinite) set of states, f: K x A - 2K is transition function, 8 E K is starting state, and M C K is set of final states. The transition function can be extended in the following way: Definition 2.2 Let X denote empty word, A' denote all words over A. Whenever no ambiguity arises, denote singletons as a- {a). Then f can be extended to f: K X A _ 2K in the following way: flg,X) - g, and for a E A, Ag,w. a) = { pi there exists r E fq,w) such that p E AJr,a) } We say q is reachable from r (in automaton P), denoted r < q, iff there exists w E A' such that q E Ap,w). Note that relationship < is reflexive and transitive. If

P is obvious from the context, the same relationship will be denoted by <. If N is a set of states, then p < N means there exists r E N such that p __ r, and N < p means there exists r E N such that r < p. In the following definition, we shall define several notions: Definition 2.3 Let P= < A, k, f, 8, M > be an automaton. Then act of traces (which bring P from G C K to H C K)is defined as T(P,G,H) = { ti p E G, q E Ap,t), and q E H } Set of atates reached by P from G C K and traces T CA* is defined in the following way: If there is gE G, t E T such that fg,t) = 4,then K(P, G, T)=,,otherwise K(P,G,T)- { qj p E G, t E T, and q E Ap,t)) Language accepted by P is defined as L(P) = T(P,s,M). Future of state p E K is defined as F(P,p) = T(P,p,M). The following lemma captures a property of the previous notions: Lemma 2.1 Let P= < A, K, f, 8, M > be an automaton, and let 8 < q < M. Then for every v E F(P,q) and m E T(P,8,q), m v E L(P). Proof obvious.

In the proof of the main result, we shall use the following notion: Definition 2.4 Automaton P= < A, K, f, 8, M> is prefixed iff for every state q E K, every input a E A, 8 f q,a). In the following lemma, we show a property of prefixed automata: Lemma 2.2 For a prefixed automaton P = < A, K, f, >, M P,,,s) X. Proof: Obvious. An important subclass of automata are the deterministic automata of the following definition: Definition 2.5 Automaton P — < A, K, f,, M > is deterministic iff for every r E K, a E A, there exists at most one q such that q E f(r,a). For deterministic automata, we have the following lemma: Lemma 2.3 Let P = < A, K, f, 8, M> be a deterministic automaton, and let r e K, and v, mE A',,(s,m) = r. Then if Ar,v)=, then m' v f L(P). Proof For v = X, the lemma is vacuously true. For v 7y X, and m = X, the lemma is obviously true.

For induction step, suppose the lemma is true for m and v = a d, where a E A. Assume Ar, a)?= q, then.(q,t )=, and m * a t/d L(P) by the induction assumption. 3. THE MAIN RESULT This section contains the main results of the paper and its proof. Theorem 3.1 Let P = < A, K, f, 8, M > be an arbitrary automaton, and let Q = < A, KQ, fQ, 8Q, MQ > be a deterministic automaton. Then L(P) C L(Q) iff for every q for which 8 < q < M, 7 r F(P,q) C { w for every r E K(Q, 8q,T(P,s,q)), w E FQ,r)} Proof will be done in three steps. In the first step, we will prove that if the Theorem 3.1 is true for prefixed automata, then it is true for all automata also: (i) We will construct prefixed automata P, Q' in the following way: Let b A, 8 K,then P =< AU{b}, KU{ ),f', 89,M> where for every p E K, a E A, f' (p,a) = Ap,a) and f' (8,b) = 8. Let 8 Q KQ, then Q = < AU {b},KQU {8 Q},f' Q,8 Q, MQ> 7

where for every q E KQ, a e A, f' Q(q,a) = fQ(q,a), and f' Q(Y Q,b) = 8Q. Now let us assume that Theorem 3.1 is true for the prefixed automata, i.e. we will suppose that it is true for PI and Q. Take any qg 8, then FP',q) C {( for every r E K (Q',', T(P,, q)), w F (Q',r)} Then the following equalities are easy to verify: F1P,q)= rFP,q), F Q',r) = FQ,r), (1P,,,q) = b T(P,s,q), K(Q',, q, b T(P,8,q))= K(Q,eq,, T(P,s,q)) and this proves step (i). (ii) Suppose that for every q for which 8 < q < M, F(P,q) C {Ium for every r E K(Q, 8o, T(P, s,q)), w E F(Q,r)} -P T We will prove that L(P) C L(Q). Obviously from Definition 2.3, L(P) = F0P,s), and by the assumption, F(P,q) CI {u4 for every r E K(Q, 8Q, T(P, 8s,8)), w F(Q,r)) Then: T(P, 8, 8)= by Lemma 2.2, K(Q,sQ,X) = 8, and FIQ,Q)= L(Q), which proves this implication. (iii) In this step, we will prove the opposite implication. Proof will be done by contradiction. 8

Suppose L (P) C L(Q) and there exists q, for which s <p q <,p M and kI'l,q) ~ uwI for every r E K(Q,SQ, (11,,,q)), w clFlQ,r)}. This means that there exists v F(P,q) and r E K(Q, 8Q, T(P,s,q)) such that v F(Q,r). Let mE 7TP,s,q) such that fQ(8Q,m) = r. Then by Lemma 2.1, m v E L(P). However by Lemma 2.3, m. v 6 L(Q), which is the contradiction with the assumption. 4. DISJOINT-FUTURE AUTOMATA The Theorem 3.1 can be used for speed-up of containment algorithms. As an example, we will consider a special class of automata, the so-called disjoint-future automata of the following definition. Definition 4.1 Let P = < A, K, f, s, M > be an automaton. It is a diajoint-future automaton iff it is deterministic and r, q E K, r y q implies F(P,r) n FP,q)=. The definition is illustrated by the following example: Example 4.1 Consider the automaton P = < A, K, f, 8, M > when K is a set of all nonnegative integers, M= 8 = 0, A= <, > },and for every r2O,j - 1, f(i, < )=i+l,f(j, > ) = j-1. Then for i 4 j, F(P,i) nf F(P,j) =, and hence the automaton is disjointfuture. L(P) consists of all well-formed bracket expressions like <>, <<><>><>,etc.

For disjoint-future automata, we have the following corollary: Corollary 4.1 Let P- < A, K, f, 8, M > be an arbitrary automaton, and let Q = < A, KQ, fQ, sQ, MQ > be a disjoint-future automaton. Then L(P) C L(Q) iff for every q for which 8 < q < M, there exists a unique r E K(Q, 8Q, T(P,8,q)). Proof Suppose z 3 y and x, y E K(Q,sQ,T(P,8,q)), then by Theorem 3.1, F(P,q)= 4, and hence it is not true that 8 < q < M. On the basis of Corollary 4.1, we will define the following algorithm: Algorithm 4.1 Let P = < A, K, f, s, M > be an arbitrary automaton, and let Q = < A, KQ, fQ, 8Q, MQ > be a disjoint-future automaton. Suppose P is represented by a graph with K being the set of nodes, and f being represented by labeled arcs: if q E f(p,a), then arc <p,a,q> will be in the graph. Then the containment problem L(P) C L(Q) can be solved in the following way: 1. Find all reachable states of K, i.e. all states q such that there exists t C A' such that q E f8,t). Label each state q by a corresponding state rE KQ such that r = fQ( Q,t) 2. If any reachable state of K is labeled by two or more states of KQ, then L(P) ~ L(Q), otherwise L(P) C L(Q). The algorithm is in fact a variant of marking algorithm [4], with additional labeling of nodes (i.e. states) by states of KQ. It's complexity is 0(m) where m is the number 10

of tile arcs in the graph. Note that the algorithm is one-pass, because each arc will be traversed at most once. 5. REFERENCES [1] Hopcroft, J.E., Ullman, J.D., Introduction to Automata Theory, Languages, and Computation, Addison-Wesley, Reading, MA. [2] Horejs, J., "Finite semantics: a technique for program testing," Proc. 4th International Conference on Software Engineering, (1979) pp.433-440. [3] Hunt, H.B.III., Rosenkrantz, D.J., Szymanski, T.G., "On the equivalence, containment, and covering problems for the regular and context free languages," J. of Computer and System Sci. Vol.12, (1976), pp.222-268. [4] Knuth, D.E., The Art of Computer Programming Vol. 1: Fundamental Algorithms, Addison-Wesley, Reading, MA 1968. [5] Rajlich, V., "Stepwise refinement revisited," to be published in The Journal of Systems and Software, Vol.4, No.l, (1984). [6] Riddle, W.E., Wileden, J.C., Sayler, J.H., Segal, A.R., Stavely, A.M., "Behavior modeling during software design," IEEE Trans. Software Engineering, Vol. SE-4, 283-292. [7] Rosenkrantz, D.J., Hunt, H.B. III, "Polynomial algorithms for deterministic pushdown automata," SIAM J. Comput., Vol.7, (1978), pp.405-412. [8] Rutlege, J.D., "On Ianov's program schemata," J. of the A CM, (1964), 1-9. [9] Shaw, A.C., "Software specification languages based on regular expressions," in Software Development Tools, Riddle, W.E., Fairley, R.E., eds. SpringerVerlag, New York (1980), pp.148-175. 11

lllllli l. 2l79lllllll lll 111111 90151 03695111 5279'11.111111111