THE UNIVERSITY OF MICHIGAN COMPUTING RESEARCH LABORATORY¹

A ZERO-ONE LAW FOR LOGIC WITH A FIXED-POINT OPERATOR

Andreas Blass, Yuri Gurevich and Dexter Kozen

CRL-TR-38-84

September 1984

Room 1079, East Engineering Building
Ann Arbor, Michigan 48109 USA
Tel: (313) 763-8000

¹Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the funding agency.


A ZERO-ONE LAW FOR LOGIC WITH A FIXED-POINT OPERATOR

Andreas Blass
Dept. of Mathematics
University of Michigan, Ann Arbor, MI 48109

Yuri Gurevich
Dept. of Electrical Engineering and Computer Science
University of Michigan, Ann Arbor, MI 48109

Dexter Kozen
IBM Research
Yorktown Heights, NY 10598

Abstract. The logic obtained by adding the least-fixed-point operator to first-order logic was proposed as a query language by Aho and Ullman [AU] and has been studied, particularly in connection with finite models, by numerous authors [CH,Gu,Im,Va]. We extend to this logic, and to the logic containing the more powerful iterative-fixed-point operator, the zero-one law proved for first-order logic in [GKLT] and [Fa]: for any sentence φ of the extended logic, the proportion of models of φ among all structures with universe {1, 2, ..., n} approaches 0 or 1 as n tends to infinity. We also show that the problem of deciding, for any φ, whether this proportion approaches 1 is complete for exponential time if we consider only φ's with a fixed finite vocabulary (or vocabularies of bounded arity), and complete for double-exponential time if φ is unrestricted. In addition, we establish some related results.

Introduction

Many statements about finite structures satisfy the following zero-one law: consider the probability that the statement holds for a structure with universe {1, 2, ..., n} and relations chosen at random; this probability approaches either 0 or 1 as n tends to infinity. For numerous examples, see [BH] and the references cited there. Glebskii et al. [GKLT] and, independently, Fagin [Fa] showed that every first-order sentence satisfies the zero-one law. Grandjean [Gr] showed that the problem of deciding which of the two limit values is correct for a given first-order sentence is PSPACE complete. (We state these results precisely and review their proofs in §1.) Kaufmann and Shelah [KS] have shown that the zero-one law is violated badly within monadic second-order logic.

We extend the zero-one law to sentences in the logic obtained by adding to first-order logic the least-fixed-point operator studied in [AU,CH,Im,Ko,Va] or the more powerful iterative-fixed-point operator [Gu,Li]. We show that any formula in these extended logics is equivalent, in random structures (i.e. with probability approaching 1 as the structures get larger), to a first-order formula. This result, which immediately implies the zero-one law, contrasts with the well-known fact that the least-fixed-point operator greatly increases the expressive power of first-order logic. Contrary to what one might expect, our equivalence result does not allow us to transfer PSPACE completeness of the theory of random structures from first-order logic to the logics with fixed-point operators. The difficulty is that the translation process, from the extended logics to first-order logic, can vastly increase the length of formulas. This difficulty cannot be overcome without proving PSPACE = EXPTIME, for we show that the decision problem for the theory of random structures in logic with a fixed-point operator is EXPTIME complete.
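The limiting behavior described here is easy to observe experimentally. The following Python sketch is ours, not from the paper; the property names, the single binary relation, and the sampling scheme are illustrative assumptions. It estimates the proportion of structures on {0, ..., n-1} satisfying a property, with the relation chosen uniformly at random:

```python
import random

def random_structure(n):
    # one binary relation E on {0,...,n-1}; each ordered pair is in E
    # independently with probability 1/2
    return {(a, b) for a in range(n) for b in range(n) if random.random() < 0.5}

def fraction(phi, n, samples=2000):
    # Monte Carlo estimate of the proportion of structures with universe
    # {0,...,n-1} that satisfy phi
    return sum(phi(random_structure(n), n) for _ in range(samples)) / samples

exists_edge = lambda E, n: len(E) > 0                        # almost surely true
reflexive = lambda E, n: all((x, x) in E for x in range(n))  # almost surely false

for n in (2, 4, 8):
    print(n, fraction(exists_edge, n), fraction(reflexive, n))
```

Already for modest n the estimates are close to 1 and 0 respectively, illustrating the zero-one behavior for these two first-order properties.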

The structures in the isomorphism class given by the theorem will be called random structures, and their first-order theory will be called RANDOM(σ). It is easy to see that, if σ' is another vocabulary, then RANDOM(σ) and RANDOM(σ') contain the same sentences of the common vocabulary σ ∩ σ', so it makes sense to say that a sentence is in RANDOM without specifying σ.

Proof sketch. To describe Gaifman's axiomatization of RANDOM(σ), we first introduce some terminology. For a finite list v = v₁, ..., v_l of distinct variables, a simple P-formula is an atomic formula, with variables from the list v, and not involving the equality symbol. (Thus, every atomic formula is either a simple P-formula for some P or an equation between two variables.) A complete quantifier-free description for v, or just a v-description, is a conjunction of simple P-formulas and negations of simple P-formulas such that, for every simple P-formula α, exactly one of α and ¬α occurs as a conjunct. If w is a variable distinct from the v_i's, then a v,w-description E extends a v-description D if every conjunct of D is also a conjunct of E. Every such pair D, E (for every v and w) gives rise to one of Gaifman's axioms:

∀v [ (⋀_{i<j} v_i ≠ v_j ∧ D(v)) → ∃w (⋀_i v_i ≠ w ∧ E(v,w)) ]

An easy computation shows that every such axiom holds in almost every structure with universe {1, 2, ..., n}. A back-and-forth argument shows that every two countably infinite models of these axioms are isomorphic. These facts suffice to establish Theorem 1.1 once we observe that no finite structure can satisfy all the Gaifman axioms. (We could avoid this observation by adjoining to the Gaifman axioms the sentences which assert the existence of at least n objects, for each n.)

We record, for future reference, two corollaries of the proof of Theorem 1.1.
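A Gaifman axiom instance can likewise be checked by sampling. The sketch below is our own illustration; the particular v,w-description over a single binary relation is an arbitrary choice. It tests whether some witness w realizes a prescribed extension of the description of one fixed element:

```python
import random

def random_relation(n):
    # one binary relation, each ordered pair present with probability 1/2
    return {(a, b) for a in range(n) for b in range(n) if random.random() < 0.5}

def extension_holds(E, n, v1=0):
    # one Gaifman axiom instance: some w distinct from v1 realizes the
    # v1,w-description  E(v1,w) & not E(w,v1) & not E(w,w)
    return any((v1, w) in E and (w, v1) not in E and (w, w) not in E
               for w in range(n) if w != v1)

# each candidate w works with probability 1/8, so on n = 60 points this
# axiom instance fails with probability (7/8)**59, i.e. essentially never
n, trials = 60, 500
hits = sum(extension_holds(random_relation(n), n) for _ in range(trials))
print(hits / trials)
```

The observed proportion is very close to 1, matching the claim that each axiom holds in almost every structure.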

Corollary 1.2. The theory RANDOM(σ) is ℵ₀-categorical. (Recall that this means that all models of the theory of cardinality ℵ₀ are isomorphic.)

Corollary 1.3. The theory RANDOM(σ) is complete and recursively axiomatized, hence decidable (uniformly in σ).

The decidability result could also be obtained, with a better decision procedure, by an effective elimination of quantifiers. In fact, Grandjean has shown that we can do considerably better yet.

Theorem 1.4 [Gr]. The decision problem for RANDOM(σ) is PSPACE complete (with respect to PTIME reduction).

We remark that the degree of the polynomial that bounds the space required by the decision algorithm in [Gr] increases with the arity of the relation symbols in σ. It is thus essential that we are dealing with a fixed finite vocabulary σ.

Proof. We begin with some preliminary facts. First, the same back-and-forth argument as in the proof of Theorem 1.1 shows that, if M is a random structure and x̄ and ȳ are lists of distinct elements satisfying the same complete quantifier-free description (for a suitable list v of variables), then there is an automorphism of M sending x̄ to ȳ. It follows that, if D(v) is a v-description and x̄ is a tuple of distinct elements satisfying D in some random structure M, then, for any formula δ(v) with free variables among v, x̄ satisfies δ in M if and only if all solutions of D by distinct elements in all random models satisfy δ, i.e., if and only if ⋀_{i<j} v_i ≠ v_j ∧ D(v) implies δ(v) in the theory RANDOM. This implies the following equivalences, which will be crucial for the correctness of our algorithm; for brevity, we write "D(v) →_R δ(v)" for "∀v (⋀_{i<j} v_i ≠ v_j ∧ D(v) → δ(v)) is provable in RANDOM."

D(v) →_R ¬δ(v) if and only if not D(v) →_R δ(v).

D(v) →_R δ₁(v) ∨ δ₂(v) if and only if D(v) →_R δ₁(v) or D(v) →_R δ₂(v).

D(v) →_R ∃w δ(v,w) if and only if either E(v,w) →_R δ(v,w) for some extension E of D, or D(v) →_R δ(v,v_i) for some v_i in the list v.

(The two cases in the last equivalence distinguish whether or not w equals one of the v's.)

The second preliminary fact that we need is a bound on the number of v-descriptions in terms of the length l of the list v, the number m of relation symbols in σ, and the maximum arity r of these symbols. Each relation symbol yields, when we assign it arbitrary tuples from v as arguments, at most l^r simple v-formulas. So there are at most m·l^r simple v-formulas and therefore at most 2^{m·l^r} v-descriptions.

The last preliminary fact is the result of [CKS] that PSPACE is equivalent to APTIME, so it suffices to give an alternating PTIME Turing machine to decide membership of sentences in RANDOM(σ). It is convenient to describe first an alternating PTIME algorithm which decides, for any v, any v-description D(v), and any formula δ(v) with free variables among v, whether D →_R δ. We may assume that disjunction and negation are the only connectives and ∃ is the only quantifier in δ. The algorithm proceeds recursively as follows. If δ is ¬δ₀, the machine enters a negation state and computes whether D →_R δ₀. If δ is δ₁ ∨ δ₂,

the machine enters an existential state with two successors q₁ and q₂; in q_i it computes whether D →_R δ_i. If δ is ∃w δ₀(v,w), the machine enters an existential state with two successors q₁ and q₂. From q₁, it goes through a sequence of ⌈log₂ l⌉ existential states, guessing an i ∈ {1, ..., l}; then it computes whether D →_R δ₀(v,v_i). From q₂, it goes through a sequence of (at most m·l^r) existential states, guessing an extension E(v,w) of D(v); then it checks whether E →_R δ₀. If δ is a simple v-formula, it accepts or rejects according to whether δ or ¬δ is a conjunct of D(v). Finally, an equation between variables is accepted if the variables are the same and rejected otherwise. To decide whether a sentence δ is in RANDOM(σ), we apply this algorithm to decide whether TRUE →_R δ, where TRUE, the empty conjunction, is the unique complete quantifier-free description for the empty list of variables.

Our first preliminary fact shows that this alternating algorithm gives correct answers. Our second preliminary fact shows that it operates in polynomial time, since m and r are fixed (by σ) and l is bounded by the length of the input. Our third preliminary fact allows us to convert the algorithm into a deterministic one that operates in polynomial space. So RANDOM(σ) ∈ PSPACE.

For the other half of Theorem 1.4, we give a polynomial time computable reduction of the decision problem QBF for quantified Boolean formulas to the decision problem for RANDOM(σ). Since QBF is known to be PSPACE-complete [St], this will suffice to complete the proof. To simplify notation, we assume that σ contains a unary predicate symbol P; if it contains only non-unary symbols, just replace variables with sequences of variables in what follows. The idea of the reduction is simply to let elements satisfying (resp. not satisfying) P act as surrogates in RANDOM(σ) for the truth value 1 (resp. 0) in

QBF. More precisely, define for each quantified Boolean formula φ a corresponding σ-formula φ' as follows. For a propositional variable p_i, let p_i' be P(v_i). Let (¬φ)' be ¬(φ'), and similarly for the other connectives. Let (∃p_i φ)' be ∃v_i (φ'), and similarly for ∀. Then, for every φ(p₁, ..., p_t) with free propositional variables among p₁, ..., p_t, φ is true under the truth assignment f : {1, ..., t} → {0, 1} (i.e. when p_i has value f(i)) if and only if the sentence

∃u₀ ∃u₁ (¬P(u₀) ∧ P(u₁) ∧ φ'(u_{f(1)}, ..., u_{f(t)}))

is in RANDOM. (The proof of this is a straightforward induction on φ.) In particular, a sentence φ (with no free propositional variables) is in QBF if and only if φ' is in RANDOM. This is the desired reduction, so the proof of Theorem 1.4 is complete.

Until now, we have treated only countably infinite structures. However, the theory RANDOM can also be used to provide information about large finite structures, despite the fact that no finite structure can satisfy all of Gaifman's axioms. For any sentence φ and every positive integer n, let FRACTION(φ,n) be the quotient of the number of models of φ with universe {1, 2, ..., n} by the total number of σ-structures with this universe. We shall be interested in the behavior of FRACTION(φ,n) as n tends to ∞ (with φ fixed). This behavior indicates the probability that φ is true in a randomly chosen large finite structure.

Theorem 1.5 [Fa,GKLT]. If the sentence φ is in RANDOM(σ), then FRACTION(φ,n) → 1 as n → ∞. If φ is not in RANDOM(σ), then FRACTION(φ,n) → 0 as n → ∞.

Corollary 1.6 (Zero-one law). For any first-order sentence φ, lim_{n→∞} FRACTION(φ,n) exists and equals zero or one.
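The recursive procedure underlying the alternating algorithm in the proof of Theorem 1.4 can also be run deterministically, at the cost of brute-force enumeration of all extensions. The sketch below is ours; the encoding of formulas as nested tuples and the convention that bound variables carry fixed indices ≥ 10 are illustrative assumptions, not the paper's notation.

```python
from itertools import product

# Formulas over variables: free variables are indices 0..l-1; bound variables
# carry fixed indices >= 10 so they never collide with free ones.
#   ("atom", R, args)  R(v_args), no equality     ("eq", i, j)  v_i = v_j
#   ("not", f)   ("or", f, g)   ("exists", i, f)

def atoms(l, arities):
    # all simple v-formulas over a list of l variables (at most m*l**r of them)
    return [("atom", R, t) for R, k in arities.items()
                           for t in product(range(l), repeat=k)]

def subst(f, old, new):
    tag = f[0]
    if tag == "atom":
        return ("atom", f[1], tuple(new if i == old else i for i in f[2]))
    if tag == "eq":
        return ("eq", new if f[1] == old else f[1], new if f[2] == old else f[2])
    if tag == "not":
        return ("not", subst(f[1], old, new))
    if tag == "or":
        return ("or", subst(f[1], old, new), subst(f[2], old, new))
    var, body = f[1], f[2]                        # exists
    return f if var == old else ("exists", var, subst(body, old, new))

def decide(D, l, arities, f):
    # does D ->_R f hold, for a complete description D over l variables?
    tag = f[0]
    if tag == "atom":
        return D[f]
    if tag == "eq":
        return f[1] == f[2]     # distinct listed variables denote distinct elements
    if tag == "not":
        return not decide(D, l, arities, f[1])
    if tag == "or":
        return decide(D, l, arities, f[1]) or decide(D, l, arities, f[2])
    var, body = f[1], f[2]                        # exists
    # case 1: the witness equals some v_i already in the list
    if any(decide(D, l, arities, subst(body, var, i)) for i in range(l)):
        return True
    # case 2: the witness is new; try every extension E of D to l+1 variables
    body_new = subst(body, var, l)
    fresh = [a for a in atoms(l + 1, arities) if a not in D]
    return any(decide({**D, **dict(zip(fresh, bits))}, l + 1, arities, body_new)
               for bits in product([False, True], repeat=len(fresh)))

arities = {"E": 2}
# exists x exists y (E(x,y) and not E(y,x)): holds in RANDOM, so decided True
phi = ("exists", 10, ("exists", 11,
       ("not", ("or", ("not", ("atom", "E", (10, 11))), ("atom", "E", (11, 10))))))
print(decide({}, 0, arities, phi))   # True
```

The exists case mirrors the third equivalence exactly: either the witness is one of the listed variables, or the description is extended by a new variable in every possible way.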

Notice that this theorem and corollary would be false if we had permitted zero-place relation symbols in σ, for such a symbol would be a sentence φ with FRACTION(φ,n) = 1/2 for all n. Allowing constant symbols would lead to similar counterexamples.

Sketch of proof of Theorem 1.5. The second assertion follows from the first, since if RANDOM(σ) doesn't contain φ it must contain ¬φ, by completeness (Corollary 1.3). So we need only prove the first assertion. Let φ be a sentence in RANDOM(σ). If φ is one of Gaifman's axioms, then straightforward estimates, which we omit, show that FRACTION(φ,n) → 1. In the general case, φ is a logical consequence of finitely many Gaifman axioms, say α₁, ..., α_k. Any structure not satisfying φ must also violate at least one of the α_i's. So

1 − FRACTION(φ,n) ≤ Σ_{i=1}^{k} (1 − FRACTION(α_i,n)).

Since each summand on the right side approaches 0, we see that FRACTION(φ,n) → 1 when n → ∞, as desired.

We note for future reference that the last part of the proof actually establishes the general fact that the property "FRACTION(φ,n) → 1 as n → ∞" is preserved by logical consequence. We also note that the definition of FRACTION(φ,n) makes sense for any sort of sentence φ (with a well-defined semantics), not just for first-order φ. The preservation of "FRACTION → 1" under logical consequence also continues to hold as long as the number of premises is finite.
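The surrogate idea in the QBF reduction above, reading P-membership as truth value 1, can be mirrored by a small evaluator. The sketch is ours; the tuple encoding of formulas and the particular structure are illustrative assumptions.

```python
def eval_qbf_as_random(phi, universe, P):
    # phi: ("var", i) | ("not", f) | ("or", f, g) | ("exists", i, f) | ("forall", i, f)
    # Propositional quantifiers become quantifiers over elements; the truth
    # value of p_i is whether its surrogate element satisfies P.
    def ev(f, env):
        tag = f[0]
        if tag == "var":
            return env[f[1]] in P
        if tag == "not":
            return not ev(f[1], env)
        if tag == "or":
            return ev(f[1], env) or ev(f[2], env)
        if tag == "exists":
            return any(ev(f[2], {**env, f[1]: e}) for e in universe)
        return all(ev(f[2], {**env, f[1]: e}) for e in universe)   # forall

    return ev(phi, {})

# any structure with at least one P-element and one non-P-element will do
universe = range(6)
P = {0, 2, 4}
# forall p1 exists p2 (p1 or not p2): a true quantified Boolean formula
phi = ("forall", 1, ("exists", 2, ("or", ("var", 1), ("not", ("var", 2)))))
print(eval_qbf_as_random(phi, universe, P))   # True
```

The structure plays no role beyond supplying both truth values, which is exactly why the reduction needs the conjuncts ¬P(u₀) ∧ P(u₁).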

2. The iterative fixed point

In this section, we introduce the two extensions of first-order logic that we shall study. These extensions admit formulas defining the least fixed point of a monotone operator or the iterative fixed point of an inflationary operator. Both of these fixed-point constructions have been extensively studied in recursion theory under the names of "monotone" and "non-monotone" induction, respectively; see, for example, [Mo1,Mo2,Sp]. Much work has been done on logics involving the least-fixed-point operator, sometimes called μ-calculi; see [AU,BR,CH,HP,Ko,Pa,Ro,SB]. The extension of first-order logic by the iterative-fixed-point operator was introduced in [Gu]; a similar concept occurs in [Li].

Given a structure S of vocabulary σ and a natural number l, consider an operator π which associates with each l-ary predicate P on S a new l-ary predicate π(P) on S. Any formula φ of the vocabulary σ ∪ {P} (where P is a new l-ary predicate symbol), with free variables v₁, ..., v_l, defines such an operator π in every σ-structure S:

π(P) = { x̄ ∈ S^l : φ holds of x̄ in the structure (S,P) }.

If φ has additional free variables, they can be viewed as parameters; an operator π is obtained for any fixed values in S of these parameters. A relation P is a fixed point of π if π(P) = P. The operator π is monotone if, for all l-ary relations P and Q on S, P ⊆ Q implies π(P) ⊆ π(Q). For monotone operators, a classical construction [Kn,Ta] provides a least fixed point,

⋂ { P ⊆ S^l : π(P) ⊆ P }.

This least fixed point is also obtained by the transfinite inductive construction [Sp]

P₀ = ∅,  P_{α+1} = π(P_α),  P_λ = ⋃_{α<λ} P_α for limit ordinals λ.

It is easy to check that α < β implies P_α ⊆ P_β, so there must be an ordinal number α (of cardinality at most that of S^l) with P_α = P_{α+1}. This P_α is the least fixed point of π.

Deciding whether a given first-order formula defines a monotone operator is difficult in general. It is shown in [Gu] that this problem is (recursively) unsolvable, that it remains unsolvable if one restricts attention to finite structures, and that there is a formula φ such that the problem of deciding whether φ defines a monotone operator on a given finite structure is co-NP-complete. There is, however, a simple syntactic condition on φ that implies monotonicity of the associated operator: P should occur only positively in φ (that is, under an even number of negation symbols). This observation motivates the

Least Fixed Point Formation Rule: Let φ be a formula with only positive occurrences of the l-ary predicate symbol P, let v₁, ..., v_l be distinct variables, and let u₁, ..., u_l be variables. Then

(u₁, ..., u_l) ∈ (LFP P, v₁, ..., v_l) φ

is also a formula. The vocabulary of the new formula consists of all symbols except P from the vocabulary of φ. The free variables of the new formula are u₁, ..., u_l and the variables other than v₁, ..., v_l that are free in φ. (Thus, LFP binds P, v₁, ..., v_l.) An occurrence of a predicate symbol (other than P) in the new formula is positive (resp. negative) if it is so in φ. The new formula is true (in a structure S with specified values for its free variables) if and only if the l-tuple of values of u₁, ..., u_l belongs to the least fixed point of the operator defined by

φ (and v₁, ..., v_l), using the specified values for the other variables as parameters. It may be worth remarking that, if we had allowed vocabularies to contain function symbols, then the u's in the new formula would have been allowed to be arbitrary terms, and the definitions of vocabulary, free variables, and truth of the new formula would have been modified accordingly.

For non-monotone operators, the sequence of predicates P_α need not stabilize and may therefore yield no fixed point. There is, however, another condition on π that suffices to ensure that P_α ⊆ P_β for α < β, and therefore that P_α = P_{α+1} for some α. This condition is that the operator be inflationary, which means that P ⊆ π(P) for all P. (The terminology is due, as far as we know, to Freyd [Fr] and was first used in the present context in [Gu].) Although such an operator need not have a least fixed point, it has a canonically defined fixed point, namely P_α for any sufficiently large α; we call this the iterative fixed point. Deciding whether the operator defined by a formula φ is inflationary is, in general, difficult in the same senses (and by virtually the same proof) as for monotonicity. However, any operator π can easily be transformed into an inflationary operator, P ↦ P ∪ π(P), which agrees with π if π happens to be inflationary already. This observation motivates the semantics of the following rule.

Iterative Fixed Point Formation Rule: Let φ be a formula, P an l-ary predicate symbol, v₁, ..., v_l distinct variables, and u₁, ..., u_l variables. Then

(u₁, ..., u_l) ∈ (IFP P, v₁, ..., v_l) φ

is also a formula. The syntactic properties of this new formula are the same as for LFP. The formula is true if and only if the l-tuple of values of u₁, ..., u_l belongs to the iterative fixed point of P ↦ P ∪ π(P), where π is the operator defined by φ. If π is inflationary or monotone, then this is the iterative fixed point of π. In

particular, if P occurs only positively in φ, then IFP and LFP agree, so the logic FO + IFP obtained by adding the iterative fixed point formation rule to the formation rules of first-order logic subsumes the logic FO + LFP.

It will be useful to have a notation for the stages of the iteration leading to the iterative fixed point.

Iteration Stage Formation Rule: For φ, P, v̄, ū as in the preceding two formation rules and for α an ordinal number,

(u₁, ..., u_l) ∈ (α P, v₁, ..., v_l) φ

is a formula. The syntactic properties of the new formula are as for the preceding two rules; truth is defined as for IFP but with "the αth stage P_α" in place of "the iterative fixed point." We shall need this formation rule only for finite α, and we adopt for complexity-theoretic purposes the convention that the new formula should contain α written in binary notation.

For finite α, α steps in the iteration of a first-order definable operator can be carried out in first-order logic. Thus, the new formula ū ∈ (α P, v̄)φ is equivalent to a first-order formula if φ is, so the iteration stage formation rule adds no expressive power to first-order logic unless α is infinite. Indeed, ū ∈ (0 P, v̄)φ is always false, and ū ∈ (k+1 P, v̄)φ is equivalent to the result of first substituting ū for v̄ in φ (renaming bound variables if necessary), then replacing every subformula of the form P(w̄) (for arbitrary w̄) with w̄ ∈ (k P, v̄)φ, and finally forming the disjunction of the result with ū ∈ (k P, v̄)φ. Recursive application of this procedure lets us reduce any formula in first-order logic with the iteration stage rule for finite α, FO + IS, to a first-order formula. It should be noted, however, that this translation process may result in a formula vastly longer than the initial formula. Thus, although the iteration stage rule for finite

α does not increase expressive power, it does affect complexity.
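Both fixed-point constructions of this section can be iterated directly on a finite universe. The sketch below is ours; the operators chosen (transitive closure for the monotone case, and an operator with a negative occurrence of P for the non-monotone case) are illustrative assumptions.

```python
def lfp(pi):
    # least fixed point of a monotone operator: P_0 = {}, P_{k+1} = pi(P_k),
    # iterated until the sequence stabilizes (finite universe, so it must)
    P = frozenset()
    while pi(P) != P:
        P = pi(P)
    return P

def ifp(pi):
    # iterative fixed point of the inflationary operator P |-> P ∪ pi(P);
    # the stages increase, so this works even when pi itself is not monotone
    P = frozenset()
    while P | pi(P) != P:
        P = P | pi(P)
    return P

# monotone example: transitive closure of an edge relation E,
# pi(P) = E ∪ { (x,y) : E(x,z) and P(z,y) for some z }
E = {(1, 2), (2, 3), (3, 4)}
tc = lambda P: frozenset(E) | frozenset((x, y) for (x, z) in E
                                        for (z2, y) in P if z == z2)
print(sorted(lfp(tc)))      # all pairs (x, y) with a path from x to y

# non-monotone example on {0,...,5}: P occurs negatively via "x+1 not in P"
pi = lambda P: frozenset(x for x in range(6)
                         if x == 0 or (x - 1 in P and x + 1 not in P))
print(sorted(ifp(pi)))      # the inflationary iteration still stabilizes
```

For the monotone operator the two constructions agree, matching the remark that IFP and LFP coincide when P occurs only positively.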

3. The zero-one law for first-order logic with iterative fixed point

The purpose of this section is to extend the zero-one law, Corollary 1.6, from first-order logic to the stronger logic FO + IFP introduced in §2.

Theorem 3.1. Let φ be a formula of FO + IFP with vocabulary σ. There exist a first-order σ-formula φ' and a finite subset S of Gaifman's axiom set for RANDOM(σ) such that φ and φ' are equivalent in all models of S.

Corollary 3.2 (Zero-one law). Let φ be a sentence of FO + IFP. Then lim_{n→∞} FRACTION(φ,n) exists and equals zero or one.

Proof of corollary. Let φ' and S be as in the theorem. Since each sentence in S has FRACTION → 1, and since either FRACTION(φ',n) → 1 or FRACTION(¬φ',n) → 1 by Corollary 1.6, we have that either φ, which is a logical consequence of S ∪ {φ'}, or ¬φ, which is a logical consequence of S ∪ {¬φ'}, has FRACTION → 1.

Proof of Theorem 3.1. Let σ be the vocabulary of φ. Since the theory RANDOM(σ) is ℵ₀-categorical, we need only invoke Theorem 1 of Appendix 3 of [Gu]. For the sake of completeness, we sketch the proof. We proceed by induction on the depth of nesting of IFP in φ. The only nontrivial case is that φ is ū ∈ (IFP P, v̄)ψ. We cannot apply the induction hypothesis to ψ, since ψ involves P (whose interpretation is certainly not random here), but we can apply it to any finite stage in the iteration of ψ. More precisely, we define, for each natural number k, a first-order formula ψ_k(ū) that does not contain P and is equivalent to ū ∈ (k P, v̄)ψ in all models of a certain finite subset S_k of Gaifman's axioms. For k = 0, let ψ₀(ū) be FALSE. Let ψ_{k+1}(ū) be obtained from ψ by first substituting ū for v̄, then replacing every subformula of the form P(w̄) with ψ_k(w̄), then applying the induction hypothesis to get an almost equivalent first-order formula, and finally forming the disjunction with ψ_k(ū).

Here "almost equivalent" means "equivalent in all models of enough of the Gaifman axioms." Comparison with the discussion at the end of §2 shows that the ψ_k's have the desired properties. In a countable model M of RANDOM(σ), each ψ_k(ū) is equivalent to ū ∈ (k P, v̄)ψ, since all of Gaifman's axioms hold. In particular, we have the monotonicity property that each ψ_k implies the next, ψ_{k+1}, in M. Since RANDOM(σ) is ℵ₀-categorical, a version of Ryll-Nardzewski's theorem (Theorem 2.3.12(e) in [CK]) asserts that there are only finitely many inequivalent (in M) formulas with a fixed finite set of free variables; in particular, some ψ_k and ψ_l with k < l must be equivalent in M. Then ψ_k is equivalent to ψ_{k+1} in M, because of the monotonicity property. This equivalence, being a first-order statement true in M, is deducible from finitely many Gaifman axioms. In any model of these finitely many axioms, plus the finitely many more needed to ensure that ψ_k(ū) and ψ_{k+1}(ū) are equivalent to ū ∈ (k P, v̄)ψ and ū ∈ (k+1 P, v̄)ψ respectively, all these equivalences together ensure that the iteration defining φ stops after the kth step and that φ is equivalent to ψ_k. This completes the proof of Theorem 3.1.

We have presented this proof in a form applicable to any ℵ₀-categorical theory. For the particular theory RANDOM(σ), we can obtain an improvement, which will be useful in §4, by replacing the use of Ryll-Nardzewski's theorem with the more explicit bounds obtained, as the second preliminary fact, in the proof of Theorem 1.4. In that proof, we saw that there are at most 2^{m·l^r} v̄-descriptions, where l is the length of v̄ and m and r are determined by σ (the number of predicate symbols and the maximum arity). Since tuples satisfying the same complete quantifier-free description are related by an automorphism of M (the first preliminary fact in the proof of Theorem 1.4), they satisfy the same formulas of FO + IFP.
It follows that, as k increases, the sequence of predicates defined by ū ∈ (k P, v̄)ψ cannot strictly increase more than

p(l) = 2^{m·l^r} times, where l is the number of free variables of ψ. Thus, in ū ∈ (IFP P, v̄)ψ, we can replace IFP by p(l) (or any larger number). Doing this systematically, we can transform any FO + IFP formula φ into an equivalent (in M) formula of FO + IS in which the iteration stage formation rule is applied only with α = p(l), where l is the number of variables in φ. As we saw at the end of §2, this FO + IS formula is equivalent (in all structures) to a first-order formula φ''. Since this φ'' and the φ' obtained in the proof of Theorem 3.1 are equivalent in the countable models of RANDOM(σ), they are also, by compactness, equivalent in all models of a certain finite set of Gaifman axioms. Therefore, φ'' has the property asserted of φ' in Theorem 3.1. The following proposition summarizes this discussion for future reference.

Proposition 3.3. The φ' in Theorem 3.1 can be taken to be the first-order translation of a formula of FO + IS in which the stages mentioned are all 2^{m·l^r}, where l is the number of variables in φ.
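The length blowup incurred by the translation from FO + IS to first-order logic is easy to exhibit on a toy unfolding. The sketch below is a deliberate simplification of ours: it treats the stage formula purely textually and ignores variable renaming, which does not matter for the growth-rate point.

```python
def stage(k, phi):
    # first-order unfolding of  u ∈ (k P, v) phi  as in the text:
    # stage 0 is FALSE; stage k+1 substitutes stage k for P(v) and
    # adds the previous stage as an extra disjunct
    if k == 0:
        return "FALSE"
    prev = stage(k - 1, phi)
    return "(" + phi.replace("P(v)", prev) + " | " + prev + ")"

# a toy body with two occurrences of P: each unfolding step roughly
# triples the formula, so the translation is exponentially long in k
phi = "(E(u,v) & P(v) & P(v))"
for k in range(6):
    print(k, len(stage(k, phi)))
```

The printed lengths grow exponentially in k, which is exactly why Proposition 3.3 does not by itself transfer the PSPACE upper bound of Theorem 1.4.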

4. The complexity of the FO + IFP theory of random structures

The proofs of the zero-one laws for first-order logic and for FO + IFP show that a sentence is true in random (countably infinite) structures if and only if it is true in almost all finite structures, in the sense that FRACTION(φ,n) → 1 as n → ∞. We say that a sentence with these equivalent properties is almost surely true. This section is devoted to determining the complexity of the decision problem for almost sure truth of sentences in FO + LFP and FO + IFP. Recall that for first-order logic this problem is PSPACE complete (Theorem 1.4).

Theorem 4.1. The decision problem for almost sure truth of FO + IFP sentences can be solved by an alternating Turing machine in polynomial space.

Proof. The machine proceeds according to the following algorithm. Given an FO + IFP sentence φ with l variables, it replaces every occurrence of IFP with p, where p = p(l) = 2^{m·l^r} and m and r are, as before, the number and the maximum arity of predicate symbols in σ. By Proposition 3.3, this replacement almost surely does not alter the truth value of φ. Since p is written in binary notation, and since l ≤ length of φ, the space required is polynomial in the length of φ. The rest of the algorithm is exactly like that in the proof of Theorem 1.4, with the following additional steps to handle the iteration stage formation rule. To decide whether D → ū ∈ (k+1 P, v̄)ψ, where D is a complete quantifier-free description for appropriate variables, decide instead whether D → ψ', where ψ' is obtained from ψ by substituting ū for v̄, replacing every P(w̄) with w̄ ∈ (k P, v̄)ψ, and forming the disjunction with ū ∈ (k P, v̄)ψ. (See the end of §2.) Finally, to decide whether D → ū ∈ (0 P, v̄)ψ, reject. The algorithm never needs to deal with complete quantifier-free descriptions for more than l variables, so every description it uses is a conjunction of at most m·l^r simple formulas and negations of simple formulas. Thus, the machine needs only polynomial space to record descriptions and iteration stages. Its other storage needs are comparatively minor, so it operates in polynomial space, and the theorem is proved.

In the following corollary, EXPTIME refers to deterministic Turing computation with a time bound 2^{f(n)}, where f is a polynomial (not necessarily linear) function of the input length n.

Corollary 4.2. Almost sure truth of FO + IFP sentences is decidable in EXPTIME.

Proof. EXPTIME is equivalent to alternating PSPACE, by [CKS].

Theorem 4.3. The decision problem for almost sure truth of FO + LFP sentences is EXPTIME hard.

Proof. Because of the result of [CKS] just quoted, it suffices to reduce, to the decision problem for almost sure truth of FO + LFP sentences, every language recognized by an alternating Turing machine that operates in polynomial space. For simplicity, we assume that our alternating machines have only universal and existential states, not negation states; it is shown in [CKS] that this assumption causes no loss of generality. Let M be such a machine, and let S(|w|) be a polynomial in the length of its input w that bounds the space used by M. We show how to compute, in polynomial time, from any given input w a sentence φ_w in FO + LFP such that φ_w is almost surely true if and only if M accepts w.

To construct φ_w, we use the well-known fact [Ck] that the activity of a Turing machine can be described by a string of truth values (or bits). In the present situation, the computation of M on input w is too large to be useful for our purposes, but each configuration (instantaneous description) of M can be

coded by a string of length polynomial in |w|. For concreteness, we adopt the convention that, if M has s states and a tape symbols, then the bit strings have length s + (a+1)·S(|w|) and consist of the truth values of the following statements about the configuration: "M is in state q" for each of the s states q, "M is scanning square n" for each of the S(|w|) relevant squares, and "The symbol in square n is Z" for each of the S(|w|) relevant squares and each of the a symbols Z. Of course, a string corresponds to a configuration only if it satisfies some obvious consistency conditions: M is in exactly one state, scanning exactly one square, and each square has exactly one symbol in it (when the blank counts as a symbol).

As in the proof of Theorem 1.4, we may assume for notational simplicity that σ contains a unary predicate symbol Q. For each input word w, we can easily construct, in polynomial time, first-order formulas INITIAL_w(x̄), UNIVERSAL_w(x̄), EXISTENTIAL_w(x̄), YES_w(x̄), and SUCCESSOR_w(x̄,ȳ), in which x̄ and ȳ are sequences of l = s + (a+1)·S(|w|) variables, and which assert, respectively, that the string of bits Q(x̄) = Q(x₁) ... Q(x_l) codes the initial configuration with input w, that Q(x̄) codes a configuration where M is in a universal state, that Q(x̄) codes a configuration where M is in an existential state, that Q(x̄) codes a configuration where M is in an accepting (terminal) state, and that M can go in one computation step from the configuration coded by Q(x̄) to that coded by Q(ȳ). Of these five formulas, the last four depend on w only through the dependence of l on the length of w.
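The bit-string convention just fixed can be written out as an encoder and decoder. The sketch is ours; the tuple representation (state, head position, tape contents) is an assumption for illustration.

```python
def encode(config, s, a, S):
    # bit string of length s + (a+1)*S, following the convention in the text:
    # s bits "M is in state q", S bits "M is scanning square n",
    # and a*S bits "the symbol in square n is Z"
    state, head, tape = config             # tape: list of S symbols, each in range(a)
    bits = [q == state for q in range(s)]
    bits += [n == head for n in range(S)]
    for z in range(a):
        bits += [tape[n] == z for n in range(S)]
    return bits

def decode(bits, s, a, S):
    # recover the configuration; assumes the consistency conditions hold
    state = bits[:s].index(True)
    head = bits[s:s + S].index(True)
    tape = [next(z for z in range(a) if bits[s + S + z * S + n]) for n in range(S)]
    return (state, head, tape)

s, a, S = 3, 2, 4                          # 3 states, 2 tape symbols, 4 squares
cfg = (1, 2, [0, 1, 1, 0])
bits = encode(cfg, s, a, S)
assert len(bits) == s + (a + 1) * S
assert decode(bits, s, a, S) == cfg
print(len(bits))                           # 3 + 3*4 = 15
```

The consistency conditions of the text correspond to requiring exactly one True in each of the three groups of bits.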
Recall that, by the definition of the way alternating Turing machines operate, the set of configurations that M accepts is the smallest set A such that (i) A contains every configuration in which M is in an accepting terminal state, (ii) A contains every universal configuration all of whose successors are in A, and (iii) A contains every existential configuration at least one of whose successors is in A. This means that the set of x̄ for which Q(x̄) codes an accepting

configuration is definable by

ACCEPT_w(x̄) ↔ x̄ ∈ (LFP P, v̄)φ

where φ is

YES_w(v̄) ∨ (UNIVERSAL_w(v̄) ∧ ∀z̄ (SUCCESSOR_w(v̄,z̄) → P(z̄))) ∨ (EXISTENTIAL_w(v̄) ∧ ∃z̄ (SUCCESSOR_w(v̄,z̄) ∧ P(z̄))).

This definition works in any structure where at least one element satisfies Q and at least one does not, so that every code occurs as Q(z̄) for some z̄; no other properties of random structures are needed for this proof. Finally, we have that M accepts w if and only if the following sentence φ_w is almost surely true:

∃x̄ (INITIAL_w(x̄) ∧ ACCEPT_w(x̄)).

Since φ_w can clearly be written down in polynomial time when w is given, Theorem 4.3 is proved.

Corollary 4.4. The decision problems for almost sure truth in FO + LFP and FO + IFP are EXPTIME complete.

Proof. Combine Corollary 4.2, Theorem 4.3, and the fact that IFP subsumes LFP.
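The least-fixed-point characterization of alternating acceptance can be computed directly on an explicit computation graph. The graph below is a toy example of our own; the configuration names and successor relation are assumptions for illustration.

```python
def accepting_set(configs, succ, universal, accepting):
    # least fixed point of the alternating-acceptance operator: A contains
    # accepting configurations, universal configurations all of whose
    # successors are in A, and existential ones with some successor in A
    A = set()
    changed = True
    while changed:
        changed = False
        for c in configs:
            if c in A:
                continue
            if c in accepting or \
               (c in universal and all(d in A for d in succ[c])) or \
               (c not in universal and any(d in A for d in succ[c])):
                A.add(c)
                changed = True
    return A

# tiny computation graph: c0 is universal, the rest existential
configs = {"c0", "c1", "c2", "c3", "c4"}
succ = {"c0": {"c1", "c2"}, "c1": {"c3"}, "c2": {"c3", "c4"},
        "c3": set(), "c4": set()}
universal = {"c0"}
accepting = {"c3"}
print("c0" in accepting_set(configs, succ, universal, accepting))   # True
```

The iteration mirrors the stages of the LFP formula above: YES configurations enter first, then configurations whose successor requirements are met.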

5. Unbounded vocabulary

In the preceding sections, we have worked with a fixed finite vocabulary σ. The number m of relation symbols in σ and the maximum arity r of these symbols played an important role in the proof of Theorem 4.1, where we constructed an alternating Turing machine that decides, in space roughly proportional to m·l^r, whether an FO + IFP sentence φ with l variables is almost surely true. If we remove the restriction to a fixed vocabulary σ, then m and r are no longer constant. They are, of course, majorized by the length of φ, but the space bound so obtained for our algorithm is exponential, rather than polynomial, in the length of φ, because r occurs as an exponent. (If σ is allowed to vary with r bounded, then the complexity estimate of Theorem 4.1 remains correct.) Thus, the method of Theorem 4.1 gives only the following upper bound for the complexity of the FO + IFP theory of random structures, with no restrictions on the vocabulary σ.

Theorem 5.1. The FO + IFP theory of random structures is decidable in alternating exponential space.

By [CKS], alternating exponential space is equivalent to deterministic double-exponential time. The purpose of this section is to prove that Theorem 5.1 is optimal, i.e., that RANDOM is complete for alternating exponential space. The proof is similar to that of Theorem 4.3, the differences being that the space bound S is now exponential rather than polynomial and that we can (and must) use relations of high arity for the coding of machine configurations.

Theorem 5.2. The decision problem for RANDOM in FO + LFP is hard, with respect to polynomial time reductions, for alternating exponential space.

Sketch of proof. As in the proof of Theorem 4.3, let M be an alternating Turing machine with only universal and existential states, and let the space it

needs, for input w, be bounded by S(|w|), where now S(n) = 2^p(n) and p is a polynomial. To code configurations of M, we use s unary relations STATEq, one for each state q of M, p(|w|)-ary relations CELLz, one for each tape symbol z of M including the blank symbol 0, and one additional p(|w|)-ary relation HEAD. We now define what it means for a pair of distinct elements x, y in a structure for this vocabulary to code a configuration C of M. In the definition, we let t̄ stand for a tuple of length p(|w|) each of whose components is x or y; the 2^p(|w|) = S(|w|) such tuples are lexicographically ordered (with x preceding y), and the ith tuple in this ordering is to be considered a name of the ith tape cell of M. Then (x,y) codes C if, for all q, z, t̄ as above,

(1) STATEq(x) holds if and only if q is the state in C,
(2) HEAD(t̄) holds if and only if t̄ names the square scanned in C, and
(3) CELLz(t̄) holds if and only if the square named by t̄ contains the symbol z in C.

Of course, a pair (x,y) codes a configuration only if the truth values in (1), (2), and (3) satisfy appropriate consistency conditions. We use randomness to ensure (via finitely many Gaifman axioms) that every configuration has a code. Pairs (x,y) will play the same role in the present proof that tuples x̄ played in the proof of Theorem 4.3. We shall define INITIALw(x,y), UNIVERSALw(x,y), EXISTENTIALw(x,y), YESw(x,y), and SUCCESSORw(x,y,x',y') with the same meanings as the formulas with the same names in the proof of 4.3; once this is done, the construction of θw is exactly as before. The formulas INITIALw(x,y), etc., can be produced by the well-known methods of [Ck] once we have formulas describing the way tape squares are named. We shall exhibit formulas NAMEw(x,y,ū), <w(x,y,ū,v̄), NEXTw(x,y,ū,v̄), and FIRSTw(x,y,ū) which assert that, with respect to x and y, ū names a tape square, the square named by ū is to the left of that named by v̄, the square named by ū is immediately to the left of that named by v̄, and ū names the leftmost square, respectively. In all these formulas, ū and v̄ are p(|w|)-tuples of variables. For readability, we shall suppress the subscript w and the arguments x, y, we shall write < between its arguments, and we adopt the convention that i and j range from 1 to p(|w|).

    NAME(ū) ↔ ⋀_i (u_i = x ∨ u_i = y)
    ū < v̄ ↔ ⋁_i (u_i = x ∧ v_i = y ∧ ⋀_{j<i} u_j = v_j)
    NEXT(ū,v̄) ↔ ū < v̄ ∧ ¬∃t̄ (NAME(t̄) ∧ ū < t̄ ∧ t̄ < v̄)
    FIRST(ū) ↔ ⋀_i u_i = x

Now we are in a position to define INITIAL, UNIVERSAL, EXISTENTIAL, YES, and SUCCESSOR. Since the ideas involved are quite standard, we define INITIAL as an example and leave the rest to the reader. In this definition, q ranges over the states of M, q0 is the initial state, k ranges from 1 to |w|, wk is the kth letter of the word w, 0 is the blank symbol, and ū¹, …, ū^|w| are p(|w|)-tuples of variables distinct from each other and from x and y.

    INITIALw(x,y) ↔ STATEq0(x) ∧ ⋀_{q≠q0} ¬STATEq(x)
        ∧ ∃ū¹ … ∃ū^|w| [ FIRST(ū¹)
            ∧ ∀v̄ ((HEAD(v̄) ∧ NAME(v̄)) → v̄ = ū¹)
            ∧ ⋀_{k<|w|} (NAME(ū^{k+1}) ∧ NEXT(ū^k, ū^{k+1}))
            ∧ ⋀_k CELLwk(ū^k)
            ∧ ∀v̄ ((NAME(v̄) ∧ ⋀_k ¬(v̄ = ū^k)) → CELL0(v̄)) ]

UNIVERSAL, EXISTENTIAL, and YES are much easier to define; they refer only to the STATEq predicates. SUCCESSOR is more tedious but straightforward. Once

these definitions are available, we can produce θw as in the proof of Theorem 4.3, thereby completing the present proof. □

The coding technique used in the preceding proof and the technique used in the well-known proof [HU] of Stockmeyer's theorem that QBF is PSPACE-complete can be combined to show that the decision problem for the first-order theory RANDOM is EXPTIME-hard. This decision problem is solvable in AEXPTIME = EXPSPACE by means of the algorithm in the proof of Grandjean's Theorem 1.4 above. The problem of determining the exact complexity of RANDOM is open.
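The tape-square naming scheme above can be checked with a small sketch. Here tuples over the two distinct elements are represented as strings over "x" and "y", and the hypothetical helpers name, less, next_cell, and first mirror the formulas NAME, <, NEXT, and FIRST; this is an illustration of the lexicographic naming idea, not part of the proof.

```python
from itertools import product

X, Y = "x", "y"  # the two distinct elements; x precedes y in the ordering

def name(t):
    # NAME(t): every component of the tuple is x or y
    return all(c in (X, Y) for c in t)

def less(u, v):
    # u < v: at some position i, u_i = x and v_i = y, and u_j = v_j for all j < i
    return any(u[:i] == v[:i] and u[i] == X and v[i] == Y for i in range(len(u)))

def next_cell(u, v, p):
    # NEXT(u, v): u < v and no named p-tuple t lies strictly between them
    tuples = ("".join(t) for t in product(X + Y, repeat=p))
    return less(u, v) and not any(less(u, t) and less(t, v) for t in tuples)

def first(u):
    # FIRST(u): u names the leftmost tape cell, i.e. every component is x
    return all(c == X for c in u)

# With p(|w|) = 3 there are 2^3 = 8 cell names: xxx < xxy < xyx < ... < yyy.
cells = sorted("".join(t) for t in product(X + Y, repeat=3))
```

Sorting the strings agrees with the formula's order because ASCII "x" < "y" matches the convention that x precedes y.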

6. Fixed-points of bounded arity

In this section, we reinstate the assumption that we are working with a fixed finite vocabulary σ, and, in addition, we restrict the arity of the predicates that we allow IFP or LFP to define. More precisely, let FO + IFPk, where k is a nonnegative integer, be the logic obtained by adding to first-order logic the IFP formation rule subject to the constraint that IFP can be applied only to formulas with at most k free variables. We show that the complexity of the decision problem for almost sure truth in this logic is, for each fixed k, the same (modulo PTIME reductions) as in first-order logic.

Theorem 6.1. The decision problem for almost sure truth of FO + IFPk sentences is, for each fixed k, PSPACE-complete.

Proof. In view of Theorem 1.4, it suffices to give an algorithm for solving the decision problem in polynomial space. As in Section 4, we proceed by extending the algorithm in the proof of Theorem 1.4 to cover formulas involving IFP. This time, however, it will be convenient to work with the deterministic polynomial space version of the alternating polynomial time algorithm of 1.4. This deterministic version, as constructed in [CKS], is essentially a systematic depth-first search of the computation tree of the alternating algorithm; its space requirements are only polynomial because it keeps track of only the choices made by the alternating machine on the branch leading to the node currently being simulated, and it re-uses the space previously used for computations from other branches.

To handle IFP, we expand the algorithm as follows. Before beginning its actual computation, the machine writes, on a portion of the tape that will not be needed otherwise, a list of all complete quantifier-free descriptions D(v̄) for lists v̄ of variables, one list of each length ≤ k. It leaves a little space after each D(v̄) to allow descriptions to be marked later in the algorithm; this marking space after each D(v̄) is to consist of as many tape squares as the maximum depth of nested IFP's in the formula φ to be tested. Since the number of v̄-descriptions is at most Σ_{l≤k} 2^{m·l^r}, independently of φ, the space used here is linear in |φ|.

After these preparatory steps, the algorithm begins to operate like the deterministic version of the one in 1.4. When it encounters an IFP operator of depth d (measured from the outside), it proceeds to mark (in the dth square of each space provided for this purpose) those v̄-descriptions D(v̄) such that the tuples satisfying D(v̄) also satisfy the predicate defined by this IFP. It does this by starting with all v̄-descriptions unmarked in the dth space, erasing, if necessary, any marks already there (corresponding to P0 = ∅ in the definition of iteration stages), and following the definition of the stages Pn to mark D(v̄)'s as the tuples satisfying them enter the predicate being inductively defined. At the nth stage, the descriptions of tuples in Pn-1 are already marked, and the algorithm evaluates the formula ψ to which IFP was applied, for tuples satisfying the remaining descriptions, using the currently marked descriptions to interpret P. Any tuples found to satisfy ψ are marked at the next stage, for they belong to Pn. The evaluation of ψ may, of course, involve further uses of this marking procedure, if ψ involves IFP's. The maximum number of marking processes that ever proceed simultaneously during execution of the algorithm equals the maximum nesting of IFP's in φ, which is why we provided this much space for markings. □

Note that it was essential to this proof that not only the number of free variables in the scope of an IFP be bounded (by k) but also that the vocabulary be fixed, so that m and r are constants. With unbounded vocabulary, replacing IFP with IFPk does not improve the complexity estimates. To see this, simply observe that, in the proof of Theorem 5.2, IFP was applied only to a formula with

just two variables. The restriction on the IFP formation rule in FO + IFPk bounds not only the number of variables bound by IFP but also the number of additional variables (parameters) in the formula to which IFP is applied. We do not know the complexity of the decision problem for almost sure truth in a logic where only the number of variables quantified by IFP is bounded while the number of parameters is unrestricted.
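The staged marking procedure in the proof of Theorem 6.1 can be sketched as follows. Descriptions are abstracted to opaque labels, the formula ψ to a Boolean callable, and the dict marks stands in for the per-depth marking squares; all names here are illustrative, not the machine's actual tape layout.

```python
# Staged evaluation of one IFP over a fixed finite set of descriptions.
# marks[d] is the set of descriptions marked at IFP-nesting depth d; a nested
# IFP occurring inside psi would reuse this procedure at depth d + 1.
def evaluate_ifp(descriptions, psi, depth, marks):
    marks[depth] = set()  # P_0 = empty: erase any marks left at this depth
    while True:
        # a description is marked at stage n+1 iff its tuples satisfy psi
        # when the predicate P is interpreted by the stage-n marks
        stage = {D for D in descriptions if psi(D, marks[depth])}
        if stage == marks[depth]:  # no new tuples entered: fixed point reached
            return marks[depth]
        marks[depth] = stage

# Example: five descriptions 0..4; psi admits 0 outright and is closed under
# successor up to 3, so the stages grow {0}, {0,1}, {0,1,2}, {0,1,2,3}, stop.
psi = lambda D, P: D == 0 or (D - 1 in P and D <= 3)
defined = evaluate_ifp(range(5), psi, depth=0, marks={})
```

The space used is one mark per description per nesting depth, which is the point of the proof: the number of descriptions is a constant once k, m, and r are fixed.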

7. Additional remarks

We pointed out in the proof of Theorem 4.3 that the coding of alternating Turing machines used there can be done in any structure where at least one element satisfies Q and at least one element does not satisfy Q; no further use is made of randomness. In particular, we could take the structure to be the two-element Boolean algebra and take Q(x) to be x = 1. Thus, if we extend the theory QBF of quantified Boolean formulas by adjoining LFP to the logic, the resulting theory is EXPTIME-hard. In fact, so is the FO + LFP theory of any structure with more than one element, since we can use the binary predicate of equality in place of the unary predicate Q. It is easy to check that the FO + LFP and FO + IFP theories of the two-element Boolean algebra, or of any non-trivial set (with only the equality predicate), are decidable in EXPTIME and are therefore complete for this class.

The results proved in this paper for general structures also hold for certain restricted classes of structures, for example undirected graphs (= irreflexive symmetric binary relations). A zero-one law for the first-order theory of almost all structures in a first-order definable class can be transferred to the FO + IFP theory by our methods, provided the almost surely true first-order sentences constitute an ℵ0-categorical theory. If we have, in addition, effective estimates for the number of inequivalent types of l-tuples (a number that is finite for each l by the Ryll-Nardzewski theorem) and effective ways of describing these types, then our methods also provide upper bounds on the complexity of the decision problem for almost sure truth. All of these apparently stringent hypotheses are satisfied in the case of undirected graphs and in the case of simplicial complexes (of arbitrary but fixed dimension).

Although we worked with finite structures with a fixed universe {1, 2, …, n} (labeled structures), our results apply also to isomorphism classes

(unlabeled structures). Indeed, if FRACTION(φ,n) were defined using numbers of isomorphism classes in both the numerator and the denominator, the new numerator and denominator would be asymptotically equal to the old ones divided by n!, and the value of FRACTION would thus be asymptotically unchanged, because almost all structures have no non-trivial automorphisms.
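The computation in the last remark can be written out. With L(φ,n) the number of labeled models of φ on {1, …, n} and L(n) the number of all labeled structures, the rigidity of almost all structures (each isomorphism class of rigid structures containing exactly n! labeled copies) gives, as a sketch:

```latex
\mathrm{FRACTION}'(\varphi,n)
  = \frac{\#\{\text{isomorphism classes of models of }\varphi\}}
         {\#\{\text{isomorphism classes of all structures}\}}
  \sim \frac{L(\varphi,n)/n!}{L(n)/n!}
  = \mathrm{FRACTION}(\varphi,n).
```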

References

[AU] A. Aho and J. Ullman, Universality of data retrieval languages, Proc. 6th ACM Symp. on Principles of Programming Languages, 1979, 110-120.
[BH] A. Blass and F. Harary, Properties of almost all graphs and complexes, J. Graph Theory 3 (1979), 225-240.
[BR] J. W. de Bakker and W. de Roever, A calculus for recursive program schemes, Proc. 1st Internat. Colloq. on Automata, Languages and Programming, North-Holland, Amsterdam, 1972, 167-196.
[CH] A. Chandra and D. Harel, Structure and complexity of relational queries, J. Computer System Sci. 25 (1982), 99-128.
[CKS] A. Chandra, D. Kozen, and L. Stockmeyer, Alternation, J. Assoc. Comp. Mach. 28 (1981), 114-133.
[CK] C. C. Chang and H. J. Keisler, Model Theory, North-Holland Publ. Co., Amsterdam, 1973.
[Ck] S. Cook, The complexity of theorem-proving procedures, Proc. Third Annual ACM Symposium on the Theory of Computing, 1971, 151-158.
[ES] P. Erdős and J. Spencer, Probabilistic Methods in Combinatorics, Academic Press, New York, 1974.
[Fa] R. Fagin, Probabilities on finite models, J. Symbolic Logic 41 (1976), 50-58.
[Fr] P. Freyd, Aspects of topoi, Bull. Austral. Math. Soc. 7 (1972), 1-76.
[Ga] H. Gaifman, Concerning measures in first-order calculi, Israel J. Math. 2 (1964), 1-18.
[GKLT] Y. V. Glebskii, D. I. Kogan, M. I. Liogon'kii and V. A. Talanov, The extent and degree of satisfiability of a form of the restricted predicate calculus, Kibernetika 2 (1969), 31-42.
[Gr] E. Grandjean, Complexity of the first-order theory of almost all finite structures, preprint (1982); also in Logique des Structures Finies et Complexité Algorithmique, thesis, Université Lyon I (1984), 11-47.
[Gu] Y. Gurevich, Toward logic tailored for computational complexity, Technical Report CRL-TR-3-84, University of Michigan, Jan. 1984; to appear in Proc. 1983 European Logic Colloquium, Springer Lecture Notes in Math.
[HP] P. Hitchcock and D. M. R. Park, Induction rules and termination proofs, Proc. 1st Internat. Colloq.
on Automata, Languages, and Programming, North-Holland, Amsterdam (1973), 225-251.
[HU] J. E. Hopcroft and J. D. Ullman, Introduction to Automata Theory, Languages and Computation, Addison-Wesley, Reading, Mass., 1979.
[Im] N. Immerman, Relational queries computable in polynomial time, Proc. 14th ACM Symposium on Theory of Computing, 1982, 147-152.
[Kn] B. Knaster, Un théorème sur les fonctions d'ensembles, Annales de la Société Polonaise de Mathématique 6 (1928), 133-134.
[Ko] D. Kozen, Results on the propositional μ-calculus, Proc. 9th Internat. Colloq. on Automata, Languages, and Programming, 1982, 348-369.
[KS] M. Kaufmann and S. Shelah, On random models of finite power and monadic logic, to appear in Discrete Mathematics.

[Li] A. B. Livchak, The relational model for process control, Automatic Documentation and Mathematical Linguistics 4 (1983), 27-29 [Russian].
[Mo1] Y. Moschovakis, Elementary Induction on Abstract Structures, North-Holland Publ. Co., Amsterdam, 1974.
[Mo2] Y. Moschovakis, On non-monotone inductive definability, Fund. Math. 82 (1974), 39-83.
[Pa] D. M. R. Park, Fixpoint induction and proofs of program properties, in: B. Meltzer and D. Michie, eds., Mach. Int. 5, Edinburgh Univ. Press, 1970, 59-78.
[Ro] W. P. de Roever, Recursive program schemes: semantics and proof theory, Ph.D. Thesis, Free University, Amsterdam (1974).
[SB] D. Scott and J. W. de Bakker, A theory of programs, unpublished manuscript, IBM, Vienna, 1969.
[Sp] C. Spector, Inductively defined sets of natural numbers, in: Infinitistic Methods, Proc. Symp. on Foundations of Math. (Warsaw, 1959), Pergamon Press, Oxford, 1961, 97-102.
[St] L. Stockmeyer, The complexity of decision problems in automata theory and logic, MAC-TR-133, Project MAC, MIT, Cambridge, Mass., 1974.
[Ta] A. Tarski, A lattice-theoretical fixpoint theorem and its applications, Pacific J. Math. 5 (1955), 285-309.
[Va] M. Vardi, Complexity of relational query languages, Proc. 14th ACM Symposium on Theory of Computing, 1982, 137-146.
