THE UN IVERSIT Y OF MICHIGAN INFORMATION SYSTEMS LABORATORY Department of Electrical Engineering College of Engineering Technical Report ISL-65-4 A STUDY OF A MODEL FOR PARALLEL COMPUTATION by Raymond Reiter July, 1965 This research was supported by Air Force Contract AF 30(602)-3546.

Abstract This paper deals with a generalization of a model for parallel computation as formulated by Karp and Miller, A parallel computation is viewed as a directed graph in which a node represents a sequence of operations to be performed upon data on the node input branches with the results of an operation being placed upon the output branches. An operation associated with a node n may initiate if and only if, for each input branch d to n, the data queue length on d is greater than, or equal to, some threshold T ~ It is shown that a unique computation is defined independently of the timing of the node operations. Methods are derived for determining whether a computation terminates and for finding the number of performances of each node computation step involved. In particular, an algorithm is given which yields the number of performances of' each node computation step for the set of terminating nodes of a computation graph. Finally, necessary and sufficient conditions for data queue lengths to remain bounded are derived, iii

Table of Contents Section Page 1 Introduction........................ 1 2 The Model......................... 4 3 Determinacy of Computation Graphs............. 8 4 Termination Properties of Executions............ 17 5 Termination Properties of Proper Executions........ 18 6 Conditions for Self-Termination.............. 22 7 Properties of Queue Lengths............... 31 8 Conclusion........................ 33 References........................ 34 v

1. Introduction This paper generalizes a model for parallel computation formulated by Karp and Miller [l]. Briefly, this model represents a computation as a directed graph in which a node n. represents an operation to be performed upon data which lie upon the input branches to n.. The results of the 1 operation by node n. are placed upon the output branches of n.. A branch, therefore, represents a data queue. With each branch d directed from node n. to node n. is associated a quadruple (A,U,W,T ), the elements of 1 j pppp which are interpreted as follows: A - the initial number of data items on branch d P P U - the number of data items placed on d upon the termination of the operation associated with n.~ W - the number of data items removed from d, under a first-in P P first-out queue discipline, upon the initiation of the operation associated with node n.o T - a threshold, In order to initiate the operation associated p with node n,, the queue length on d must be greater than, J P or equal to Tp p It is shown in [1] that such a computation graph represents a unique computation, determined independently of operation times. Methods of determining termination properties and the number of performances of each computation step are developed. Finally, necessary and sufficient conditions for data queue lengths to remain bounded are derived, In the present paper, a model for parallel computation based upon that in [1] is given. This model is a generalization in that the operation associated with a given node need not be fixed, but may vary throughout the 1

computation, provided that it is uniquely specified as a function of the number of previous performances of the given node during the course of the computation. For this general model, determinacy is established in the sense that a unique computation is determined independently of operation timeso The model is then specialized to be data dependent upon the lengths of data queues as follows: with each branch d, directed from node n. to node n., is associated a quadruple (Ap, Up(n)), Wp (n)), Tp) J j p p p p where A and T have the previous interpretation. The sequence Up (n)} specifies the number of data words, placed on d upon the termination of the n-th operation of node n.. (W (n)] specifies the number of data words, under a first-in first-out queue discipline, removed from d upon the initiation of the n-th operation of node n.o J With this model we address ourselves to finding methods of determining whether such a computation terminates and to finding the number of performances of each node computation step involved. In particular, an algorithm is given which yields the number of performances of each node computation step for the set of terminating nodes of a computation graph. It is found, as was the case in [1], that the maximal strongly connected subgraphs of a computation graph, and their loops, play a central role in the analysis. Finally, it is assumed that the sequences [U (n)] and [W (n)} are periodic. Under this assumption, methods of determining termination properties and the number of performances of each node computation step are given, and necessary and sufficient- conditions for data queue lengths to remain bounded are derived. 2

In the development it will be assumed that the reader is familiar with the definitions and principal results of [1]. The proofs of all results which follow in an obvious fashion from the proofs of analogous results in [1] will be omitted. 3

2. The Model Definition 1 A computation graph is a directed graph consisting of: (i) nodes nl,n2,...,n. (ii) branches dl,d2,...,dt, where any given branch d = (ni,nj) is directed from a specified node n. to a specified node n. o J (iii) a sextuple (O,H,a,h,tp,p ) associated with each branch d = (ni,n.) where: (a) M is a given set, the alphabet. (b) H JT where T is the free semigroup on C, with p — p P identity 0. If ceHp, re p, then ocH. (c) a E t P P. (d) h is a mapping: i -t A such that for acH, P P P P Tpt, hp(aT) = hp(a) and for actp - H, h (a) = 0. (e) t is a mapping: tp - it such that for acHp TeA, t (a') = t (a)z and for aci - H, t (a) = a. (f) a is a mapping: t X tP X.. X i -rT where n Pl P 2 r has input branches d,d,...,d, and for a(pi)CEp pl p2 Fr p i=l,2,...,r; ap' (h (( (pi))...,h (cr(Pr))) -(P(p)E.p The sextuple associated with each branch may be interpreted as follows: lp is the set of all data words upon which 0., the operation associated with node j, can operate. it represents the set of all sequences of data words which can appear as a queue on the branch d. H is interpreted as the P P class of all sequences acceptable to 0.; i.e. 0. cannot initiate unless the 4 J 4

data queue on branch d is an element of H, so that H defines a threshold P P for the initiation of 0.. The data string a represents the initial data 1 p queue on the branch d. The function h defines, for a given data queue, p P the minimal acceptable head which permits the initiation of 0., and the function t defines the data queue remaining after 0. removes data from p J the head of the queue upon its initiation. Finally the function ap defines the data sequence which is placed on the tail of the branch d = (ni,nj) upon termination of the operation 0.. We shall require that the operation 0, associated with a given node n. be eligible for initiation if and only if for each branch d directed into p nj, the data sequence on d is acceptable, i.e. is an element of H, and that no previously initiated performance of 0. is still in progress. After J 0. becomes eligible for initiation, there may be some delay before initiation actually occurs. For each branch d directed into n., Oj, upon initiation, removes an initial sequence of data words on the queue associated with d.There may then be a further delay before 0. terminates, at which time P J for each branch d directed out of n., a sequence of data words determined q J by the function a is placed on d. The delays mentioned above are left q q unspecified by the model and, in fact, may differ for different initiations and performances of the same operation. The above constraints on initiation lead to the following definitions of the possible sequences of initiations associated with a given computation graph Go Let ~ be a sequence of nonempty sets S,Si,.,2 S,., such that each S C (l,23,l,2 1, 2,..., ] where i is the number of nodes in G. Let 5

x(j,O) = 0 for j[1,2,..., } and for N > 0 let x(j,N) denote the number of sets S, 1 < m < N of which j is an element. A similar definition holds for x(QN). Let d = (ni,n.) and let n. have input branches d,d,...,d. We p J 1 define a sequence a(p,N)ETt for N=O,1,..., as follows: a(p,O) = a. For N > i, assume that for all p, a(p,O), a(p,l),..., a(p,N-l) have been defined and let N' = max (T | T < N and icST. Then if iCSN and N' exists define C(p,N-1) = a[h (a(p N'-l)),..,h (a(prN'-l))]. Otherwise, p p! P^. (,r a(p,N-l) = 0. Then, if jeSN, o(p,N) = tp[a(p,N-l)a(p,N-l)] jSN', a(p,N) = oi(p,N-l)a(p,N-l). Definition 2 The sequence & = S1,S2'.., is an execution of G if and only if, for all N > 0, the following conditions hold. (i) x(j,N) - x(j,N) c (0,1) (ii) If jeSNt there exists R > N such that eSR (iii) If jeSN+l and G has a branch d from n. to n., then o(p,N)cHp (iv) If i is finite and of length P, then for each j, there exists a node n. and a branch d from n. to 1 p 1 n. such that a(p,P)tHp Definition 3 An execution $ of G is called a proper execution if the following implication holds. (v) If a(p,N)Hp for every branch d = (ni,n.) directed into nj, P P J then jcS for some P > No 6

The sequence $ is interpreted as giving a temporal sequence of initiations of operations throughout a performance of the parallel computation specified by Go The occurrence of SN denotes the simultaneous initiation of 0. for all jCSN and the simultaneous termination of 0. for all jcS. The recursive definition of a(pN) is given with this interpretation in mind so that &(p,N) is viewed as the sequence of data words occurring on branch d immediately after all initiations and terminations associated with SN have been effected. Thus, for example, if icSN and jcSN, the sequence a(pN-l) which is the result of the previous initiation of i0. is placed upon the queue a(p,N-l), yielding the data sequence a(pN-l)a(p,N-l). The result of initiating 0. then leaves on d the sequence t (c(p,N-l)a(p,N-l)) which is j(p,N). Conditions (i) and (ii) of Definition 2 insure that to each initiation of 0. there corresponds a termination and that no termination may occur which is not preceded by an initiation. Condition (iii) states that a node of G may initiate only if it is eligible for initiation. Condition (iv) gives the only situation under which an execution may terminate. Condition (v) of Definition 3 states that any node of G which is eligible for initiation must eventually initiate, Finally, it should be noted that the simultaneous initiation and termination of any operation 0. is permissible. J 7

3. Determinacy of Computation Graphs In this section we prove that, for proper executions of a computation graph with given initial data, the sequence of data words occurring on any branch is unique. Thus a computation graph represents a unique computation regardless of which proper execution occurs. We refer to this property as the determinacy of computation graphs. This result is embodied in Theorem 1, for the proof of which we require the following: Lemma Let = SS2...,SN,..., be an execution of a computation graph and let x(j,N) = n. Then a(p,N) = t(n)[a c(pO)c(p,l),... p p a(p,N-l)] o PROOF: We shall use induction on N. For N=1 we must have x(jl)eCO,1l. If x(j,l) = 0, j/is], so that a(p,l) = ap a(p,0) = t(0) [a a(p,)]. If p p p x(j,l) -- 1, jS1 so that o(pl) = t [ap a(p,O)]. In either case, the lemma holds. Assume the truth of the lemma for N=P. Then if x(j,P) = n. o(p,P) = t(n)[ap (p,O)...a(p,P-l)]. Suppose jSP+l. Then a(pP+l) = o(p,P)C(p,P), i.e. a(p,P+l) = t n)[ap c(p,O)... (p,P-l) ]a(p,P). If n=0, the lemma follows immediately. If n > 1, t(n)[ap C(p,O).. Ca(pP-l)] = t [tnp (ap a(p,O)..o a(p,P-l))] and by (iii) of Definition 2 we must have t(nl)- (a aC(p,O)... a(p,P-l)) C H. Hence t [t 1)(ap a(p,O)... o(pP-l)]c(p,P) = tp[t( )(ap(p,O).. pp p P P p c(pP-l))a(pP)]. By iterating this argument we finally obtain a(p,P+l) = t()[ap (pO)... a(p,P)] so that for jSp+ the lemma holds. 8

If jcSp+l a(p,P+l) = tp[a(p,P)a(pP)] t [t()a a(p,O).. a(pP-l)1a(p,P)] - t [t()[{ a (pO)... (pP)] by the above argument, so that (p,P+l) = t(nl)[ap c(p,O).. Cl(p,P)] p p and the induction is completed. For the proof of Theorem 1 we shall require some notation. If a has the form a = OT with O,TCT e, we write 0 C a. Let $ be an execution. We write N(js) = Min N x(j,N) = s). In what follows, primed symbols will indicate that these are symbols defined with respect to the proper execution ~' (e.g. o'(p,N'), x'(j,N')) while unprimed symbols either relate to the proper execution d (e.g. a(p,N), x(j,N)) or are independent of the executions considered, (e.g. hp t, ap) Theorem 1 Let SlS2 = SlSN, and 6 J^SN be two proper executions of a computation graph G. If x(j,N) = s then there exists N' such that x'(j,N') = s. Moreover, if jESN) then N' can be chosen such that x'(j,N'-l) s-l and h (a(pN-l)) = h (a'(pN'-l)). p p PROOF: We apply induction to N(j,s) with respect to the proper execution t. Suppose N(j,s) = 1. Then sc(O,l}) If s=O, i.e. x(jl) = 0, then, since x' (j,O) = O, we take N' = 0 and the theorem holds in this case. If s=l, then jeS1 which implies, by (iii) of Definition 2, a eH. This implies, by (v) of Definition 3, that there exists N! such that x' (jN') = 1 with 9

jcS'N,. By the above lemma we have a'(j,N'-l) = a a'(p,O)...'(pN'-2) and since a sH, we have h (a(p,0)) = hp(a ) = h (c'(j,N'-l)) and again the theorem holds. Now assume the theorem holds for N(j,s) < P and suppose N(j,s) = P+i. Clearly, we must have s > 1. Then by the lemma, a(pP) = t( s [ap c(pO). P Pp a(p,P-1)] and since jeSp+l by (iii) of Definition 2, we must have p(pP)~H p Now N(j,s-l) < P so that, by the induction hypothesis, there exists M' such that x(j,M') = s-1 with jeS'M. We shall prove that there exists N' such that x'(j,N') = s. Suppose, on the contrary, that N' does (s-1) not exist. Then, for all R > M', we must have a'(p,') = t( [apa'(pO)... a' (pR'-l)]. Let x(i,P) = t. Then N(i,t) < P, so that by the induction hypothesis and the definition of a(p,-), to each nonnull a(p,T), T < P-l, there corresponds an a'(p,T), such that a(pT) = a'(p,T'). Finally, for M1M2 _< P-l, suppose that a(p,M1) and a(p,M2) are the results of two consecutive occurrences of ieST and ieST, i.e. T1 < T2. Then by (i) of 1 2 Definition 2, we must have M1 < M2. By the induction hypothesis there exist TJ and T' with T1 < T2 with ieST; and icST,. By (i) and (ii) of Definition 1 2 2 there exist MI and MI such that icS'M,+1 and ieS'M +l and Mi < M'. Hence, to each nonnull sequence a(p,M) of a(p,P) there corresponds a sequence c'(pM') = a(pM) of r'(p,R') and their ordering is preserved. It follows that, for sufficiently large R', o(pP) C o'(p,R'). But a(p,P)eH so that, -- p by (v) of Definition 3, there exists N' such that x'(j,N') = s and j S'' Clearly, x'(jN'-l) = s-1. Finally, if x(i,P) < x'(i,N'-l), then o(p,P) C a'(pN'-l) while if x(i,P) > x'(i,N'-l), then a'(p,N'-l) C o(pP). In either case, since both o(p,P) and a'(pN'-l) are in H, it follows that h (cr(pP)) = h (al(pN'-l)). This completes the induction. 10

Theorem 1 establishes that the number of performances of an operation O. is the same for all proper executions of a given computation graph Go Moreover, the n-th performance of 0. under two proper executions operates J upon the same data in both. It follows that the results of a proper execution of G are determined solely by the initial data regardless of the timing of the operations, i.e. regardless of the proper execution considered. One more point should be noted. The proof of Theorem 1 and its preceding lemma requires only that the set H be the same at the n-th initiation of 0. for P J all proper executions. and that each of the functions t, a, and h be single valued and perform the same mapping for all proper executions at the n-th initiation of 0.. Therefore consider modifying the definition of a J computation graph by replacing the single "threshold set" H by a collection {Hp(n)) such that the threshold associated with d = (ni.nj) is H (n) upon the n-th initiation of 0.. Similarly, replace the functions t (.) and J p h (') by functions t (i,n) and h (,n) respectively, and modify the recursive definition of c(pN) in the obvious way. Then by the above remark. Theorem 1 holds for this more general definition of a computation graph and its associated proper executions. In the following development we shall investigate a particular realization of this more general form of a computation graph. More precisely, we shall deal with a model which is specified by the following choice of Hp(n) and t (.,n): H (n) = H = a | cp and 4(op) > T p) Here i(op) is p pp pp p P-p' P e p the length of the sequence o with the convention 1(6) = 0, and T is a nonnegative integer. Let (w (n) I n=1,2,...) be a sequence of nonnegative integers with W (n) < T. If ca eH, we have a = a2...a ).aT..a p11alap' aW aT P a 11

where a.e Q, i=1,2,...,r. Define Lp(a,n) - ()or o t (eo,n) - O. LJet G be an execution for the above form of a computation grapsh/.,l.. sium'pl:]e i nduct:io)n. y'i.elds: N-l x(0,i) [a(pN) ( - 2,(a ) _ [ (i(p,Rl)] - W (n) p R=O n:=l But a(p,R) = ) except possibly when ieSR. For icSR define R-tl* R+1 U [x(i,R+l)] = i[a(p,R). Then x(i,N) x(j,N) [a(p,N)] = A + 2 U (n) - E W (n) where A = i(ap). n=l n=l It follows that o(p,N)cH if and only if o[a(p,N)] > T Thus for d = (ni,n.), T, Ap U (n) and W (n) may be interpreted, respectively, as a threshold which prevents 0. from initiating whenever the threshold exceeds J the queue length on d, the initial data length on branch d, the length of data placed on d upon the n-th termination of Oi, and the length of data removed from d upon the n-th initiation of 0.. P J Finally, we note, by (i) of Definition 2, that x(i,N) - x(iN)C(0,l1, so that o(p,N)cH implies that x(i,N) x(j,N) A + E Up(n) - W (n)> T n=l n=l P- - while 6 finite implies x(i,N) = x(i,N) by (i) and (ii) of Definition 2. We may then reformulate Definition 2 and 3 of an execution and proper execution to conform to the present model as follows: 12

Definition 2' The sequence S = S1,S2,.., is an execution of G if and only if, for all N > 0 the following conditions hold. (i) x(jN) - x(JN)Ct0,lJ. (ii) If jSSN, there exists R > N such that jcSR. (iii) If jcSN+l and G has a branch d from n. to n., then x(i,N) x(j,N) Ap + U (n)- ) W(n) > Tp P 1 (iv) If & is finite and of length P, then for each j there exists a node n. and a branch d from n. to n., such that 1 p 1 J x(i,N) x(jN) A + 2 U (n)- Z W (n) < T P 1 P P Definition 3' An execution 6 of G is called a proper execution if the following implication holds. (v) If x(iN) x(j,N) A + ~ Up(n)- W (n)> T p 1 P 1 p p for every branch d = (n.,n.) directed into n., p n irec J in JN then jESp for some P > No 13

An Example Consider matrix multiplication C = AB where n Cij z aikbkj k=l i=l e *om; j=l...p. Figure 1 shows a computation graph for this calculation where each branch d is labeled with its associated quadruple (Ap, (U (n)), (W (n)), Tp). Each of the indices i,j, and k is assumed to have zero as its initial value. Index modification is performed by the operations 0~1'~,0 ^,07 0as follows: 01 replaces k by (k+l) mod n 02 replaces j by (j+l) mod p 03 replaces i by i+l. The initial data on the branches (nl,nl),(n2,n2) and (n3,n3) consist of markers sequencing the indices through their appropriate values. Termination of the queue associated with branch (n3,n3) is caused by the depletion of the queue associated with branch (n3,n3). The values U = np and W=l on the branch (n3,n4) specify that any given value of the index i is used in np consecutive performances of 04. Similarly, for branch (n2,n5). 04 fetches aik from storage 05 fetches bkj from storage 06 forms the product aikbkj 07 adds aikbkj to the partial sum in a sum bucket and puts out c.. on (n7 n7) only when it has performed n summations. 14

At the end of the calculation (n7,n7) contains the sequence cllcl2, lp~''C clppyc21,''c'p,*,m...cc M.P' 15

9T UOT.eOT- dTqid TInui xIT;ceul joj qdejS uoTqp;enduzIoo V T 1 MSTA GSTrT-'qqo 0 u I T JT T - (J) n (o' (o})' ((a)i n'o), ( d ) j-, / t c LT x ( 9u ( x )1 - (> %> \N fCl jfi~i~i~Ti~TTO~a ~U3 —~6 u)iT~iT~T~r-C)( 0>u -^ ^ ^ ^~~~~~~('r3C~' r

4. Termination Properties of Executions We shall assume throughout that Up(n), 0 and W (n) A 0 for all n. The case U (n) = W (n) = 0 for all n is treated in [1]. We write ap = A - T + 1, and Z. = ((i,P) d is a branch from node n. to node n.}. P P P P 1 Theorem 3 Let G be a computation graph. Consider the system of inequalities x(i) x(j) min [a + Z U (n) - Z W (n)] < 0 for j=l,2,..., (1) (i,P)c Pp 1 1 - If Z. is empty for any j or if (1) has no solution in nonnegative integers then every execution & of G is of infinite length. If (1) has a nonnegative integer solution, then every execution Q of G is of finite length, and, for any j, the number of performances of 0. is x (j), the minimum integer solution of (1). J From this point on we assume that Z W (n) = a. Lemma 1 If, for some (i,p)EZj, Z Up(n) < o, n. terminates. (Note in J p- 3J this case n. need not terminate.) If, for some (i,p)cZj, n. terminates, then n. terminates. J Lemma 2 Let G' be a strongly connected subgraph of G. Then in any execution & of G either every node of G' terminates or none does. 17

5. Termination Properties of Proper Executions We consider, in what follows, the case ~ U (n) = o, as well as W (n) = o. p Lemma 3 The node n.eG terminates if and only if, for some (i,p)EZ. J J n. terminates. Also, if n. terminates, there exists (i,p)eZ. such that n. terminates and x(j) x(i) Wp(n) > a + E Up(n) (2) 1 P -p 1 P where x(k) = max x(k,N) in any proper execution of G. Theorem 4 Let S be the set of indices of nodes terminating in G. For each jeS, x(j) is given by the minimum value of x(j) in any nonnegative integer solution of x(i) x(j) min [a + Z U (n) - Z W (n)] < 0. (3) (i,p)Ez. p 1 p 1 p - icS J Note: The proof of Theorem 4 establishes that the set (x(j),jcS) is a solution of (3). Since x(j) = x (j), the minimum value of x(j) in any nonnegative integer solution of (3), the set (x(j), jcS} simultaneously minimizes all of the quantities x(j) over the set of all nonnegative integer solutions of (3). We call this the minimal solution of (3). 18

Theorem 5 The following iteration scheme converges in a finite number of steps to the minimal solution of (3), for j S. x(0 ) )= Choose x(n+l) (j) so that: p (i) Whenever x(n) (j) x(n) (i) Wpw(k) > P + I U(k) p() U() and k=l k=l (n+l) ( (n) (. ) (ii) Otherwise (n+l) (j) (i) W (k) > o + U (k) and. p - p P k=l k=l x~nl (j)- x(n) (i) W (k) <c + U (k) where k=l k=l x(n) (r) = Min x(n) (r) reS. (ip)C P PROOF: We use induction on n to show that for all n and for jeS, x(j) > x(n)(j). For n=O, x(O)(j) 0 < (j). Assume the result for n=r. We show it true for n=r+l. Clearly x ( (j) > x0)(j). If x(r-l) = (r), the result is immediate. Otherwise x(rl)() > x(r)(j) so that for all (ip)cZ., (ii) of the iteration scheme applies. 19

Suppose, now, that the induction hypothesis fails for n=r+l. Then (r+) (j) > x(j) Hence, for any (i,p)cZj, x(r+l(j) > x(j). Then by J- p (ii) of the iteration scheme, x(j) x( )(j) x)(i) V-7 p \ (k) W (k) <) < a + U(k) so that P - p p p k=l k=l k=l X(j) x(r)(i) x(i) Wp(k) < p + u (k) + Up(k) k=l kl =l k=l the last inequality following from the induction hypothesis for n=r. But then, for all (i,p)CZj, x(i) x(j) p + z Up(k) - Wp(k) >0 k=l k=l contradicting (3). Hence x(n)(j) < x(j) for all jcS. Since, for each j, x(n)(j) is bounded above by x(j), and since x(n+l)(j) > x(n)(j), the iteration must terminate with some n such that x(nl)(j) = x(n)(j) for all jeS. For this value of n, (i) of the iteration scheme must hold for all jeS and some (i,p)cZj, otherwise (ii) would yield x(n+l)(j) > x(n)(j). For each such j, and some (i,p)cZj, (i) yields x(n) (j ) x(n) (i) W (k) > a + U (k) i.e. p - - p k=l k=l x(n) (i) x(n) (j), + X U(k)- Wp(k) < 0 k=l k=l 20

i.e. x (n)(j), jcS) is a solution of (3). Combining this with the result x(n)(j) < x(j) yields x(n)(j) = x(j) Theorem 6 Let G' be a strongly connected subgraph of a computation graph G. Then G' is terminating if and only if: (i) For some n.eG' there exists (i,p)cZ. such that n. is terminating and ni.G', or (ii) The system of inequalities x(i) x(j) Min [a + Z Up(n) - Wp(n)] < (5) (i,p)eZ PP p P - d cG' J p for all j such that n.eG' has a solution in nonnegative J integers. Theorem 7 A maximal strongly connected subgraph G terminates in G if and only if there exists G such that G is self-terminating and Gr > G 21

6. Conditions for Self-Termination The condition for self-termination for a loop with I nodes is the existence of a nonnegative integer solution of the following inequalities, obtained from Equation (5) of Theorem 6. x(l) x() ~ W (n) > a + E U (n) 1 1 and for k=l,2,...,2-l (6) x(k-t-l) x(k) Wk (n) -> + z Uk() 1 1 Theorem 8 A maximal strongly connected subgraph G' of a computation graph is self-terminating if and only if G' contains a selfterminating loop. Corollary 2 Let G be a computation graph contained in a computation graph G. If G is terminating in G, then G is terminating in G, and every node of G accessible from a node of G is terminating in G. Also, if n.:G, x(j,G) > x(j,G). Lemma 4 Let l = S1,S',...SN,... be an execution of a loop L. Then, if x(k+l,N) > 0 x(k,N) x(k+l,N) Ak + Uk(n) - Wk(n) > Tk W 1 1 where Wk= max Wk(n). n 22

PROOF: Wk exists since we have, for all n, Tk > Wk(n) o We use induction on N. For N=O, x(k+l,O) = 0 and there is nothing to be proved. Assuming the result for N=P, we prove it for N=P+lo If x(k+l,P+l) = x(k+lP) the result follows from the induction hypothesiso Otherwise k+l e Sp+l Thus x(k+lP+l) = x(k+lP)+l and from Definition 2', x(k,P) x(k+lP) Ak+ 2 Uk(n)- 1 Wk(n) >Tk 1 1 i.e. x(k,P) x(k+lP+l) Ak+ Uk(n) _- Wk(n) > T - Wk(x(k+l,P+l)) > Tk - Wk 1 1 p The desired result now follows since x(kP+l) > x(kP). In the case of a loop, the iteration scheme of Theorem 5 becomes: x()(k+l) = 0. Choose x(n+l)(k+l) so that: (i) Whenever x(n) +l) x(n) (k) Wk(r) > k + kU(r), x(n)(k+l) =(n (k+l) (10) r=l r=l (ii) Otherwise x n1 (k+1) x(n)(k) ) Wk(r) > a + Uk(r) and r=l r=l x(n+l) (k+l)-l x(n)(k) Z;? Wk(r) < k + Uk(r) r=l r=l In order to prove Lemma 5 we require the following. 23

Lemma x(n+l)(k+l) > x(n) (k+l), x(n)(k) > x(n- (k ). PROOF: Since x(n+l)(k+l) > x()(k+l), condition (ii) of the iteration scheme applies so that + (k+ 1) x(n (k) x (k+l) Wk(r) > k + Uk (r) > Wk(r). l r=l r=l r But, in general, x(n) (k+l) (n-l) (k) Wk(r) > ak + Uk(r) r=l r=l Combining these two results yields x(n) (k) x(n-1)(k) k+ Uk(r) > ak + Uk(r) r=l r=l x(n)(k) x(n-l)(k) i.e. j Uk(r) > Uk(r) r=l r=l i.e. x(n)(k) > x(n-l)(k). Lemma 5 If I is the number of nodes in the loop L, then, for k=l,2,.e.,i x(k,L) > 1 if and only if x() (k) > 1. Also, if x(k,L) = 0, then L is selfterminating. Since x()(k) = 0 implies x(k,L) = 0O the condition x(W)(k) = 0 is sufficient for self-termination. 24

PROOF: Clearly, x()(k) > 1 - x(k,L) > 1. Conversely, suppose x(k+l,L) > 1. Then there exists n such that x(i)(k+l) = 0 for i=0,1,2,..,n-l and x(n)(k+l) > 0. Assume n > i. Now x() (k+l) > x( -)(k+l) so that by the above lemma, x(n-l)(k) > x(n-2) (k). Applying the Lemma I times yields x(n-2)(k+l) = 0 > x(n-(-l)(k+l) = 0, a contradiction. Hence x(k+l,L) > 1 - > x()(k+l) > 1. The rest of the theorem follows. We assume, in what follows, that (Uk(n)} is periodic with period pk and (Wk(n)) is periodic with period qk. Write Pk z Uk(n) n=l qk Z Wk(n) k k Uk Wk The following inequalities are self evident. 25

x(k+l) E Wk(n) < Wk x(k+l) + (qk - 1)} n=l x(k) Z Uk(n) > Uk (x(k) - ( - 1) n=l x(k+l,N) (8) Z Wk(n) Wk (x(k+l,N) - (qk- )) n=l. x(k,N) Z Uk(n) < Uk x(k,N) + (Pk- 1)} n=l By applying the first two of these inequalities to the fundamental loop inequalities (6), we obtain the following system: k x(k+l) > k x(k) - - - C(pk-l) - (q -1). k Put ak k+l = ^ - _ (Pk - 1) - (k -1). Then -this system can be written in a matrix form as 1 0 0 -a x(lTx rl I 1 0 0 x(2) 2 K o... o. -a ( a -a 1o.... o (2) 26 i I: i.e. (I -A) X > P (7) 26

Ak - + x(k+l,N) < cO x(k,N) + + O(pk-) + (qk-1) Wk Put Ak - T + X 7k+1+ -a% (pk-l) + (qk-1). Wk Then this system can be written (I - A)XN _ y (9) where x(k,N) is the k-th component of XN and 7k is the k-th component of;, The fact that A is a positive matrix yields the following: Lemma 6 If the matrix inequality (7) holds, then for any positive integer n (I - An) X > (An- + An-2 + + A+)p. If (9) holds, then for any positive integer n n X ( 1l An-2 (I - A )XN < (AN-+ + A +...!L lI)7. Define g = ola2".. of The matrix I-A' is diagonal with each diagonal element equal to l-g. A. Loops with g < 1 Theorem 9 g < 1 > L is self-terminating. PROOF: Suppose L is not self-terminating. Then Lemma 5 implies that there exists P such that if N > P, x(k,N) > 1 for all k. For all such N, (I-A)XN < 7 by Lemma 4 and inequality (9). By Lemma 6 (I-A2)XN = (l-g)X < (Al- +... + A+I)7. 27

Thus, each component of XN has an upper bound independent of N so that L is self-terminating. B. Loops with g = 1 When g=l, det (I-A) = 0 so that the homogeneous system (I - A)X = 0 has a nontrivial solution, as follows. X1 = a x2 =.la X2 C' 1a xI a1a2...-1a where a is an arbitrary parameter. Thus, by judiciously choosing a, we can arrange that each xi, i=l,2,...,2 be a multiple of pjqj, j=l,2,...,2. i.e. p.qj. xi, i, j=1,2,...,~. This is possible since each a is rational. Let X be the minimal positive integer solution of (I-A)X = 0 subject to these conditions. X exists, for the minimal a which yields a solution satisfying the above conditions will yield a component-wise minimal positive integer solution X. Theorem 10 If L is self-terminating with g=l, then at least one component of X is less than the corresponding component of X. PROOF: L is self-terminating, which implies that X is the smallest nonnegative integer solution of (6), i.e. x(k+l) x(k) nl Wk(n) _> ak + nl Uk(n). 28

Therefore x(kd- -l) x(k-i-l ) x(k) x.(k-l) s Wk(n) - z Wk(n) > c + a Uk(n) -: wk(n) n —l n-l n=l n-l Assume, for all k, that x(k) > x(k). Then x(k —l) x(k) x(k-l) 2 Wk (n) > ak + U (n) - r Wk(n). x(k+l)+l 1 1 But x(k+l) is a multiple of q. Hence x(k+l)-x(k+l) x(k) F Wk(n) > k + Uk ) - Wk x(k-tl) Finally, x(k+l) = ak x(k) Uk. - x(k) Wk so that Wk x(k+l) = k x(k) x(k) - Uk(n) 1 since x(k) is a multiple of Pk. Then we have x(k+l) —x(k+l) x(k) x(k) ~z Wk(n) > ak + - Uk(n) - Z Uk(n) 1 1 1 x(k) C k U (n) k (k)+l x(k)-x(k) - F' ^uk (n) 1kk again since x(k) is a multiple of pk' But then x(k+i)-x(k+l) is a smaller nonnegative integer solution of (6), a contradiction, 29

This theorem leads to a finite procedure for determining selftermination. The iteration procedure (10), is followed until either: (i) it terminates, yielding the number of performances of each operation, or else (ii) for all k, x(n)(k) > (k), establishing nontermination. C. Loops with g > 1 Put P = A-1 + A-2 +... + A + I. Then Lemma 6 yields (I-AI)X > Pp, or, (l-g)X > P,p or, since g > 1 < 1 P. From this inequality it follows that if any component of PB is positive, the loop is not self-terminating. For if the loop is self-terminating, let X be the minimum nonnegative integer solution of (6). Then X satisfies X < i P (15) so that some component of X is negative, a contradiction. The upper bound on the values of x(k,L) given by (15) leads to the following finite procedure to determine whether the loop L is selfterminating or not: Apply the iteration scheme (10) until either (i) it terminates, yielding the number of performances of each operation, or else (ii) some component of x(n)(k) exceeds the upper bound obtained from (15). 30

7. Properties of Queue Lengths Define x(iN) x(j,) q(p,N) = A + Z Up(n)- Z Wp(n). 1 1 The queue length on d immediately after the x(j,N)-th initiation of O. is q(p,N) if the x(i,N)-th performance of 0. has been completed; a 1 otherwise, the queue length is q(p,N) - U (x(i,N)). Theorem 12 Let d be a branch of G from n. to n. If n.eG and n.cGS p 1 J 1 J r s where G and G are different maximal strongly connected subgraphs of G, then the queue length of d is uniformly bounded if and only if n, terminates. Theorem 13 Let G' be a nonterminating maximal strongly connected subgraph of G. Every loop of G" for which g > 1 contains at least one branch for which the queue length is not uniformly bounded. Also, the queue length for every branch which lies in a loop with g=l is uniformly bounded. PROOF: Let L be a loop in G' consisting of the nodes nln2,o..n and branches dl,d2,...,d, with dk directed from nk to nk+l, 1 < k < 2-1i and dI directed from n to n. Choose a proper execution of, and consider the linear form W o W W1W WW_ W + 1 Tq 1 2 12.. 2-1 F = q(l,N) +- - (2N) + -- q(39N) +... + q( 31 3 31

x(k,N) x(k+l,N) Now q(k,N) = Ak + Z U(n) - Wk(n). By applying the inequali1 1 ties (8) to this equation, we obtain q(k,N) Ak + Uk x(k,N) - Wk x(k+l,N) + Uk(k-1) +Wk( Substitute for q(k,N), 1 < k < X, in F yields < W1 W1W2... Wih F U. (g^ )x(lN) + A + A +... A1 U2 253.. U W + 1[U (Pi-1) + Wl(ql-1)] + [U2(p2-1) W2(q2-l)] +... U2 _ _-2 [Uf (p -1) +W (q -1)]}. Iuhu -. u^' Suppose g=l. Then by taking the < case in the above inequality, we see that F is bounded above by a constant, and since each q(k,N) is bounded below by zero, a uniform upper bound on each of the quantities q(kN) must exist. Suppose g > 1. Then the coefficient of x(l,N) is positive. Since G' is nonterminating, x(l,N) grows without limit in any proper execution of G. By choosing the > case in the above inequality, we see that F must also grow without limit showing that some q(k,N) must tend to infinity. 32

8. Conclusion An analysis has been presented of determination for computation graphcs, and of termination and queueing in the case that (U (n)) and [W (n)) are p p periodic. An open problem is the analysis of termination and queueing in the case that (U (n)) and [W (n)) are not periodicO However, such an analysis would appear to be of little practical value since most arithmetic calculations which one would wish to effect by a computation graph seem to be iterative. Many other problem areas immediately suggest themselves, Among these are: (i) the synthesis of a computation graph given a computation to be performed (ii) the determination of a proper execution which minimizmes the time for a computation (iii) programming techniques (iv) the allocation of storage registers for data queues and their utilization so as to minimize the number requiredo 33

References [1] Karp, R. M., and Miller, R. E., Properties of a Model for Parallel Computations: Determinacy, Termination, Queueing, IBM Research Paper RC-1285. (To appear in SIAM Journal) 34

UNIVERSITY OF MICHIGAN ii3 901 03695 420 3 90103695 4520