THE UNIVERSITY OF MICHIGAN COMPUTING RESEARCH LABORATORY1 A DEFINITIONAL TECHNIQUE FOR SPECIFICATION AND IMPLEMENTATION OF DATA TYPES Khosrow Hadavi CRL-TR-16-83 Under the Direction of Professor Keki B. Irani APRIL 1983 Room 1079, East Engineering Building Ann Arbor, Michigan 48109 USA Tel: (313) 763-8000 1This research was supported by the Department of the Army, Ballistic Missile Defense Advanced Technology Center, Rome Air Development Center, and the Defense Mapping Agency under contract F30602-80C-0173, and by the Air Force Office of Scientific Research/AFSC, United States Air Force under AFOSR contract F49620-82-(>0089. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the funding agency.

ABSTRACT A Definitional Technique for Specification and Implementation of Data Types by Khosrow Hadavi Chairman: Keki B. Irani A model is proposed for specification and implementation of data types. This model is based on a novel multi-level graph structure, viz. the Z-structures. A universal set of data structure operators are defined to characterize the data types. Constructive definitions of these operators are presented to the extent that the task of defining them is reduced to specification of only three first order predicate expressions. Using these predicates, the important issue of error is easily and automatically taken care of. This is achieved through the use of functions whose domains are defined by the above predicates in such a way that every constructor operation results in a "non-error" configuration. The type manipulation operations (TMO) are introduced in order.from define data types of a different behaviour to those of the existing ones. An example of a TMO is the "embed" operation. It enables one to combine two data types so that the resulting data type exhibits a behaviour which can be automatically derived from the operand data types. iv

This facility may also be used to parameterize data types. Another TMO is the enrichment operation. Two types of enrichment are introduced. These are enrichment for "convenience", and enrichment for "change of behaviour." Sufficient conditions are developed in order to distinguish one from the other. It is demonstrated that the proposed model is highly "extensible." For example, with a very minor alteration, a stack specification may be changed to that of a queue and vice versa. Also demonstrated is the ability of the specification to define parallelism of the operations. To this end, without any extra burden on the user, highly parallel operations may be defined for updating and/or accessing data. It is shown that our specification technique offers an invaluable tool to ensure the "security" of a data base, or an operating system. It is also demonstrated that different views of the same set of data may be held by different users concurrently by assigning different predicates to each user. v

TABLE OF CONTENTS DED I CATION..................... i ACKNOWLEDGMENTS.................. iii LIST OF FIGURES............... viii LIST OF APPENDICES................. ix CHAPTER I- DATA STRUCTURE ABSTRACTION-PROBLEMS AND MOTIVATIONS................. 1 II- A NEW MODEL............. 11 2.1- Description of The Model......... 11 2.2- Components of the Model......... 12 III- DATA STRUCTURE OPERATIONS.......... 42 3.1- Characteristic Set of Operations..... 44 3.2- A Definition of Data Types........ 62 3.3- Data Types as Lattices.......... 66 3.4- The Primitive Data Types......... 86 3.4.1- Operations to Characterize Primitive Data Types....... 89 3.4.2- Constructors of Primitive Data Types.............. 92 3.4.3- Primitive Data Types as Posets.. 95 IV- COMPLETENESS AND ~TYPE MANIPULATION OPERATIONS 100 4.1- Completeness and Soundness....... 101 4.1.1- Deduction of Terms........ 105 vi

4.2- Type Manipulation Operations....... 111 V- ENRICHMENT, EQUIVALENCE AND ERROR... 121 5.1- Enrichment of Data Types...... 121 5.1.1- Changing Behaviour by Enrichment. 122 5.1.2- Auxiliary Operations...... 125 5.1.2.1- Equational Theory.... 127 5.2- Equivalence of Data Types...... 133 5.3- Implementation by Emulation....... 140 5.4- Error................ 146 5.4.1- Domain of DSOs.......... 149 VI- CONCLUSIONS AND FURTHER WORK..... 152 BI BLI OGRAPHY................ 175 vii

LIST OF FIGURES Figure 2.1- A 3-LEVEL Z-STRUCTURE, F3...... 17 2.2- STRUCTURES DESCRIBED BY A PREDICATE... 20 2.3- DATA STRUCTURE CONFIGURATIONS...... 40 3.1- REACHABILITY OF ELEMENTS IN Z...... 74 4.1- REMOVAL OF INNER STRUCTURES BY /-FUNCTION 118 5. 1- STRUCTURAL SIMILARITY.......... 140 5.2- ARRAY IMPLEMENTATION OF BINARY TREE... 144 viii

LIST OF APPENDICES APPENDIX A- EXAMPLES 1..... 59 APPENDIX B- DYNAMIC CHANGE OF BEHAVIOR... 165 APPENDIX C- X-NOTATION..... 67 APPENDIX D- MORE ON INSERTION AND DELETION.. 169 ix

CHAPTER I DATA STRUCTURE ABSTRACTION-PROBLEMS AND MOTIVATIONS Three phases may be identified with almost every formal approach to realization of data types. These are the concept, the specification, and finally the implementation. Assuming the concept or the desired object to be specified is already known to us, our task is to identify the two remaining areas, i.e. the specification and representation. More specifically, these two concepts parallel the "behaviour", and the "storage representation" [AFS 80] of data structures. Behaviour of data structures deals with its properties, i.e. the features which are unique to that particular data structure. Storage representation is a real world object, i.e. physically tangible, that meets all the requirements of an abstract specification. Consider a stack data structure, its behaviour is characterized by a last-infirst-out discipline. The representation of stack, *on the other hand, may be a linear list or a linked list or any other storage representation which exhibits a LIFO characteristic. Problems regarding the storage representation of data structures, concerning efficiency and optimality, were a l

2 popular research topic, particularly, when the non-numeric computing became widespread and the volume of data started to expand very rapidly because of the industry-wide usage of computers. The large volume of data, either in database environments or in numerical problems triggered a number of studies concerning storage of data, some of these are [DIM 69, HAN 69, PAT 69, RAN 72, ROS 71, WON 75, KAR 75, ROS 78]. Behavioral study of data structures has been the main theme of research for the past decade or so. The main thrust has been towards developing a formal technique to specify the behaviour of data structures. This led to the concept of abstract data types. An abstract data type defines a class of abstract objects which is completely characterized by the operations available on those objects [LIS 74]. In general there are two main categories in which the specification of abstract data types may fall: operational and definitional[GUT 77]. In the operational approach, instead of describing the expected properties, one provides a methodology by which a desired type may be built. The definitional approach, on the other hand, provides a set of axioms, under which the properties are upheld. The definitional techniques are more formally oriented; their most valuable merit is claimed to be the ability to describe data structures without giving any bias as to how it should be implemented. This lack of implementation bias,

3 however, may sometimes prove to be not so desirable when an implementation is sought. Two of the definitional techniques which have won a good deal of support are the axiomatic and the algebraic techniques. Both of these techniques describe the behaviour of data structures by a set of axioms which are defined on the operations of data structures. As a consequence of the abstract descriptions, such techniques are normally difficult to construct, comprehend and verify. The algebraic method [GUT 75, ZIL 74, GOG 75] has shown the most potential for automatic implementation and automatic correctness proofs. This is accomplished through the use of-equational theory which is of central importance in this scheme. However some crucial issues such as identification and detection of errors are difficult to handle and implement in the theory. A more indepth, though still incomplete attempt in this area is due to Goguen et. al. [GOG 78]. Their proposed technique requires additional axioms to be added to handle error instances. The so called error equations make the task of specification, in their own words, "..unbelievably complicated" [GOG 78]. Yet one of the principal reasons to have a formal approach is the ability to express errors. Specification of concurrency and other performance related topics are also issues to be considered and solved. In general, there are a number of factors to be considered and satisfied in a specification technique. These are briefly described below.

-Constructability and Comprehensibility In general we are interested in constructing a specification with a reasonable degree of confidence which is comprehensible to humans as well as machine. Some authors maintain the view that comprehensibility is not an important issue, since a formal specification is always comprehensible to a machine. We believe that it is just as important for the human user to comprehend the underlying specification since: a) a comprehensible scheme is of value in construction phase of specification; it enhances the degree of confidence, and b) users may exchange their ideas without any ambiguity and the tedium of line by line description of the conventional programs. -Minimality Every specification should be free from any extraneous information. For any specification, we are concerned very little about how it should be done; we are only interested in. what function(s) should be performed. In addition, minimizing the specification reduces the number of properties to be verified in correctness proofs. -Applicability It is obvious that for any specification technique, a wide range of applicability is desired. In other words, how universal or how limited is the power of a specification technique is in order to describe data structures in general.

5 -Extensibility It is desirable that a small extension of a concept results in a similar small modification in its specification. For example, in changing a specification of a set to a bag, it is undesirable to undertake major changes in the original specification of the set. The techniques which have been developed so far are rich in some aspects and deficient in some others. Graphical techniques have the potential of enhancing certain attributes of a specification, for example constructability and comprehensibility. However one problem remains, and that is the minimality issue. Due to the presence of the extra graph structures, the specification is no longer minimal. The algebraic approach, on the other hand, treats operations as letters of alphabet, T. T*, the free monoid generated by T, is provided with a finite set of equations between certain elements of T. Every instance of the data type is then given by a word W in T*. The specification technique which we have proposed is a synthesis of operational and definitional techniques, combining the ease of use and constructability of the operational methods, and the advantages of formality and applicability exhibited by the axiomatic and algebraic methods. Built into our model are concepts such as ease of construction, parallelism of operations, easy extension and the ability to handle "error." Many of these concepts have

6 been either left out or lightly considered by the existing methods. In the proposed model the behaviour of data types is specified by the operations defined on their graph structures. This approach is substantiated by the fact that behaviour and structure are intimately related [BRO 80]. Furthermore, the use of pocinters, directed graphs or any other unnecessary implementation directed information has been avoided. Despite the many claims that are made regarding the lack of implementation bias in the algebraic approach, there are numerous common examples where without the use of "hidden functions," the technique would not yield favourable results [MOI 80, MAJ 77A]. As a result extraneous information, i.e. implementation bias, is introduced not only to make the technique convenient, but also feasible. In spite of all this, there are still some simple and commonly used data types, for which it is not easy to employ algebraic technique [GOG 78]. We argue that in order to ease the task of specifications we have to make some trade-offs. Following the tradition of the recent past we also have adopted the principle that data types are algebras. However graph structures are also introduced to gain in the areas where the implicit methods have demonstrated weaknesses. A number of these areas are: constructability, error treatment, comprehensibility, extensibility, and

7 performance constraints. Guidelines to build complex structures as well as provision of direction for the implementor are among other problems associated with the implicit methods. We have employed graph structures as an aid to describe "relations" between data items; but unlike Earley's approach we do not adopt different types of nodes. It is also unnecessary, as cited in Shneiderman [SHN 74] and Majester [MAJ 77B], to use directed graphs to impose implementation bias. Among the previous studies employing graphs, only the approach of Majester offers any adequate formalism, the rest are either incomplete or insufficient theories. In Majester's work 9 only the relation between abstract entities and the operations characterizing the data structures are considered. The important notion of error and other crucial concepts such as parameterized types have not been considered adequately. In our approach we have developed a mechanism by means of which one can build complex structures in terms of the more primitive ones. The appealing feature of this process is that the resulting complex structure retains all the properties of the constituent structures. Consider, for example, the list-of-trees (lot) data type. Once we have defined a list and a tree, by either the user or the host language, a lot data type, as will be shown later, is no more than (say) an infix expression with list and tree as operands: lot = list x tree

8 As pointed out earlier lot, like any other data type, requires a set of operations to characterize its behaviour. Our intention has been to define the "embed operation" (x) so that the "characteristic set" of operations for lot is formed automatically out of the characteristic operation set of the constituent list and tree data types. Having defined lot as above, one can extend the datatype lot to contain parameter types, e.g. list-of-tree-ofstrings (lots) is formed as follows: lots = lot x string A very important criterion, rarely considered is the question of performance. The method which we have adopted, enables the user to specify his intended data type as well as to emphasize, if so desired, some performance criteria concerning both the specification and implementation. Specification of parallel operations, for instance, is an important criteria. None of the previous methods have presented the ability to specify concurrency of operations such as parallel insertion and/or parallel deletion. Performance issues may also be specified at the implementation level. As an example of an implementation related performance, consider a two-dimensional array specified as the "embed" of two arrays: matrix=array x array

9 With an appropriate terminology, one can also specify the layout, of the data items, either in a row-major order or in a column-major order. Use of the directed edges,to serve as the access path, may also prove to be another tool to specify performance criteria. The reader should be reminded that although having the ability to specify some performance measures would introduce information extraneous to a minimal specification, we are merely offering the choice. In our approach the edge between two nodes (say) does not necessarily imply an access path (link, pointer etc.) in its corresponding memory representation. One of the major problems of the algebraic approach is to define a set of operations on the desired data structures in a way that would "completely" characterize its behaviour. In addition, it is also necessary to define the operations so that "error" or illegal results (configuration or instances) are avoided, or at least identified. The error treatment of data types, treated lightly in the literature, is an important and difficult problem as exemplified by Goguen et al [GOG 78]. We believe that the recognition of error instances of data structures using a graphical approach is a much easier task than when an axiomatic technique is employed. For example consider a stack data type. The error instances of "empty-stack" and "full-stack" are immediately obvious when a graph structure is

10 visualized. This same information is not quite as obvious when an algebraic approach is employed, unless some "constructive" description of it is visualized [MAJ 77A]. Using our approach of combining simple data types to get more complex ones, the complex data types' error conditions are easily and automatically met by those of the constituent types put together. This facility is made possible by the nature of the type manipulation operations, which preserve the properties of the constituent data types. The next chapter will introduce the basic components of the model. It will be followed by chapter 3 where we introduce the data structure operations and formally define the characteristic set of operations. The type manipulation operations are examined in chapter 4. The concept of equivalence is discussed in chapter 5 where we also discuss the notion of error and enrichment. Finally chapter 6 contains the conclusions of this study as well as suggestions for further workO

CHAPTER II A NEW MODEL 2.1- Description of the Model Graph structures have proved to be useful visual aids in many areas of science. Their presence is of value to human brain and our better comprehension of the underlying system. To a formal computing system, however, graph structures are no more than sets and relations. Although their presence may or may not be exploited for any "biasing" purposes such as implementation guidelines, we have opted to leave the potentials open-ended. However, there are many instances for which the use of the graph structures imposes some unnecessary and extraneous information on the underlying specification. We are willing to accept this in order to gain in other aspects of a specification scheme discussed in the previous chapter. The concept of multi-level graph structures is introduced to depict relationships between data objects or any other type of elements in general. Following the minimal description techniques of the algebraic advocates, 11

12 the operations defined on these graph structures will determine the characteristic and the behavioural aspect of the structures. It will be seen that certain predicates are necessary to define the correct relationship between data elements. Such predicates are particularly useful since they ensure correctness of operations carried out on data structures. These predicates will be referred to as pexpressions. the p-expressions merely represent structure of data and not the behaviour of it. For instance a list structure may have the same structure as that of a stack, but their behaviours are completely different. The combination of both structure and behaviour is referred to as a'data type." In the remainder of this chapter some basic components of the model are introduced in order to lay the groundwork for what follows. 2.2- Components of the Model Definition 2.2 below introduces the concept of structures contained within other structures. Hence the notion of multi-level structures is evolved. We have termed them as Z-structures due to their close resemblance to manysorted Z-algebras. Each level of the structure may be employed to represent one or more "sorts." The distinction is not significant, since two sorts or more may be merged into, and considered as, one sort. In addition the relation between the elements of the algebra is emphasized

13 by means of the "u-relations." Emphasis on both structure and behaviour has many advantages as we shall see, and also noted by[BRO 80]. The following definitions and concepts are designed to serve as a formal framework for the proposed model. Definition 2.1 A graph is a pair g=(N,E), where N is a set of nodes, and E is a set of unordered pairs in N, called edges. Following the BNF notation, in the definition given below, a vertical' bar(j) indicates "or," and the symbol "C" denotes "contains." In this definition, the concept of nodes "containing" other nodes, i.e. identifying different "levels" of nodes is introduced. A collection of elements <x1,x2,...,Xn> where multiple occurrences of the same element is allowed is referred to as a "bag." Finally (x,y) indicates an ordered pair whereas (x;y) denotes an unordered pair. The function MAX, below, acts on a set(of integers), S, such that if MAX(S)=s then seS and for every reS, s>r; also MAX({})=0.

14 Definition 2.2 An L-level X-structure ZLr is defined as follows: ZL = (SL1, UL1) Slk={Slki: i=0 or =0, Ilk' Ilk+l,1'*,Ilk+l-1 AND Ilk=MAX({j: Slkj U S lq} )+l } q<k 1=1,2,...,L & k>1 Slki C (S-l,i' U11,i) 1>1 & iU0 Ulk C <(x,y)n: x,y c Slk & nN+> k=l,2,... S1ki'g undefined for all i>11 Z=(Sgi,u0i) for all i21. Zundefined is an. undefined X-structure. sak=Saki for some icN & k=1,2,... Uak=< > Saki is a primitive node. Siki is a Structure Support Node (SSN) for l,k,i 2 1. SlkO is the empty structure (node) for l,k'>. N+ is the set of non-negative integers. A E-structure is said to terminate with primitive nodes, if it possesses level zero nodes. Similarly a Xstructure is said to terminate with SSN, if all of its lowest level SSN's support Xundefined' Henceforth, unless otherwise specified, all structures terminate with SSN. The following observations are of interest: * The primitive nodes do not contribute to the number of levels of the X-structure.

15 The indices 1, k, and i assign a numbering of the nodes as follows: 1- level indicator, sometimes referred to as the first index. k- index of the node of the most recent structure within which the current structure is being defined. This may also be referred to as the second index. i- index number of the current-level nodes, referred to as the third index. Hence each node is distinctly indexed. Nodes Slki with i=0 are referred to as empty structures. Thus if a node S+1,k,j contains Slj0, then the latter implies that there exists a defined lth level structure, though no "insertion" has yet been made at this level. In effect Slj0 acts as a "place-holder." Henceforth the existence of these place-holders, where applicable, will be implicit in our notation. Furthermore, an empty node SlkO implies the existence of other emty nodes at levels 1-1, 1-2,..., 1 (or 0 if the structure terminates with primitive nodes) contained in Slko. Thus Slk$ contains S1 _1,00. and the latter contains S1-2,0,0 and so on. Note that there may be more than one S1,,00 at a level 1. We shall see that this ambiguity will not cause any problems since no "activity" may be associated with the content of an empty node. In contrast to the empty structure, there may also exist the undefined structure. An undefined structure would

16 only occur at the lowest level, such that if level 1 contains the undefined structure then no structure may, or can, be present at any of the level 1 nodes. The distinction between the undefined and empty structures may be shown by the following analogous situation in a high level language. When an array_of_integers is declared, we have a structure which is defined but empty until it is initialized. However, when an array structure is defined without any parameters, the structure which may be contained in each node of the array is undefined. If the structure terminates with primitive nodes, a level 1 node, indexed k, may contain any integer; i, labeled <0, k,i>. A u-relation is defined as a bag of unordered pairs of nodes. Thus each ulk is a u-relation. We shall employ the notation "i<*>j" to designate (i,j) is an element of a u-relation. That means: i is u-related to j and j is u-related to i. Since it is understood that both (i,j) and (j,i) are in a u-relation, we only need to consider the bag of pairs where only one of the two pairs is present. To be consistent in our approach we only consider the subset, u, of every u-relation such that if (i,j)Eu, then jri. Furthermore, for our immediate goals we shall treat the u-relations as sets rather than bags since the bag concept is not of much importance in specifying the "behaviour" of data types. However for "implementation"

17 purposes, namely the actual memory representation of data types, the bag may prove to be a valuable tool. Example A 3-level.Z-structure. The desired structure is depicted below in a graphical manner. The nodes are labeled (Iki) to denote S 1ki' 312 ol t2 2\ 1 213 /,~.. 226 Figure 2.1 A 3-level E-structure, Z3.

18 Let us now derive the description for the above structure using the definition of s-structures given earlier. Note that Slkm denotes a node Slkm. level 3 3=( {S31 1 <(S31 12)>) level 2 S31 1g({S211,S 212,S213}, U21) u21 <(S 1IS 212(212'213 211S 213(S2'S212) (S 21213) )> S312S( {224 S225' S226}' < (S224 S225)>) level 1 S211C([S111?S 12}, l U11) 1=<(S 11 SlS112)> S212c({s123}, u12) U12=< > S213( {S130}, u13) u13=< > S224c( S 144,S145,s146}, u14) U14=<(S144,S145), (S144 S146)> S225 (S157 U15) U15=< > S226( {S160},' 16). u16< > Finally each S lki' for all k and i, contains Xundefined' hence the structure terminates with SSN. Let ul denote the union of ulk and S1 be the union of Slk over all values of k. For notational convenience we may denote EL by a pair (Su) where S is the set of all nodes in ZL and u is the union of ul for all levels 1.

19 Definition 2.3 The content of a node Slkj, C(Slkj), is defined to be the 1-1 level s-structure contained in Slkj. As we shall see shortly, in general we are interested in dealing with a set, not necessarily finite, of Zstructures collectively. Such a set would contain elements of the form Z=(S,u). In order to be able to describe the desired set of E-structures we need to impose a number of restrictions. In the presence of these restrictions we can employ certain predicate expressions, known as "pexpressions", to describe the desired set of Z-structures. Example In this example we present the basic idea of how we may use a predicate, such as <-> which means "is u-related to", in order to describe list-like structures. Later on we shall show that a p-expression would perform the same task but for a whole class of structures. Let i denote Slki. Also let us assume that the domain of values of i is: {1,2,3,...,I-2}. Then, a) The predicate expression to describe a list structure, E1, of length (I+1) is: (Vi)(i<*>(i+l1)). This is illustrated in figure 2.2(a). b) A list where each node is linked to the two nodes immediately following it may be represented by:

20 Cite figure 2.2(b). Figure 2.2 Structures described by a predicate. Notation: t A traversal function (to be defined later) for level 1 indexed by q. Er A variable symbol to denote a L-level Z-structure in SIGL. S- A variable symbol to denote a set of all nodes at level 1 of ZL Slk- A variable symbol to denote a set of nodes at level 1 of 4L contained in the kth node of level 1+1. ~~r usEi r~ 1r

21 ur- A variable symbol to denote a u-relation at level 1 of Zr L' Ur Ur 1 - A node Slkm may be abbreviated to <l,k,m>; or if 1 * and k are clear from the context the index m may be used to represent Slkm - The symbols i,j,lki,lkj (and a few others which would be clear from the context if used) will be employed as variable symbols to denote elements of Slk. - Note that symbols with a superscript r denote variable symbols of the corresponding set. That is Z denotes any element of the set SIG. Thus instead of using the phrase "for all EL we may use "for all r." Consider now a set of Z-structures: SIG={Zr: Zr=(Srur) } we may write u=fur: cr SIG }; i.e. u is the set of urelations each of which is associated with an element of SIG. Our intention is to associate a family of pexpressions: PI,...,' L with a set of L-level Z-structures SIG such that uf, the u-relation at level 1 of IL' may be described by Pl, 01iL, for all elements of SIG. Before any further elaboration on the nature of the p-expressions let us state our first restriction on the set of E-structures, SIG.

22 r i) Every element ZLcSIG has the same number of levels L and all ZL terminate with the same level, either SSN or primitive nodes. Thus we may refer to a set of L-level s-structures as SIGL. Definition 2.4 A partial computable function, t: Slk+Slk, such that whenever t(Slk,i)=Slkj then ji is referred to as a traversal funct ion. Since a traversal function tq maps the set of nodes Sl into itself, then for all the elements of the domain and range the indices 1 and k are the same. Therefore each element SlkmeSlk may be uniquely identified by m. Henceforth, for the sake of notational convenience, we may use tq(m) to denote t(S k m) whenever there is no ambiguity. As a second restriction on the set of E-structures SIGL we state the following: ii) For every level 15L of ZLeSIGL there exists a set of traversal functions t1 defined for that level 1 for all values of r. We are now in a position to describe p-expressions and their satisfaction. An atomic p-expression is an expression of the form: (i<>j) where i and j are variable symbols. Its satisfaction requires the presence of a "structure"; we

23 shall deal with this shortly after we fully define pexpressions. Definition 2.5 The well formed formulas of p-expressions (pwff) are defined as follows. Let wff denote the well formed formulas of first order predicate calculus[MAN 74] then a pwff is obtained by using the following rules. 1) An atomic p-expression is a pwff. 2) If A is a quantifier-free pwff, then -(A) is a pwff. 3) If A is a pwff, then ((A)A(B)) and ((A)u(B)) are pwff's where B is either a pwff or a universal wff such that one of the following cases is satisfied. Case a) If A has two free variables, then the only free variables in B, if any, are those occurring in A. Case b)If there exists only one free variable, i, in A, then B may only have either'two free variables one of which must be i, or one free variable, or none. Case c) If A has no free variables, then B may have at most two free variables. 4) If A is a pwff then V (A) is a pwff where l>1. 5) Only those formulas obtained by a finite number of applications of (1) to (4), are pwffs.' Note that p-expressions are universal formulas.

24 Since we will be concerned with only pwff's, henceforth the term "p-expression" implies a pwff. We shall also omit the parentheses where there is no ambiguity. In general, in order to determine the satisfaction of a first order predicate expression we - need to. define a "structure"[END 72]. 2 A structure V is a function whose domain is the set of parameters: 1- V1, 2, V3.3'...one for each positive integer. 2- Predicate symbols: =, <, >-, <, >, <>o 3- Constant symbols: Slk0, Slkl.. FSlkm' Slkn' 4- Function symbols: +, -, x, /, tq, f, g, h. The arity of these functions is generally either 1 or 2 unless otherwise specified. such that: 1) V assigns to each quantifier symbol V a non-empty set Iv IcIVI; where IVI is called the universe of V. 2) v assigns to each n-place predicate symbol P an n-ary relation PVCIV.in 3) V assigns to each constant symbol c a member cv of the universe IV}. 4) V assigns to each n-place function symbol f an n-ary operation fV: vmln_,vl. 2 Some authors refer to this as an "interpretation."

25 Furthermore we need to define a function s: Var+iVI from the set Var of all variables into the universe IVI of V. The extension of s denoted s is defined as follows: 1) For each variable i, s(i)=s(i). 2) For each constant symbol c, s(c)=cV. 3) If t1,...,tn are terms and f is an n-place function symbol, then s(ft,...,tn)=fV(s(t ),...,s(tn) ) The above definition of s maps all the variables into the whole universe IVI. This definition may be extended to those cases where there are different sets of variable symbols Var1, Var2,...,VarN such that each set of variables VAR may be mapped into some IV ICIVI by s. Let us now present the definition of satisfaction of a wff, p, for the above structure where s maps different set of variables Var to different subsets of the universe Iv lclVl. We say that V satisfies p with s, denoted by I=Vp [s], as follows: Let a and B denote a wff 1) For an n-place predicate parameter p I=V pt1....tn iff <s(tl),....,si(tn)>CpV where t1,., tn are terms. 2) I=V ~a[s] iff I#va[S] 3) I=V aAB[s] iff l=V a[s] and l=v B[s] 4) =V UB[S] iff I=, c[s] or j=v = [s] or both

26 5) I=V Vlia[s] iff for every desv I I==V a[s(ijd)] Here s(ild) is the function which is exactly like s except for one thing. At the variable i it assumes the value d. To def~ine satisfaction of a p-expression Pl, we need to define the type of structures that we shall be dealing with. Let S be the set of all possible nodes Slkm, where 1, k and m are non-negative integers. Let SlkCS, denote the set of nodes, Slki, having the same values of 1 and k. Thus Slk4{Slk: ieN+I, where N+ is the set of non-negative integers. It is obvious that the set Slk is isomorphic to N+. Because of this, in cases where the values of 1 and k are known, we may refer to the nodes as simply: 0, 1, 2,..., m, n,..., where we really mean Slk0, Slkl, S lk2''* Slkm' Skn.... A structure olk is defined for each ELeSIG for lknr lk L each level 1 and for each k. The universe (domain) of each structure k is S In every structure, there exists a alk 1k* distinguished family of traversal functions tq, 1q<ml, such that each tq maps Slk1Slk. Such a structure may contain other functions and relations depending on the nature of the p-expression employed. In general any computable function and decidable relation may be defined in a structure olk It should be noted that for every structure ok the set S is a subset of its universe Slk representing the set of nodes "present" in the structure. All the variable symbols i,j, lki, lkj,....that are used in a pwff or any other wff

27 may only map into the set Srk CSlk by the appropriate assignment function s. Let us call the structure described above o From now lk'r on, unless otherwise stated, orlk assigns to V1 the set Slk which is a subset of the universe olkI=Slk.~ In the following definition of satisfaction, we have only defined the satisfaction for the cases where the only quantifier allowed is V1 (abbreviated as V). Henceforth we shall restrict the pwff's to those quantifiers whose domains are finite. Let pi be a pwff then aOk satisfies Pl with s, denoted by = r Pl[s], as follows. alk 1) =-r (i<>j)[s] iff s(i), s(j)Slk and s(j)tq(s(i))!lk for some 1lq<ml where s(i) and s(j)~Slk0. Let a and 0 denote either a wff or a pwff 2) For an n-place predicate parameter p Lr pt1....tn iff <s(tIt),....,S(tn)>p where tl,...,tn are terms. 3) 1=r ~a([s] iff i ar a[s] 0lk lk ) alk aA[s] iff =rk a[s] and 1= r $[s] 4 l) rlk A1k 5) J= r aUB[S] iff I= r a~[s] or r a[s] or both 0lk lk r lk 6) 1= r Wia[s] iff for every mcSlk, J=-r a[s(ilm)] Here 0lk 1k s(ilm) is the function which is exactly like s except 3 V eNote that t denotes the function t, where t is a traversal functionqsymbol and V, in here, is of course 01k.

28 for one thing. At the variable i it assumes the value m. Example Consider a structure olk with traversal functions tl(i)=2xi and t2(i)=2xi+1.' Assume a p-expression: i<>j. Let s(i)=<1,1,2> and s(j)=<1,1,5> then since 5=2x2+1, the nodes indexed <1,1,2> and <1,1,5> are adjacent (or urelated) because the above p-expression is satisfied with s. However if s(i)=<1,1,2> and s(j)=<1,1,9> then the pexpression is not satisfied with s and therefore the nodes indexed <1,1,2> and <1,1,9> are not u-related (or not adjacent). Let us now look at a slightly more complicated example of a p-expression which may be formed using the above definition of pwff: Example Let S 1 1>,<1 1,2>,<1,1,3>,<1,1,4>,<1 1 5>} and t1(Sl)=S Let p(ij)=(i<>j)A(i<10); and s(i)=<1,1,3> q lki lki+0' A and s(j)=<1,1,4>. The relation < is defined as follows: r Slknm<' Slkl' if m is less than or equal to II. With s as defined above we have: 4 In here we have extended the usual definitions of x and + such that Kx<l,ki>=<l,kKxi> and K+<l,k,i>=<l,k,i+K> where K is an integer. As we mentioned before since 1 and k are fixed here, we are only concerned with the node index i. Hence every node may be treated just like an integer.

29 (<1,1,3> <> <1,1,4>)A(<1,1,3> < <1,1,10>) The expression (<1,1,3> < <1,1,10>) is obviously true. To know if <1,1,3><><1,1,4> expression is satisfied for the given s we refer to the definition of satisfaction of an atomic p-expression given earlier (i.e. if s(i),s(j)eSr and s(j)=t1 (s(i)) then it is satisfied else not satisfied). Substitute <1,1,3> for i and <1,1,4> for j; thus we have t1(<1,1,3>)=<1,1,3+1>=<1,1,4> which is equal to q s(j)=<1,1,4>. As a result the p-expression p(i,j) is satisfied with the above s. Clearly there are other values of i and j that would satisfy p(i,j), whereas some pairs such as <1,1,2> and <1,1,5> would re'sult in the p-expression not satisfied since 5~2+1. Hence the satisfaction depends on what the values of s(i) and s(j) are. Let us take one step further and deal with a whole class of Z-structures, SIG, such that the same family of pexpressions may be used to describe u-relations, at different levels, for every element of SIG. Definition 2.6 Let p be a family of p-expressions P=<Pl'''''PL>' Let SIGL be a set of Z-structures such that it obeys the restrictions (i) and (ii) with a finite number, ml, of traversal functions tq, at every level 1. Let olk be a structure as defined before equipped with the traversal functions tq and any other computable functions or decidable

30 relations. Then SIGL is implementable with p if for every r and every level 1 and every k, and any assignment function s, the following is true. 1) If'(i,j) is the p-expression at level 1 such that i and j are both free variables, then k'r(i,j) [S1] iff (sl(i),sl(j))Culk "lk 2) a- If WVi(i,j) is the p-expression at level 1 and j is the only free variable in w, then for every deSk, I= r W(i,j) [sl(ild]] iff lk' I lk for every deSrk, (d,s (j))curk 2) b- If Vj'(i,j) is the p-expression at level 1 and i is the only free variable in a, then for every deSk, r = r 7(i,j) [Sl(jjd)] iff for every deSlk, (sl(i),d)eu k 3) If ViVj'(i,j) is the p-expression at level 1, then for every d!, d2ESrk, I= r r(i,j) [sl(ild1)(jld2)] 2Eal-' "lk iff for every d1, d2Slk, (d1d2)lk P1 is referred to as the p-expression associated with level 1 of SIGL Let SIGL be implementable and L=(Sr,ur)eSIGL; let tq, 1<q5m1, be a traversal function for level 1 of every ZL, Then ZL may be described by (Sr,P) where P=<P1,...,pL> since from the above definition, for every 1>1, ul may be defined 5 s is the assignment function at level 1; such that s may be considered to be a family of s1 for every level 1, I<<lL.

31 by the associated p-expression. Therefore it would suffice to have just the pair (Sr,p) to describe (Sr,ur). Definition 2.7 Let Sr r (Sr. Ur Let Sr be the set of nodes in a E-structure ZL=(Sr,ur), A node Slk jSr is a successor of a node Slki Sr denoted j=suc(i). 1) if (i,j)cur and jri, or 2) if j'=suc(i) and (j,j')cur and j>j'. We also say that i is a predecessor of j, i=pred(j), if j=suc(i). The suc($)=pred(0)=0. Finally, j=1_suc(i) if (i,j)eur Definition 2.8 A successor set of a node SlkicS is: sucset(i)={ j: j=suc(i) }. Definition 2.9 Two Z-structures E=(S,u) and Z'=(S',u') are isomorphic, denoted ZX=', if there exists a bijection -: S+S' such that (i,j)sU iff (n(i),n(j))su'. For a family of Z-structures SIGL, it is imperative that it can be decided whether an arbitrary I-structure is an element of SIGL. One way to decide such a membership is

32 to show that the problem of satisfaction of p-expressions is decidable. This is demonstrated below. Lemma 2.1 Let Z=(S,u) be a s-structure; let SlkCS and Olk be a structure as defined earlier; let p be a pwff such that the only variables in p are those denoting the elements of Slk. Then it is decidable whether p[s], for any given s, or 0lk not. Proof Assume p is written in prenex normal form(pnf), such that it may be represented in the form of i<r>j#E with or without quantifiers in front of it, where i<r>j denotes either i<>j or -i<>j and # is either an AND or an OR function. Note that this is possible since.pwff's are special cases of first order wff's. And since every wff may be written in pnf, therefore every pwff may be written in pnf. Recall that the "formula-building operators" of pwff's are the same as those of wff's(see definition 2.5(2)-(3) ). Also E denotes a wff such that the p-expression is in fact a pwff. Consider the following cases (a)-(d) where = is assumed to be a wff. We shall show the satisfaction by induction on the number of prime formulas. Cases(a)-(d) show the basis of this induction. The extension to the case of n prime formulas is trivial as described in each case that follows.

33 a) p is either i<>j or ~i<>j. Since the number of traversal functions is finite, therefore 1 p[s] or lk its negation may be decided in a finite number of steps. b) p is p1#=, where Pl=i<~>j. _ is a quantifier-free wff. Hence it may be written in the pnf as follows: p1 #1 P2 #2'.*.*#n Pn' where each pq, 1<q5n, is a predicate with either zero or one or two free variables. Furthermore each pq is quantifierfree and assigned, by the structure to some decidable relation. There are three cases to consider: 1- pq (no free varibales occur in pq) 2- p (i) (one free variable) 3- pq(i,j) (two free variables) Case (1) is trivially either true or false. Cases (2) and (3): pq(i) and pq(i,j) are assigned to some unary and binary relations respectively by the structure at hand. In both case, since the relations are decidable,' one can immediately find if i or (i,j) is an element of the corresponding relation or not. This is of course possible using the characteristic function of the relation. Hence for each pq, 1<q<n, we can decide if it is satisfied or not. c) p is of the form Vi(i<s>j)#ViE. That is either i is free and j is quantified over Slk, or vice versa. Since Slk is finite then Vi(i<z>j) is indeed decidable. A relation is (primitive) recursive if it is equipped with a (primitive) recursive characteristic function.

34 To decide the satisfaction of ViE, we need to consider, for every deSlk, the satisfaction of I= 0 s[ild]. This reduces the problem to that of (b) above. Clearly the case when i is free and j is bound is similar. finally, d) Both i and j are bound, i.e. p is of the form vivj(i<~>j)#ViVj= Once again the problem reduces to that of: for every dl1Slk, = lkVi= s[jld 1]. Thus the problem is the same as that of part (c) above. e) All the other cases may be reduced to one or more of the above. This, is shown below. i) Consider the case of a pwff written in the form of p=i<J>j#E where p is quantifier-free and - is itself also a pwff. Then p may be written in the form of: p=i<s>j # i<;>j #1 - where E=i<t>j #1', and is a quantifier-free pwff. Continuing with the above procedure, p may be decomposed as follows: p=i<s>j # i<~>j #1....#. ie5>j, _ where En is either a wff or simply i<m>j. Now if the satisfaction of Pn-_=i< i > #1..... #n-1 i<9>j is decidable, then obviously the satisfaction of Pn -l#nn is also decidable by part (a) or (b) or both described above. ii) Consider the case (i) above, but the occurrence of the quntifiers is also permitted. Using the definition 2.5(3)-(4), restricting ourselves to V only, we can write a

35 general p-expression of the form shown below where the only free variables are i and j.,1,. ~n) Vk Vk2 (k1<>k 2 1 )#11Vk3((k3<>>j)#2=2)#22 ((i<># n) Let us only consider the case of Vk1Vk2(k1<t>k2 #1 1). This would show the basis of the induction. First we assume that 1 is a quantifier-free pwff. WE may write the above _1 expression in the form of Vk1Vk2(kl<t>k2) #1 Vk1Vk2 But the satisfaction of Vk1Vk2(k1<r>k2) is decidable by the case (d) above. Therefore we are only interested to know if Vk1Vk21 is satisfied with some s, or not. This problem reduces to that of deciding the satisfaction of s1 s[k11dlJ[k21d2] for every d1, d2cSlk. But this is also.1 decidable from (i) above since is quantifier-free. Finally consider the case where l, in the above expression - is not quantifier-free. If k1 and k2 do not occur in then the problem would become a special case of what follows. If one or both occur free in, then the problem is to decide: if for all d1, d2eSk s[k1d1j][k21d2] is satisfied or not. If there are more occurrences of the V quantifier in _1 =, viz. Vk, we proceed in the same way as above until all the V quantifiers are removed. The latter form would then lend itself to that of case (i) above.

36 Lemma 2.2 Let SIGL be implementable wrt P=<Pl1'r**PL>, then SIGL is recursive. Proof.First we consider the case for L=1 and p=<pl>, where p1=r(i,j) and i and j are free in r. Let SIG be the set of 1-level Z-structures X=(S,u). For SIG to be recursive we need to have a recursive characteristic function 4 such that 4(Z)=1 if ZeSIG, and =0 otherwise. Since u is finite we can examine each pair (14,v)cu to See if wr(],v) is satisfied and vice versa, where t, vcS and S is finite. But by lemma 2.1 the problem of deciding if r is satisfied, or not, is computable in a finite number of steps. Therefore 4 can be realized as follows: 4(Z)=1 if for I, vcS, (,v)EUcu iff ur(i,v) is satisfied7 4(Z)=0 otherwise. For L>1 we can have a finite number of characteristic functions 1', one for each level 1. Clearly the above argument holds for every Slk, where for each 1' if ~l((slk,ulk)) evaluates to 1 for all Slk, then 4(Z)=1, and =O otherwise. Note that at every level the set S1 is the union of a finite number of finite sets Slko 7 This may readily be extended to the cases where the p-expression has only one or no free variables.

37 A "data structure", as will be seen later, is an implementable set of s-structures. As an example consider a list structure, Z, where the p-expression associated with it is i<>j, where the traversal function is t(i)=i+1. Let F=({1,2,3}, <(1,2),(2,3)>). Using the above definition of the characteristic function we can see that O(Z)=1. Hence Z is an element of the set of all list structures. Corollary 2.1 Let SIGL be implementable with p. SIGL is recursively enumerable(r.e.). Definition 2.10 The largest set of X-structures, SIGL, implementable wrt p], and satisfying the following conditions is called an L-level data structure ZL. For every.=(Srur)EZL, and every level 1 and for every k, if i, jeS kr{Slk0}, thena (i> MIN(Sr )(j(j<i (ji)CUlk))).(a) Condition (a), in the above definition, ensures that no node, except the smallest node Slki(i.e. having the smallest "i-index"), is present without the presence of at least one of its 1-predecessors. Note that i and j in the above equation denote <lki> and <lkj> respectively. MIN(X.) returns the element in X with the lowest index value other than Slk$. Also Sl,k,i>Sl,k,j is true iff i>j.

38 A "data structure configuration" is defined next. Definition 2.11 For a given data structure ZL, an element zEZL is a data structure configuration. In every L-level data structure, the configuration ({SL10},P) is referred to as the empty or the starting configuration, denoted by 4. The empty configuration implies that although a data structure is already defined but no nodes have been "inserted" into the structureS Lemma 2.3 Let ZL be a data structure, then 4=({SL10},p)EZL' Proof By the definition of data structures, ZL is the largest set of E-structures which is implementable wrt p. Therefore 4eZL otherwise ZL U {I} is the largest set. Lemma 2.4 Let ZL be a data structure. ZL is r.e. Proof The proof is essentially the same as that of lemma 2.2 and corollary 2.1 except that the condition (a) of definition 2.10 is also imposed on equation (1) of the proof. That is for 4(Z)=i, we must have the condition: for

39 every jeSlk' except if j is the smallest one, there exists an iCSk and (ij)cu r iff'(i,j) is satisfied. Note that lk aknd (i) l the condition (a) is decidable since Sr is finite. lk Example Consider a two level data structure, namely tree-oflists. Structures (a), (b), and (c) depicted in figure 2.3 are valid configurations, whereas (d) is not, since a lower level structure cannot exist without its immediate higher level(supporting) structure. This of course follows from the definition of s-structure. The family of p-expressions p is <pl, P2>' where P2: (i<>2i) U (i<>2i+1) P1: i<>i+l Our notion of data structures do not completely specify the semantics of a data type. It only specifies the relationship between the "data items." In order to completely characterize the "behaviour" of a data type, we need operations to be performed on data structures. A pair, data structure and its operation set, yields a "data type." A preliminary and informal definition of a data type is given below. A more rigorous definition will be presented in the next chapter after we define the "Data Structure Operations." A data type, t, is a pair (ZL,O) where ZE is a

40 213 211 212) 789 a d0-0 788 Figure 2.3 Data Structure Configurations. L-level data structure, and 0 is a set of operations which enable us to make transitions between elements of ZL. The set O will be referred to as the set of Data Structure Operations(DSO). It should be noted that for a given data type, the number of levels L of its data structure remains unchanged under any DSO operations. Chapter 3 identifies the nature of the DSO operations and a minimal set of operations which would universally

41 specify data types. We shall see that, in the presence of such universal operators, some "fine tuning knobs" made available to the user would suffice in order to specify the particular behaviour which is desired.

CHAPTER I II DATA STRUCTURE OPERATIONS The behaviour of abstract data types is generally determined by the operations defined on them. We shall refer to such operations as data structure operations. A major problem has always been to define a set of operations in order to completely specify some desired behaviour. The use of logical axioms and equations to specify data structure operations(DSO) have been most popular in the last decade. In almost all cases however, the techniques deploying the axiomatic approach, have left the user with some degree of uncertainty regarding the completeness and consistency of the specification. Such problems are common even in the case of simple and reasonably well understood data types. Data structure operations may, in general, be divided into two categories: the characteristic set and the auxiliary set of operations. The former characterizes the behaviour, hence it includes the constructor operations and operations concerning the access of data entities. The auxiliary set contains operations that are not crucial to the behaviour exhibited by the data type. For example in a 42

43 stack, push, pop, and top are all elements of the characteristic set. The auxiliary set of operations for stack may contain operations such as is_empty or is_full. It should be noted that the latter operations may all be derived from the elements of the characteristic set, a process commonly referred to as "enrichment." In this chapter a universal DSO set will be presented. These operations are employed to define the characteristic set of operations of a specific data type with the aid of three types of first order predicate expressions. Notation: Throughout this chapter for notational brevity as well as clarity we have referred to nodes mostly by their third index value m. In such cases the values of 1 and k are either assumed or the argument presented may readily be extended to nodes having different 1 and/or k values. Also implicit in some of our discussions, will be the levels of the underlying structure. In many instances the arguments are presented either for level 1 only of an L-level Estructure or simply a 1-level structure is assumed. Once again unless otherwise stated, we may readily extend the arguments *to the general case. Unless otherwise specified all the X-structures terminate with SSN. In order to identify the traversal functions at a given level, we employ the abbreviation described below. This would enable us to express both the p-expression as well as the traversal functions needed to decide the satisfaction of the pexpression. If we assume that the underlying structure, a,

44 has the traversal functions t1, t2,...,tn associated with it, then i<>t1(i) U'U i<>tn (i) replaces every occurrence of i<>j in the p-expression. - i<1t(i) U'U i</tn(i) replaces every occurrence of ri<>j in the p-expression. Using the above notation, we can explicitly state the actual traversal functions associated with every level as well as specifying the p-expression at that level. As a final note, let a(i) be a first order predicate formula; we say a(m)=true in the presence of some structure o iff o satisfies a(i) with any assignment function s[iIm]; a(m)=fa7se iff a(i) is not satisfied with s[ilm]. 3.1- Characteristic Set of Operations Every data type is required to have a set of operations defined over its data structure in a manner which characterizes the intended behaviour of the data type. For example we are not allowed to delete a node from the middle of a stack data type. As a result insertion and deletion operations must ensure that no such illegal operations are carried out. We shall see that due to the way we define DSO's the "correct" configurations will always result. Furthermore, the p-expressions may be used in order to assert the correctness of the DSO operations by Floyd/Hoar approach to program correctness. For example p-expressions may be employed as pre- and post-conditions before and after

45 every data structure operation respectively. Consequently we would ensure that the structure remains intact. In general, from an abstract point of view, we are interested in developing some common characteristics exhibited by the DSO's of all data types. We shall present certain requirements to be possessed by the elements of a characteristic DSO Set. It will be shown that we can form all the configurations of a data type using the "characteristic DSO set." This philosophy is similar to the notion of the canonical algebras and the constructor signature approach of Goguen et. al., although we pursue a more general approach. Before defining the elements of the characteristic DSO (CDSO) set, the notion of "b-expression" is introduced. These predicate expressions will be used to define "behaviour." Hence illegal insertion and deletion of nodes in a structure are avoided. A b-expression is a first order predicate calculus expression whose function is to determine the "boundary conditions" of an insertion or a deletion function defined for a data structure. For example insertion into a fullstack violates the boundary conditions of the push function; deletion from an empty-stack is also inconsistent with the boundary conditions of the pop function. Insertion into the middle of a FIFO queue is another example of an erroneous operation.

46 The index of the node to be inserted(or deleted) has to meet certain conditions. Such conditions are specified by the b-expressions associated with the data structure. A b-expression, b(i), defines the range of values that the node index can take for insertion (deletion) purposes. Thus if m is within the desired range, then b(m)=true; b(m)=false otherwise. Clearly the satisfaction of a b-expression, defined below, depends upon the underlying structure. Consider a stack of depth 4. A b-expression associated with this stack may, or may not, permit further insertion into the stack depending on what the maximum length of the stack is supposed to be. Let us assaume.that the maximum permitted length is 5. After one more insertion into the stack of length 4, the b-expression would have to prevent any further insertions. Thus the b-expression associated with a class of structures must be able to dynamically convey, to the system, sufficient information for insertion and deletion purposes. Intuitively, a b-expression, b, defines. the places where insertion (deletion) of a node is, or is not, allowed. There is always a possibly empty set of nodes that may be inserted into a data structure configuration without disturbing the intended behaviour of the data type. Such a set varies with different configurations of a data structure. Similarly, there always exists a finite set of nodes that may be. subjected to deletion. Hence there exists a variable set from which deletions can be made with respect

47 to the configurations of a data structure. More precisely, consider a configuration of a data structure z=(S,u) where S is the union of all Slk, then we may define a "construction set" for each (Slk,ulk) of z as follows. The construction set of Srk CSr is a subset of the total construction set lk' Ck' (TCS) for Slk defined below. Let m denote an element of Iolkl and n denote an element of Slk,then r TCSlk={m: 1=r p1 with either s[ilmJ[jln] or s[jlmJ[iln]}' where s is any assignment function and Slk~{Slk0}. If Sr k=Slk} then, TCSk {Slkm: m=MAX( U Slq)+1} q<k q where MAX(USlk)=j where SlkjiUSlk and for all SlkiZUSlk' jzi; also MAX({<lk10>.<lk20>f,..e<1kn0>)=, kr->; and MAX({})=i where i is some integer greater than zero. Corollary 3.1 For a given Slkr the set TCSk is recursive. Proof We can define a characteristic function j such that 4(m)=1 if mcTCSrk and 4(m)=O otherwise. To show that meTCSlk, we need to demonstrate the satisfaction of: = r P1 s[ilm][jIn] or 1=r p1 s[jmli jn clk i, i and j are the two free variables, if any, that may occur in a p-expression pl.

48 By lemma 2.1, this problem is decidable. Hence the characteristic function 4 is computable. Therefore, TCSlk is recursive. Trivially, a total destruction set of a S r is the set Srk less the empty node Slk0. But more interesting is the set of nodes permitted for deletion from a given configuration, the latter will be referred to as the destruction set(DS). The "b-expression for deletion" would define the exact subset of S k which is equal to the DS r lk' Let us now define a "b-expression" for a data structure Z. In chapter 2 we pointed out that a data structure Z may be equipped with a set 0 of "insertion" and "deletion" as well as some other functions of interest. The insertion and deletion functions enable transitions from one configuration of Z to any other configuration, in Z, by addition or subtraction of nodes. The b-expressions help us to determine the domain of these functions. Definition 3.1 Let wff denote a well formed formula of first order predicate calculus. Then a b-expression wel formed formula, denoted as bwff, is defined as a universal wff in which there exists exactly one occurrence of a free variable.

49 We shall deal with two types of b-expressions for every level 1 of a data structure: b-expression for deletion, b, and b-expression for insertion, b1. The satisfaction of the b-expressions is essentially the same as that of pwff's. Consider a structure r as lk defined in chapter 2. Let the range of values of the free variable, i, be TCSrkClolkI, and the range of values of all the other variables be Slk If TCSrk is empty then the bexpression is not satisfied with any s. Recall that V refers to "for all elements in S r k' Definition of the satisfaction of b6 is exactly the same as that of bl above, except that s maps all the variables, including the free variable i, to SLk!k' Now using the two b-expressions mentioned above, we may define the two sets: the construction set and the destruction set for S', -lk CSlk={i: bl(i)} r 1 DSlk={i: b (i)} For every data structure we associate a b-expression b1 (bl) 1 6 for insertion (deletion) at level 1, for all levels 1<L. Then for every level, we may associate CS's and DS's for every S k at that level using the above definition.'1 A={i: a(i)} denotes that for every value of i, say m, that ax(i) is satisfied with, then meA.

50 Lemma 3.1 Given a structure or it is decidable whether b1 (b1) is satisfied, or not, for a given s. Proof The proof is the same as that of lemma 2.1. Note that here we have added a domain, TCSlk, to our universe of the structure. However, since there are no quantifiers over TCSrlk therefore the satisfaction problem is decidable. Using the above result, it becomes a trivial task to determine whether for a given structure, (Slk,ulk), in a configuration of a data structure, a certain node index is, or is not, in its construction set. Let ZL be a data structure and let bl(i) ( b(i) ) denote the associated bexpression for insertion (deletion) at level 1 of ZL Consider now a set Sr at level 1. Define the lk characteristic function for CSik as follows: (m)=1 iff b (m)=true =0 otherwise Since the satisfaction of the b-expression is decidable for a given s, is computable. In *other words for every Sr Cr (DSr) is a recursive set. It also follows that CS r lk (k 1K is r.e. Example Consider a configuration of the list data structure

51 Sil, where the p-expression is i<>i+l:'' 1 2 3 4 0 —-O-0. —0 The question is whether a node indexed <1,1,675> (say) is an element of CS11? This is not the case since there exists no icS11 such that i+1=675 or 675+1=i, see definition of TCS. Hence <1,1,675> is not an element of TCS11. Therefore 4(<1,1,675>)=0. Now consider a node <1,1,5>, abbreviated as 5; by the same argument we would find that 4(5)=1. Hence 5eCS11. Finally consider a list data structure where the length of its elements must be less than or equal to four nodes. Such a restriction, on the length of the configurations, is imposed using the b-expression for insertion, viz. b _i<4. Then although 5 is an element of TCS11, 5fCS11, since the b-expression for insertion, is4, would yield b1(5)=false. Before we define functions for insertion and deletion of nodes it is necessary to introduce the notion of accessibility of nodes. This is essential since insertion and deletion functions must be restricted only to the structures that are contained in "accessible" nodes.'' Note that for all the nodes <lki>, the value of 1 and k is fixed, i.e. (l,k)=(1,1). Hence a node may be referred to by its third index i. Henceforth we may write relations such as <lki>>l where S is a node and i is an integer. This relation merely implie' i>l. Same approach may be adopted in case of functions; for instance Slki+l denotes

52 The issue of data access and security has gained considerable importance in the study of data base systems as well as the operating systems. In order to incorporate this concept into our model, another function of crucial importance is introduced, namely the probe function. The purpose of a probe function, as suggested by its name, is to enable the user to have access to various nodes in the structure. However an important issue is whether every node is permitted for access or not. In the stack, for example, the only node which is permitted for access is the top node. This may also be the case in a queue data type where the "front" node is the only node accessible'. In contrast in an array data type one may probe any one of the nodes in the structure. An abstraction of the notion of the accessibility of nodes may be carried out by means of a certain kind of predicate expressions described below. With every data structure, ZL, we may associate a family of first order predicate formulas, namely the a-expressions. An aexpression defines the set of nodes which may be accessed in every configuration of that data structure. The well formed formulas of the a-expressions, denoted awff, are the same as bwff's. The definition of their satisfaction in the presence of a structure r (defined earlier in chapter 2) is exactly the same as that of b ith

53 the following exception. Let al(i) denote the a-expression at level 1, then Olk satisfies al(i) with s if - 1= al(i)s AND if 1<L then 01+1 q satisfies al+l(j) lk with S[jlSl+lq,k for some q. For every level 1 of a data structure ZL. we associate an a-expression, al(i). The above definition of satisfaction of the a-expressions implies that, to access a node at level 1, one must have access to all the other nodes at levels 1+1 and above that contain the desired node. We are now in a position to define the three types of functions that comprise the "characteristic data structure operation set." The i-function is defined first followed by the d-function and the p-function respectively. Definition 3.2 Let z=(S,u), Z'=(S',u')eZL, where ZL is a data structure and let b1 be the b-expression for insertion, and a1 be the a-expression, associated with level 1, 1I1<L. A primitive insertion function (i-function) is a partial mapping: 3:ZL x N ZL where l(z, 1, k, m) is defined and equal to z' if the following conditions hold at level 1; let k,ml1:'2 12 This equation denotes: insert a node indexed m at level 1 of the configuration z in node k of the (l+l)th level. The resulting configuration is z'; and the inserted node is labeled'm Note that, the primed sets and symbols refer to tIhe new configuration. Double primed symbols are merely "temporary" variables.

54 1) bl(i) is satisfied with Olk with s[ilSlkm] 2) If l<L then ol+1,q satisfies al+1(Sl+l,q, k) for some q, 3) Slkm~Slkr 4) If 1 3 Slkm>MAX(Slk)~ then- Let S"=Su{Slkm}. Renumber the nodes at level 1 of S" starting with the structure contained in all the 1-successors of k and all the structures contained in nodes Sl+1,r,q for any K such that q>k. The renumbering is done using the convention of definition 2.2, such that for every SXqt whose elements are renumbered to get S"Xq, (S q'uXq)=(S"XqU" xq). Note that the renumbering. would not affect the l-index(level indicator) of the nodes. Also Sx =S"x for l<X<L and S =SK for X=l and K<k. S' is then assigned to be the same as S". else- S'=SUlkm} 5) C(Sikm)=zundefined if 1=1 & Z terminates with SSN, S -1,m,0 otherwise. Example An i-function for the stack S 1 = {S11' S112' S113' S114 P1 = i<>(i+1 ) i(z, 1,1,5)=z' S= {SS11, s512, Si13, ST14' S115 13 MAX( Slk ) means the node Slki Slk such that for all Slkn~Slk, i~n.lk

55 u1= u1 U <(S114'Sl15)> U=<(Sv1,s112),(s1l2,s'113),(s113'S; 114), (S114'S115)> bl1(i): Vj(j<i) For i=5, the above b-expression is satisfied. But for any value other than 5, b1(i)=false. For values of i>6 the node cannot be inserted since it violates the definition of data structures which requires a predecessor of the node to be present. In other words, for the given configuration, s(i)=6 is not an element of TCS11, consequently 6+CS11. Note that in the above insertion, since we are dealing with a 1-level structure we do not need to be concerned about the accessibility of the structure. Another component of a characteristic set of operations is a deletion function. Definition 3.3 A primitive deletion function (d-function) is a partial mapping: (following the same notation as in definition 3.2) 6: ZL x N~ + ZL where 6(z,l,k,m) is defined and equal to z'=(S', u'), if the following conditions are satisfied:'4 1) b6(i) is satisfied with Olk with s[ilSlkm], 2) If 1<L then o1+1,q satisfies al+l(Sl+l,q,k) for some q,'4 The deletion equation denotes: delete the node at level A of configuration z,indexed m, in node k of the (1+1) level.

56 3) sucset(Slkm)=1 AND Slkm contains either S1-11,mi or 7undefined' 4) if Slkm<MAX(Slk) then- Let S"=S- {Slkm}U{Slkk}. Renumber the nodes at level 1 of S" starting with the structure contained in all the 1-successors of k and all the structures contained in nodes Sl+1,Kq for any K such that q>k. The renumbering is done using the convention of definition 2.2, such that for every SXq, whose elements are renumbered to get S"XqP (SxqUXq )=(S" xqU" Xq ) Note that the renumbering would not affect the l-index(level indicator) of the nodes. Also S =S"X for l<X<L and Sk for X=l and K<k. S' is then assigned to be the same as S". else- S'=S-{Slkm}U{Slk}l. The i-function described above restricts insertion only to those nodes which are not already in the structure. Likewise the deletion function may only delete the nodes with no successors. In certain cases where such restrictions are not acceptable we may use the i- and dfunctions described in appendix D. In any case, as it will be demonstrated later on, one may define other functions using the primitive i- and d-functions. A probe function, defined for a data structure ZL, is any function whose codomain, for every configuration

57 z=(S,u)EZL, is S. There are many different probe functions that one may define for a structure ZL. The range of values of a probe function would always be a subset of the existing nodes in any given configuration of ZL. In every configuration, zCZL, for every level 1, every structure (Slk,ulk) is associated with the same set of probe functions defined for that level. Consider a tree data type, there are many different probe functions that one may define for it. For example a function that always returns the leftmost leaf node in a given configuration of the tree is a probe function; so is a function that given a node Slkm returns one of its 1-successors. In order to be able to define such functions in general, we introduce the notion of the "primitive probe function." A primitive probe function may be considered to be a tool by means of which one may "correctly" define other probe functions.'5 Given a configuration of a data structure and a node, lki in that structure, the primitive probe function returns a node, Slkj, of that structure only if the latter is allowed access by the appropriate a-expression. In the binary tree data type, a possible probe function would be one that given a node as one of the arguments of this function, it would then return one of its 1-successors. Therefore with two primitive probe functions, one returning'' The correctness of a probe function is decided by whether the value of such function satisfies the aexpression for the structure or not. More will be said about this later in this chapter and chapter 5.

58 the left child and the other the right child, every binary tree configuration may be "scanned." Our choice of the primitive probe functions is arbitrary, and not every probe function does, or should, necessarily return the 1-successor of its argument. In a stack data type, say, not every 1successor of every node is to be accessible. For instance, consider a configuration of depth 4 of the stack; if the probe function were to return the 1_successor of a node then the 1-successor of the node indexed 2 is 3. Since 3 is not the top node it is therefore not to be accessed. It is conjectured that using the following definitions of the primitive probe functions, one is able to define any other probe function such that all the nodes permitted for access may be probed. Definition 3.4 Let ZL be a data structure, and al(i) be the aexpression associated with ZL at level 1. A primitive probe function (p-function), at level l<L, is a partial mapping: Pl: ZL x S S such that for any Slkm1Slk and z=(Slk,ulk) if Pl(ZSlkm)=Slkn then SlkneSlk and Olk satisfies al(i) with s[ilSlkn], P1 ('Slkm) Slk0 otherwise. It should be noted that not every node argument, Slkm in the above p-function, causes the function to return a

59 value Slkn Slk, n0B. This was illustrated in the above stack example. In addition Slkm itself is not necessarily in the range of the p-function. It may be any arbitrary node in Slk. The only important issue is whether the value of the function, viz. Slkn' satisfies the a-expression or not. If these conditions are not satisfied then the pfunction evaluates to the empty node Slk0. Example First consider the list and the binary tree data types. The p-expression ( for the sake of brevity, assume a 1-level structure) for the list data type may be defined as: i<>i+l. To define a "p-function" for the list we employ Church's Lambda-notation (see appendix C), so that the p-function of the list is p(z,i)=Xi. (i+1). This function takes a node, i, and a list configuration z as its arguments and returns a node labeled (i+l) corresponding to its 1-successor, if any, in z.'' For the case of the binary tree data type; The corresponding p-functions are: p-functionL(z,i)=Xi. i=0-1, 2i p-functionR(z,i)=Xi. i=0+1, 2i+1 The a-expressions for both the list and binary tree are i>1.'' Note that the p-function and the p-expressions are not necessarily related, as implied in this example. In majority of the cases, however, this approach enables one to use these p-functions recursively in order to enumerate the set of all nodes which are accessible. Also a node <lki> is represented by i, since 1 and k are assumed fixed.

60 Next we look at the p-function for the stack data type: This is defined recursively, so that it always yields the "top" node of the stack. Assume 1 and k are fixed, then p(z,i)= Xi. Vj(i>j)+i, p(z,i+l) Clearly the expression Vj(i>j) is satisfied in the presence of olk if s(i) is equal to the "topmost" node in some Slk in z. This expression is indeed the a-expression for the stack data type. A node <l,k,m> of a data structure configuration, z, is accessible at level 1 iff <l,k,m>CRlk, where Rlk is the set of nodes {i: al(i)} and al(i) is the a-expression associated with level 1 of z. For example, for every stack configuration (Slk,ulk), the set of accessible nodes is: {i: Vj(j-i)}. Similarly for an array configuration, (Slk,ulk) lk={i: i1. It is needless to say that the predicates defining the latter two sets are the aexpressions for the stack and the array data types respectively. The notion of accessibility of nodes is particularly useful in the area of maintaining the integrity and security of data in a data base environment. For example consider a set of data items shared by many users. In such an environment each user (or group of users) may have his own a-expression associated with the data in a manner that would confine him to his own allocated area(s). Such an

61 arrangement implies that the given set of data items would behave differently to each user. Three types of primitive functions have now been identified. To characterize a data type these functions are put together to form a "characteristic operation set" of that data type. Definition 3.5 A Characteristic DSO Set (CDSO), %, with respect to a data structure ZL, is defined as a set: *=IUDUP where, - I is a set of i-functions, il, one for each level 1IL of ZL' - D is a set of d-functions, dl, one for each level 1EL of ZL' - P is a set of p-functions, P1, one or more for each level 1<L of ZL, such that for every z~({SL10},l), if gl ~IUD and r z-=gl (''g12 (g (('11'kl'm1)'12'k2'm2)' ln'k n mn) is defined then for every 1 and every k, the destruction set DSzk is non-vacuous. lk So far we have talked about three types of predicate expressions: p-, b- and a-expressions. These expressions may be associated with different levels of a data structure in the manner described earlier. We shall refer to these

62 predicate expressions, associated with each level 1, as the base predicate expressions(set), n1, associated with that level 1 of ZL. 3.2- A Definition of Data Types In chapter 2 the notion of data structure was introduced. In the preceding section we presented a set of operations that may be associated with a data structure. Using the elements of this set of operations one may make transitions from a configuration of the data structure to another. Recall that Z is the largest set of Z-structures which is implementable wrt some family of p-expressions p. We are often interested in only a subset of Z. That is only those configurations in Z that are "valid" configurations. For example if we are interested in list configurations of length less than 10 (say) then only a subset of the set of all lists is of interest. Let us now present a definition of data types. Definition 3.6 Let Z be a data structure and * be a CDSO set defined wrt Z, then t=(Z,I) is a data type. Definition 3.7 Let t=(ZL,*), then a configuration of t is a pair (z,o) where zCZL is a data structure configuration and ~ is the set of DSO's.

63 Let us now introduce the concept of reachability of the elements of a data type. Definition 3.8 Let z, z'EZL. Let G=IUD denote the set containing ifunctions, i, and d-functions, d, of $ defined wrt ZL. Then z' is reachable from z wrt p, denoted zl- zi, if for some geG and 1, k, meN:' 7 1) z'=z or 2) z'=g(z,l,k,m), or 3) z"l-*z and z'=g(z",l,k,m) Definition 3.9 Let t=(z,*) and t'=(z-,*)et;'1 then the reachability relation(p-relation) on t is: p={ (t,t'): z1-*z' } In some cases where i is understood, we may say (z,z') is in the p-relation if (t,t')Cp. Lemma 3.3'17 z=z' means S=S' and u=u', where z=(S,u) and z'=(S' u' ). 15 t=(z.4)~t denotes zZZ where t=(Z~ip).

64 Let t=(Z,*) be a data type such that for zeZ and every 1 and k, Olk satisfies the b-expression for insertion, bl(i), with every s(i)cTCSlk, and Olk satisfies the al expression, al, for every 1 and k, then z is reachable from 4 wrt i. Proof In order to simplify the discussion, we shall only consider a single-level structure. The proof for the general case follows in a straight forward manner. In chapter 2 it was shown that, every data structure Z is recursively enumerable. Using this fact we can show that every z=(S,u)EZ, is reachable from 4 wrt i. We employ an inductive argument on the number of nodes in the configuration z=(S,u). Let zn denote the configuration (Sn,p) where n denotes the number of nodes in Sn other than the empty node S110. Then z0=({S110},p) is indeed reachable from % by definition. Let us now assume that for every configuration, z n-, with n-1 nodes, zn-1 is reachable from %, then the question is if every zn=(Sn-1 {S lj},p)EZ is reachable from ~. Let n=(Sn u lj,)EZ and let us assume that (Sn i,p)=Zni, then i(zn- 1,ij)=zn. The latter equality is true, since S1ij must have satisfied the p-expression with some existing n-i node of z 1 otherwise znz. This is of course a contradiction. Therefore S 11ETCS and also S11jCS of Zn-1, ~~i ti

65 recall that the b-expression is also satisfied. Hence by the definition of i-function zn=i(zn-ll, 1 1,j). Therefore zn is reachable from 4. Using the above lemma, we'can immediately say the following. Corollary 3.2 Let t=(ZL4,); let VI —z'=(S',P)CZL, if b (Slkm)=true and al+1 (Sl+l,q,k)=true for some q and 1<L, then lWlZ=(S U{Slkm} t P)CZL. Corollary 3.3 Let t=(Z,*) be a data type; for every zcZ if 4I-%,z then zjProof The proof is a consequence of the definition of $. Theorem 3.1 Let t=(Z,*) be a data type. The reachability relation, p, defined on t is an equivalence relation. Proof The transitivity and reflexivity follows from the definition of p. The symmetry is a consequence of corollaries 3.2 and 3.3.

66 So far we have been examining a global view of data types. In the next section we shall delve into the properties possessed by each data type. 3.3- Data Types as Lattices In section 3.2, it was shown that some elements of the structural component Z, of a data type t=(Z,*) may be either 4 or i(z',l,k,m) where ic*, z'cZ and z' is reachable from 4. These elements of Z may be represented in terms of the elements of the corresponding "word algebra", also known as the Herbrand Universe. For example a stack of length 3 may be represented by its i-function, PUSH, namely: PUSH(PUSH(PUSH(,,1),2), 3). Similarly a binary tree configuration may be represented by i(i(i(,1),2),5). Consider a data type t=(Z,*), then the word algebra, WORD( *)=(W,*), of t is defined as follows. For each constant in Z there exists a term f0eW of rank zero. Also for each non-negative integer there exists a term f0EW of rank 0. The elements of rank zero are called the generators, or alphabet, of the word algebra, WORD(*). For every i- and/or d-function f4 e, f41'(x,x x there exists an element of W defined as any expression (also called word or term) of the form f4x1x2x3x4 of rank m, where m is a positive integer, and there exists at least one xi of rank m-1 and no x;, 1<i<4, with rank greater than m-1. It

67 should be noted that for a data type t=(Z,*), there exists only one constant, namely the starting configuration 4. The alphabet of the corresponding word algebra would then contain the generators corresponding to ~ and the elements of the set of non-negative integers N. From now on, for the sake of clarity, we have taken the liberty of using the symbols ",", "(" and ")" in the representation of the terms of the word algebra. Using the above representation, there are many terms of the word algebra, of a data type t, that represent the same configurations of t. Only one representation of such terms would be desired. Consider for example the term push(pop(push(4,1),1),1) which represents the same configuration of a stack as push(4,1) does. Under such circumstances we are only interested in a single representation. Let Wt represent the set of terms W-R(N), where t=(Z,*) and word(*)=(W,*) and R(N) is the set of elements of rank zero representing the elements of N. The set Wt contains one or more terms that correspond to some configuration zcZ. Also in Wt there are certain terms that do not make any sense as far as t is concerned. In other words they do not represent any of the elements of Z that are reachable from 4. For example we may have a term in Wt, where t is a stack (say), in the form of

68 pop(pop(pop(4,2),4),5).' Such a term does not of course make any sense in the stack data type. As pointed out above there may be more than one way of representing a configuration of a data structure using the corresponding word algebra. In order to have a unique representation for every data structure configuration of a data type t, an equivalence relation is defined on Wt; this relation will be referred to as the "u-relation." A urelation, u, partitions Wt into equivalence classes resulting in a quotient set Wt/u. The definition of u on Wt will be presented shortly after we distinguish between the two kinds of terms that one may find in Wt, viz. either the "non error" kind or the "error" kind. The non-error terms are defined recursively as follows: 1) 4 is a non-error term, 2) g(t 1 k m) is a non-error term if t is a non-error term and g(t,l,k,m) is reachable from 4, where t is the corresponding value of t in Z 2 3) There are no other non-error terms. The set of all the non-error terms is referred to as NET. Clearly NETCWt. The second category of terms in Wt, namely the set of terms t*NET, as noted earlier, are the error terms. The set of error terms in Wt, where t=(ZL,*),'' The 1- and k-values for the node indices are assumed to be some fixed value, hence not specified. 20 For a term g(t,l,k,m), g(t,l,k,m) denotes the value of the function g.

69 may be generated as follows. Let gl. be either an ifunction or a d-function in $, where lj<L. Then t'=gl (t,l,k,m) is an error term, and is referred to as an overflow term if at least one of the following is satisfied. - g91 is an i-function and t is reachable from ~ and t' is not reachable from t. - t is not an underflow term (see below) and not reachable from 4 and gl is an i-function. - t is an overflow term. Similarly, t'=gl (t,l,k,m) is an error term, denoted as an underflow term, if one of the following is satisfied. - gl1 is a d-function and t is reachable from 4 and t' is not reachable from t. - t is not an overflow term and not reachable from 4 and 91, is a d-function. t is an underflow term. In summary, there are two types of error terms: the overflow and the underflow. The former set will be denoted as OFT and the latter will be referred to as UFT. Let us now define the u-relation. Let t1, t2EWt; t1 u t2 iff either 1) Both tl and t2 are non error terms and t1=t2,.21 or 2) Both t1 and t2 are overflow terms; or 3) Both t1 and t2 are underflow terms. 21 t1=(S1.u1) = t2=(S2,u2) means S1=S2 and u1=u12

70 If (tl,t2)cu, then this may be denoted as "t1 is uequivalent to t2." It is obvious that the u-relation is an equivalence relation. Hence Wt can be partitioned into equivalence classes. To choose a representative for each class and to be consistent in our approach, we select a term of that class known as the "minimal form" of the terms in that class. A minimal form(MF) of a term, t, representing a configuration, z, is a term tMF which is free of redundant deletion or insertion function symbols. We shall shortly demonstrate that for every data type the underlying system is "Church-Rosser" [CHU 36, ROS 70, SET 74]. Hence every non-error term may be- condensed to a' unique MF. In order to have a formal notion of MF, and indeed to show the existence of such a unique representation for each configuration, we adopt the following approach. A replacement system, (W,=>), is a set of objects and a transformation on the objects[SET 74]. Let us assume that, for our purpose, W is the set Wt, where t=(ZL,*). Define => as a transformation on Wt as described below. Let t,, t1, t2cWt, and gl1 denote either an i-function or a dfunction symbol, then t1 reduces to t2, or tl=>t2, if at least one of the following rules is satisfied. Rule 1: t1=dl(gl (Opgl (il(t,l,k,m),11,k1,m1)..),ln, kn,mn),l,k,m) and t2 =gln('.g91 (tll'kl'kml) )lnknmn) AND t1, t2eNET.

71 Or Rule 2: t(i. i1 (G i1 (t, 11k1,ml), 12,k2,m2).,lnknmn)cNET n 2 1 where teNET and there are no instances of a d-function symbol in tl, and t2i (...i(i1 (i(tl2,k2,m2), lik1 m1 ) -.ln,k mn)cNET 1 2 such that at least one of the following conditions is satisfied: i) 12>11 ii) 12=11 and k2<k iii) 12=11 and k2=k1 and m2<m1. Or Rule 3: t1cOFT (UFT) and t2=oft(uft), where oft(uft) is any one of the terms in OFT(UFT) which may be selected arbitrarily. The following theorem shows that when a non error term tl reduces to a term t2, the value of the term does not change, i.e. if t1=>t2 then tE=t Theorem 3.2 Let t=(ZL,I) be a data type, and let t1, t2cWt be non error terms; if t1=>t2 then t1=t2. Proof There are four ways of performing a reduction. These are considered separately below. i) let gl denote either an i-function or a d-function at level 1; and let n(''gl (il(t1l'k'm)'1klml)''')'ln'kn'mn ),l',km)

72 and t2=gl (...gl (t'11,k1,m1)..' )'ln'knmn) n 1 Let us now assume that the set of nodes inserted in the configuration represented by t1 is Si, and let the set of nodes that are deleted in t1 be Sd* Then tl =(Si-Sdp) where p is the family of p-expressions associated with ZL. Similarly, the value of t2 is t2 (Si-{Slkm}-(Sd-{Slkm} ),P)=(Si-SdP)=t1 1= (...i1 (i1 (tl1 kimi)l2,k2m2)...l,k,mn) ii) Let t1 (***i (i ln 1 n 12 1 1 2 2 2 where there are no occurrences of a d-function symbol in t1. If 12>11 and t2i1 1(. 1(i1 12(t,l2,k2,m2),1,k1,m1))...ln,kn,mn) then we want to show that t =t2. Two cases are considered: a) Sl k m contains a structure in which S1 k m 2' 2' 2t1' occurs. This cannot possibly be the case, since it implies insertion of a node without the presence of its supporting node. b) S does not contain a structure in which l2' 2'm2 Sllkl,ml occurs. In this case the two insertions are independent in the sense that if the higher level node is inserted first, with the insertion of the lower level node next, it cannot possibly alter the value of the term t1. iii) Let t1 and t2 be as in case (ii) above and 12=11, Commuting il and il does not alter t1 because of the fl g T 2 following. The node S1 k m is inserted after S~ k m in

73 ti where k2<k1, the latter implies that m2<m1. Consequently if m2<m1 and S2,k2,m2 can be inserted after S1 k then 22' k22 1' it can surely be inserted before it without changing tl. iv) Finally, consider the case where 11=12 and kl=k2 which is similar to that of case (iii) above. We can also think of => as a binary relation on W, i.e. tl=>t2 may also be expressed as (tl,t2)e=>' Let =>0 denote the identity relation, and let =>i= =>.=>i-1 for all i>O, where "." denotes composition of the two relations => and => i-1 i.e. if x=>y and y=>i lz then x=>iz. The reflexive closure of =>, denoted by =>#, is given by =>#= => =>. The reflexive transitive closure of =>, denoted by =>*, is given by =>* => => =>. An element teWt is irreducible or U U U... in minimal form(MF) under => if there is no t' such that t=>t'. The completion of =>, denoted by =>' is {(x,y): x=>*y and y is irreducible }, and if (x,y)e=> then y is the MF of x. A replacement system, (W,=>) is finite if for all xEW, there is a constant kx such that if x=> y then i<kX [SET 74]. Finally, a finite replacement system is Finite Church-Rosser(FCR) if =>" is a function. Note that the latter implies a unique MF for every element. Theorem 3.2 Let t=(Z, 7) be a data type and (Wt, =>) be the corresponding replacement system. (Wt, =>) is Finite ChurchRosser(FCR).

74 Proof To show (Wt, =>) is FCR two conditions must be satisfied(see Theorem 2.1 in [SET 74]): a) (Wt, =>) is finite, and b) for all zl,z2,w,x in wt, if Zl and z2 are equivalent and z1=>w and z2 =>x imply that for some y and z, where y is equivalent to z, w=>*y, and x=> z. This is shown pictorially figure 3.1. implies u-equivalence. z1 -' t 2 W x y z Figure 3.1. Reachability of Elements in Z. To show that the above condition (a) is satisfied, consider the 4 types of reduction that may be performed on a term tcWt. The number of times that the reduction rule 1 may be applied on t is bounded by the number of d-function symbols in t. The number of times that the reduction rule

75 2(i) may be applied on t is also bounded by a constant multiple of the number of i-function symbols in t less the number of d-function symbols. And indeed the remaining two types of reduction are bounded by even a smaller number than that of 2(i). Hence for any given term t the total number of reductions is bounded. To show the satisfaction of condition (b) consider zl,Z2EWt such that z1 is u-equivalent to z2. Therefore if 1 =(S1lU1) and z2=(S2,u2), then (S1,u1)=(S2,u2). If zl=>W, where w=(Sw,uw), then (Sw,uw)=(S1,u1). This is true since the => transformation does not affect the underlying structure represented by z1, namely its value. By the same argument, y=(Syuy)=(Sw,uw)=(S1,ui) and =(S,uz)=(S,ux)=(S2,u2). Therefore (Syuy)=(Szz), i.e. y and z are u-equivalent. As a result of the above theorem, we conclude that, there exists a unique minimal form that every term of a data type may be reduced to. Clearly all the terms reducing to the same MF are members of the same equivalence class of u. For example a stack configuration of length 3 may be represented by any of the following formulas, where i denotes the push function and d denotes the pop function, d(i(i(i (d(i(i())))))) i(i(d(i(i(%))))) i(i(i(%)))

76 The latter form represents the MF, and would be considered as the representative of its own ou-equivalence class. Finally let We denote an epimorphic image of NETuUFTuOFT containing all the MF's of the elements of Wt; such that We contains exactly one element, the MF, corresponding to each equivalence class of Wt/u. Thus We contains oft, which is the image of all the elements of OFT, and exactly one element, uft, which is the image of all the elements of UFT; and all the MF's of the elements of NET. Note that this mapping is no more than the completion of =>; i.e. W=-{y: x=> y & xcWt } With the above set of terms, We, we can define a word algebra of minimal forms, namely (We,*), as follows. Let tMF denote the MF of teWt, then if tMFeWO and g is either an i- or a d-function in J then g(tMF,l,k,m)=t4F,22 where t'=g(tMF,l,k,m)cWt. We may now show that for every element zpZ, of a data type t=(Z,%*), one can find what its corresponding minimal form representation is. Lemma 3.4 Let t=(Z,*); for every z=(S,p)eZ we can decide what its corresponding MF representation, tsWe, is. a Note that we have used the same function symbols for both (W 4) and (We,4); the context should remove any

77 Proof Let z=(S,p) and let tMF denote the MF of t. Construct tMF as follows: if S=tSlk}I then tMF=4 else starting with the nodes SlkmcS with the largest first index 1 in S, write T, where T-i(...i(i(~,l,k1,m 1)1,1k2,m2)...,1,kn-mn) such that kiSk2<..e<kn and ml<m2<mn.(In case of ambiguity the k-index, has the higher precedence.) Next select the nodes Sl1_ lkv,mS, if any, and write T' as follows: Ti(..i(,l-,k1,m ),...,-1,kn, n Repeat the above procedure until all the nodes in S are exhausted. The resulting term, t, is in MF since the above node selection process is no more than the application of rule 2 of reduction discussed earlier. Test to see if E is reachable from 4 wrt I. It should be noted that if the insertion is made in the sequence of nodes in t, then t is not necessarily reachable from 4 in that sequence. Thus testing for reachability must be such that if there is a "valid" way to get to t then t is considered reachable irrespective of the sequence of nodes in its corresponding MF. In general if t is given, form the set S={Slki:<l,k,i> occurs in t}. By lemma 2.2 we can decide if (S,p)EZ or not, if it is not in Z then tMF=oft, else proceed as follows. For every intermediate term T, T', T",... and finally t we can perform the reachability test as prescribed below. test T to see if T is reachable from 4, if so

78 test T' to see if i' is reachable from 4, and so on ~~-n ~~-n-1 At each stage to see if Tn is reachable from Tn, test for every 1 and k values, that occur in Tn but not in Tn-1 all the possible sequences of insertion of the node indices for -n a given 1 and k. If for at least one sequence T is n-i n n reachable from Tn 1 then TMFWoft, i.e. TMF may be obtained by the above procedure. Finally the fact that t is unique follows from theorem 3.3. Theorem 3.4 u is a congruence relation on (Wt,*). Proof u is obviously an equivalence relation. To show that u is also a congruence relation it must be demonstrated that it exhibits the substitution property. Assume tut', i.e. t and t' are i-equivalent; then the claim is that T=g(t,l,k,m) u T'=g(t',l,k,m) where g is either an i-function or a d-function in ~. Now, since tut' then tE=t', therefore T=g(t,l,k,m)=g(t',lkm)=' But if T=T' then TuT', i.e. g(t,l,k,m) u g(t',lk m) Note that if T is an error term then A' would also be an error term of the same nature, i.e. they both would be

79 either overflow terms or underflow terms. As a result T would still be i-equivalent to T'. Notation- In the following discussions i*z2 implies an application of a sequence of length zero or more of the same or different i-functions. Similarly in d*z1, d* denotes zero or more applications of d-functions. Also i# (d# denotes a sequence of at least one i- (d-) function. Such sequences of all i-functions (d-functions) are referred to as imonotonic(d-monoton ic). A bitonic sequence is obtained as follows. A monotonic sequence is bitonic; if B1 and B2 are bitonic then B=01iB2 is also a bitonic sequence. The node indices <l,k,m> may be dropped for notational brevity in places where the ambiguity is not important. Thus, i(z,l,k,m) is denoted by i(z) or iz. Finally we may use (ik)j to denote a sequence of, possibly distinct, i-functions of length k. Lemma 3.5 Let t=(Z,*) be a data type. Every term tENET is uequivalent to a term i*4eW. Proof The set We contains the MF of every term teNET. Consider a term teNET, t cannot be d#p. If so then t=d*d4. Let us consider the value of t'=d4, namely t'. According to

80 the definition of d-function, in order to have a value for d+, the node to be deleted must be in the DS of the configuration, in this case, ~. But the DS of the ~ is empty since the TDS of 4 is empty since there are no nodes in 4 other than the empty node, Slk0. Hence d~ is an element of UFT, thus t~NET. Now, if t is an i-monotonic sequence concatenated with 4, then, by rule 2 of reduction, t may be reduced to its MF in the form of i*4eW8. If t is not a monotonic sequence, by definition of =>, for every deletion of a node there must have been an insertion of that same node. This is a necessary condition since for a node to be deleted it must be an element of TDS. The latter implies that the node must have been inserted into the configuration otherwise deleting a non-existent node would result in tcUFT(OFT). The latter is of course contrary to our original assumption that teNET. Thus by multiple applications of =>, using the first rule of reduction, all instances of the d-function symbols may be eliminated resulting in an i-monotonic sequence concatenated with 4. This would then reduce the problem to the one we considered earlier. Every element teWe-{uft,oft} corresponds to a configuration zeZ which is reachable from o. The configuration z=t is the value'of t. In order to define values for the two terms uft and ofteW we introduce two e

81 special elements, namely top (T) and bottom (I), associated with every data type t. These two elements will be referred to as the improper elements of t (or Z). Note that they are not necessarily elements of Z. The improper elements are intended to "absorb" all the error instances of that data type. We may think of T and l as representatives of all the elements that may, or may not, be in Z and not reachable from 4. The value of oft (uft) is defined as the T (I). Note that the value of a term teWt, where t is u-equivalent to oft (uft) is also defined as T (I)Intuitively, the uft and the oft elements are really a "trap" as defined by [AFS 80]. It should be noted that T and J are not reachable from 4 or any configuration which is reachable from 4, whereas oft and uft are "monotonically reachable" (see below) from every towg. Let us now examine some of the properties of the word algebra of a data type t. We define a monotonic reachability relation(MRR), >>, on Wt as follows. Let tl,t2ewt, tl>>t2, read tl is monotonically reachable from t2, if tl1i t2 (tl=d*t2). Note that the monotonic reachability relation is not symmetric but it is both reflexive and transitive. Henceforth, unless otherwise specified, we shall only deal with the cases where tl>>t2 implies t1=i*t2, where i* is an i-monotonic sequence. Above definition defines MRR on Wt; we can also define the MRR,

82 >>, on the corresponding set We in a similar manner. 23 Let tli t2eWs, then t1>>t2 if t=i t2 OR jtEWt (t v t1 & t=i t2 OR t u t2 & tl=i t) Lemma 3.5 Let t=(ZLo) be a data type. Then the Monotonic Reachability Relation, denoted by >>, is a partial ordering on We. Proof To show >> is a partial ordering, it suffices to demonstrate that it is transitive, reflexive and antisymmetric. Let t 1 t2eWe. The transitivity and reflexivity of >> is obvious from the definition of >>. The relation >> is also antisymmetric since if t1>>t2 then for some sequence i1 i1 t2=tl' If t2>>t1 then t2i2 tl for some i-monotonic sequence i2. The only possible way for t =i1 t2 and t2=i2 t1 to be true is if t2=i t1, therefore t2=t 1 Theorem 3.5 For a data type t=(ZL,*), (We,>>) is a lattice. Proof First consider a single-level structure. We have already shown that (We,>>) is a po. To show that it is a 23 Note that we are using the same symbol >> in both cases of W and W to define MRR on these sets. The context removes an ambiguity.

83 lattice we just need to demonstrate that for every pair of elements t1, t2eW8 there exists a unique lub and a unique glb [STO 77]. Let t=i(... (i(i(~,m1),m2)... mM)EWO, then its corresponding structure, if any, is defined as (S,9p) where S={Sj: j is a node index occurring in t}. Let t1, t2eNET, define lub(t1,t2)=t3 where t3 is the MF of the term representing (S1 U S2', ). Recall that by lemma 3.4 it can be decided what the MF representation of z=(S,p)EZ is. Define glb(t1,t2)=t3, where t3 is the MF of the term representing (S1 A S2, P). If tl and/or t2 are error terms, then define lub(t1,oft)=oft lub(t1,uft)=t1 glb(t,oft)=t1 glb(t1,uft)=uft If lub(tl,t2)=t3 then t3 is unique. That is if there exists a t>>t1 and t>>t2 then it must be the case that' t>>t3. To show this, let us assume that such a term exists, i.e. t3>>t and t>>t1 and t>>t2. If this is the case then t corresponds to a structure (S,p) as defined above. Since t>>t1 and t>>t2 then either S must at least contain both S1 and S2; i.e. the least value of S=S1 U S2 which is indeed the same as S3 defined above or t is oft. The former case implies that t=t3. The latter, i.e. t=oft, is true only if t3=oft, hence t3 is unique.

84 The case for glb(tl,t2) is very similar to the above. That is if glb(tl,t2)=t3, then there exists no other term t such that t 1>>t and t2>>t and t>>t3. Employing the same approach as above, assume t>>t3, then S, the set of nodes corresponding to t, can be at most S1 A S2, which is exactly the same as S3. Hence t=t3. Finally for the general case of L-level data structures the proof is simply a straight forward extension of the above. The CDSO set, defined earlier, requires the presence of both the i- and d-functions, as well as the probe functions, for every level of the data type. However in order to generate all the possible configurations of a data type t=(ZL,*), not every element of * is necessarily needed. One only needs the "generators" of t which are defined below. Definition 3.10 Let t=(Z,*) where *=IUDuP. A generator function (or a constructor) of t is an i-function i (a d-function, 6) such that if zcZ and z is reachable from 4 wrt i, then z is not reachable from 4eZ wrt *-{h} ( *-{6}). Note that for those data types that we have been dealing with so far, i.e. with no primitive nodes, the only generators are the elements of Ic_. Clearly, in the stack,

85 push is a generator function; whereas the function pop is not since there exists no d-monotonic sequence in the word algebra of MF's of the stack, i.e. the starting configuration followed by a sequence of pop's. The following results may be readily concluded using the latter theorem 3.4. Corollary 3.4 Let t=(Z,*) be a data type, and let (We,*) be the corresponding word algebra of MF's, then uft is the least fixpoint [STO 77] for every d-function de*; and oft is the greatest fixpoint for every i-function icj. The above corollary implies that the i- and d-functions in (We,*) are "strict" functions [STO 77]. Thus if a term t is an element of OFT (UFT) then no matter what sequence of i- or d-functions are applied to t the result is always an element of OFT (UFT). We can reassert the fact that every configuration of zeZ, which may be represented by an element of NET, is monotonically reachable from 4. This is demonstrated below. Corollary 3.5 Let t=(ZL,4), then glb{t: teW^ A tcuft }=4

86 Corollary 3.6 Let t=(ZL,*) be a data type and let X=(We,>>) be the corresponding lattice. Then for every configuration, if there exists n nodes at level 1<L which contain Sl lk,0' then that configuration is the glb of n sets of terms of We, say Wi, where each (Wi,>>) is a sub-lattice of X and isomorphic to the lattice of the structure defined at level 1-1. Intuitively, the above corollary states that, in every node, Slki, we can construct all the possible configurations of the structure defined at the next lower level(s). 3.4- The Primitive Data Types The data structure operations, defined earlier, explicitly referred to levels one and above. Thus level zero structures, to which we referred to as primitives, could not be subjected to any insertion, or deletion, operation defined earlier. In this section it is intended to justify this exception and formalize the concept of what is commonly referred to as the "primitive (data) types." There are a number of reasons that has made it apparently desirable to treat primitive data types differently from any other ("non-primitive") data type. Firstly almost all the conventionally accepted, primitive data types such as the integer, real, char etc. have been

87 taken as already well established (defined) by the system (compilers). Secondly, primitive data types are assumed to be well understood concepts, for instance the integer and the real data types. In most cases there exists a mathematical model to describe such data types. This is exemplified by the Peano's axioms or the theory of real numbers. But both of the above reasons are not, in our view, valid arguments. The former point raised the issue of primitive data types being "built-in" data types. However it is not at all uncommon to have a machine with a built-in stack! Does this imply that the stack is a primitive data type? The widespread use of the stack data type as a builtin (hardware) data type strongly raises the importance of implementing many more non-primitive data types in a likewise manner. As for the second issue, raised above, although most commonly used primitive types are well understood concepts there may be other primitive types that are not so well understood. For example, we may want to specify a subset of the integer; or describe the days_of_week data type. Such concepts are presently being employed in some high level languages such as Pascal as the "user-defined" data types. Most, if not all, authors employing graphs have not presented specification schemes that are equally applicable to primitive types [MAJ 77B, EAR 73, SHN 74]. In contrast

88 the advocates of the algebraic approach do not make any distinction between primitive data types and other nonprimitive data types. However, their treatment of parameterized data types does imply an implicit presence of the primitives as "parameter types"[GHM 76, TWW 79, GOG 78]. In their approach a data type such as the stack may not be defined independently of a "primitive" or a parameter data type. Hence the stack is defined as stack-of-( ). As a result a great deal of work has been unnecessarily added to formalize the parameterization of data types. In our model, however, this is not the case. All data types may be specified independently of other data types as prescribed by the model. Hence parameterization is just another "type manipulation operation"(see the next chapter). Informally, a primitive data type is defined as one which is "transparent" to the user. In a stack-of-integers, the stack represents the organization of the integer, i.e. the discipline under which they are maintained, whereas the integers are concrete objects both written into and read from the stack by the user. By the same arguments, in a list-of-char, the char constitutes the primitive data type. Because of the indivisibility of the primitive data types, we have designated level 0 of E-structures as primitive nodes. This approach enables us to define data types irrespective of their "parameter" data type. The latter can always be embedded in the former with a type manipulation operation.

89 3.4.1- Operations to Characterize Primitive Data Types A primitive data type is a data type whose structural component is always of level zero. As in any other data type it needs a set of operations to characterize its behaviour. The integer, for example, may be characterized by a characteristic set of operations, viz. its "i-function" and "d-function", suc and pred. The probe function is no longer of necessity since at level zero the structures are indivisible. A characteristic data structure operation(CDSO) set for a primitive data type is composed of only the insertion functions and the deletion functions. An insertion function would cause a transition from a given configuration to another. A deletion function would do the inverse of an ifunction. The correctness of these functions is ensured by their boundary conditions which are maintained by their appropriate b-expressions(see later). Consider the string data type with 26 "constructors", namely the 26 letters of the alphabet. Each application of any of these i-functions results in a configuration which may be represented by the concatenation of that particular letter of the alphabet to the current configuration. For instance, starting from the null string ~, i1 applied to the null configuration results in "A" (say), which is represented as il(4). Moreover i22(i1(~))=VA, and i13(i15(i22(i1()))))NOVa.

90 We have already pointed out that a necessary condition for a characteristic set of data structure operations is to ensure that any two configurations are reachable from each other wrt this set. Now consider the bool data type with the two i-functions i1 and i2, where i14 represents FALSE and i24 represents TRUE. These two i-functions, on their own, do not yield a complete DSO set, since i1(4) is not reachable from i2(4) and vice versa. An obvious solution would be the presence of the "NOT" function. However in order to establish a consistent approach, and indeed a more general one, we require a deletion function for each ifunction of bool, namely d1 and d2, such that d (i (b))=b and d2(i2(b))=b As a result false is now reachable from true and vice versa. A characteristic DSO set for a primitive data type is composed of two types of "primitive functions": the ifunction and the d-function. If t=(Z0, ~) is a primitive data type, its i-function i is'a mapping: i: Z-, z, similarly, the d-function d: Z0 +. A d-function causes the inverse of an i-function. It "undoes" what its corresponding i-function "does." As a result if i2(i(i12(i1(i1 2(i12( ))))=BALL then we get the equation: d(i2(i1 (i12(i12())))))=ALL=i1 (i 12(i12() ) ).

91 Definition 3.11 A primitive data type is a pair (Z0, ~), where Z0 is a 0-level E-structure and * is a set of partial functions defined on ZO such that for oce o: Z0 - Z0 In chapter 2 we introduced a numbering scheme for the nodes of a Z-structure in which each node could be uniquely identified (cite definition 2.2). Thus a node <1, k, m> describes a node at level 1, contained in the kth node of level 1+1, and m indicates its index. For the primitive nodes however the latter is not of any value since primitive data structure configurations are indivisible. Therefore a different meaning is attached to m. For a primitive data type, m designates a numbering of the data structure configurations of that data type. More simply, each integer m may represent a unique configuration of the primitive data type. For example the ASCII code. for string symbols indicate the value of m for each individual letter of the alphabet, i.e. 41 represents "A" and 42 represents "B", and so on. Each configuration zeZ0 is merely an indivisible element rather than a pair (S,u) as in the case of the nonprimitve data types. Furthermore, the p-expressions are of no more value in the case of the primitive data types. With the above convention of indexing for a data structure

92 configuration terminated with primitive nodes, each node Slqk contains at most one element SOkm where the index m designates a configuration of the primitive data type. The insertion and deletion operations at level 0 would of course differ from that of any other level described earlier. This is discussed below. 3.4.2- Constructors of Primitive Data Types Before delving into the definition of the i- and dfunctions, let us introduce the b-expressions for the case of the primitive data types. The b-expression for primitive data types perform basically the same'function as in the case of the non-primitive data types. However there are some minor differences between the two types of expression. The bwff's for primitive data types contain exactly one free variable as before. For primitive data types, however, the free variable denotes an element zcZ0 rather than a node index. Every i-function has a b-expression associated with it, so does every d-function. There may exist more than one b-expression, one for each i-function and one for each dfunction. The function of a b-expression, associated with an i-function(d-function) is basically to determine for what elements zeZ0 is the i-function(d-function) defined. For example, if bi(z), the b-expression for an insertion function i, is satisfied then i(z)cZ0. Similarly if bd(z), the b-expression for a d-function d, is satisfied then d(z)eZ. For example, if d denotes the pred function for

93 the integer, the b-expression for d, denoted bd, has only one free variable and is defined as: bd(z)=true if z~# =false otherwise which implies that the predecessor function may be applied to any integer (.the configurations of the integer), including zero, but not to the starting configuration, 4. On the other hand, the i-function to represent the integer 0, i0 (say), has a b-expression defined as: b. (z)=true if z=4 1o =false otherwise One may use the d-function, mentioned above, to reach the integer "-2"(say) starting from "0", i.e. d(d(i0())) The definition of the structure in which the satisfaction of a b-expression, for primitive data types, may be determined, is slightly different from that described earlier; this distinction is described next. Let ZM be a set of O-level configurations. A representation for the elements of this set may be found in its corresponding word algebra. The set Z0 is simply considered to be the universe of the structure. All the variables, in a b-expression for a primitive data type, denote the elements of Z0 and all the constant symbols are also assigned to the elements of Z0. Finally, any computable function and decidable relation may be defined in the structure as before.

94 We are now in a position to present a more specific definition of the constructors of a primitive data type, namely its i- and d- functions. Definition 3.12 A primitive insertion function, or an i-function, I, for a primitive data type t=(Z0, ~) is a partial function, 1: Z~ ~Z where the domain of i is defined by its corresponding bexpression, b (z), i.e. l(z1) is defined iff bI(z1) is satisfied. Definition 3.13 A primitive deletion function, or a d-function, 6, for a primitive data type t=(Z0, i) is a partial function, 6: z 0.z where the domain of 6 is defined by its corresponding bexpression, b6(z), i.e. 6(z1) is defined iff b6(z1) is satisfied. Let us now define the. characteristic set of operations, namely CDSO, for a primitive data type. Definition 3.14 A CDSO set for a primitive data type t=(Z0, j) is ~=IUD, where I = { i; }; <.

95 D={d }KIK - for every ij there exists a dK, and vice versa, such that for zeZ0: d (ij(z))=z if ij(z)EZo, AND ij(d (z))=z if d (z)e Z. We define the reachability relation for primitive data types in the same manner as we did for the non-primitive data types. As before the symbol "I-" will be employed to denote this relation. If i2% corresponds to the letter "B", then ALLI-i2 d2} BALL indicates BALL is reachable from ALL wrt either i2 or d2. Note that ALL is also reachable from BALL wrt d2 i.e. BALLI —{i2d2} ALL. 3.4.3- Primitive Data Types as Posets In this section we shall present some informal discussions about the primitive data types' and their similarities with the non-primitive data types. In most cases, due to their close resemblance to the non-primitive data types, the results of the preceding sections may be applied directly to the primitive data types. The word algebra for the case of the primitive data types is defined in the same manner as before. Once again we are only interested in the MF's for every term. For example, there are many ways of representing the string ALL.

96 For example, d2(i2(i1(i12(12()2( )))) or i1(i12(i12(4))) are among many ways of representing this string. Obviously every configuration of a data type may be represented by a series of insertion and/or deletion functions. As before we are only interested in a unique representation of each configuration. First let us define the minimal form of a configuration, z, of a primitive data type. This is defined recursively. Let t=(Z0,0) be a primitive data type with ifunctions: i1, i2,...,in, and d-functions: d1, d2,...,dm in p. Then the minimal forms, or irreducible terms, of the word algebra of t are defined as follows: either 1) 4 is a MF, 2) i (z) is a MF if z is a MF and ij(z) is defined, 1<j<n or 1) 4 is a MF, 2) dj(z) is a MF if z is a MF and dj(z) is defined, 1<j5m Using the result of theorem 3.2, every term may be condensed into an irredundant deletion-free (or insertionfree) form using the reduction rule: - if t1,t2eWt and tlE2EZ0 and t2=d (i1(t1))or t2=il(dl(tl)), then t2=>t1. The reachability relation implies two configurations of a data type are reachable from one another wrt a finite numberof application of zero or more constructors. Since reachability as defined before is a reflexive relation then

97 every configuration is reachable also from itself wrt zero number of constructors. Similarly, as for the case of nonprimitive data types, let >> denote the monotonic reachability relation(MRR). The >>-relation denotes reachability wrt a set of either monotonically increasing or monotonically decreasing functions, but not both. Looking back at the string example given earlier, BALL>>ALL but ALL>~BALL, where >> denotes MRR wrt the i-functions only. As in the case of the non-primitive data types, the >>relation defines a partial ordering(po) on the set We of a primitive data type. Furthermore, it may be shown that the primitive data types are also lattice structures under the >>-relation. For every primitive data type there are a number of insertion and/or deletion functions without which one cannot generate (or reach) all the desired configurations. It is precisely this set of functions that is of interest to us. For example for string data type, i1(4), or A, is one such term that is not reachable in the absence of i1. The definition of a generator function for a primitive type follows from definition 3.10 given earlier. The characteristic set of DSO's presented above defines the ground axioms for all the primitive data types. To specify a specific primitive data type our task reduces to defining the boundaries of the desired operations, by way of

98 the b-expressions, such that, nothing but the desired set of configurations would result. Finally we extend the definition of i- and d-functions, for a primitive data type, for those cases where the primitive data type is at level zero of a. data structure. Note that every i- and d-function associated with level zero would then become an element of $, where t=(ZL,*) and ZL terminates with primitive nodes. Definition 3.15 Let t=(ZL,~) be a data type and ZL be terminated with primitive nodes. Let i0( d0 ) denote an i-function(dfunction) at level O. For z, z'eZL and iamb 24 i0(z,O,k)=z' iff z' is exactly the same as z except that C(S;qk)=i0C(Slqk) for some q Also d (z,O,k)=z' iff z' is exactly the same as z except that C(S;qk)=d0(Slqk). In chapters 2 and 3 we identified the essential elements that are required in order to isolate the desired behaviour of a data type. In the following chapter we shall 24 Intuitively, the equation implies that change the content of the node <lqk>, say m=C(<lqk>), to the new configuration i (m). It is noteworthy that for this change to happen, the expression for i0 (or d0) must of course be satisfied with m.

99 examine the concepts of "completeness" and "soundness" of the specification of data types. We shall also look at ways of generating new data types, using the existing ones, in order to get a data type of a different behaviour.

CHAPTER IV COMPLETENESS AND TYPE MANIPULATION OPERAT IONS In this chapter we shall examine the concepts of completeness and soundness of the specification of data types and show that every configuration that is a true configuration may be deduced and vice versa. The notion of "truth" of a term is introduced and used to examine the validity of the configurations that may be reached wrt the operations introduced in the enrichment process. Another useful concept in specification techniques, for data types, is the capability of defining operations in such a way that two or more data types are combined to arrive at a new data type. This should be done in a manner that the newly-defined data type preserves the original properties of the constituent parts. Thus the characteristic operations of the newly-defined data type may be automatically defined in terms of the operation sets of the constituent data types. The "type manipulation operations" (TMO) are intended to facilitate the construction of new data types in terms of the existing ones. The latter may be userdefined or part of the system, i.e. system defined. In chapter 3 it was shown that data types may be modeled as 100

101 lattices. Each lattice structure is formed from a reachability relation equivalence class. The TMO's, therefore, may be considered to be operations on lattices. 4.1- Completeness and Soundness There are many notions of "completeness" defined for different purposes in logic and other branches of mathematics. Logicians, for example, define a complete set of axioms as one with which every wff or its negation can be proved as a theorem, or one for which all models are isomorphic. Sometimes a complete set of axioms is related to the notion of "consistency."'For instance, to say a set of wff's is complete is an equivalent phrase for a "maximally consistent" set of wff's. The latter notion indicates that by adding any wff, not an element of the set, to the given set, we would have an inconsistent set of wff's. Our interest in completeness-is that if a term is true in a data type t, then we must be able to reach this term. The converse of this is also important, i.e. if a term is reached, it must be the case that it is a true term. These notions, we shall refer to, as the completeness and soundness respectively. To investigate the notions of completeness and soundness of a data type specification we need to define the concept of "truth" or "falsity" of a term. For example,

102 given a term t=i(i(4)), one may be interested to know under what circumstances, or what "values" of i and 4, is t, the value of t, a legitimate configuration of a data type t. Thus for certain assignments of values to the function symbols, constants, variables etc., one can claim that t is an element of t. Or we may use the paraphrase: t is true in t whenever the assignment of the functions, constants and variables is understood; or using the terminology of chapter 3, tENET of t. To express the above concepts formally, we need to define the well formed formulas of our language.25 Their interpretation will then be investigated. Let gn denote an n-ary function symbol. An individual variable or an individual constant is a term. If T1, T2,'-,Tn are terms then gn(T T2, ),T n~l, or ~ is an atomic term we 1 formed formula (twff ). 1) An atomic twff is a twff. 2) gn(Y1 Y2, oe ), n>l1, is a twff if yj, l<j<n, are twff' s. 3) Only those formulas obtained by a finite number of applications of 1) through 2) are twffs. As an example of a set of twff's, one may consider the set of non-error terms, NET, of.a data type. Recall that 2 Note that these are not the wff's of the first order predicate logic described earlier. These are an extension of the terms of the word algebra that we talked about in chapter 3.

103 every tENET is only composed of function symbols and constant symbols. Now consider a set of twffs r. Let yer. In the presence of a structure t and an assignment function s, a meaning. may be attached to each twff y; then we can say either "y is true in t" or "y is false in t." The former is denoted by I=ty[s] where s is an assignment function (see below); the subscript t may be dropped when it is clear from the context. For example, if yeNET then I=tY[s]. The assignment function s assigns values to the function and constant symbols in y. In general, s is a function s: Var-ZUN, where Var is the set of all variables: namely l,k,i,.. to denote N, and z, z1, z2,... are individual variables to denote elements of Z where Z is a data structure. 2 Let us now define what is meant by the satisfaction of a twff y in the presence of a "structure" t=(Z,*) and a function s. The symbol t, strictly speaking, is of course a data type. However it is also used to denote a structure, as described below, the context will remove any ambiguities. Let t=(Z,*) and recall that Z={z: z=(S,u)}, where every S is the union of S1 for every level 1. For each level, S1 is itself composed of the sets Slk. Thus when we talk about t, as a structure, we mean a family of structures, Olk' as 2 I z, Z 1,z,... may also be used as constant symbols. The context shotld remove any ambiguity.

104 defined in chapter 2. In addition each Olk is also equipped with the i-, d- and p-functions for that level, 1. The universe of t is ZUSUN+, where S is the set of all nodes, Slki* As in chapter 2, implicit in our notation of the structure, is the presence of the traversal functions as well as any other function which may be necessary depending on the structure at hand. 2 Let ZCZ be the set of configurations that are reachable from ~ wrt *. It is now possible to define "t satisfies y with s", denoted by l=t~[s] if y is one of the following. Let C be a twff, X be a term symbol, and let z=s(C) 1) I=tX[S] if s(x)CZ. 2) I=tg4(~,l,k,i)[s] if I=tc[s] and g4 ~*(1) and either Ss Cs if g4 is an i-function or $(lki)': s(lk) S(lki)eDS2(lk) if g4 is a d-function. And if 1<L, 01+1,k satisfies al+1(Sl+1,q,k) with s. Some explanations, regarding the above definition, are in order. Item (2) imposes the restriction, that for a twff, i(C,1,k,i), to represent a true configuration of a data type t, it must be the case that the value of C itself is a true configuration of t and the value of the node represented by <l,k,i> is in the construction set of (Slk,ulk). Where (Slk,ulk) is a configuration at a level, 1, in z. 27 Every function symbol, g, is assigned to a function g by the structure. Also s(lki)=(s(l),s(k),s(i)), and _ vs p * I,u, * _ v * t1]"[2]"~"

105 If r is a set of twff's, then ri=y denotes that for every structure t and every function s such that t satisfies every element of r with s, t also satisfies y with s. For a given t and s we may write rl=ty[s] to denote that t satisfies every yier as well as y with s. In the following definition y[s] denotes s(y), where y is a twff. Also r[s], where r is a set of twff's: {Yi}, denotes the set {~i[s]}. Finally rl-,y[s] denotes r[s]i-gy[s]; the latter means y[s] is reachable from one or more y'[s]cr[s] wrt * where the underlying logical structure is t=(zl*). Definition 4.1 A data type t=(Z,*) contains a set of twff's, r, if there exists an assignment function s such that for every YEr, 1'I-g s]. Let t=(Z,*) and t'=(Z',*'); we say that t contains t' if t contains W'-{oft,uft}, up to renaming. Note that the above definition implies that for any data type t=(Z,*), t always contains itself. It may also be concluded that for two data types t=(Z,i) and t'=(Z',if Z'cZ then t contains t'. Or, in other words t is a correct extension of t' 4.1.1- Deduction of terms The notion of reachability introduced earlier in

106 chapter 3 deals with the values of the twff's. In other words if y and y' are twff's then in the presence of a data type we may say that y[s] is reachable from y'[s]. On the other hand, if we are not concerned with the underlying structure but interested only in knowing if a term is "deducible", i.e. reachable in a syntactic sense, from another term, we must introduce some rules of inference. Let r be a set of twff's, y, and let B be a possibly empty, set of ground instances (sentences) of b-expressions, b1 (), b1 (n), and a-expressions al(r) associated with Y' yer, where m, n, and r are some node constant symbols. Let us denote the pair (r,B) by rB. Let i and d denote i- and d-function symbols and 1, k, and m be integer constant symbols. The rules of inference are as follows: let yer contain no variable symbols, 1) if yer and bi (<l,k,m>)ec and a +1(<1+1,q,k>)EB then y - i(y,l,k,m) 2) if yer and bid (<l,k,m>)eB then y - d(y,l,k,m) 3) if a=i(y,l,k,m)cr and b1 (<l,k,m>)eS then 4) if a=d(y,l,k,m)~r and b1 (<l,k,m>)EB then IY. We say that y is deducible from rg, denoted r l-y,2' if yer, or y can be obtained by a finite number of applications 2' Note that the subscript ~ of I- is missing to denote deduction rather than reachability.

107 of the rules of inference. In other words, if there exists a finite sequence of zero or more inferences 1-4: a1' a2' a3''''an where the left hand side (with respect to -) of each ai' 1<i-n, is either a term from r, or the right hand side of an aj, 1<j5i, such that the right hand side of an is y. Theorem 4.1 Let t=(Z,*) be a data type; let y and rB be a twff and a set of twff's, with the associated set of b-expression and a-expression sentences B, respectively. If bI, b and a denote the b- and a-expressions, at level 1, associated with t, then let B be the set of sentences b1 (ml ), b16 (n1), and aa(rl), where mi and nl are the constant symbols corresponding to the elements of the CS's and DS's of a[s], at level 1, respectively, and rl are the constant symbols in [s] for all k, where aer. Then rl-,y[s] iff rBl-y for some s. Proof If r-,y[ s] then y[s] is reachable from one or more a[s]cr[s]. Therefore y is some sequence of i- and dfunctions "concatenated" with 4. But if y is such a sequence then yl-4 and fl-y. Now if a[s]cr[s] then al-4, therefore al-y. Now let rl-y, then there exists a term aer such that aI-y since at every step of deduction the bexpression for insertion (deletion) must be satisfied plus the fact that the starting configuration, in this case a,

108 must be satisfied, then [Es] itself is reachable from ~ and y is, therefore, reachable from a[s], i.e. a[s]I-~,y[s]. The latter implies that rl-,y[s] since acr. Theorem 4.2-completeness theorem Let t be an arbitrary data type with the b-expressions b1 and b1, and a-expressions a1 at each level 1. Let rF be as defined in theorem 4.1, then rl=y implies rF8-y. Proof: Let us select an arbitrary data type t=(Z,*) and show by an inductive argument that if rl=ty[s] then y[s] is reachable from ~ wrt i, since, by theorem 4.1, if yEs] is reachable wrt $ then y is deducible from 4. Since t satisfies y with s we may assume that y[s]=iqk where qk is an i-monotonic sequence and I={il,i2,...,in}C*. where iq If rl=tyEs], then r[s]-l=tiq for some sequence iq. The k-1 latter may be written as r[s]I=ijiq, 1, for some 1-j-n. Since t satisfies y[s], then t satisfies ijiq,1 By the k-1 definition of satisfaction, therefore iq,1 is also k-2 satisfied, and so is iq, and so on. Note that in the above argument we have assumed that y[s] is in MF composed of an i-monotonic sequence. The presence of the d-functions as well as i-functions does not alter the above argument. All that is needed is to replace the symbol i with d to denote their presence.

109 Thus Y[s]=ijij ij...i 4, for j, Jl',,jJ<n, therefore, by the definition of reachability, the latter equation denotes: %I-1YIS].... (1) Let us now consider a term y'cr and let dm.be, elements of $, forming an inverse sequence of im, such that if y'[s]=im4 then 4=dmy'[s].29' Therefore y'[s]J,4.... (2) Since - is transitive, therefore (1) and (2) yield y'[s]I-*y[s]. But y' was chosen arbitrarily, hence rl-Jy[s].. B.ut, by theorem 4.1, the latter implies that r I-Y. The above theorem ensures that all the twff's that are true in some t=(Z,*) are deducible and hence their corresponding configurations are also reachable from 4 wrt The converse of the above (completeness) theorem is the soundness theorem presented below. Theorem 4.3-soundness theorem Let t=(Z,*) be an arbitrary data type; let r be a set of twff's and B as defined in theorem 4.1, then if rI-y then rl=y. 2 Note thaj since t satisfies y with s, once again we may write y'[s]=i ~ by definition of satisfaction.

110 Proof Once again we take an arbitrary data type t and show that if rl-gy[s] then rl=ty[s]; the result would then follow from theorem 4.1. Let z denote' y[s]. If z is reachable from every element, y[s], of r[s] wrt $, then if y'[s] is satisfied with t then z is reachable from 4 wrt * since y'[s] must be reachable from ~. Let us also assume that the configuration z is reachable after n applications of the operations f1, f2'''''fnE*, not necessarily distinct. By an inductive argument on n we show that: z is true in t(i.e. t satisfies y with s), where z=fn(...(f2(f1(4))...). But % is true in t, by definition. Also fl(4) is true in t, otherwise either a p- or b- expressions would be violated. But if this was the case, and fl is one of the many possible operations needed to reach z, then there would be no operations to make z a reachable configuration. This is contrary to our assumption. Assume fn-_1(''(fl( )0~~) is true in t; by the same arguments as above there must be a operation fn such that Z=fn(fn_l(...(fl( ),... ) and therefore z is true in t. But then if this is the case, then Zl=fn(fn_l(...(fl(()... ) is true by definition. Hence z is true in t, therefore ZI=z. But since r[s]cz, therefore r[s] -tz, i.e. rl-ty[s].

111 Now since every configuration corresponding to a true twff is reachable from ~, we can state the following. Theorem 4.4 Let t=(Z,%) be a data type; if y is a twff which is true in t with some s, then y[s] is reachable from ~ wrt F. Theorems 4.3 and 4.4 imply that if a configuration, y[s], is reachable from one or more configurations of a data type t, then it must be the case that t satisfies y with s and vice versa. Assume a family of p-expressions, b-expressions, and aexpressions, then the sets I and D can be uniquely determined, since I and D are uniquely characterized by p-, b-, and a-expressions (see the definitions 3.2 and 3.3 of i- and d-functions). The p-functions are also characterized by their corresponding a-expressions. By theorem 4.4, (4,4) is sufficient to enumerate all the elements of a data type (Z,*). Hence the base predicate set is sufficient to define t=(Z,*). Thus to specify the "behaviour" of a data type one may only specify the p-expressions, b-expressions, and the a-expressions at every level. 4.2- Type Manipulation Operations Burstall and Goguen [BUR 77] argue that: "as soon as the theories get to be interesting they become incomprehensible. We wind up with a large set of equations

112 that no one can understand and which are almost certainly wrong. So we must build our theories up from small intelligent pieces. For this we need i) The ability to write (small) explicit theories, ii) Four operations on theories: 1) Combine 2) Enrich. 3) Induce 4) Derive enabling us to build up theory expressions denoting complex theories." Our principal objective, in the use of type manipulation operations, is to provide a framework for the above operations and more. The parameterization of data types will also follow naturally from this approach. Definition 4.2 The embed of a I-structure E'LscZ', where Z' is a data structure terminated with SSN, and a data structure Z"L" terminated with SSN (1=1), or with primitive nodes (1=0), is a set of Z-structures Z, denoted ZL=X'L1xZ"Lo; where L=L'+L" and ZL={ZL=(S,u): S= UL S1 and u= U u such that %< L 1<lL L"< kL S1 <L" SLIu1) _<LT AND VSL.+1 k.io (C(SL.+,k,i)=." where T"<Z") }

113 We may represent each element zcZ by z=(z',{zm}lS ) where the latter indexed set contains elements ZmEZ" such that each z' is the configuration contained in a node of Z corresponding to S'kmESF, in other words mc{1,2,...,!lS [}. The embed of two data structures Z' and Z" is denoted by Z=Z'xZ" where Z is the set of ~-structures: Z= U Z'xZ" z ~Z' Reminder: a subscripted operation symbol 0l denotes an operation defined for the 1 th level structure of a data structure. Definition 4.3 The embed of two data types t'=(Z'Lv r A') and t"=(Z"L"' *) is denoted by t=(ZL,*)=t'xt", such that ZL='ZL' x Z L" and *={I, D, P} where I={il: L"+I1<<L"+L' & ilLI' or 0<1<L" & ileI" } D={dl: L"+1S1iL"+L' & dl_L"eD' or 0<lSL" & dleD" } P={pl: L"+1Sl<L"+L' & plf".P' or 0<1L" & plP P" Consider now the word algebra of the embed of two data types. The embed of t and t' is composed of the configurations which may be represented in the form of (t, (tl,t2,...,tn)) where n is the number of nodes at level 1 of the configuration represented by t. Consider two terms of the word algebra of MF's of txt': t=(t, (tl,t2,..,tn)) and t'=(t', (t'l t'2...,t' m)) where nrm; t>>t' iff t>>t'

114 and (tl,t2. *.,tn)>>(t' lt'2e t'm)) where the latter means tl>>t2~t,...,t i>>t for some iEN. For the sake of notational convenience we shall consider the second term in each pair: t=(t, (tlt2,...tn)) collectively, so that t may be denoted as (t,T 1n). Thus (tl,T ln)>>(t2,T- m) iff tl>>t2 and r >>> Note that the above definition of >> is consistent with the definition of >>-relation on We given earlier in chapter 3. The MF's for the configurations represented by t and t' are unique so that if t and to are translated into their corresponding MF's, tMF and t'F respectively, then tMF>>toF iff t>>t'. The latter statement is true since, intuitively, for tMF>>toF, it must be the case that the corresponding "components", viz. structures contained in each node, must be monotonically reachable. Theorem 4.5 Let w and w' be congruence relations on (Wt,p) and (Wt,,,'), where t=(Z,*) and t'=(Z',*'). Let t', t2eWt and tlmn be elements of Wt,, then the definition of the 1' t2 relation w such that (tl,t 1 m) (t2,t.n) iff tl t2 & tl't where t ~m={t11,t12,...,t1m} and tn=t2tt gives a congruence relation on (Wtxt, U') Proof First we show that the following three properties hold for w:

115 symmetry: Show that: -)..M) W 1 -*n) 1 -o w 1+n (t,,t m)mx(t2,t n) (t2t2 n)x (t,,t ) LHS is true iff t1 w t2 and tl w t2n iff t2 w t1 and t2+n,1 t1m iff RHS. reflexivity: Show that (ti,t1m) mx (t1't11m). This is true iff t1 W t1 and t 1%m l tm The latter is clearly true, hence w is 1 m 1 x reflexive. transitivity: Show if (t1,t1 m)w(t2,t2n) and (t2,t2n)wx(t 3,t3k) then (tlt m) x(t3,t3+k) (t1 w t3) iff(t1 w t2) and (t2 w t3) (t 1+m t k iff.(t n) and (t +n t ) if f 3 t)ff(1 2 2 3 (tlt m) x (t3 t3+k) Finally we have to show that the substitution property holds, note that we only deal with "unary" functions. (t1 w t2) +f (t1) w f(t2), and (1m,l+ l~m 1+n (t 1 t n) n f'(t ) w' f'(t ) then ((t1t m)) xx ((t2t 2 )) substitution property holds and wx is a congruence relation on (W+t. t $r )'

116 It can be concluded that the u-relation is preserved by the x-operator. In some cases the structure at the lower levels of a E-structure presents us with extraneous detail to the extent that we may only be interested in the structures of the top few levels. Thus it is beneficial to define a m1echanism by means of which, structures at lower levels can be removed. This facility is particularly helpful in cases where the Z-structures are terminated with primitive nodes. A reduction operator, /, is employed for this purpose. The. following definition identifies a 2-structure which is formed by removing the level zero, or one, of a Tstructure. Definition 4.4 The red-structure of an L-level Z-structure ZL, L-121, terminated with level i, i=O or 1, is an L-i level structure X'Ll terminated with SSN such that each node of' L-!. L-i Sl'k'i' is defined as follows: Let Slki denote the nodes of EL, then for all values of 1, k, and i, if EL terminates with SSN (i.e. i=1) then <1',k',i'>=<(1-1),k,i> AND 1'>1.. (1) else <1', k', i'>=<l, k, i> N.B. For the resulting red-structure, the nodes with 1'=1 support Xundefined. Also the /-operator is undefined for

117 EL' when L<1, terminated with SSN, since 1' becomes less than 1 in equation (1) above. Example Consider the reduction operator /, applied to the 3level structure G twice; the result is depicted in Figure 4.3. Thus the reduction operator, /, is a morphism such that if /(t)=t' and if t is the product t1xt2x...xtJ, then t'=t1Xt2x...xtJ_1 where tJ is either a 1-level data type at level one and the structure terminates with.SSN, or tJ is a primitive data type. In the above example of figure 4.1, the result of applying / to G yields the red-structure of G. The reduction of a ~-structure may be extended to define the reduction of a set of Z-structures, or a data structure. The reduction of a data structure ZL is a data structure ZV such that for every zeZ, there exists a z'eZL, such that z'=/(z), and there are no other elements in Z',. If L~1 AND ZL is terminated with SSN, OR L<1, then Zj, is undefined. Definition 4.5 Let t=(Z,*) be a data type such that either L>1, or if L=1 then ZL terminates with primitive nodes. A reduction(red) operator /, is a partial mapping such that /(t)=t'=(Z', p'), and Z' is the reduction of Z, and

118 Figure 4.1 Removal of Inner Structures by /-function. ~'- I-{il: 1=1 if Z terminates with SSN, 1=0 otherwise } U D-{dl: 1=1 if Z terminates with SSN, 1=0 otherwise } U P-t{P: 1=1 if Z terminates with SSN, 1=0 otherwise }.

119 The red-operator may be extended to remove more than one level of a data type rather than just the lowest level. Finally it is noteworthy that the reduction operator performs the reverse operation of the product operation in the sense that /(txt')=t where t' is a data type with a single-level data structure. In order to make our model more susceptible to automatic programming, the TMO's are designed such that the resulting data type preserves the attributes of the operand data types. As a result it would be possible to automatically determine the characteristic functions of the resulting data types without much difficulty. Another advantage of the TMO's is that it allows complex data types to be safely and correctly built using the more common and better understood data types. Hence break-down of complex structures, a valuable tool in structured programming, is strongly encouraged. The type manipulation operations are of valuable help to the user of an abstract data type language. Our experience is that, because of the presence of the graph structures, one requires less effort to specify a data type using our approach over any one of the algebraic or axiomatic techniques.30 The virtue of the type manipulation 3 This is particularly true for simple data types, such as stack, list, and tree. As the ADJ group[GOG 78] has already pointed out, even for simple data types the algebraic technique can be trivially mis-specified or the specification may be incomplete and/or inconsistent.

120 operations, however lies in the fact that they enable us to define complex data types without much effort. This is because the resulting data type retains the original properties of the constituent, or the operand, data types. The TMO's remove the burden of defining generator operations, as well as specification of error instances, for the resulting data type, from the programmer to the machine. With a few simple data types at hand, one may employ TMO's in order to construct, or gradually build, more complex data types. Users may define auxiliary operations at every stage either "for convenience" or "change of behaviour." This will be pursued in the next chapter where we shall discuss one more TMO, namely the "enrichment" operation as well as the notions of equivalence and error.

CHAPTER V ENRICHMENT, EQUIVALENCE AND ERROR We shall investigate the enrichment process in the next section and examine the validity of the enrichments in general. Some conditions will be developed in order to signal "illegal enrichments." The concept of equivalence of data types will also be' discussed. This subject is of particular interest where an implementation of a data type is sought. Finally the "error" criteria will be examined. 5.1- Enrichment of Data Types Enrichment of the data types is the process by which new operations are introduced. Enrichment may therefore result in a change of behaviour of a data type. Consequently it may be employed in order to realize one data type in terms of another. There are also cases where enrichment does not alter the characteristic of the original data type. Under these circumstances enrichment is merely used for convenience. Enrichment is particularly useful in implementation process of data types [NOU 79]. That is the new operations are used to introduce new configurations resulting in a change of behaviour of the data type. At 121

122 this stage we would like to point out some of the deficiencies of the past techniques regarding the enrichment of the data types. In the process of enrichment it is very easy to distort the behaviour of a data type. Consider,. for example, the stack data type. One can easily enrich the stack by a "nonconstructor" operation: bottomnode, say, to peek at the bottom node of the stack configurations. This is a distortion of the stack behaviour and yet it may be legal to define this type of operations in cases where there are no axioms to prevent the user. If the introduction of the above operation, by the user, is intentional then indeed there is no problem. However if such operation is carried out inadvertently, which is more likely in a more complicated data type, the consequences are obviously undesirable. 5.1.1- Changinq Behaviour by Enrichment A problem of different nature also related to the. above example is that of the correctness proof methodology of Goguen et. al. [GOG 78]. Their approach is based on cannonical term algebras (cta) and the concept of constructor signatures. Briefly, two data types are assumed to be the same up to renaming if their cta's are isomorphic. The idea advocated is that: "to know a data type is to know its constructors." By the above stack example it is immediate that this is not always true. Consider the

123 original stack (a normal stack data type) and the modifiedstack, with the new operation bottomnode. Note that bottomnode is not a constructor operation, both of these "stacks" have the same cta and constructor signatures. However they do not exhibit the same "behaviour." The introduction of the probe function, in the characteristic set of DSO, obviates this problem. In the next section it will be shown that the modified-stack is no longer equivalent to the original stack, since their aexpressions are not the same. Hence any attempt made, by a user, to add an illegal operation of this nature would be immediately detected and prevented. Any other operation that does not distort the original behaviour may be added. The question is then- if we are not allowed to change the characteristic set how can we realize one data type in terms of other data types? The answer is that certain mechanisms must be built-in to signal the change of characteristics in the process of enrichment. In an array, for example, introduction of bottomnode does not cause a change of behaviour of the array since we already have access to the bottom node (first node inserted) anyway. Thus, the question is how can we detect this change or nochange in behaviour? By enriching the operations of the characteristic set we either relax certain conditions or restrict them. To detect when a transition in behaviour

124 occurs, we adopt the following approach based on the three base predicate expressions characterizing a data type. The change of behaviour, if any, which may result from an enrichment may be detected by examining if the newly introduced functions are in line with the p-, b-, and aexpressions when they are evaluated. This is explained below in more detail. Let us consider a stack data type. The construction sets for it is determined by its bexpression. So that every node to be inserted must satisfy the related b-expression b (i)=Vj(j-i). Thus the domain of the i-function is limited to that of the top node for every given configuration. Clearly to introduce another insertion function i' to allow insertion at the "bottom-end" of the stack would cause a change of behaviour. As a result the b-expression would not be satisfied with such an insertion. Similar arguments are also applicable in the case of the deletion function. In general, there are two categories of enrichment. Firstly a data type may be enriched merely for the sake of "convenience" rather than change of behaviour. A second category of enrichment is "enrich to change behaviour." That is to obtain a data type with a different characteristic to that of the original one. A more "priviledged" operation is now needed to change the behaviour of a data type.

125 It should be noted that enrichment is a type manipulation operation. Consider the "ENRICH" operator which may be performed on a data type t. In case of the enrichment for convenience the ENRICH operator evaluates to the original data type. If only the b-expression is changed, then the resulting data type ENRICH(t) would either contain t, i.e. t C ENRICH(t),3' or vice versa depending on whether the b-expression was made more "strict" or more "relaxed" respectively. For the cases of enrichment where the a-expression is changed, t and ENRICH(t) would be isomorphic but not "behaviourally isomorphic." In order to show that one data type mimics the behaviour of another one needs to demonstrate their "behavioural isomorphism." Auxiliary operations(AO) are introduced in the next subsection. These are the class of operations that when added to a data type t the characteristic of t would not be changed; that is ENRICHAO(t)=t where the equality implies behavioural isomorphism (see later), and the subscript AO denotes the type of functions that t is enriched with. 5.1.2- Auxiliary Operations So far we have only discussed three types of data structure operations, namely i-, d- and p-functions. Such functions are of primary importance since their role is to completely specify the behaviour of the desired data type. 31' C denotes sub-isomorphism.

126 There exists another class of operations for the purpose of users' convenience rather than essential to the specification. Such operations are referred to as The auxiliary operations (AO). The auxiliary operations define the "class of programmed operations." They may be thought of as procedures in a programming language. For example we may define the attach_leftmost operation on the binary tree data type in order to insert a node to the leftmost node in a binary tree configuration. This operation does not change the characteristic of the binary tree it merely provides us with an extra tool in order to implement other, more complex, operations. In a sense, the auxiliary operations are "non-essential" to the extent that their absence or presence does not influence the underlying behaviour of the data type. AO's are also, in general, non primitive because they may be broken up into more primitive operations. For example consider an operation that performs parallel insertion into an element of the binary tree data type. That is it inserts one or more nodes into a configuration of the binary tree concurrently. Such an operation may sound to be a constructor operation. However, it may be implemented in terms of the more primitive serial i-functions of the binary tree hence it is not a generator function for the binary tree. Thus parallel insertion of the binary tree would be categorized as an AO. It will be shown later that the latter operation would not affect the

127 overall behaviour of the binary tree since the configurations of the binary tree would remain unchanged. On the other hand, introduction of a parallel operation of insertion in a stack, such as inserting more than one node at a time, would violate its associated b-expression and/or a-expression, resulting in an illegal enrichment. In order to define the auxiliary operations, one would need a media in which such operations may be specified with clarity and without ambiguity. Such a facility is provided by the language of "equational theory." This is discussed next. 5.1.2.1- Equational Theory Many basic properties of algebras are expressed by means of equations, i.e. ordered pairs (fl,f2) often represented as f1=f2 using the language of equational logic [TAR 66]. We shall be concerned with formulas of the type u=v where u and v are the extension of the twff's defined in chapter 4. These extended twff's, denoted as well formed sentences, will be described shortly. The data structure operations are, in general, divided into the characteristic set * and the auxiliary set, A. We have already discussed the former type of operations extensively in chapter 3. The set A will be the subject of what follows.

128 The central idea is to define a set of operations in terms of the elements of the characteristic set of operations. The correctness of the auxiliary operations is guaranteed by the fact that the constituent parts of an AO are themselves valid operations of the data type. Therefore we need not worry about the "error" instances. These are already taken care of by the operations of *. In any case such error instances resulting from an auxiliary operation would not lead to a change of behaviour, it may only cause undesired restriction on the actual auxiliary operation being defined. For example we may define a parallel insertion operation on the binary tree using its more primitive i-function. The error instances of such a parallel operation is restricted to those of the primitive i-function, since the semantics of the parallel operation may be defined in terms of the sequential i-function. The parallel operation would neither add a configuration to the set of tree configurations nor will it take one away from it. In order to determine if an auxiliary operation is a "legal" operation, in the sense that it does not alter the behaviour of the data type t, we define the "satisfaction" of the newly defined operations, viz. the AO's. First let us define what is meant by an auxiliary operation. An auxiliary operation (AO), defined for a data type t=(Z,O), is a partial function f such that f: A1XA2x...xAn ZUS

129 where Al, 1<<5n, is either Z or S, where S is the set of all the nodes Slkm for l,k,meN. An AO, f, is an auxiliary constructor operation(ACO) if the range of values of f is in Z, it is an aux iliary probe function if the range of the values of f is in S. In order to enhance the power of specification for the auxiliary operations a new operator symbol is introduced, namely ifthenelse, its value, wrt a data type t, will be defined later in this section. With the introduction of this operator we need to extend the definition of our twff's. The extended version is referred to as the term well formed sentences (twfs). These are defined below. Let Pn denote an n-ary predicate constant, for K>1 and n>0. An atomic term well formed sentence is either a twff or constant symbols: Z1, z2'''' denoting the elements of Z, or constant symbols: c1, c2,... denoting elements of S. 1) An atomic twfs is a twfs. 2) if pq(Tl,T 2 *..'Tm) then T else T' is a twfs if T, X and t' are twfs's.32 3) Only those formulas obtained by a finite number of applications of 1) through 2) are twfs's. Note that the set of twff's is a subset of the set of all twfs's. 32 We may also write "if X then Zl else 2" as an alternative for ifthenelse(1,zl,z2).

130 The satisfaction of twff's was defined earlier. It only defines the satisfaction of twff's only if they have a value in Z. However we would like to extend this definition so that if a twfs evaluates to a node value we also have a notion of satisfaction for it. Let us now define the satisfaction of those function symbols whose value is in S. Let f be an auxiliary probe function symbol. We say t satisfies f with s, denoted by I=tf[s] iff the value of f, denoted by f,33 for any argument values regardless of the arity of f, is such that E=Slkm and al(Slkm) is satisfied with oalk where al is the a-expression associated with level 1 of Z. The definition of satisfaction of the tw.fs's containing the ifthenelse operator is defined next. |I=t(if pln(~1 an) then y else y')[s] if I=ty[s] and'=t'[s]. We may define the value of the ifthenelse operator as follows. Let y=if w then y' else y" be a twfs, in the presence of a data type t and some assignment function s, the value of y is s[y'] if t, the corresponding structure, satisfies r with s, otherwise the value of y is s[y"]. We define an AO set, A, for a data type t=(Z,*) using a set of equations A as follows, Let eck be {=~, then e defines Z subject to the following conditions: 3 3 (T1iT2'...'Tn) denotes s[f(t1,T2,..., n)].

131 1) C is a twff of the form fn(T1,T2,..,Tn), where fn is an n-ary function symbol, n>O, and each Tr is a term, i.e. either a variable or a constant symbol, and there are no other equations e'Es in which fn occurs in the LHS of e'. And C is a twfs. 2) If fn(Tl,T2,...,Tn) is the LHS of ecE then fn is distinct from the function symbols g, corresponding to elements of i, and 4. The satisfaction of the newly defined operator 5 is defined as follows. I=t [s] iff [=tc[s] For example, let t be the tree data type. We are interested in an operation, ins2, in order to insert two nodes in a tree configuration every time that ins2 is invoked. Let ins denote the primitive tree insertion, then we may define ins2(z,<lki>,<lkj>)=ins(ins(z,<lki>),<lkj>). Thus if I=tins(z,<lki>)[s] and I=tins(ins(z,<lki>),<lkj>)[s] then I=tins2(z,<lki>,<lkj>)[s]. The above idea of defining new operations may be extended in order to define parallel, insertion and deletion, operations only if all the nodes to be inserted into, or deleted from, z are in the appropriate construction sets or the destruction sets of z. Recall that this is a decidable problem as shown in lemma 3.1. Consequently, such a powerful facility may readily be utilized without any

132 concern about the correctness of parallel operations. The ability to define such operations is most valuable in performance measures. For example, given the binary tree specification, it is possible to specify parallel insertion and deletion operations or both. The following is an example of an APF. Example Consider a binary tree data type. We wish to define a probe function which always returns the nearest leftmost leaf node to a node m. Assume tebinary tree and let P1 denote the left probe function of the binary tree'. leftmost(t,m)= if Pl(t,m)=0 then m else leftmost(t, pl(t,m)) To get the leftmost leaf of the tree, use leftmost(t, 1). Recall that in chapter 3, we introduced the reduction of the terms of a data type; before leaving this section, it must be added that if { is defined by c, i.e. {=c, then (=>aC, meaning { reduces to C. To this end, in the presence of the auxiliary operations, we have the extra axioms, the auxiliary equations, by means of which the terms may be reduced to their MF. Hence the MF for ins2 operation, defined above, is given by the MF of the RHS of the equation in terms of the primitive i-function of the binary tree. Finally, we can define what we mean by a specification of a data type.

133 Definition 5.1 A specification of a data type is a pair (EI, A) where n is an indexed family of base predicate sets 71' I={[l1lL' such that each r1 is the set of p-, b- and a-expressions, the base predicate set for level 1, and L is the index set; and A is a set of auxiliary equations. 5.2- Equivalence of Data Types In the algebraic approach of [GOG 78, NOU 79], the philosophy is that "to know a data type is to know its constructors." Clearly in the simple example of stack given in section 5.1.1 we showed a contradiction to this philosophy. Because of this, and the vital importance of data security, we have chosen to include the probe function in the characteristic set of operations of a data type. In here, we shall see that the two stacks of section 5.1.1 are not "behaviourally isomorphic" although they have the same configurations in their respective data structures. The notion of "weak equivalence" is defined next; this will be followed by a stronger notion of equivalence, viz. the behavioural isomorphism or simply "equivalence." This notion would aid us to see if a certain implementation of a data type, t, is behaviourally isomorphic to the original specification. Thus by showing that t' is equivalent to t one may then say t' is a "correct implementation" of t.

134 Let Z and Z' be data structures. A data structure homomorphism from Z to Z' is a pair H=(1I,ln) such that ~: Z+Z' and if i(z)=z', with z=(S,u) and z'=(S',u'), then l: S+S' preserving the u-relation; i.e. if (x,y)cu then (n(x),n(y))Cu'. A data structure isomorphism is a data structure homomorphism when both V and n are bijective. Definition 5.2 Two data structures Z and Z' are weakly equivalent if there exists a data structure isomorphism from Z to Z'. We say that two data types t and t' are weakly equivalent if their respective set of data structure configurations, Z and Z' respectively, are weakly equivalent. 3 Consistent with the definition of the equivalence of twff's of the first order predicate calculus we define the equivalence of pwff's, bwff's and awff's in a likewise manner. Let w denote any of the latter three types of formulas, then w and w' are equivalent if for any structure Olk that satisfies w with s then it satisfies w' with s and vice versa. 3 4 Reminder: Z denotes those configurations of Z that are reachable from 4 wrt * in a data type t=(Z,*).

135 Theorem 5.1 Let t=(ZLF) and t'=(Z.,4j'); let pi (bl) and Pi (b1l) be their p-expressions (b-expressions) associated with level 1 of ZL and ZL respectively. If for every level i<L, pl and pi are equivalent, then if b lb'1 and alai' then ZLcZ S denotes sub-isomorphism, and + means logical implication. Proof Using induction on the number of nodes, let us start with the starting configurations 4 and 4'. Let i(4)=4' where U is a morphism. Since pi and pi are equivalent therefore TCSL =TCSL1. If for every level 1, blb;, then for level L, bLab'L Now, if CSL1=empty then ZL={}; b'L, however, may be satisfied for some values of TCSj1. If this is the case, then Z[={4'}u{other configurations} in which case 2L Z5. If b'L is not satisfied, then of course iZ={t'} Once again ZLFZL. Now consider the case when CS is not empty. That is there are certain values of TCSL1 that bL(i) is satisfied with. But every value that bL is satisfied with, it also satisfies b' L(i). Hence CSLiCCSL1. Assume CSL1={mj: 1<j5n} and CSL=t{mj: 1l<jn'}. Therefore those configurations having a single node and reachable from 4 are i(4,mj), 1i<jjn. Also those configurations having a single node and reachable from 4' are 1' (4',m), 1lk<n'. Therefore for every l(C,mj) there exists a configuration'('m) such that mj=mk. Define ~ such that ((%,m))='(',m). Let us refer to the latter two

136 configurations, having only one node, as z1 and z', where ZI=]((zl). The TCS's for each Slk in a z1 is also identical to that of some Sik in the corresponding z1 since Pl and pj are equivalent for every level. Now, if blab'l then for 1 1 each Slk in z1 there exists a Sik in z' such that CSlkCCSjk, see the definition of CS. Consider now two arbitrary configurations z and z', after n steps(each having n nodes), such that 1I(z)=z'. By precisely the same argument as above, for each Slk in z there exists a Sik in z' such that TCSlk=TCSik; and since blb'l then CSlkCCS Therefore for every configuration having n+1 nodes reachable from z there exists a corresponding configuration reachable from z'. This correspondence is of course defined by V1 as described above. Theorem 5.2 Given two arbitrary data types t and t', there exists a procedure that stops with a YES answer if t and t' are not weakly equivalent. Proof Consider two data types t=(Z,*) and t'-(Z'I') where Z={z: z=(S,u) I and Z'={z': z'=(S',u') }. For the sake of brevity assume 1-level structures Z and Z'. The following proof is a construction of Z and Z' such that if t is not weakly equivalent to t' it stops with a confirmation else it

137 continues. In the process we try to find bijections n and V such that n maps every S to S', where i(S,u)=(S',u'). In order to find a bijection n which maps S to S', it must be the case that ISI=IS'I. Now, each (S,u)cZ may be represented by (S,p). Similarly each (S',u')sZ' may be represented by (S',P'). By lemma 2.3 and corollary 3.2, both Z and Z' are recursively enumerable. Therefore enumerate elements of Z, and 2', starting from z0(i.e. 4), and z (or 4') and let i(z0)=z'. Enumerate the rest of the elements z=(S,p)EZ in the ascending order of ISI. If there are more than one configuration with the same number of nodes, enumerate all such configurations first, then enumerate the next set of configurations, whose number of node indices is that of the previous set plus 1. Enumerate 2' in the same fashion. Let zi=(S,p) denote a configuration of Z such that ISil=i. For every i find all the elements zi of Z and zi of Z'. Note that there are only a finite number of configurations with i nodes, for a data type, since the Construction Set, CSlk, for every Slk is finite. Let zj=(Sj,uj)=(Sjp), Ojgn, be those elements of Z such that ISil=i for all j. In other words n is the number of configurations of Z having i nodes. Similarly let n' be the number of elements z!i=(S,p')eZ' having IS'jl=i, where O<j<n'. If non' then stop, t and t' are not weakly equivalent; else find an r such that for every value of m and n in Sj that satisfies p, p' should also be satisfied with n(m) and n(n) and vice versa.

138 The search for such an n may require back-tracking so that if we are currently dealing with those configurations zi and z! such that i=I, then the mappings n and ] that we have found so far for values of i=I-1,...,0 may no longer be valid. This implies that we have to try some other mapping, n and i, such that it works for i=O,...,I. This strategy implies that we may have to try every possible mappings of {Zi:ZieZ & i<I} to {z!:zteZ & iI-}; and for every such mapping try different combinations to try and find the right n. Note that the number of such mapping is finite. If such, cannot be found stop. Otherwise set i=I+1 and try to find (Ii,n) as above. For the case of L-level structures, the proof is a straight forward extension of the above. Let (Z,O) be a data type where Z is an L-level data structure such that the CDSO set *_CO, and O-* is a, possibly empty, set which contains the auxiliary operations. Also let zl denote a configuration of Z1, the data structure defined at the lth level of Z.35 Finally <lkm>k denotes a node at level X; and 01CO denotes the set of operations at level 1. 3 Note that Z does not denote a 1-level structure. It is a 1-level structure defined for level 1 of Z.

139 Definition 5.3 Let t=(ZL,O) and t'=(Z'L,,O') be data types. A Ehomomorphism, H, is a family of data structure homomorphisms, Hl=(n1,i1) for 1<L, preserving the operations, i.e. H: t+t' where H is a family of pairs of morphisms: ( lnl) such that I1: (Z'Ol1)(Zl''0 1') and nl: S1-Si,, 1<L and l'<L', 1' preserving the operations, i.e., if f and f' are functions in 0 and O' respectively then H[ f(zll,z1,...,z1,<lkm>1,.lr) f'('1 (z1 ),..1 (<km>, (<km> 1 1 1n n q q r Two data types t=(ZLO) and t'=(Z'L,,0') are behaviourally isomorphic, or equivalent, denoted by tat', if (ZL,IUD) and (Z'L,,I'UD') are E-isomorphic and for every zcZL, for all 1 and k, if <lki>cRlk then there exists an <1 Ik'i'>=(<lki>)cR',k,, z'=i(z), and vice versa, where Rlk is the set of accessible nodes in z having the first and second indices 1 and k in common. Let t=(Z,O), and let t'(Z',O') be an enrichment of t such that *CO and'CO', and OO'. Then t' is an enrichment for convenience of t if t is behaviourally isomorphic to t', otherwise t' is an enrichment for change of behaviour of t.

140 The latter may also be referred to as the implementation of t' by t. 5.3- Implementation by Emulation Consider two almost similar structures depicted in figure 5.1. Clearly the two structures (a) and (b) are not isomorphic. Let us use the well known replacement theorem of predicate calculus [MAN 743 as follows: Let Pa (Pb) be the p-expression associated with the data structure represented by the structure in figure 5.1-a (5.1-b). Then e (a ) Figure 5.1. Structural Similarity.

141 Pa 1<>i A i>1 and (1<>i A i>1) U (i<>i+l) or P Pa U (i<>i+l ) But the second expression on the right hand side of the last identity denotes the p-expression for a structure which is isomorphic to a list structure. Let us denote the pexpression for the latter as Pc' therefore Pb - Pa U Pc As a result a star and a list structures combined, simulate the structure of figure 5.1(b). This simple derivation illustrates the idea that, by emulating one data structure in terms of others, we can implement datastructures in terms of their equivalent structures. For behavioural isomorphism, however, we need to consider the b- and a-expressions as well as the p-expressions. Theorem 5.3-Correctness Theorem Let t=(ZL,O) and t'=(Z',O'), then t is behaviourally i'somorphic to t i'f for all 1<L, P 1 Proof As a corollary of theorem 5.1, because the corresponding p- and b-expressions are equivalent, Z=Z'. That is Z and Z' are isomorphic. The behavioural isomorphism follows from the definition.

142 Thus it is immediate that: (stack, IUDUP) / (stack, IuDuPu{bottomnode}) but (array, IUDUP) r (array, IuDuPu{bottomnode}) It can be seen that the range of values of the Auxiliary Probe Function: bottomnode does not satisfy the aexpression for the stack for every stack configuration. Hence the resulting data type does not behave the same as the stack would. Implementation of data types is a step by step procedure of realizing one data type in terms of the other data types. Every step would bring us to a lower level olf abstraction until, finally, the lowest level of abstraction, which is the physical realization. The following theorem would enable us to ensure the correctness of this approach. Theorem 5.4-Implementation Theorem Let t=(ZL,O) be a data type with p-, b- and aexpressions at level 1sL denoted as Pl, bl, and a1 respectively. Let t' be the data type resulted from replacing one or more of these expressions, or parts of them, at one or more levels of t by their equivalent expressions respectively. then t and t' are equivalent. Proof: Follows from theorem 5.4 and the well known replacement theorem of logic[MAN 74].

143 The implementation of one data type in terms of another is just a matter of "matching" their p-, b- and aexpressions. Thus complex structures may be broken down into the existing ones and therefore implemented. For example, we can realize a stack by introducing appropriate b-expressions and a-expressions to an array since their pexpressions are the same. The following examples illustrate the idea. Example Given an array data type with the following base expressions: p: i<>i+l b: i> 1 b': i>1 a: i21 then, a) A stack is defined by modifying the b- and aexpressions of the array as follows: b (i): vj(j<i) b (i): Vj(j<i) a(i): Vj(j5i)) b) A binary tree data type can be defined by modifying only the p-expression of the array as follows: p: (i<>2i)U(i<>2i+1)

144 The b- and a-expressions remain unchanged. Figure 5.2 illustrates the node formation of the binary tree for a particular configuration of the array. Figure 5.2 array Implementation of binary tree. Similarly the queue, the list or any other type may be obtained by merely modifying one or more of the base

145 predicates of a given data type in order to arrive at the desired behaviour. The array implementation -of a binary tree presented above confronts us with an interesting problem of morphisms between the data types. It can be observed that since for a given number of nodes a binary tree may have more elements than an array, there exists no epimorphism from the array to the binary tree data type but there does exist a monomorphism. Furthermore, there exists a (partial) morphism n such that A: tree-array, where n is isomorphic for some subset of the the binary tree data type. In other words, the array is sub-isomorphic to the tree. By defining a suitable congruence relation, for example "equal-numberof-nodes", E, on the binary tree to get binary tree/E, one can define an isomorphism which maps binary tree/E to the array. The question is of course: when is a data type t' sub-isomorphic to another data type t? This may be answered immediately by the statement: when t qontains t'.(Recall definition 4.1.) In general, this problem is undecidable. Finally, it is a trivial task to enrich a stack in order to make it emulate the behaviour of the FIFO queue. The p- and a-express.ions remain the same as that of stack. The b-expression for deletion would not be altered. The bexpression for insertion is the only change needed to simulate a queue behaviour. This example illustrates the extremely useful feature of "extensibility" exhibited by our

146 model. In comparison, in order to do the same extension in any one of the existing algebraic techniques, one has to change the whole specification. The concept of extensibility was discussed in chapter 1, it is one of the important properties desired of any specification techniques aimed at data types. 5.4- Error The specification of an abstract data type is, by definition, dependent upon the operations performable on that data type. For every data structure operation, defined for a data type, there are instances of that data type that, when subjected to this operation, would result in an "undefined" state. For example for an empty stack: pop(empty stack)=undefined. Similarly push(full stack)=undefined. Ideally we are interested in specifying these undefined states so that we know what to do when such a state arises. We are also interested in knowing about the origin of the undefined state; this would help us to determine the cause of the error. For instance for a stack of integer, if the result of a certain operation is undefined then we would like to know which data type the undefined state is associated with- stack or integer? The undefined states are also referred to as errors. These are no more than the I and T that we discussed earlier. The error criteria is an integral part of every specification. Indeed in the absence of an adequate error

147 criteria the specification of abstract data types would have no significant value and the whole purpose of a formal specification would be defeated. The past approaches to the way the error criteria has been handled are in general either incomplete and unsuitable [GUT 75], or the error issue has become so complicated that it has proved to be a problem of unreasonable magnitude to tackle [GOG 78]. In our approach, instead of depending on extra axioms to define error instances we tackle the problem by defining the domain of the data structure operations using the appropriate p-, b- or a-expressions. In essence, when an operation results in error it causes a transition from one reachability equivalence class to another. For instance, consider a stack10, which is a stack of maximum depth 10. The 11th push causes an error instance. Such an error instance may be interpreted either as the "overdefined' element of the stacklO, namely T, or we may interpret the 11th push operation as one causing a transition from stack10. data type to another, e.g. stack20 or stack50. The latter approach has a number of shortcomings. For instance, push or any other DSO defined for stack10 can only result in elements of stack10, so how can it possibly result in elements of stack50 or any other type? Even if we did overcome the latter problem it would not be clear which particular type the transition is made into, namely stack11 or stack67? The former problem, however, is more sign.ificant and profound than the latter. It leads to the

148 same difficulties encountered by some advocates of the algebraic technique in their error treatment [GUT 751. We shall pursue the idea that the l and T elements of a data type define the error instances of that data type. Recall that for every data type there are two fixed-points with respect to all constructor operations of that data type; these are oft and uft whose values are represented by T and l respectively. Thus every DSO is strict or bottom preserving. Since the T and l define the "limits" or "boundaries" of a data type, it is then possible to define such boundaries using the b-expressions. This is indeed what the b-expressions were primarily intended to do. Example A stack4 lattice is illustrated below. oft. (stack4 overflow) depth 4 depth 3 depth 2 depth 1 depth 0 uft (stack4 underflow) PUSH(oft)=(oft)=PUSH(depth4) POP(uft)=uft=POP(0) In fact theorem 3.1 would ensure that all the "non error" configurations are reachable from each other and all

149 the error configurations are trapped in their own equivalence classes. Hence no error configuration may reach a non error configuration and vice versa. 5.4.1- Domain of DSO's In general, we are interested to know precisely what are the domain of DSO's. It is sufficient to consider only the operations of the characteristic DSO set * since all other operations are defined in terms of these. Recall the definition of i- and d-functions in chapter 3; their -domain was described to be Z x L x K x M, where L, K, M are some subsets of N. For a given level of a data type, we are only concerned with Z and M, keeping lcL and kcK at a fixed value. Since for given values of 1 and k, the only variable is m and it is precisely this variable whose range of values affects the behaviour of the structure. Similarly, for the case of the probe functions it suffices to consider ZxM as their domain.3' The problem is how to define the appropriate subsets of Z and the set M for each kind of function i-, d-, and p- respectively. As an example consider the above stack4 data type. Let Li denote a stack configuration of depth i, then Z = {%,L1,L2,L3,L4} 3 Note that the probe function is really a mapping of the form zx<lkm>4<lkm'>; but for fixed values of 1 and k every <lkm> may be denoted by, merely, meM.

150 M- varies dynamically depending on the depth of the stack, i.e. it depends on zeZ. We saw that this was determined by the b-expression for the stack. For a stack configuration of depth 3, M={S114}. Since s[i]=S114 is the only element of the TCS of L3 (a configuration of depth 3) that stack satisfies b (i) with. Consider now a stack of depth 4, L4. The value of M would be equal to { }, which implies that nothing may be pushed on the stack. Note that the elements of M in the above example are determined by the b-expressions of the stack. Hence for stack of depth 4, M is empty, and therefore no more nodes may be pushed onto it. As a result an error state is never entered. The task of introducing extra constraints in order to alter the construction and destruction sets is a rather straight forward task. For example, the depth of the stack configurations may be limited to a maximum size of 100 by ANDing the extra constraint of i<100 to its b-expression, b. It should now be obvious that for an i-function l: ZxLxKxM+Z, where i(z,l,k,m)=z', M=CSlk. Similarly for a d-function 6: ZxLxKxM+Z, where 6(z,l,k,m)=z', M=DSl But lk' we have already seen that the CS's and the DS's for a given zcZ are recursive sets. Hence for every configuration zcZ the domain of the i- and d-functions are recursive at every level, for every structure (Slk,ulk).

151 It can be seen that the process of specification is reasonably straight forward. The "extension" of specifications is even simpler. Behavioral changes may be made through "addition" or "omission" of appropriate constraints in one or more of the predicate expressions. Hence given a data type with its related b-expressions it is easy to change such expressions in order to come up with a new desired behaviour. The same approach may indeed be adopted in order to modify the behaviour of a data type via p- and a-expressions. Such facility makes the process of implementation or extension a trivial task. For instance, given the b-expressions for the stack, we could'readily specify (or implement) a FIFO queue by making minimal changes to the stack b-expression for insertion. The a- and p-expressions remain unchanged. The same process, using the algebraic approach, means that we will have to discard the "gifted" stack specification and start with a fresh specification of the. queue and yet worry about the correctness of this latter specification.

CHAPTER VI CONCLUSIONS AND FURTHER WORK The scheme which we have adopted, for specification of data types, is a blend of definitional and operational methodologies. The advocates of the axiomatic techniques reject the graph models because of their implementation bias. The degree of such a bias varies from one graph model to another. We must emphasize that the s-structures are not to be interpreted as actual memory layout of data nor do the edges represent access paths as in Earley's approach [EAR 73]. In our approach, a much more abstract notion of access is employed, namely the a-expressions. In general the presence of implementation bias may be detected in almost every technique; this includes both the explicit as well as implicit techniques. A problem of considerable importance is the difficulty of construction and comprehension which faces the present formal implicit techniques. Our usage of graph structures has considerably alleviated this problem. A major problem with previous techniques has been to define a set of operations to completely and consistently characterize the behaviour of a data type. The characteristic set i and the properties required of them 152

153 provide us with sufficient information to characterize data types both completely and consistently. By employing the DSO functions, viz. the i- and d- functions, we showed that data types may be modelled as lattices. As a valuable merit to our approach, we showed that all that is required to specify a data type is to specify the p-expressions, b-expressions and the a-expressions for the desired data type. Also demonstrated was the fact that a given data type specification may conveniently be extended to exhibit a different behaviour. This was exemplified in the case of the stack where by making a small change in a single expression the FIFO queue was realized. Because of this built-in facility of extensibility, in our model, in most cases one can realize the desired data type by a few minor and simple alterations to an existing specification. However, as noted earlier in the introduction, the price that one pays for such conveniences is the minimality of the specification. For example, the set data type may only be expressed as an indexed set. Thus an extra constraint is unnecessarily imposed on the set data type. Among other potentials of the base predicate expressions is their use in program verification. They provide the user with predicates that must hold for every instance (configuration) of a data structure so that every configuration is true in the sense that was defined in chapter 4. Furthermore, both the soundness and completeness properties of a specification

154 were demonstrated so that all the true configurations are deducible, and all the deducible configurations are true. As a further facility to specify data types of a more complex nature we introduced type manipulation operations. These'operations enable us to arrive at more complex data types without the burden of redefining their operation set; since their characteristic set is automatically obtained from the constituent parts. This is indeed a very convenient tool in specifying less commonly used data types and be certain of their validity. Thus, data types may be defined independently from each other, and then combined according to the users' recipe. This philosophy is highly related to the concept of abstraction of data found in SIMULA [DAH 66] and other languages that followed from it. It is clear that parametrization of data types is a direct application of the embed TMO. Two types of enrichment were introduced: enrichment for convenience and enrichment for change of behaviour. Using the inherent properties of our model, it was shown that, one may enrich a data type by any other operation. Such auxiliary operations are defined in terms of the more primitive operations. The "correctness" of the former is automatically decided by the elements of the existing base predicate set. We also demonstrated that in order to enrich a data type for change of behaviour we must change its base

155 predicates. This process was referred to as implementation by emulation. The base predicate expressions are of tremendous value in order to prevent reaching the error instances of a data type. These expressions are employed to define the domain of the characteristic functions of every data type. This idea alleviates the complex and tedious error theory approach of the past. By merely specifying the base predicate expressions of a data type the domain of its characteristic functions, as well as the error instances, would become decidable. Hence the burden of specifying the extra "error equations" [GOG 78] and other exceptional equations such as that of [KAP 80] are removed. An important and valuable merit of our approach is the fact that the issue of parallelism of operations is inherently built into the specification of a data type. The b-expression for insertion (say) of a data type defines the construction set of any configuration of that data type. The latter set would in turn define the nodes which may be inserted into the configuration. As a result of this arrangement, for every given configuration the elements of the construction set may all be inserted in parallel into that configuration. Likewise the elements of the destruction set may all be deleted in parallel. Toward this end, parallel insertion and/or deletion and/or probe operations may be defined and in addition such operations

156 may be used to define other operations of a more complex nature. With this facility, a number of operations may be carried out at different levels of the structure on possibly distinct nodes of that structure concurrently. The correctness of such operations is guaranteed by the base predicate expressions. The ability to define operations acting concurrently on the same set of data is indeed invaluable in many systems such as data bases and operating systems. The combination of structure and behaviour enables us to specify certain criteria which have not been demonstrated before by other techniques. For example the aexpressions,or equivalently the probe function, and its use in security of data. For example: in an operating system environment one may readily allocate different a-expressions to different users, for the same data type, so that the more priviledged users have more access than others. Finally by renumbering the nodes it is possible to reconfigure data types such that a given data type may exhibit a characteristic distinct to that originally intended. Thus a set of data items may perform different behaviours to different users, all at the same time, depending on the numbering scheme associated with its nodes. This, of course, is a very useful tool in data base systems. Appendix B presents a brief introduction on the potentials of our model regarding this concept.

157 Further Work -Extension of our model to specification of "algorithms." An algorithm is a collection of one or more data types. To define an algorithm, we would like to bring together, or "combine", all the necessary elements of different algebras, and define operations which may or may not cause transitions between different sorts of the resulting algebra. For example, in order to sort elements of the array of integers, we may need the help of the bool data type. In the presence of bool, we are able to define an operation such as >: NxNibool. Other operations may also be added such as EX(z,i,j) operation which exchanges the contents of the nodes i and j of the array configuration z. - In some cases where two data types tl and t2 are combined to get t1xt2, we may be interested in adding extra constraints to the resulting data type. We may be interested in modifying the b-expression, i.e. enrichment for change of behaviour; or it may be necessary to add extra axioms in order to define some desired equivalence relation on the set of data structure configurations, Z. As an example consider two terms of a set-of-char data type: t1 and t2. If t1 and t2 contain the same elements then they are equivalent no matter what is the order in which the

158 elements are inserted. The following rule of reduction may be introduced for the set-of-char: let seset-ofchar and a,bechar, then l(l(s,i,a),j,b)= (l(s,i,b),j,a) - Specification of parallel operations. Possible extension of the model such that each node at a given level may contain (behaviourally) different structures. In the presence of such (extended) Xstructures, we can also define a "join" TMO to combine data types.

159 APPENDIX A Examples A few examples of data type specification is presented in this appendix. More specifications can be obtained either by direct specification or by the use of TMO's. We also demonstrate specification of an algorithm, namely Binary Search Tree Insertion, using our model. array p: i<>i+l b (i): i>1 a(i): i>1 stack p: i<>i+l b (i): vj(j<i) b (i): vj(j5i) a(i): Vj(j<i) binary tree p: (i<>2i) U (i<>2i+1) b (i): i>1 ba(i): i>l1 a(i): i21

160 int There are two i-functions i0 and i. The b-expressions are: for zeint b (z)=true if z=4 =fa7se otherwise b (z)=true if z~4 =fa7se otherwise b (z)=true if z~% =fa7se otherwise. i is of course the Peano's suc function, where d is the pred function. nat The i-functions are the same as int with the following added restriction. The b-expressions are: b (z)=true if z4= =false otherwise bl(z)=true if z$o. =false otherwise b (z)=true if z~4 or zoio(%) =false otherwise bool There are two i-functions i1 and i2 and their inverse. For both i-functions ij, 1<j<2, b (z)=true if z=4

161 =false otherwise. b (z)=true if z~4 =false otherwise We may also enrich bool to include functions other than the insertion and deletion functions. For example the AND function is described as follows: Let b1, b2ebool AND(bl, b2)= if b1=i1(4) or b2=i1(4) then i1(4) else i2(+). string There are 26 constructors, each having a b-expression as follows: let zestring b (z)= true if z4 1~<j~26 = false otherwise. b (z)=true if z=4 =false otherwise b (z)= true if z~4 = false otherwise. balanced tree A balanced tree data type is one that contains binary tree elements such that every node of the latter is balanced. Where balancing of a node j means the difference

162 between the left and right subtree of j is less than or equal to 1. The specification is an extension of the binary tree specification.37 p: i<>2i U i<>2i+1 b] i>1 U Vj(((2lja+i/ki)Ak =2r +eN ) +).BF(j) 1 ) U A be: il1 U Vj(( (>j>_i/kO)Ak=2ArcEN ) IBF(j) Il ) a: i>1 where BF(j) is a function which returns the Balancing Factor of j defined as follows: Define the height of a node j as h(j)=max(h(2j),h(2j+1 ) )+1 if 1k p(k,j) U j=1 -1 otherwise. then, BF(j)=Z(h(2j))-Z(h(2j+1)) where Z(i)=i if i>0 =0 otherwise. traversable stack An example of a "traversable stack"(T stack) was cited. in [MAJ 77A], and it was demonstrated that such a simple and plausible data type is not axiomatizable using the algebraic approach. In here, we shall show how simple it is to arrive at the desired specification using our approach. It is required that the T.stack can perform the following operations: pushL- same as the conventional stack push. 3' 7x+ means the Floor value of x.

163 popL- same as the conventional stack pop. readL(z,i)- probe a node indexed i. downL(z,i)- returns the node index neighbouring to i in the direction of the bottom of the stack. returnL(z,i)- returns the top node of the stack. Such a behaviour may be obtained using the following base predicate set: p: i<>i+l b (i): Vj(j<i) b6(i): wj(ji) a(i): i~1 Note that the above behaviour is obtained by a simple adjustment to an already existing stack specification. This facility of extensibility, in our model, is most useful in such cases where an almost similar behaviour to an existing data type is needed. Now we may introduce some auxiliary operations in order to implement the desired operations: downL, returnL etc. The operation downL may be defined as: downL(z, i)=i-1 Since the range of downL is the set of node indices, it is therefore a probe function. For downL to be a legal probe function, for every i, i-le{j: a(j)}. For example, let z be a stack of length 1; then downL(z,1)=1-1=04{j: a(j)}; therefore downL(z,1)=error since a(O) is not satisfied.

164 Finally, the operation returnL(z,i) may be readily defined. This is the same as the p-function of stack introduced earlier, i.e. p(z,i)=Xi. vj(i>j)+i, p(z,i+1) As we mentioned earlier, to specify the above T.stack with the algebraic approach, an infinite number of equations may be needed. This is not of course acceptable. To get around this an extra operation is needed, "hidden" to the user, such that the T.stack may be axiomatized in a finite number of steps. A pointer, viz. shove, which is always pointing at the bottom of the stack, is used[TWW 79]. This would enable one to detect when and if the bottom of the stack is reached. It should be noted that the introduction of shove presents us with some information which is not really needed for a minimal specification, i.e. an implementation bias. Furthermore, the specification given in [TWW 79] does not include error instances, i.e. the so called error equations. To do so it would be a.formidable. task hardly worth the effort.3' 3,8 A potential use of T.stack would be as a syntax analyzer.

165 APPENDIX B Dynamic Change of Behavior Different "views" and "authorization" of data are crucial issues in data base design and maintenance[CHA 75]. The potential of b-expressions to define different views of the same data for different users is apparent. This is accomplished by having different b-expressions and aexpressions for each user. Furthermore, a more global change of behaviour may be observed if the p-expressions are modified, by the Data Base Adminstrator(say), in order to allocate distinct views to each user of the data base. This may be done by adding more, or less, constraints to the b-, p- and a-expressions already specified. Consider an array of some data items, by having different b- and aexpressions for different users distinct behaviours would be exhibited by the data items. Thus for the given array, each user would be concerned only with his own "view", i.e. his own b- and a-expressions. Another, more subtle, way of maintaining different views is to keep the a-, b- and pexpressions constant and change the numbering associated with the nodes. In the presence of a clever algorithm very efficient changes of behaviour may be observed at the cost of permuting the node numberings.

166 Maintaining different views of data for different users, concurrently, is most efficient since it saves duplication of data items into behaviourally different structures. Change of b- and a-expressions only causes behavioural reshuffling while maintaining the same underlying structure. A structural as well as behavioural reorganization may result by either a "renumbering" of the nodes of the structure or changing the p-expressions associated with the structure. Consider, for example, the problem of maintaining an ordered list of items. This problem can be handled by constantly updating the renumbering of, the nodes dynamically as the nodes are inserted. So that the nodes would be numbered in ascending (or descending) order depending on their content. Another view of the same ordered list may be held by numbering the nodes, not according to their alphabetic ordering but by some other "key" value. Hence depending on the type and frequency of the queries different numbering of the nodes may be allocated, or varying p-expressions may be adopted to imply variable relationships between the items.

167 APPEND I X C X-Notation-Basic Idea Suppose E(x) is some expression involving x such that whenever deD is substituted for x- and we shall denote the result of such a substitution by E(d)- the resulting expression (viz. E(d) ) denotes a member of D'. For example if both D and D' are set of natural numbers then E(x) could be x+1 (so E(5)=5+1=6) or perhaps xxx (then E(5)=5x5=25). For such expressions the notation Xx. E(x) denotes the function f: D + D' such that: for all dcD then f(d)=E(d) For example: i) Xx. x+l denotes the successor function. ii) Xx. xxx denotes the squaring function. iii) Xx. (x=O)-true, false denotes test-for-zero function. An expression of the form Xx. e(x) is called a Xexpression, x is its bound variable and e(x) its body. This is the central idea of X-notation. For more elaboration on this topic the reader is referred to [CHU 56,

168 KAT 48, GOR 79]. Note that the body of a X-expression always extends as far to the right as possible; thus Xx.x+l is Xx.(x+l) not (Xx.x)+1.

169 APPENDIX D More on Insertion and Deletion The characteristic set of DSO's requires inverse operations for every i- and d- functions such that terms can be reduced to their minimal form. Consider a term z=i1(d1(i1(i1(d1(i1(, 1), 1),1),2),2),2); then the question is whether we can reduce z to z=i1(i1(,1),2). Indeed this can be done by the reduction rule given in chapter 3. If the node indices match, clearly the reduction can be performed no matter what sequence of i- and d-functions exists provided that the p- and b-expressions are satisfied. There are cases however where the node to be deleted has a non-empty successor set. For example consider an array configuration, z, of length 5. The node indexed 3(say) may be deleted by the appropriate d-function d as follows: d(i(i(i(i(i(%,1),2),3),4),5),3) As a result we end up with a "gap" caused by the removal of a node indexed 3. What happens to nodes 4 and 5? They cannot exist without the presence of their predecessor node 3. In fact the resulting configuration is an element of the UFT. There are a number of alternatives as to what we should do, with the successors of a deleted node, such that the resulting term is not an error term. One alternative is

170 to make this decision at the level of the primitive dfunction. This is discussed below, giving the user the choice to either define an alternative host for the successor set of the deleted node or delete the successor nodes all together. Same philosophy is pursued when an existing node is displaced by a new insertion. A more general approach, however, is to allow the user to specify the "destiny" of the successor set by means of some "auxiliary equations." The auxiliary equations would enable us to define functions in terms of the existing ones. In the above example one can use Such a tool to define the function d' such that d' not only deletes node 3 but also specifies whether node 4 and 5 should remain in the structure, and if so what index values should they take. For example after deletion of node 3, node 4 could change to 3, and node 5 may be changed to 4. Alternatively both 4 and 5 may be deleted, i.e. their index is changed to 0. Let us assume that we are interested in the former approach of. shifting the node indices down by 1. Then d' should be composed of the following sequence of primitive operations; assume z is the original array configuration of length 5 described above: z' =i(i(d(d(d(z,5),4), 3),3),4) It can be seen that by our reduction rule the above term reduces to: z'-=i(i(i(i(4, 1),2),3),4)

171 Note that in this example, since the array nodes contain empty structures, the resulting configuration z' would be the same no matter which one of the 5 nodes is deleted provided that the successor nodes are "shifted down" by one as assumed above. Constructors with Hidden Parameters The i- and d-functions defined in here are intended as an alternative to those defined in chapter 3. They may be employed in cases where the definitions 3.2 and 3.3 are either not acceptable or not convenient to use. An example of this situation would be the FIFO queue data type, where deletion always requires "displacement" of the nodes already in the queue. Note that in this case the deletion function may just as well be defined in terms of the definitions 3.2 and 3.3. Definition 3.2' Let z=(S,u), z'=(S',U)EZL, where ZL is a data structure and let bl be the b-expression associated with level 1, 1l~<L. Also let n denote an isomorphic mapping of two Estructures. An insert ion function (i-function) is a partial mapping: i: ZL x N4 + ZL such that, 1(z, 1, k, m, n)=z'=(S', u') where 1'1<L, and the following conditions hold at level 1:3' This equation denotes: insert a node indexed m at level 1 of the configuration z in node k of the (l+l)th

172 1) bl(S ) l km 2) If 1<L then l,+1,q satisfies a1+1(Sl+l,q,k) for some q. 3) Si = {Skm}U {Sikj E(Sl'Slkj) A jtsucset(m) }U tn(Slki): (Slkiesucset(Slk i=m) b (i) - such that (Siknn(Slkm)')sui and n(Slkm)Sucset(Slkn) and (sucset(n(Slkm)'),pl)- (sucset(Slkm),pl)' 4) The m-index of the 1th level'nodes contained in nodes (k+l), (k+2),...of the (l+1)th level will change only if the maximum value of the m-indices in node k of (1+1)th level is modified. The renumbering follows from the numbering rules of Z-structures described indefinition 2.2. Also the k-index of the (1-1)th level nodes contained in m and the sucset(m) nodes in k, k+l, k+2,...(if any), would change correspondingly according to definition 2.2. The indexing of the nodes S where X=1+1 or X>l and Kzk-1, would remain the same such that for every S ES which satisfy the latter condition, there exists a node SK' ES'. Definition 3.3' A deletion-function (d-function) is a partial mapping: (following the same notation as in definition 3.2) level. The displaced sucset(m), if any, becomes the sucset(n) such that the node previously labeled S would become r(S k )', for some m. The resulting configuration is z'; and th; lnserted node is labeled S' Note that the primed sets and symbols refer to the nemconfiguration.

173 6: ZL x N4 + ZL such that 6(z,l,k,m,n)=z'=(S', u'), and the following conditions are satisfied: 40 1) b (Slkm) 2) If l<L then 01+1,q satisfies al+l(Sl+qlqk) for some q. 3) Si = {Sikj: jisucset(m)Ajom}u {n(Slki)': Slkiesucset(Slkm)Ab6(i)} - such that the latter set is a subset of sucset(Sikn) and structurally isomorphic to sucset(Slkm). If n=0, then the elements of sucset(m) will become empty nodes, hence they would be deleted. 4) The m-indices, of the 1 th level nodes, contained in (k+l), (k+2),.... nodes of the (1+1)st level, if any would change if the maximum node index in the kth node of (1+1)st changes, according to the definition 2.2. Also The k-index of the 1-1 level nodes and the levels below will change correspondingly. The nodes SXK, with X, c and ] the same as that of definition 3.2, remain unchanged. It should be noted that the extra parameter n in the above functions are really "hidden" parameters. Therefore to a user such functions would behave the same way as those'4 The deletion equation denotes: delete the node at level 1 of configuration z,indexed m, in node k of the (l+l)st level. The displaced sucset(m),if any, will become the sucset(n). The resulting configuration is z'.

174 of definitions 3.2 and 3.3. For a queue, say, the value of n in the deletion function of 3.3' is always equal to 1

175 B I BL I OGRAPHY

176 BI BLI OGRAPHY [ARB-75] Arbib, M. H.; E.G. Manes,: Arrows, Structures and Functors, Academic Press inc. 1975. [AFS-80] Afshar-Khajevand, S.: Towards a Behavioral Study of Data Structures,Ph.D. Thesis, CICE, Univ. of Michigan, Ann Arbor MI. 1980. [BRO-80] Brodie,M.L.: Data Abstraction for designing Database-intensive Applications,ACM,SIGPLAN Notices, 16, 1, January 1981. [BUR-77] Burstall, R.M.; J.A. Goguen,: Putting Theories together to Make Specification.,5th intl. conf. on AI, pp. 1045-1058, 1977. [CHA-75] Chamberlin, D.D. et al,: Views, Authorization, and Locking in a Relational Data Base System, AFIP Natl. Comp. Conf, vol 44 pp425, 1975. [CHU-36] Church, A.: Some Properties of Conversion, Princeton University, Princeton NJ.; 1935. [CHU-56] Church, A.: Introduction to Mathematical Logic,vol 1, QA-9.c55, 1956. [DAH-66] Dahl O.J. Nygaard, K.: Simula-An Algol-based Simulation Language, CACM 9, Sept. 1966, pp.671-678. [DIM-69] D'Imperio, M,: Data Structures and Their Representations in Storage.,Annual Review in Automatic Programming, S, 1969. [EAR-73] Earley, J.: Relational Level Data Structures for Programming Languages,Acta Informatica 2, pp293-309, 1973. [END-72] Enderton, H. B.: A Mathematical Introduction to Logic, Academic Press inc. 1972.

177 [GHM-76] Guttag, J.V. et. al.: Abstract Data Types and Software Validation,Information Sc. Inst. Rep. RR-76-48; Marina del Rey, Cal. 1976. [GOR-79] Gordon, M.J.C.: The Denotational Description of Programming Languages-An Introduction, Springer-Verlag, New York, 1979. [GOG-75] Goguen, J.A. et al,: Abstract Data Types as Initial Algebras and Correctness of Data Representation,Proc. Conf. Comp. Graphics, Pattern Recognition and Data Structure, 1975. [GOG-78] Goguen, J.A. et al,: An Initial Algebra Approach to the Specification, Correctness, and Implementation of Abstract Data Types,Current Trends in Prog. Methodology IV, ed. R. Yeh Prentice-Hall 1978. [GUT-75] Guttag, J.V.: The Specification and Application to Programming of Abstract Data Types, Ph.D Thesis, Computer Systems Research Group Tech. Rep. CSRG-59, Dept. of Com. SC. U. of Toronto, 1975. [GUT-77] Guttag, J.V.: Abstract Data Types and the Development of Data Structures,CACM, 20, 6, June 1977.. [GUT-78] Guttag, J.V., J.J. Horning: The Algebraic Specification of Abstract Data Types, Acta Informatica 10,27-52; 1978. [HAN-69] Hansen, W.J.: Compact List Representations: Definition, Garbage Collection, and System Implementation,CACM, 12, 9, Sept. 1969. [KAP-80] Kapur, D.: Towards A Theory for Abstract Data Types, Ph.D. Thesis MIT, Cambridge, Mass. 1980. [KAR-75] Karp, R.M. et al,: Near Optimal Solutions to a Two Dimensional Placement Problem. SIAM J. Comp. 4, 271-286, 1975. [KAT-48] Kattsof, L. O.: A Philosophy of Mathematics, The Iowa State College Press, Ames, Iowa. 1948. [LIS-74] Liskov, B. Zilles S.: Programming with Abstract Data Types,SIGPLAN Notices 9, 4, Apr. 1974. [LIZ-77] Liskov, B. Zilles, S.: An Introduction to Formal Specifications of Data Abstractions, Current Trends in Prog. Methodology, IV ed. Yeh, Prentice-Hall 1978.

178 [MAJ-77A] Majster, M.E.: Limits of the Algebraic Specification of Abstract Data Types,ACM, SIGPLAN Notices, 12,10,Oct. 1972. [MAJ-77B] Majster, M.E.: Extended Directed Graphs, a Formalism for Structured Data and Data Structures,Acta Informatica 8, pp. 37-59, 1977. [MAN-74.] Manna, Z.:. Mathematical Theory of Computation,McGraw Hill, 1974. [MOI-80] Moitra, A.: A Note on Algebraic Specification of Binary Trees,ACM SIGPLAN Notices, 15, 6, June 1980. [NOU-79] Nourani, F.: Abstract Implementations and Their Correctness, SEL, CICE Program ECE DepartmentU. of M. Ann Arbor-Submitted for Publication 1979. [RAN-72] Randall, L.S.: A Relational Model of Data for the Determination of Optimum Computer Storage Structures,Rome Air Dev. Ctr. Tech. Rep. RADCTR-72-25, Feb. 1972. [ROS-70] Rosen, B.: Tree Manipulation Systems and Church-Rosser Theorems,Harvard Univ. Cambridge, Mass. 1970. [ROS-71] Rosenberg, A.L.: Data Graphs and Addressing Schemes.,JCSS5, pp. 193-238 1971 [ROS-78] Rosenberg, A.L.: Storage Mappings for Extendible Arrays,Current Trends in Programming Methodology IV, ed. R. Yeh, Prentice-Hall 1978. [ROS-81] Rosenberg, A.L.: On Uniformly Inserting One Data Structure into Another, CACM, 24, 2, Feb. 1981. [SET-74J Sethi, R.: Testing for the Church-Rosser Property,J. of ACM, vol 21, 4, 1974. [SHN-74] Shneiderman, B.: Structured Data Structures, CACM, 17, 10, Oct. 1974. [STO-771 Stoy J. E.: Denotational Semantics: The ScottStrachey Approach to Programming Language Theory, MIT press, 1977. [TAR-66] Tarski, A.: Equational Logic and Equational Theories of Algebra,Contributions to

179 Math. Logic, Proc. Logic Colloquium, Hanover 1966. [TWW-79] Thatcher, J.W. et. al.: Data Type Specification: Parameterization and the Power of Specification Techniques, Math. Sc. Dept. T.J. Watson, RC.RC7757 Yorktown Heights, NY; 1979. [WON-75] Wong, C.K. Maddocks, T.W.: A Generalized Pascal's Triangle,Fibonacci Quarterly, 13 pp. 134-136, 1975. [ZIL-74] Zilles, S.N.: Algebraic Specification of Data Types.,Project MAC Prog. Rep. 11, MIT Cambridge, Mass. 1974. [ZIL-75] Zilles, S.N.: An Introduction to Data Algebras,IBM Research Laboratory, San Jose, Calif. 1975.

UNIVERSITY OF MICHIGAN 3 9015 03026 9677