AN EXTENSION OF A FIRST-ORDER LANGUAGE AND ITS APPLICATIONS

by Dong-Guk Shin

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Computer Science and Engineering) in The University of Michigan, 1985

Doctoral Committee:
Professor Keki B. Irani, Chairman
Professor Andreas R. Blass
Professor Yuri Gurevich
Professor Arch W. Naylor
Associate Professor Toby J. Teorey

For my mother and father, my sisters and brother

ACKNOWLEDGMENTS The author owes lasting debts to many people in completing this endeavor. My deepest gratitude goes to Prof. Keki Irani for his untiring encouragement, faithful guidance and occasional role as devil's advocate; to Prof. Andreas Blass for invaluable discussions and unforgettable aid; to Professors Toby Teorey, Yuri Gurevich and Arch Naylor for their time and acute critiques; and to Professors Norman Scott and William Root for their warm support and encouragement at the early stage of my graduate study. My thanks are extended to my colleagues and friends, Chin-Wan Chung, Mourad Oulid-Aissa, Genesio Hubscher, Carol Luckhardt, Yi-fong Shih, Suk In Yoo, and Pamela Walters, who made this endeavor more than bearable. My thanks also go to Elizabeth Olsen and Virginia Folsom who typed a large section of Part I; to Prof. Thomas Sawyer, Betty Cummings, and my friend Gretchen Antelman who helped me make the manuscript more readable. My gratitude is also extended to the people who made available the financial resources for this endeavor which include fellowships of the CICE graduate program and grants from AFOSR under contract F49620-82-C-0089. Finally, my thanks go to my parents, Byung-Sik Shin and Chun-Soo Kim, and my sisters and brother. Without their endless support, I could neither have started nor have completed this work. This work is dedicated to them. D. G. S.

TABLE OF CONTENTS

DEDICATION  ii
ACKNOWLEDGMENTS  iii
LIST OF FIGURES  vii
LIST OF APPENDICES  viii

CHAPTER

I. INTRODUCTION  1
    1.1. Motivation  1
    1.2. Objectives  3

PART I  5

II. PARTITIONING A RELATIONAL DATABASE HORIZONTALLY USING A KNOWLEDGE-BASED APPROACH
    2.1. Introduction  6
    2.2. Related Literature  12
    2.3. Organization  19

III. MANY-SORTED LANGUAGE WITH AGGREGATE VARIABLES  21
    3.1. Syntax of L  21
    3.2. Interpretation of L  24
    3.3. E-Extensibility of L  27

IV. PROBLEM FORMULATION  37
    4.1. Modeling a Database and a Knowledge Base  37
    4.2. KBDDBS Design  44
    4.3. Knowledge-Based Approach of the KBDDBS Design  51

V. QUERY REPRESENTATION IN L  57
    5.1. Scheduled User Queries  57
    5.2. E-Normal Form as a Query Representation Formalism  60

VI. KNOWLEDGE REPRESENTATION IN L  66
    6.1. Axiomatic Knowledge Identification  66
    6.2. E-Horn Knowledge Base  71

VII. INFERENCE PROCEDURE  79
    7.1. Inference Procedure  79
    7.2. Correctness of the Inference Procedure  98
    7.3. Horizontal Partitioning  105
    7.4. Conclusions and Future Work  108

PART II  110

VIII. MANY-SORTED RESOLUTION BASED ON AN EXTENSION OF A ONE-SORTED LANGUAGE  111
    8.1. Introduction  111
    8.2. Related Literature  114
    8.3. Organization  117

IX. ONE-SORTED LANGUAGE WITH AGGREGATE VARIABLES LQ  119
    9.1. Syntax of L  119
    9.2. Interpretation of L  121
    9.3. E-Extensibility of L  122

X. PROBLEM FORMULATION  130
    10.1. Representation of a Many-Sorted Theory in L  130
    10.2. Finitely Many Most General Unifiers  136

XI. UWR-RESOLUTION  139
    11.1. Unification over the Weakest Range  139
    11.2. Herbrand Theorem for LC Clauses  142
    11.3. Completeness of UWR-Resolution  149

XII. EFFICIENCY OF UWR-RESOLUTION  162
    12.1. A Hypothetic Many-Sorted Resolution  162
    12.2. UWR-Resolution vs Hypothetic Many-Sorted Resolution  169
    12.3. Conclusions and Future Work  192

XIII. CONCLUSIONS  194

APPENDICES  197
BIBLIOGRAPHY  223

LIST OF FIGURES

Figure
2.1. Framework of the KBDDBS Design  11
3.1. Summary of the E-extensibility of L  36
4.1. Modeling of a Database and a Knowledge Base  39
4.2. A Computer Network of an Auto Corporation  42
4.3. Horizontal Partitioning System of the KBDDBS Design  56
6.1. Derivation of a VDA axiom  75
7.1. The URCs Revealed to the Relation DEALERS  89
7.2. Bipartitions of the Relations DEALERS and SALES  106

LIST OF APPENDICES

Appendix
A. A Relational Database Example  198
B. An Intermediate L-Version of the Herbrand Theorem  200
C. Refutations by R( * ) and R )  203
D. Alternative Approaches of R( ' )  208
E. Translation of a Formula in LI into L  215

CHAPTER I

INTRODUCTION

1.1. Motivation

When deductions are made in certain axiomatic systems involving more than one category of objects (e.g., points, lines and planes), two approaches are available: (i) a many-sorted logic in which there are distinct kinds of variables for the different categories of objects, and (ii) a one-sorted logic in which there is only one kind of variable for all categories of objects, but in which there are special predicates to effect the range restriction of the variables to the respective categories of objects. These two approaches are equivalent in the sense that any deduction made by one approach can also be made by the other approach†. In spite of their equivalence, many-sorted logic offers various advantages over one-sorted logic. For example, many-sorted logic allows the utilization of sortal information to enhance the deduction efficiency, and the language for many-sorted logic allows more compact expression than does the language for a one-sorted logic. These advantageous features were originally observed by Herbrand who

† Their equivalence is formally shown by the Herbrand-Schmidt theorem [Herb30, Schm38]. Let T_n (n = 2, ..., ω) be a many-sorted system, and let T_1^(n) be its corresponding one-sorted system. In [Wang52] Wang mentions "In [5] Herbrand states a theorem which amounts to the following (see [5], p. 64): (I) A statement of any system T_n is provable in T_n if and only if its translation in the corresponding system T_1^(n) is provable in T_1^(n). However, the proof he gives there is inadequate, failing to take into account that there are certain reasonings which can be carried out in L_1^(n) but not in L_n. In [1], Arnold Schmidt points this out and devotes his paper to giving a careful proof of the theorem."

first proposed many-sorted logic in his thesis [Herb30]. Following him, various versions of many-sorted logic were proposed and investigated by Schmidt [Schm38, Schm51], Wang [Wang52], Hailperin [Hail57], and Idelson [Idel64]. Recently, the advantages of many-sorted logic have been explored in various areas of computer science including the fields of database design and automatic theorem proving. In the database design area, many-sorted logic is used as a means of formalizing the database [McMi77, GaMi78, Reit81], and in the automatic theorem proving area, many-sorted logic is used to increase deductive efficiency [Weyh77, Cham78, Cohn83, Walt83, Walt84a, Walt84b].

Although many-sorted logic appeals to various applications of computer science because of the advantages it offers, its usage is often restricted to a certain extent. Consider the following situation: a system involving more than one category of objects is axiomatized based on many-sorted logic, and the categories of objects determine the sort structure of the axiomatized system. After the sort structure is determined and when deduction is made in the system, it turns out that a new sort is needed that does not exist in the previously determined sort structure. At this point, the sort structure determined beforehand can be changed to accommodate the new sort, but in some situations it may not be desirable to do so for various reasons. When the sort structure determined a priori is not to be changed, a variable ranging over a new sort cannot be introduced in the currently known many-sorted logic.

In this thesis an extended predicate calculus is proposed in which the problem described above is avoided. The extended predicate calculus is obtained by embedding a new kind of syntactic object called an aggregate variable in the first-order language. In this extended predicate calculus, variables whose interpretations are restricted by arbitrary ranges can be introduced as freely as needed during deduction without changing the sort structure determined a priori. Informally speaking, the aggregate variables are syntactically ordinary sort variables, but semantically they are variables whose ranges are restricted by unary relations instead of sorts. Therefore, whenever aggregate variables are introduced, the sort structure does not need to be changed; the system only needs to be augmented by new unary relations that will be the respective ranges of interpretation of the aggregate variables. This property of the extended predicate calculus is called E-extensibility.

When aggregate variables are introduced as part of the first-order language, they can be embedded in a one-sorted language as well as in a many-sorted language. In the former case, the resulting language is called a one-sorted language with aggregate variables, denoted by L', and in the latter case, the resulting language is called a many-sorted language with aggregate variables, denoted by LE.

1.2. Objectives

The objectives of the thesis are twofold: (i) to provide the theoretic foundation for the extended predicate calculi, and (ii) to demonstrate their practical usage in real applications. The first part is concerned with the syntax of each of the two languages LE and L', their interpretations and their E-extensibilities. For the second part, two applications have been chosen that demonstrate the practical usage of LE and L', respectively. One of these applications is related to the distributed database design

area and the other, to the automatic theorem proving area. In the first application, LE is used as a tool to describe the user queries to the database and the knowledge about the database. Here it is demonstrated that LE allows more compact expression than an ordinary many-sorted language, which in turn allows the development of a methodology to partition relations horizontally in the context of distributed database design. In the second application, L' is used as a tool to describe a many-sorted theory. In this case, it is shown that L' allows the introduction of variables whose ranges are restricted to new sorts in the middle of refuting the many-sorted theory, which implies a more efficient many-sorted resolution scheme than the currently existing one.

The rest of the thesis naturally divides into two parts: one for the application of LE and the other for the application of L'. Part I deals with the application of LE in the distributed database design area and Part II, the application of L' in the automatic theorem proving area. Part I consists of Chapters II through VII and Part II consists of Chapters VIII through XII. Conclusions of the thesis are given in Chapter XIII.
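The contrast drawn in Section 1.1 between a many-sorted variable, a one-sorted variable restricted by a special predicate, and an aggregate variable restricted by a unary relation can be illustrated on a small finite model. The following Python sketch is illustrative only: the universe, the relation ON, the unary relation name "endpoint", and all function names are assumptions made for this example and do not come from the thesis, and the sketch does not reproduce the formal syntax or interpretation of the languages defined later.

```python
from typing import Callable, Dict, Set, Tuple

# Hypothetical finite universe: two points and one line.
SORTS: Dict[str, Set[str]] = {"point": {"p1", "p2"}, "line": {"l1"}}
UNIVERSE: Set[str] = set().union(*SORTS.values())

# Hypothetical incidence relation given as (point, line) pairs.
ON: Set[Tuple[str, str]] = {("p1", "l1"), ("p2", "l1")}


def lies_on_l1(x: str) -> bool:
    """Body of the quantified statement: x lies on the line l1."""
    return (x, "l1") in ON


def is_point(x: str) -> bool:
    """Special predicate used for range restriction in the one-sorted reading."""
    return x in SORTS["point"]


def forall_sorted(sort: str, body: Callable[[str], bool]) -> bool:
    """Many-sorted reading: the variable ranges only over the carrier of its sort."""
    return all(body(x) for x in SORTS[sort])


def forall_relativized(guard: Callable[[str], bool],
                       body: Callable[[str], bool]) -> bool:
    """One-sorted reading: the variable ranges over the whole universe and
    the guard predicate restricts it, i.e. guard(x) implies body(x)."""
    return all((not guard(x)) or body(x) for x in UNIVERSE)


# Unary relations added during deduction; SORTS itself is never modified.
UNARY_RELATIONS: Dict[str, Set[str]] = {}


def forall_aggregate(relation: str, body: Callable[[str], bool]) -> bool:
    """Aggregate-variable reading (assumed here): the range is a unary relation."""
    return all(body(x) for x in UNARY_RELATIONS[relation])


# A new range is introduced without touching the sort structure.
UNARY_RELATIONS["endpoint"] = {"p1"}
```

On this model the sorted and the relativized readings of "every point lies on l1" agree, and the range "endpoint" is added by extending only the table of unary relations, which is the informal idea behind E-extensibility.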

PART I

In this part, a knowledge-based approach is proposed for estimating the user reference clusters of a database, which can then be used in partitioning a relational database horizontally during distributed database design. Using the knowledge about the data, the user queries are converted to equivalent queries by a proposed inference procedure. The user reference clusters estimated from these revised queries are more precise than those that can be estimated from the original user queries. A many-sorted language with aggregate variables (LE) is used for the representation of the user queries and the knowledge base. The types of knowledge to be used are discussed. An example illustrates the way inference is carried out, and the correctness of the inference is also discussed.
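The idea summarized above, that knowledge about the data can sharpen the clusters estimated from a query, can be sketched on toy data. The relation names SALES and DEALERS and the item numbers B47, V01, and V03 follow the auto-corporation example of Section 2.1; everything else in the following Python sketch (the tuples, the "kind" attribute, the addresses, and the function names) is hypothetical. The sketch applies a single hard-coded fact and is not the inference procedure developed in Chapter VII.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]

QUERY_ITEMS = {"B47", "V01", "V03"}  # item numbers named in the query
CAR_ITEMS = {"B47", "V01", "V03"}    # assumed: these items are car items

# Hypothetical tuples; only the relation names come from the text.
SALES: List[Row] = [
    {"item": "B47", "dealer": "d1"},
    {"item": "V01", "dealer": "d2"},
    {"item": "T10", "dealer": "d3"},
    {"item": "V03", "dealer": "d1"},
]
DEALERS: List[Row] = [
    {"dealer": "d1", "kind": "car", "address": "12 Elm St"},
    {"dealer": "d2", "kind": "car", "address": "9 Oak Ave"},
    {"dealer": "d3", "kind": "truck", "address": "4 Pine Rd"},
    {"dealer": "d4", "kind": "car", "address": "7 Lake Dr"},
]


def partition(relation: List[Row],
              pred: Callable[[Row], bool]) -> Tuple[List[Row], List[Row]]:
    """Split a relation horizontally into the cluster and its complement."""
    inside = [r for r in relation if pred(r)]
    outside = [r for r in relation if not pred(r)]
    return inside, outside


def answer(sales: List[Row], dealers: List[Row]) -> List[str]:
    """Addresses of the dealers who were supplied one of the queried items."""
    buyers = {s["dealer"] for s in sales if s["item"] in QUERY_ITEMS}
    return sorted(d["address"] for d in dealers if d["dealer"] in buyers)


def satisfies_knowledge(sales: List[Row], dealers: List[Row]) -> bool:
    """The fact 'all car purchasers are car dealers' holds in the data."""
    kind = {d["dealer"]: d["kind"] for d in dealers}
    return all(kind[s["dealer"]] == "car"
               for s in sales if s["item"] in CAR_ITEMS)


# From the query alone, only SALES can be restricted.
sales_urc, sales_rest = partition(SALES, lambda s: s["item"] in QUERY_ITEMS)
# With the fact about car purchasers, DEALERS can be restricted as well.
car_dealers, other_dealers = partition(DEALERS, lambda d: d["kind"] == "car")
```

When the data satisfy the stated fact, evaluating the query over the two clusters gives the same answer as evaluating it over the complete relations, which is why the complements can be treated as irrelevant to this query.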

CHAPTER II

PARTITIONING A RELATIONAL DATABASE HORIZONTALLY USING A KNOWLEDGE-BASED APPROACH

2.1. Introduction

Since the notion of a distributed system (DS), as distinct from a centralized system, was introduced, computer scientists have focused a great deal of attention on the well-defined problems of a distributed system, such as file allocation and network design [Chu69, Whit70, Case72, Chu73, MaRi78, IrKh79]. With the advent of distributed database systems (DDBS), especially when the data model is relational, data allocation in DDBS has been interpreted in a different way from that of file allocation in a DS [RoGo77].

In the file allocation problem the main issue is how to transfer the characteristics of a distributed system into the parameters of a cost optimization model so that the optimum allocation of files can be determined from the model. This view was based on the assumption that files, or relations, are independent of each other; in other words, only one file is needed to answer each query issued at each site. That means that whenever the queried file does not reside at the query site, to answer the query, either the file is transferred to the query site or the query is sent to the file site and the answer is sent back to the query site.

In the data allocation problem, however, files, or relations, are no longer regarded as independent. Due to the logical intricacy among the relations, processing a query may involve more than one relation, which requires costly intermediate network processing if the relations queried are not all locally available. Consequently, to include the network flow caused by the intermediate processing in a DDBS design model, the logical relationships among data should be somehow reflected in the model of data allocation. For this reason, the issue of data allocation during the design of a DDBS is different from that of file allocation.

The current trend in the design of a DDBS is to partition the relations horizontally and/or vertically and to allocate the fragments of relations over a network [WoKa83, CeNW83, Ouli84]. In these studies, therefore, each local database of a DDBS consists of horizontally partitioned or vertically partitioned fragments of relations instead of complete copies of the relations.

The benefits of assigning the fragments of relations have been well understood [RoGo77, TeFr82]. Partitioned fragments offer a great deal of flexibility in distributing data so that the user reference clusters (URC's) to the database at each site - which means certain portions of the relations, or files, of the database around which queries are clustered - can be faithfully reflected in the distribution of data. Thus, with the appropriate replication of fragments, total network flow is reduced and the probability of parallelism in distributed query processing is increased, while the update cost induced by replication is confined to the replicated fragments.

Although such benefits are known to accrue from allocating partitioned fragments, not much work has been done in this area, especially in the area of partitioning relations horizontally and distributing their fragments. A major difficulty here is

that there are no known significant criteria that can be used to partition relations horizontally. An often suggested practice, for example [WoKa83], is to analyze the expressions for the user queries at each site. These expressions may reveal the URC's to the database and thus these URC's can be used as a means to partition the relations horizontally. However, there is a problem even in this approach because the information contained in the user queries is not sufficient to estimate the URC's precisely. When the URC's are not identified accurately, they may result in an inadequate partitioning. For this reason, determining the URC's as precisely as possible is a well-defined issue in the horizontal partitioning problem.

In Part I, an approach is suggested for better estimating URC's by utilizing not only the user query expressions but also certain knowledge about the data itself. The intended approach is illustrated in the following example.

Example 2.1.1

A database of a big auto corporation is used in this example. Let DIVISIONS, DEALERS, and SALES be the relations where DIVISIONS keeps the information about all the divisions of the corporation such as assembly plants, parts plants, and headquarters; DEALERS, the information about all the dealers with which the corporation has transactions; and SALES, all the sales transactions between the plants and the dealers.

Suppose there is a query originating frequently at car assembly plants that asks for information about the purchasers of car items, for instance, "What are the addresses of the dealers who were supplied item# B47, V01, or V03?" where the item#'s B47, V01, and V03 stand for some car items. Based on this query, a

DB designer may try to identify the URC's to the relations SALES and DEALERS and eventually utilize the URC's in partitioning SALES and DEALERS. However, the URC's cannot be determined precisely enough solely from the query. That is, although it can be determined that at car assembly plants the references to the relation SALES are clustered on the fraction consisting of certain transactions of car items, say SALES[B47, V01, V03], no cluster of references can be assessed on DEALERS because no restrictions have been imposed on the dealers in the query.

Suppose there is a fact about this database expressed in English as, "All car purchasers should be car dealers," which implies a relationship between some tuples of SALES and some tuples of DEALERS, or simply between a fraction of SALES, namely, CAR_SALES, and a fraction of DEALERS, namely, CAR_DEALERS. It can then be postulated that such knowledge can be utilized for estimating better URC's than the previous ones, which were obtained solely from the query. That is, by knowing that only car dealers purchase car items, it can be concluded that only the fraction CAR_DEALERS would be queried at the car assembly plants.

The preceding example shows that a DB designer can utilize some knowledge about the data in an effort to identify the URC's as precisely as possible. In Part I, it is intended to formalize the DB designer's role by constructing what is called a knowledge-based system (KBS). The function of the KBS will then be to determine the URC's from the user-provided query expressions by applying the knowledge about the database.

Once the URC's are identified, determining horizontal partitions of relations from these estimated URC's can be done straightforwardly. That is, each relation

can be partitioned in terms of the URC's identified for that relation. For instance, in the preceding example, DEALERS can be partitioned into CAR_DEALERS and DEALERS - CAR_DEALERS, and SALES, into SALES[B47, V01, V03] and SALES - SALES[B47, V01, V03]. This is a legitimate way to partition relations in the sense that as far as processing the query of the example is concerned, the other fractions of the relations are irrelevant. Once the partitioning is completed, the fragments can be treated as separate objects for an optimal allocation which would assure the benefits of a horizontally partitioned distributed database design.

The overall DDBS design scheme can be viewed as a conjunction of two separate subcomponents, namely, a horizontal partitioning system and a mathematical programming model. The former is a front-end system based on a knowledge-based approach that produces the unit objects to be dispersed, i.e., the horizontally partitioned fragments of relations, and the latter, a linear or nonlinear programming model that determines the optimal distribution of the unit objects. Because the knowledge-based approach is employed to determine the unit objects of distribution, this DDBS design scheme is called a knowledge-based distributed database design (KBDDBS design). The schematic diagram for the KBDDBS design is shown in Figure 2.1.

As a quantitative cost optimization model, the second subcomponent must be furnished with two key input parameters. They are:
(1) The unit objects to be dispersed over a network.
(2) The frequency with which each unit object is queried at each site.
Then, given a set of queries, the total network flow for each allocation configuration can be estimated with some distributed query processing algorithm as discussed in

[Bern81, Chun83] and, therefore, the optimal configuration of the horizontally partitioned fragments can be determined. As far as developing a mathematical programming model is concerned, however, there has been much work in the context of file allocation [MaRi78, MoLe77, FiHo80, Ouli84, DoFo82]; therefore, some adaptation of any of these studies would suffice. For this reason, the mathematical programming model part will not be taken into consideration in this work.

Employing a knowledge-based approach that constitutes the first subcomponent is, therefore, the major concern in this work.

[Figure 2.1. Framework of the KBDDBS Design. Diagram labels: Q, F, DB, and KB; Horizontal Partitioning System (contribution of this work); UO; <Q, F>; System Parameters; Mathematical Programming Model; Optimum Allocation.]

The goal of employing the knowledge-based system as the front end of a DDBS design is, as stated previously, to exploit the knowledge of the data for partitioning relations horizontally to best suit distribution over a network. In realizing such a goal, there are three issues to be addressed:
(1) How the user queries and the knowledge should be expressed so that the knowledge can be applied to the user queries in a deductive way.
(2) What types of knowledge should be utilized for this purpose.
(3) How the inference should be carried out.
The rest of this part deals with these three issues.

2.2. Related Literature

In this section, the current research which is related to our study is briefly reviewed. The related research is discussed in three contexts: what techniques of horizontal partitioning of relations have been developed in designing a database; how the notion of horizontal partitioning of relations has been employed in designing a DDBS; and finally, what the current techniques of AI are and how they have been used in database design.

Partitioning a relation vertically and/or horizontally is well understood [Ullm80, TeFr82, Date83]. Much has been made of the vertical partitioning of relations to achieve efficient and secure data manipulation, for example, removing redundancy and update anomalies from a database. In the context of designing a database, however, less attention has been paid to horizontal partitioning. Most recently, although their applications are limited to some extent, there have been several attempts to develop a theory of horizontal partitioning analogous to the well-conceived normalization theory, such as [Bern76, Delo78], so that more secure data manipulation can

be assured than when only vertical partitioning is applied. In [Furt81], a technique has been developed so that a relation, some of whose key attributes are determined by a non-key set of attributes, may be converted into Boyce-Codd normal form by partitioning relations horizontally prior to the conversion, which is otherwise impossible. In [DePa82], it has been shown that for some classes of relations, a larger class of functional dependencies can be revealed by starting with horizontal partitioning and, therefore, with the additionally detected functional dependencies, more powerful vertical partitioning of the relations may be accomplished.

In the context of designing a DDBS, the idea of partitioning relations horizontally as well as vertically was initiated in the early distributed database design work [RoGo77]. In [EpSW78], a query processing algorithm which exploits parallelism in a distributed environment has been discussed. In their algorithm, parallelism in query processing is sought by partitioning relations horizontally and replicating the fragments over several sites, except for the relation whose partitioning and replication promises the least storage cost efficiency. Most recently, in conjunction with maintaining a DDBS which is composed of horizontally partitioned physical fragments in a distributed environment, [MaUl83] has suggested some algorithms for inserting and deleting tuples from the fragments.

As the first significant work in designing a horizontally partitioned DDBS, a design methodology for a distributed database in which each local database is not a collection of relations but a collection of the fragments of relations has been initiated in [Wong81, WoKa83]. In their work, the semantics of the logical schema reflected in a class of queries are exploited as a means of partitioning relations and from this

partitioning, data are distributed in a specific way, which is called "locally sufficient," in order to suppress network flow by employing a high degree of parallelism in processing queries. Their method, however, has various shortcomings, especially when a real environment is not faithfully reflected in their model: first, maintaining local sufficiency involves prohibitive levels of update cost unless the database is strictly static; second, the communication cost of collecting the final results at the site where the query originated would exceed the benefits gained from parallel processing, unless the communication cost is far less than the processing cost; and third, while each site's response time may be shortened, the total system throughput may be decreased unless system job loads among the nodes of the network are evenly distributed and managed all the time. In short, though it depends on the characteristics of a database and the system parameters, parallelism in query processing over a remotely dispersed computer network may not achieve the benefits which are usually obtained in parallel processing with tightly coupled multiprocessors, mainly because of the high costs of network communication and the maintenance of local sufficiency at all times.

In contrast to the above studies, the design objective of this work is not confined to parallelism in distributed query processing. Rather, our data allocation scheme is based on the philosophy that the minimization of total network communication cost should be achieved by appropriate replication of horizontally partitioned fragments of relations instead of complete copies of relations. By doing so, the URC's at each site are faithfully reflected; therefore, the user queries may be processed as locally as possible; and, furthermore, parallelism in query processing can also be achieved because of the high degree of replication. The price paid in this

approach is the storage and update cost of the replicated fragments. However, it is expected that since the update cost of replicated data shrinks as the size of the replicated parts of relations shrinks, there is much more leeway to replicate fragments than when fragmentation is not considered.

The main issue in our approach is how to take advantage of the knowledge about the data in partitioning relations. As has been pointed out in the previous section, our approach resorts to AI techniques, i.e., drawing inferences from the knowledge about the data and the user queries. In the following, the techniques that have been developed in AI are first briefly reviewed, and then it is discussed how the AI techniques have been used in the context of designing a database.

With the assumption that all the knowledge to be used is known - aside from the problem of knowledge acquisition - the problem of AI is in general divided into two parts. One is how to represent the knowledge and the other is how to utilize the knowledge once it has been represented†.

The classical approach to representing the knowledge has been formal logic. The transformation of formal logic from a working tool of philosophers and mathematicians into a knowledge representation tool in AI was initiated by the development of automatic theorem proving techniques, such as the resolution principle [Robi65a]‡. Here formal logic is used as a knowledge representation formalism and the resolution principle is used as an inference mechanism. The important features of logic are the preciseness† in expressing the

† In [McHa69], the problem in AI is differentiated into an epistemological part and a heuristic part. In their classification, the problem of knowledge acquisition is included in the epistemological part.

‡ An algorithm to find an interpretation that can falsify a given formula was invented by Herbrand in 1930. Gilmore, in 1960, first tried to implement Herbrand's procedure on a computer, which turned out to be very inefficient [Gilm60]. A few months later, Davis and Putnam published an improved version of Gilmore's program which still was not efficient enough [DaPu60]. A major breakthrough was made by Robinson's resolution principle in 1965, which was much more efficient than any earlier procedure [Robi65a].

knowledge and the correctness in inferring any conclusion. Various AI systems based on logic have been suggested, including a general-purpose question-answering system QA3 [Gree69], a robot planning system STRIPS [FiHN72], and a proof checker for proofs stated in first-order logic FOL [FiWe76]. The current research in logic includes the development of more efficient inference mechanisms, such as theorem proving via general mating [Andr81], and extensions of first-order logic, such as fuzzy logic [Lee72], in which the handling of common sense and intuition is a major concern.

The major consideration of logic as a representational tool was how the knowledge identified as useful in the problem domain could be adequately and precisely represented. Departing from this view, a new interpretation of knowledge has been initiated by a group of researchers, called proceduralists, who argue that the way to use the knowledge - how to make inferences - should also be explicitly included in the knowledge to be represented. A representation scheme, called procedural representation, has been suggested, and its emphasis was on how to express the procedural knowledge - the control information for inferences - in a better way. The advantage of this representation scheme is that the inefficiency in processing knowledge represented in logic can be avoided. Starting with PLANNER [Hewi72], a number of procedural representation-based AI programming language projects have followed, including CONNIVER [SuMc72], QA4 [RuDW72], POPLER [Davi72], and QLISP [Rebo76].

Another descriptive purpose-oriented knowledge representational formalism, called semantic networks, has been initiated [Quil68, NoRu75, AnBo73]. The

† In [Haye77], a complete discussion of this issue is presented and the advantage of logic over other representation systems on these grounds is argued.

problem in a semantic network is that no simple set of unifying principles is available, owing to its diversified development. Semantic networks nevertheless became very popular in AI because their graphical representation resembles human memory association. The first program to use semantic network techniques in AI was the question-answering system SIR [Raph68], which was followed by SCHOLAR [Carb70]. Several semantic network "languages" that have the full expressive power of predicate calculus have been proposed. Examples are the network formalism of [Schu76], the partitioned semantic network formalism [Hend75], and the SNePS system [Shap79].

Production systems, developed by Newell and Simon [NeSi72], are increasing in popularity because of their modular knowledge representation facility, which describes what to do in a specific situation. The basic idea of these systems is that a knowledge base consists of rules, called productions, in the form of condition-action pairs: "If this condition occurs, then do this action." The major problem of this representation formalism is the inefficiency of program execution: the strong modularity and uniformity of the productions result in high overhead when they are used in problem solving. Despite this inefficiency, production systems have been used as the backbone of expert systems such as DENDRAL [BuFe78], MYCIN [Shor76], and PROSPECTOR [DuHN76], because statements about what to do in predetermined situations are naturally encoded as production rules.

Most recently, a knowledge representation scheme called frames, which facilitates "expectation-driven processing," has been proposed by Minsky [Mins75]. The important feature of frames is that procedural knowledge can be easily incorporated into the representation: procedures can be attached to slots to incorporate the reasoning or problem-solving behavior of the system. Current AI systems based on frames include KRL [BoWi77], NUDGE [GoRo77], and KLONE [Brac78]. Many researchers in AI, however, have different ideas about what a frame is, and there are many fundamental differences in approach among the researchers who have designed frame-based systems.

As the two research areas, DB and AI, grow, researchers in both areas have begun to recognize the common ground they share and to exchange ideas and techniques [SAMP81]. The following briefly reviews which AI techniques have been employed in the context of database system design. Historically, query-answering (Q-A) systems [Chan76, Mink78, Reit78b] have long relied on the automatic theorem proving (ATP) technique, in which the query to be answered is submitted as a conjectured theorem to be proved. With the advent of database management systems (DBMS), several works, such as LADDER [Hend78], PLANES [Walt78], and RENDEZVOUS [Codd78], applied AI techniques to natural language interfaces as a part of a DBMS. Most recently, [HaZd80] and [King81] suggested semantic query optimization, which departs from the conventional approach of [WoYo76, Yao79]. What is noticeable in their work is that knowledge about the data and information about the file structure are explicitly used to transform the original query into an equivalent new one that promises far more efficient processing.

One important aspect to be considered is the role that the knowledge-based system plays in each of these systems. In a Q-A system, the whole system itself is viewed as a knowledge-based system, which means all the

facts of a conceived real world are represented in some knowledge representation language to form a database, and the resolution principle is employed as its inference mechanism. In natural language query systems, however, the knowledge-based system is just a front-end mediator that supports the translation of a natural language query into a formal query. In semantic query optimization, by comparison, the knowledge-based system is regarded as an expert system that guides the transformation of the original query into a new one whose processing cost is far less than that of the original. As distinct from any of these systems, the knowledge-based approach in our study is to assist the design of a DDBS.

2.3. Organization

The rest of Part I is organized as follows. In Chapter III, the theoretical foundation for embedding aggregate variables in an ordinary many-sorted language is established. There a many-sorted language with aggregate variables ($L_\Sigma$) is introduced in three contexts: the syntax of $L_\Sigma$, the interpretation of $L_\Sigma$, and the Σ-extensibility of $L_\Sigma$. In Chapter IV, by using the extended calculus, a database and a knowledge base associated with that database are modeled as a logical structure and a theory, respectively. It is then shown how the knowledge-based approach is employed by constructing a knowledge-based system (KBS). The issues in constructing the KBS are discussed in detail. In Chapter V, the notion of scheduled user queries is introduced as a design resource. Then $L_\Sigma$ is used as a tool to represent the scheduled user queries, i.e., a

specific form of $L_\Sigma$ called Σ-normal form is suggested as the representation formalism for the scheduled user queries. In Chapter VI, the types of knowledge to be used in the knowledge-based approach are identified in terms of five axiom schemas in $L_\Sigma$. Instances of these schemas constitute the knowledge base, denoted KB, of the KBS. Also, the notion of a Σ-Horn knowledge base, denoted $KB_\Sigma$, is introduced as a specific class of knowledge in KB for later use. Finally, in Chapter VII an inference procedure is suggested as a tool for applying the elements of $KB_\Sigma$ to the scheduled user queries in Σ-normal form in order to derive the URC's. The soundness of the inference is discussed. How the relations can be partitioned based on the estimated URC's is also discussed briefly. Conclusions and directions for future work are also given. Appendix A shows a fraction of the relational database example that is used as the master example throughout this part.

CHAPTER III

MANY-SORTED LANGUAGE WITH AGGREGATE VARIABLES $L_\Sigma$

3.1. Syntax of $L_\Sigma$

In this chapter, a language of an extended predicate calculus, called a many-sorted language with aggregate variables ($L_\Sigma$), is introduced. $L_\Sigma$ is a formal language obtained by embedding a new syntactic object, called an aggregate variable, in an ordinary many-sorted language ($L_m$). In this section the syntax of $L_\Sigma$ is introduced first.

In $L_\Sigma$, two types of variables, called simple variables and aggregate variables, are available. The simple variables of $L_\Sigma$ are the same as the sort variables of $L_m$. The aggregate variables are syntactically ordinary sort variables, but semantically they are variables whose ranges are restricted to unary relations instead of sort domains. Let $L_\Sigma$ have a sort index set $I$. Formally stated, an aggregate variable of sort $i \in I$ is of the form $x_i^{\in Q}$, where $Q$ is a unary predicate symbol of sort $i$ in $L_\Sigma$. Semantically, $x_i^{\in Q}$ ranges over the unary relation indicated by $Q$, which is a subset of the domain of sort $i$. $L_\Sigma$ with a sort index set $I$ is formally defined in the following:

Definition 3.1.1 A many-sorted language with aggregate variables $L_\Sigma$ consists of the following:
(1) $|I|$ infinite disjoint sets $V^1, \ldots, V^{|I|}$, where the elements of $V^i$, $1 \le i \le |I|$, are called simple variables of sort $i$;
(2) $|I|$ infinite disjoint sets $V_\Sigma^1, \ldots, V_\Sigma^{|I|}$, where the elements of $V_\Sigma^i$, $1 \le i \le |I|$, are called aggregate variables of sort $i$;
(3) $|I|$ disjoint sets $C^1, \ldots, C^{|I|}$, where the elements of $C^i$, $1 \le i \le |I|$, are called constant symbols of sort $i$;
(4) for each n-tuple $\langle i_1, \ldots, i_n \rangle$, $\{i_1, \ldots, i_n\} \subseteq I$, a set $R^{\langle i_1, \ldots, i_n \rangle}$ whose elements are called predicate symbols of sort $\langle i_1, \ldots, i_n \rangle$;
(5) for each (n+1)-tuple $\langle i_1, \ldots, i_n, i_{n+1} \rangle$, $\{i_1, \ldots, i_{n+1}\} \subseteq I$, a set $F^{\langle i_1, \ldots, i_n, i_{n+1} \rangle}$ whose elements are called function symbols of sort $\langle i_1, \ldots, i_n, i_{n+1} \rangle$;
(6) the logical connectives $\neg$ and $\rightarrow$; and
(7) the universal quantifier $\forall$.

When it is convenient, $L_\Sigma$ will be represented as a quintuple, $L_\Sigma = \langle P, R, F, C, \rho \rangle$, where $P$ is a set of unary predicates whose members are used exclusively in the superscripts of aggregate variables, $R$ is a predicate symbol set, $F$ is a function symbol set, $C$ is a constant symbol set, and $\rho$ is the arity function $\rho: R \cup F \rightarrow N^+$, where $N^+$ is the set of positive integers. In the tuple representation of $L_\Sigma$, $P$ and $R$ need not be disjoint: if a unary predicate symbol in $R$, say $Q$, is used in the superscript of an aggregate variable, then $Q$ also belongs to $P$.

The syntax rules of $L_\Sigma$ are given in the following. First, the set of terms of sort $i$ is inductively defined as follows: (i) any simple or aggregate variable of sort i or

constant symbol of sort $i$ is a term of sort $i$, and (ii) if $f$ is a function symbol of sort $\langle i_1, \ldots, i_n, i_{n+1} \rangle$ and $t_1, \ldots, t_n$ are terms of sort $i_1, \ldots, i_n$, respectively, then $f(t_1, \ldots, t_n)$ is a term of sort $i_{n+1}$. An atomic formula of $L_\Sigma$ is defined to be a sequence of the form $A(t_1, \ldots, t_n)$, where $A$ is an n-place predicate symbol of sort $\langle i_1, \ldots, i_n \rangle$ and $t_j$, $1 \le j \le n$, is a term of sort $i_j$. Let the set of atomic formulas of $L_\Sigma$ be denoted by $Atom(L_\Sigma)$. The set of well-formed formulas of $L_\Sigma$, $Form(L_\Sigma)$, is then defined recursively as: (i) if $\alpha \in Atom(L_\Sigma)$, then $\alpha \in Form(L_\Sigma)$; (ii) if $\alpha, \beta \in Form(L_\Sigma)$, then so are $\neg\alpha$, $(\alpha \rightarrow \beta)$, and $\forall v\, \alpha$, where $v$ is either a simple variable or an aggregate variable; and (iii) nothing else, except the expressions obtained by finitely many applications of (i) and (ii), is in $Form(L_\Sigma)$. The definable syntactic objects $\vee$, $\wedge$, $\leftrightarrow$, and $\exists$, and the standard notions such as sentences, are also introduced in the usual way.

The definition of a well-formed formula above guarantees the unique readability of a given formula $\alpha$. If $\alpha$ involves many parentheses, it is sometimes possible to omit certain parentheses without introducing any ambiguity. By adopting a standard convention of precedence between logical connectives, some parentheses will often be left out at our convenience. The logical connectives fall into three groups: $\neg$; $\wedge$ and $\vee$; and $\rightarrow$ and $\leftrightarrow$. Each group is considered more binding than the one succeeding it. For example, according to the convention, $((\neg\alpha \wedge \beta) \rightarrow (\gamma \vee \delta))$ can be written unambiguously as $\neg\alpha \wedge \beta \rightarrow \gamma \vee \delta$. A few more notions concerning well-formed formulas, such as the existential quantifier $\exists$ (defined as $\neg\forall\neg$), the scope of a quantifier, bound variables, free variables, and the sentences of $L_\Sigma$, are adopted without stating their definitions explicitly.
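The inductive definition of terms can be illustrated with a small data structure. The following is only a sketch under stated assumptions: the class names, the function symbol `maker_of`, and the sort names `item` and `division` are hypothetical and do not come from the text. It shows that an aggregate variable is syntactically an ordinary sorted variable that additionally carries its unary predicate Q, and that the sort of a compound term follows clause (ii).

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class SimpleVar:
    name: str
    sort: str  # ranges over the whole domain of this sort


@dataclass(frozen=True)
class AggVar:
    name: str
    sort: str
    pred: str  # unary predicate Q; the variable ranges over Q only


@dataclass(frozen=True)
class Const:
    name: str
    sort: str


@dataclass(frozen=True)
class Func:
    symbol: str
    args: Tuple["Term", ...]


Term = Union[SimpleVar, AggVar, Const, Func]

# Hypothetical signature: function symbol -> (argument sorts, result sort).
FUNC_SORTS: Dict[str, Tuple[Tuple[str, ...], str]] = {
    "maker_of": (("item",), "division"),
}


def sort_of(term: Term) -> str:
    """Return the sort of a term, following clauses (i) and (ii)."""
    if isinstance(term, (SimpleVar, AggVar, Const)):
        return term.sort  # clause (i): variables and constants
    arg_sorts, result_sort = FUNC_SORTS[term.symbol]
    actual = tuple(sort_of(a) for a in term.args)
    if actual != arg_sorts:
        raise TypeError(f"{term.symbol} expects {arg_sorts}, got {actual}")
    return result_sort  # clause (ii): f(t_1, ..., t_n)


# An aggregate variable of sort "item" restricted to a hypothetical Q = Vehicle.
x_q = AggVar("x", "item", "Vehicle")
print(sort_of(Func("maker_of", (x_q,))))
```

The aggregate variable is accepted wherever a simple variable of the same sort would be, which mirrors the remark that the two kinds of variable differ only semantically.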

3.2. Interpretation of $L_\Sigma$

A structure is needed to interpret each formula in $L_\Sigma$. A many-sorted structure for $L_\Sigma$, denoted by $MS_\delta$, consists of:
(1) $|I|$ nonempty sets of objects $D_1, \ldots, D_{|I|}$, where $D_i$, $1 \le i \le |I|$, is called the domain of sort $i$ of $MS_\delta$;
(2) for each constant symbol $c \in C^i$, $1 \le i \le |I|$, an element $c^{MS_\delta} \in D_i$;
(3) for each predicate symbol $R \in R^{\langle i_1, \ldots, i_n \rangle}$, $\{i_1, \ldots, i_n\} \subseteq I$, a relation $R^{MS_\delta} \subseteq D_{i_1} \times \cdots \times D_{i_n}$;
(4) for each function symbol $f \in F^{\langle i_1, \ldots, i_n, i_{n+1} \rangle}$, $\{i_1, \ldots, i_n, i_{n+1}\} \subseteq I$, a function $f^{MS_\delta}: D_{i_1} \times \cdots \times D_{i_n} \rightarrow D_{i_{n+1}}$.

When it is convenient, $MS_\delta$ is denoted by a quintuple $MS_\delta = \langle \{D_i\}_{i \in I}, P, R, F, C \rangle$, where $\{D_i\}_{i \in I}$ is a sort domain set, $P$ is a unary relation set whose members exclusively designate the ranges of aggregate variables, $R$ is a relation set, $F$ is a function set, and $C$ is a constant set. The interpretation of a formula in the structure $MS_\delta$ requires a variable assignment function $s$ as follows:

Definition 3.2.1 For the set $V$ of variables of $L_\Sigma$ and the sort domain set $\{D_i\}_{i \in I}$ of the structure $MS_\delta$, $s$ is an assignment function, $s: V \rightarrow \bigcup_{i \in I} D_i$, such that for a simple variable $x_i$ of sort $i$, $s(x_i) = a$, where $a \in D_i$; and for an aggregate variable $x_i^{\in Q}$ of sort $i$, $s(x_i^{\in Q}) = a$, where if $Q^{MS_\delta}$ ($Q^{MS_\delta} \subseteq D_i$) is the unary relation intended by $Q$ in $MS_\delta$, then $a \in Q^{MS_\delta}$.

† To distinguish the elements of $MS_\delta$ from those of $L_\Sigma$, a superscript is usually used, as in $R^{MS_\delta}$ or $F^{MS_\delta}$. However, the superscript $MS_\delta$ is omitted if the distinction remains clear from the context.

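The assignment function for the terms of $L_\Sigma$ is defined as usual. For notational convenience, the symbol $s$ is also used for the assignment for terms. The validity of each formula is determined by the following interpretation rules.

Definition 3.2.2 For $A(t_1, \ldots, t_n) \in Atom(L_\Sigma)$, where $A$ is an n-place predicate symbol of sort $\langle i_1, \ldots, i_n \rangle$ and the $t_j$'s are terms, and for $\psi, \psi_1, \psi_2 \in Form(L_\Sigma)$, the satisfaction of the formulas with respect to $s$ in $MS_\delta$ is defined by:
(1) $\models_{MS_\delta} A(t_1, \ldots, t_n)[s]$ iff $\langle s(t_1), \ldots, s(t_n) \rangle \in A^{MS_\delta}$;
(2) $\models_{MS_\delta} \neg\psi[s]$ iff $\not\models_{MS_\delta} \psi[s]$;
(3) $\models_{MS_\delta} (\psi_1 \rightarrow \psi_2)[s]$ iff if $\models_{MS_\delta} \psi_1[s]$ then $\models_{MS_\delta} \psi_2[s]$;
(4) for a simple variable $x_i$ of sort $i$, $\models_{MS_\delta} \forall x_i\, \psi[s]$ iff for any $a \in D_i$, $\models_{MS_\delta} \psi[s(x_i \mid a)]$;
(5) for an aggregate variable $x_i^{\in Q}$ of sort $i$, $\models_{MS_\delta} \forall x_i^{\in Q}\, \psi[s]$ iff for any $a \in Q^{MS_\delta}$, $\models_{MS_\delta} \psi[s(x_i^{\in Q} \mid a)]$,
where for variables $v_m$ and $v_k$,
$$s(v_m \mid a)(v_k) = \begin{cases} s(v_k) & \text{if } v_m \ne v_k, \\ a & \text{if } v_m = v_k. \end{cases}$$

As a corollary to the definition, the interpretations of $\vee$, $\wedge$, $\leftrightarrow$, and $\exists$ can also be easily defined. In the following it is only shown how the existential quantifier $\exists$ is interpreted.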
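The five satisfaction clauses can be traced on a finite structure. The evaluator below is a minimal sketch, not the author's procedure: formulas are encoded as nested tuples, terms are restricted to variables, and the structure (the relation `Sold`, the unary relation `Vehicle`, and the element `P10`) is hypothetical illustration data. The only difference between clauses (4) and (5) is the set over which the quantified variable ranges.

```python
def holds(formula, ms, s):
    """Satisfaction of `formula` in the finite structure `ms` under assignment `s`."""
    tag = formula[0]
    if tag == "atom":                      # clause (1)
        _, pred, variables = formula
        return tuple(s[v] for v in variables) in ms["relations"][pred]
    if tag == "not":                       # clause (2)
        return not holds(formula[1], ms, s)
    if tag == "imp":                       # clause (3)
        return (not holds(formula[1], ms, s)) or holds(formula[2], ms, s)
    if tag == "forall":
        _, var, body = formula
        if var[0] == "simple":             # clause (4): range is the sort domain
            _, name, sort = var
            rng = ms["domains"][sort]
        else:                              # clause (5): range is the unary relation Q
            _, name, sort, q = var
            rng = ms["unary"][q]
        # {**s, name: a} plays the role of the modified assignment s(v | a).
        return all(holds(body, ms, {**s, name: a}) for a in rng)
    raise ValueError(f"unknown formula tag: {tag}")


# Hypothetical structure with one sort "item".
ms = {
    "domains": {"item": {"B47", "V02", "P10"}},
    "unary": {"Vehicle": {"B47", "V02"}},
    "relations": {"Sold": {("B47",), ("V02",)}},
}

sold_x = ("atom", "Sold", ("x",))
all_vehicles_sold = ("forall", ("agg", "x", "item", "Vehicle"), sold_x)
all_items_sold = ("forall", ("simple", "x", "item"), sold_x)

print(holds(all_vehicles_sold, ms, {}))  # aggregate variable: only Vehicle is inspected
print(holds(all_items_sold, ms, {}))     # simple variable: the whole item domain is inspected
```

Because the existential quantifier is defined as ¬∀¬, a formula such as `("not", ("forall", var, ("not", body)))` evaluates it without any additional clause.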

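Lemma 3.2.1 For an existentially quantified formula $\exists v_i\, \psi$,
(1) if $v_i$ is a simple variable $x_i$ of sort $i$, then $\models_{MS_\delta} \exists x_i\, \psi[s]$ iff for some $a \in D_i$, $\models_{MS_\delta} \psi[s(x_i \mid a)]$;
(2) if $v_i$ is an aggregate variable $x_i^{\in Q}$ of sort $i$, then $\models_{MS_\delta} \exists x_i^{\in Q}\, \psi[s]$ iff for some $a \in Q^{MS_\delta}$, $\models_{MS_\delta} \psi[s(x_i^{\in Q} \mid a)]$.

Proof. First, it is remarked that $\exists v_i\, \psi$ is by definition $\neg\forall v_i\, \neg\psi$.
(1) If $v_i$ is a simple variable $x_i$ of sort $i$, then
$\models_{MS_\delta} \exists x_i\, \psi[s]$
iff it is not the case that $\models_{MS_\delta} \forall x_i\, \neg\psi[s]$
iff it is not the case that for all $a \in D_i$, $\models_{MS_\delta} \neg\psi[s(x_i \mid a)]$
iff for some $a \in D_i$, $\models_{MS_\delta} \psi[s(x_i \mid a)]$.
(2) If $v_i$ is an aggregate variable $x_i^{\in Q}$ of sort $i$, then
$\models_{MS_\delta} \exists x_i^{\in Q}\, \psi[s]$
iff it is not the case that $\models_{MS_\delta} \forall x_i^{\in Q}\, \neg\psi[s]$
iff it is not the case that for all $a \in Q^{MS_\delta}$, $\models_{MS_\delta} \neg\psi[s(x_i^{\in Q} \mid a)]$
iff it is not the case that for all $a \in Q^{MS_\delta}$, $\not\models_{MS_\delta} \psi[s(x_i^{\in Q} \mid a)]$
iff for some $a \in Q^{MS_\delta}$, $\models_{MS_\delta} \psi[s(x_i^{\in Q} \mid a)]$.
Q.E.D.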

3.3. Σ-Extensibility of $L_\Sigma$

As should be clear by now, the difference between $L_\Sigma$ and $L_m$ is that $L_\Sigma$ additionally features a new syntactic object, called an aggregate variable. Introducing an aggregate variable in $L_\Sigma$ is often different from introducing an ordinary sort variable in $L_m$, because an aggregate variable's range is restricted to a unary relation instead of a sort domain. Let a unary predicate $Q$, for instance, not be in the alphabet of $L_\Sigma$. In such a case, the aggregate variables accompanying $Q$, for example $v^{\in Q}$, may not be used in any formula of $L_\Sigma$. However, the fact that an aggregate variable's range is determined by the accompanying unary predicate implies that the aggregate variables accompanying $Q$, such as $v^{\in Q}$, can be introduced if $L_\Sigma$ is extended with the unary predicate $Q$. In this section, the process of introducing aggregate variables whose accompanying unary predicates are not in $L_\Sigma$ is formally described. The correctness of this process is also shown.

Let $T_\Sigma$ be a theory in $L_\Sigma$ and let $\sigma_\Sigma \in T_\Sigma$ be of the following form:

$\sigma_\Sigma = \forall x_i\, (\alpha(x_i) \rightarrow \_\_\_)$. (3.1)

Assuming that $\alpha(x)$ is a complex formula with $x$ being a free variable, let $Q$ be a defined symbol such that $Q(x) \equiv \alpha(x)$, so that $\sigma_\Sigma$ of (3.1) can be equivalently expressed in a more compact form, say $\sigma_\Sigma''$, as follows:

$\sigma_\Sigma'' = \forall x_i\, (Q(x_i) \rightarrow \_\_\_)$. (3.2)

The preceding way of abbreviating the formula $\sigma_\Sigma$ is not satisfactory in the sense that $Q$ in $\sigma_\Sigma''$ is not a predicate symbol at all. A more satisfactory way of doing this is to form what is called an extension of the theory $T_\Sigma$. The first step of

this theory extension procedure is to augment $L_\Sigma$ by a predicate symbol $Q$ and to specify the meaning of $Q$ in the form of

$\forall x\, (Q(x) \leftrightarrow \alpha(x))$, (3.3)

where $\alpha(x)$ is a formula in $L_\Sigma$ that does not contain the predicate symbol $Q$. The formula (3.3) is called the defining axiom of $Q$. The second step is then to augment $T_\Sigma$ by the abbreviated form $\sigma_\Sigma''$ in (3.2) as well as the defining axiom of $Q$ in (3.3). It is clear that in the extended theory $\sigma_\Sigma$ of (3.1) can be replaced by the abbreviated form $\sigma_\Sigma''$ of (3.2).

The preceding theory extension procedure is now described more specifically. Let $L_\Sigma$ be $L_\Sigma = \langle P, R, F, C, \rho \rangle$. Let $P$ of $L_\Sigma$ be augmented by a set $P_\Delta$ of new unary predicate symbols. The resulting language, denoted by $L_\Sigma'$, is formally called a Σ-extension of $L_\Sigma$. Let $\Delta$ be the set of unary predicate defining axioms of the form $\forall x\, (Q(x) \leftrightarrow \alpha(x))$, where $Q \in P_\Delta$ and $\alpha(x)$ contains only the unary predicate symbols in $P$ or $R$ of $L_\Sigma$. The theory $T_\Sigma$ in $L_\Sigma$ is augmented with $\Delta$, and in the augmented theory any formula of the form (3.1) is abbreviated to a formula of the form (3.2) by using the respective defining axioms in $\Delta$. The resulting theory, denoted by $T_\Sigma'$, is formally called a Σ-extension of $T_\Sigma$.

For the semantics of the new predicate symbols in a Σ-extended language $L_\Sigma'$ of $L_\Sigma$, the unary relations corresponding to the newly introduced predicates must be introduced. If $MS_\delta$ is a model of $T_\Sigma$, then it is not difficult to show that there is a unique expansion by definition of $MS_\delta$, say $MS_\delta'$, which is a model of $T_\Sigma'$. $MS_\delta'$ is formally called an expansion by Σ-definition of $MS_\delta$. This process of expanding the structure for $L_\Sigma$ is formalized by the following lemma.

Lemma 3.3.1 For a theory $T_\Sigma$ in $L_\Sigma$, let $MS_\delta$ be a model of $T_\Sigma$. If $T_\Sigma'$ is a Σ-extension of $T_\Sigma$, then there is a unique expansion by Σ-definition $MS_\delta'$ of $MS_\delta$ which is a model of $T_\Sigma'$.

Proof. Let $T_\Sigma'$ have been obtained from $T_\Sigma$ by adding the defining axioms $\Delta$ and abbreviating relativized expressions appropriately by using the respective defining axioms of $\Delta$. Let $L_\Sigma$ be $L_\Sigma = \langle P, R, F, C, \rho \rangle$. Without loss of generality, it can be said that for some $n \ge 0$, $\Delta$ has been constructed in the following way: (i) $\Delta^0$ is the set of all defining axioms of the form $\forall x\, (Q(x) \leftrightarrow \alpha(x))$ where $\alpha(x)$ contains only the predicates in $P$ or $R$; (ii) for any $j > 0$, $\Delta^j$ is the union of $\Delta^{j-1}$ and the set of all defining axioms of the form $\forall x\, (Q(x) \leftrightarrow \alpha(x))$ where $\alpha(x)$ contains only the unary predicates introduced in $\Delta^{j-1}$ or the predicates in $P$ or $R$ of $L_\Sigma$; (iii) then, for some $n \ge 0$, $\Delta = \Delta^n$.

It suffices to show that when $T_\Sigma'$ is obtained from $T_\Sigma$ by adding $\Delta$, a model of $T_\Sigma'$ is obtained from $MS_\delta$ by expansion by Σ-definitions and the model obtained in that way is unique. Consider the family $\{\Delta^j : j = 0, 1, \ldots, n\}$. It is noticed that the relation "is a subset of" is a partial ordering on this family. The proof is by induction on this ordering.

For $j = 0$, let $T_\Sigma^0$ be obtained from $T_\Sigma$ by adding the set $\Delta^0$ of unary predicate defining axioms to $T_\Sigma$ and modifying relativized expressions appropriately by using $\Delta^0$. Let $MS_\delta$ be $MS_\delta = \langle \{D_i\}_{i \in I}, P, R, F, C \rangle$. For each defining axiom $\psi' \in \Delta^0$, which is of the form $\psi' = \forall x\, (Q(x) \leftrightarrow \alpha(x))$, let P of $MS_\delta$ be

augmented by the unary relation defined by $\alpha(x)$ in $MS_\delta$, i.e.,

$\{a : \models_{MS_\delta} \alpha(x)[a]\} \in P$. (1)

Let the augmented structure be denoted by $MS_\delta^0$ and let the defined relation of (1) be the interpretation of the predicate symbol $Q$ in $MS_\delta^0$. It must be shown that $MS_\delta^0$ is a model of $T_\Sigma^0$ and that $MS_\delta^0$ is unique among the structures obtained in the preceding way.

It is first shown that $MS_\delta^0$ is a model of $T_\Sigma^0$. For each formula $\psi' \in \Delta^0$, from the way that $MS_\delta^0$ was constructed it trivially follows that $\models_{MS_\delta^0} \psi'$. For each formula $\phi' \in T_\Sigma^0 - \Delta^0$, $\phi' \in T_\Sigma$, since $T_\Sigma^0$ is extended from $T_\Sigma$ by $\Delta^0$. Since $MS_\delta^0$ is an expanded structure of $MS_\delta$ and $MS_\delta$ is a model of $T_\Sigma$, it holds that for each $\phi' \in T_\Sigma^0 - \Delta^0$, $\models_{MS_\delta^0} \phi'$. Hence $MS_\delta^0$ is a model of $T_\Sigma^0$.

Now the uniqueness of $MS_\delta^0$ is shown. Let $MS_\delta^*$ also be a model of $T_\Sigma^0$ that is obtained from $MS_\delta$ by expansion by Σ-definitions when $T_\Sigma^0$ is obtained from $T_\Sigma$ by adding $\Delta^0$. Since $MS_\delta^*$ is a model of $T_\Sigma^0$, it should hold that for each defining axiom $\psi' \in \Delta^0$, $\models_{MS_\delta^*} \psi'$. This implies that for each unary predicate symbol introduced in $\Delta^0$, its interpretations in $MS_\delta^0$ and $MS_\delta^*$ are identical with each other. Since both $MS_\delta^0$ and $MS_\delta^*$ are expanded from $MS_\delta$ by using $\Delta^0$, the preceding result implies that $MS_\delta^0$ and $MS_\delta^*$ are identical.

For $j > 0$, it is assumed that when $T_\Sigma^j$ is obtained from $T_\Sigma^{j-1}$ by adding $\Delta^j - \Delta^{j-1}$, there is a unique expansion by Σ-definition $MS_\delta^j$ of $MS_\delta^{j-1}$ which is a model of $T_\Sigma^j$. The induction step is the following. Let $T_\Sigma^{j+1}$ be obtained from $T_\Sigma^j$ by adding $\Delta^{j+1} - \Delta^j$. Let $MS_\delta^j$ be

$MS_\delta^j = \langle \{D_i\}_{i \in I}, P^j, R, F, C \rangle$. For each defining axiom $\psi' \in \Delta^{j+1} - \Delta^j$, say $\psi' = \forall x\, (Q(x) \leftrightarrow \alpha(x))$, let $P^j$ be augmented by the unary relation defined by $\alpha(x)$ in $MS_\delta^j$, i.e.,

$\{a : \models_{MS_\delta^j} \alpha(x)[a]\} \in P^j$. (2)

Let the augmented structure be denoted by $MS_\delta^{j+1}$, and let the defined unary relation of (2) be the interpretation of $Q$ in $MS_\delta^{j+1}$. It can be shown that $MS_\delta^{j+1}$ is a model of $T_\Sigma^{j+1}$ in a way similar to the one that showed $MS_\delta^0$ is a model of $T_\Sigma^0$ for $j = 0$. Showing the uniqueness of $MS_\delta^{j+1}$ as a model of $T_\Sigma^{j+1}$ is also similar to showing the uniqueness of $MS_\delta^0$ as a model of $T_\Sigma^0$. If $MS_\delta^*$ is also a model of $T_\Sigma^{j+1}$ that is obtained from $MS_\delta^j$ by adding $\Delta^{j+1} - \Delta^j$, then it can be shown that for each unary predicate introduced in $\Delta^{j+1} - \Delta^j$, its interpretations in $MS_\delta^{j+1}$ and $MS_\delta^*$ are identical with each other. Since both $MS_\delta^{j+1}$ and $MS_\delta^*$ are expanded from $MS_\delta^j$ only by using the defining axioms in $\Delta^{j+1} - \Delta^j$, $MS_\delta^{j+1}$ and $MS_\delta^*$ must be identical. Q.E.D.

Now it is shown how a formula such as $\sigma_\Sigma''$ in (3.2) can be further abbreviated by introducing aggregate variables. Once the language $L_\Sigma$ is extended with the unary predicate $Q$, the formula $\sigma_\Sigma''$ in (3.2) can be syntactically translated, by using the aggregate variable $x_i^{\in Q}$, into a more compact form, say $\sigma_\Sigma'$, in the extended language $L_\Sigma'$ of $L_\Sigma$ as follows:

$\sigma_\Sigma' = \forall x_i^{\in Q}\, \_\_\_\, x_i^{\in Q}\, \_\_\_$. (3.4)

Let $\sigma_\Sigma'$ replace $\sigma_\Sigma''$ in $T_\Sigma'$. Overall, $\sigma_\Sigma' \in T_\Sigma'$ is derived from $\sigma_\Sigma \in T_\Sigma$ by introducing an aggregate variable while $L_\Sigma$ is extended to $L_\Sigma'$. This characteristic of $L_\Sigma$, which allows more compact expression in its extended language, is called the Σ-extensibility of $L_\Sigma$.

In the rest of this section, the Σ-extensibility of $L_\Sigma$ is justified, i.e., it is shown that the procedure of translating $\sigma_\Sigma \in T_\Sigma$ in (3.1) into $\sigma_\Sigma' \in T_\Sigma'$ in (3.4) is correct. One way of justifying the overall translation procedure is to show the following: for each formula $\phi' \in T_\Sigma'$ that contains some aggregate variable(s), say $A$, whose accompanying unary predicates are specified in a set $\Delta$ of defining axioms, there can be derived a formula in $L_\Sigma$ which does not contain any aggregate variable in $A$ and whose meaning is identical with that of $\phi'$.

Given a formula $\phi' \in T_\Sigma'$ that contains some aggregate variable(s), its translation into $L_\Sigma$ can be done exactly in the reverse of the way in which a formula such as $\sigma_\Sigma$ in (3.1) was translated into $\sigma_\Sigma'$ in (3.4). The first step is to convert $\phi'$ into its equivalent relativized expression in $L_\Sigma'$, i.e., to translate a formula of the form (3.4) into a formula of the form (3.2). Let the relativized expression be denoted by $\phi^0$. The second step is to eliminate from $\phi^0$ the newly introduced unary predicates by applying their respective defining axioms, i.e., to translate a formula of the form (3.2) into a formula of the form (3.1) by applying a defining axiom such as (3.3). The resulting formula of the second step does not contain any aggregate variable and is entirely expressed in $L_\Sigma$. Formally stated, the resulting formula, say $\phi$, in $L_\Sigma$ is called a translation of $\phi'$ into $T_\Sigma$. The fact that $\phi$ and $\phi'$ have an identical meaning is shown by the following theorem:

Theorem 3.3.2 For $\phi' \in T_\Sigma'$, if $\phi$ is a translation of $\phi'$ into $T_\Sigma$, then $\phi'$ is true in $MS_\delta'$ iff $\phi$ is true in $MS_\delta$.

Proof. There are three kinds of formulas in $T_\Sigma'$: defining axioms, formulas containing no aggregate variables, and formulas containing aggregate variables. Only the formulas containing aggregate variables are of concern here. Let $\phi' \in T_\Sigma'$ be a formula that contains some aggregate variables. The proof is by induction on the length of $\phi'$.

First let $\phi'$ be an atomic formula of the form $R(v_1', \ldots, v_n')$, and let $\phi'$ be true in $MS_\delta'$ with an assignment function $s'$. Then the translation $\phi$ of $\phi'$ into $T_\Sigma$ is done in the following way: if a variable $v_i'$, $1 \le i \le n$, is an aggregate variable of the form $x_j^{\in Q}$, then replace $x_j^{\in Q}$ by $x_j$. Along with this translation, an assignment function $s$ for the variables in the translated formula is introduced in the following way: if $v_i'$, $1 \le i \le n$, is an aggregate variable of the form $x_j^{\in Q}$, then $s(x_j) = s'(x_j^{\in Q})$; otherwise $s(v_i) = s'(v_i')$ [the assignment function $s$ defined here is used throughout this proof]. Let the translation $\phi$ that is obtained in the preceding way be $R(v_1, \ldots, v_n)$. Since $R^{MS_\delta'} = R^{MS_\delta}$, from the way $s$ is defined it follows that

$\models_{MS_\delta'} R(v_1', \ldots, v_n')[s']$
$\Leftrightarrow \langle s'(v_1'), \ldots, s'(v_n') \rangle \in R^{MS_\delta'}$
$\Leftrightarrow \langle s(v_1), \ldots, s(v_n) \rangle \in R^{MS_\delta}$
$\Leftrightarrow \models_{MS_\delta} R(v_1, \ldots, v_n)[s]$.

† If $A$ is a formula having free variables $v_1, \ldots, v_n$, then sometimes $A(v_1, \ldots, v_n)$ is written for $A$.

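The theorem holds when $\phi'$ is atomic. Suppose that the result is true for all formulas of length less than or equal to $h$. It is shown that the inductive step holds for the formulas of length $h+1$. Let $\beta'$ and $\gamma'$ be formulas of length $h$, i.e., for their respective translations $\beta$ and $\gamma$, let it hold that for any assignment function $s'$ and its correspondingly defined assignment function $s$, $\models_{MS_\delta'} \beta'[s']$ iff $\models_{MS_\delta} \beta[s]$, and $\models_{MS_\delta'} \gamma'[s']$ iff $\models_{MS_\delta} \gamma[s]$. The inductive step is the following.

(Case I) Let $\phi'$ be $\neg\beta'$. It follows that
$\models_{MS_\delta'} \phi'[s']$
$\Leftrightarrow \models_{MS_\delta'} \neg\beta'[s']$
$\Leftrightarrow \not\models_{MS_\delta'} \beta'[s']$
$\Leftrightarrow \not\models_{MS_\delta} \beta[s]$ by the induction hypothesis
$\Leftrightarrow \models_{MS_\delta} \neg\beta[s]$.
Since $\phi$ is $\neg\beta$, the theorem holds.

(Case II) Let $\phi'$ be $\beta' \rightarrow \gamma'$. It follows that
$\models_{MS_\delta'} \phi'[s']$
$\Leftrightarrow \models_{MS_\delta'} (\beta' \rightarrow \gamma')[s']$
$\Leftrightarrow$ if $\models_{MS_\delta'} \beta'[s']$, then $\models_{MS_\delta'} \gamma'[s']$
$\Leftrightarrow$ if $\models_{MS_\delta} \beta[s]$, then $\models_{MS_\delta} \gamma[s]$ by the induction hypothesis
$\Leftrightarrow \models_{MS_\delta} (\beta \rightarrow \gamma)[s]$.
Since $\phi$ is $\beta \rightarrow \gamma$, the theorem holds.

(Case III) Let $\phi'$ be $\forall x_i\, \beta'$ where $x_i$ is a simple variable. It follows that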

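$\models_{MS_\delta'} \phi'[s']$
$\Leftrightarrow \models_{MS_\delta'} \forall x_i\, \beta'[s']$
$\Leftrightarrow$ for any $a \in D_i$, $\models_{MS_\delta'} \beta'[s'(x_i \mid a)]$
$\Leftrightarrow$ for any $a \in D_i$, $\models_{MS_\delta} \beta[s(x_i \mid a)]$ by the induction hypothesis
$\Leftrightarrow \models_{MS_\delta} \forall x_i\, \beta[s]$.
Since $\phi$ is $\forall x_i\, \beta$, the theorem holds.

(Case IV) Let $\phi'$ be $\forall x_i^{\in Q}\, \beta'$ where $\forall x\, (Q(x) \leftrightarrow \alpha(x)) \in T_\Sigma'$. It follows that
$\models_{MS_\delta'} \phi'[s']$
$\Leftrightarrow \models_{MS_\delta'} \forall x_i^{\in Q}\, \beta'[s']$
$\Leftrightarrow$ for any $a \in Q^{MS_\delta'}$, $\models_{MS_\delta'} \beta'[s'(x_i^{\in Q} \mid a)]$
$\Leftrightarrow$ for any $a \in D_i$, if $\models_{MS_\delta} \alpha(x_i)[s(x_i \mid a)]$, then $\models_{MS_\delta'} \beta'[s'(x_i^{\in Q} \mid a)]$, by $Q(x) \leftrightarrow \alpha(x)$ and $Q^{MS_\delta'} \subseteq D_i$
$\Leftrightarrow$ for any $a \in D_i$, if $\models_{MS_\delta} \alpha(x_i)[s(x_i \mid a)]$, then $\models_{MS_\delta} \beta[s(x_i \mid a)]$, by the induction hypothesis
$\Leftrightarrow$ for any $a \in D_i$, $\models_{MS_\delta} (\alpha(x_i) \rightarrow \beta)[s(x_i \mid a)]$
$\Leftrightarrow \models_{MS_\delta} \forall x_i\, (\alpha(x_i) \rightarrow \beta)[s]$.
Since $\phi$ is $\forall x_i\, (\alpha(x_i) \rightarrow \beta)$, the theorem holds. Q.E.D.

The significance of the theorem is that passing from $T_\Sigma$ to $T_\Sigma'$ does no more than express a formula of $L_\Sigma$ more compactly by using aggregate variables. In addition to showing the compact expressive power of $L_\Sigma$, the above theorem suffices to justify the validity of embedding aggregate variables in a many-sorted language.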
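The translation used in the proof can be sketched as a purely syntactic rewrite. The code below is an illustration under stated assumptions, not the author's algorithm: formulas are nested tuples, atoms refer to variables by name (so replacing an aggregate variable by its simple counterpart leaves atoms unchanged), and each defining formula α is assumed to be already expressed in the unextended language. The predicate names `Vehicle`, `Part`, and `Sold` are hypothetical.

```python
def translate(formula, defs):
    """Rewrite a formula with aggregate variables into one without them.

    `defs` maps each defined unary predicate Q to a function that, given a
    variable name x, returns the defining formula alpha(x).
    """
    tag = formula[0]
    if tag == "atom":
        return formula
    if tag == "not":                                   # Case I
        return ("not", translate(formula[1], defs))
    if tag == "imp":                                   # Case II
        return ("imp", translate(formula[1], defs), translate(formula[2], defs))
    if tag == "forall":
        _, var, body = formula
        body_t = translate(body, defs)
        if var[0] == "simple":                         # Case III
            return ("forall", var, body_t)
        _, name, sort, q = var                         # Case IV
        # Relativize to Q, then replace Q(x) by its defining formula alpha(x).
        return ("forall", ("simple", name, sort), ("imp", defs[q](name), body_t))
    raise ValueError(f"unknown formula tag: {tag}")


# Hypothetical defining axiom: Vehicle(x) <-> not Part(x).
defs = {"Vehicle": lambda x: ("not", ("atom", "Part", (x,)))}

compact = ("forall", ("agg", "x", "item", "Vehicle"), ("atom", "Sold", ("x",)))
print(translate(compact, defs))
```

The output has the shape ∀x (α(x) → β) that appears at the end of Case IV, with the aggregate variable replaced by a simple variable of the same sort.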

[Figure 3.1. Summary of the Σ-extensibility of $L_\Sigma$: the language $L_\Sigma$ is extended to $L_\Sigma'$, the theory $T_\Sigma$ to $T_\Sigma'$, and the structure $MS_\delta(L_\Sigma)$ to $MS_\delta'(L_\Sigma')$.]

In conclusion, the advantage of $L_\Sigma$ over $L_m$ is that $L_\Sigma$ offers more compact expressive power than $L_m$. While the same advantage can be obtained in $L_m$ only if the sort structure of $MS(L_m)$ is changed, for $L_\Sigma$ the associated structure $MS(L_\Sigma)$ only needs to be expanded by Σ-definitions. This characteristic has been called the Σ-extensibility of $L_\Sigma$. The Σ-extensibility of $L_\Sigma$ is summarized in Figure 3.1.
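The expansion by Σ-definition used in Lemma 3.3.1 can be sketched on a finite structure. This is a minimal sketch with assumptions made explicit: each defining formula α is supplied as a Python predicate over an element and the original structure (so only the first layer of defining axioms is covered), and the data and the predicate names `Vehicle` and `Part` are hypothetical. The point it illustrates is that the sort domains are left untouched and only the set of unary relations grows.

```python
def expand_by_definition(ms, definitions):
    """Return a copy of `ms` whose unary relation set also interprets each new Q.

    `definitions` maps Q to (sort, alpha), where alpha(a, ms) decides whether
    the element a satisfies the defining formula in the original structure.
    """
    expanded = {**ms, "unary": dict(ms["unary"])}  # domains and relations are shared
    for q, (sort, alpha) in definitions.items():
        # The interpretation of Q is the set {a in D_sort : alpha holds of a}.
        expanded["unary"][q] = {a for a in ms["domains"][sort] if alpha(a, ms)}
    return expanded


ms = {
    "domains": {"item": {"B47", "V02", "P10"}},
    "unary": {},
    "relations": {"Part": {("P10",)}},
}

# Hypothetical defining axiom: Vehicle(x) <-> not Part(x).
definitions = {"Vehicle": ("item", lambda a, m: (a,) not in m["relations"]["Part"])}

ms_expanded = expand_by_definition(ms, definitions)
print(sorted(ms_expanded["unary"]["Vehicle"]))
```

Since the interpretation of each new predicate is computed from its defining formula alone, any two expansions built this way from the same structure coincide, which is the uniqueness claim of the lemma in this finite setting.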

CHAPTER IV

PROBLEM FORMULATION

4.1. Modeling a Database and a Knowledge Base

This chapter outlines how the knowledge-based approach to KBDDBS design described in Chapter II is adopted by constructing a knowledge-based system. As a preliminary step, in this section the two notions of a database and a knowledge base associated with that database are modeled in formal terms.

The terminology of Gallaire and Minker is adopted: data are called elementary facts of a real world, and the knowledge about the data is called general facts of a real world [GaMi78]. In general, there have been two ways of formalizing the elementary facts and the general facts of a real world. One way is to view both of them as a collection of homogeneous objects, i.e., a collection of sentences in some language. In this view, the collection of sentences describing both the elementary facts and the general facts in some language is regarded as a theory, while the real world associated with both kinds of facts is regarded as a model of the theory. This view has been generally adopted in Q-A systems [Chan76, Mink78, Reit78b], in which both the elementary facts and the general facts are considered as sentences in some language so that answers to queries that go beyond the elementary facts can be derived from the general facts.

The other way is to view the two types of facts as two heterogeneous objects, i.e., the elementary facts as a logical structure and the general facts as a theory whose model is the logical structure. The second view better fits the context of database management systems (DBMS). In a DBMS, a database is a collection of structured and formatted information, that is, a collection of elementary facts, while the integrity constraints are neither structured nor formatted information and stand above the elementary facts; they are therefore regarded as general facts. Integrity constraints are to a database as a theory is to its model. That is, the validity of the integrity constraints has to be enforced within the database at all times, just as the truth of each sentence in a theory must be verified if the structure is to be a model of the theory. The elementary facts and the general facts in a DBMS may well be regarded as two different objects, one a logical structure and the other a theory.

In this work, the latter view is adopted, since data and the knowledge about that data are considered as different categories of objects: data is a collection of relations to be distributed over a network, whereas the knowledge about the data is used as a means of distributing the relations. The schematic representation of this view is shown in Figure 4.1.

A database is first formalized as a many-sorted structure. The intention is then to define a first-order language associated with the structure so that knowledge about the database and the user demands on the database can be described in this language. More specifically, the first-order language defined† on the structure is a many-sorted language with aggregate variables, $L_\Sigma$.

† Given any logical structure, a language associated with the structure can be easily defined. One procedure is simply to collect all the symbols required and establish an interpretative connection between each symbol and each relation or function of the structure.

[Figure 4.1. Modeling of a Database and a Knowledge Base. The general facts of the real world form the knowledge base KB, which is modeled as a theory; the elementary facts form the database DB, which is modeled as a structure.]

The database formalized here is a collection of relations to be distributed among the sites of a network. It is formally stated as follows:

Definition 4.1.1 A database application, denoted as DB, is an ordered structure $DB = \langle \{D_i\}_{i \in I}, \{P_i\}_{i \in I}, \{R_j\}_{j \in J}, \{C_k\}_{i \in I,\, k \in K_i} \rangle$ with the associated function $\lambda: J \rightarrow N^+$ such that
(1) $I$ is a domain index set where for each $i \in I$, $D_i$ is the set of objects of the $i$th sort, and each $D_i$ constitutes the $i$th universe of DB.
(2) $J$ is a relation index set where for each $j \in J$, $\lambda(j)$ is a positive integer and $R_j$ is a $\lambda(j)$-ary relation on $\{D_i\}_{i \in I}$, i.e., $R_j \subseteq D_{i_1} \times \cdots \times D_{i_{\lambda(j)}}$, $i_k \in I$.
(3) $K_i$ is a (possibly empty) collection of constant names where for each $k \in K_i$, a distinguished element $C_k$ is an element of $D_i$.
(4) $P_i$ is a set of unary relations where if $p \in P_i$, then $p \subseteq D_i$.

In order to illustrate the preceding definition, a hypothetical distributed database is introduced. Throughout Part I, this database is used as the master example. Consider a large auto corporation that is going to develop a distributed database system. The corporation has parts plants and assembly plants scattered over a large area, with the headquarters located some distance from the plants. Suppose the corporation is planning to install three computing sites which will be connected via a

computer network as shown in Figure 4.2: one in a parts plant (PP) complex, one in an assembly plant (AP) complex, and one in headquarters (HQ). It is assumed that each computing site handles most of the transactions associated with it. That is, PP handles the transactions of parts plants, AP the transactions of assembly plants, and HQ the transactions typical of headquarters, for instance, the transactions for all the office items consumed by the corporation. Let {DIVISIONS, DEALERS, ITEMS, SALES} be a fraction of the database to be distributed over the three sites, with the following logical schemas:

DIVISIONS(div#, div_name, head)
DEALERS(d#, address, d_type)
ITEMS(item#, i_name, i_type)
SALES(div#, d#, item#)

where div# is a division number, d# a dealer number, and item# an item number. An instance of each of these logical schemas is shown in Appendix A. The following example illustrates how this database is formalized as a many-sorted structure and how $L_\Sigma$ is defined on the structure.

Example 4.1.1 Given the preceding auto corporation database, the many-sorted structure defined on this database, say DB(Auto), is

$DB(Auto) = \langle \{item\#, d\#, \ldots\}, \emptyset, \{DIVISIONS, DEALERS, ITEMS, SALES\}, \{B47, V02, \ldots\} \rangle$,

where item#, d#, ... designate sort domains and B47, V02, ... are constant symbols which are members of the item# sort domain. It is noticed that initially the unary relation set $\{P_i\}$ is empty.
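Definition 4.1.1 and the DB(Auto) example can be mirrored by a small data structure. The sketch below is an assumption-laden illustration: the class name, the `check` method, and every data value except the item numbers B47 and V02 are hypothetical (the actual instances are in Appendix A, which is not reproduced here). It records the sort of each relation column and verifies that every tuple draws its components from the declared sort domains.

```python
from dataclasses import dataclass


@dataclass
class DatabaseApplication:
    domains: dict    # sort name -> set of objects (the D_i)
    unary: dict      # unary relation name -> (sort, set of objects) (the P_i)
    relations: dict  # relation name -> (tuple of sorts, set of tuples) (the R_j)
    constants: dict  # constant name -> (sort, element) (the C_k)

    def arity(self, name):
        """Arity of relation `name`, i.e. the number of sorted columns."""
        return len(self.relations[name][0])

    def check(self):
        """Verify that relations, unary relations and constants respect the sorts."""
        for name, (sorts, rows) in self.relations.items():
            for row in rows:
                if len(row) != len(sorts):
                    raise ValueError(f"{name}: wrong arity in {row}")
                for sort, value in zip(sorts, row):
                    if value not in self.domains[sort]:
                        raise ValueError(f"{name}: {value} not of sort {sort}")
        for name, (sort, members) in self.unary.items():
            if not members <= self.domains[sort]:
                raise ValueError(f"{name} is not a subset of {sort}")
        for name, (sort, element) in self.constants.items():
            if element not in self.domains[sort]:
                raise ValueError(f"constant {name} not of sort {sort}")
        return True


# Hypothetical instance; only B47 and V02 are taken from the text.
db_auto = DatabaseApplication(
    domains={
        "div#": {"D1"}, "div_name": {"Trucks"}, "head": {"Smith"},
        "d#": {"DL1"}, "address": {"Detroit"}, "d_type": {"retail"},
        "item#": {"B47", "V02"}, "i_name": {"bolt", "van"},
        "i_type": {"part", "vehicle"},
    },
    unary={},  # initially the unary relation set is empty
    relations={
        "DIVISIONS": (("div#", "div_name", "head"), {("D1", "Trucks", "Smith")}),
        "DEALERS": (("d#", "address", "d_type"), {("DL1", "Detroit", "retail")}),
        "ITEMS": (("item#", "i_name", "i_type"),
                  {("B47", "bolt", "part"), ("V02", "van", "vehicle")}),
        "SALES": (("div#", "d#", "item#"), {("D1", "DL1", "V02")}),
    },
    constants={"B47": ("item#", "B47"), "V02": ("item#", "V02")},
)

print(db_auto.check(), db_auto.arity("ITEMS"))
```

Keeping the column sorts next to each relation is one possible design choice; it makes the sort discipline of the many-sorted structure checkable rather than implicit.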

[Figure 4.2. A Computer Network of an Auto Corporation]

Now since there is no unary relation in the initially defined structure DB(Auto), an $L_\Sigma$ is introduced, associated with the logical structure DB(Auto), with its unary predicate set being empty. Let such an $L_\Sigma$ be denoted by $L_\emptyset(DB(Auto))$ to indicate that its unary relation set is empty. Assuming that all the symbols needed for $L_\emptyset(DB(Auto))$ are provided,

$L_\emptyset(DB(Auto)) = \langle \emptyset, \{Div, De, It, Sa\}, \{B47, V01, \ldots\}, \rho \rangle$

where Div, De, It and Sa are the predicate symbols indicating the relations DIVISIONS, DEALERS, ITEMS and SALES, respectively; B47 and V01 are constant symbols which belong to the item# sort domain; and $\rho(It) = 3$ and so on. At this moment no unary predicate symbol has yet been introduced. However, it will be noticed later that

$L_\emptyset(DB(Auto))$ gradually evolves into $L_\Sigma(DB(Auto))$ as aggregate variables are introduced.

The preceding example shows how a database is formalized as a structure and a language is defined on the structure. Now a knowledge base associated with the database is formalized using the language defined on the structure, i.e., the knowledge base is a theory whose model is the structure.

Definition 4.1.2 For a database application DB, let $L_\Sigma(DB)$ be the many-sorted language with aggregate variables defined on DB. A knowledge base associated with DB, denoted by KB(DB), is then a collection of some sentences of $L_\Sigma(DB)$ which are true in the structure DB.

The following illustrates the preceding way of formalizing the knowledge base about the data:

Example 4.1.2 Consider the language $L_\emptyset(DB(Auto))$ of Example 4.1.1. Let a general fact of the auto corporation world be expressed by a sentence, $\psi$, in $L_\emptyset(DB(Auto))$. As long as $\psi$ describes a true fact of the real world which is modeled by DB(Auto), $\psi$ is interpreted as true in DB(Auto). Therefore $\psi$ is an element of the knowledge base associated with DB(Auto), denoted by KB(DB(Auto)), i.e., $\psi \in KB(DB(Auto))$.

Note that Definition 4.1.2 suggests an alternative way of defining a knowledge base. Suppose Th(DB) means the complete theory defined on the structure DB, i.e., $Th(DB) = \{\psi : DB \models \psi\}$. Then a knowledge base associated with DB is a subset of Th(DB), i.e., $KB(DB) \subseteq Th(DB)$. In a practical environment, it is reasonable to assume that no complete theory, that is, no complete knowledge base, can be defined on DB. Only the amount of knowledge which is useful for the intended purpose can be collected, so this can be at most a proper subset of the complete theory Th(DB), if anything like Th(DB) ever exists. Later, in Chapter VI, it will be shown how the necessary knowledge which constitutes the intended knowledge base is gathered.

4.2. KBDDBS Design

In this section the KBDDBS design is formalized in terms of a function. By doing so the role of the knowledge base KB in the KBDDBS design is made explicit. First, a local database in a network is defined as a logical structure. Then two DDBS design schemes, a DDBS design scheme which does not consider horizontal partitioning and the KBDDBS design scheme which does, are stated formally in terms of two different functions. These two functions make the differences between the two DDBS design schemes visible.

In a distributed environment, a local database is a fraction of the whole database, which is a collection of relations. The fraction of the database may consist of fragments of relations if horizontal partitioning is adopted as the design strategy, or complete copies of relations if horizontal partitioning is not considered. Defining a

local database requires the notion of a quasi-substructure of a many-sorted structure to be introduced, because the notion of a substructure is too restrictive to describe a local database containing fragments of relations. The notion of a quasi-substructure is defined as follows:

Let $DB^a$ and $DB^b$ be two database applications defined according to Definition 4.1.1,

$DB^a = \langle \{D_i^a\}_{i \in I}, \{P_i^a\}_{i \in I}, \{R_j^a\}_{j \in J}, \{C_{i,k}^a\}_{i \in I, k \in K_i} \rangle$ and
$DB^b = \langle \{D_i^b\}_{i \in I}, \{P_i^b\}_{i \in I}, \{R_j^b\}_{j \in J}, \{C_{i,k}^b\}_{i \in I, k \in K_i} \rangle$.

Then $DB^b$ is a quasi-substructure of $DB^a$, denoted by $DB^b \subseteq DB^a$, if the following conditions are satisfied:

(i) $D_i^b \subseteq D_i^a$ for each $i \in I$, and
(ii) $R_j^b \subseteq R_j^a \cap (D_{i_1}^b \times \cdots \times D_{i_{X(j)}}^b)$, where $i_1, \ldots, i_{X(j)} \in I$.

The notion of a quasi-substructure is less restrictive than that of a substructure because of condition (ii). That is, for $DB^b$ to be a substructure of $DB^a$, condition (ii) would have to be

$R_j^b = R_j^a \cap (D_{i_1}^b \times \cdots \times D_{i_{X(j)}}^b)$, where $i_1, \ldots, i_{X(j)} \in I$.

Let $L$ be a site index set and $l \in L$. Then, in terms of a quasi-substructure, a local database application at a site $l$ is formally stated as follows.

Definition 4.2.1 Let DB be the database application which is to be distributed over a network. For each $l \in L$, where $L$ is an index set for sites, a local database application of site $l$, denoted as $DB_l$, is a quasi-substructure of DB such as

$DB_l = \langle \{D_{i,l}\}_{i \in I}, \{R_{j,l}\}_{j \in J}, \{C_{i,k,l}\}_{i \in I, k \in K_i} \rangle$
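The difference between conditions (i) and (ii) for a quasi-substructure and the stricter equality required of a substructure can be checked mechanically. The following is a minimal Python sketch under assumed data: the two-column DEALERS instance, the `SCHEMA` layout, and all function names are hypothetical.

```python
SCHEMA = {"DEALERS": ("d#", "d_type")}  # sorts of each relation's columns

full = {
    "D": {"d#": {"d10", "d20", "d30"}, "d_type": {"office", "auto"}},
    "R": {"DEALERS": {("d10", "office"), ("d20", "office"), ("d30", "auto")}},
}
# A local database that keeps every domain but only the office dealers.
hq = {
    "D": full["D"],
    "R": {"DEALERS": {("d10", "office"), ("d20", "office")}},
}


def restricted(relation, sorts, domains):
    """Tuples of `relation` whose components all lie in the given domains."""
    return {t for t in relation
            if all(v in domains[s] for v, s in zip(t, sorts))}


def domains_contained(sub, sup):
    """Condition (i): every sort domain of `sub` is contained in that of `sup`."""
    return all(sub["D"][s] <= sup["D"][s] for s in sub["D"])


def is_quasi_substructure(sub, sup):
    """Conditions (i) and (ii), with set inclusion in (ii)."""
    return domains_contained(sub, sup) and all(
        sub["R"][r] <= restricted(sup["R"][r], SCHEMA[r], sub["D"])
        for r in sub["R"])


def is_substructure(sub, sup):
    """Same as above, but with equality instead of inclusion in (ii)."""
    return domains_contained(sub, sup) and all(
        sub["R"][r] == restricted(sup["R"][r], SCHEMA[r], sub["D"])
        for r in sub["R"])
```

Here `hq` is a quasi-substructure of `full` but not a substructure, because it omits a tuple whose components still lie in its own domains; this is the situation a fragment of a relation creates.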

Example 4.2.1 Let $L = \{AP, PP, HQ\}$ and let the database to be dispersed be DB(Auto) of Example 4.1.1. Suppose it has been decided to store in HQ only the information about the dealers which deal with office items. Let such information be represented by a subset of the relation DEALERS, called OFFICE_DEALERS. Then the local database at HQ, $DB_{HQ}$, is

$DB_{HQ}$ = <{d#, item#, ...}, ∅, {OFFICE_DEALERS, ...}, {ink, ...}>

and $DB_{HQ}(Auto) \subseteq DB(Auto)$.

From the definition above, an allocation configuration of DB over a network can be simply formulated as a collection of quasi-substructures of DB, each of which is a local database. The only restriction on each allocation configuration of DB is that each tuple of each relation in the DB should reside in at least one site of the network. The notion of an allocation configuration of DB over a network is defined as follows:

Definition 4.2.2 Given a database application DB and a site index set $L$, an allocation configuration of DB over $L$, denoted by DDB, is a collection of quasi-substructures of DB such that if

$DDB = \{DB_l : l \in L\}$ where $DB_l = \langle \{D_{i,l}\}_{i \in I}, \{R_{j,l}\}_{j \in J}, \{C_{i,k,l}\}_{i \in I, k \in K_i} \rangle$,

then

$D_i = \bigcup_{l \in L} D_{i,l}$, $R_j = \bigcup_{l \in L} R_{j,l}$, and $\{C_{i,k}\} = \bigcup_{l \in L} \{C_{i,k,l}\}$.
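The covering condition of Definition 4.2.2 amounts to a union test over the sites. The sketch below is only an illustration: the data, the dictionary layout, and the function names are hypothetical, constants are omitted for brevity, and the quasi-substructure conditions on each local database are assumed to have been checked separately.

```python
db = {
    "D": {"d#": {"d10", "d20", "d30"}, "d_type": {"office", "auto"}},
    "R": {"DEALERS": {("d10", "office"), ("d20", "office"), ("d30", "auto")}},
}
local_dbs = {
    "HQ": {"D": {"d#": {"d10", "d20"}, "d_type": {"office"}},
           "R": {"DEALERS": {("d10", "office"), ("d20", "office")}}},
    "AP": {"D": {"d#": {"d30"}, "d_type": {"auto"}},
           "R": {"DEALERS": {("d30", "auto")}}},
}


def union_over_sites(sites, part, name):
    """Union of one domain or relation over all local databases."""
    return set().union(*(site[part].get(name, set()) for site in sites.values()))


def is_allocation_configuration(whole, sites):
    """Every domain element and every tuple must reside at some site."""
    domains_ok = all(union_over_sites(sites, "D", s) == dom
                     for s, dom in whole["D"].items())
    relations_ok = all(union_over_sites(sites, "R", r) == rel
                       for r, rel in whole["R"].items())
    return domains_ok and relations_ok
```

Removing the only copy of a tuple from every site makes the test fail, which is exactly the restriction stated before the definition.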

In the following, two extreme cases of allocation configurations are shown, i.e., an allocation scheme with complete redundancy and an allocation scheme without any redundancy.

Example 4.2.2 Let $DDB_c$ and $DDB_n$ be the allocation configurations with complete redundancy and without any redundancy, respectively. Let $DDB_c = \{DB_l^c : l \in L\}$; then $\forall l \in L$, $DB_l^c = DB$. Let $DDB_n = \{DB_l^n : l \in L\}$ where $DB_l^n = \langle \{D_{i,l}^n\}_{i \in I}, \{R_{j,l}^n\}_{j \in J}, \{C_{i,k,l}^n\}_{i \in I, k \in K_i} \rangle$. Then $\forall j \in J$, $\forall l_1, l_2 \in L$, if $l_1 \neq l_2$, then $R_{j,l_1}^n \cap R_{j,l_2}^n = \emptyset$.

It is well known that there are two main DDB design criteria, space cost and time cost, say $C_s$ and $C_t$, respectively. Consider a DB and a network index set $L$. Let $DDB_{DB,L}$ stand for the collection of all the possible legitimate allocation configurations defined by Definition 4.2.2, including the two extreme cases of allocation configurations, the allocation scheme with complete redundancy and the allocation scheme without any redundancy. Let $M$ be the index set of all the possible DDB's; then $DDB_{DB,L} = \{DDB_m : m \in M\}$. A DDBS design scheme in general is then to produce from the given DB and $L$ an allocation configuration, say $DDB_{opt} \in DDB_{DB,L}$, in such a way that the costs $C_s$ and $C_t$ associated with $DDB_{opt}$ are no greater than the costs associated with any $DDB_m \in DDB_{DB,L}$.

The cost $C_s$ associated with a DDB may be calculated by simply adding up all the storage costs required by the DDB. The cost $C_t$, however, cannot be decided from an allocation configuration alone, because calculating $C_t$ usually requires a known or estimated system load as an additional parameter. In a distributed database system, such system load is usually modeled by a collection of ordered pairs <a user query, the query issuance frequency>. Knowing how often each specific user query would be issued at each site is adequate information to judge a system's load. In order to define precisely what a system load is, QF is introduced as the following collection of ordered pairs:

$QF = \{\langle q_{ij}, f_{ij} \rangle : q_{ij}$ is the $j$th query at the site $i$ and $f_{ij}$ is the issuance frequency of $q_{ij}\}$.

As is implicit in the description of QF, QF depends on DB and $L$. Given a DB and $L$, if $QF_{DB,L}$ is the collection of all possible loads on the system of DB over $L$, then the two costs $C_s$ and $C_t$ are functions such that

$C_s : DDB_{DB,L} \to \$$, $C_t : DDB_{DB,L} \times QF_{DB,L} \to \$$.

In the following, the two DDBS design schemes, a DDBS design without any partitioning of relations and the KBDDBS design, are formalized in terms of these two cost functions.

Definition 4.2.3 Let $S_{DB}$ and $S_L$ be a set of various database applications DB's and a set of various site index sets $L$'s, respectively. Given a $DB \in S_{DB}$ and an $L \in S_L$, a DDBS design without any partitioning is a function $f^N_{DB,L} : QF_{DB,L} \to DDB_{DB,L}$ such

that for a $QF \in QF_{DB,L}$, $f^N_{DB,L}(QF) = DDB_m$, where if $DB = \langle \ldots, \{R_j\}_{j \in J}, \ldots \rangle$, $DDB_m = \{DB_l^m : l \in L\}$ and $DB_l^m = \langle \ldots, \{R_{j,l}^m\}_{j \in J}, \ldots \rangle$, then

(i) for any $j$ and $l$, $R_{j,l}^m = \emptyset$ or $R_{j,l}^m = R_j$, and
(ii) for each $DDB_{m'} \in DDB_{DB,L}$ whose local database applications do not allow any horizontally partitioned fragments of relations (i.e., if $DDB_{m'} = \{DB_l^{m'} : l \in L\}$ and $DB_l^{m'} = \langle \ldots, \{R_{j,l}^{m'}\}_{j \in J}, \ldots \rangle$, then for any $j$ and $l$, either $R_{j,l}^{m'} = \emptyset$ or $R_{j,l}^{m'} = R_j$), $C_s(DDB_m) + C_t(DDB_m, QF) \le C_s(DDB_{m'}) + C_t(DDB_{m'}, QF)$.

Definition 4.2.4 Let $S_{DB}$ and $S_L$ be a set of various database applications DB's and a set of various site index sets $L$'s, respectively. For a $DB \in S_{DB}$, suppose $KB_{DB}$ is the set of all possible instances of a knowledge base associated with the database DB, i.e., the collection of all KB(DB)'s. Given a $DB \in S_{DB}$ and an $L \in S_L$, the KBDDBS design is a two-place function $f^K_{DB,L} : QF_{DB,L} \times KB_{DB} \to DDB_{DB,L}$ such that for a $QF \in QF_{DB,L}$ and a $KB(DB) \in KB_{DB}$, $f^K_{DB,L}(QF, KB(DB)) = DDB_m$, where if $DB = \langle \ldots, \{R_j\}_{j \in J}, \ldots \rangle$, $DDB_m = \{DB_l^m : l \in L\}$, and $DB_l^m = \langle \ldots, \{R_{j,l}^m\}_{j \in J}, \ldots \rangle$, then

(i) for any $j$ and $l$, $R_{j,l}^m \subseteq R_j$, and
(ii) for each $DDB_{m'} \in DDB_{DB,L}$, $C_s(DDB_m) + C_t(DDB_m, QF) \le C_s(DDB_{m'}) + C_t(DDB_{m'}, QF)$.

It is intuitively obvious that for a very large database, in which the queries at each site are clustered locally around some fractions of the relations, the system designed by the design scheme of $f^K_{DB,L}$ would significantly outperform the system designed by the design scheme of $f^N_{DB,L}$. In other words, given DB, $L$, QF, and KB(DB), if $f^N_{DB,L}(QF) = DDB_n$ and $f^K_{DB,L}(QF, KB(DB)) = DDB_k$, then

$C_s(DDB_n) + C_t(DDB_n, QF) > C_s(DDB_k) + C_t(DDB_k, QF)$.

This is mainly because, in $f^N_{DB,L}$, although it depends on the system parameters and the degree of replication, the additional storage requirements plus the cost of keeping multiple copies of relations consistent at more than one site at all times could be prohibitively high, so that the cost paid for replication exceeds the benefits gained by it. This problem of $f^N_{DB,L}$ originates from viewing the whole body of each relation as the smallest unit of data allocation. In $f^K_{DB,L}$, however, the unit object of distribution is allowed to be the horizontally partitioned fragments of relations. That is, the storage and update cost for the unnecessarily replicated portion of relations may be avoided by appropriately distributing the horizontally partitioned fragments. The major question in the KBDDBS design $f^K_{DB,L}$ is then "How should the relations be partitioned horizontally so that at each site the necessary portion or unnecessary portion of each relation can be faithfully reflected in their allocation?"
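The comparison between the two design functions can be made concrete on a toy instance by exhaustive search. The sketch below is not the design procedure of this study: it ignores the knowledge base entirely and uses an invented cost model (a fixed storage cost per stored tuple copy and a fixed penalty per remotely read tuple) purely to show that allowing fragments enlarges the set of candidate configurations and can lower the optimal cost. All names and numbers are hypothetical.

```python
from itertools import product

SITES = ("AP", "HQ")
R = ("t1", "t2", "t3", "t4")  # tuples of a single relation
# System load QF: (site, tuples the query reads, issuance frequency)
QF = [("AP", {"t1", "t2"}, 10), ("HQ", {"t3", "t4"}, 10)]
STORE, REMOTE = 3, 1  # assumed unit costs


def cost(alloc):
    """C_s + C_t for an allocation mapping each site to its stored tuples."""
    c_s = STORE * sum(len(fragment) for fragment in alloc.values())
    c_t = sum(f * REMOTE * len(need - alloc[site]) for site, need, f in QF)
    return c_s + c_t


def whole_relation_allocations():
    """Each site holds either nothing or the complete relation."""
    for choice in product((False, True), repeat=len(SITES)):
        if any(choice):
            yield {s: set(R) if c else set() for s, c in zip(SITES, choice)}


def fragment_allocations():
    """Each tuple is placed at any non-empty subset of the sites."""
    placements = [p for p in product((False, True), repeat=len(SITES)) if any(p)]
    for combo in product(placements, repeat=len(R)):
        yield {s: {t for t, p in zip(R, combo) if p[i]}
               for i, s in enumerate(SITES)}


best_without_partitioning = min(map(cost, whole_relation_allocations()))
best_with_partitioning = min(map(cost, fragment_allocations()))
```

Because every whole-relation allocation is also a fragment allocation, the second minimum can never exceed the first; with the clustered load assumed here it is strictly smaller.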

It is not difficult to see that the URC's at each site are necessary to partition relations horizontally. As long as it is known that the user queries at a site are clustered around only a certain fraction of a relation, it would certainly be better to allocate only that fraction of the relation at the site. Therefore, the problem of "How to partition relations horizontally?" is the problem of "How to identify the URC's at each site?" One suggested approach would be to examine the user queries at each site. However, detecting the URC's precisely enough may not be feasible by looking at the queries alone, because the user queries are not formulated for that purpose. In the relational data model, users need not specify details about what they want from the database in their query expressions, since the details about the database are transparent to the users. The information contained in the user queries is therefore not sufficient to estimate the URC's precisely. Nevertheless, it is postulated that the DB designer can estimate the precise URC's to some degree by exploiting the knowledge inherited from his or her conception of the real world. After seeing a query, what the DB designer may do is search for any knowledge which may be applicable to the query and derive better URC's associated with the query. It is suggested that a knowledge-based approach can be employed in which what the DB designer does is mimicked. In the following section, the knowledge-based approach is discussed in detail.

4.3. Knowledge-Based Approach of the KBDDBS Design

In the previous section, it was argued that what matters in the KBDDBS design is to determine the URC's precisely, and that this can be done by employing a knowledge-based approach. In this section it is discussed in detail how the

knowledge-based approach is employed by constructing a knowledge-based system. The suggested approach is to build a front-end Knowledge-Based System (KBS) that receives the user-provided queries and, by using the knowledge about the data, revises them into equivalent versions which provide more precise URC's than do the user-provided queries.

In building the knowledge-based system, there are two fundamental issues to be concerned with, as in building any type of knowledge-based system: knowledge representation formalism and inference mechanism. These two aspects of a knowledge-based system vary depending on a system's domain of application. No general solution exists. However, one basic philosophy of a logical formal system may be applied in designing a knowledge-based system, although a knowledge-based system of AI differs from the formal system of logic from the practical point of view. The basic philosophy is stated in [Shoe67]:

"Clearly whether or not A is a theorem of T depends strongly on what the nonlogical axioms of T are. Hence we must expect the condition for A to be a theorem of T to refer not only to A, but also to the nonlogical axioms of T. If these nonlogical axioms are sufficiently simple, this will not be a disadvantage. For theories with complicated nonlogical axioms, it is necessary to abandon [a] general solution, and seek a solution adapted to the particular theory."

If the knowledge base of any knowledge-based system is regarded as a collection of nonlogical axioms of sufficient complexity, what the preceding statement implies is that it would be desirable to develop a specific inference mechanism for a particular knowledge-based system rather than a general inference mechanism applicable to any knowledge-based system. In this study, this philosophy is faithfully followed. When building a knowledge base for the KBDDBS design, the knowledge useful for its intended purpose is expressed in some specific types of

formulas in $L_\Sigma$ and an inference mechanism is developed which can be efficiently applied only to those formulas.

In the rest of this section, the KBS of the KBDDBS design is formalized in terms of a knowledge representation formalism and an inference mechanism. As a preliminary step, a knowledge-based system in general is formalized as follows:

Definition 4.3.1 A knowledge-based system in general (KBSG) is an ordered pair,

KBSG = <KRF, IM>,

where KRF and IM stand for a knowledge representation formalism and an inference mechanism respectively, such that from the known facts expressed in KRF some additional true fact is efficiently deducible using only the syntactical processing based on IM.

Example 4.3.1 If a propositional calculus (PC) in logic is considered as analogous to a knowledge-based system in AI, then the collection of the tautologies and the nonlogical axioms of PC is considered to be a knowledge base and modus ponens with refutation procedure an inference mechanism; therefore,

PC = <syntax rule of PC, modus ponens with refutation procedure>.

The practical systems in AI, the Q-A systems [Mink78, Reit78b] and the QUIST system [King81], can be formalized into

Q-A = <syntax of applied first order language, resolution principle>,
QUIST = <syntax of QUIST query language, inference guiding heuristics>.

It becomes clear from the preceding examples that, depending on the problem domain to which a knowledge-based system is applied, the KRF and IM of a KBSG vary. For instance, compare a Q-A system with the QUIST system. In a Q-A system a query is furnished as a conjectured theorem which ought to be proved, while in the QUIST system a query is given and a set of equivalent queries may be derived. The former, therefore, naturally lends itself to the automatic theorem proving (ATP) technique, which means the KRF and IM of a Q-A system may be mapped into a formal language syntax and the refutation technique respectively. The latter, however, is not suited to the application of ATP, because there is no conjectured theorem given a priori. For this reason, and because many equivalent queries may be deducible which are not necessarily beneficial, in QUIST specific inference guiding heuristics have been chosen as its IM and at the same time its KRF has been developed to fit its IM.

The KBS of the KBDDBS design is similar to the QUIST system in the sense that there is no conjectured theorem, but it differs from the QUIST system in that there should be only one conclusion to be deduced by its inference mechanism. Suppose the KBS is modeled by

<Syntax of $L_\Sigma$, IM'>

where IM' is some inference mechanism applicable to the formulas in $L_\Sigma$. In this case, devising the inference mechanism IM' to be applied to any formulas of $L_\Sigma$ could be extremely difficult because $L_\Sigma$ is a very general knowledge representation formalism. Fortunately, the fact that the useful knowledge for partitioning horizontally falls into a class of specific types of formulas in $L_\Sigma$ allows the development of a simple, efficient inference mechanism for the KBS. Following the philosophy of a

logical formal system quoted earlier, it is suggested that the KBS be modeled by

<Σ-Horn formula, $\vdash$>

where Σ-Horn formulas are some specific types of formulas in $L_\Sigma$ and $\vdash$ is a simple syntactic matching procedure which is applicable only to the Σ-Horn formulas.

The schematic diagram of the horizontal partitioning system of the KBDDBS design is shown in Figure 4.3. The KBS consists mainly of two parts, namely, the knowledge base, constituting some specific knowledge about the data, and the inference mechanism.

[Figure 4.3. Horizontal Partitioning System of the KBDDBS design. Diagram labels: User queries at each site; Knowledge about the data; Inference mechanism; KBS; Revised query expressions; Determination of the URC's; Horizontal partitioning.]

CHAPTER V

QUERY REPRESENTATION IN $L_\Sigma$

5.1. Scheduled User Queries

User demand on the database, or simply user queries, is one of the fundamental design resources in most DDBS design schemes. In this chapter it is shown how this fundamental design resource is expressed for the KBDDBS design by using $L_\Sigma$ as the descriptive tool. First, in this section the notion of scheduled user queries is introduced and its importance is discussed.

When user queries are used for a DDBS design purpose, what information needs to be acquired from the user queries determines how the user queries should be expressed, i.e., the intended usage of the user queries determines the query representation formalism. For example, in [Aper81], all the information needed from the queries is which relations appear in each query. Therefore, their query representation formalism contains only the information regarding what relations are needed to answer a query. Such a representation formalism is sufficient for their intended purpose, since the issue of their study is to reflect the intermediate data flow only in terms of relations. Knowing which relations would be required to answer queries is enough to determine which distribution configuration of the relations would be optimal.

However, in our study, where the issue is to introduce a methodology of partitioning relations horizontally based on the URC's, the URC's obtained only in terms of the relations are not tight enough. The URC's should be identified in terms of fractions of relations. To that end, the query representation formalism in the KBDDBS design must contain the information regarding which fractions of relations are required to answer a query.

Besides the preceding requirement, there is another condition to be satisfied by the query representation formalism in the KBDDBS design. That is, since the knowledge about the data is applied to a query, the query should be expressed in a way compatible with the knowledge to be applied. In summary, there are two issues involved in representing the user queries for the KBDDBS design:

(1) In the query expression, restrictions should be explicitly specified, as well as projections or joins, because the horizontal partitioning is mainly determined by restrictions.
(2) The queries should be expressed in a way compatible with the knowledge so that the knowledge can be applied to the queries via some syntactic inference process.

With these two issues in mind, the two notions, "user queries" and "scheduled user queries," are differentiated as follows. User queries are the instances of queries issued by the users, which may be identified by a DB designer by intensively interviewing the users before attempting the design, and scheduled user queries are the query expressions derived from the user queries in a "collective" way. What is meant by a collective way is explained in the following.

First, let the notion of "same type" of user queries be defined as follows: For a given set of user queries, if any pair of user queries differ only by the values of restriction, then the queries in the set are of the same type. A scheduled query representing the set of the user queries of the same type is then any single formal expression that has the meaning of combining all the user queries in the set. For instance, suppose there are two user queries such as

"Who has been supplied item#=B47?"
"Who has been supplied item#=V03?"

These two queries are of the same type since they differ only by the restriction values B47 and V03. A scheduled user query derived from these two user queries is any formal expression having the same meaning as

"Who has been supplied item#=B47 or V03?" (5.1)

Regarding the first issue, i.e., that restrictions should be explicitly specified in the query expression, a scheduled user query certainly contains the information regarding the subsets of relations on which restrictions are made. For example, any formal expression for (5.1) can indicate that the restrictions of the user queries are only to the transactions whose item# values are B47 or V03.

Now for the second issue, i.e., that the queries should be expressed in a way compatible with the knowledge about the data, it is suggested that $L_\Sigma$ be used as the representational tool for the scheduled user queries. By using aggregate variables in $L_\Sigma$, the collection of user queries of the same type can be compactly expressed in $L_\Sigma$. How scheduled user queries are expressed compactly in $L_\Sigma$ is the content of the following section.
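The "collective" derivation of scheduled user queries described above can be pictured as grouping user queries that differ only in their restriction value. The following Python sketch assumes a deliberately simple representation, in which a user query is a pair of a query type and a restriction value; this representation and the name `schedule` are illustrative assumptions, not the formalism of this study.

```python
from collections import defaultdict

# Each user query: (query type, restriction value). The type string stands
# for everything in the query except the restriction value.
user_queries = [
    ("Who has been supplied item#=?", "B47"),
    ("Who has been supplied item#=?", "V03"),
]


def schedule(queries):
    """Merge user queries of the same type into one scheduled query,
    represented by the type together with the set of restriction values."""
    scheduled = defaultdict(set)
    for query_type, value in queries:
        scheduled[query_type].add(value)
    return dict(scheduled)


scheduled_queries = schedule(user_queries)
```

The single entry produced here plays the role of (5.1): one expression carrying the combined restriction values B47 and V03.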

5.2. Σ-Normal Form as a Query Representation Formalism

Many languages have been suggested, and implemented in practice, as tools for representing queries. Each query language has been developed for its own purpose and these languages are compared to one another on the basis of different criteria. When the concern is only with a relational data model, the query languages are divided into two types: algebraic languages and predicate calculus languages. The calculus-based languages are further divided into two classes, namely, tuple relational calculus and domain relational calculus. The primitive objects of the former are tuples of relations and those of the latter are elements of the domains of the attributes. Here $L_\Sigma$ is used as a query language based on the domain relational calculus.

In general, a query expression means the set of tuples to be returned as the answer to the query. In either an algebraic query language or a calculus-based query language, the syntax rules for query expressions are designed so that a query expression written according to those rules is intended to mean the set of tuples returned as the answer. Suppose there are two relations $R_i$ and $R_j$, each of whose arity is two, and which are joinable via their key attributes. If a query q is the one retrieving the tuples of $R_i$ whose key attribute is the same as that of $R_j$, then in an algebraic language q would be expressed as

$R_i \ltimes R_j$.

Compared to this, in a calculus-based language q would be expressed as

$q = \{\langle x, y \rangle : R_i(x, y) \wedge R_j(x, z)\}$.

It is clear in the above example that, unlike an algebraic language, a calculus-based language allows the query expression to be built up in two layers: one is the intensional qualification clause, which is a well-formed formula of the calculus, and the other is the bracket "{ }" intended to mean the set implied by the qualification clause. The well-formed formula is called a qualification clause and the complete representation of a query, which is intended to mean the answer set of the query, is called a query expression.

When representing the queries in $L_\Sigma$, which is a domain relational calculus, a query's qualification clause must also be explicitly distinguished from its expression. The separation of the two notions is essential in this study since the knowledge is applied not to the query expression but to the qualification clause of the query. In order to separate the two notions in the context of $L_\Sigma$, the notion of DEF(·) is first introduced as follows:

Definition 5.2.1 Given a structure DB, let $L_\Sigma(DB)$ be a language associated with DB. If $\theta$ is a formula in $L_\Sigma(DB)$ with n free variables, then $DEF(DB, \theta)$ is an n-ary relation such that

$DEF(DB, \theta) = \{\langle a_1, \ldots, a_n \rangle : \models_{DB} \theta[s]\}$,

where if $V$ is the variable set of $L_\Sigma(DB)$ and $\{D_i\}$ is the set of sort domains of DB, then $s$ is a variable assignment function

$s : V \to \bigcup_{i \in I} D_i$.
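Definition 5.2.1 can be read operationally: enumerate the assignments of the free variables over their sort domains and keep those under which the formula is true. The sketch below does exactly that for a small finite structure. The function name `DEF` mirrors the notation of the definition, but its signature, the representation of a formula as a Python callable, and the sample data are assumptions made for illustration; it also assumes that each answer tuple lists the values assigned to the free variables in order.

```python
from itertools import product

# Hypothetical sort domains and SALES instance.
DOMAINS = {"div#": {"D1", "D2"}, "d#": {"d10", "d20"}}
SALES = {("D1", "d10", "B47"), ("D2", "d20", "V01")}


def DEF(domains, free_vars, theta):
    """Answer relation of `theta`.

    free_vars: list of (variable name, sort) pairs, in answer-column order.
    theta: callable taking an assignment dict and returning True or False.
    """
    names = [name for name, _sort in free_vars]
    ranges = [sorted(domains[sort]) for _name, sort in free_vars]
    return {values for values in product(*ranges)
            if theta(dict(zip(names, values)))}


def supplied_b47(s):
    """The atomic formula Sa(x, y, B47) under assignment s."""
    return (s["x"], s["y"], "B47") in SALES


answer = DEF(DOMAINS, [("x", "div#"), ("y", "d#")], supplied_b47)
```

With two free variables the result is a binary relation, as the definition requires; here it contains the single pair of division and dealer involved in a B47 sale.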

Now the two notions, a query expression and a query clause, are formally introduced as follows. Given a database application structure DB, let $\psi(v_1, \ldots, v_n) \in Form(L_\Sigma(DB))$. Then a query expression q is an expression of the form $DEF(DB, \psi(v_1, \ldots, v_n))$ and the query clause of q is the $\psi(v_1, \ldots, v_n)$ of $DEF(DB, \psi(v_1, \ldots, v_n))$.

Example 5.2.1 Suppose the following is a user query to DB(Auto) frequently issued at site AP: "What are the addresses of the dealers who were supplied item#=B47?" Then the expression of this query in $L_\Sigma$ is

$q = DEF(DB(Auto), \psi_1)$,

where the query clause $\psi_1$ of q is

$\psi_1 = \exists x\, \exists y\, \exists v\, (Sa(x, y, B47) \wedge De(y, u, v))$.

As long as the notion of DEF(·) is clear, from here on "a query" will often simply mean a query clause.

In the rest of the section it is shown that query clauses for scheduled user queries are of a certain form in $L_\Sigma$. As stated in the previous section, one issue in how the queries should be expressed is whether the user queries can be expressed in a compatible way so that the knowledge can be applied to the user queries in a deductive way. With this in mind, a class of formulas of $L_\Sigma$ is defined as follows:

Definition 5.2.2 For $\alpha \in Form(L_\Sigma)$, $\alpha$ is in Σ-normal form if $\alpha$ is of the form, for some $n > 0$,

$\exists v_{i_1} \cdots \exists v_{i_k}\, \theta(v_1, \ldots, v_n)$, where $\{v_{i_1}, \ldots, v_{i_k}\} \subseteq \{v_1, \ldots, v_n\}$,

such that (1) $v_i$, $1 \le i \le n$, is a simple or an aggregate variable, and (2) $\theta(v_1, \ldots, v_n)$ is a conjunction of atomic formulas.

From here on, the Σ-normal form is used to express the query clauses for scheduled user queries. The expressive power of Σ-normal form is illustrated by an example.

Example 5.2.2 In addition to the user query clause $\psi_1$ shown in Example 5.2.1, suppose there are other user queries, say $\psi_2$ and $\psi_3$, as follows:

$\psi_2 = \exists x\, \exists y\, \exists v\, (Sa(x, y, V01) \wedge De(y, u, v))$, and
$\psi_3 = \exists x\, \exists y\, \exists v\, (Sa(x, y, V03) \wedge De(y, u, v))$.

Then $\psi_1$, $\psi_2$, and $\psi_3$ are all of the same type. The scheduled user query made up of $\psi_1$, $\psi_2$, and $\psi_3$ is a query expression asking "What are the addresses of the dealers which were supplied item# = B47, V01, or V03?" Without using aggregate variables, one way to express the query is

$DEF(DB(Auto), \psi_1 \vee \psi_2 \vee \psi_3)$.

In fact, the disjunctive form $\psi_1 \vee \psi_2 \vee \psi_3$ can be equivalently expressed as

$\psi_1 \vee \psi_2 \vee \psi_3 = \exists x\, \exists y\, \exists v\, \exists z\, (Sa(x, y, z) \wedge De(y, u, v) \wedge (z = B47 \vee z = V01 \vee z = V03))$.

Now it is shown how this scheduled user query can be compactly expressed in Σ-normal form. Suppose $L_\Sigma(DB(Auto))$ is Σ-extended by a new predicate symbol $J$ and at the same time the structure DB(Auto) is also expanded by Σ-definition with the defining axiom of $J$ such as

$\forall z\, (J(z) \leftrightarrow (z = B47 \vee z = V01 \vee z = V03))$.

Then by introducing an aggregate variable, the disjunctive formula $\psi_1 \vee \psi_2 \vee \psi_3$ collapses into a Σ-normal form formula $q_1$ as follows:

$q_1 = \exists x\, \exists y\, \exists z^J\, \exists v\, (Sa(x, y, z^J) \wedge De(y, u, v))$. (5.2)

Here the scheduled user query expression $DEF(DB(Auto), \psi_1 \vee \psi_2 \vee \psi_3)$ is equivalently expressed by

$DEF(DB(Auto), q_1)$.

In the preceding example, it is clear that $q_1$ of (5.2) is much more compact than the disjunctive formula $\psi_1 \vee \psi_2 \vee \psi_3$. User queries are expressed in a much more compact way in $L_\Sigma$ than in an ordinary many-sorted language. How such a compact way of expressing the query allows the application of the knowledge to the query is discussed in detail in Chapter VII. From here on, as long as the distinction between user queries and scheduled user queries is clear, i.e., the latter is made from the former to be used for the purpose of the KBDDBS design, "queries" simply means scheduled user queries.
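The equivalence claimed in Example 5.2.2 can be checked on a small instance. The sketch below assumes that an aggregate variable ranges over exactly the elements satisfying its unary predicate, here the new predicate J; the SALES and DEALERS tuples and the function names are hypothetical.

```python
# Hypothetical instances of SALES(div#, d#, item#) and DEALERS(d#, address, d_type).
SALES = {("D1", "d10", "B47"), ("D1", "d20", "V01"),
         ("D2", "d30", "V03"), ("D2", "d40", "X99")}
DEALERS = {("d10", "Ann Arbor", "auto"), ("d20", "Detroit", "auto"),
           ("d30", "Flint", "auto"), ("d40", "Lansing", "office")}

# Extension of the new unary predicate J given by its defining axiom.
J = {"B47", "V01", "V03"}


def user_query(item):
    """Addresses u satisfying: exists x, y, v (Sa(x, y, item) and De(y, u, v))."""
    return {u for (_x, y, z) in SALES if z == item
            for (d, u, _v) in DEALERS if d == y}


def scheduled_query():
    """Addresses u satisfying (5.2), with the item variable restricted to J."""
    return {u for (_x, y, z) in SALES if z in J
            for (d, u, _v) in DEALERS if d == y}


disjunctive_answer = user_query("B47") | user_query("V01") | user_query("V03")
compact_answer = scheduled_query()
```

Both answers coincide on this instance, while the dealer supplied only with an item outside J is excluded from either.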

Finally, by using the formality of the query clauses in Σ-normal form, the notion of the URC's, which was introduced informally in Chapter II, is defined in terms of the atomic formulas of $L_\Sigma$ as follows:

Definition 5.2.3 For a query q in Σ-normal form, let the matrix of q be of the form

$R_1 \wedge \cdots \wedge R_m$, where $R_i \in Atom(L_\Sigma)$, $1 \le i \le m$.

If $DEF'(DB, R_i)$ stands for the singleton set whose member is the set $DEF(DB, R_i)$, then the URC identified by q is

$\bigcup_{i=1}^{m} DEF'(DB, R_i)$.

An example of the preceding definition follows:

Example 5.2.3 The query $q_1$ of (5.2) in Example 5.2.2 is considered. The matrix of $q_1$ is

$Sa(x, y, z^J) \wedge De(y, u, v)$.

The URC's identified by $q_1$ are then the set

$\{DEF(DB(Auto), Sa(x, y, z^J)),\ DEF(DB(Auto), De(y, u, v))\}$.
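Definition 5.2.3 evaluates each atomic formula of the matrix on its own and collects the resulting sets. A minimal Python sketch of this reading follows; representing an atom as a relation plus a mapping from column position to allowed values is an assumption made for illustration (an aggregate variable restricts its column, a simple variable does not), as are the data and function names.

```python
SALES = {("D1", "d10", "B47"), ("D2", "d40", "X99")}
DEALERS = {("d10", "Ann Arbor", "auto"), ("d40", "Lansing", "office")}
J = {"B47", "V01", "V03"}  # extension of the unary predicate J


def def_atom(relation, restrictions):
    """DEF of one atomic formula: the tuples of `relation` whose restricted
    columns (0-based positions) take values in the allowed sets."""
    return frozenset(t for t in relation
                     if all(t[i] in allowed for i, allowed in restrictions.items()))


def urc(atoms):
    """The URC's identified by a query: one DEF set per atom of its matrix."""
    return {def_atom(relation, restrictions) for relation, restrictions in atoms}


# Matrix of q1: Sa(x, y, z^J) and De(y, u, v)
urc_q1 = urc([(SALES, {2: J}), (DEALERS, {})])
```

Note that, read this way, the atom over DEALERS contributes the whole relation, since the definition considers each atom separately rather than the join.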

CHAPTER VI

KNOWLEDGE REPRESENTATION IN $L_\Sigma$

6.1. Axiomatic Knowledge Identification

In this section it is discussed what kind of knowledge is included in the knowledge base of the KBS. In any knowledge-based system, what types of knowledge should be included in its knowledge base generally depends on the purpose of using the knowledge. In the KBDDBS design, the knowledge about the data is used for horizontal partitioning, and thereby eventually to reap the benefits accrued from allocating the partitioned fragments instead of the complete relations.

The benefits accrued when distributing horizontally partitioned fragments include: (i) during query processing, the selection operation is dispensed with to some degree by treating each fragment as a preselected subrelation, and (ii) unnecessary join operations are eliminated by knowing a priori that join operations between some fragments produce a null set. Such benefits, which are sought by relying on the dispersion of horizontally partitioned relations, imply what should be derived from the knowledge base and, therefore, what should be in the knowledge base. They are mainly the two types of knowledge: (i) the knowledge which contains the notion of preselection (or, say, prepartitioned fragments), which would help to dispense with the selection

operations of queries, and (ii) the knowledge which shows the relationships between the prepartitioned fragments of relations, which would eliminate any unfruitful join operations.

It is postulated that these two types of knowledge are expressible in terms of five types of axiom schemas in $L_\Sigma$. In other words, the instances (i.e., axioms) of five axiom schemas constitute the knowledge base of the KBS which is utilized for the purpose of horizontal partitioning of relations. The five axiom schemas identified are the Functional Dependency Axiom schema (FDA), Relationship Axiom schema (RA), Inherency Axiom schema (IA), Ground Defining Axiom schema (GDA) and Virtual Defining Axiom schema (VDA). The reason the knowledge is classified into axioms of five types is twofold. One is to identify the knowledge useful for horizontal partitioning via syntactic formality, and the other is to exploit the formality for developing an inference mechanism.

In the following, the meaning of each axiom schema is first briefly explained and then the representation of the schema in $L_\Sigma$ is shown. Examples of each schema are given in the following section; they are referenced at each schema description. To simplify the expressions, some abbreviations are adopted: if $A$ is an index set such as $A = \{a_1, \ldots, a_k\}$, then $X_A$ and $QX_A$ are the abbreviations of the sequences $x_{a_1}, \ldots, x_{a_k}$ and $Qx_{a_1} \cdots Qx_{a_k}$, respectively, where $Q$ is either $\forall$ or $\exists$.

(i) Functional Dependency Axiom (FDA)

Functional dependency (FD) in a relation is a well-known concept. Any type of FD can be expressed in the form of the schema discussed below, and any axiom of this schema describes a FD (e.g., (6.3)).

FDA schema: Given an n-ary relation $R$ whose attribute index set is $\{1, \ldots, n\}$, if there is a FD from $X_A$ to $X_B$ where $A$ and $B$ are subsets of $\{1, \ldots, n\}$, and $A \cap B$ is not necessarily the empty set, then the FD is expressed as

$\forall X_A\, \forall X_{B'}\, \forall X_C\, \forall Y_{B'}\, \forall Y_C\, (R(X_A, X_{B'}, X_C) \wedge R(X_A, Y_{B'}, Y_C) \rightarrow (x_{i'_1} = y_{i'_1} \wedge \cdots \wedge x_{i'_k} = y_{i'_k}))$

where all the variables of $X_A$, $X_{B'}$, $X_C$, $Y_{B'}$, and $Y_C$ are simple variables, $B' = B - A$, $C = \{1, \ldots, n\} - (A \cup B)$, and each $i'$ is an element of $B$.

(ii) Relationship Axiom (RA)

An axiom of RA schema describes the following types of relationships which hold between two relations: (i) whether an attribute of one relation shares a common domain with an attribute of the other relation, and (ii) if so, whether the join of the two relations over the attributes of the common domain is meaningful, in the sense that queries including the join of the two relations on these attributes are meaningful. Although any two relations having attributes which share a common domain are actually joinable, not every join of such relations would be meaningful. Only a meaningful join of two relations is specified in this schema (e.g., (6.10)).

RA schema: Given an n-ary relation $R_1$ and an m-ary relation $R_2$ whose attribute index sets are $\{1, \ldots, n\}$ and $\{1, \ldots, m\}$ respectively, if the range of $x_i$, $i \in \{1, \ldots, n\}$, is identical with the range of $y_j$, $j \in \{1, \ldots, m\}$, and the join of $R_1$ and $R_2$ on $x_i$ and $y_j$ is meaningful, then such a relationship between the two relations $R_1$ and $R_2$ is expressed as

$\exists x_i\, \exists X_{A'}\, \exists y_j\, \exists Y_{B'}\, (R_1(x_i, X_{A'}) \wedge R_2(y_j, Y_{B'}) \wedge (x_i = y_j))$

where $A' = \{1, \ldots, n\} - \{i\}$, $B' = \{1, \ldots, m\} - \{j\}$ and all the variables $x_i$, $X_{A'}$, $y_j$ and $Y_{B'}$ are simple variables.
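The two schemas above can be checked against a database instance with a short sketch. The relation rows are taken from Figures 6.1 and 7.1; the helper names `holds_fd` and `join_is_nonempty` are assumptions of this illustration. Note that the RA formula itself only asserts that the join over the shared attribute is non-empty; whether the join is "meaningful" is knowledge supplied by the DB designer and cannot be tested from the data.

```python
# DEALERS (d#, address, d_type), rows as shown in Figure 7.1.
DEALERS = [
    ("01A", "Ann Arbor", 51), ("03A", "Dearborn", 30), ("07A", "Flint", 50),
    ("26M", "Cleveland", 20), ("33B", "Cleveland", 30), ("48B", "Rockford", 31),
    ("55L", "Flint", 51), ("65B", "Detroit", 20), ("66L", "Nile", 23),
    ("70A", "Lansing", 70),
]
# SALES (div#, d#, item#), rows as shown in Figure 7.1.
SALES = [
    ("01AP", "01A", "V01"), ("02AP", "01A", "B47"),
    ("04AP", "01A", "V03"), ("05AP", "55L", "V03"),
]
# A few ITEMS rows (item#, name, itype) from Figure 6.1.
ITEMS = [("B47", "Eland", "bus"), ("V01", "Astre", "sedan"),
         ("V03", "Camaro", "sedan"), ("W09", "Brat", "van")]


def holds_fd(rows, lhs, rhs):
    """True if attribute positions `lhs` functionally determine positions `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in lhs)
        value = tuple(row[i] for i in rhs)
        # Two tuples agreeing on lhs must also agree on rhs.
        if seen.setdefault(key, value) != value:
            return False
    return True


def join_is_nonempty(r1, i, r2, j):
    """True if some tuple of r1 and some tuple of r2 agree on positions i and j."""
    return any(a[i] == b[j] for a in r1 for b in r2)


fd_dnum_to_dtype = holds_fd(DEALERS, lhs=[0], rhs=[2])     # d# -> d_type
fd_address_to_dtype = holds_fd(DEALERS, lhs=[1], rhs=[2])  # address -> d_type
ra_sales_items = join_is_nonempty(SALES, 2, ITEMS, 0)      # join on item#
```

On this instance the FD from d# to d_type holds, the FD from address to d_type fails (two Flint dealers have different types), and the join of SALES and ITEMS on item# is non-empty.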

(iii) Inherency Axiom (IA)

An axiom of IA schema is the type of knowledge which plays the key role in the KBS, since URC's are more precisely estimated by knowing the relationships between the subsets of relations. The type of knowledge in this schema consists of the inherited facts specifying how the subsets of relations are interrelated. Axioms of this type mostly describe how a relation could be subdivided and how the subrelations are interrelated (e.g., (6.4)).

IA schema: Let an n-ary relation $R_1$ and an m-ary relation $R_2$, whose attribute index sets are $\{1, \ldots, n\}$ and $\{1, \ldots, m\}$ respectively, be related by an axiom of RA on the attributes $X_A$, where $A \subset \{1, \ldots, n\}$. Then a relationship between some fractions of these two relations $R_1$ and $R_2$ is expressed in the form of

$\forall X_A\, (\exists X_{A'}\, \exists X_{A^*}\, \psi(X_A, X_{A'}, X_{A^*}) \rightarrow \exists Y_B\, R_2(X_A, Y_B))$

where some variables of $X_A$ and at least one of each of $X_{A'}$ and $Y_B$ are aggregate variables; $A' = \{1, \ldots, n\} - A$, $B = \{1, \ldots, m\} - A$, $A^*$ is some attribute index set of relations RA-related with $R_1$; and $\psi(X_A, X_{A'}, X_{A^*}) \in Form(L_\Sigma)$ is a conjunction of atomic formulas of $L_\Sigma$ including $R_1(X_A, X_{A'})$.

Here it is noticed that any axiom in this schema is a Σ-Horn formula (a Σ-Horn formula† is a variation of a Horn formula [Horn51] in which variables in the formula may be aggregate variables).

† In a more formal way, a Σ-Horn formula can be stated as follows: A $\phi \in Form(L_\Sigma)$ is a basic Σ-Horn formula iff $\phi$ is a disjunction of formulas $\theta_i$, $\phi = \theta_1 \vee \cdots \vee \theta_n$, where at most one of the formulas $\theta_i$ is an atomic formula, and the rest of them are the negations of atomic formulas. A Σ-Horn formula is built up from the basic Σ-Horn formulas with the connective $\wedge$ and the quantifiers $\exists$ and $\forall$. A Σ-Horn sentence is a Σ-Horn formula with no free variables.

(iv) Ground Defining Axiom (GDA)

In the axioms of the IA schema, some unary predicates are introduced as the accompanying symbols of aggregate variables. As previously discussed in Section 3.3, whenever a unary predicate, say $P$, is introduced syntactically, the meaning of the symbol $P$ must also be described and introduced as what is called a defining axiom. There are two ways of doing this: one by the GDA schema and the other by the VDA schema. If the defining axiom is explicitly stated in terms of constants, it is called an axiom of the GDA schema, and if it is implicitly defined in terms of some other existing relation predicates, then it is called an axiom of the VDA schema.

By the GDA schema, the members of the set which is interpreted by the introduced unary predicate are explicitly defined in terms of the constants of $L_\Sigma$. The GDA schema is formally stated as follows (e.g., (6.1)).

GDA schema: Given a unary predicate $P$ to be introduced, the GDA schema is represented in the form

$\forall x\, (P(x) \leftrightarrow (x = c_1 \vee \cdots \vee x = c_n))$,

where $c_i$, $1 \le i \le n$, is a constant symbol of a sort domain to which the aggregate variable accompanying $P$ belongs.

(v) Virtual Defining Axiom (VDA)

An axiom of VDA schema is an axiom defining a unary predicate in terms of some other pre-existing predicates. The set designated by a unary predicate may not only be described in terms of the constants of $L_\Sigma$ but may also be expressed in terms of nonunary predicates of $L_\Sigma$. The set designated by a unary predicate may simply be expressed in terms of the VDA schema by some combination of

join, selection, and projection of relation predicates, instead of by only the constants of $L_\Sigma$. The formal representation schema is the following (e.g., (6.8)).

VDA schema: Given a unary predicate $P$ to be introduced, the VDA schema is represented in the form

$\forall x\, (P(x) \leftrightarrow \alpha(x))$,

where $\alpha(x) \in Form(L_\Sigma)$ and $x$ is the only free variable in $\alpha(x)$.

6.2. Σ-Horn Knowledge Base

In the previous section, five types of axiom schemas were identified whose axioms constitute the knowledge base of the KBS. The knowledge that is directly applied to the queries is in fact a special class of the axioms of the five schemas. Other kinds of knowledge are used for secondary purposes. This section discusses how this special class of knowledge is constructed from the axioms of the five types of schemas.

The knowledge base of the KBS can be said to consist of two levels that are denoted by $KB$ and $KB_{\Sigma H}$, respectively. $KB$ is the knowledge base consisting of the five types of axioms that were introduced in the previous section, and $KB_{\Sigma H}$ is a subset of the logical consequences of $KB$. In fact, $KB$ is a proper subset of the complete theory $KB(DB)$ defined on the database structure $DB$, i.e., $KB \subset KB(DB)$. By this it is meant that $KB$ is a collection of knowledge selected from $KB(DB)$ that is of interest to the DB designer of a specific application domain. In our case, the knowledge needed to estimate the URC's is the content of $KB$. $KB_{\Sigma H}$ is the collection of the axioms that actually take part in the syntactic inference procedure of the KBS (how this is done is the content of Section 7.2). $KB_{\Sigma H}$ consists of two types of axioms: [type I] the IA axioms in $KB$ each of which

has its "corresponding" FDA axiom in $KB$ (the notion of "corresponding" is defined shortly), and [type II] a class of axioms equivalent to the IA axioms of type I which are individually derived from some relevant IA, FDA, GDA, VDA and RA axioms. The axioms of type II are also IA axioms. Since both types of axioms in $KB_{\Sigma H}$ are IA axioms and the IA axioms are Σ-Horn formulas, $KB_{\Sigma H}$ is called a Σ-Horn knowledge base.

In the following it is shown by example how axioms of the five schemas constitute $KB$ and how $KB_{\Sigma H}$ is constructed from $KB$. This example also clarifies why the five types of axioms are identified as useful knowledge for the KBS.

First, it is illustrated that the axioms of three schema types, FDA, IA and GDA, constitute $KB$. Let a piece of knowledge provided by a DB designer be "All the dealers which are supplied car items are the car dealers." Let B47, V01, V03, and W09 stand for all the car items, and let 50 and 51 stand for all the car dealer types. If $P$ and $Q$ are defined, respectively, by

$\forall x\, (P(x) \leftrightarrow (x = B47 \vee x = V01 \vee x = V03 \vee x = W09))$,
$\forall x\, (Q(x) \leftrightarrow (x = 50 \vee x = 51))$,   (6.1)

then, with a little exercise of imagination, the preceding knowledge is recapitulated by the following formula in an ordinary many-sorted language:

$\forall y\, (\exists x\, \exists z\, (Sa(x,y,z) \wedge P(z)) \rightarrow \exists u\, \exists v\, (De(y,u,v) \wedge Q(v)))$   (6.2)

in conjunction with a functional dependency in the relation DEALERS from d# to d_type of the following form

$\forall x\, \forall y\, \forall z\, \forall y'\, \forall z'\, (De(x,y,z) \wedge De(x,y',z') \rightarrow z = z')$   (6.3)

(why (6.2) must be in conjunction with (6.3) is explained in detail later

in Section 7.2). By introducing the aggregate variables $z^{\Sigma P}$ and $v^{\Sigma Q}$, (6.2) is equivalently expressed as

$\forall y\, (\exists x\, \exists z^{\Sigma P}\, Sa(x,y,z^{\Sigma P}) \rightarrow \exists u\, \exists v^{\Sigma Q}\, De(y,u,v^{\Sigma Q}))$.   (6.4)

Here (6.3) is a FDA axiom and (6.4) is an IA axiom. (6.3) and (6.4) are elements of $KB$. In the process of generating (6.4), it is required to add the axioms of (6.1) to $KB$ as GDA axioms in order to make $P$ and $Q$ meaningful predicate symbols.

The preceding illustration shows that the axioms of the three schemas, FDA, IA and GDA, are elements of $KB$. Axioms of the other two schemas, RA and VDA, are illustrated in the context of constructing $KB_{\Sigma H}$ from $KB$. In the rest of the section, it is shown how $KB_{\Sigma H}$ is constructed from $KB$.

First, the notion of the "corresponding" FDA axiom is introduced for each IA axiom. Let an IA axiom $K$ be of the form

$K = \forall x_1 \cdots \forall x_h\, (\psi(x_1, \ldots, x_h) \rightarrow \exists x_{h+1} \cdots \exists x_k\, R(x_1, \ldots, x_h, x_{h+1}, \ldots, x_k))$

where $\{x_{w_1}, \ldots, x_{w_l}\} \subset \{x_{h+1}, \ldots, x_k\}$. Let $x_{w_1}, \ldots, x_{w_l}$ be the only aggregate variables among $x_{h+1}, \ldots, x_k$. A FDA axiom is said to be the corresponding axiom of $K$ if the FDA axiom is of the form

$\forall u_1 \cdots \forall u_h\, \forall y_{h+1} \cdots \forall y_k\, \forall y'_{h+1} \cdots \forall y'_k\, (R(u_1, \ldots, u_h, y_{h+1}, \ldots, y_k) \wedge R(u_1, \ldots, u_h, y'_{h+1}, \ldots, y'_k) \rightarrow y_{w_1} = y'_{w_1} \wedge \cdots \wedge y_{w_l} = y'_{w_l})$,

which states that there is a functional dependency in $R$ from $u_1, \ldots, u_h$ to $y_{w_1}, \ldots, y_{w_l}$.

$KB_{\Sigma H}$ is constructed by including only the IA axioms in $KB$ each of which has its corresponding FDA axiom in $KB$. The IA axiom of (6.4) is considered. It is

clear that (6.3) is the corresponding FDA axiom of the IA axiom (6.4). Since (6.4) is accompanied by (6.3) in $KB$, (6.4) is a legitimate element of $KB_{\Sigma H}$.

There can be a class of IA axioms in $KB$ which are not accompanied by their corresponding FDA axioms. For instance, (6.4) may not be accompanied by (6.3) in $KB$, although (6.4) is still an IA axiom and is therefore an element of $KB$. Such an IA axiom must not be included in $KB_{\Sigma H}$. IA axioms without their corresponding FDA axioms should not be included in $KB_{\Sigma H}$ because they can lead to an incorrect identification of the URC's (this is discussed in detail later in Section 7.2).

Now it is illustrated how a class of axioms equivalent to the IA axioms in $KB_{\Sigma H}$ is derived and also included in $KB_{\Sigma H}$. In this context, the need for the RA and VDA schemas is illustrated. The reason $KB_{\Sigma H}$ is expanded by adding new IA axioms is to allow a larger class of user queries to be handled by the KBS. These new axioms are also IA axioms and they are generated in conjunction with some relevant IA, FDA, GDA, VDA and RA axioms.

First it is shown how a VDA axiom is derived from IA, FDA and GDA axioms. Let there be a relationship saying "All the values of itype in ITEMS for the car items are only 'bus', 'sedan' and 'van'", which indeed holds in the relation ITEMS [cf. Figure 6.1]. This relationship is expressed as

$\forall x^{\Sigma P}\, (\exists y\, \exists z\, It(x^{\Sigma P},y,z) \rightarrow \exists u\, \exists v^{\Sigma R}\, It(x^{\Sigma P},u,v^{\Sigma R}))$,   (6.5)

where $R$ is defined by

$\forall x\, (R(x) \leftrightarrow (x = sedan \vee x = bus \vee x = van))$,   (6.6)

in conjunction with the functional dependency in ITEMS from item# to itype

ITEMS
item#   name        itype
C60     white 7     paint
N11     squ. 11"    nut
P02     distribu.   engin
P03     radiator    engin
S01     In. 8080    elect.
S02     battery     elect.
X89     iron 9"     plate
B47     Eland       bus
V01     Astre       sedan
V03     Camaro      sedan
W09     Brat        van

The following holds: $DEF(DB, It(x^{\Sigma P},w,t)) = DEF(DB, It(x,w,t^{\Sigma R}))$.

Figure 6.1. Derivation of a VDA axiom

$\forall x\, \forall y\, \forall z\, \forall y'\, \forall z'\, (It(x,y,z) \wedge It(x,y',z') \rightarrow z = z')$.   (6.7)

Here (6.5) is an IA axiom, (6.6) a GDA axiom and (6.7) a FDA axiom. The IA axiom (6.5), the FDA axiom (6.7) and two GDA axioms, one for $R$ in (6.6) and the other for $P$ in (6.1), imply that the unary predicate $P$, which was once defined in terms of constants, can now be defined in terms of the predicate $It$. That is, from (6.1), (6.5), (6.6) and (6.7), it follows that $P^{DB} = DEF(DB, \exists w\, \exists t^{\Sigma R}\, It(x,w,t^{\Sigma R}))$ [cf. Figure 6.1]. The meaning of $P$ can now be expressed as

$\forall x\, (P(x) \leftrightarrow \exists w\, \exists t^{\Sigma R}\, It(x,w,t^{\Sigma R}))$.   (6.8)

(6.8) is a VDA axiom and is therefore an element of $KB$.

Now it is illustrated how an IA axiom in $KB_{\Sigma H}$, in conjunction with a VDA axiom and a RA axiom, leads to another IA axiom that is equivalent to the original IA axiom.
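The derivation of the VDA axiom (6.8) can be checked on the instance of Figure 6.1 with a small sketch. The rows below are those of the figure; the variable names (`P_ground`, `P_virtual`) are introduced only for this illustration. The sketch compares the set defined by the GDA axiom (6.1) with the set obtained by selecting ITEMS on itype in R and projecting on item#.

```python
# ITEMS (item#, name, itype), rows as shown in Figure 6.1.
ITEMS = [
    ("C60", "white 7", "paint"), ("N11", 'squ. 11"', "nut"),
    ("P02", "distribu.", "engin"), ("P03", "radiator", "engin"),
    ("S01", "In. 8080", "elect."), ("S02", "battery", "elect."),
    ("X89", 'iron 9"', "plate"), ("B47", "Eland", "bus"),
    ("V01", "Astre", "sedan"), ("V03", "Camaro", "sedan"),
    ("W09", "Brat", "van"),
]

# GDA axiom (6.1): P is defined explicitly by constants.
P_ground = {"B47", "V01", "V03", "W09"}

# GDA axiom (6.6): R is defined explicitly by constants.
R = {"sedan", "bus", "van"}

# VDA axiom (6.8): P(x) iff there exist w and t in R with It(x, w, t).
P_virtual = {item for item, _name, itype in ITEMS if itype in R}

# On this instance both definitions designate the same set of items.
definitions_agree = (P_ground == P_virtual)
```

The agreement depends on the instance satisfying (6.5) and (6.7); on a database where a car item had some other itype, the two definitions of P would differ.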

The IA axiom (6.4) and the VDA axiom (6.8) are considered. If the aggregate variable $z^{\Sigma P}$ shown in the antecedent of the IA axiom (6.4) is unraveled, (6.4) is equivalently expressed as

$\forall y\, (\exists x\, \exists z\, (Sa(x,y,z) \wedge P(z)) \rightarrow \exists u\, \exists v^{\Sigma Q}\, De(y,u,v^{\Sigma Q}))$.   (6.9)

Then (6.9) and the VDA axiom (6.8) suggest a way to provide an axiom that is equivalent to (6.4). That is, $P(z)$ in (6.9) may simply be replaced by $\exists w\, \exists t^{\Sigma R}\, It(z,w,t^{\Sigma R})$ without changing its meaning, as long as the equivalence of these two expressions is defined in terms of the VDA axiom (6.8). However, the replacement of such a unary predicate by using a VDA axiom should not be made unless there is a RA axiom in $KB$ which is called the "relevant" axiom to doing so.

The notion of a "relevant" RA axiom is introduced as follows: Let $K$ be an IA axiom describing a relationship between some subsets of two relations, say $R_1$ and $R_2$, of the form

$K = \forall x_1 \cdots \forall x_h\, (\psi(x_1, \ldots, x_h) \wedge R_1(x_{i_1}, \ldots, x_{i_n}) \rightarrow \exists z_1 \cdots \exists z_k\, R_2(x_{l_1}, \ldots, x_{l_g}, z_1, \ldots, z_k))$

where $\{x_{i_1}, \ldots, x_{i_n}\} \subset \{x_1, \ldots, x_h\}$ and $\{x_{l_1}, \ldots, x_{l_g}\} \subset \{x_1, \ldots, x_h\}$. Let $x_j$ be an aggregate variable whose range is restricted by a unary predicate, say $P$, where $x_j \in \{x_{i_1}, \ldots, x_{i_n}\}$ and $x_j \notin \{x_{l_1}, \ldots, x_{l_g}\}$, and let this aggregate variable be unraveled as follows:

$K' = \forall x_1 \cdots \forall x_h\, (\psi(x_1, \ldots, x_h) \wedge R_1(x_{i_1}, \ldots, x_{i_n}) \wedge P(x_j) \rightarrow \exists z_1 \cdots \exists z_k\, R_2(x_{l_1}, \ldots, x_{l_g}, z_1, \ldots, z_k))$

where $x_j$ is now no longer an aggregate variable. Let there be a VDA axiom of

the form

$\forall x_j\, (P(x_j) \leftrightarrow \exists y_1 \cdots \exists y_v\, R_3(y_1, \ldots, y_{v+1}))$

where $\{x_j, y_1, \ldots, y_v\} = \{y_1, \ldots, y_{v+1}\}$. Then it is said that a RA axiom of the form

$\exists u_1 \cdots \exists u_n\, \exists w_1 \cdots \exists w_{v+1}\, (R_1(u_1, \ldots, u_n) \wedge R_3(w_1, \ldots, w_{v+1}) \wedge (u_i = w_g))$

is the relevant RA axiom to the replacement of $P(x_j)$ in $K'$ by $\exists y_1 \cdots \exists y_v\, R_3(y_1, \ldots, y_{v+1})$, where $u_i \in \{u_1, \ldots, u_n\}$, $w_g \in \{w_1, \ldots, w_{v+1}\}$ and $w_g$ is the $x_j \in \{y_1, \ldots, y_{v+1}\}$.

When replacing $P(x_j)$ in $K'$ by $\exists y_1 \cdots \exists y_v\, R_3(y_1, \ldots, y_{v+1})$, the relevant RA axiom is needed because its presence implies that the resulting formula obtained by the replacement would be useful. By definition, the RA axiom of the preceding form means that the join of $R_1$ and $R_3$ over the attribute indicated by $x_j$ is meaningful. This implies that queries including the join of $R_1$ and $R_3$ over the attribute indicated by $x_j$ are meaningful, which therefore means the resulting formula obtained by the replacement can be used to restrict such queries.

The use of the relevant RA axiom is illustrated in the following. Let the join of the two relations SALES and ITEMS via item# be meaningful in the sense that queries including the join of the two relations on item# are meaningful. This relationship is expressed as

$\exists x\, \exists y\, \exists z\, \exists u\, \exists w\, \exists t\, (Sa(x,y,z) \wedge It(u,w,t) \wedge z = u)$.   (6.10)

Here (6.10) is a RA axiom, which is therefore an element of $KB$. Furthermore (6.10)

is the relevant RA axiom to the replacement of $P(z)$ in (6.9) by $\exists w\, \exists t^{\Sigma R}\, It(z,w,t^{\Sigma R})$ in (6.8). Replacing $P(z)$ by $\exists w\, \exists t^{\Sigma R}\, It(z,w,t^{\Sigma R})$ rephrases (6.9) into the following form:

$\forall y\, (\exists x\, \exists z\, \exists w\, \exists t^{\Sigma R}\, (Sa(x,y,z) \wedge It(z,w,t^{\Sigma R})) \rightarrow \exists u\, \exists v^{\Sigma Q}\, De(y,u,v^{\Sigma Q}))$.   (6.11)

(6.11) describes the same knowledge described by (6.4) in a different way: "All the dealers who are supplied 'sedan', 'bus' and 'van' are the car dealers". Here, (6.11) is again an IA axiom, which is what was intended to be derived. (6.11) is an element of $KB_{\Sigma H}$. At the end of Section 7.1, it will be illustrated how the inclusion of new IA axioms such as (6.11) enlarges the class of queries to be handled by the KBS.

It is noticed that although $KB_{\Sigma H}$ contains only IA axioms (which are all Σ-Horn formulas), the other types of axioms have been indirectly embedded in the construction of $KB_{\Sigma H}$.
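As a sanity check on the derivation in this section, the following sketch evaluates the antecedents of (6.4) and (6.11) and their common consequent on a small instance. The rows come from Figures 6.1 and 7.1, except for one hypothetical SALES row that is marked as such; the function names are assumptions of this illustration, not part of the KBS.

```python
# SALES (div#, d#, item#): rows from Figure 7.1 plus one hypothetical row.
SALES = [
    ("01AP", "01A", "V01"), ("02AP", "01A", "B47"),
    ("04AP", "01A", "V03"), ("05AP", "55L", "V03"),
    ("09AP", "26M", "C60"),  # hypothetical: paint sold to a non-car dealer
]
# ITEMS (item#, name, itype): a subset of the rows of Figure 6.1.
ITEMS = [
    ("C60", "white 7", "paint"), ("B47", "Eland", "bus"),
    ("V01", "Astre", "sedan"), ("V03", "Camaro", "sedan"),
    ("W09", "Brat", "van"),
]
# DEALERS (d#, address, d_type): a subset of the rows of Figure 7.1.
DEALERS = [
    ("01A", "Ann Arbor", 51), ("07A", "Flint", 50),
    ("26M", "Cleveland", 20), ("55L", "Flint", 51),
]

P = {"B47", "V01", "V03", "W09"}   # GDA axiom (6.1)
Q = {50, 51}                       # GDA axiom (6.1)
R = {"sedan", "bus", "van"}        # GDA axiom (6.6)


def antecedent_6_4(y):
    """Exists x, z in P with Sa(x, y, z)."""
    return any(d == y and z in P for _x, d, z in SALES)


def antecedent_6_11(y):
    """Exists x, z, w, t in R with Sa(x, y, z) and It(z, w, t)."""
    return any(d == y and z == item and t in R
               for _x, d, z in SALES for item, _w, t in ITEMS)


def consequent(y):
    """Exists u, v in Q with De(y, u, v)."""
    return any(d == y and v in Q for d, _u, v in DEALERS)


dealer_ids = {d for d, _u, _v in DEALERS}
same_antecedent = all(antecedent_6_4(y) == antecedent_6_11(y) for y in dealer_ids)
axiom_holds = all(consequent(y) for y in dealer_ids if antecedent_6_4(y))
```

On this instance the two antecedents select the same dealers and the axiom holds, which is what the equivalence of (6.4) and (6.11) under the VDA axiom (6.8) predicts.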

CHAPTER VII

INFERENCE PROCEDURE

7.1. Inference Procedure

This section describes how the knowledge about the data, i.e., $KB_{\Sigma H}$, is applied to the queries, i.e., query clauses in Σ-normal form, in order to lead to an equivalent query clause which shows more precise URC's than does the original query. This process is done by the inference procedure of the KBS. Let the inference procedure be abbreviated by the symbol "|-->". Then "|-->" requires some preliminary steps to be made for the formulas in $KB_{\Sigma H}$ and the query expressions in Σ-normal form. Each formula in $KB_{\Sigma H}$ is converted into an existential quantifier free form by the process known as Skolemization. Once the Skolemization step is performed, all the quantifiers can be omitted from the formulas in $KB_{\Sigma H}$ and the query expressions. This is possible because the formulas in $KB_{\Sigma H}$ are then only universally quantified and the query expressions are only existentially quantified.

The step converting each formula in $KB_{\Sigma H}$ into an existential quantifier free form is described in detail. First, the formulas in $KB_{\Sigma H}$ are converted into the logically equivalent prenex normal forms. Then the prenex normal forms are converted into existential quantifier free forms by the usual procedure called Skolemization. Here the Skolemization for the formulas of $L_\Sigma$ differs from the ordinary Skolemization only by the fact that when a Skolem function is introduced, its range must be restricted to a unary relation which is the same as the range of the variable to be replaced by the function. An example illustrates the Skolemization process for a formula in $L_\Sigma$:

Example 7.1.1 Consider the IA axiom of (6.4). Let this axiom be $\phi$,

$\phi = \forall y\, (\exists x\, \exists z^{\Sigma P}\, Sa(x,y,z^{\Sigma P}) \rightarrow \exists u\, \exists v^{\Sigma Q}\, De(y,u,v^{\Sigma Q}))$.

The logically equivalent prenex normal form of $\phi$ is

$\forall y\, \forall x\, \forall z^{\Sigma P}\, \exists u\, \exists v^{\Sigma Q}\, (Sa(x,y,z^{\Sigma P}) \rightarrow De(y,u,v^{\Sigma Q}))$.

Its Skolemized form is

$\forall y\, \forall x\, \forall z^{\Sigma P}\, (Sa(x,y,z^{\Sigma P}) \rightarrow De(y, g(y,x,z^{\Sigma P}), f^{Q}(y,x,z^{\Sigma P})))$,

where $g(y,x,z^{\Sigma P})$ and $f^{Q}(y,x,z^{\Sigma P})$ are the Skolem functions replaced for the variables $u$ and $v^{\Sigma Q}$, respectively. It is noticed that the range of $f^{Q}(y,x,z^{\Sigma P})$ is denoted by the superscript $Q$, which is the range of $v^{\Sigma Q}$.

The Skolemization step needs no justification as long as the Skolemized formulas are equivalent to the formulas prior to Skolemization. Once the Skolemization step is completed, "|-->" manipulates only the matrices of the Skolemized formulas in $KB_{\Sigma H}$ with the matrices of the existentially closed query expressions. As stated previously, as long as all the formulas in $KB_{\Sigma H}$ are only universally quantified and all the query expressions are only existentially quantified, the presence of the quantifiers can be made implicit in the symbol manipulation process "|-->".

In order to describe "|-->", two notions, namely "match" and "restrictable", need to be defined. Before defining these notions, a few notations are first introduced: After all the formulas of $KB_{\Sigma H}$ are Skolemized and their universal quantifiers are stripped off, let the resulting set of matrices be denoted by $KB^{s}_{\Sigma H}$. After the existential quantifiers are stripped off from all the queries concerned, let the resulting set of the query clauses be denoted by $Q^{s}$. For $q \in Q^{s}$, let $SUB(q)$ stand for the collection of all the subformulas of $q$. Each formula in $SUB(q)$ is again a conjunction of atomic formulas. Since any formula in $KB_{\Sigma H}$ is of IA schema, it follows that any formula in $KB^{s}_{\Sigma H}$ is of the form

$\psi \rightarrow R(t_1, \ldots, t_n)$

where $\psi$ is a conjunction of atomic formulas and $R(t_1, \ldots, t_n)$ is an atomic formula with $R$ being a relation predicate and some of the terms among $t_1, \ldots, t_n$ being Skolem functions [the IA schema is defined in such a way that there is at least one existentially quantified variable in the consequent. See (iii) of Section 6.1]. This formality is used in defining the two notions, "match" and "restrictable".

The notions of "match" and "restrictable" are the following. For $q \in Q^{s}$ and $K_j \in KB^{s}_{\Sigma H}$ with $K_j$ of the form $\psi_j \rightarrow R(t_1, \ldots, t_n)$, some $q_i \in SUB(q)$ that does not include the predicate $R$ in it matches (or "is matched by") $K_j$ if the two following conditions are satisfied:

(1) A predicate symbol is in $q_i$ if and only if the same predicate symbol is in $\psi_j$.
(2) $DEF(DB, q_i) \subset DEF(DB, \psi_j)$.

A query clause $q \in Q^{s}$ is restrictable by an element $K_j \in KB^{s}_{\Sigma H}$ if the following two conditions are satisfied:

(1) There is $q_i \in SUB(q)$ which matches $K_j$.
(2) For the consequent $R(t_1, \ldots, t_n)$ of $K_j$, there is an atomic formula $R_q(t^q_1, \ldots, t^q_n)$ in $q$ such that (i) $R$ and $R_q$ are identical relation predicates, and (ii) there is a variable $t^q_l \in \{t^q_1, \ldots, t^q_n\}$ and a Skolem function $t_l \in \{t_1, \ldots, t_n\}$ such that $Ran(t_l) \subset Ran(t^q_l)$, where by $Ran(t)$ is meant the range of the outermost symbol of the term.

Here $(R, R_q)$ in (2) is called a restriction pair associated with $q_i$ and $K_j$ in (1). If $q$ is restrictable by $K_j$, it is said that $q$ is restricted by $K_j$ using the following process: for each restriction pair $(R, R_q)$, if the variable $t^q_l$ in $R_q$ and the Skolem function $t_l$ in $R$ satisfy the relationship $Ran(t_l) \subset Ran(t^q_l)$, then substitute $t^q_l$ by a variable $v$ whose range $Ran(v) = Ran(t_l)$. Here $t^q_l$ in $R_q$ is called the corresponding variable of $t_l$ in $R$. The restricted $q$ is denoted by $q \mid K_j$.

Now the inference procedure "|-->" is introduced in the following:

Inference Procedure "|-->"

Step 1 Let $q' = q$, $W = SUB(q)$, and go to Step 2.
Step 2 Let $q_i$ be an element of $W$, and go to Step 3.
Step 3 Let $MATCH(q_i)$ be all the formulas in $KB^{s}_{\Sigma H}$ which match $q_i$. If $MATCH(q_i)$ is empty, go to Step 5; otherwise go to Step 4.
Step 4 Do while $MATCH(q_i)$ is not empty,
   1. let $K_j$ be an element of $MATCH(q_i)$,
   2. $MATCH(q_i) = MATCH(q_i) - \{K_j\}$, and

   3. let $q' = q' \mid K_j$ only if $q'$ is restrictable by $K_j$;
   and go to Step 5.
Step 5 Let $W = W - \{q_i\}$. If $W$ is empty, stop; otherwise, go to Step 2.

The preceding procedure always stops at Step 5 either (i) with $q'$ being a revised version of $q \in Q^{s}$ if there was any element in $KB^{s}_{\Sigma H}$ which restricted $q$, or (ii) with $q'$ being identical with $q$ otherwise.

The complexity of the preceding procedure is discussed in the following. The following notations are used:

$n$: the size of $KB^{s}_{\Sigma H}$.
$r$: the number of atomic formulas in $q$.
$\delta(i)$: the number of atomic formulas in the $i$th subformula $q_i$ of $q$.
$\lambda(i)$: the number of free variables in the $i$th subformula $q_i$ of $q$.
$A^i_k$: the size of the range of the $k$th free variable, $1 \le k \le \lambda(i)$, in the $i$th subformula $q_i$ of $q$.
$\eta(j)$: the number of free variables in the antecedent of the $j$th knowledge $K_j$ in $KB^{s}_{\Sigma H}$.
$B^j_l$: the size of the range of the $l$th free variable, $1 \le l \le \eta(j)$, in the antecedent of the $j$th knowledge $K_j$ in $KB^{s}_{\Sigma H}$.
$\zeta(j)$: the number of the Skolem functions in the consequent of the $j$th knowledge $K_j$ in $KB^{s}_{\Sigma H}$.
$C^j_m$: the size of the range of the $m$th Skolem function, $1 \le m \le \zeta(j)$, in the consequent of the $j$th knowledge $K_j$ in $KB^{s}_{\Sigma H}$.

$D^j_m$: the size of the range of the corresponding variable in $R_q$ of the $m$th Skolem function in the consequent of the $j$th knowledge $K_j$ in $KB^{s}_{\Sigma H}$, where $R_q$ is the atomic formula with which the consequent of $K_j$ constitutes a restriction pair.

At Step 1, the total number of possible subformulas of $q$ is $2^r - 1$, i.e., for the set $W$ of subformulas of $q$ its size is $|W| = 2^r - 1$. This means that the outermost loop (i.e., Step 2 to Step 5 and back to Step 2) of the procedure is carried out at most $2^r - 1$ times.

At Step 3, finding $MATCH(q_i)$ entails comparing each member of $KB^{s}_{\Sigma H}$ with $q_i$. This comparison consists of two types of tests. First, for each element, say the $j$th member $K_j$ of $KB^{s}_{\Sigma H}$, it needs to be determined whether all the predicate symbols of $q_i$ are in the antecedent $\psi_j$ of $K_j$ and no other predicate is in $\psi_j$. Since neither $q_i$ nor $\psi_j$ contains any predicate symbol more than once, the worst case of determining the preceding condition is when $q_i$ and $\psi_j$ both have $\delta(i)$ atomic formulas. Determining the preceding condition requires no more than $\delta(i)^2$ comparisons. Second, for each knowledge satisfying the preceding condition, say $K_j$, it needs to be determined between $q_i$ and the antecedent $\psi_j$ of $K_j$ whether

$DEF(DB(Auto), q_i) \subset DEF(DB(Auto), \psi_j)$.

Since $|DEF(DB(Auto), q_i)| \le \prod_{k=1}^{\lambda(i)} A^i_k$ and $|DEF(DB(Auto), \psi_j)| \le \prod_{l=1}^{\eta(j)} B^j_l$, this determination requires comparisons of no more than

$\prod_{k=1}^{\lambda(i)} A^i_k \left(\log \prod_{k=1}^{\lambda(i)} A^i_k\right) + \prod_{l=1}^{\lambda(i)} B^j_l \left(\log \prod_{l=1}^{\lambda(i)} B^j_l\right) + Max\left(\prod_{k=1}^{\lambda(i)} A^i_k,\ \prod_{l=1}^{\lambda(i)} B^j_l\right)$

[notice that both $q_i$ and $\psi_j$ of $K_j$ have $\lambda(i)\ (= \eta(j))$ free variables]. Let the preceding term be abbreviated by $P(i,j)$. Since $q_i$ is compared with each individual in $KB^{s}_{\Sigma H}$, generating $MATCH(q_i)$ at Step 3 requires comparisons of at most

$\sum_{j=1}^{n} \left(\delta(i)^2 + P(i,j)\right)$.

At Step 4, the Do-while loop is processed as many times as $|MATCH(q_i)|$. The restriction step, i.e., (3) in the Do-while loop, requires testing $Ran(t_m) \subset Ran(t^q_m)$, where $t_m$ is the $m$th Skolem function, $1 \le m \le \zeta(j)$, in the consequent of $K_j$ and $t^q_m$ is its corresponding variable in $q$. This test requires comparisons of at most

$\sum_{m=1}^{\zeta(j)} C^j_m D^j_m$.

Since the Do-while loop is processed as many times as $|MATCH(q_i)|$, and $|MATCH(q_i)|$ is at most $n$ [$|MATCH(q_i)| = n$ is the case when all the members of $KB^{s}_{\Sigma H}$ match $q_i$], the total number of comparisons at Step 4 is no more than

$\sum_{j=1}^{n} \sum_{m=1}^{\zeta(j)} C^j_m D^j_m$.

Hence overall the total number of comparisons to be made in the procedure "|-->" is no more than

$\sum_{i=1}^{2^r-1} \left[ \sum_{j=1}^{n} \left(\delta(i)^2 + P(i,j)\right) + \sum_{j=1}^{n} \sum_{m=1}^{\zeta(j)} C^j_m D^j_m \right]$.

Let $L$ stand for the number of atomic formulas in the longest possible query in

$Q^{s}$. Then for any $i$, $1 \le i \le 2^r - 1$, $\delta(i) \le L$. Let $M$ stand for the size of the largest sort domain of the given database application structure $DB$. Then for any $i$, $1 \le i \le 2^r - 1$, and $k$, $1 \le k \le \lambda(i)$, $A^i_k \le M$, and for any $j$, $1 \le j \le n$, and $l$, $1 \le l \le \eta(j)$, $B^j_l \le M$. Also for any $j$, $1 \le j \le n$, and $m$, $1 \le m \le \zeta(j)$, $C^j_m \le M$ and $D^j_m \le M$. Let $K$ be the largest possible value for $\lambda(i)$ and $\zeta(j)$ where $1 \le i \le 2^r - 1$ and $1 \le j \le n$. The following relationship holds:

$P(i,j) = \prod_{k=1}^{\lambda(i)} A^i_k \left(\log \prod_{k=1}^{\lambda(i)} A^i_k\right) + \prod_{l=1}^{\lambda(i)} B^j_l \left(\log \prod_{l=1}^{\lambda(i)} B^j_l\right) + Max\left(\prod_{k=1}^{\lambda(i)} A^i_k,\ \prod_{l=1}^{\lambda(i)} B^j_l\right) \le M^{\lambda(i)} \log M^{\lambda(i)} + M^{\lambda(i)} \log M^{\lambda(i)} + M^{\lambda(i)}$.

Using O-notation, the overall complexity of "|-->" is concluded as follows:

$\sum_{i=1}^{2^r-1} \left[ \sum_{j=1}^{n} \left(\delta(i)^2 + P(i,j)\right) + \sum_{j=1}^{n} \sum_{m=1}^{\zeta(j)} C^j_m D^j_m \right]$
$\le \sum_{i=1}^{2^r-1} \sum_{j=1}^{n} \left[ L^2 + M^K (2 \log M^K + 1) + M^2 K \right]$
$\le \left[ L^2 + M^K (2 \log M^K + 1) + M^2 K \right] O(2^r)\, O(n)$
$= M'\, O(2^r)\, O(n)$ for some constant $M'$.

Although the overall complexity includes the exponentially growing term $O(2^r)$, it is expected that $O(2^r)$ is limited to a certain constant value, since in most cases a query does not involve more than 3 or 4 atomic formulas. Therefore, if the $O(2^r)$ term is replaced by some constant, say $C$, and if $C' = CM'$ for some constant $C'$, then the overall complexity of "|-->" is

$C'\, O(n)$ where $n$ is the size of the knowledge base $KB^{s}_{\Sigma H}$.

The procedure "|-->" produces two results: if "|-->" applies $KB^{s}_{\Sigma H}$ to $q \in Q^{s}$ to produce $q'$ [from here on, the whole procedure will be abbreviated by $KB^{s}_{\Sigma H};\ q$ |--> $q'$], then (i) $q'$ is "equivalent" to $q$ and (ii) if $q' \ne q$ then $q'$ shows "more precise" URC's than $q$. By the equivalence between $q'$ and $q$ it is meant that $DEF(DB, q) = DEF(DB, q')$. Let $q$ and $q'$ be $R_1 \wedge \cdots \wedge R_m$ and $R'_1 \wedge \cdots \wedge R'_m$, respectively, where $R_i$ and $R'_i$, $1 \le i \le m$, are atomic formulas of $L_\Sigma$. Then according to Definition 5.2.3, the URC's identified by $q$ and $q'$ are $\bigcup_{i \in \{1,\ldots,m\}} DEF'_i(DB, q)$ and $\bigcup_{i \in \{1,\ldots,m\}} DEF'_i(DB, q')$, respectively, where for each $i$, $DEF'_i(DB, q) = \{DEF(DB, R_i)\}$ and $DEF'_i(DB, q') = \{DEF(DB, R'_i)\}$. By the fact that $q'$ shows more precise URC's than $q$ it is meant that the following relationships hold: (i) for no $j$, $1 \le j \le m$, $DEF(DB, R_j) \subset DEF(DB, R'_j)$, and (ii) for some $i$, $1 \le i \le m$, $DEF(DB, R'_i) \subset DEF(DB, R_i)$.

Showing the equivalence of $q'$ and $q$ in a formal way is the content of Section 7.2. In the following it is first demonstrated that $q'$ shows more precise URC's than $q$, along with illustrating "|-->" by an example.

Example 7.1.2 Let the query $q_1$ in Example 5.2.2 be considered,

$q_1 = \exists x\, \exists y\, \exists z^{\Sigma J}\, \exists u\, \exists v\, (Sa(x,y,z^{\Sigma J}) \wedge De(y,u,v))$.   (7.1)

Here it is shown how the IA axiom (6.4) in $KB_{\Sigma H}$ is applied to $q_1$ of (7.1) to derive $q'$ in a purely syntactic way by "|-->". In Example 7.1.1, it has been shown that the IA axiom (6.4) can be Skolemized into the following form:

$\forall y\, \forall x\, \forall z^{\Sigma P}\, (Sa(x,y,z^{\Sigma P}) \rightarrow De(y, g(y,x,z^{\Sigma P}), f^{Q}(y,x,z^{\Sigma P})))$,   (7.2)

where $g(y,x,z^{\Sigma P})$ and $f^{Q}(y,x,z^{\Sigma P})$ are Skolem functions. (7.2) clearly shows how the relations SALES and DEALERS are related fragment by fragment, namely, CAR_SALES of SALES and CAR_DEALERS of DEALERS.

Now let the matrices of (7.1) and (7.2) be $q$ and $K_j$, respectively. The following hold: $Sa(x,y,z^{\Sigma J})$ in $q$ matches $K_j$, since for the antecedent $Sa(x,y,z^{\Sigma P})$ of $K_j$ it holds that (i) the predicate symbol $Sa$ in $q_i$ is the only predicate symbol in the antecedent of $K_j$ and (ii) $DEF(DB(Auto), Sa(x,y,z^{\Sigma J})) \subset DEF(DB(Auto), Sa(x,y,z^{\Sigma P}))$; and $q$ is restrictable by $K_j$, since there is a restriction pair $(De(y, g(y,x,z^{\Sigma P}), f^{Q}(y,x,z^{\Sigma P})),\ De(y,u,v))$ where $Ran(f^{Q}(y,x,z^{\Sigma P})) \subset Ran(v)$. Therefore, by substituting $v$ in $De(y,u,v)$ by the variable $w^{\Sigma Q}$ whose range is identical to that of $f^{Q}(y,x,z^{\Sigma P})$, $q'$ is concluded to be

$q' = Sa(x,y,z^{\Sigma J}) \wedge De(y,u,w^{\Sigma Q})$.   (7.3)

The URC's indicated by $q'$ in (7.3) are the set of the defined relations $DEF(DB(Auto), Sa(x,y,z^{\Sigma J}))$ and $DEF(DB(Auto), De(y,u,w^{\Sigma Q}))$. When these are compared with the URC's indicated by $q$, i.e., the set of the defined relations $DEF(DB(Auto), Sa(x,y,z^{\Sigma J}))$ and $DEF(DB(Auto), De(y,u,v))$, it is clear that $q'$ shows more precise URC's than does $q$ (see Figure 7.1).

The URC's from $q$

$DEF(DB(Auto), Sa(x,y,z^{\Sigma J}))$
div#    d#     item#
01AP    01A    V01
02AP    01A    B47
04AP    01A    V03
05AP    55L    V03

and

$DEF(DB(Auto), De(y,u,v))$
d#     address      d_type
01A    Ann Arbor    51
03A    Dearborn     30
07A    Flint        50
26M    Cleveland    20
33B    Cleveland    30
48B    Rockford     31
55L    Flint        51
65B    Detroit      20
66L    Nile         23
70A    Lansing      70

The URC's from $q'$

$DEF(DB(Auto), Sa(x,y,z^{\Sigma J}))$
div#    d#     item#
01AP    01A    V01
02AP    01A    B47
04AP    01A    V03
05AP    55L    V03

and

$DEF(DB(Auto), De(y,u,w^{\Sigma Q}))$
d#     address      d_type
01A    Ann Arbor    51
07A    Flint        50
55L    Flint        51

Figure 7.1. The URC's Revealed to the Relations SALES and DEALERS
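The five steps of the procedure and Example 7.1.2 can be traced with the following sketch. It is only an illustration under several assumptions: the sort domain of an attribute is approximated by the values occurring in that column, the range J is hypothetical, condition (2) of "match" is tested as subset-or-equal while the range test of "restrictable" is tested as proper subset, and the encoding of formulas (tuples, dictionaries, and the names `DEF`, `infer`, `sort_domain`) is invented here and is not the representation used by the KBS.

```python
from itertools import combinations

# Rows as shown in Figure 7.1.
SALES = [("01AP", "01A", "V01"), ("02AP", "01A", "B47"),
         ("04AP", "01A", "V03"), ("05AP", "55L", "V03")]
DEALERS = [("01A", "Ann Arbor", 51), ("03A", "Dearborn", 30), ("07A", "Flint", 50),
           ("26M", "Cleveland", 20), ("33B", "Cleveland", 30), ("48B", "Rockford", 31),
           ("55L", "Flint", 51), ("65B", "Detroit", 20), ("66L", "Nile", 23),
           ("70A", "Lansing", 70)]
DB = {"Sa": SALES, "De": DEALERS}

P = {"B47", "V01", "V03", "W09"}  # GDA axiom (6.1)
Q = {50, 51}                      # GDA axiom (6.1)
J = {"B47", "V01", "V03"}         # hypothetical range of z in the query


def sort_domain(pred, pos):
    """Assumption: a sort domain is approximated by the values in the column."""
    return {row[pos] for row in DB[pred]}


def DEF(atoms, ran):
    """Rows (concatenated in predicate order) that satisfy a conjunction of atoms."""
    results = [({}, ())]
    for pred, args in sorted(atoms):
        extended = []
        for env, acc in results:
            for row in DB[pred]:
                new_env = dict(env)
                # Each value must lie in the variable's range and agree with
                # any earlier binding of the same variable (join condition).
                if all((a not in ran or val in ran[a])
                       and new_env.setdefault(a, val) == val
                       for a, val in zip(args, row)):
                    extended.append((new_env, acc + row))
        results = extended
    return {acc for _env, acc in results}


def infer(q_atoms, q_ran, kb):
    """Return the variable ranges of q' after applying every matching K_j."""
    new_ran = dict(q_ran)                                  # Step 1: q' = q
    for size in range(1, len(q_atoms) + 1):                # Steps 2 and 5: scan SUB(q)
        for q_i in combinations(q_atoms, size):
            preds = {p for p, _ in q_i}
            for K in kb:                                   # Steps 3 and 4
                R, terms = K["consequent"]
                if R in preds or preds != {p for p, _ in K["antecedent"]}:
                    continue                               # match condition (1) fails
                if not DEF(q_i, q_ran) <= DEF(K["antecedent"], K["ran"]):
                    continue                               # match condition (2) fails
                for pred, args in q_atoms:                 # look for restriction pairs
                    if pred != R:
                        continue
                    for pos, (var, term) in enumerate(zip(args, terms)):
                        if term not in K["skolem"]:
                            continue
                        ran_t = K["ran"].get(term, sort_domain(R, pos))
                        ran_v = new_ran.get(var, sort_domain(R, pos))
                        if ran_t < ran_v:                  # proper subset: restrict
                            new_ran[var] = ran_t
    return new_ran


# Matrix of (7.1) and the Skolemized IA axiom (7.2).
q_atoms = [("Sa", ("x", "y", "z")), ("De", ("y", "u", "v"))]
q_ran = {"z": J}
kb = [{"antecedent": [("Sa", ("x", "y", "z"))],
       "ran": {"z": P, "fQ": Q},
       "consequent": ("De", ("y", "g", "fQ")),
       "skolem": {"g", "fQ"}}]

restricted = infer(q_atoms, q_ran, kb)
urc_dealers = DEF([("De", ("y", "u", "v"))], restricted)
```

With these inputs the range of v is narrowed to Q, so the DEALERS URC shrinks from ten rows to the three car dealers listed on the right of Figure 7.1, while the SALES URC is unchanged.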

The result illustrated in the preceding example is formalized as follows:

Theorem 7.1.1 If $KB^{s}_{\Sigma H};\ q$ |--> $q'$ and $q' \ne q$, then the following relationships hold between $\bigcup_{i \in \{1,\ldots,m\}} DEF'_i(DB, q)$ and $\bigcup_{i \in \{1,\ldots,m\}} DEF'_i(DB, q')$: (i) for no $j$, $1 \le j \le m$, $DEF(DB, R_j) \subset DEF(DB, R'_j)$, and (ii) for some $i$, $1 \le i \le m$, $DEF(DB, R'_i) \subset DEF(DB, R_i)$.

Proof. When $q$ is restricted by a formula in $KB^{s}_{\Sigma H}$, only the following type of modification is made on $q$: a variable, say $v$, in $q$ is replaced by some variable, say $w$, satisfying the condition $Ran(w) \subset Ran(v)$. The theorem follows immediately. Q.E.D.

Having introduced the inference procedure "|-->", it can be pointed out more clearly what advantages are obtained by using $L_\Sigma$ as the tool for describing queries and the knowledge about the data. This can be discussed in the context of what problems could have occurred if the queries and the knowledge about the data were expressed in an ordinary many-sorted language ($L_m$). When the queries and the knowledge about the data are expressed in $L_m$, symbolic manipulation of these two objects entails extra computation which is unnecessary if these two objects are expressed in $L_\Sigma$. Such extra computation is caused by the loss of "a form of meta knowledge" which otherwise is embedded and maintained in the aggregate variables of $L_\Sigma$.

The query $q_1$ of (7.1) and the IA axiom, say $K$, of (6.4) are considered.

$q_1 = \exists x\,\exists y\,\exists z^{\Sigma P}\,\exists u\,\exists v\,(Sa(x,y,z^{\Sigma P}) \wedge De(y,u,v))$.

$K = \forall y\,(\exists x\,\exists z^{\Sigma P}\, Sa(x,y,z^{\Sigma P}) \rightarrow \exists u\,\exists v^{\Sigma Q}\, De(y,u,v^{\Sigma Q}))$.

Let the aggregate variables in the two formulas $q_1$ and $K$ be unraveled into relativized expressions in $L_m$. Let $q_1^m$ and $K^m$ be the resulting relativized expressions equivalent to $q_1$ and $K$, respectively:

$q_1^m = \exists x\,\exists y\,\exists z\,\exists u\,\exists v\,(Sa(x,y,z) \wedge P^{*}(z) \wedge De(y,u,v))$,

$K^m = \forall y\,(\exists x\,\exists z\,(Sa(x,y,z) \wedge P^{*}(z)) \rightarrow \exists u\,\exists v\,(De(y,u,v) \wedge Q^{*}(v)))$,

where the symbol * is used to designate that the atomic formulas with * are exclusively used for the purpose of variable range restriction. For convenience, from here on the following convention is made: although, strictly speaking, the range of a variable in the relativized expressions, such as $z$ in $q_1^m$, is the sort to which the variable belongs, by the range of such a variable is meant the relation indicated by the atomic formula which has the variable as its only argument and is superscripted with *.

The two formulas $q_1$ and $q_1^m$ are considered. In $q_1$ the range of $z^{\Sigma P}$ in $Sa(x,y,z^{\Sigma P})$ is determinable as the relation $P$ from the variable itself, since $z^{\Sigma P}$ itself contains the information about its own range. In contrast, in $q_1^m$ the range of $z$ in $Sa(x,y,z)$ cannot be determined as the relation $P$ unless it is tested whether there is an atomic formula with * in $q_1^m$ which has $z$ as its only argument and whose predicate symbol designates the relation $P$, i.e., $P^{*}(z)$. When aggregate variables are unraveled into relativized expressions, such a range determination test becomes necessary since, unlike the aggregate variables, the variables in the relativized expressions no longer contain the range restriction information on the

variables themselves. A similar argument can be applied to $z^{\Sigma P}$ and $v^{\Sigma Q}$ of $K$ and $z$ and $v$ of $K^m$.

What has been argued in the preceding paragraph is now discussed in detail. It is shown why the range determination test means extra computation in the symbolic manipulation. First it is formally stated how a formula in $L_\Sigma$ is expressed in terms of a relativized expression in $L_m$. Let a formula $\alpha_\Sigma$ in $L_\Sigma$ be

$\alpha_\Sigma = \exists z^{\Sigma P}\,(\,\cdots\,)$.

Then $\alpha_\Sigma$ is translated into $\alpha_m$ in $L_m$ as follows:

$\alpha_m = \exists z\,(\,\cdots \wedge P^{*}(z))$.

In $\alpha_m$ the symbol * is used as a notational convenience, i.e., to indicate that the atomic formulas with * are exclusively used for the purpose of variable range restriction.

Let the queries and the knowledge about the data which were expressed in $L_\Sigma$ be expressed by the relativized expressions in $L_m$. Let an inference procedure, namely "$\longmapsto^{m}$", be developed which is applicable to the queries and the knowledge expressed in $L_m$. Let "$\longmapsto^{m}$" consist of five steps, each of whose function is identical with that of its corresponding step of "$\longmapsto$". Let the superscript $m$ be used to mark the notation used in "$\longmapsto^{m}$" so that it can be distinguished from the corresponding notation used in "$\longmapsto$"; for instance, $q^m$ and $K^m$ are now formulas in $L_m$. The inference procedure "$\longmapsto^{m}$" is the following:

Inference Procedure "$\longmapsto^{m}$"

Step $1^m$. Let $q'^{m} = q^{m}$, $W^{m} = SUB^{m}(q^{m})$, and go to Step $2^m$.

Step $2^m$. Let $q_i^m$ be an element of $W^m$, and go to Step $3^m$.

Step $3^m$. Let $MATCH^{m}(q_i^m)$ be all the formulas in the knowledge base which match $q_i^m$. If $MATCH^{m}(q_i^m)$ is empty, go to Step $5^m$; otherwise go to Step $4^m$.

Step $4^m$. Do while $MATCH^{m}(q_i^m)$ is not empty:
1. let $K_j^m$ be an element of $MATCH^{m}(q_i^m)$,
2. $MATCH^{m}(q_i^m) = MATCH^{m}(q_i^m) - K_j^m$, and
3. let $q'^{m}$ be restricted by $K_j^m$ only if $q'^{m}$ is restrictable by $K_j^m$;
and go to Step $5^m$.

Step $5^m$. Let $W^m = W^m - q_i^m$. If $W^m$ is empty, stop; otherwise, go to Step $2^m$.

"$\longmapsto^{m}$" differs from "$\longmapsto$" in the following respects. At Step $1^m$, $SUB^{m}(q^m)$ is constructed from only the subformulas of the part of $q^m$ which does not include any atomic formulas with *. Doing so is appropriate since the atomic formulas with * in $q^m$ are irrelevant to constructing $MATCH^{m}(q_i^m)$ at Step $3^m$. At Step $3^m$, when $MATCH^{m}(q_i^m)$ is constructed, the presence of the atomic formulas with * is ignored in $q_i^m$ and in each member of the knowledge base. When determining whether all the predicate symbols of $q_i^m$ are in the antecedent $\psi_j^m$ of $K_j^m$ and no other predicate is in $\psi_j^m$, atomic formulas with * need not be considered, since their usage has nothing to do with determining what relations are involved in $q_i^m$ and $\psi_j^m$.
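The control structure of the five steps can be sketched in a few lines. What follows is a simplified illustration under stated assumptions, not the procedure as defined in Chapter VI: atoms carry only a predicate name and one optional range per argument position, the shared variables between atoms are ignored, `matches` approximates the matching condition by comparing predicate sets and ranges, and `restrict` narrows a range only when the new range is strictly smaller. The names `sub_formulas`, `matches`, `restrict`, `infer` and the dictionary layout of an axiom are all hypothetical.

```python
from itertools import combinations

# An atom is (predicate, ranges). `ranges` has one entry per argument:
# None for an unrestricted variable, a frozenset for a restricted range.
P = frozenset({"B47", "V01", "V03"})  # illustrative range of item# values
Q = frozenset({50, 51})               # illustrative range of d_type values


def sub_formulas(q):
    """SUB(q): every non-empty conjunction of atoms of q (2**n - 1 of them)."""
    return [c for r in range(1, len(q) + 1) for c in combinations(q, r)]


def matches(q_i, k):
    """Simplified matching: same predicate symbols, and each range of q_i
    lies within the corresponding range of the antecedent of k."""
    if {p for p, _ in q_i} != {p for p, _ in k["antecedent"]}:
        return False
    antecedent = dict(k["antecedent"])
    for pred, ranges in q_i:
        for r_q, r_k in zip(ranges, antecedent[pred]):
            if r_k is not None and (r_q is None or not r_q <= r_k):
                return False
    return True


def restrict(q, k):
    """Return q with one range narrowed by the consequent of k,
    or None when q is not restrictable by k."""
    pred, pos, new_range = k["consequent"]
    for n, (p, ranges) in enumerate(q):
        old = ranges[pos]
        if p == pred and (old is None or new_range < old):
            narrowed = ranges[:pos] + (new_range,) + ranges[pos + 1:]
            return q[:n] + ((p, narrowed),) + q[n + 1:]
    return None


def infer(q, kb):
    revised = q                                      # Step 1
    w = sub_formulas(q)
    while w:                                         # Steps 2 and 5
        q_i = w.pop()
        match = [k for k in kb if matches(q_i, k)]   # Step 3
        for k in match:                              # Step 4
            candidate = restrict(revised, k)
            if candidate is not None:
                revised = candidate
    return revised


# Sa(x, y, z) with Ran(z) = P, and De(y, u, v) with v unrestricted.
q = (("Sa", (None, None, P)), ("De", (None, None, None)))
# One axiom: Sa with third argument in P implies De with third argument in Q.
kb = [{"antecedent": [("Sa", (None, None, P))], "consequent": ("De", 2, Q)}]
q_prime = infer(q, kb)
```

In this toy setting `q_prime` keeps the `Sa` atom unchanged and narrows the third argument of `De` to `Q`, mirroring the step from (7.1) to (7.3). The outer loop runs once per element of `sub_formulas(q)`, which is where the $2^n - 1$ factor in a complexity estimate comes from.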

94 The complexity of " — " " is discussed. Let the following notations be additionally introduced: the number of atomic formulas with * in q" a(i) the number of atomic formulas with * in the i subformula q," A(j) the number of atomic formulas with * in the antecedent of the jth knowledge K, in KBn". y(j) the number of atomic formulas with * in the consequent of the ji^ knowledge K, in KB." At Step 1", since the total number of possible subformulas of q' is also 2' - 1, | W = 2' - 1. This means that the outermost loop (i.e., Step 2" - Step 5" -- Step 2") of the procedure ".-+m, is also repeatedly carried out as many times as 2' - 1 at most. At Step 3", since the presence of the atomic formulas with * is ignored in constructing MATCH" (q,"), determining whether all the predicate symbols of q," are in the antecedent ~, of K" and no other predicate is in sb requires the same complexity as that of " —," i.e., 6(i)2. However, after the preceding condition has been tested, when determining whether DEF (DB (Auto), qm) C DEF(DB(Auto), ^sm) additional comparisons are needed. Determining DEF(DB (Auto ), q,) C DEF(DB (Auto),,j) requires to know the ranges of the variables of q,^ and l". Since the ranges of these variables are specified in terms of the atomic formulas with *, what has been called range determination test must be made for the variables in q," and tm, i.e., the ranges of the

95 variables in q,^ and gm must be derived from the set of atomic formulas with * in q,m and the set of atomic formulas with * in tPm, respectively. Since the number of the atomic formulas with * in q,m is a(i) and the number of variable in qim is X(i), determining the ranges of the variables in q,m requires comparisons of no more than () () a(i)(i r, rf=l Similarly, determining the ranges of the variables of sum requires comparisons of no more than OMj) = (j) O M - (j)(J). Once the ranges of the variables in q, and tr" are determined, the complexity of determining whether DEF(DB(Auto), q,) C DEF(DB(Auto),,j) is identical with that of Step 3 of " j-+." Thus the overall complexity of generating MATCHm (q,m) at Step 3m requires comparisons of at most S (i) + (i)^(i) + #(j)(j) + P(i j)) J -=1 Range determination test is also needed when restriction is made at (3) of Step 4", i.e., ranges of the terms in the consequent of K,m and their corresponding variables in q' need to be determined. Since the consequent of K' and q " have at most (j ) and e atomic formulas with *, respectively, the test requires comparisons of no more than rif~)+e.

96 Thus the complexity of (3) of Step 4" is at most ~ ^ ((-j) + + CD ). m=1 Let N be Maz(N1, N2) where N1 is the number of atomic formulas with * in qm and N2 is the largest possible number of atomic formulas with * in the antecedent l"' of any K" E KB'. Then for any i, 1 < i < 2 - 1, a(i)< N, for any j, 1 j n, (j)< N and (j) < N, and e<N. The overall complexity of" I-~ " is concluded as follows: 2"-1 a " % S S ((i)2+ a(i)x(i) + 6(j)l() + P(i, j) )+ S (-y) + e + C ) '-rl A jl==m=l 2-1 P < 3 S I L2 + 2NK + MK( 2ogMK + ) + (2N + M2 )K. =1 J ==1 It has been shown previously that the complexity of " A-" which corresponds to the preceding complexity of " |-+m, is 2'-1 a S E IL2 + MK(21ogMK + 1)+ M2K. Therefore, it is concluded that the complexity of "I —m n is augmented by 2' -1 E 1 4NK I. i Jol j ==1 The preceding term signifies how much extra computation is entailed in | -m " which is unnecessary in " i-." The amount of extra computation depends on the database, the queries and the knowledge about the data. At the end of Section 6.2, it has been mentioned that KB.H is expanded by a class of equivalent axioms to the IA axioms in KB to enlarge the class of queries to be handled by the KBS. Finally, in the rest of the section it is illustrated how the

knowledge base expanded by the equivalent axioms is applied to a query to which it otherwise could not be applied. Suppose the user query $q_1$ in (5.2) had been equivalently given as $q_2$,

$q_2 = \exists x\,\exists y\,\exists z\,\exists w\,\exists t^{\Sigma S}\,\exists u\,\exists v\,(It(z,w,t^{\Sigma S}) \wedge Sa(x,y,z) \wedge De(y,u,v))$

where $\forall z\,(S(z) \leftrightarrow (z = sedan \vee z = bus))$. Let $q$ be the matrix of $q_2$,

$q = It(z,w,t^{\Sigma S}) \wedge Sa(x,y,z) \wedge De(y,u,v)$. (7.4)

Then no subformula of $q$ matches the IA axiom (7.2), although $It(z,w,t^{\Sigma S}) \wedge Sa(x,y,z)$ of (7.4) "semantically" matches (7.2) in the sense that

$DEF(DB(Auto),\ \exists w\,\exists t^{\Sigma S}\,(It(z,w,t^{\Sigma S}) \wedge Sa(x,y,z))) \subseteq DEF(DB(Auto),\ Sa(x,y,z^{\Sigma P}))$.

However, the revised version, say $q'$, of $q$ in (7.4) can still be derived by using the IA axiom (6.11), which was previously shown equivalent to (7.2). First, (6.11) is Skolemized into

$\forall y\,\forall x\,\forall z\,\forall w\,\forall t^{\Sigma S}\,(Sa(x,y,z) \wedge It(z,w,t^{\Sigma S}) \rightarrow De(y,\ g(y,x,z,w,t^{\Sigma S}),\ f^{Q}(y,x,z,w,t^{\Sigma S})))$, (7.5)

where $g(y,x,z,w,t^{\Sigma S})$ and $f^{Q}(y,x,z,w,t^{\Sigma S})$ are the Skolem functions substituted for the variables $u$ and $v^{\Sigma Q}$ of (6.11), respectively. Then a similar procedure can be applied to (7.5) and (7.4), as was done for (7.2) and the matrix of (7.1), to conclude $q'$,

$q' = It(z,w,t^{\Sigma S}) \wedge Sa(x,y,z) \wedge De(y,u,w^{\Sigma Q})$. (7.6)

It is clear that (7.6) shows more precise URC's than (7.4).

98 7.2. Correctness of the Inference Procedure In general, for any symbolic manipulation procedure designed to carry out inference, it must to be justified whether the result obtained syntactically is indeed valid semantically. For q E Q ', let KBn; q 1 — q'. What matters is whether the revised query q 'is equivalent to the original query q. In this section the issue of equivalence between the revised query and the original query is discussed. As a preliminary step a lemma is first presented. The following notations are used in the lemma and elsewhere in this section: For a formula a(zx, '* *, n ), let <a, ~*, a, > stand for a variable assignment such that 1=B a(z 1, - -, z) [a I, aj *. For such assignment <al,, an >, a, DB is called an assignment element, or just an element, corresponding to,. Then by <a1,,a >lI z,, ij E{i,, ',i,, it is meant the element a,, which corresponds to z,. For a subformula (z,,. z,) of a(z, "', ), {il,. ' *,i {1, *,n}, <a1, ', >,> stands for the subassignment <a,1,,a, > of <a,., a, > such that k=DB X('lk * ia.)ail **a * aSlLemma 7.2.1 For q E Q, let KB2; q H- ' q. For some assignment <al, ' m> if I=DB q [a1,, amO then there is an assignment <a, * *, a> satisfying =OB q[ a, * *, al. Proof. The inference procedure " 1-+" is a process of revising the input query q to another form by applying each element in KBJH until q can no longer be

99 revised. Therefore it suffices to show that for each Kj E KB n if Kj; q — + q then the lemma holds. Let Kj; q -- q'. Without loss of generality, let Kj ' be of the following form which is the original form of Kj before it is Skolemized and its quantifiers are dropped: K, ' == vzl.. Vz z (b,(,-, -, z,,)- 3i-'. 3z R(Z, - -,,, za,k -, )), where {z,, * *, z,)} {zl,, z, }. Let q and q' which both have m, n + k < m,free variables be expressed as q (y,, y, ) and q'(y,,yV), respectively. Let some q,(y,1,^, vs ) E SUB({q), {Vy,,,,. y} C {y,,, }, match K,,i.e., q, and iJ include an identical set of predicate symbols and DEF(DB, q,) C DEF(DB, O,). The proof is shown in a constructive way. For an assignment <a, a,,,>, let it hold that q [al, *, al. Since q, matches K,, it follows that e8 ' (<al, * *a *,a. >|qil - Since K, is true in DB, it follows that there is an assignment, say, <dl, * * *, d,el, * *, e >, satisfying tr= D& nf R dl,, * *, d, c, *,, and <dl, - ', d,,, ***,e >.j= - <dl, * -,d,> - <al, * * *,a,> l q. Let an atomic formula, say R (yl,, * ~, yv, *V *, v ), {lu, * *,v'y ', ^', v } C {,.,V^ }, in q and the consequent

100 R(z1, ' ''h Z, z I., i) in Kj ' constitute the restriction pair (R, R,) associated with q, and K,. For some {y,l,, y,} C {y,,.., y,}) in R~ and some {(z,., z,,} C {z, -, z} in R, let Ran(y,) C Ran(z,), 1 < r < I, which therefore means q' is obtained by restricting q by K, in the following way: For each r, 1 r < I, substitute y~, by a variable v, whose range Ran (v,) = Ran (z,). Accordingly, a new assignment <a;,, a,, > can be constructed from <al, *, a.> and <dl, *,d,e,, I, e> in the following way: For each r, 1< r < l, the element <al,,a > y, in <al,, a^ > is replaced by the element <d,,d, e,, e, e' > I z. Let the resulting <a1, * a,> be <a;, *, a >. From the way q is obtained by restricting q by K, and from the way <a;, *-, a, > is constructed from <a1, * > and <dl, *,d,e,, >, it follows that t=8 [ la;, ***,. as Q.E.D. The preceding lemma can be said to signify the soundness of" [ —" in the following sense: if q, and qC are the existential closures of q and q, respectively, then KB,; q — > q' implies KBE1c U {qc} t= q. The soundness of "- |-" can be easily understood by the following argument: as long as it is known that all the dealers who are supplied cars are car dealers and that there are some dealers who have been supplied items B47, V01, or V03 which are cars, then it is valid to conclude that there are some dealers who are car dealers.

Lemma 7.2.1 is used in showing the equivalence of $q'$ and $q$. The issue of the equivalence of the revised query $q'$ and the original query $q$ is directly related to why only the IA axioms having corresponding FDA axioms in KB are included when the knowledge base used by the inference procedure is constructed from KB. Before showing their equivalence in a formal way, the role of FDA axioms in their equivalence is first illustrated in the following.

Consider the functional dependency axiom in DEALERS from d# to d_type,

$\forall x\,\forall y\,\forall z\,\forall y'\,\forall z'\,(De(x,y,z) \wedge De(x,y',z') \rightarrow z = z')$, (7.7)

and the IA axiom depicting a relationship between fragments of SALES and DEALERS,

$\forall y\,(\exists x\,\exists z^{\Sigma P}\, Sa(x,y,z^{\Sigma P}) \rightarrow \exists u\,\exists v^{\Sigma Q}\, De(y,u,v^{\Sigma Q}))$ (7.8)

where $P$ is defined by $\forall z\,(P(z) \leftrightarrow (z = B47 \vee z = V01 \vee z = V03 \vee z = W09))$ and $Q$ by $\forall z\,(Q(z) \leftrightarrow (z = 50 \vee z = 51))$.

It is not difficult to see that only when (7.7) is combined with (7.8) is (7.8) interpreted as "All the dealers who are supplied items B47, V01, V03 or W09 are exclusively car dealers". (7.8) alone only asserts "Any dealer who is supplied items B47, V01, V03 or W09 is a car dealer although that dealer may deal in other items". This implies that the IA axiom (7.8) requires the existence of (7.7) in the knowledge base to guarantee that the two fragments associated with the consequent of (7.8) are disjoint, i.e.,

$DEF(DB(Auto),\ De(y,u,v^{\Sigma Q})) \cap DEF(DB(Auto),\ De(y,u,v^{\Sigma Q'})) = \emptyset$

where $\forall z\,(Q'(z) \leftrightarrow \neg Q(z))$. The above argument can be illustrated more concretely by entering the tuple < 01A, Lansing, 80 > in DEALERS, in which case the functional dependency in

102 DEALERS from d# to d_type no longer exists. Let the database with the new tuple be DB(Auto)'. It is clear that (7.8) is still valid both in DB(Auto) and in DB (Auto )', but (7.7) is valid only in DB (Auto). In this case, the following is clear: DEF(DB (Auto, De(y,u,v O)) n DEF(DB(Auto Y, De (y,u,,v )) = {< 01A, Lansing, 80 >}. The preceding argument is formalized by the following theorem. Theorem 7.2.2 Equivalence of q and q'. For q E Qc, let KB; q I —>q. Then DEF(DB,q)= DEF(DB,q'). Proof. For the same reason stated at the beginning of the proof of Lemma 7.2.1, it suffices to show that for each Kj E KB1 if Kj; q \-> q then DEF(DB,q ) DEF(DB,q '). Let K; q -- q'. Showing that DEF(DB,q) C DEF(DB,q) is trivial, because q' is a restricted version of q. Here it is only shown that DEF(DB,q) C DEF(DB,q') holds. Without loss of generality, let Kj ' be of the following form which is the original form of Ki before it is Skolemized and its quantifiers are dropped: Kj = Vz * * Vz, (3i1(Z,, z -, 3Zi **. 3Lzt R (z,,. * *,,..., * * )), where (z,,, * * * } {Z *, Z }. Let q and q which both have m n + k < m, free variables be expressed as q(y, l, *, Y) and q'(yl,,y ), respectively. Let some qi(y, * * *, y.) SUB(q), {yl, *.,, C} C {yI, ~* * *, )}, matches K,, i.e., q, and ij include an identical set of predicate symbols and

103 DEF(DB,, ) C DEF(DB,,, ). Let an atomic formula, say Rq(, * *,,,, * * * v,', {y,) * * *, (Yu, Yv''l * *, vY } C {i, * *,, }, in q and the consequent R(zl, * * * z l, Z * *, Z) in K, ' constitute the restriction pair (R, R,) associated with q, and Kj. For some {y,, *, y)} _ {y,,,, y} in R, and some {zl, *., z,} 5C {z, * *, z}) in R, let Ran,(y,) C Ran(z (), 1 < r < i, which therefore means q ' is obtained by restricting q by K, in the following way: For each r, 1 < r < l, substitute y,, by the variable v, whose range Ran (t,) = Ran (z, ). Let the FDA axiom corresponding to Kj be of the form Vs ' V*h Vst, — I * * Vt I ''* Vt (R(, (8,I I, 1,I t)n R(81,.,,, ', *- ).- t= t l., n * * n,=t,, ) * (1) where {wu, * *,w l} {1, * *, k. (1) states that there is a functional dependency in R from 9h, * *, s to t,, * *, tr. Let <al, *, a,m > E DEF(DB, q). In order to prove that DEF(DB, ) C DEF(DB, q') holds, it must be shown that <a, * *, a, > E DEF(DB q ') holds. This is shown by contradiction. First, from the hypothesis <a, *, a,> E DEF (DB, q) it is implied that 1= F q [(a,, * a,*1 *'" (2). By Lemma 7.2.1, (2) implies that there is an assignment <a1, * *, a, > satisfying t=8 q C, * *, a! *** (3). Since q' is a restricted version of q, (3) implies that Do 4q [a;, **., a,, ** * (4).

104 To prove by contradiction, let <a, ', a^ > f DEF(DB, q ) additionally hold. This additional hypothesis implies that b q ', *' *, *** (5). Now (3) and (5) are considered. From (3), (5) and that the way <aI, * *, am > of (3) is constructed [see the proof of Lemma 7.2.1], the following holds: <a, * *, a, > and <a1, *, a,> are identical except that for some w,, w6{, * *,,,I,,<al,,,, > <a;, * ' ', am > I,, Now <a1, *,a > in(4) is considered. Let <h;,,b, c, *,c b > be <al,, a, > RI (y.i,, * *. ^, yv,, * * *V y)- Then from (4) it follows that tD R, [b;,., b, c, **,c c 1 * * (7). Now <al, *, a,> in (2) is considered. Let <l, * *, b,cl,., ck> be <a1,.*, am > I Rq(y,., * *, tuI '', Yv*). Then from (2) it follows that I=DB Rq ibl, *-*, b,, *- *, * ** (8). From (6), (7) and (8), the followings are concluded: (i) <bi, ***, b> <bl, ***,i > =and (ii) for some w,,,, E { *, },w}, <bi, b* *, b,, * *,C> I y>, y <b., cb, C', * * *, Ck > I W, (i) and (ii) implies that there is no functional dependency in Rq from Yu,,, y, to yW,, * *, y. This further implies that there is no functional dependency in R from s8, * *,, to tw, * *, ta since Rq and R are identical predicates. This fact contradicts (1). Thus it follows that <a, *, am> E DEF(DB, q) that means DEF(DB,q) C DEF(DB,q') holds. Q.E.D.
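The role that the functional dependency (7.7) plays in this equivalence argument can be checked concretely on the DEALERS data. The sketch below is illustrative only: it takes Q to be the d_type values 50 and 51, adds the tuple < 01A, Lansing, 80 > used in the text, and uses two hypothetical helpers, `holds_fd` and `dealers_on_both_sides`, that are not part of the formal development.

```python
# DEALERS tuples (d#, address, d_type) as listed in Figure 7.1.
DEALERS = [
    ("01A", "Ann Arbor", 51), ("03A", "Dearborn", 30), ("07A", "Flint", 50),
    ("26M", "Cleveland", 20), ("33B", "Cleveland", 30), ("48B", "Rockford", 31),
    ("55L", "Flint", 51), ("65B", "Detroit", 20), ("66L", "Nile", 23),
    ("70A", "Lansing", 70),
]
Q = {50, 51}  # assumed car-dealer d_type values


def holds_fd(relation, lhs, rhs):
    """True if the attribute at position `lhs` functionally determines
    the attribute at position `rhs` in `relation`."""
    seen = {}
    for t in relation:
        # setdefault records the first rhs value seen for this lhs value.
        if seen.setdefault(t[lhs], t[rhs]) != t[rhs]:
            return False
    return True


def dealers_on_both_sides(relation):
    """d# values occurring both with a d_type in Q and with one outside Q."""
    car = {t[0] for t in relation if t[2] in Q}
    non_car = {t[0] for t in relation if t[2] not in Q}
    return car & non_car


# DB(Auto)' of the text: DEALERS with the extra tuple entered.
extended = DEALERS + [("01A", "Lansing", 80)]

print(holds_fd(DEALERS, 0, 2), holds_fd(extended, 0, 2))
print(dealers_on_both_sides(DEALERS), dealers_on_both_sides(extended))
```

With the dependency from d# to d_type in force, no dealer falls on both sides of the split induced by Q. Once the extra tuple is entered the dependency fails and dealer 01A appears on both sides, which is the situation the FDA axiom is required to exclude.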

7.3. Horizontal Partitioning

In this section two issues are discussed: how the estimated URC's are used for partitioning the relations; and how the partitions of the relations obtained by this approach should be interpreted. The former issue is straightforward. The notion of a bipartition is first introduced: Given a revised query expression, say $q'$, let $R(q')$ be an atomic formula appearing in $q'$. Let the bipartition of the relation $R_{DB}(q')$ obtained by $R(q')$, denoted by $\Pi_b(R(q'))$, be defined as

$\Pi_b(R(q')) = \{\,DEF(DB, R(q')),\ \overline{DEF}(DB, R(q'))\,\}$

where $\overline{DEF}(DB, R(q')) = R_{DB}(q') - DEF(DB, R(q'))$.

When $KB; q \longmapsto q'$, the revised query expression $q'$ can then be viewed as a way of obtaining a bipartition of each relation referred to by $q'$. That is, a relation referred to by $q'$ is divided into two fragments, one part $DEF(DB, R(q'))$ that is needed to answer $q'$ and the other $\overline{DEF}(DB, R(q'))$ that is not needed. The set of these two fragments is conceived as a bipartition of the relation $R_{DB}(q')$. For instance, from the revised query expression $q'$ in (7.3),

$q' = Sa(x,y,z^{\Sigma P}) \wedge De(y,u,w^{\Sigma Q})$,

a bipartition of DEALERS, namely, CAR_DEALERS and NON_CAR_DEALERS, and a bipartition of SALES, namely, $SALES_{P}$ and $SALES_{\overline{P}}$, can be obtained. These two bipartitions are illustrated in Figure 7.2:

Bipartition of DEALERS

CAR_DEALERS
d#    address     d_type
01A   Ann Arbor   51
07A   Flint       50
55L   Flint       51

NON_CAR_DEALERS
d#    address     d_type
03A   Dearborn    30
26M   Cleveland   20
33B   Cleveland   30
48B   Rockford    31
65B   Detroit     20
66L   Nile        23
70A   Lansing     70

Bipartition of SALES

$SALES_{P}$
div#   d#    item#
01AP   01A   V01
02AP   01A   B47
04AP   01A   V03
05AP   55L   V03

$SALES_{\overline{P}}$
div#   d#    item#
OAP    07A   W09
01PP   55L   S01
01PP   07A   P02
02PP   03A   P02
03PP   01A   P03
03PP   03A   S02
05PP   55L   S02

Figure 7.2. Bipartitions of the Relations DEALERS and SALES.
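The bipartition of DEALERS in Figure 7.2 can be reproduced with a few set operations. The sketch below is a rough illustration: `bipartition` and `product` are hypothetical helper names, Q is assumed to be the d_type values 50 and 51, and the second bipartition (dealers located in Flint) is an invented stand-in for a second revised query, used only to show how the blocks of several bipartitions are intersected, an operation that the remainder of this section defines.

```python
# DEALERS tuples (d#, address, d_type) as listed in Figures 7.1 and 7.2.
DEALERS = frozenset({
    ("01A", "Ann Arbor", 51), ("03A", "Dearborn", 30), ("07A", "Flint", 50),
    ("26M", "Cleveland", 20), ("33B", "Cleveland", 30), ("48B", "Rockford", 31),
    ("55L", "Flint", 51), ("65B", "Detroit", 20), ("66L", "Nile", 23),
    ("70A", "Lansing", 70),
})


def bipartition(relation, needed):
    """Split `relation` into the fragment a query needs and the remainder."""
    inside = frozenset(t for t in relation if needed(t))
    return {inside, frozenset(relation) - inside}


def product(p1, p2):
    """Non-empty pairwise intersections of the blocks of two partitions."""
    return {b1 & b2 for b1 in p1 for b2 in p2 if b1 & b2}


# Bipartition induced by De(y, u, w) with Ran(w) = {50, 51} (assumed Q).
by_type = bipartition(DEALERS, lambda t: t[2] in {50, 51})
# Hypothetical second revised query that refers only to dealers in Flint.
by_city = bipartition(DEALERS, lambda t: t[1] == "Flint")

refined = product(by_type, by_city)
print(sorted(len(block) for block in refined))
```

Here `by_type` consists of the CAR_DEALERS and NON_CAR_DEALERS fragments of Figure 7.2, and `refined` splits CAR_DEALERS further into the Flint and non-Flint car dealers, so each added bipartition can only make the partition finer.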

It is noticed that the bipartition of DEALERS is not derivable from the original query expression $q$ of (5.2),

$q = \exists x\,\exists y\,\exists z^{\Sigma P}\,\exists u\,\exists v\,(Sa(x,y,z^{\Sigma P}) \wedge De(y,u,v))$.

In fact, more than one query can conceivably refer to each specific relation in the database. Let $Q'(R)$ be the collection of the restricted versions of queries referring to the relation $R_{DB}$. One way to obtain a partition of the relation $R_{DB}$ is to intersect the blocks of all the possible bipartitions, each of which is obtained from a restricted query expression in $Q'(R)$. In order to be more specific, the following notion is introduced: For two partitions $\Pi^1$ and $\Pi^2$ of the relation $R_{DB}$,

$\Pi^1 \sqcap \Pi^2 = \{\,S : S = B_i \cap B_j \text{ where } B_i \in \Pi^1 \text{ and } B_j \in \Pi^2, \text{ and } S \neq \emptyset\,\}$.

Since commutativity and associativity hold for $\sqcap$, let $\Pi^1 \sqcap \cdots \sqcap \Pi^n$ be written as $\sqcap_{i \in \{1,\ldots,n\}} \Pi^i$. Formally stated, the partition, denoted by $\Pi(R)$, of the relation $R_{DB}$ obtained from the given $Q'(R)$ is

$\Pi(R) = \sqcap_{q' \in Q'(R)}\ \Pi_b(R(q'))$.

At a glance, the approach of intersecting all the possible bipartitions looks like a crude way of partitioning each relation in the database. However, this approach is meaningful in the sense that the partitions obtained from the revised query expressions by this approach are more refined than those obtained directly from the user-provided query expressions. This further implies that when the fragments of the partitions are dispersed over the sites of a network, the fragments of the former partitions can be more flexibly distributed than those of the latter partitions. Various data allocation algorithms such as [MoLe77, IrKh79, Aper81, CeNW83] can be used

to determine an optimal or suboptimal dispersion of the data by treating the fragments as the unit objects of distribution.

7.4. Conclusions and Future Work

A knowledge-based approach has been described in which URC's are derived from the user queries to the database and the knowledge about the data. In order to describe the user queries and the knowledge, an ordinary many-sorted language is extended. In this extended language, the user queries are expressed in a specific form, called $\Sigma$-normal form, and the knowledge useful for this purpose is identified by five types of axiom schemas. The knowledge is applied to each query expression via an inference mechanism to derive a revised query expression. From the revised query expressions, URC's are estimated. Horizontal partitioning can be based on the estimated URC's.

The work which has been shown so far can be extended in three directions. One direction is to expand the knowledge base of the KBS by accommodating a larger class of knowledge. Possibly more knowledge is useful for the intended purpose. It can be represented in terms of different types of axiom schemas in $L_\Sigma$ and, in such a case, it is expected that the inference procedure ("$\longmapsto$") may have to be modified. A more sophisticated inference procedure may be required.

The second direction is to study the problem of allocating the fragmented relations. Although the fragmented relations can be distributed over a network by adopting some of the currently known data allocation models, the allocation obtained from this approach may not reflect the logical intricacy among the fragments. That is, during the process of answering queries, the relationships among the fragments may not be used fully to reduce the unnecessary preselection or join operations, whose reduction is the major benefit sought by distributing the partitioned fragments. This problem results because the currently known data allocation models do not take into account the logical relationships among the fragments as a design resource. A new data allocation model is needed that combines the relationships among the fragments with a quantitative optimization model.

The third direction is to investigate distributed query optimization based on the knowledge base. When the fragments are dispersed over the network, the logical relationships among the fragments can guide various query processing strategies, including how preselection operations can be dispensed with, how useless join operations can be eliminated, and how parallel distributed query processing can be scheduled over a network. When their relationships are complex enough, their role in guiding the process of answering queries can be more than what the conventional data directories usually do. The logical relationships can constitute a metaknowledge base which can be used in conjunction with the conventional data directory in an intelligent way to optimize the processing of queries.

PART II

In this part, a type of problem is first identified which may occur when a resolution scheme is applied to a many-sorted theory. In order to avoid such a problem, an extension of the first-order language called one-sorted language with aggregate variables is introduced. It is shown that any many-sorted theory can be converted into an equivalent theory in a one-sorted language with aggregate variables. Aggregate variables allow the introduction of range-restricted variables dynamically in the structure which is expanded by definitions. This allows the introduction of a new resolution scheme named Unification over the Weakest Range (or UWR-resolution). The completeness of UWR-resolution is shown and the efficiency of UWR-resolution is discussed.

CHAPTER VIII

A MANY-SORTED RESOLUTION BASED ON AN EXTENSION OF A ONE-SORTED LANGUAGE

8.1. Introduction

Within the field of automatic theorem proving, the advantages of many-sorted logic are well known [Haye71, Hens72, Cohn83]. A language of many-sorted logic offers more compact expressive power than the corresponding language of one-sorted logic, and so a theory is expressed with a much smaller number of shorter clauses in the former than in the latter. When a resolution scheme is used, the smaller number of shorter clauses means a shorter refutation. Furthermore, the refutation sequence is further shortened when the sortal information is used as metaknowledge preventing irrelevant resolvents from being generated.

It was only recently that a theoretical foundation for many-sorted resolution was established by Walther [Walt83, Walt84a]. Walther presented a many-sorted calculus, called $\Sigma$RP-calculus, in which resolution and the so-called weakening rule are employed as the inference rules of the system. He showed the completeness of the $\Sigma$RP-calculus and also showed how the $\Sigma$RP-calculus is related to its corresponding one-sorted calculus. In a sequel paper, Walther also demonstrated the power of many-sorted resolution by an example called "Schubert's steamroller" [Walt84b].

However, when Walther's approach is applied to a certain class of many-sorted theories, it still generates irrelevant resolvents which degrade the overall deductive efficiency. The many-sorted theories falling in this class are those satisfying a certain relationship among the sorts. An example illustrates what this relationship is and what irrelevant resolvents are generated.

Example 8.1.1 Let $x_b$, $x_c$, $x_d$, and $x_e$ be the variables ranging over the sorts B, C, D, and E, respectively, where $D \subseteq B$, $D \subseteq C$, $E \subseteq B$, and $E \subseteq C$. The theory to be refuted is given by:

(1) $\forall x_b\,(P(x_b) \vee \exists x_d\, Q(x_b, x_d))$,
(2) $\forall x_c\, \neg P(x_c)$,
(3) $\forall x_e\,\forall x_c\, \neg Q(x_e, x_c)$.

If (1) and (2) are chosen as parent clauses to be resolved, because $x_b$ of $P(x_b)$ in (1) and $x_c$ of $\neg P(x_c)$ in (2) are unifiable† over the sort D, and if (1) is expressed as $P(x_b) \vee Q(x_b, f_d(x_b))$ using a Skolem function $f_d(x_b)$ whose range is restricted to D, the two clauses can be resolved using a most general unifier (mgu) $\theta = \{y_d/x_b,\ y_d/x_c\}$ where $y_d$ ranges over the sort D. The resolvent then is

(4) $Q(y_d, f_d(y_d))$   (1)+(2).

† A variable $v$ is unifiable with a term $t$ over the sort S if there is a substitution $\theta$ that unifies $\{v, t\}$, i.e., $v\theta = t\theta$, and the results of the instantiations $v\theta$ and $t\theta$ are both terms of sort S.

It is now seen that (4) cannot be resolved with any other clause, not even with (3), because there is no sort known as a subsort of $D \cap E$. A dead end has been

reached. The unsatisfiability can be shown either by resolving $P(x_b)$ in (1) and $\neg P(x_c)$ in (2) with a variable of sort E or by resolving $Q(x_b, f_d(x_b))$ in (1) and $\neg Q(x_e, x_c)$ in (3) with a variable of sort E; (4) is a useless resolvent.

Generating the types of useless resolvents illustrated in the preceding example can be avoided. Had there been another sort $G = B \cap C$, (1) and (2) could have been unified over the sort G, giving the resolvent

(4') $Q(x_g, f_d(x_g))$   (1)+(2),

where the variable $x_g$ ranges over the sort G. The clause (4') can be resolved further with (3) over the sort E, resulting in the empty clause $\Box$. There is no dead end here. There is a problem, however: if the sort G is unavailable, the variable $x_g$ cannot be introduced in the middle of the deduction. In an ordinary many-sorted language, a variable cannot be introduced unless the range of the variable agrees with one of the a priori fixed sorts, which is a common problem caused by the inflexible usage of an ordinary many-sorted language†.

To alleviate the situation of the preceding example, an extension of the one-sorted language is proposed, called one-sorted language with aggregate variables ($L_\Sigma$), into which a many-sorted theory can be translated and which may be dynamically extended to bypass the problem illustrated previously. In Part I, aggregate variables were embedded into a many-sorted language, resulting in the language called many-sorted language with aggregate variables. Here aggregate variables are embedded in a one-sorted language.

† Some discussion about the inflexible usage of many-sorted language is found in [Cohn83], in which Cohn suggested a way to improve the expressiveness of many-sorted logic.
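The dead end of Example 8.1.1 can be made tangible by giving the sorts small extensions. The sketch below is purely illustrative: the individuals 1 to 6 are invented, only the inclusions of D and E in B and C come from the example, and the two hypothetical helpers contrast unification restricted to the a priori named sorts with unification over the intersection of the ranges involved.

```python
# Illustrative extensions for the sorts of Example 8.1.1. The individuals
# are made up; only D, E being subsorts of both B and C is taken from the text.
SORTS = {
    "B": frozenset({1, 2, 3, 4, 5}),
    "C": frozenset({1, 2, 3, 4, 6}),
    "D": frozenset({1, 2}),
    "E": frozenset({3, 4}),
}


def named_common_subsorts(s1, s2):
    """Named sorts included in both s1 and s2: the only ranges a new
    variable may take when the set of sorts is fixed in advance."""
    both = SORTS[s1] & SORTS[s2]
    return sorted(n for n, ext in SORTS.items() if ext and ext <= both)


def weakest_range(*names):
    """Intersection of the given sorts, usable as a range even though it
    need not coincide with any named sort."""
    result = SORTS[names[0]]
    for name in names[1:]:
        result &= SORTS[name]
    return result


# Resolving (1) with (2): a fixed-sort scheme must commit to D or to E.
print(named_common_subsorts("B", "C"))
# After committing to D, clause (4) needs a named subsort of D and E.
print(named_common_subsorts("D", "E"))
# Keeping the weakest range G = B ∩ C leaves room for sort E later on.
print(weakest_range("B", "C", "E") == SORTS["E"])
```

Committing early to the named sort D leaves nothing to unify with a variable of sort E, whereas the unnamed range $B \cap C$ still contains E, which is the behaviour that dynamically introduced range-restricted variables are meant to provide.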

Aggregate variables allow the dynamic introduction of range-restricted variables without revising the a priori fixed structure. Using the dynamic range-restricting nature of the aggregate variables, an efficient many-sorted resolution scheme named unification over the weakest range (UWR-resolution) is presented, which is designed to avoid generating useless resolvents such as the one illustrated in the preceding example. There are two issues to be discussed: the completeness of UWR-resolution and the efficiency of UWR-resolution.

8.2. Related Literature

In general, automatic theorem-proving systems are divided into two classes. The systems belonging to the first class start with a given set of logical formulas and create new formulas by using certain inference rules until a refutation is concluded. The systems belonging to the second class do not create any new formulas but test certain conditions ensuring unsatisfiability of the given set of formulas. The former class includes resolution-based proof systems, and the latter includes mating-based proof systems such as Andrews' mating calculus [Andr81] or Bibel's matrix calculus [Bibe81]. Here the only concern is with the resolution-based proof systems.

In order to introduce some background for the resolution-based proof systems, the introductory statement by Davis and Putnam in [DaPu60] is quoted:

"The hope that mathematical methods employed in the investigation of formal logic would lead to purely computational methods for obtaining mathematical theorems goes back to Leibniz and has been revived by Peano around the turn of the century and by Hilbert's school in the 1920's. Hilbert, noting that all of classical mathematics could be formalized within quantification theory, declared that the problem of finding an algorithm for determining whether or not a given formula of quantification theory is valid was the central problem of mathematical logic.
And indeed, at one time it seemed as if investigations of this 'decision' problem were on the verge of success. However, it was shown by Church and by Turing that such an algorithm cannot exist. This result led to considerable pessimism regarding the possibility of using modern digital computers in deciding significant mathematical questions.

However, recently there has been a revival of interest in the whole question. Specifically, it has been realized that while no 'decision procedure' exists for quantification theory there are many proof procedures available."

An important contribution to the area of automatic theorem proving was made by Herbrand. Herbrand proposed in his thesis [Herb30] a deductive system that later turned out to be complete and far more efficient than other previously known deductive systems. The result, known as the Herbrand theorem, was later developed further by many researchers and led to the invention of various proof procedures. Quine presented a proof procedure for quantification theory [Quin55], and Wang and Gilmore each produced working programs that employ proof procedures in quantification theory. Although Quine's work was restricted to theoretical aspects of the proof procedure, Wang's and Gilmore's programs were actual working programs run on computing machines, which makes them important initial contributions. Both programs, however, were very inefficient because of the combinatorial explosion in determining the inconsistency of the given formula, although their methods are superior in many cases to truth table methods, which are the crudest way of determining the inconsistency of the given formula. Both Wang's and Gilmore's programs run into difficulty with some fairly simple examples.

A few months after these results were published, Wang's and Gilmore's methods were improved by Davis and Putnam [DaPu60]. Davis and Putnam proposed a new way of determining the inconsistency of the given formula while avoiding the problem of the type that occurred in Gilmore's program. However, their improvement was still not enough.

A major breakthrough was made by Robinson [Robi65a], who introduced the so-called "resolution principle." His resolution-based proof system was much more

efficient than any earlier proof procedure. However, this system was still inefficient because of the many irrelevant and redundant formulas generated during the derivation of a refutation. Such pitfalls in Robinson's resolution triggered the creation of various refined forms of the resolution principle in an attempt to increase its efficiency further. These refinements include hyper-resolution by Robinson [Robi65b], renameable resolution by Meltzer [Melt66], and the set-of-support strategy by Wos, Robinson and Carson [WoRC65], all of which were later unified into semantic resolution by Slagle [Slag67]; lock resolution by Boyer [Boye71]; linear resolution, which was independently proposed by Loveland [Love70] and by Luckham [Luck70] and which was later strengthened by Anderson and Bledsoe [AnBl70], Reiter [Reit71], Loveland [Love72], and Kowalski and Kuehner [KoKu70]; and unit resolution by Wos, Carson, and Robinson [WoCR64] and Chang [Chan70].

Recently, researchers realized that the deductive efficiency of resolution can be improved significantly if the deduction is based on a many-sorted calculus and also incorporates the preceding types of refinements. Deduction based on a many-sorted calculus goes back to Herbrand [Herb30]. In his thesis Herbrand established the fact that deduction based on a many-sorted logic is equivalent to deduction based on its corresponding one-sorted logic. Since then, various forms of many-sorted calculus have been proposed and investigated by Schmidt [Schm38, Schm51], Wang [Wang52], Hailperin [Hail57], and Idelson [Idels64]. Several researchers suggested practical theorem-proving programs based on a many-sorted calculus without a sound theoretical foundation [Weyh77, BoNMo79].

It was only recently that a theoretical foundation for many-sorted resolution was reported by Walther and Cohn [Walt83, Cohn83, Walt84a]. Walther presented a

many-sorted calculus based on resolution and paramodulation that is called the $\Sigma$RP-calculus, and Cohn suggested a way to improve the expressiveness of a many-sorted logic in which a many-sorted resolution corresponding to Robinson's resolution is used as an inference rule. In his sequel paper, Walther demonstrated by an example the power of a many-sorted resolution [Walt84b].

8.3. Organization

The rest of Part II is organized in a way similar to Part I. In Chapter IX, $L_\Sigma^1$ is introduced: the syntax of $L_\Sigma^1$, the interpretation of $L_\Sigma^1$, and the $\Sigma$-extensibility of $L_\Sigma^1$. $L_\Sigma^1$ is then used as the language for describing a many-sorted theory.

In Chapter X, it is first shown how a many-sorted theory is formalized in $L_\Sigma^1$. It is clarified that the only concern is with a certain class of many-sorted theories. The problem that was illustrated by an example in Section 8.1 is then formally described.

In Chapter XI, the UWR-resolution is introduced and its completeness is shown. In order to prove the completeness of UWR-resolution, the $L_\Sigma^1$-version of the Herbrand theorem is introduced in Section 11.2.

Finally, in Chapter XII, the efficiency of the UWR-resolution is discussed. To discuss the efficiency, a hypothetical many-sorted resolution is introduced, namely $\Sigma$-resolution, that does not employ the technique of introducing a new sort dynamically as the resolution is being carried out. The efficiency of the UWR-resolution is then measured by comparing the refutation of a given many-sorted theory generated by the UWR-resolution with that generated by the $\Sigma$-resolution.

In Appendix B, some intermediate steps needed to introduce the $L_\Sigma^1$-version of the Herbrand theorem are shown. In Appendix C, two complete refutations are given that demonstrate the inconsistency of an example many-sorted theory. One is generated by the UWR-resolution and the other by the $\Sigma$-resolution. In Appendix D, two alternative approaches are given which embody the idea of unifying a pair of variables satisfying a certain condition over the weakest possible range: (i) an approach in which the theory in a many-sorted language $L_m$ is repeatedly translated into a revised $L_\Sigma^1$ as the refutation of the theory is carried out, and (ii) an approach in which the theory to be refuted is expressed in a generalized version of an ordinary many-sorted language whose variable sets and constant sets are not necessarily disjoint. In Appendix E, it is shown that the generalized version of an ordinary many-sorted language introduced in Appendix D is as legitimate as the ordinary many-sorted language.

CHAPTER IX

ONE-SORTED LANGUAGE WITH AGGREGATE VARIABLES $L_\Sigma^1$

9.1. Syntax of $L_\Sigma^1$

Aggregate variables can be embedded in a one-sorted language as well as in a many-sorted language. When the aggregate variables are embedded in the former, the result is called a one-sorted language with aggregate variables ($L_\Sigma^1$). $L_\Sigma^1$ is the special case of $L_\Sigma$ where there is only one sort. In this sense a formal introduction of $L_\Sigma^1$ is unnecessary. Nevertheless, for the sake of clarification and for the purpose of letting Part II stand alone, $L_\Sigma^1$ is fully introduced in this chapter. The syntax of $L_\Sigma^1$ is given first in this section.

Two types of variables are available in a one-sorted language with aggregate variables $L_\Sigma^1$: simple variables and aggregate variables. A simple variable of $L_\Sigma^1$ is the same as the ordinary variable of a one-sorted language. An aggregate variable is syntactically an ordinary sort variable, but semantically a variable whose range of interpretation is restricted by a unary relation rather than to a sort domain. Formally stated, an aggregate variable is of the form $x^{P_i}$, in which $x^{P_i}$ ranges over the unary relation indicated by the unary predicate symbol $P_i$.

Let $J$, $L$, and $K$ be, respectively, a relation index set, a function index set and a constant index set. In addition, let $I$ be an index set for some unary relations. Let $\lambda$ and $\xi$ be functions such that $\lambda: J \to N^+$ and $\xi: L \to N^+$, where $N^+$ is the set of positive integers.

Definition 9.1.1 A one-sorted language with aggregate variables $L_\Sigma^1$ consists of the following: (1) parentheses (, ); (2) a constant symbol $C_k$ for each $k \in K$; (3) simple variables $x_1, \ldots, x_n, \ldots$, and aggregate variables $x_1^{P_i}, \ldots, x_n^{P_i}, \ldots$ for each $i \in I$, where $P_i$ is a unary predicate symbol; (4) a $\lambda(j)$-ary predicate symbol $R_j$ for each $j \in J$; (5) a $\xi(l)$-ary function symbol $F_l$ for each $l \in L$; (6) logical connectives $\neg$ and $\to$; and (7) a universal quantifier $\forall$.

When it is convenient, $L_\Sigma^1$ is represented as a quintuple, $L_\Sigma^1 = \langle P, R, F, C, \rho \rangle$, where $P$ is a unary predicate set whose members are exclusively used in the superscripts of aggregate variables, $R$ is a predicate symbol set, $F$ is a function symbol set, $C$ is a constant symbol set and $\rho$ is the arity function such that $\rho: R \cup F \to N^+$, where $N^+$ is the set of positive integers.

Based on this language, the terms of $L_\Sigma^1$ are defined as usual except that each variable is now either a simple variable or an aggregate variable. The set of atomic formulas of $L_\Sigma^1$, $Atom(L_\Sigma^1)$, is also defined as usual. The set of well-formed formulas of $L_\Sigma^1$, $Form(L_\Sigma^1)$, is then defined recursively as: (i) if $\alpha \in Atom(L_\Sigma^1)$, then $\alpha \in Form(L_\Sigma^1)$; (ii) if $\alpha, \beta \in Form(L_\Sigma^1)$, then so are $\neg\alpha$, $(\alpha \to \beta)$, and $\forall v\,\alpha$, where $v$ is either a simple variable or an aggregate variable; (iii) nothing else, except the expressions obtained by finite applications of (i) and (ii), is in $Form(L_\Sigma^1)$. The definable syntactic objects $\cup$, $\cap$, and $\exists$, and the standard notions such as sentences, are also introduced in the usual way.

9.2. Interpretation of $L_\Sigma^1$

A structure is needed to interpret each formula in $L_\Sigma^1$. Let $OS_1$ be a structure for $L_\Sigma^1$. Then

$OS_1 = \langle \Omega, \{P_i'\}_{i \in I}, \{R_j'\}_{j \in J}, \{F_l'\}_{l \in L}, \{C_k'\}_{k \in K} \rangle$†

where $\Omega$ is the universe of $OS_1$; $P_i'$ is a unary relation, $P_i' \subseteq \Omega$; $R_j'$ is a $\lambda(j)$-ary relation, $R_j' \subseteq \Omega^{\lambda(j)}$; $F_l'$ is a $\xi(l)$-ary function, $F_l': \Omega^{\xi(l)} \to \Omega$; and a distinguished element $C_k'$ is an element of $\Omega$. The interpretation of a formula in the structure $OS_1$ then requires a variable assignment function $s$ as follows:

Definition 9.2.1 For the set $V$ of variables of $L_\Sigma^1$ and the universe $\Omega$ of the structure $OS_1$, $s$ is an assignment function $s: V \to \Omega$ such that for a simple variable $x$, $s(x) = a$, where $a \in \Omega$; for an aggregate variable $x^{P_i}$, $s(x^{P_i}) = a$, where $a \in P_i'$.

The assignment function for the terms of $L_\Sigma^1$ is defined as usual. For notational convenience the symbol $s$ is also used for the assignment for the terms. The validity of each formula is determined by the following interpretation rules.

† From the next section on, " ' " is omitted on a symbol as long as the meaning of the symbol is unambiguous.

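Definition 9.2.2 For $R_j(t_1, \ldots, t_{\lambda(j)}), \psi_1, \psi_2 \in Form(L_\Sigma^1)$, where the $t_i$'s are terms, the satisfaction of the formulas with respect to $s$ in $OS_1$ is defined by:

(1) $\models_{OS_1} R_j(t_1, \ldots, t_{\lambda(j)})[s]$ iff $\langle s(t_1), \ldots, s(t_{\lambda(j)}) \rangle \in R_j$,
(2) $\models_{OS_1} \neg\psi_1[s]$ iff $\not\models_{OS_1} \psi_1[s]$,
(3) $\models_{OS_1} (\psi_1 \to \psi_2)[s]$ iff if $\models_{OS_1} \psi_1[s]$ then $\models_{OS_1} \psi_2[s]$,
(4) for a simple variable $x$, $\models_{OS_1} \forall x\,\psi_1[s]$ iff for any $a \in \Omega$, $\models_{OS_1} \psi_1[s(x \mid a)]$, and
(5) for an aggregate variable $x^{P_i}$, $\models_{OS_1} \forall x^{P_i}\,\psi_1[s]$ iff for any $a \in P_i$, $\models_{OS_1} \psi_1[s(x^{P_i} \mid a)]$,

where for variables $v_m$ and $v_k$,

$s(v_m \mid a)(v_k) = s(v_k)$ if $v_m \neq v_k$, and $s(v_m \mid a)(v_k) = a$ if $v_m = v_k$.

As a corollary to the definition, the interpretations of $\cup$, $\cap$, $\leftrightarrow$ and $\exists$ can also be easily defined.

9.3. $\Sigma$-Extensibility of $L_\Sigma^1$

Two results are shown in this section: (i) how a formula in a many-sorted language is translated into $L_\Sigma^1$, and (ii) how in $L_\Sigma^1$ variables ranging over a priori undefined sorts can be introduced while the structure associated with $L_\Sigma^1$ is expanded by definitions.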

As a preliminary step to showing the former result, a many-sorted language ($L_m$) is formally defined first. A many-sorted language $L_m$ with sort index set $I$ consists of the following: (1) $|I|$ infinite disjoint sets $V^1, \ldots, V^{|I|}$, where the elements of $V^i$, $1 \le i \le |I|$, are called variables of sort $i$; (2) $|I|$ disjoint sets $C^1, \ldots, C^{|I|}$, where the elements of $C^i$, $1 \le i \le |I|$, are called constant symbols of sort $i$; (3) for each $n$-tuple $\langle i_1, \ldots, i_n \rangle$, $\{i_1, \ldots, i_n\} \subseteq I$, a set $R^{\langle i_1, \ldots, i_n \rangle}$ whose elements are called predicate symbols of sort $\langle i_1, \ldots, i_n \rangle$; (4) for each $(n+1)$-tuple $\langle i_1, \ldots, i_n, i_{n+1} \rangle$, $\{i_1, \ldots, i_n, i_{n+1}\} \subseteq I$, a set $F^{\langle i_1, \ldots, i_n, i_{n+1} \rangle}$ whose elements are called function symbols of sort $\langle i_1, \ldots, i_n, i_{n+1} \rangle$; (5) logical connectives $\neg$ and $\to$; and (6) a universal quantifier $\forall$.

For the sort index set $I$, let there be a partial order relation $S \subseteq I \times I$, called the sort ordering, such that $\langle i_j, i_k \rangle \in S$ if and only if sort $i_j$ is a subsort of sort $i_k$. For each $i \in I$, let $SUB(i) = \{ i_p : \langle i_p, i \rangle \in S \}$.

The syntax rule of $L_m$ with respect to the sort ordering $S$ is given in the following†. First, the set of terms of sort $i$ is inductively defined as follows: (i) any variable of sort $i$ or constant symbol of sort $i$ is a term of sort $i$, and (ii) if $f$ is a function symbol of sort $\langle i_1, \ldots, i_n, i_{n+1} \rangle$ and $t_1, \ldots, t_n$ are terms of sort $i_1', \ldots, i_n'$, respectively, where $i_j' \in SUB(i_j)$, $1 \le j \le n$, then $f(t_1, \ldots, t_n)$ is a term of sort $i_{n+1}$. The atomic formulas of $L_m$ are defined to be of the form $A(t_1, \ldots, t_n)$, where $A$ is an $n$-place predicate symbol of sort $\langle i_1, \ldots, i_n \rangle$ and $t_j$, $1 \le j \le n$, is a term of sort $i_j' \in SUB(i_j)$. The set of well-formed formulas of $L_m$ is then defined as usual.

† The syntax rule of $L_m$ given here is similar to that of a many-sorted language given by [Walt83]. A similar syntax rule is also mentioned in [Wang52] as a more general form than the syntax rule of a many-sorted language given in [Ende72, KrKr67].

Definable symbols $\cup$, $\cap$, and $\exists$ are introduced in $L_m$ as usual, and the interpretation of the formulas of $L_m$ is also given as usual.

Now it is shown how a formula in $L_m$ is translated into $L_\Sigma^1$. Let $\alpha_m$ be a formula in $L_m$. When $\alpha_m$ is translated into $L_\Sigma^1$, let the translated formula be denoted by $\alpha_\Sigma$. If a sort variable, say $x_i$, of sort $i \in I$, occurs in $\alpha_m$, such as

$\alpha_m = \forall x_i \cdots$,

then in $\alpha_\Sigma$ the variable $x_i$ is replaced by an aggregate variable, say $x^{P_i}$, i.e.,

$\alpha_\Sigma = \forall x^{P_i} \cdots$,

where $P_i$ is introduced as the unary predicate symbol corresponding to sort $i$. If a function symbol, say $f$ of sort $\langle i_1, \ldots, i_n, i_{n+1} \rangle$, $\{i_1, \ldots, i_{n+1}\} \subseteq I$, occurs in $\alpha_m$, such as

$\alpha_m = \cdots f \cdots$,

then in $\alpha_\Sigma$ the symbol $f$ is superscripted with $P_{i_{n+1}}$, i.e.,

$\alpha_\Sigma = \cdots f^{P_{i_{n+1}}} \cdots$,

where $P_{i_{n+1}}$ is introduced as the unary predicate symbol corresponding to sort $i_{n+1}$. When $n = 0$, the preceding function symbol translation covers how a constant symbol in $\alpha_m$ is translated into $L_\Sigma^1$: if $c$ is a constant symbol of sort $i$ that occurs in $\alpha_m$, then $c$ is superscripted with the unary predicate, say $P_i$, that corresponds to sort $i$, i.e., $c^{P_i}$.

The preceding translation of $\alpha_m$ into $\alpha_\Sigma$ implies that $L_\Sigma^1$ is as convenient as $L_m$ in abbreviating the relativized expressions in a one-sorted language into more compact forms.
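The translation rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names (`Var`, `Fn`, `Atom`, `Not`, `Implies`, `Forall`), the string rendering, and the naming convention `P_<sort>` for the predicate introduced for a sort are all hypothetical and do not appear in the text.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# Hypothetical representation of many-sorted terms (names are illustrative only).
@dataclass(frozen=True)
class Var:
    name: str
    sort: str                    # sort index i of the variable

@dataclass(frozen=True)
class Fn:
    name: str
    args: Tuple["Term", ...]
    sort: str                    # result sort i_{n+1}; a constant is the case n = 0

Term = Union[Var, Fn]

# Hypothetical representation of formulas built from the primitive symbols.
@dataclass(frozen=True)
class Atom:
    pred: str
    args: Tuple[Term, ...]

@dataclass(frozen=True)
class Not:
    body: "Formula"

@dataclass(frozen=True)
class Implies:
    left: "Formula"
    right: "Formula"

@dataclass(frozen=True)
class Forall:
    var: Var
    body: "Formula"

Formula = Union[Atom, Not, Implies, Forall]

def pred_for(sort: str) -> str:
    """Unary predicate symbol introduced for a sort (assumed naming: P_<sort>)."""
    return f"P_{sort}"

def translate_term(t: Term) -> str:
    """Superscript variables, function symbols and constants with their predicate."""
    head = f"{t.name}^{{{pred_for(t.sort)}}}"
    if isinstance(t, Var) or not t.args:
        return head
    return head + "(" + ", ".join(translate_term(a) for a in t.args) + ")"

def translate(phi: Formula) -> str:
    """Render the translated formula; predicate symbols are left unchanged."""
    if isinstance(phi, Atom):
        return phi.pred + "(" + ", ".join(translate_term(a) for a in phi.args) + ")"
    if isinstance(phi, Not):
        return "~" + translate(phi.body)
    if isinstance(phi, Implies):
        return f"({translate(phi.left)} -> {translate(phi.right)})"
    if isinstance(phi, Forall):
        return f"forall {translate_term(phi.var)} {translate(phi.body)}"
    raise TypeError(f"unsupported formula: {phi!r}")

# Example input: forall x_i R(x_i, f(x_i)) with f of result sort j.
x = Var("x", "i")
example = Forall(x, Atom("R", (x, Fn("f", (x,), "j"))))
constant = Fn("c", (), "k")
```

Calling `translate(example)` renders the quantified variable and the function symbol with their superscripts, and `translate_term(constant)` shows the $n = 0$ case for a constant symbol.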

For convenience, let $L_m$ with sort index set $I$ be a quadruple $L_m = \langle R, F, C, \rho \rangle$, where $R$ is a predicate symbol set, $F$ is a function symbol set, $C$ is a constant symbol set, and $\rho$ is the arity function such that $\rho: R \cup F \to N^+$, where $N^+$ is the set of positive integers. Then the language $L_\Sigma^1$ for $\alpha_\Sigma$ is a quintuple

$L_\Sigma^1 = \langle P, R, F', C', \rho \rangle$.

$P$ in $L_\Sigma^1$ is a unary predicate symbol set whose elements are the unary predicate symbols introduced during the translation of $\alpha_m$ into $\alpha_\Sigma$, for instance $P_i$ in $x^{P_i}$ and $P_{i_{n+1}}$ in $f^{P_{i_{n+1}}}$. $F'$ and $C'$ in $L_\Sigma^1$ are the function symbol set and the constant symbol set, respectively, whose members are obtained by superscripting appropriately the respective function symbols and constant symbols in $F$ and $C$.

As far as the semantics of the formula $\alpha_\Sigma$ is concerned, the structure for $L_\Sigma^1$, say $OS_1(L_\Sigma^1)$, can be constructed from the many-sorted structure for $L_m$, say $MS(L_m)$. Let $MS(L_m)$ be a quadruple

$MS(L_m) = \langle \{S_i\}_{i \in I}, R, F, C \rangle$

where $I$ is the sort index set. Then $OS_1(L_\Sigma^1)$ is a quintuple

$OS_1(L_\Sigma^1) = \langle \Omega, P, R, F', C \rangle$

where $\Omega = \bigcup_{i \in I} S_i$, $P = \{ P_i : \text{for each } i \in I,\ S_i \text{ is assigned to } P_i \}$ and $F' = \{ f' : \text{for each function } f \in F,\ f: S_{i_1} \times \cdots \times S_{i_n} \to S_{i_{n+1}},\ f' \text{ is an arbitrary extension of } f,\ f': \Omega^n \to \Omega \}$. The following theorem is shown for the translation of $\alpha_m$ into $\alpha_\Sigma$:

Theorem 9.3.1 A sentence $\alpha_m$ in $L_m$ is true in $MS(L_m)$ iff $\alpha_\Sigma$ in $L_\Sigma^1$ is true in $OS_1(L_\Sigma^1)$.

Proof. Let the sets of terms of $L_m$ and $L_\Sigma^1$ be denoted by $Term(L_m)$ and $Term(L_\Sigma^1)$, respectively. Let $s$ be an assignment function $s: Term(L_m) \to \bigcup_{i \in I} S_i$. Then along with the translation of $\alpha_m$ into $\alpha_\Sigma$, an assignment function $s_\Sigma$, $s_\Sigma: Term(L_\Sigma^1) \to \Omega$, can be defined such that $s_\Sigma(t_\Sigma) = s(t)$, where $t_\Sigma$ stands for the translation of $t \in Term(L_m)$ into $L_\Sigma^1$. The proof is by induction on the length of $\alpha_m$.

First, let $\alpha_m$ be an atomic formula of the form $R(t_1, \ldots, t_n)$, where $t_1, \ldots, t_n \in Term(L_m)$. Let the relations designated by $R$ in $MS(L_m)$ and in $OS_1(L_\Sigma^1)$ be $R^{MS(L_m)}$ and $R^{OS_1(L_\Sigma^1)}$, respectively. Then $R^{MS(L_m)} = R^{OS_1(L_\Sigma^1)}$. From the way that $s_\Sigma$ is defined, it follows that

$\models_{MS(L_m)} R(t_1, \ldots, t_n)[s]$
$\iff \langle s(t_1), \ldots, s(t_n) \rangle \in R^{MS(L_m)}$
$\iff \langle s_\Sigma(t_{1\Sigma}), \ldots, s_\Sigma(t_{n\Sigma}) \rangle \in R^{OS_1(L_\Sigma^1)}$
$\iff \models_{OS_1(L_\Sigma^1)} R(t_{1\Sigma}, \ldots, t_{n\Sigma})[s_\Sigma]$.

Since $\alpha_\Sigma = R(t_{1\Sigma}, \ldots, t_{n\Sigma})$, the theorem holds when $\alpha_m$ is atomic.

Suppose the theorem holds for all formulas of length less than or equal to $h$. The inductive step must be shown for the formulas of length $h+1$. When a formula of length $h+1$ is obtained from formulas of length less than or equal to $h$ by using $\neg$ or $\to$, the proof is trivial. Only the following inductive step is shown. Let $\alpha_m^h$ be a formula in $L_m$ of length $h$ and let $\alpha_\Sigma^h$ be the translation of $\alpha_m^h$ into $L_\Sigma^1$. The induction hypothesis implies that for any assignment function $s$ and its corresponding assignment function $s_\Sigma$, $\models_{MS(L_m)} \alpha_m^h[s]$ iff $\models_{OS_1(L_\Sigma^1)} \alpha_\Sigma^h[s_\Sigma]$. Let $\alpha_m$ be $\forall x_i\,\alpha_m^h$, where $x_i$ is a variable of sort $i$. It follows that

$\models_{MS(L_m)} \alpha_m[s]$
$\iff \models_{MS(L_m)} \forall x_i\,\alpha_m^h[s]$
$\iff$ for any $a \in S_i$, $\models_{MS(L_m)} \alpha_m^h[s(x_i \mid a)]$
$\iff$ for any $a \in P_i$, $\models_{OS_1(L_\Sigma^1)} \alpha_\Sigma^h[s_\Sigma(x^{P_i} \mid a)]$    (by the induction hypothesis, by the way the translation is made, and by the way $OS_1(L_\Sigma^1)$ is constructed from $MS(L_m)$)
$\iff \models_{OS_1(L_\Sigma^1)} \forall x^{P_i}\,\alpha_\Sigma^h[s_\Sigma]$    (by Definition 9.2.2 (5)).

Since $\alpha_\Sigma = \forall x^{P_i}\,\alpha_\Sigma^h$, the theorem holds for the formulas of length $h+1$. Q.E.D.

The preceding theorem assures that any formula in $L_m$ can be translated into $L_\Sigma^1$ by using only aggregate variables. It may well be assumed from here on that a many-sorted theory can be expressed in $L_\Sigma^1$ by using only aggregate variables. In addition to showing the expressive power of $L_\Sigma^1$, the preceding theorem suffices to justify the validity of embedding aggregate variables in a one-sorted language.

The power of $L_\Sigma^1$ over $L_m$ lies in the fact that in the former a variable whose range is restricted to any subset of the universe $\Omega$ can be introduced as needed in its extension, whereas in the latter a sort variable ranging over an a priori undefined sort may not be introduced. This means that one of the problems of $L_m$, namely the inflexible usage of sort variables (e.g., [Cohn83]), can now be overcome. How the inflexible usage is overcome is explained in detail in the rest of this section.

Now it is shown how in $L_\Sigma^1$ variables ranging over new sorts that have not been defined a priori can be introduced while the structure associated with $L_\Sigma^1$ is

expanded by definitions. Let a theory $T_1$ in a one-sorted language be equivalently expressed as a many-sorted theory, say $T_m$, in $L_m$. Let the language for $T_m$ be $L_m(T_m)$†. Let $x_1$ and $x_2$ be the sort variables of $L_m(T_m)$ which range over the sorts $S_1$ and $S_2$, respectively. An inflexible usage of sort variables is displayed when another formula in a one-sorted language, say a logical consequence $\phi_1$ of $T_1$,

$\phi_1 = \forall x\,((S_1(x) \cap S_2(x)) \to \psi(x))$    (9.1)

needs to be further abbreviated in $L_m(T_m)$. If a new variable ranging over $S_1 \cap S_2$, say $x_k$, can be introduced, $\phi_1$ of (9.1) can be abbreviated to $\forall x_k\,\psi(x_k)$ in $L_m(T_m)$. Unless the sort equal to $S_1 \cap S_2$ has been defined in the sort structure for $L_m(T_m)$, however, doing so requires the revision of the sort structure for $L_m(T_m)$ to accommodate the sort equal to $S_1 \cap S_2$.

By comparison, in order to introduce in $L_\Sigma^1$ a variable ranging over a previously undefined set, the structure for $L_\Sigma^1$ only needs to be expanded by $\Sigma$-definition. Let $T_\Sigma$ be the translation of $T_m$ into $L_\Sigma^1$. Let $L_\Sigma^1(T_\Sigma)$ be the language for $T_\Sigma$. Let $x^{S_1}$ and $x^{S_2}$ be two aggregate variables of $L_\Sigma^1$ that range over the relations $S_1$ and $S_2$, respectively. In order to introduce a variable ranging over $S_1 \cap S_2$, all that must be done is to add a new unary predicate symbol $S_k$ to $L_\Sigma^1(T_\Sigma)$, abbreviate $\phi_1$ by

$\forall x^{S_k}\,\psi(x^{S_k})$,    (9.2)

and augment $T_\Sigma$ by the defining axiom

$\forall x\,(S_k(x) \leftrightarrow S_1(x) \cap S_2(x))$.

† By the language of $T_m$ is meant the language whose variables are those of $L_m$ and whose relation and function symbols are those which occur in $T_m$.

The extended language, say $L_\Sigma^{1\prime}$, is formally called a $\Sigma$-extension of $L_\Sigma^1$ and the augmented theory, say $T_\Sigma'$, a $\Sigma$-extension of $T_\Sigma$. As far as the semantics of the new predicate symbols in $L_\Sigma^{1\prime}$ are concerned, such as $S_k$ of (9.2), their corresponding unary relations must be introduced in the structure for $L_\Sigma^1$. Suppose $OS_1$ is a model of $T_\Sigma$. In a way similar to the one shown in Lemma 3.3.2 of Part I, it can be shown that there is a unique expansion by definition of $OS_1$, say $OS_1'$, which is a model of $T_\Sigma'$. More specifically, $OS_1'$ is called an expansion by $\Sigma$-definition of $OS_1$.

Let the characteristic of $L_\Sigma^1$ that allows a more compact expressive power in its extended language be called the $\Sigma$-extensibility of $L_\Sigma^1$. The validity of the $\Sigma$-extensibility of $L_\Sigma^1$ can be shown in a way similar to the one that shows the validity of the $\Sigma$-extensibility of $L_\Sigma$ in Theorem 3.3.2 of Part I.
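The idea of expanding a structure by a defining axiom can be illustrated on a small finite structure. The sketch below is not taken from the text: the universe, the two relations `S1` and `S2`, the property `psi`, and every function name are hypothetical choices made only to show that the abbreviated formula (9.2), evaluated with rule (5) of Definition 9.2.2, agrees with the relativized formula (9.1) once `Sk` is interpreted as the intersection.

```python
# A finite one-sorted structure: a universe plus named unary relations.
# The concrete sets are hypothetical and chosen only for illustration.
universe = set(range(10))
relations = {
    "S1": {n for n in universe if n % 2 == 0},   # even numbers
    "S2": {n for n in universe if n % 3 == 0},   # multiples of three
}

def expand_by_definition(rels, new_name, left, right):
    """Return a copy of rels in which new_name is interpreted as left ∩ right,
    mirroring a defining axiom of the form  forall x (Sk(x) <-> S1(x) ∩ S2(x))."""
    if new_name in rels:
        raise ValueError(f"{new_name} is already interpreted")
    expanded = dict(rels)
    expanded[new_name] = rels[left] & rels[right]
    return expanded

def forall_aggregate(rels, pred, psi):
    """Quantify only over the elements of the relation named by pred."""
    return all(psi(a) for a in rels[pred])

def forall_relativized(univ, rels, left, right, psi):
    """Unabbreviated form: forall x ((S1(x) ∩ S2(x)) -> psi(x)) over the universe."""
    return all((not (a in rels[left] and a in rels[right])) or psi(a) for a in univ)

def divisible_by_six(n):
    return n % 6 == 0

def positive(n):
    return n > 0

expanded = expand_by_definition(relations, "Sk", "S1", "S2")
```

The original dictionary `relations` is left untouched, which loosely mirrors the point that only the structure is expanded while the a priori fixed part is not revised.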

CHAPTER X

PROBLEM FORMULATION

10.1. Representation of a Many-Sorted Theory in $L_\Sigma^1$

The many-sorted theories of concern here are those which fall in a certain class. In this section, it is shown how these many-sorted theories can be described in $L_\Sigma^1$.

Let $T_m$ be a many-sorted theory expressed in an $L_m$ with sort index set $I$ and its associated sort ordering $S$, which is a partial order relation in $I$. Let $L_m(T_m)$ be the language for $T_m$. Let $F^{\langle i_1, \ldots, i_n, i_{n+1} \rangle}$, $\{i_1, \ldots, i_n, i_{n+1}\} \subseteq I$, stand for the function symbol set of sort $\langle i_1, \ldots, i_n, i_{n+1} \rangle$ in $L_m(T_m)$. Corresponding to $L_m(T_m)$, let an $L_\Sigma^1$ be defined as shown in Section 9.3.3. For each $i \in I$, the $L_\Sigma^1$ has a unary predicate $Q_i$. Let $T_m$ be translated into $L_\Sigma^1$ and let the translated theory be denoted by $T_\Sigma$. Two facts must be included in $T_\Sigma$: each sort indicated by $i \in I$ is not empty, and each function indicated by $f \in F^{\langle i_1, \ldots, i_n, i_{n+1} \rangle}$ is well regulated over the corresponding sorts. These two facts can be described in $L_\Sigma^1$ by the following axiom schemas. Let $x, x_1, \ldots, x_n$ be simple variables of $L_\Sigma^1$. Then

(i) for each $i \in I$, $\exists x\,Q_i(x)$, and
(ii) for each function symbol $f$ in $L_m(T_m)$ of sort $\langle i_1, \ldots, i_n, i_{n+1} \rangle$,
$\forall x_1 \cdots \forall x_n\,(Q_{i_1}(x_1) \cap \cdots \cap Q_{i_n}(x_n) \to Q_{i_{n+1}}(f(x_1, \ldots, x_n)))$.

When $n = 0$,

the preceding formula becomes the sentence $Q_i(c)$, which indicates that $c$ is a constant symbol of sort $i$.

The presence of the preceding types of axioms in $T_\Sigma$ does not, however, need to be stated explicitly. Since the facts described by the preceding two types of axioms hold for every many-sorted theory described in $L_\Sigma^1$, their presence can simply be assumed without their explicit inclusion. For example, when a resolution principle is applied to $T_\Sigma$, its refutation can proceed under the assumption that the preceding types of axioms are implicit in $T_\Sigma$.

It can be said that many-sorted theories in general contain two types of nonlogical axioms, namely type I nonlogical axioms and type II nonlogical axioms, that characterize each specific many-sorted theory. Type I nonlogical axioms are those that describe the relationships among the sorts. The type I nonlogical axioms can be expressed in the following form of schema in $L_\Sigma^1$:

$\forall x\,(Q_{i_j}(x) \to Q_{i_k}(x))$    (10.1)

where $x$ is a simple variable and $\langle i_j, i_k \rangle \in S$. The type II nonlogical axioms are any formulas of $L_\Sigma^1$.

The goal of this section is to show how the many-sorted theories concerned in this work are formalized by using the language $L_\Sigma^1$ and an additional symbol "$\sqsubseteq$", which is introduced shortly. Although the many-sorted theories concerned here can be expressed solely in $L_\Sigma^1$, for convenience the symbol "$\sqsubseteq$" is additionally used.

First, it is discussed that when applying a resolution principle to a many-sorted theory $T_\Sigma$, the two types I and II of nonlogical axioms of $T_\Sigma$ can be expressed independently by using two different representation schemes. When a resolution

principle is applied to $T_\Sigma$, deductions made using the type I nonlogical axioms of $T_\Sigma$ are distinguished from deductions made using the type II nonlogical axioms of $T_\Sigma$. The deductions made from the former are relationships among the predicate symbols $\{Q_i\}_{i \in I}$ in $L_\Sigma^1$, and the deductions made from the latter are the resolvents of a set of clauses in $L_\Sigma^1$, which are generated by using the deductions made from the former exclusively as metaknowledge (how this is done will be clear in Section 11.3, where the WR-unification algorithm is introduced). Such a distinction between the two types of deductions implies that the two types of nonlogical axioms of $T_\Sigma$ can be expressed independently by using two different representation schemes, one for the type I nonlogical axioms and the other for the type II nonlogical axioms.

It is now discussed how the many-sorted theories concerned here are formalized by using the symbol "$\sqsubseteq$" and $L_\Sigma^1$. It is first shown how the symbol "$\sqsubseteq$" is used to express type I nonlogical axioms of the many-sorted theories. It was shown that the type I nonlogical axioms of a many-sorted theory include the instances of the schema (10.1). Let the symbol "$\sqsubseteq$" be used to indicate that

$Q_{i_j} \sqsubseteq Q_{i_k}$    (10.2)

if and only if $\forall x\,(Q_{i_j}(x) \to Q_{i_k}(x))$. An expression of the form (10.2) is called an ordering axiom. Let OA (acronym of ordering axioms) be a set of expressions of the form (10.2). It is clear that the type I nonlogical axioms of $T_\Sigma$ can be expressed in terms of OA.

Showing how the type II nonlogical axioms of a many-sorted theory are expressed in $L_\Sigma^1$ is straightforward. It has been shown by Theorem 9.3.1 that any formula in $L_m$ can be expressed in $L_\Sigma^1$. Let $T_\Sigma$ be a set of formulas in

$L_\Sigma^1$. Then from Theorem 9.3.1 it is clear that all the type II nonlogical axioms of a many-sorted theory can be expressed in terms of $T_\Sigma$.

In conclusion, a many-sorted theory concerned in this work is formalized by an ordered pair $\langle OA, T_\Sigma \rangle$. The following is an example of the formalization of a many-sorted theory which falls in the class of many-sorted theories concerned here:

Example 10.1.1 The many-sorted theory in Example 8.1.1 can be expressed by an ordered pair $\langle OA, T_\Sigma \rangle$ as follows:

OA†:
(1) $D \sqsubseteq B$, $D \sqsubseteq C$,
(2) $E \sqsubseteq B$, $E \sqsubseteq C$,

$T_\Sigma$:
(3) $\forall x^B\,(P(x^B) \cup \exists x^D\,Q(x^B, x^D))$,
(4) $\forall x^C\,\neg P(x^C)$,
(5) $\forall x^E \forall y^C\,\neg Q(x^E, y^C)$.

$T_\Sigma$ in the previously formalized $\langle OA, T_\Sigma \rangle$ is a collection of formulas of $L_\Sigma^1$. In the rest of this section it is shown how $T_\Sigma$ is equivalently expressed as a collection of "clauses". First, a few notions are introduced that are used throughout the rest of Part II.

Literals: For any $\alpha \in Atom(L_\Sigma^1)$, $\alpha$ is a literal and $\neg\alpha$ is also a literal.

Complements: For any $\alpha \in Atom(L_\Sigma^1)$, $\alpha$ is a complement of $\neg\alpha$ and also

† For simplicity, any ordering axiom of the form $Q_i \sqsubseteq Q_i$ is omitted in OA. This convention is used throughout Part II.

$\neg\alpha$ is a complement of $\alpha$. The two literals $\alpha$ and $\neg\alpha$ are, in either order, a complementary pair.

Clauses: A finite set (possibly empty) of literals is a clause. A disjunction of literals is used as synonymous with a set of literals. The empty clause is denoted by $\Box$.

Ground literals: A literal that contains no variables is a ground literal.

Ground clauses: A clause each of whose members is a ground literal is a ground clause. In particular, $\Box$ is a ground clause.

Expressions: Terms and literals are the only expressions.

Now the Skolemization of the formulas in $T_\Sigma$ is considered. Each $\psi \in T_\Sigma$ of a many-sorted theory $\langle OA, T_\Sigma \rangle$ can be transformed into a prenex normal form, in which the matrix contains no quantifiers and the prefix is a sequence of quantifiers. The matrix, since it does not contain quantifiers, can be transformed into a conjunctive normal form. Let the formula $\psi$ be transformed into

$Q_1 x_1 \cdots Q_n x_n\,M$    (10.2)

where $M$ is in conjunctive normal form and $Q_i$, $1 \le i \le n$, is either $\forall$ or $\exists$. If (10.2) were a one-sorted formula, what is known as a Skolem normal form of (10.2) would be obtained as follows: beginning with $x_1$, replace each existentially quantified variable in $M$, say $x_r$, $1 \le r \le n$, by a function $f(x_{s_1}, \ldots, x_{s_m})$, $1 \le s_1 < \cdots < s_m < r$, and delete $Q_r x_r$ from the prefix.

When (10.2) is a formula in $L_\Sigma^1$, a modification is made to this Skolemization process. That is, each Skolem function that is introduced in place of an existentially quantified variable is superscripted with a unary predicate symbol that is

accompanied with the variable. For instance, if $x_r$ is replaced by a function $f(x_{s_1}, \ldots, x_{s_m})$ and $x_r$ is an aggregate variable accompanied with a unary predicate symbol, say $R$, then $x_r$ is replaced by the function $f^R(x_{s_1}, \ldots, x_{s_m})$.

Once each $\psi \in T_\Sigma$ is transformed into a Skolem normal form, the prefix of $\psi$ is made implicit since it consists of only universal quantifiers. After the prefix is dropped from it, $\psi$ is a conjunction of clauses. Let a conjunction of clauses be used as synonymous with a set of clauses. In the rest of Part II, by a many-sorted theory is meant an ordered pair $\langle OA, T_\Sigma \rangle$ in which $T_\Sigma$ is a set of clauses. An example follows:

Example 10.1.2 Consider Example 10.1.1. $T_\Sigma$ in the $\langle OA, T_\Sigma \rangle$ below is now a set of clauses. In clause (3), $f^D(x^B)$ is a Skolem function that replaces $x^D$:

OA:
(1) $D \sqsubseteq B$, $D \sqsubseteq C$,
(2) $E \sqsubseteq B$, $E \sqsubseteq C$,

$T_\Sigma$:
(3) $P(x^B) \cup Q(x^B, f^D(x^B))$,
(4) $\neg P(x^C)$,
(5) $\neg Q(x^E, y^C)$.

Finally, a few notations are introduced that are used in the rest of Part II. First, two defined symbols, denoted by $\not\sqsubseteq$ and $\sqsubset$, respectively, are introduced. Associated with the symbol $\sqsubseteq$, the two symbols $\not\sqsubseteq$ and $\sqsubset$ are defined, respectively, as follows: $Q_{i_j} \not\sqsubseteq Q_{i_k}$ if and only if it is not the case that $Q_{i_j} \sqsubseteq Q_{i_k}$, and $Q_{i_j} \sqsubset Q_{i_k}$ if and only if $Q_{i_j} \sqsubseteq Q_{i_k}$ and $Q_{i_j} \neq Q_{i_k}$.
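The three ordering symbols can be made concrete with a short Python sketch over the ordering axioms of Example 10.1.2. Two points are assumptions rather than statements of the text: the relation is read here as the reflexive-transitive closure of the listed axioms (the text only says that reflexive axioms are omitted from OA), and all function and variable names are hypothetical.

```python
def closure(axioms, symbols):
    """Reflexive-transitive closure of the listed ordering axioms.
    Assumption: the ordering is read as the partial order generated by OA."""
    leq = {(p, p) for p in symbols} | set(axioms)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(leq):
            for (c, d) in list(leq):
                if b == c and (a, d) not in leq:
                    leq.add((a, d))
                    changed = True
    return leq

def sqsubseteq(leq, p, q):
    """p is below or equal to q in the ordering."""
    return (p, q) in leq

def not_sqsubseteq(leq, p, q):
    """It is not the case that p is below or equal to q."""
    return (p, q) not in leq

def sqsubset(leq, p, q):
    """p is below q and the two symbols are different."""
    return (p, q) in leq and p != q

# Ordering axioms of Example 10.1.2, written as (subsort, supersort) pairs.
SYMBOLS = {"B", "C", "D", "E"}
OA = {("D", "B"), ("D", "C"), ("E", "B"), ("E", "C")}
LEQ = closure(OA, SYMBOLS)
```

With these axioms, `sqsubset(LEQ, "D", "B")` holds, while `B` and `C` are incomparable in both directions.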

Now a unary predicate that corresponds to each term is introduced. The set of clauses $T_\Sigma$ in an $\langle OA, T_\Sigma \rangle$ is considered. Let $Term(T_\Sigma)$ be the set of all the terms which occur in $T_\Sigma$. For each term $t \in Term(T_\Sigma)$, a unary predicate symbol is determined by the outermost symbol of $t$. That is, if the outermost symbol of $t$ is a function symbol of the form $f^{P_i}$, a variable of the form $x^{P_i}$, or a constant of the form $c^{P_i}$, then $P_i$ is the predicate symbol determined by the outermost symbol of $t$. Here $P_i$ is called the unary predicate that corresponds to the term $t$. This unary predicate symbol $P_i$ is denoted by $Ran(t)$ from here on.

Now the ordering axiom set OA of the $\langle OA, T_\Sigma \rangle$ is considered. Let $t_i, t_j \in Term(T_\Sigma)$. In the rest of Part II, statements of the following form often need to be mentioned:

$Ran(t_i) \sqsubseteq Ran(t_j) \in OA$.    (10.3)

When OA is fixed, statements of the form (10.3) can be made without explicitly mentioning OA. For notational simplicity, in the rest of Part II, $Ran(t_i) \sqsubseteq Ran(t_j)$ is used to mean that $Ran(t_i) \sqsubseteq Ran(t_j) \in OA$. Accordingly, $Ran(t_i) \sqsubset Ran(t_j)$ means that $Ran(t_i) \sqsubseteq Ran(t_j) \in OA$ but $Ran(t_i) \neq Ran(t_j)$, and $Ran(t_i) \not\sqsubseteq Ran(t_j)$ means that neither $Ran(t_i) \sqsubseteq Ran(t_j) \in OA$ nor $Ran(t_i) = Ran(t_j)$.

10.2. Finitely Many Most General Unifiers

The problem identified in Example 8.1.1, namely the generation of useless resolvents that lead to dead ends, occurs only when a certain class of many-sorted theories is refuted by a resolution scheme. For instance, for theories with the tree structure stated in [Walt84a], this problem never occurs. When

the tree constraint is lifted, however, this problem may appear. In this section, the conditions under which such problems may arise are formalized, this time in terms of $L_\Sigma^1$.

In general, when the resolution principle is applied to a many-sorted theory, some restrictions are required in its unification procedure. In order to describe the restrictions more specifically, the following notion is introduced. A many-sorted theory $\langle OA, T_\Sigma \rangle$ is considered. Let $L_\Sigma^1(T_\Sigma)$ mean the language of $T_\Sigma$†. Let $P$ be the unary predicate symbol set of $L_\Sigma^1(T_\Sigma)$. Given the set $P$, let the set of immediate predecessors of a unary predicate symbol $P_i \in P$, denoted by $IM^P(P_i)$, be defined by

$IM^P(P_i) = \{ P_j \mid P_j \in P,\ P_j \sqsubset P_i,\ \text{and if there is a } P_l \in P \text{ such that } P_j \sqsubseteq P_l \sqsubseteq P_i, \text{ then } P_l = P_j \text{ or } P_l = P_i \}$.

For simplicity, from here on the superscript $P$ in the notation $IM^P(\cdot)$ is omitted. This can be done because, once the theory $\langle OA, T_\Sigma \rangle$ is given, the unary predicate set $P$ is fixed.

The restrictions are then: a variable $v$ can be unified with a nonvariable term $t$ iff $Ran(t) \sqsubseteq Ran(v)$, and a variable $v_i$ can be unified with a variable $v_j$ iff $IM(Ran(v_i)) \cap IM(Ran(v_j)) \neq \emptyset$. The former restriction can be enforced easily in a many-sorted resolution by requiring that each substitution component $t/v$ satisfy the condition $Ran(t) \sqsubseteq Ran(v)$. One way to incorporate the latter restriction in a many-sorted resolution would be the following‡: if there is a unary predicate symbol $S_k \in P$ such that $S_k \sqsubseteq Ran(v_i)$ and $S_k \sqsubseteq Ran(v_j)$, and, at the same time, there is no $S_l \in P$ satisfying $S_k \sqsubset S_l \sqsubseteq Ran(v_i)$ and $S_k \sqsubset S_l \sqsubseteq Ran(v_j)$, then $\{ v_i, v_j \}$ is unifiable

† By $L_\Sigma^1(T)$ is meant the language whose variables are those of $L_\Sigma^1$ and whose relation and function symbols are those which occur in the set $T$ of formulas. This notation is used in the rest of Part II.
‡ In [Walt83], a similar idea of incorporating the latter restriction was implemented by the inference rule called the "weakening rule".

with a substitution $\theta = \{x_k/v_i, x_k/v_j\}$, where $x_k$ is an aggregate variable accompanying the predicate symbol $S_k$.

When the preceding method of incorporating the restrictions is directly implemented in a unification procedure, a situation arises that is called generation of finitely many most general unifiers (here only the case having a finite number of sorts is considered). The situation of generating finitely many mgus arises when two to-be-unified variables, say $v_i$ and $v_j$, satisfy the following conditions:

(i) $Ran(v_i) \not\subseteq Ran(v_j)$ and $Ran(v_j) \not\subseteq Ran(v_i)$,
(ii) $|IM(Ran(v_i)) \cap IM(Ran(v_j))| > 1$.

In fact, if $x_k$ is a variable such that $Ran(x_k) = P_k$ and $P_k \in IM(Ran(v_i)) \cap IM(Ran(v_j))$, then any substitution $\theta = \{x_k/v_i, x_k/v_j\}$ is a legitimate mgu of $\{v_i, v_j\}$, since $v_i\theta = v_j\theta$. This implies that there are possibly as many mgus for $\{v_i, v_j\}$ as $|IM(Ran(v_i)) \cap IM(Ran(v_j))|$. For example, consider Example 10.1.2. When the ordering axioms are $D \subseteq B$, $D \subseteq C$, $E \subseteq B$ and $E \subseteq C$, $IM(Ran(x^{\Sigma B})) \cap IM(Ran(x^{\Sigma C})) = \{D, E\}$. Two different mgus are available for $\{x^{\Sigma B}, x^{\Sigma C}\}$, namely $\{x^{\Sigma D}/x^{\Sigma B}, x^{\Sigma D}/x^{\Sigma C}\}$ and $\{x^{\Sigma E}/x^{\Sigma B}, x^{\Sigma E}/x^{\Sigma C}\}$, both of which are legitimate.

As has been demonstrated in Example 8.1.1, the problem in this situation is that multiple resolvents can be derived from two given clauses and not all of them are useful for the generation of the empty clause. In Section 11.1, a way to remedy this situation is formally proposed.
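The counting argument above can be checked mechanically. The following is a minimal sketch, assuming the ordering axioms are given as a finite acyclic set of pairs `(sub, sup)` and that membership in $OA$ is read up to reflexivity and transitivity; the helper names `leq` and `im` and the string encoding of variables are illustrative and do not come from the text.

```python
def leq(oa, a, b):
    """True if a ⊆ b follows from the ordering axioms (oa assumed acyclic)."""
    return a == b or any(sub == a and leq(oa, sup, b) for sub, sup in oa)


def im(oa, p):
    """Immediate predecessors IM(p): the maximal predicates strictly below p."""
    preds = {s for pair in oa for s in pair}
    below = {q for q in preds if q != p and leq(oa, q, p)}
    return {q for q in below
            if not any(r != q and leq(oa, q, r) for r in below)}


# Ordering axioms of Example 10.1.2: D ⊆ B, D ⊆ C, E ⊆ B, E ⊆ C.
OA = {("D", "B"), ("D", "C"), ("E", "B"), ("E", "C")}

common = im(OA, "B") & im(OA, "C")

# One candidate mgu per common immediate predecessor, written as a mapping
# from each unified variable to the aggregate variable that replaces it.
candidate_mgus = [{"x_B": f"x_{k}", "x_C": f"x_{k}"} for k in sorted(common)]
```

Under these assumptions `common` holds the two predicates D and E, so two candidate mgus are produced, which is the situation condition (ii) describes.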

CHAPTER XI

UWR-RESOLUTION

11.1. Unification over the Weakest Range

First, a few basic notions are introduced that are needed for a formal description of the resolution scheme called unification over the weakest range (UWR-resolution). These notions are concerned with the operation of instantiation, i.e., substitution of terms for variables in the clauses of $L_\Sigma$.

Consider a many-sorted theory $\langle OA, T_\Sigma \rangle$. Given the ordering axiom set $OA$ and the language $L_\Sigma(T_\Sigma)$, the following notions are introduced. Any expression of the form $t/v$, where $v$ is a variable and $t$ is a term different from $v$ satisfying $Ran(t) \subseteq Ran(v)$, is a wr-substitution component. For two variables $v_i$ and $v_j$ of $L_\Sigma(T_\Sigma)$ that satisfy the conditions (i) $Ran(v_i) \not\subseteq Ran(v_j)$ and $Ran(v_j) \not\subseteq Ran(v_i)$ and (ii) $|IM(Ran(v_i)) \cap IM(Ran(v_j))| > 1$, a pair denoted by $\{t/v_i, t/v_j\}$ is a wr-subpair if

(1) $t/v_i$ and $t/v_j$ are wr-substitution components where $t$ is a new variable in $L_\Sigma(T_\Sigma)$,

(2) $L_\Sigma(T_\Sigma)$ is extended by including a new unary predicate symbol, say $P_t$, with $Ran(t) = P_t$, and

(3) $OA$ is augmented so that for each unary predicate symbol $Q \in IM(Ran(v_i)) \cap IM(Ran(v_j))$, $(Q \subseteq P_t) \in OA$, and $(P_t \subseteq P_i) \in OA$ and $(P_t \subseteq P_j) \in OA$, where $P_i$ and $P_j$ are the predicates indicated by $Ran(v_i)$ and $Ran(v_j)$, respectively.

A finite set (possibly empty) of wr-substitution components, possibly containing one or more wr-subpairs, in which the variables being replaced are pairwise distinct is a wr-substitution. In particular, $\epsilon$ denotes the empty substitution. Notions such as instantiation and composition of substitutions are defined as usual. If $E$ is an expression and $\theta$ is a wr-substitution, then the instantiation of $E$ by $\theta$ is denoted by $E\theta$. If $\lambda$ is also a wr-substitution, the composition of $\theta$ and $\lambda$ is denoted by $\theta\lambda$. A wr-substitution $\theta$ is called a wr-unifier for a set $\{E_1, \ldots, E_k\}$ of expressions if and only if $E_1\theta = E_2\theta = \cdots = E_k\theta$. The set $\{E_1, \ldots, E_k\}$ is said to be unifiable if there is a wr-unifier for it. A wr-unifier $\sigma$ for a set $\{E_1, \ldots, E_k\}$ of expressions is a most general wr-unifier (wr-mgu) if and only if for each wr-unifier $\theta$ for the set there is a wr-substitution $\lambda$ such that $\theta = \sigma\lambda$. A wr-resolvent is a resolvent that is generated by using a wr-substitution as a unifier (this notion is defined more formally in Section 11.3).

The $\Sigma$-extensibility of $L_\Sigma$ plays the central role in introducing wr-subpairs. From the definition of a wr-subpair, it is clear that wr-resolvents are not expressible in the current vocabulary of $L_\Sigma$. They can only be expressed in an extended language of $L_\Sigma$. This idea is illustrated in the following example.

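Example 11.1.1

Consider the many-sorted theory $\langle OA, T_\Sigma \rangle$ of Example 10.1.2.

$OA$:
(1) $D \subseteq B$, $D \subseteq C$,
(2) $E \subseteq B$, $E \subseteq C$,

$T_\Sigma$:
(3) $P(x^{\Sigma B}) \lor Q(x^{\Sigma B}, f^D(x^{\Sigma B}))$,
(4) $\lnot P(x^{\Sigma C})$,
(5) $\lnot Q(x^{\Sigma E}, x^{\Sigma C})$.

An example of a wr-resolvent is the following. For $x^{\Sigma B}$ of $P(x^{\Sigma B})$ in (3) and $x^{\Sigma C}$ of $\lnot P(x^{\Sigma C})$ in (4), a wr-subpair $\{x^{\Sigma K}/x^{\Sigma B}, x^{\Sigma K}/x^{\Sigma C}\}$ can be introduced if $L_\Sigma$ is extended by a unary predicate symbol, say $K$, where $\forall x\,(K(x) \leftrightarrow B(x) \land C(x))$. The extension of $L_\Sigma$ requires $\langle OA, T_\Sigma \rangle$ to be extended also. That is, upon introducing $K$, $OA$ is augmented to $OA^+$ by the following ordering axioms:

(2+) $K \subseteq B$, $K \subseteq C$, $D \subseteq K$, $E \subseteq K$.

Here the wr-subpair $\{x^{\Sigma K}/x^{\Sigma B}, x^{\Sigma K}/x^{\Sigma C}\}$ itself is a wr-unifier of (3) and (4). Therefore the wr-resolvent of (3) and (4) that is generated by using $\{x^{\Sigma K}/x^{\Sigma B}, x^{\Sigma K}/x^{\Sigma C}\}$ as a unifier is $Q(x^{\Sigma K}, f^D(x^{\Sigma K}))$. In the following, a refutation of $\langle OA, T_\Sigma \rangle$ is shown:

(6) $Q(x^{\Sigma K}, f^D(x^{\Sigma K}))$, from (3)+(4)
(7) $\Box$, from (5)+(6)

The above refutation shows that the wr-resolvent (6) of (3) and (4) is resolved with (5), resulting in $\Box$.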
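The augmentation step of Example 11.1.1 can be replayed at the level of sorts alone. The sketch below is illustrative only: it assumes an acyclic set of ordering axioms encoded as pairs `(sub, sup)`, and the function names `leq`, `im` and `weakest_range` are hypothetical helpers, not notation from the text.

```python
def leq(oa, a, b):
    """True if a ⊆ b follows from the ordering axioms (oa assumed acyclic)."""
    return a == b or any(sub == a and leq(oa, sup, b) for sub, sup in oa)


def im(oa, p):
    """Immediate predecessors IM(p): the maximal predicates strictly below p."""
    preds = {s for pair in oa for s in pair}
    below = {q for q in preds if q != p and leq(oa, q, p)}
    return {q for q in below
            if not any(r != q and leq(oa, q, r) for r in below)}


def weakest_range(oa, p_i, p_j, new_pred):
    """Return OA augmented as in condition (3) of the wr-subpair definition."""
    common = im(oa, p_i) & im(oa, p_j)
    return (oa
            | {(q, new_pred) for q in common}          # Q ⊆ new_pred
            | {(new_pred, p_i), (new_pred, p_j)})      # new_pred ⊆ P_i, P_j


OA = {("D", "B"), ("D", "C"), ("E", "B"), ("E", "C")}
OA_plus = weakest_range(OA, "B", "C", "K")
```

With these inputs `OA_plus` adds exactly the four axioms listed under (2+), and K becomes the only common immediate predecessor of B and C. The sort checks needed for the step (5)+(6), namely E ⊆ K for the first argument and D ⊆ C for the second, also hold in `OA_plus`.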

So far the syntactic notion of UWR-resolution has been introduced. Before ending this section, the semantic notion of UWR-resolution is discussed in terms of the structure associated with $L_\Sigma(T_\Sigma)$. When the outermost symbol of $t$ is a function symbol, say $f$ (if the outermost symbol is a 0-place function symbol, then $t$ is a constant), the unary relation indicated by $Ran(t)$ is the codomain of the function indicated by $f$. When the outermost symbol of $t$ is a variable, $t$ itself is a variable and the unary relation indicated by $Ran(t)$ is the range of the variable $t$. Let the unary relation indicated by $Ran(t)$ be denoted by $\mathbf{Ran}(t)$. It is not difficult to see that the range of the variable $t$ in a wr-subpair $\{t/v_i, t/v_j\}$ [i.e., $\mathbf{Ran}(t) = \mathbf{Ran}(v_i) \cap \mathbf{Ran}(v_j)$] is the weakest range over which $\{v_i, v_j\}$ can be unified, weakest in the sense that if $P_t = Ran(t)$, then there is no other unary relation $P_l$ such that $P_t \subset P_l$ and $\{v_i, v_j\}$ is still unifiable over $P_l$. For this reason, the unification stated here is called unification over the weakest range and the resolution involving such unification is called UWR-resolution. The idea behind UWR-resolution is therefore to subsume all the possible unifications by one unification over the weakest possible range.

11.2. Herbrand Theorem for $L_\Sigma$ Clauses

As a preliminary step to proving the completeness of UWR-resolution, in this section a modified version of the Herbrand theorem [Herb30], called the $L_\Sigma$-version Herbrand theorem, is presented. The $L_\Sigma$-version Herbrand theorem is used as the basis for proving the completeness of UWR-resolution in the following section.

This modified version of the Herbrand theorem is needed for two reasons: first, the Herbrand theorem is originally based on a one-sorted predicate calculus, but here

a many-sorted predicate calculus is dealt with, and second, the original version of the Herbrand theorem cannot be used directly for proving the completeness of a resolution scheme, although it provides the theoretical basis for doing so.

In the modification, the original version of the Herbrand theorem is used as the starting point. First, based on that original version, a many-sorted version of the Herbrand theorem is established that is applicable to the clauses in an ordinary many-sorted language ($L_m$). Then, the many-sorted version applicable to the clauses in $L_m$ is converted into another many-sorted version that is, this time, applicable to the clauses in $L_\Sigma$. Here the former step is of no concern as long as one such many-sorted version can be found in the literature. Such a version is given by Kreisel and Krivine [KrKr67], who call it "the uniformity theorem for predicate calculus with several types of variables." Thus the only concern here is to convert the many-sorted version of the Herbrand theorem by Kreisel and Krivine into another many-sorted version that suits our purpose.

Converting the many-sorted version by Kreisel and Krivine into the many-sorted version that suits our purpose consists of two steps: first, to convert the former into an intermediate version that is applicable to the $L_\Sigma$ clauses, and second, to convert the intermediate version into the form of the Herbrand theorem that can be directly used for proving the completeness of UWR-resolution. The first conversion step is straightforward and, therefore, is shown in Appendix B. In Appendix B, the following form of the $L_\Sigma$-version Herbrand theorem is derived as an intermediate result:

Theorem 11.2.1 Let $A(x_1, \ldots, x_n)$ be a quantifier-free formula with free variables $x_1, \ldots, x_n$. Then $\forall x_1 \cdots \forall x_n\, A(x_1, \ldots, x_n)$ is unsatisfiable if and only if there is a sequence $(t_1^{(i)}, \ldots, t_n^{(i)})$, $1 \le i \le p$, of $n$-tuples of terms of $L_\Sigma(A)$† such that $A_1 \land \cdots \land A_p$ is unsatisfiable, where $A_i$ is obtained by replacing $x_j$, $1 \le j \le n$, in $A$ by $t_j^{(i)}$.

This form of the Herbrand theorem needs to be modified further so that it can be directly used for proving the completeness of UWR-resolution. In the rest of this section it is shown how the further modifications are made.

First it is seen what is meant by Theorem 11.2.1. This theorem says that there is a procedure verifying the inconsistency of a prenex formula, say $\psi$. The formula $\psi'$ is constructed from $\psi$, which, being universal, can be written in the form $\forall x_1 \cdots \forall x_n\, A(x_1, \ldots, x_n)$ where $A$ is quantifier free. Then formulas are generated of the form, for some $k > 0$,

$A(t_1^{(1)}, \ldots, t_n^{(1)}) \land \cdots \land A(t_1^{(k)}, \ldots, t_n^{(k)})$, (11.1)

where the $t_j^{(i)}$'s are terms of $L_\Sigma(A)$. Each formula of this form is tested in a finite number of steps to determine whether or not it is inconsistent by using a truth table, i.e., by treating each atomic formula in (11.1) as a propositional variable. Then $\psi$ is inconsistent if and only if an inconsistent formula of the form (11.1) is found.

† $L_\Sigma(A)$ stands for the language of $A$. By the language of a formula is meant the language whose variables are those of $L_\Sigma$ and whose relation and function symbols are those which occur in the formula $A$.

As the preceding procedure indicates, the Herbrand theorem provides a theoretical basis for the existence of a proof procedure for a quantification theory (strictly speaking, what is described is a refutation procedure rather than a proof procedure). The Herbrand theorem, however, does not address the details of how an actual proof procedure should look, for example, how terms are to be substituted for variables and how the inconsistency of the resulting formula of the form (11.1) can be checked. For developing an actual proof procedure, the most critical issue is how these two processes can be carried out in a systematic way, since what matters in an actual proof procedure is efficiency. Most of the proof procedures known today tackle this issue in one way or another.

The preceding issue was first addressed by Quine [Quin55]. In his paper Quine presented two proof procedures called "method A" and "method B." In method A, he suggested a way to substitute terms for variables. Given a Skolemized normal form $\psi$, let a class of terms, say $C$, contain, to begin with, all the constants of $\psi$ (or 'a', arbitrarily, if there is none). Further, if a non-zero-degree function symbol occurs in $\psi$, then the function, with members of $C$ in place of the function's argument position(s), in turn belongs to $C$. This class $C$, usually infinite, which Quine called "the lexicon of $\psi$", is then the only set of terms that are substituted for the variables in $\psi$. This method of substituting terms for variables is restrictive in the sense that no such restriction is mentioned in the original version of the Herbrand theorem. The restrictive substitution, however, does not hamper the completeness of the proof procedure which is based on it. Here the restriction is not necessary but is only used as a technical aid for carrying out the substitutions. Quine's restrictive substitution strategy was later used in various

machine-based proof procedures [Gilm60, Robi65a]. In method B, Quine further suggested a methodology with which the inconsistency of a set of formulas can be proved without forming a conjunction of all the formulas in the set. He also showed that doing this is more efficient than doing otherwise.

These two ideas, restrictive substitutions and proving the inconsistency of a set of formulas without forming a conjunction of all the formulas in the set, later led to a specific form of the Herbrand theorem by Robinson [Robi65a]. Robinson used this version in proving the completeness of his resolution principle. The goal of this section is to derive a modification of Robinson's version which is applicable to the clauses in $L_\Sigma$. The modified version is derived in the rest of this section.

First, two notions are introduced that are often called "Herbrand universe" and "saturation." The notion of the Herbrand universe of a set of clauses in $L_\Sigma$ is given first. Let $T_\Sigma$ be a set of clauses in $L_\Sigma$. For some index set $I$, let $\{P_i\}_{i \in I}$ be the unary predicate set† of $L_\Sigma(T_\Sigma)$. Let $MIN(\{P_i\}_{i \in I})$ be the subset of $\{P_i\}_{i \in I}$ consisting of its minimal members, i.e., if $P_j \in MIN(\{P_i\}_{i \in I})$ then for no $P_l \in \{P_i\}_{i \in I}$ is $P_l \subset P_j$. Let $F$ be the set of all function symbols that occur in $T_\Sigma$. For each $P_k \in MIN(\{P_i\}_{i \in I})$, if $F$ contains no zero-degree function symbol $c$ such that $Ran(c) = P_k$, a constant symbol $c$ arbitrarily chosen to satisfy $Ran(c) = P_k$ is added. The functional vocabulary of $T_\Sigma$ is $F$ together with all the constant symbols so added (it is $F$ itself if none is needed). Then the Herbrand universe of $T_\Sigma$ is the set of all ground terms in which there occur only symbols in the functional vocabulary of $T_\Sigma$.

† The unary predicate set of $L_\Sigma$ is the set of unary predicates which accompany the aggregate variables of $L_\Sigma$.

Now the notion of saturation is the following. Let $T_\Sigma$ be any set of clauses in $L_\Sigma$ and let $P_\Sigma$ be any set of ground terms. Then the saturation of $T_\Sigma$ over $P_\Sigma$, denoted by $P_\Sigma(T_\Sigma)$, is the set of all ground clauses obtainable from the clauses of $T_\Sigma$ by replacing each variable, say $v_i$, in a clause of $T_\Sigma$ with each member, say $t_j$, of $P_\Sigma$ which satisfies the condition $Ran(t_j) \subseteq Ran(v_i)$ [occurrences of the same variable in any one clause are replaced by the same term]. The two preceding notions are illustrated by the following example.

Example 11.2.1

Consider the $\langle OA, T_\Sigma \rangle$ of Example 10.1.2:

$OA$:
(1) $D \subseteq B$, $D \subseteq C$,
(2) $E \subseteq B$, $E \subseteq C$,

$T_\Sigma$:
(3) $P(x^{\Sigma B}) \lor Q(x^{\Sigma B}, f^D(x^{\Sigma B}))$,
(4) $\lnot P(x^{\Sigma C})$,
(5) $\lnot Q(x^{\Sigma E}, x^{\Sigma C})$.

The unary predicate set, say $UP$, of $L_\Sigma(T_\Sigma)$ is $\{B, C, D, E\}$. $MIN(UP)$ is then $\{D, E\}$. Let $d^D$ and $e^E$ be constants such that $Ran(d^D) = D$ and $Ran(e^E) = E$. The functional vocabulary of $T_\Sigma$ is then $\{d^D, e^E\} \cup \{f^D\}$. The Herbrand universe of $T_\Sigma$ is the following:

$\{d^D, e^E, f^D(d^D), f^D(e^E), f^D(f^D(d^D)), f^D(f^D(e^E)), \ldots\}$

Let $P_\Sigma$ be a finite subset of the Herbrand universe of $T_\Sigma$, say $P_\Sigma = \{e^E, f^D(e^E)\}$. The saturation of $T_\Sigma$ over $P_\Sigma$, $P_\Sigma(T_\Sigma)$, is the following:

$P_\Sigma(T_\Sigma) = \{\, P(e^E) \lor Q(e^E, f^D(e^E)),\ P(f^D(e^E)) \lor Q(f^D(e^E), f^D(f^D(e^E))),\ \lnot P(e^E),\ \lnot P(f^D(e^E)),\ \lnot Q(e^E, e^E),\ \lnot Q(e^E, f^D(e^E)) \,\}$.

Notice that $\lnot Q(f^D(e^E), e^E)$ and $\lnot Q(f^D(e^E), f^D(e^E))$ are not included in $P_\Sigma(T_\Sigma)$, since $Ran(f^D(e^E)) \not\subseteq Ran(x^{\Sigma E})$ [notice that $Ran(f^D(e^E)) = D$].

When the two notions, Herbrand universe and saturation, are used, Theorem 11.2.1 can be rephrased in the form given below. It is assumed that for a many-sorted theory $\langle OA, T_\Sigma \rangle$, $OA$ is any finite set of ordering axioms and $T_\Sigma$ is any finite set of clauses.

Theorem 11.2.2 A $\langle OA, T_\Sigma \rangle$ is unsatisfiable if and only if some finite subset of $H(T_\Sigma)$ is unsatisfiable, where $H$ is the Herbrand universe of $T_\Sigma$.

Finally, with a little exercise of imagination, the preceding form of the Herbrand theorem can be further rephrased in the following form:

Theorem 11.2.3 Herbrand Theorem for $L_\Sigma$ Clauses. A $\langle OA, T_\Sigma \rangle$ is unsatisfiable if and only if for some finite subset $P_\Sigma$ of the Herbrand universe of $T_\Sigma$, $P_\Sigma(T_\Sigma)$ is unsatisfiable.

The preceding theorem is the final $L_\Sigma$-version of the Herbrand theorem that was to be derived. In the following section this theorem is used for proving the completeness of UWR-resolution.
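The saturation of Example 11.2.1 can be reproduced with a short program. This is a sketch under stated assumptions rather than a definitive implementation: terms are encoded as tuples `(kind, name, sort, args)`, literals as `(sign, predicate, args)`, the ordering axioms are assumed acyclic, only unary function symbols are handled when enumerating ground terms, and all helper names (`leq`, `var`, `app`, `herbrand_universe`, `saturate`) are illustrative.

```python
from itertools import product

OA = {("D", "B"), ("D", "C"), ("E", "B"), ("E", "C")}


def leq(a, b, oa=OA):
    """True if a ⊆ b follows from the ordering axioms (assumed acyclic)."""
    return a == b or any(sub == a and leq(sup, b, oa) for sub, sup in oa)


def var(name, sort):
    return ("var", name, sort, ())          # aggregate variable of the given sort


def app(name, sort, *args):
    return ("app", name, sort, args)        # function application or constant


def ran(t):
    return t[2]                             # predicate determined by the outermost symbol


def variables(t):
    if t[0] == "var":
        return {t}
    return set().union(*[variables(a) for a in t[3]])


def instantiate(t, s):
    if t[0] == "var":
        return s[t]
    return ("app", t[1], t[2], tuple(instantiate(a, s) for a in t[3]))


def herbrand_universe(constants, unary_functions, depth):
    """Ground terms up to a nesting depth (the full universe is infinite)."""
    universe, level = list(constants), list(constants)
    for _ in range(depth):
        level = [app(f, sort, t) for f, sort in unary_functions for t in level]
        universe += level
    return universe


def saturate(clauses, terms):
    """All ground instances whose substituted terms respect Ran(t) ⊆ Ran(v)."""
    out = set()
    for clause in clauses:
        vs = sorted({v for _, _, args in clause for a in args for v in variables(a)})
        choices = [[t for t in terms if leq(ran(t), ran(v))] for v in vs]
        for combo in product(*choices):
            s = dict(zip(vs, combo))
            out.add(frozenset((sign, pred, tuple(instantiate(a, s) for a in args))
                              for sign, pred, args in clause))
    return out


xB, xC, xE = var("x", "B"), var("x", "C"), var("x", "E")
e = app("e", "E")
fe = app("f", "D", e)
T = [
    [(True, "P", (xB,)), (True, "Q", (xB, app("f", "D", xB)))],   # clause (3)
    [(False, "P", (xC,))],                                        # clause (4)
    [(False, "Q", (xE, xC))],                                     # clause (5)
]
sat = saturate(T, [e, fe])
```

Here `sat` contains the six ground clauses listed in the example. The two instances of clause (5) with $f^D(e^E)$ in the first argument position are filtered out by the sort test, as the note following the example explains.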

11.3. Completeness of UWR-Resolution

In this section the completeness of UWR-resolution is proved. This proof closely follows the proof of Robinson's resolution principle presented in [Robi65a].

First the completeness of UWR-resolution at the ground level must be established. At the ground level, however, there is no difference between the resolution of one-sorted ground clauses and the resolution of many-sorted ground clauses. For example, when two ground clauses are to be resolved, what must be determined is whether the two clauses contain a complementary pair of ground literals. In making such a decision, it is immaterial that each term in a ground clause in $L_\Sigma$ is associated with a certain unary predicate symbol which is determined by the outermost symbol of the term. In the following, therefore, the ground resolution theorem for UWR-resolution is presented without its proof† as a modification of the ground resolution theorem for Robinson's resolution principle.

First, a few basic notions are introduced. Let $C$ and $D$ be two ground clauses or, as defined synonymously, two sets of ground literals. Let $L \subseteq C$ and $M \subseteq D$ be two singletons whose respective members form a complementary pair of ground literals. Then the ground clause $(C - L) \cup (D - M)$ is called a ground resolvent of $C$ and $D$. Let $T_\Sigma$ be any set of ground clauses. Then the ground resolution of $T_\Sigma$, denoted by $R(T_\Sigma)$, is the set of ground clauses consisting of the members of $T_\Sigma$ and all ground resolvents of all pairs of members of $T_\Sigma$. The $n$th ground resolution of $T_\Sigma$, denoted by $R^n(T_\Sigma)$, is defined for each $n \ge 0$ as follows: $R^0(T_\Sigma) = T_\Sigma$ and for $n \ge 0$, $R^{n+1}(T_\Sigma) = R(R^n(T_\Sigma))$.

† A formal proof can be found in [Robi65a].
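The ground resolution operator defined above is small enough to write out directly. The following is a minimal sketch, assuming ground clauses are encoded as frozensets of `(sign, atom)` pairs with atoms as plain strings, and assuming that only pairs of distinct clauses are resolved; the names `ground_resolvents`, `R` and `R_n` are chosen for illustration.

```python
from itertools import combinations


def complement(lit):
    sign, atom = lit
    return (not sign, atom)


def ground_resolvents(c, d):
    """All clauses (C - L) ∪ (D - M) for complementary singletons L, M."""
    return {(c - {lit}) | (d - {complement(lit)})
            for lit in c if complement(lit) in d}


def R(clauses):
    """Ground resolution: the clauses themselves plus all pairwise resolvents."""
    out = set(clauses)
    for c, d in combinations(clauses, 2):
        out |= ground_resolvents(c, d)
    return out


def R_n(clauses, n):
    """n-th ground resolution: R applied n times, with R^0 the identity."""
    current = set(clauses)
    for _ in range(n):
        current = R(current)
    return current


# Ground instances of clauses (3), (4), (5) of Example 11.1.1 over e^E.
S = {
    frozenset({(True, "P(e)"), (True, "Q(e,f(e))")}),
    frozenset({(False, "P(e)")}),
    frozenset({(False, "Q(e,f(e))")}),
}
EMPTY = frozenset()
```

For this particular set the empty clause first appears in `R_n(S, 2)`: one application of `R` yields the unit clauses, and the second application resolves a unit clause against its complement.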

Theorem 11.3.1 Ground Resolution Theorem for UWR-Resolution. A $\langle OA, T_\Sigma \rangle$ is unsatisfiable if and only if $R^n(T_\Sigma)$ contains $\Box$ for some $n \ge 0$.

By using this theorem, the Herbrand theorem for $L_\Sigma$ clauses, Theorem 11.2.3, can be rephrased as follows:

Theorem 11.3.2 A $\langle OA, T_\Sigma \rangle$ is unsatisfiable if and only if for some finite subset $P_\Sigma$ of the Herbrand universe of $T_\Sigma$ and some $n \ge 0$, $R^n(P_\Sigma(T_\Sigma))$ contains $\Box$.

The rest of this section is devoted to showing how the preceding theorem leads to the theorem for the completeness of UWR-resolution. As the first step, the procedure known as the unification algorithm must be given, which shows how an mgu is derived for a set of clauses satisfying a certain condition.

First, the notion of a disagreement set is introduced as usual. The disagreement set of a nonempty set $W$ of expressions (excluding literals with a negation symbol) is obtained by locating the first symbol (counting from the left) at which not all the expressions in $W$ have exactly the same symbol, and then extracting from each expression in $W$ the subexpression that begins with the symbol occupying that position. The set of these respective subexpressions is the disagreement set of $W$.

In the following an algorithm is introduced which embodies the idea presented in Section 11.1, i.e., unifying two variables which satisfy a certain condition over the

weakest range. This algorithm is called the WR-unification algorithm and is applicable to any finite nonempty set of expressions.

Unlike the unification algorithm for a one-sorted resolution, the WR-unification algorithm requires a finite set $OA$ of ordering axioms as an input in addition to a finite nonempty set $W$ of expressions. $OA$ is needed in the algorithm since UWR-resolution needs to know the unary predicate symbols determined by the outermost symbols of terms and variables. For example, if $t_i$ and $v_j$ are, respectively, a term and a variable that are to be unified, then the WR-unification algorithm needs to determine whether $Ran(t_i) \subseteq Ran(v_j)$, or what $IM(Ran(t_i))$ and $IM(Ran(v_j))$ are. These become determinable if $OA$ is provided as an input to the WR-unification algorithm.

The following process is applicable to a finite nonempty set $W$ of expressions and a finite set $OA$ of ordering axioms:

WR-Unification Algorithm

Step 1 Set $k = 0$, $W_k = W$, $\sigma_k = \epsilon$, and go to Step 2.

Step 2 If $W_k$ is a singleton, stop; $\sigma_k$ is a wr-mgu for $W$. Otherwise, find the disagreement set $D_k$ of $W_k$ and go to Step 3.

Step 3 If there exist elements $v_k$ and $t_k$ in $D_k$ such that $v_k$ is a variable that does not occur in $t_k$, go to Step 4. Otherwise, stop; $W$ is not unifiable.

Step 4 If $Ran(t_k) \subseteq Ran(v_k)$, then let $\sigma_{k+1} = \sigma_k\{t_k/v_k\}$, $W_{k+1} = W_k\{t_k/v_k\}$, and go to Step 7. Otherwise, go to Step 5.

Step 5 If $t_k$ is a variable and $IM(Ran(v_k)) \cap IM(Ran(t_k)) \neq \emptyset$, then go to Step 6. Otherwise, stop; $W$ is not unifiable.

Step 6 If $|IM(Ran(v_k)) \cap IM(Ran(t_k))| = 1$, then let $\sigma_{k+1} = \sigma_k\{w/v_k, w/t_k\}$ where $w$ is a variable such that $Ran(w)$ is the single member of $IM(Ran(v_k)) \cap IM(Ran(t_k))$, let $W_{k+1} = W_k\{w/v_k, w/t_k\}$, and go to Step 7. Otherwise [i.e., $|IM(Ran(v_k)) \cap IM(Ran(t_k))| > 1$], do the following: (i) let $P$ be a new unary predicate that is not in $OA$; (ii) for each $Q \in IM(Ran(v_k)) \cap IM(Ran(t_k))$, enter $Q \subseteq P$ in $OA$, and also enter $P \subseteq P_{v_k}$ and $P \subseteq P_{t_k}$ in $OA$, where $P_{v_k}$ and $P_{t_k}$ are the predicates indicated by $Ran(v_k)$ and $Ran(t_k)$, respectively; and (iii) let $\sigma_{k+1} = \sigma_k\{w/v_k, w/t_k\}$ where $w$ is a variable satisfying $Ran(w) = P$, let $W_{k+1} = W_k\{w/v_k, w/t_k\}$, and go to Step 7.

Step 7 Set $k = k + 1$ and go to Step 2.

There are two basic properties of the WR-unification algorithm that need to be justified. One is that the preceding process always terminates for any finite nonempty set of well-formed expressions. The other is that for a unifiable set of expressions, the outcome of a wr-mgu is always ensured.

The former property can be shown in a straightforward way. The algorithm has three termination points, at Steps 2, 3 and 5, respectively. If the algorithm did not terminate at any of these points and continued indefinitely, it would generate an infinite sequence $W\sigma_0, W\sigma_1, W\sigma_2, \ldots$ (where $W\sigma_k = W_k$) of finite nonempty sets of expressions with the property that each successive set contains one less variable than its predecessor, namely, $W\sigma_k$ contains $v_k$ but $W\sigma_{k+1}$ does not. This is impossible since $W$ contains

only finitely many distinct variables. Therefore the algorithm must terminate in a finite number of steps.

The latter property, i.e., that a wr-mgu is ensured for a nonempty finite unifiable set of expressions, is formally shown by the theorem given below. This result is used in the proof of Lemma 11.3.4 and elsewhere.

Theorem 11.3.3 WR-Unification Theorem. Given a finite set $OA$ of ordering axioms and a finite nonempty unifiable set $W$ of expressions, the WR-unification algorithm always terminates at Step 2 and the last $\sigma_k$ is a wr-mgu for $W$.

Proof. From the hypothesis that $W$ is unifiable, there is a substitution $\theta$ that unifies $W$. It suffices to prove then that the WR-unification algorithm always terminates at Step 2, and that for each $k \ge 0$ until the algorithm so terminates, $\theta = \sigma_k\lambda_k$ holds at Step 2 for some substitution $\lambda_k$ [this is sufficient to say that the last $\sigma_k$ is a wr-mgu for $W$, because the last $\sigma_k$ possibly includes one or more wr-subpairs that would be introduced at Step 6]. This is proved by induction on $k$.

For $k = 0$, by taking $\lambda_0 = \theta$, $\theta = \sigma_0\lambda_0$ since $\sigma_0 = \epsilon$. For $0 \le k \le n$, assume that $\theta = \sigma_k\lambda_k$ holds at Step 2 for some substitution $\lambda_k$. When $k = n$ only two cases are possible at Step 2: either (i) $W\sigma_n$ is a singleton or (ii) $W\sigma_n$ is not a singleton. In case (i), the WR-unification algorithm terminates at Step 2 and $\sigma_n$ is a wr-mgu, since by the induction hypothesis $\theta = \sigma_n\lambda_n$ for some substitution $\lambda_n$. In case (ii), the WR-unification algorithm finds a disagreement set $D_n$ of $W\sigma_n$. The inductive step is then to show that in case (ii) the process continues and does not terminate either at Step 3 or 5, and that, when $k = n + 1$, $\theta = \sigma_{n+1}\lambda_{n+1}$ holds for some substitution $\lambda_{n+1}$.

Let $k = n$. It must hold that $\lambda_n$ unifies $D_n$, from the following: $\theta$ is a unifier of $W$, $\theta = \sigma_n\lambda_n$ holds at Step 2 by the induction hypothesis, and $D_n$ is the disagreement set of $W\sigma_n$. At Step 3, since $W$ is unifiable, there must exist a variable $v_n$ and a term $t_n$ in $D_n$ that is different from $v_n$. Here, since $\lambda_n$ unifies $D_n$, it holds that

$v_n\lambda_n = t_n\lambda_n$. (1)

From (1) it can be shown that $v_n$ never occurs in $t_n$: if $v_n$ occurs in $t_n$, then $v_n\lambda_n$ occurs in $t_n\lambda_n$. Since it is impossible that, while $v_n$ and $t_n$ are distinct, $v_n\lambda_n$ occurs in $t_n\lambda_n$ and at the same time $v_n\lambda_n = t_n\lambda_n$, $v_n$ cannot occur in $t_n$. Therefore the WR-unification algorithm will not terminate at Step 3, but will go to Step 4. At Step 4 the algorithm sets $\sigma_{n+1} = \sigma_n\{t_n/v_n\}$ if $Ran(t_n) \subseteq Ran(v_n)$, and otherwise will go to Step 5. At Step 5, as long as $D_n$ is unifiable, it is possible neither that $Ran(t_n) \not\subseteq Ran(v_n)$ when $t_n$ is not a variable, nor that $IM(Ran(t_n)) \cap IM(Ran(v_n)) = \emptyset$. This implies that the algorithm never terminates at Step 5, but will go to Step 6. Thus the algorithm sets either, at Step 4,

$\sigma_{n+1} = \sigma_n\{t_n/v_n\}$ (2)

or, at Step 6,

$\sigma_{n+1} = \sigma_n\{w_n/v_n, w_n/t_n\}$ (3)

where $w_n$ is a new variable that does not occur either in $v_n$ or in $t_n$. Let case (a) be when the algorithm sets $\sigma_{n+1}$ as in (2) and let case (b) be when the algorithm sets $\sigma_{n+1}$ as in (3). In the rest of the proof, it is shown that at Step 2 in both cases (a) and (b) $\theta = \sigma_{n+1}\lambda_{n+1}$ holds for some substitution $\lambda_{n+1}$.

Case (a): Let $\lambda_{n+1} = \lambda_n - \{(t_n\lambda_n)/v_n\}$. Then

$\lambda_n = \{(t_n\lambda_n)/v_n\} \cup \lambda_{n+1}$, by definition of $\lambda_{n+1}$
$= \{(t_n\lambda_{n+1})/v_n\} \cup \lambda_{n+1}$, since $v_n$ does not occur in $t_n$
$= \{t_n/v_n\}\lambda_{n+1}$, (4) from the properties† of the composition of substitutions.

Therefore the following holds:

$\theta = \sigma_n\lambda_n$, from the induction hypothesis
$= \sigma_n\{t_n/v_n\}\lambda_{n+1}$, from (4)
$= \sigma_{n+1}\lambda_{n+1}$, (5) from (2).

Case (b): Let $\lambda_{n+1} = \lambda_n - \{(w_n\lambda_n)/v_n, (w_n\lambda_n)/t_n\}$ [notice here that $t_n$ is a variable]. Then

$\lambda_n = \{(w_n\lambda_n)/v_n, (w_n\lambda_n)/t_n\} \cup \lambda_{n+1}$, by definition of $\lambda_{n+1}$
$= \{(w_n\lambda_{n+1})/v_n, (w_n\lambda_{n+1})/t_n\} \cup \lambda_{n+1}$, since $w_n$ does not occur either in $v_n$ or in $t_n$
$= \{w_n/v_n, w_n/t_n\}\lambda_{n+1}$, (6) from the properties of the composition of substitutions.

Therefore the following holds:

$\theta = \sigma_n\lambda_n$, from the induction hypothesis
$= \sigma_n\{w_n/v_n, w_n/t_n\}\lambda_{n+1}$, from (6)
$= \sigma_{n+1}\lambda_{n+1}$, (7) from (3).

Hence from (5) and (7), it follows that for all $k \ge 0$, $\theta = \sigma_k\lambda_k$ holds at Step 2 for some substitution $\lambda_k$. Since the WR-unification algorithm must terminate but never terminates either at Step 3 or 5, it must terminate at Step 2. Consequently, whenever the algorithm terminates at Step 2 the last $\sigma_k$ is a wr-mgu for $W$. Q.E.D.

† Properties of the composition of substitutions include: (1) for any expression $E$ and any substitutions $\sigma$ and $\lambda$, $(E\sigma)\lambda = E(\sigma\lambda)$; (2) for any substitutions $\sigma$ and $\lambda$, if $E\sigma = E\lambda$ for every expression $E$ then $\sigma = \lambda$; (3) for any substitutions $\sigma$, $\lambda$ and $\theta$, $(\sigma\lambda)\theta = \sigma(\lambda\theta)$; and (4) for any sets $A$ and $B$ of expressions and substitution $\lambda$, $(A \cup B)\lambda = A\lambda \cup B\lambda$.

Now the UWR-resolution operator, denoted by $R_w(\cdot)$, is introduced in a similar way as the ground resolution operator $R(\cdot)$ was introduced at the beginning of this section. $R_w(\cdot)$ differs from $R(\cdot)$ only by the fact that the former is applied to the clauses in $L_\Sigma$, whereas the latter is applied to the ground clauses.

Definition 11.3.1 Given $\langle OA, T_\Sigma \rangle$, the UWR-resolution of $T_\Sigma$, denoted by $R_w(T_\Sigma)$, is the set of all clauses consisting of members of $T_\Sigma$ and all wr-resolvents of all pairs of members of $T_\Sigma$. The $n$th UWR-resolution of $T_\Sigma$, denoted by $R_w^n(T_\Sigma)$, is then defined as follows: $R_w^0(T_\Sigma) = T_\Sigma$ and for $n \ge 0$, $R_w^{n+1}(T_\Sigma) = R_w(R_w^n(T_\Sigma))$.

Now a lemma called the Lifting Lemma is proved. Before doing so, the notion called standardization is introduced in order to make a pair of clauses not share any common variable. If $L$ is a clause and $v_1, \ldots, v_k$ are all the distinct variables, in alphabetical order, which occur in $L$, then the x-standardization of $L$, denoted by $\xi_L$, is the substitution $\{x_1/v_1, \ldots, x_k/v_k\}$ where $Ran(x_i) = Ran(v_i)$, $1 \le i \le k$, and the y-standardization of $L$, denoted by $\eta_L$, is the substitution $\{y_1/v_1, \ldots, y_k/v_k\}$ where $Ran(y_i) = Ran(v_i)$, $1 \le i \le k$. It is said that $L$ is x-standardized (y-standardized) to $L\xi_L$ ($L\eta_L$). Notice that for a given pair of clauses, say $C$ and $D$, $C\xi_C$ and $D\eta_D$ share no common variable.
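The WR-unification algorithm of this section can be sketched in code. The version below is an illustration under several assumptions and is not the algorithm as stated: it unifies two expressions recursively, argument by argument, instead of iterating over disagreement sets; terms are tuples `(kind, name, sort, args)`; the ordering axioms are an acyclic set of pairs that is augmented in place; when both expressions are variables the test of Step 4 is tried in both directions; and new predicates and variables are named `K0`, `w1`, and so on. All function names are hypothetical.

```python
from itertools import count


def leq(oa, a, b):
    """True if a ⊆ b follows from the ordering axioms (oa assumed acyclic)."""
    return a == b or any(sub == a and leq(oa, sup, b) for sub, sup in oa)


def im(oa, p):
    """Immediate predecessors IM(p): the maximal predicates strictly below p."""
    preds = {s for pair in oa for s in pair}
    below = {q for q in preds if q != p and leq(oa, q, p)}
    return {q for q in below
            if not any(r != q and leq(oa, q, r) for r in below)}


def is_var(t):
    return t[0] == "var"


def ran(t):
    return t[2]


def walk(t, s):
    """Apply substitution s (a dict from variables to terms), following chains."""
    if is_var(t):
        return walk(s[t], s) if t in s else t
    return (t[0], t[1], t[2], tuple(walk(a, s) for a in t[3]))


def occurs(v, t):
    return v == t or any(occurs(v, a) for a in t[3])


def wr_unify(a, b, oa, s=None, fresh=None):
    """Return a unifying substitution for a and b, or None if there is none."""
    s = {} if s is None else s
    fresh = count() if fresh is None else fresh
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if not is_var(a) and is_var(b):
        a, b = b, a
    if is_var(a):
        if occurs(a, b):
            return None                       # Step 3: occurs check fails
        if leq(oa, ran(b), ran(a)):           # Step 4: Ran(t) ⊆ Ran(v)
            s[a] = b
            return s
        if not is_var(b):
            return None                       # Step 5: t is not a variable
        if leq(oa, ran(a), ran(b)):           # Step 4 with the roles swapped
            s[b] = a
            return s
        common = im(oa, ran(a)) & im(oa, ran(b))
        if not common:
            return None                       # Step 5: no common predecessor
        if len(common) == 1:                  # Step 6, first case
            sort = next(iter(common))
        else:                                 # Step 6, second case: extend OA
            sort = f"K{next(fresh)}"
            oa.update({(q, sort) for q in common})
            oa.update({(sort, ran(a)), (sort, ran(b))})
        w = ("var", f"w{next(fresh)}", sort, ())
        s[a] = s[b] = w
        return s
    if a[1] != b[1] or len(a[3]) != len(b[3]):
        return None                           # different outermost symbols
    for x, y in zip(a[3], b[3]):
        if wr_unify(x, y, oa, s, fresh) is None:
            return None
    return s


oa = {("D", "B"), ("D", "C"), ("E", "B"), ("E", "C")}
xB, xC = ("var", "x", "B", ()), ("var", "x", "C", ())
s = wr_unify(("app", "P", None, (xB,)), ("app", "P", None, (xC,)), oa)
```

On the literals of clauses (3) and (4) of Example 11.1.1, the sketch binds both variables to one new variable whose predicate plays the role of $K$, and `oa` then contains the four axioms of (2+) for that predicate. A failed call may leave partial bindings in `s`, so a caller wanting to retry should pass a fresh dictionary.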

Lemma 11.3.4 Lifting Lemma. Given a $\langle OA, T_\Sigma \rangle$, if $P_\Sigma$ is any subset of the Herbrand universe of $T_\Sigma$, then $R(P_\Sigma(T_\Sigma)) \subseteq P_\Sigma(R_w(T_\Sigma))$.

Proof. Suppose $E \in R(P_\Sigma(T_\Sigma))$. Either $E \in P_\Sigma(T_\Sigma)$ or $E$ is a ground resolvent of two ground clauses in $P_\Sigma(T_\Sigma)$. If $E \in P_\Sigma(T_\Sigma)$, then $E \in P_\Sigma(R_w(T_\Sigma))$ since $T_\Sigma \subseteq R_w(T_\Sigma)$. Therefore, it suffices to show that when $E$ is a ground resolvent of two ground clauses in $P_\Sigma(T_\Sigma)$, then $E \in P_\Sigma(R_w(T_\Sigma))$.

Let $E$ be a ground resolvent of two ground clauses, say $C'$ and $D'$. Since $C', D' \in P_\Sigma(T_\Sigma)$, there are two clauses, say $C$ and $D$, in $T_\Sigma$ and two substitutions, say $\alpha$ and $\beta$, satisfying the following: if $\alpha = \{t_1/v_1, \ldots, t_k/v_k\}$ where $v_1, \ldots, v_k$ are all the distinct variables in $C$ and $\beta = \{w_1/u_1, \ldots, w_m/u_m\}$ where $u_1, \ldots, u_m$ are all the distinct variables in $D$, then (i) $C' = C\alpha$ and $D' = D\beta$ and (ii) $\alpha$ and $\beta$ are over† $P_\Sigma$, i.e., $t_1, \ldots, t_k$ and $w_1, \ldots, w_m$ are in $P_\Sigma$. Then from the fact that $E$ is a ground resolvent of $C'$ and $D'$, it follows that $E = (C - L)\alpha \cup (D - M)\beta$, where $L \subseteq C$ and $M \subseteq D$, $L$ and $M$ are nonempty, and $L\alpha$ and $M\beta$ are singletons whose respective members are complements [notice that if $L = C$ and $M = D$, then $E$ is $\Box$].

Let $\theta$ be $\theta = \{t_1/x_1, \ldots, t_k/x_k, w_1/y_1, \ldots, w_m/y_m\}$. Then it follows that

$E = (C - L)\xi_C\theta \cup (D - M)\eta_D\theta$ (1)

† If $P$ is any set of terms and the terms $t_1, \ldots, t_n$ of the components of the substitution $\theta = \{t_1/v_1, \ldots, t_n/v_n\}$ are all in $P$, then $\theta$ is a substitution over $P$.

and that $L\xi_C\theta = L\alpha$ and $M\eta_D\theta = M\beta$. Let $N(L\xi_C \cup M\eta_D)$ stand for the set of atomic formulas that are members, or complements of members, of the set $L\xi_C \cup M\eta_D$ [this convention is used in the rest of Part II]. Then $\theta$ unifies $N(L\xi_C \cup M\eta_D)$. By the WR-unification theorem, there is a wr-mgu $\sigma_w$ unifying $N(L\xi_C \cup M\eta_D)$ so that for some substitution $\lambda$,

$\theta = \sigma_w\lambda$. (2)

Here $\lambda$ is over $P_\Sigma$ since the substitution $\theta$ is over $P_\Sigma$. It follows that $L\xi_C\sigma_w\lambda = L\alpha$ and $M\eta_D\sigma_w\lambda = M\beta$. Then since $L\alpha$ and $M\beta$ are singletons whose respective members are complements, so are $L\xi_C\sigma_w$ and $M\eta_D\sigma_w$. Let $F$ be

$F = (C - L)\xi_C\sigma_w \cup (D - M)\eta_D\sigma_w$. (3)

Then it follows that $F \in R_w(T_\Sigma)$, since (3) implies that $F$ is a wr-resolvent of $C$ and $D$. From (1), (2) and (3) it also follows that $E = F\lambda$.† Finally, since $\lambda$ is over $P_\Sigma$, it is concluded that $E \in P_\Sigma(R_w(T_\Sigma))$. Q.E.D.

Example 11.3.1

Consider the following $\langle OA, T_\Sigma \rangle$:

$OA$:
(1) $D \subseteq B$, $D \subseteq C$,
(2) $E \subseteq B$, $E \subseteq C$,

$T_\Sigma$:
(3) $P(x^{\Sigma B}) \lor Q(x^{\Sigma B}, f^D(x^{\Sigma B}))$,
(4) $\lnot P(x^{\Sigma C})$.

The Herbrand universe of $T_\Sigma$ is the following:

$\{d^D, e^E, f^D(d^D), f^D(e^E), f^D(f^D(d^D)), f^D(f^D(e^E)), \ldots\}$

† The distributive property holds for substitutions: $(A \cup B)\lambda = A\lambda \cup B\lambda$.

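where $d^D$ and $e^E$ are the constants such that $Ran(d^D) = D$ and $Ran(e^E) = E$. Let a finite subset $P_\Sigma$ of the Herbrand universe of $T_\Sigma$ be $P_\Sigma = \{e^E, d^D, f^D(e^E), f^D(d^D)\}$. Then the saturation of $T_\Sigma$ over $P_\Sigma$ is

$P_\Sigma(T_\Sigma) = \{\, P(e^E) \lor Q(e^E, f^D(e^E)),\ P(d^D) \lor Q(d^D, f^D(d^D)),\ P(f^D(e^E)) \lor Q(f^D(e^E), f^D(f^D(e^E))),\ P(f^D(d^D)) \lor Q(f^D(d^D), f^D(f^D(d^D))),\ \lnot P(e^E),\ \lnot P(d^D),\ \lnot P(f^D(e^E)),\ \lnot P(f^D(d^D)) \,\}$

$R(P_\Sigma(T_\Sigma)) = P_\Sigma(T_\Sigma) \cup \{\, Q(e^E, f^D(e^E)),\ Q(d^D, f^D(d^D)),\ Q(f^D(e^E), f^D(f^D(e^E))),\ Q(f^D(d^D), f^D(f^D(d^D))) \,\}$

On the other hand, let a new predicate $K$ be defined as $\forall x\,(K(x) \leftrightarrow B(x) \land C(x))$; then

$R_w(T_\Sigma) = \{\, P(x^{\Sigma B}) \lor Q(x^{\Sigma B}, f^D(x^{\Sigma B})),\ \lnot P(x^{\Sigma C}),\ Q(x^{\Sigma K}, f^D(x^{\Sigma K})) \,\}$

$P_\Sigma(R_w(T_\Sigma)) = P_\Sigma(T_\Sigma) \cup \{\, Q(e^E, f^D(e^E)),\ Q(d^D, f^D(d^D)),\ Q(f^D(e^E), f^D(f^D(e^E))),\ Q(f^D(d^D), f^D(f^D(d^D))) \,\}$.

It clearly follows that $R(P_\Sigma(T_\Sigma)) \subseteq P_\Sigma(R_w(T_\Sigma))$.

The following is a corollary to the Lifting Lemma which shows that the $n$th UWR-resolutions are also semicommutative with saturation.

Corollary 11.3.5 Given a $\langle OA, T_\Sigma \rangle$, if $P_\Sigma$ is any subset of the Herbrand universe of $T_\Sigma$, then $R^n(P_\Sigma(T_\Sigma)) \subseteq P_\Sigma(R_w^n(T_\Sigma))$.

Proof. The proof is by induction on $n$. For $n = 0$, $R^0(P_\Sigma(T_\Sigma)) = P_\Sigma(T_\Sigma) = P_\Sigma(R_w^0(T_\Sigma))$. For $n \ge 0$, let $R^n(P_\Sigma(T_\Sigma)) \subseteq P_\Sigma(R_w^n(T_\Sigma))$ hold. Then the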

inductive step is the following:

$R^{n+1}(P_\Sigma(T_\Sigma)) = R(R^n(P_\Sigma(T_\Sigma)))$, by definition of $R^{n+1}$,
$\subseteq R(P_\Sigma(R_w^n(T_\Sigma)))$, by the induction hypothesis,
$\subseteq P_\Sigma(R_w(R_w^n(T_\Sigma)))$, by Lemma 11.3.4,
$= P_\Sigma(R_w^{n+1}(T_\Sigma))$, by definition of $R_w^{n+1}$. Q.E.D.

Now the following lemma is concluded:

Lemma 11.3.6 If a $\langle OA, T_\Sigma \rangle$ is unsatisfiable, then for some finite subset $P_\Sigma$ of the Herbrand universe of $T_\Sigma$ and some $n \ge 0$, $P_\Sigma(R_w^n(T_\Sigma))$ contains $\Box$.

Proof. By Corollary 11.3.5, it is immediately obtained from Theorem 11.3.2. Q.E.D.

Finally, the final version of the Herbrand theorem for $L_\Sigma$ clauses is proved, which assures the completeness of UWR-resolution.

Theorem 11.3.7 Completeness of UWR-Resolution. A $\langle OA, T_\Sigma \rangle$ is unsatisfiable if and only if $R_w^n(T_\Sigma)$ contains $\Box$ for some $n \ge 0$.

Proof. The "only if" part is proved first. Consider Lemma 11.3.6. Here mere replacement of variables by terms cannot produce $\Box$ from a nonempty clause, i.e., $P_\Sigma(R_w^n(T_\Sigma))$ will contain $\Box$ if and only if $R_w^n(T_\Sigma)$ contains $\Box$. Therefore, by

simply replacing $P_E(R_W^n(T_E))$ of Lemma 11.3.6 by $R_W^n(T_E)$, the "only if" part of the theorem is immediately obtained.

Now the "if" part is proved: Let $R_W^n(T_E)$ contain $\Box$ for some $n \geq 0$. Suppose $\langle OA, T_E \rangle$ is satisfiable. Since any resolvent of two clauses is a logical consequence of the two clauses, any structure satisfying $\langle OA, T_E \rangle$ should also satisfy $\Box$. The empty clause $\Box$ is never satisfiable by any structure. Hence $\langle OA, T_E \rangle$ is unsatisfiable. Q.E.D.

CHAPTER XII

EFFICIENCY OF UWR-RESOLUTION

12.1. A Hypothetic Many-Sorted Resolution

In this chapter, the efficiency of UWR-resolution is discussed. Informally speaking, the efficiency of UWR-resolution is due to letting a wr-resolvent subsume a class of resolvents that would otherwise need to be generated. For example, when a pair of clauses satisfying a certain condition is resolved with UWR-resolution, only one resolvent is generated by using one or more wr-subpairs, whereas otherwise more than one resolvent must be generated. In the latter case, resolvents can be generated that only lead to dead ends.

In order to discuss the efficiency of UWR-resolution in an organized way, the situation described informally in the preceding paragraph must be formalized. As a way of doing this, a hypothetic many-sorted resolution scheme, namely E-resolution, is introduced. Informally speaking, the E-resolution is a many-sorted resolution that is identical with the UWR-resolution except that no wr-subpair is used; instead, a pair of wr-substitution components called an E-subpair is used wherever a wr-subpair would be used. Unlike wr-subpairs, E-subpairs can be introduced without extending the language $L_E$ that is currently being used. The formal notion of an E-subpair is introduced shortly. Although here the efficiency of the

UWR-resolution is discussed by comparing it with the efficiency of the E-resolution, a similar comparison can be made with a many-sorted resolution scheme such as Walther's [Walt83]. Both Walther's scheme and the E-resolution have the problem of generating useless resolvents such as the ones described in Section 8.1.

Formally speaking, the E-resolution differs from the UWR-resolution only by the following. Let two variables $v_i$ and $v_j$ satisfy the following conditions: (i) $Ran(v_i) \not\sqsubseteq Ran(v_j)$ and $Ran(v_j) \not\sqsubseteq Ran(v_i)$, and (ii) $|IM(Ran(v_i)) \cap IM(Ran(v_j))| > 1$. Then in the UWR-resolution the variables $v_i$ and $v_j$ are unified by a wr-subpair $\{w/v_i, w/v_j\}$ where w is a variable satisfying that $Ran(w)$ is a new predicate with which $|IM(Ran(v_i)) \cap IM(Ran(v_j))| = 1$ and $Ran(w) \in IM(Ran(v_i)) \cap IM(Ran(v_j))$. Compared to this, in the E-resolution the variables $v_i$ and $v_j$ are unified by a pair of wr-substitution components $\{w'/v_i, w'/v_j\}$ where $w'$ is a variable satisfying $Ran(w') \in IM(Ran(v_i)) \cap IM(Ran(v_j))$. The pair of wr-substitution components $\{w'/v_i, w'/v_j\}$ is called an E-subpair. It is said that $\{w'/v_i, w'/v_j\}$ is an E-subpair corresponding to the wr-subpair $\{w/v_i, w/v_j\}$. For such an E-subpair $\{w'/v_i, w'/v_j\}$, let $Ran(w')$ stand for the unification predicate of $\{w'/v_i, w'/v_j\}$. Then it is easy to see that for the wr-subpair $\{w/v_i, w/v_j\}$ there are as many corresponding E-subpairs as $|IM(Ran(v_i)) \cap IM(Ran(v_j))|$, whose unification predicates differ from each other.

Once the notion of E-subpair which corresponds to that of wr-subpair is introduced, the notions E-substitution, E-unifier, E-unification and E-mgu of the E-resolution can also be introduced, respectively, in the same way as the notions

wr-substitution, wr-unifier, wr-unification and wr-mgu of the UWR-resolution were introduced. Their formal definitions are omitted to avoid possible redundancy.

The unification algorithm for the E-resolution, namely the E-unification algorithm, can also be introduced identically with the WR-unification algorithm except for some modification of Step 6. Step 6 of the E-unification algorithm is:

Step 6 If $|IM(Ran(v_k)) \cap IM(Ran(t_k))| = 1$, then let $\sigma_{k+1} = \sigma_k\{w/v_k, w/t_k\}$ where w is a variable satisfying $\{Ran(w)\} = IM(Ran(v_k)) \cap IM(Ran(t_k))$, $W_{k+1} = W_k\{w/v_k, w/t_k\}$, and go to Step 7. Otherwise [i.e., $|IM(Ran(v_k)) \cap IM(Ran(t_k))| > 1$], let $\sigma_{k+1} = \sigma_k\{w/v_k, w/t_k\}$ where w is a variable satisfying $Ran(w) \in IM(Ran(v_k)) \cap IM(Ran(t_k))$, $W_{k+1} = W_k\{w/v_k, w/t_k\}$, and go to Step 7.

Accordingly, the basic properties of the E-unification algorithm can be justified with a theorem, namely the E-unification theorem, i.e., for any finite nonempty set of unifiable expressions the E-unification algorithm terminates, and for a unifiable set of expressions a possible outcome of E-mgu is always ensured. Formal introduction of the E-unification theorem is omitted to avoid possible redundancy. The completeness of the E-resolution, however, is shown indirectly in the following section where the UWR-resolution is compared with the E-resolution in terms of efficiency.

In the rest of this section the E-resolution is discussed in more detail. First, it is shown, in the form of a lemma, how a wr-mgu and an E-mgu are related to each other. This lemma is used in the proof of a lemma in the following section.
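The difference between the two versions of Step 6 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `IM(S)` is taken to be the set of immediate subsorts of `S` under the ordering axioms (the reading that fits Example 12.1.1), the table `IM` below is transcribed from that example, and the function names and the `fresh` parameter are hypothetical, not part of the dissertation.

```python
# Hypothetical sketch of the sort choice made in Step 6.
# Assumption: IM(S) is the set of immediate subsorts of S.
# The table mirrors the ordering axioms of Example 12.1.1.
IM = {
    "B": {"D", "E"},
    "C": {"D", "E"},
    "D": {"F", "G", "H"},
    "E": {"F", "G", "H"},
}


def common_immediate_subsorts(im, s, t):
    """Return IM(s) ∩ IM(t); empty when nothing lies below both sorts."""
    return im.get(s, set()) & im.get(t, set())


def e_unification_predicates(im, s, t):
    """E-resolution: every common immediate subsort is a possible
    unification predicate, so each one yields its own E-subpair."""
    return sorted(common_immediate_subsorts(im, s, t))


def wr_unification_predicate(im, s, t, fresh):
    """UWR-resolution: a single predicate is used. If exactly one common
    immediate subsort exists it is reused; if several exist, a newly
    defined predicate (named by `fresh`) stands for all of them."""
    common = common_immediate_subsorts(im, s, t)
    if len(common) == 1:
        return next(iter(common))
    if len(common) > 1:
        return fresh
    return None  # no common subsort: this step does not apply


if __name__ == "__main__":
    print(e_unification_predicates(IM, "B", "C"))        # candidates for E-subpairs
    print(wr_unification_predicate(IM, "B", "C", "K"))   # one wr-subpair
```

Under these assumptions, unifying a variable of sort B with one of sort C gives two E-subpair candidates but only one wr-subpair, which is the source of the saving discussed in this chapter.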

The following notion is first introduced: A finite set $\theta$ of wr-substitution components is called a variable-for-variable substitution if for each wr-substitution component $v_E/v_W \in \theta$, both $v_E$ and $v_W$ are variables.

Lemma 12.1.1 Given two clauses C and D, let $N(L\xi_C \cup M\eta_D)$† be unifiable where $L \subseteq C$ and $M \subseteq D$, and L and M are nonempty. If $\sigma_W$ and $\sigma_E$ are, respectively, a wr-mgu and an E-mgu, each of which unifies $N(L\xi_C \cup M\eta_D)$, then there is a variable-for-variable substitution $\theta$ satisfying (i) for each $v_E/v_W \in \theta$, $Ran(v_E) \sqsubseteq Ran(v_W)$, and (ii) $\sigma_W\theta = \sigma_E$.

Proof. Let $E_W$ and $E_E$ be the wr-resolvent and the E-resolvent of C and D which are generated by using $\sigma_W$ as a wr-mgu and $\sigma_E$ as an E-mgu, respectively, i.e.,

$E_W = (C - L)\xi_C\sigma_W \cup (D - M)\eta_D\sigma_W$,
$E_E = (C - L)\xi_C\sigma_E \cup (D - M)\eta_D\sigma_E$.

It is noticed that once C and D are x-standardized and y-standardized to $C\xi_C$ and $D\eta_D$, respectively, then $x_1, \ldots, x_k$ and $y_1, \ldots, y_m$ are all the distinct variables in $C\xi_C$ and $D\eta_D$, respectively. Let $\sigma_W$ and $\sigma_E$ be, respectively,

$\sigma_W = \{ t_1/x_1, \ldots, t_k/x_k, w_1/y_1, \ldots, w_m/y_m \}$,
$\sigma_E = \{ u_1/x_1, \ldots, u_k/x_k, v_1/y_1, \ldots, v_m/y_m \}$.

† Previously in Section 11.3, $N(L\xi_C \cup M\eta_D)$ has been defined as the set of atomic formulas that are members, or complements of members, of the set $L\xi_C \cup M\eta_D$.

If some two substitution components $t_i/x_i, w_j/y_j \in \sigma_W$, where $t_i = w_j$, constitute a wr-subpair $\{t_i/x_i, w_j/y_j\}$, then there is an E-subpair $\{u_i/x_i, v_j/y_j\}$, where $u_i/x_i, v_j/y_j \in \sigma_E$ and $u_i = v_j$, which corresponds to the wr-subpair $\{t_i/x_i, w_j/y_j\}$. From the way that a wr-subpair and an E-subpair are defined, the following relationship holds between $\{t_i/x_i, w_j/y_j\}$ and $\{u_i/x_i, v_j/y_j\}$:

$\{t_i/x_i, w_j/y_j\}\lambda = \{u_i/x_i, v_j/y_j\}$, where $\lambda = \{u_i/t_i\}$ and $Ran(u_i) \sqsubseteq Ran(t_i)$.

Now let a substitution $\theta$ be constructed in the following way: (i) if $\{t_i/x_i, w_j/y_j\} \subseteq \sigma_W$ is a wr-subpair and $\{u_i/x_i, v_j/y_j\} \subseteq \sigma_E$ is its corresponding E-subpair, then $u_i/t_i$ is an element of $\theta$, and (ii) no wr-substitution components other than those identified by (i) are elements of $\theta$. Then it follows that $\sigma_W\theta = \sigma_E$ and for any substitution component $v_E/v_W \in \theta$, $Ran(v_E) \sqsubseteq Ran(v_W)$. Q.E.D.

Now, the E-resolution operator $R_E(\cdot)$ is introduced in a way similar to that in which the UWR-resolution operator $R_W(\cdot)$ was introduced:

Definition 12.1.1 Given a $\langle OA, T_E \rangle$, the E-resolution of $T_E$, denoted by $R_E(T_E)$, is the set of all clauses consisting of members of $T_E$ and all E-resolvents of all pairs of members of $T_E$. The nth E-resolution of $T_E$, denoted by $R_E^n(T_E)$, is then defined as follows: $R_E^0(T_E) = T_E$, and for $n \geq 0$, $R_E^{n+1}(T_E) = R_E(R_E^n(T_E))$.
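The recursive shape of Definition 12.1.1 may be easier to see on a deliberately simplified case. The sketch below is not the E-resolution itself: it assumes ground, unsorted clauses (so no unification, no sorts and no subpairs are involved), and all names are hypothetical. It only shows how a one-level operator R is iterated to obtain the nth resolution, and how the first level containing the empty clause can be found.

```python
# Simplified sketch: ground (propositional) clauses only.
# A clause is a frozenset of literal strings; "~P" is the complement of "P".
# This illustrates the iteration R^0(T) = T, R^(n+1)(T) = R(R^n(T)),
# not the many-sorted E-resolution or UWR-resolution themselves.

def negate(lit):
    """Return the complementary literal."""
    return lit[1:] if lit.startswith("~") else "~" + lit


def resolvents(c, d):
    """All binary resolvents of the ground clauses c and d."""
    out = set()
    for lit in c:
        if negate(lit) in d:
            out.add(frozenset((c - {lit}) | (d - {negate(lit)})))
    return out


def R(clauses):
    """One level: the given clauses plus all resolvents of all pairs."""
    new = set(clauses)
    for c in clauses:
        for d in clauses:
            new |= resolvents(c, d)
    return new


def nth_resolution(clauses, n):
    """R^n(T), obtained by applying R to T exactly n times."""
    level = set(clauses)
    for _ in range(n):
        level = R(level)
    return level


def first_refutation_level(clauses, max_level=10):
    """Smallest n <= max_level with the empty clause in R^n(T), else None."""
    level = set(clauses)
    for n in range(max_level + 1):
        if frozenset() in level:
            return n
        level = R(level)
    return None


if __name__ == "__main__":
    T = {frozenset({"P", "Q"}), frozenset({"~P"}), frozenset({"~Q"})}
    print(first_refutation_level(T))
```

The `max_level` bound is an assumption added so that the search stops on satisfiable input; the definition in the text places no such bound.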

In the following it is illustrated how $R_W(\cdot)$ and $R_E(\cdot)$ are carried out for a given set of $L_E$ clauses. Here the result of Lemma 12.1.1 is also illustrated.

Example 12.1.1 Let the $\langle OA, T_E \rangle$ of Example 10.1.2 be augmented as follows:

OA†: (1) $D \sqsubseteq B$, $D \sqsubseteq C$, (2) $E \sqsubseteq B$, $E \sqsubseteq C$, (3) $F \sqsubseteq D$, $F \sqsubseteq E$, (4) $G \sqsubseteq D$, $G \sqsubseteq E$, (5) $H \sqsubseteq D$, $H \sqsubseteq E$,

$T_E$: (6) $P(x^B) \cup Q(x^B, y^E) \cup R(f^F(x^B), y^E)$, (7) $\neg P(x^C)$, (8) $\neg Q(x^E, f^H(x^E))$, (9) $\neg R(x^E, x^D)$.

(i) $R_W(\cdot)$ is performed as follows:

$R_W^1(T_E) = R_W^0(T_E) \cup \{ Q(x^K, y^E) \cup R(f^F(x^K), y^E),\ P(x^E) \cup R(f^F(x^E), f^H(x^E)),\ P(x^B) \cup Q(x^B, x^J) \}$,

where two new predicates K and J are defined by $\forall x\,(K(x) \equiv B(x) \wedge C(x))$ and $\forall x\,(J(x) \equiv D(x) \wedge E(x))$;

$R_W^2(T_E) = R_W^1(T_E) \cup \{ R(f^F(x^E), f^H(x^E)),\ Q(x^K, x^J),\ \ldots \}$.

Here $Q(x^K, x^J)$ is a wr-resolvent of $\neg P(y^C)$, $P(x^B) \cup Q(x^B, x^J) \in R_W^1(T_E)$.

† For simplicity, the ordering axioms that are derivable from OA, such as $F \sqsubseteq B$, $G \sqsubseteq B$, $H \sqsubseteq B$, $F \sqsubseteq C$, $G \sqsubseteq C$ and $H \sqsubseteq C$, are omitted in OA. This convention is used in the rest of Part II.

168 (ii) R ' ) is carried out as follows: R (T) - R (T~) U { Q (z,yR) U R(fF (z),y), Q(ZEE,yE) U R (fF (zE),yE), P (xE ) U R (fF (E E),fH ( E)), P (zrB) U Q (,2r), sr t r) U Q (zT ad te ) P (P )U Q (T o ot R R (T)) U { RI ( f (w rE),f(wr)), Q (w~D ~,wr ), Q (wr,,w ), Q (W(E,~w ), R (F (W ),fH (W )), R (fF (Wrc ),fH (WrG )), R (fr(w:H),fH(wr-'H),... ). Now consider the wr-resolvent P(z-B) U Q(z B,ZrT) E RW(TE) and the Eresolvent P(y EB)U Q(yrB,yrF)E R ( Ty) each of which is obtained by resolving (6) and (9). The wr-resolvent is obtained by using the wr-mgu {fr(zB)/Izr, xzI/zr D, xz,/yE }, say aw, and the E-resolvent, by using the E-mgu {fF^(z)/z^_, ZF /zD, z^ /y }, say ar. Here {lZ /xZ, Z/= /I } is a wr-subpair in the wr-mgu aw and {z I/zD, zr /y IE} is its corresponding E-subpair in the E-mgu ar. It follows that there is a variable-for-variable substitution {zx /z}') satisfying aw {(zEF/z}) = where Ran (zCr) C Ran (z ). The additionally generated Rw(Tr) and Rs (T,) will be used later in Example 12.2.1 and in Example 12.2.2.

12.2. UWR-Resolution vs Hypothetic Many-Sorted Resolution

In this section the efficiency of UWR-resolution is discussed. The efficiency is discussed by comparing UWR-resolution with E-resolution in a certain way, i.e., for a given many-sorted theory $\langle OA, T_E \rangle$, two refutations of the theory are derived by using $R_W(\cdot)$ and $R_E(\cdot)$. Both refutations are generated by the method called level-saturation, i.e., the resolution operators $R_W(\cdot)$ and $R_E(\cdot)$ are consecutively applied to the results of the application at the previous level until $\Box$ is derived. Each refutation is then the alignment of all the resolvents ordered according to their generations. The way the two resolution schemes are compared here is, in fact, to contrast the longest possible refutation sequences that can be generated for a given theory by the two schemes $R_W(\cdot)$ and $R_E(\cdot)$. This approach is meaningful in the sense that at each level it is not known a priori to the resolution which two clauses should best be resolved. The worst case may result in generating all the possible resolvents at each level.

The approach adopted here consists of two stages. The first stage is to show that, given a many-sorted theory $\langle OA, T_E \rangle$, the length of the shortest (by level) refutation generated by $R_W(\cdot)$ is identical with that generated by $R_E(\cdot)$. Then the second stage is to show that the total number of wr-resolvents generated by $R_W(\cdot)$ is smaller than or equal to the number of E-resolvents generated by $R_E(\cdot)$. When the results from the two stages are combined, the intended comparison of the two resolution schemes is obtained.

In the following, the first stage is introduced. First, the notion "subsume" is introduced: Let $C_1$ and $C_2$ be two clauses that differ only by their variables. Let $\{v_1, \ldots, v_n\}$ and $\{u_1, \ldots, u_n\}$ be all the distinct variables in $C_1$ and in

$C_2$, respectively, where $v_i$ in $C_1$ corresponds to $u_i$ in $C_2$ for $1 \leq i \leq n$. When it is convenient, the notation $C_2 \mid v_k$ is often used to mean the variable in $C_2$ corresponding to $v_k$, i.e., $u_k$. $C_1$ subsumes $C_2$ if for any $i \in \{1, \ldots, n\}$, $Ran(u_i) \sqsubseteq Ran(v_i)$.

Lemma 12.2.1 Given a $\langle OA, T_E \rangle$, if there is a clause $E_1 \in R_E^{i+1}(T_E) - R_E^i(T_E)$, $i \geq 0$, then there is a clause $E_2 \in R_W^{i+1}(T_E) - R_W^i(T_E)$ that subsumes $E_1$.

Proof. Proof is by induction on i. For i = 0, the proof is trivial since $R_W^0(T_E) = R_E^0(T_E)$. For i > 0, it is assumed that the following holds: For any clause $E_1 \in R_E^i(T_E) - R_E^{i-1}(T_E)$, there is a clause $E_2 \in R_W^i(T_E) - R_W^{i-1}(T_E)$ which subsumes $E_1$. The inductive step is then the following: Let $E_1 \in R_E^{i+1}(T_E) - R_E^i(T_E)$ be a resolvent of two clauses $C_1, D_1 \in R_E^i(T_E)$. Let the abbreviated notations $\xi_1$ and $\eta_1$ be used for $\xi_{C_1}$ and $\eta_{D_1}$, respectively [this way of abbreviating the x(y)-standardization is used in the rest of this section]. Then there are nonempty subsets $L_1 \subseteq C_1$, $M_1 \subseteq D_1$ such that

$E_1 = (C_1 - L_1)\xi_1\sigma_E \cup (D_1 - M_1)\eta_1\sigma_E$ ... (1),

where $\sigma_E$ unifies $N(L_1\xi_1 \cup M_1\eta_1)$. By the induction hypothesis, there are two clauses $C_2, D_2 \in R_W^i(T_E)$ which subsume $C_1$ and $D_1$, respectively. Let $\{x_1^1, \ldots, x_n^1\}$ and $\{x_1^2, \ldots, x_n^2\}$ be all the distinct variables in $C_1\xi_1$ and $C_2\xi_2$, respectively, and let $\{y_1^1, \ldots, y_m^1\}$ and $\{y_1^2, \ldots, y_m^2\}$ be the distinct variables in $D_1\eta_1$ and $D_2\eta_2$, respectively. Let $\delta_C$ and $\delta_D$ be

$\delta_C = \{ x_1^1/x_1^2, \ldots, x_n^1/x_n^2 \}$, $\delta_D = \{ y_1^1/y_1^2, \ldots, y_m^1/y_m^2 \}$.

Then since $C_2$ and $D_2$ subsume $C_1$ and $D_1$, respectively, it holds that $C_1\xi_1 = C_2\xi_2\delta_C$ and $D_1\eta_1 = D_2\eta_2\delta_D$, that for any $i \in \{1, \ldots, n\}$, $Ran(x_i^1) \sqsubseteq Ran(x_i^2)$, and that for any $j \in \{1, \ldots, m\}$, $Ran(y_j^1) \sqsubseteq Ran(y_j^2)$. It also holds that there are two singleton subsets $L_2 \subseteq C_2$ and $M_2 \subseteq D_2$ which subsume $L_1$ and $M_1$, respectively. It follows that $L_1\xi_1 = L_2\xi_2\delta_C$ and $M_1\eta_1 = M_2\eta_2\delta_D$. Therefore, from (1) it follows that

$E_1 = (C_2 - L_2)\xi_2\delta_C\sigma_E \cup (D_2 - M_2)\eta_2\delta_D\sigma_E$ ... (2).

Let $\delta$ be $\delta = \{ x_1^1/x_1^2, \ldots, x_n^1/x_n^2, y_1^1/y_1^2, \ldots, y_m^1/y_m^2 \}$. Then it follows that

$E_1 = ((C_2 - L_2)\xi_2 \cup (D_2 - M_2)\eta_2)\delta\sigma_E$ ... (3).

It also holds that $N(L_1\xi_1 \cup M_1\eta_1) = N(L_2\xi_2\delta_C \cup M_2\eta_2\delta_D) = N((L_2\xi_2 \cup M_2\eta_2)\delta)$. Therefore, when $N(L_1\xi_1 \cup M_1\eta_1)$ is unifiable by $\sigma_E$, $N(L_2\xi_2 \cup M_2\eta_2)$ is unifiable by $\delta\sigma_E$. Here $\delta\sigma_E$ is still an E-mgu for $N(L_2\xi_2 \cup M_2\eta_2)$ since $\delta$ simply replaces variables by variables and for each $v_1/v_2 \in \delta$, $Ran(v_1) \sqsubseteq Ran(v_2)$. Now by the WR-unification theorem, it follows that there is a wr-mgu, say $\sigma_W$, unifying $N(L_2\xi_2 \cup M_2\eta_2)$. Then between $\sigma_W$ and $\delta\sigma_E$, by Lemma 12.1.1, there exists a variable-for-variable substitution $\theta$ such that

$\sigma_W\theta = \delta\sigma_E$ ... (4),

where for each substitution component $v_E/v_W \in \theta$, $Ran(v_E) \sqsubseteq Ran(v_W)$. Let $E_2$ be the wr-resolvent of $C_2$ and $D_2$ that is generated by using $\sigma_W$ as the wr-mgu, i.e.,

$E_2 = (C_2 - L_2)\xi_2\sigma_W \cup (D_2 - M_2)\eta_2\sigma_W$ ... (5).

Then $E_2 \in R_W^{i+1}(T_E) - R_W^i(T_E)$. Finally, the following holds:

$E_1 = ((C_2 - L_2)\xi_2 \cup (D_2 - M_2)\eta_2)\delta\sigma_E$ from (3)
$= ((C_2 - L_2)\xi_2 \cup (D_2 - M_2)\eta_2)\sigma_W\theta$ from (4)
$= E_2\theta$ from (5).

Since for any substitution component $v_E/v_W \in \theta$, $Ran(v_E) \sqsubseteq Ran(v_W)$, it follows that there is $E_2 \in R_W^{i+1}(T_E) - R_W^i(T_E)$ which subsumes $E_1$. Q.E.D.

Example 12.2.1 Consider Example 12.1.1. Let $E_1 \in R_E^2(T_E) - R_E^1(T_E)$ be $E_1 = R(f^F(w^F), f^H(w^F))$. Here $E_1$ is an E-resolvent of $\neg Q(x^E, f^H(x^E)) \in R_E^1(T_E)$, say $C_1$, and $Q(x^D, y^E) \cup R(f^F(x^D), y^E) \in R_E^1(T_E)$, say $D_1$. There are two clauses $\neg Q(x^E, f^H(x^E))$, $Q(x^K, y^E) \cup R(f^F(x^K), y^E) \in R_W^1(T_E)$, say $C_2$ and $D_2$, respectively, which subsume $C_1$ and $D_1$, respectively. A wr-resolvent of $C_2$ and $D_2$, say $E_2$, is then $E_2 = R(f^F(x^E), f^H(x^E))$. $E_2$ subsumes $E_1$. It can be verified from Example 12.1.1 that $E_2 \in R_W^2(T_E) - R_W^1(T_E)$.

The following is a corollary to the preceding lemma which assures that the shortest deduction sequence generating $\Box$ by $R_W(\cdot)$ is not longer than that by $R_E(\cdot)$.

Corollary 12.2.2 Given a $\langle OA, T_E \rangle$, if n is the smallest non-negative integer for which $R_E^n(T_E)$ contains $\Box$, then $R_W^n(T_E)$ also contains $\Box$.

Proof. Let n be the smallest non-negative integer for which $R_E^n(T_E)$ contains $\Box$. Then $\Box$ is a resolvent of two clauses, say $C_1, D_1 \in R_E^{n-1}(T_E)$. Since $\Box$ is a resolvent of $C_1$ and $D_1$, the following holds: $C_1$ and $D_1$ are singletons, and there is an E-mgu, say $\sigma_E$, unifying $N(C_1\xi_1 \cup D_1\eta_1)$. It follows that the resolvent of $C_1\xi_1\sigma_E$ and $D_1\eta_1\sigma_E$ is $\Box$, where $C_1\xi_1\sigma_E$ and $D_1\eta_1\sigma_E$ are singletons whose respective members are complements. By Lemma 12.2.1, there are two clauses in $R_W^{n-1}(T_E)$, say $C_2$ and $D_2$, which subsume $C_1$ and $D_1$, respectively. Here $C_2$ and $D_2$ are also singletons. It can be shown that $N(C_2\xi_2 \cup D_2\eta_2)$ is unifiable in a way similar to the one that showed $N(L_2\xi_2 \cup M_2\eta_2)$ was unifiable in the proof of Lemma 12.2.1. Let $\sigma_W$ be a wr-mgu unifying $N(C_2\xi_2 \cup D_2\eta_2)$. Then $C_2\xi_2\sigma_W$ and $D_2\eta_2\sigma_W$ are singletons whose respective members are complements. It immediately follows that their resolvent is $\Box$, so $R_W^n(T_E) - R_W^{n-1}(T_E)$ contains $\Box$. So does $R_W^n(T_E)$. Q.E.D.

Now the result in the other direction to the result of Corollary 12.2.2 is derived. First, a few more notions associated with "subsume" are introduced. A set of clauses, denoted by $SBSM(C_1(v_k))$, $1 \leq k \leq n$, is subsumed by $C_1$ over $Ran(v_k)$ if for each $S \in IM(Ran(v_k))$ there is a clause $C_S \in SBSM(C_1(v_k))$ satisfying (i) $C_1$

subsumes $C_S$, (ii) $Ran(C_S \mid v_k) = S$, and (iii) $Ran(C_S \mid v_i) = Ran(v_i)$ if $i \neq k$.

This notion is further generalized as follows. Let $v_1, \ldots, v_n$ be the variables in alphabetical order in $C_1$ and let $\{v_{i_1}, \ldots, v_{i_t}\} \subseteq \{v_1, \ldots, v_n\}$. Let $\Gamma$ be the index set of $IM(Ran(v_{i_1})) \times \cdots \times IM(Ran(v_{i_t}))$. A set of clauses, denoted by $SBSM(C_1(v_{i_1}, \ldots, v_{i_t}))$, $\{v_{i_1}, \ldots, v_{i_t}\} \subseteq \{v_1, \ldots, v_n\}$, is subsumed by $C_1$ over $Ran(v_{i_1}), \ldots, Ran(v_{i_t})$ if the following is satisfied: For each $\langle S_1^l, \ldots, S_t^l \rangle \in IM(Ran(v_{i_1})) \times \cdots \times IM(Ran(v_{i_t}))$, $l \in \Gamma$, there is a clause $C_S \in SBSM(C_1(v_{i_1}, \ldots, v_{i_t}))$ satisfying (i) $C_1$ subsumes $C_S$, (ii) for each $v_{i_h}$, $h \in \{1, \ldots, t\}$, $Ran(C_S \mid v_{i_h}) = S_h^l$, and (iii) for each $v_j \in \{v_1, \ldots, v_n\} - \{v_{i_1}, \ldots, v_{i_t}\}$, $Ran(C_S \mid v_j) = Ran(v_j)$.

In the rest of this section a symbol <> is used to designate that $C_1$ <> $C_2$ means $C_1$ and $C_2$ subsume each other. The following notation is also used: Given a $\langle OA, T_E \rangle$, let $\Delta^i(OA)$ stand for the set of all the unary predicate symbols that were newly defined from the 0th UWR-resolution $R_W^0(T_E)$ to the ith UWR-resolution $R_W^i(T_E)$. Then a variable v in a clause $C \in R_W^i(T_E)$ is a d(i,j)-variable if $Ran(v) \in \Delta^i(OA) - \Delta^j(OA)$.

Lemma 12.2.3 Given a $\langle OA, T_E \rangle$, let there be a clause $E_1 \in R_W^{i+1}(T_E) - R_W^i(T_E)$. If there is no d(i+1,0)-variable in $E_1$, then there is a clause $E_2 \in R_E^{i+1}(T_E) - R_E^i(T_E)$ such that $E_1$ <> $E_2$. If there are some d(i+1,0)-variables, say $e_1, \ldots, e_k$, in $E_1$, then there is a $SBSM(E_1(e_1, \ldots, e_k)) \subseteq R_E^{i+1}(T_E) - R_E^i(T_E)$.

Proof. Proof is by induction on i. For i = 0, let a clause

$E_1 \in R_W^1(T_E) - R_W^0(T_E)$ be a resolvent of two clauses $C_1, D_1 \in R_W^0(T_E)$. Then it follows that for some two singleton subsets $L_1 \subseteq C_1$ and $M_1 \subseteq D_1$,

$E_1 = (C_1 - L_1)\xi_1\sigma_W \cup (D_1 - M_1)\eta_1\sigma_W$ ... (1),

where $\sigma_W$ is a wr-mgu unifying $N(L_1\xi_1 \cup M_1\eta_1)$.

(Case I) If there is no d(1,0)-variable in $E_1$, then $\sigma_W$ does not contain any wr-subpairs. Hence $\sigma_W$ is also an E-mgu unifying $N(L_1\xi_1 \cup M_1\eta_1)$. This implies $E_1 \in R_E^1(T_E) - R_E^0(T_E)$ since $R_W^0(T_E) = R_E^0(T_E)$. It trivially holds that $E_1$ <> $E_1$. Consequently, by letting $E_2 = E_1$, there is an $E_2 \in R_E^1(T_E) - R_E^0(T_E)$ such that $E_2$ <> $E_1$.

(Case II) If there are k d(1,0)-variables $e_1, \ldots, e_k$ in $E_1$, then the wr-mgu $\sigma_W$ in (1) must contain k wr-subpairs since $E_1 \in R_W^1(T_E) - R_W^0(T_E)$. Let $\{x_1, \ldots, x_n\}$ and $\{y_1, \ldots, y_m\}$ be all the distinct variables in $C_1\xi_1$ and $D_1\eta_1$, respectively. Let the k wr-subpairs in $\sigma_W$ be

$\{e_1/x_{c(1)}, e_1/y_{d(1)}\}, \ldots, \{e_k/x_{c(k)}, e_k/y_{d(k)}\}$,

where $\{x_{c(1)}, \ldots, x_{c(k)}\} \subseteq \{x_1, \ldots, x_n\}$ and $\{y_{d(1)}, \ldots, y_{d(k)}\} \subseteq \{y_1, \ldots, y_m\}$. Let $\Gamma$ be the index set of $IM(Ran(e_1)) \times \cdots \times IM(Ran(e_k))$. For each $\langle S_1^l, \ldots, S_k^l \rangle \in IM(Ran(e_1)) \times \cdots \times IM(Ran(e_k))$, $l \in \Gamma$, let $\lambda^l$ be a substitution of k substitution components $\{v_1/e_1, \ldots, v_k/e_k\}$ such that $Ran(v_j) = S_j^l$, $1 \leq j \leq k$. Then when $\sigma_W$ unifies $N(L_1\xi_1 \cup M_1\eta_1)$, $\sigma_W\lambda^l$ also unifies $N(L_1\xi_1 \cup M_1\eta_1)$ since $\lambda^l$ simply substitutes variables $e_1, \ldots, e_k$ by $v_1, \ldots, v_k$, respectively. Here $\sigma_W\lambda^l$ is no longer a wr-mgu but an E-mgu since each wr-subpair $\{e_j/x_{c(j)}, e_j/y_{d(j)}\}$, $1 \leq j \leq k$, in $\sigma_W$ is replaced by $\{v_j/x_{c(j)}, v_j/y_{d(j)}\}$ in $\sigma_W\lambda^l$, which is now an E-subpair. Let $E_1^l$ be

$E_1^l = (C_1 - L_1)\xi_1\sigma_W\lambda^l \cup (D_1 - M_1)\eta_1\sigma_W\lambda^l$.

It is clear that $E_1^l \in R_E^1(T_E) - R_E^0(T_E)$ since $R_W^0(T_E) = R_E^0(T_E)$ and $\sigma_W\lambda^l$ is an E-mgu of $C_1, D_1 \in R_E^0(T_E)$. The following relationship holds between $E_1$ and $E_1^l$:

$E_1^l = ((C_1 - L_1)\xi_1\sigma_W \cup (D_1 - M_1)\eta_1\sigma_W)\lambda^l = E_1\lambda^l$.

It follows that $E_1$ subsumes $E_1^l$ since for any substitution component $v_E/v_W \in \lambda^l$ it holds that $Ran(v_E) \sqsubseteq Ran(v_W)$. Finally, the following is concluded: for each $\langle S_1^l, \ldots, S_k^l \rangle \in IM(Ran(e_1)) \times \cdots \times IM(Ran(e_k))$, $l \in \Gamma$, there is a clause $E_1^l \in R_E^1(T_E) - R_E^0(T_E)$ such that (i) $E_1$ subsumes $E_1^l$, (ii) $Ran(E_1^l \mid e_j) = S_j^l$, $1 \leq j \leq k$, and (iii) for any variable v in $E_1$ other than $e_1, \ldots, e_k$, $Ran(E_1^l \mid v) = Ran(v)$. Therefore, it follows that there is a $SBSM(E_1(e_1, \ldots, e_k)) \subseteq R_E^1(T_E) - R_E^0(T_E)$.

It is now assumed that for i > 0, the induction hypothesis holds. In the rest of the proof, the inductive step is shown. Let $E_1 \in R_W^{i+1}(T_E) - R_W^i(T_E)$ be a resolvent of two clauses $C_1, D_1 \in R_W^i(T_E)$. There are only three possible cases: (i) there is no d(i,0)-variable in either $C_1$ or $D_1$, (ii) some d(i,0)-variables are in either $C_1$ or $D_1$, and (iii) some d(i,0)-variables are in both $C_1$ and $D_1$. The inductive step for case (i) is similar to what was shown in the induction basis. The inductive step for case (iii) includes that for case (ii). Therefore in the rest of the proof it is only shown that the inductive step holds for case (iii).

First, some preliminary steps are given which are needed in the rest of the proof. From the fact that $E_1$ is a wr-resolvent of $C_1$ and $D_1$, let $E_1$ be

177 El = (C1 - L1)Iaw U (D1 - Mi)7aw, (2) where L C C and Ml C Dl, LI and M1 are singletons, and aw is a wr-mgu unifying N(LL, U M1,). Here let zx,,,, and y1,, y, be all the distinct variables in Cl and D tl, respectively. Let z,, *, i {1, n }, and y, *,, {i * id } C {1,, m}, be the d (i,O)-variables in C1l and D ll7, respectively. By the inductive hypothesis, there are SBSM(Clxl(z,, * I, z, )), SBSM(D l{l(y,, y )) C R By definition of SBSM(Clix(z,, * ~, z' )), it holds that for each clause, say ac, in SBSM (Cl(x(z, *,,, )) there is a substitution, say c, such that acc = CllbeS. Hence let Ac be a set of substitutions such as d Ac = (c: CiSc E SBSMA(Cl(x,, *., X ))} An observation is made on Ac as follows: Let Fc be the index set of IM(Ran (z )) X X IM(Ran (zx,)). By definition of SBSM(C l(z, l (,, )), it follows that for each I E c, if '',S't > E (MR <S, > IM(Ran ()) * * X IM(Ran (z,)), then there is the corresponding substitution 6c E Ac such that ClSlb E SBSM(C(x,(z, *, )) and for each k, il < k < i, Ran (CllSc I zk ) = Si. Similarly, let AD be defined from SBSM(D ljl(yl, *, yJ)) as follows: AD = (D: D1l1lD E SBSM{(Dll(i(y,, y ))} An observation is made on /AD as follows: Let rD be the index set of IM(Ran(y )) x * X IM(Ran(y,.4)). By definition of

178 SBSM(Dil(y,, * *., y )), it follows that for each I, if corresponding substitution 6D E AD such that D 0ltj E SBSM (Dl(yl,, y, and for each k, i1 < k < id, Ran(D ltSl6D I tl) =. (Case I) Let there be no d (i +l,O-variable in E. Then aw does not contain any wr-subpair. The pair of clauses which can be resolved among SBSM(C{x(z,, * *, )) xSBSM(Dl1{y(y,, y')) is first identified. It is done by constructing a set of substitution pairs, denoted by RESOL1, which is a subset of Ac x AD. RESOL1 is constructed from A1r X AD and the wr-mgu aw of (2) by the following rules: (i) A substitution pair <6c, D> E Ac X AD is a member of RESOL' if for each wr-substitution component t/v E aw it satisfies the condition that (a) if t is in ClE, and v is in D il,then Ran(t Sc) ( Ran (v D), or (b) if t is in D,,1 and v is in Cl,, then Ran(t 6D) E Ran(v 6c). (ii) No substitution pairs other than those identified by (i) are in RESOL. It follows that RESOL is not empty. [ The nonemptiness of RESOL ' can be shown as follows: Without loss of generality, let i be in Cl, and v be in Dll. Suppose * is used to indicate that k means k is either a d (i,0)-variable itself or a term containing d(i,0)variable(s) in it. There are four kinds of substitution components in aw: t /v, t /v, t/v' and t'/v. In order to prove the nonemptiness of RESOL, it suffices to show that for each kind of substitution components the following holds:

179 (a) t /: Since Ran(t Sc) = Ran(t) and Ran(v 6D) = Ran(v), for any <Sc, 6D> E Ac X AD, Ran (t Sc) G Ran(v 3). (b) t/v: If t is a nonvariable term, then for any <Sc, SD > E Ac X AD Ran (t ' c) i_ Ran (v D ) since Ran (St c) = Ran (t ) and Ran(v D) = Ran(v). If t itself is a d(i,0)-variable, then for any S E IM(Ran(t )), S G Ran(v) since S $ Ran (t ) and Ran (t') G Ran (v). (c) t /v': Since t can not be identical with v, Ran (t) C Ran(v). For any Sc E Ac, Ran(t Sc) = Ran(t). Since v is a d(i,0)-variable, there is a S e IM(Ran(v')) such that Ran(t) C S G Ran (v). (d) t* /v: If t' is a nonvariable term, then it is an identical case with (c) since Ran (t 'c)= Ran ( ) for any 6c E Ac. If t itself is a d (i,Ovariable, there are two possible cases: (i) if t - = v, then for any Sc E IM(Ran (t )) there is a SD E IM(Ran(v')) such that Sc = SD, or; (ii) if t' 7 v' for any Sc E IM(Ran(t')) and for any SD E IM(Ran(v')), Sc G SD. ] From the way RESOL1 is constructed, it follows that for each <6c, SD> E RESOL' and for the two clauses Lll, M1lr7 of (2) N (L lli6C U M17j SD) is unifiable, which further implies that C~xl/c E SBSM ( C,(z,, * *, z )) and D,(,6D E SBSA (D 11(y, * are resolvable. Let <Sc, SD > E RESOL (notice that there is at least one pair of substitutions in RESOL' since RESOL' is not empty). C(il, L 11, D r7l and MAf1r of (2) are considered. It is shown how the two clauses C~lSc and D lr16D which are derived by using the <Sc, SD> E RESOL' are resolved. First, a substitution 0 that unifies N(Lllc U MlllD) is constructed from aw of (2) and the

180 substitution pair <6c, D > in the following way: For each substitution component t/v E ow, (i) if t is in C I and v is in Dnli, then tSc/vS D E, or (ii) if t is in D Zl? and v is in C11, then t 6/vSc E 0, and (iii) no other substitution components other than those identified by (i) or (ii) are the elements of 0. Without loss of generality, let t be in Cl^l and let v be in DlTl. Then accordingly there are tSc in C lESc and v D in D 111SD which correspond to t in C xl and v in D ti, respectively. If t and v are unified by aw, i.e., taw = vaw, then it follows that t Sc and v D are unified by 0 since t Sc6c t c and v D = V D {t Sc/v 5D } = I c * Therefore, it follows that when aw unifies N(L 11 U Mlxil), 8 unifies N(L liSc U Mlt1S6D). Furthermore since the construction of 0 from aw does not introduce any additional wr-subpair into 0, when aw does not contain any wr-subpair, 6 does not contain any wr-subpair either. Therefore 0 is a E-mgu as well as a wr-mgu. To indicate that 0 is now a E-mgu, let the notation ar be used for 0. Let- E2 be the E-resolvent of C,1~1c and D 11_VD that is generated by using ao as the E-mgu, i.e., E2= (C1 - L i)eSc a U (D 1 - M l)l76o Then from that ClelSc E SBSM(CilE(zx, ~*,* z)) C R (Tr) and DlDo E SBSM (D il,(y, *,, v, )) C RS(Tr), it follows that e2 e R +1 (Tr) - R (Tr). Now E1 <> E2 is shown. The variables in E1 of (2) are considered. Since there is no d (i+i,0)-variable in E1 there is no d (i,0)-variable in E1. Therefore, no d(i,0ovariable is in (C,-L L)law. This means that although C1 may contain d(i,0)-variables they are eliminated in (C1 - L )Eiaw. Consequently, for any variable, say v, in (C1 - Ll)l]^w, it should hold that either

181 vE,,,,' * 1, I }. Each variable v in (Cl - L1)law is compared with its corresponding variable in (C1 - L )ilecas, i.e., (C1 - L )i1Scs ar v, in the following two cases: (i) When v E{z *,,z,,} - {, *, i }, (C -L i)lco Iv =v c In this case, v S is a variable in C 1Sec. By definition of SBSM(Cll(z, *.., )), it holds that Ran(v 6c) = Ran (v) [by definition, for a clause C, E SBSM(C{(v1, * —, v)), if u is a variable in C, and u f {vl, - vk },then Ran(u) Ran(C, I u)]. (ii) When v E(1 Y - {1, * ( *, y- -, * *, yi ( - L l v = v E In this case, v ca r is a variable in D llSD. By definition of SBSM(D lZI(y4, * ', y, )), it holds that Ran(v Scc) = Ran(v). Therefore, it follows that (C1 - L )law <> (C1 - L )1SEc u. Similarly, it can also be shown that (D1 - Ml)aow < > (D - Ml)SID oa. It is concluded that E1 <> E2 - (Case II) Let there be d(i+l,0)-variables el, * —, ek in E. Among these variables some are d(i+l,i)-variables and the rest are d(i,O)-variables. It is first identified the pairs of clauses which can be resolved among SBSM{(C {x,(z, * * *,; )) x SBSM (D {(y,, * *, Vy )). It is done by constructing a set of substitution pairs, denoted by RESOL2, which is a subset of Ac X AD. RESOL2 is constructed from Ac X AD and the wr-mgu aw of (2) by the following rules: (i) A substitution pair <c,SD> E Ac X AD is a member of RESOL2 if for each wr-substitution component t/v E ow it satisfies the condition that

182 (a) if t/v E aw is not a substitution component constituting a wr-subpair in aw, then either Ran(t 6c) C Ran(v D) if t is in C1,1 and v is in D1li7 or Ran(te 6) G Ran(vSc) if t is in D1t71 and v is in C,1E, or (b) if {t/v, t/v2} is a wr-subpair in aw, then either Ran (vS1c) _ Ran(v25s) if v, is in C,(, and v2 is in Dlrl or Ran (vl8D) _ Ran(v26c) if vl is in DlP1 and v2 is in Cll. (ii) No other substitution components other than those identified by (i) are in RESOL2. It follows that RESOL2 is not empty. [The nonemptiness of RESOL2 can be shown in a similar way as the nonemptiness of RESOL was shown. This time, however, in addition to the four kinds of substitution components shown previously in the proof of nonemptiness of RESOL2, cases for the following four kinds of wr-subpairs are also needed to be considered: {t /v,, t /I t, { /v,t /V2}, (t /vlt /v}, and {t /vI, /vI}. ] From the way RESOL2 is constructed, it follows that for each <6c,D> E RESOL2 and for the two clauses L,11 and Ml, of (2) N (L (l6c U Mrlal ) is unifiable, which further implies that C llc E SBSM(Cl(z,l, *, z,l )) and D, 6D E SBSM (D,l(y, *,, )) are resolvable. Now let r, be the index set of IM(Ran(e,)) X *.. X IM(Ran(ek)). For each <S, *, SI> E IM(Ran(e,)) X * * X IM(Ran(ek)), I e r, let X' be a substitution of k substitution components {vj/ex, '*, v/ek} such that Ran(v) = Si, 1 < j < k. Then owX' also unifies N(L,., U Mt,). Here E,Xl is

183 the resolvent of C~,, and D l7, that is generated by using aw X as the mgu, i.e., ElX' =((C- Ll)lw U(D - Ml)law)\ *... (3). In the rest of the proof, it is shown that for each X', I E r, a clause, say E, can be derived such that El 6 RE+ (T) - R (Tr) and ElX' <> E. First, it is shown how Et is derived. Let el, ', e, be the d(i,0)-variables among e, * * *,. Let v-,, v, {,, v, } C {v l,, v}, be the variables such that v,/ei, El', 1 < j < h. Then each element of {el, ' ', ek}- {el, * I e*, A} is a d(i+l,i)-variable. Let RESOLS be a subset of RESOL2 such that if <c, SD > E RESOL, then <S c SD > satisfies the following condition: (i) if e, 1 < _< h, is in Ci, then Ran(CClSc I e,,) = Ran(v, ), or (ii) if e, < h is in D^1,, then Ran (DlrillD I e,) = Ran(v,'). The preceding conditions are used later in showing ElX' <> E2 Let <Sc S, D> E RESOL. Now a substitution 8 is constructed from aw of (2), X' of (3) and the pair of substitutions <c, SD > in the following way: (i) If t/v E aw is not a substitution component constituting a wr-subpair in aw, (a) if t is in C1 i and v is in Dltl,then tSc/vSoD E6, (b) if t is in Dl1nl and v is in C~If, then tS6/vSc E. (ii) If {(tt/, t/v2} is a wr-subpair in aw and both or either of v1 and v2 is a d (i,)-variable, (a) if v1 is in C1a1 and v2 is in Dst1, then either vl6C/V2SD 60 if Ran(viSc) E Ran(V2pD) or v2SD/v1Sc E 8 if Ran(vSDo) _ Ran(vSc),

184 (b) if vl is in D 1,1 and v2 is in C11, then either vS6D/v26c E 0 if Ran(vlSD) E Ran(v2Sc) or V2Sc/vvlD E if Ran(v3c) _ Ran(vl6D). (iii) If {t/vl, t/v2} is a wr-subpair in aw and neither of v, and v2 is a d(i,O)variable [notice that t is a d(i+l,i)-variable], then {t '/vl, t '/v2} C 0 where t ' is a variable such that if v/t E X' where v is some variable then Ran(t ')= Ran(v). (iv) No other substitution component other than those identified by (i), (ii) or (iii) are elements of. In a similar way as was shown in Case I of the inductive step, it can also be shown that 8 unifies N(L LSc U M1r/1D) and 0 is a E-mgu. Let the notation ar be used for 0 to indicate that 8 is now a E-mgu. Let Ed be the resolvent of C1,1Sc and D 1j1SD that is generated by using as as the E-mgu, i.e., E' (C - L )Sec o, U (D - M l)lSDT a. Then from that ClSc6c E SBSM(CiE1(zx,,, )) R{(T). and D1i1,5D E SBSM(Dxy(,, * * *, ', yi )) C R{(Tc), it follows that E' E Rnr+ (T) - R t (T). Now it is shown that ElX' <> E2 holds. (C1 - L 1)lc as and (C1 - Ll)(lwX are considered. It is noticed that (C - Ll)Elw subsumes (C1 - L l)iaw X' since for each v/e E X, Ran (v) ( Ran (e). There are only three kinds variables in (C1 - Ll)aw: d(i,0-variables, d(i+l,i)-variables, and nond (i +l,0)-variables. For each kind of these variables the following holds: (i) For any d(i,)-variable e, E {e,l, * *, e}, if e, is in (C1-L L)aw, then,

185 from the way that RESOL3 is constructed, it follows that Ran ((C1 - L i)ie 6s I e,) = Ran (uv ) = Ran ((C1 - L )aw X | e,) where V, E {v/,*,, v, } is the variable such that v, /e,6 E X (ii) For each d(i+l,i)-variable e E {ei, * *,E e}-{e, *,,^}, if e, is in (C - L )Iaw, then from the way that ar (i.e., 8) is constructed [see (iii) in the constructing stage of ], it follows that Ran ((C1- Ll)lcar I e,) = Ran (v, ) = Ran ((Cl - L l)lowX' I en) where v. E {v,, - v, } is the variable such that v//eI E X' (iii) For each non-d(i+l,0)-variable, say u, if u is in (C -L)law then Ran((Cl -Ll)leca E ) = Ran(u)= Ran((C - L)Ejaw\X I u) since C1~Sc E SBSM(Cl(zi,.*, z')) and uX' = u Therefore (C1 - Ll)(l5c <> (C1 - Ll)lawX'I. Similar arguments can be applied to (DI-M1)lUDarc and (Di -Ml)lawfwX to conclude that (D1- Ml)S6Dar < > (D - MM)aw X'. Consequently, it follows that E1X' <> E. This conclusion is sufficient enough to say that EIX' subsumes E' because for each vie X', Ran(v) G Ran(e). Since the preceding argument has been made for each X', I E r, the following is finally concluded: for each <S, *, S> IM(Ran(el)) X ' X IM (Ran(e )), I E rF, there is a clause Et() E R +' (T) - R (T,) such that (i) E1 subsumes E(), (ii) Ran(Et I ea) = S, 1 < h < k and (iii) for any variable v in El other than ek, * * * Ran(E I\ v) = Ran(v). Hence it follows that there is a SBSM{El{e,, e,}} R {TCOC RT (T^-R:(T). Q.E.D.

186 Example 12.2.2 Consider Example 12.1.1. The case when there is no d(i+l,0)-variable in E1 is considered. Let El be R(f/(zzE),fH(zrE)) E Rw(T) -Rw(T). No d(2,0)variable is in El. Let El be a wr-resolvent of -, Q(zZE,fH(zE)) E Rw(Tr) and Q (zlK,ylE) U R (fr(zr),yE) E R (Ts). Let the two resolvents be C1 and Di, respectively. The variable zrK in D is a d (1,0)variable. There is a SBSM (D,(z )) C R (Tr) such as SBSMI (D( )) = {XQ (D,yE) U R (fF ( ),yyE), Q (zE,yE) U R(fF(zt ),y ))}. Two clauses Q(zXD,fr(z E)), Q (EE,ySE)UR(fF (ZEx),E) E R (Tr) are considered. Let these two clauses be C2 and D2, respectively. It is noticed that D2 E SBSM(D (zK)). A E-resolvent, say E2, of C2 and D2 is E2= R (F (w F ), f(wlSF )). It is clear that E2E R (Tr) - (Tr) and E<> E1. Now the case when there are some d (i+l,0)-variables in E1 is considered. Let El be Q(ztz')E RR(Ts)- Rw(TE). The two variables zr, zE in E1 are d (2,0)-variables. Let El be a wr-resolvent of two clauses - P(zEC), p(Zs) U Q(zrE,zSJ) E Rw(T). Let these two clauses be CI and DI, respectively. The variable z1J in Dl is a d(1,0)-variable. It is seen that there is a SBSM(D 1(zx')) C R (TE) such as SBSM (D 1(z M )) = { P (zB ) U Q (zxB,z~ ), P (zx) U Q (xB,x.G

187 Let C2 be - P(zrC) E R( (To). Let D, D, and DS be P(ZB) U Q(Z_,ZE), P(ZB)U Q (zB, zE), P(ZEB)UQQ(ZB,ZSH) E R '(T), respectively. It is noticed that D2 E SBSM (D (zx )), 1 < i < 3. A E-resolvent is derived from C2 and each D2, 1 < i 3, as follows: if the resolution operator R( ) is used, R(c2, D ) = {Q(W",F), Q (wE, wF)}, R(C2, D 2) = {Q(wE,wcG), Q (wIE,w: )}, R(C2, D ) = {Q(wED,wrIH), Q (wrE,wwH)}. Let E2= U R(C2, D2). It is clear that E2 is a SBSM(E1(xr-, z a)) and 1<1 <3 E2C5 R 2(rT) - R (rT). A corollary follows to Lemma 12.2.3 which assures that the length of the shortest deduction sequence for R( ) is not longer than that of Rw(). Corollary 12.2.4 Given a < OA, >, if n is the smallest non-negative integer for which RW(TE) contains,, then R (Tr) also contains 0. Proof. Let n be the smallest non-negative integer for which R (TE) contains 0. Then 0 is a resolvent of two clauses in RW-'(Tn), say C, D. There are three possible cases: (i) there is no d(n-l,)-variable in either Cl or D, (ii) some d(n-l,0)-variables are in either C1 or D1, and (iii) some d(n-l,0)-variables are in both Cl and D1. Again only the most general case is considered: case (iii). Since

188 I is a resolvent of C1 and D, the following holds: C1 and D1 are singletons and there is a wr-mgu, say aw, unifying N(C 11 U D?1l7). Then it follows that C llrw U DUl lW = - i, where C1laW and Dlrlalu7 are singletons whose respective members are complements. Since 0 does not have any d(n,0)-variables in it, the proof here is similar to the proof of Lemma 12.2.3 which was shown for the case when there is no d(n,O)variable in E. Let zl, - a, x'l and yl, ",,y be the d(n-l,0)-variables in C1 and DI, respectively. By Lemma 12.2.3, there are SBSM(C (z, * *, zxi)), SBSM(D1(yl, y ')) C R-' (T). Let Ac and AD be defined in the same way as Ac and AD was defined in the proof of Lemma 12.2.3. Let <6c, SD> E Ac X AD. CjlE and Dlrl subsume Cl16c and DTlSD, respectively. As was shown in the proof of Lemma 12.2.3, it can be shown that there is a E-mgu, say ar, unifying N(C1~Sc U Drl7SID). From the fact that Cjfj and D I, subsume C 516c and DltloD, respectively, it follows that C11Sc and Dl1t7SD must be singletons. From the facts that C116Sc and Dlt1SD are singletons and that as unifies N(C{CjSc U DjtjISD), it immediately follows that ClilSc ac U D l1n6D a =.R{(Ts)- R -1(Ts) contains [. So does R{(Tr). Q.E.D. From Corollary 12.2.2 and Corollary 12.2.4, the following is concluded:

Theorem 12.2.5 Given $\langle OA, T_\Sigma \rangle$, if $n$ is the smallest non-negative integer for which $R_w^n(T_\Sigma)$ contains $\Box$ and $m$ is the smallest non-negative integer for which $R_\Sigma^m(T_\Sigma)$ contains $\Box$, then $n = m$.

Proof. The theorem immediately follows from Corollary 12.2.2 and Corollary 12.2.4. Q.E.D.

The preceding result is the conclusion of the first stage, i.e., given a many-sorted theory $\langle OA, T_\Sigma \rangle$, the length of the shortest refutation generated by $R_w(\cdot)$ is identical with that generated by $R_\Sigma(\cdot)$. The preceding result is illustrated by an example at the end of this section.

The second stage of the comparison of $R_w(\cdot)$ and $R_\Sigma(\cdot)$ is now given. First it is shown that, given a many-sorted theory $\langle OA, T_\Sigma \rangle$, the number of wr-resolvents generated by $R_w(\cdot)$ at each level is smaller than or equal to the number of resolvents generated by $R_\Sigma(\cdot)$ at the same level.

Lemma 12.2.6 Given $\langle OA, T_\Sigma \rangle$, for each $i \ge 0$,

$$|R_w^{i+1}(T_\Sigma) - R_w^{i}(T_\Sigma)| \le |R_\Sigma^{i+1}(T_\Sigma) - R_\Sigma^{i}(T_\Sigma)|.$$

Proof. This result is an immediate consequence of Lemma 12.2.3. For each $i \ge 0$, let $E_1$ be a clause in $R_w^{i+1}(T_\Sigma) - R_w^{i}(T_\Sigma)$. Let there be no $d(i+1,0)$-variable in $E_1$. By Lemma 12.2.3, there is a clause $E_2 \in R_\Sigma^{i+1}(T_\Sigma) - R_\Sigma^{i}(T_\Sigma)$ such that $E_1 \mathrel{<>} E_2$. Let there be some $d(i+1,0)$-variables $e_1, \ldots, e_k$ in $E_1$. By Lemma 12.2.3, there is a $SBSM(E_1(e_1, \ldots, e_k)) \subseteq R_\Sigma^{i+1}(T_\Sigma) - R_\Sigma^{i}(T_\Sigma)$. It follows that

$$|SBSM(E_1(e_1, \ldots, e_k))| \le |IM(Ran(e_1))| \times \cdots \times |IM(Ran(e_k))|.$$

Since $|IM(Ran(e_i))| \ge 1$, $1 \le i \le k$, $|SBSM(E_1(e_1, \ldots, e_k))| \ge 1$. Since for each $E_1 \in R_w^{i+1}(T_\Sigma) - R_w^{i}(T_\Sigma)$ there is either a corresponding clause $E_2 \in R_\Sigma^{i+1}(T_\Sigma) - R_\Sigma^{i}(T_\Sigma)$ such that $E_1 \mathrel{<>} E_2$ or a corresponding set of clauses $SBSM(E_1(e_1, \ldots, e_k)) \subseteq R_\Sigma^{i+1}(T_\Sigma) - R_\Sigma^{i}(T_\Sigma)$, it holds that

$$|R_w^{i+1}(T_\Sigma) - R_w^{i}(T_\Sigma)| \le |R_\Sigma^{i+1}(T_\Sigma) - R_\Sigma^{i}(T_\Sigma)|.$$

Q.E.D.

The overall efficiency issue is concluded by the following theorem:

Theorem 12.2.7 Given $\langle OA, T_\Sigma \rangle$, if $n$ is the smallest non-negative integer for which $R_w^n(T_\Sigma)$ and $R_\Sigma^n(T_\Sigma)$ both contain $\Box$, then $|R_w^n(T_\Sigma)| \le |R_\Sigma^n(T_\Sigma)|$.

Proof. This result immediately follows from Lemma 12.2.6. Q.E.D.

The preceding theorem indicates that $R_w(\cdot)$ is more efficient than $R_\Sigma(\cdot)$. The result of Theorem 12.2.7 is illustrated in the following example.
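The level-by-level counting used in Lemma 12.2.6 and Theorem 12.2.7 can be made concrete with a small sketch. The code below is only an illustration under simplifying assumptions: it works on propositional clauses (no sorts, no unification), and the names `resolvents`, `level_saturation`, and `new_per_level` are hypothetical, not taken from the dissertation. It builds the sets $R^0(T), R^1(T), \ldots$ by level saturation and reports how many new clauses each level contributes, which is the quantity $|R^{i+1}(T) - R^{i}(T)|$ compared in the lemma.

```python
from itertools import combinations


def resolvents(c1, c2):
    """All propositional resolvents of two clauses (frozensets of int literals)."""
    out = set()
    for lit in c1:
        if -lit in c2:
            res = (c1 - {lit}) | (c2 - {-lit})
            # Skip tautologies such as {p, -p}.
            if not any(-l in res for l in res):
                out.add(frozenset(res))
    return out


def level_saturation(theory, max_levels=10):
    """Return [R^0, R^1, ...], stopping once the empty clause appears."""
    levels = [set(theory)]
    for _ in range(max_levels):
        current = levels[-1]
        new = set()
        for c1, c2 in combinations(current, 2):
            new |= resolvents(c1, c2)
        nxt = current | new
        levels.append(nxt)
        # Stop on a refutation or when nothing new can be derived.
        if frozenset() in nxt or nxt == current:
            break
    return levels


def new_per_level(levels):
    """Number of clauses first generated at each level, i.e. |R^{i+1} - R^i|."""
    return [len(levels[0])] + [len(b - a) for a, b in zip(levels, levels[1:])]


# Toy theory {P or Q, not P, not Q}; literals are encoded as 1 = P, 2 = Q.
theory = [frozenset({1, 2}), frozenset({-1}), frozenset({-2})]
levels = level_saturation(theory)
print(new_per_level(levels))
```

Running two such saturations on the same theory, one per resolution rule, and comparing the two lists entry by entry is the kind of comparison the lemma describes; the sorted rules $R_w(\cdot)$ and $R_\Sigma(\cdot)$ themselves are not implemented here.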

Example 12.2.3 Consider the following many-sorted theory $\langle OA, T_\Sigma \rangle$:

OA:
(0.1) $D \subset B$, $D \subset C$,
(0.2) $E \subset B$, $E \subset C$,
(0.3) $F \subset D$, $F \subset E$,
(0.4) $G \subset D$, $G \subset E$,
(0.5) $H \subset D$, $H \subset E$,
(0.6) $I \subset E$,

$T_\Sigma$:
(0.7) $P(x^B) \cup Q(x^B, z^E) \cup R(f^F(x^B), z^E) \cup W(x^B)$,
(0.8) $\neg P(x^C)$,
(0.9) $\neg Q(x^E, g^H(x^E))$,
(0.10) $\neg R(x^E, z^D)$,
(0.11) $\neg W(x^I)$.

Here a pair of numbers $(i.j)$ is used to identify each clause. Each clause preceded by $(i.j)$ is the $j$-th clause at the $i$-th resolution operation. For example, for $i = 0$, $R_w^0(T_\Sigma) = \{(0.7), \ldots, (0.11)\}$. The same notation is also used to identify the clauses for the $\Sigma$-resolution $R_\Sigma(\cdot)$, i.e., $R_\Sigma^0(T_\Sigma) = \{(0.7), \ldots, (0.11)\}$.

Which of $R_w(\cdot)$ and $R_\Sigma(\cdot)$ is more efficient is shown by generating $R_w^n(T_\Sigma)$ and $R_\Sigma^n(T_\Sigma)$, each of which contains $\Box$. When the members of $R_w^n(T_\Sigma)$ and $R_\Sigma^n(T_\Sigma)$ are appropriately aligned, they form two refutations, one generated by $R_w(\cdot)$ and the other generated by $R_\Sigma(\cdot)$, respectively. Both refutations are so lengthy that their complete sequences are shown in Appendix C. The results obtained from the two refutations are summarized in the following table, which shows the number of resolvents generated at each level:

Comparison of $R_w(\cdot)$ and $R_\Sigma(\cdot)$

          No. of Resolvents Generated
Level     $R_w(\cdot)$     $R_\Sigma(\cdot)$
0              5                5
1              4                7
2             12               29
3             12               24
4              4                5
Total         37               70

The following observations are made from the table: first, in both refutations, $\Box$ turns up at the same level, i.e., at level 4 (cf. Theorem 12.2.5); second, at each level $i$, $1 \le i \le 4$, the number of resolvents generated by $R_w(\cdot)$ does not exceed the number generated by $R_\Sigma(\cdot)$ (cf. Lemma 12.2.6); and third, $|R_w^4(T_\Sigma)| \le |R_\Sigma^4(T_\Sigma)|$ (cf. Theorem 12.2.7). More details about the refutations can be found in Appendix C; for example, which unary predicate symbols are dynamically defined in the refutation by $R_w(\cdot)$ and which $\Sigma$-resolvents are indeed useless.

12.3. Conclusions and Future Work

First, a problem was identified that might occur in the currently known many-sorted resolution, such as the one reported by Walther. This problem can be avoided if new sorts are introduced while the resolution is being carried out. However, doing so is not possible if the theory to be refuted is expressed in an ordinary many-sorted language and the language is not to be revised while the refutation is carried out.

To alleviate such a situation, a language called the one-sorted language with aggregate variables, which is obtained by embedding aggregate variables in a one-sorted language, was proposed. A many-sorted theory was then expressed in this language and a new approach, called UWR-resolution, was presented. In this resolution new sorts are dynamically introduced as needed using the aggregate variables. The completeness of this resolution was shown and the efficiency of the resolution was discussed.

The approach shown throughout Part II is not the only way of embodying the idea of unifying a pair of variables satisfying a certain condition over the weakest range. Alternative approaches are available. Two alternative approaches are discussed in Appendix D.

There are two ways of extending the work discussed so far. One way is to study what extension should be made if the theory to be refuted by the UWR-resolution is expressed in the language obtained by adding the "equality symbol" to the one-sorted language with aggregate variables. In this case, it is expected that an additional inference rule, often called "paramodulation", must be introduced. The other way is to study the effect of combining the UWR-resolution with various control strategies used in one-sorted resolution, such as lock resolution, semantic resolution, linear resolution, and unit resolution.
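The difference between unifying two sorted variables over existing subsorts and unifying them over the weakest range can be sketched on the sort hierarchy of Example 12.2.3. The following is a rough illustration only, under stated assumptions: the sort hierarchy is read as a table of direct supersorts, the function names are hypothetical, and the rule "introduce a new sort only when more than one maximal common subsort exists" is an interpretation of the condition described in Appendix D, not the dissertation's formal definition.

```python
# Sort hierarchy of Example 12.2.3, read as "sort -> direct supersorts".
SUPERSORTS = {
    "B": set(), "C": set(),
    "D": {"B", "C"}, "E": {"B", "C"},
    "F": {"D", "E"}, "G": {"D", "E"}, "H": {"D", "E"},
    "I": {"E"},
}


def ancestors(sort):
    """The sort itself plus every declared supersort, transitively."""
    seen, stack = {sort}, [sort]
    while stack:
        for sup in SUPERSORTS[stack.pop()]:
            if sup not in seen:
                seen.add(sup)
                stack.append(sup)
    return seen


def is_subsort(a, b):
    """True if a is b or a declared (transitive) subsort of b."""
    return b in ancestors(a)


def maximal_common_subsorts(s1, s2):
    """Declared sorts below both s1 and s2 that no other common subsort contains."""
    common = [s for s in SUPERSORTS if is_subsort(s, s1) and is_subsort(s, s2)]
    return sorted(
        s for s in common
        if not any(t != s and is_subsort(s, t) for t in common)
    )


def weakest_range(s1, s2):
    """Single range for the unified variable (an assumed reading of UWR).

    None means the two variables are not unifiable; a string of the form
    "(X & Y)" is a hypothetical label for a dynamically introduced sort.
    """
    common = maximal_common_subsorts(s1, s2)
    if not common:
        return None
    if len(common) == 1:
        return common[0]
    return f"({s1} & {s2})"


# Unifying over existing subsorts yields one unifier per maximal common subsort.
print(maximal_common_subsorts("B", "C"))
print(maximal_common_subsorts("E", "D"))
# Unifying over the weakest range yields a single unifier.
print(weakest_range("B", "C"), weakest_range("E", "D"))
```

In this reading, the several unifiers of the first approach correspond to the several resolvents listed per parent pair in the $R_\Sigma(\cdot)$ refutation of Appendix C, while the single dynamically introduced sort corresponds to the sorts noted in the $R_w(\cdot)$ refutation.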

CHAPTER XI CONCLUSIONS

The implications resulting from the two applications, one in Part I and the other in Part II, are twofold: extending the first-order predicate calculi by embedding aggregate variables in their languages is theoretically sound, and the extended calculi are practically useful. These two implications are summarized in this chapter.

First, the theoretical soundness of the extended calculi is summarized. The two languages for the first-order predicate calculi, a one-sorted language and a many-sorted language, were extended by embedding a new type of syntactic object, called aggregate variables. The aggregate variables are syntactically ordinary sort variables, but semantically they are variables whose ranges are restricted to unary relations instead of sorts. Therefore, whenever aggregate variables are introduced, the sort structure determined a priori remains intact, although the system itself is augmented by new unary relations that will be the aggregate variables' ranges of interpretation; this process is known as expansion by definitions (e.g., [Shoe67]). This property of the extended predicate calculus is called $\Sigma$-extensibility. The $\Sigma$-extensibility of the many-sorted language with aggregate variables and that of the one-sorted language with aggregate variables were shown in the form of theorems. These $\Sigma$-extensibilities of the extended calculi assure that one of the problems of an ordinary many-sorted language, namely, the inflexible usage of sort variables (e.g., [Cohn83]), can now be overcome.

In the rest, the practical usability of the extended calculi is discussed. The $\Sigma$-extensibilities of the extended calculi implied the flexible usage of aggregate variables (in contrast to the inflexibility of ordinary sort variables), which led to two applications, one in the distributed database design area and the other in the automatic theorem proving area. Based on these two applications, the practical usability of the extended calculi can be generalized. Before such a generalization is made, the significance of the extended calculi in each application is reviewed.

The significance of $L_\Sigma$ in the KBDDBS design is given first. It is twofold: (i) $L_\Sigma$ provides more compact expressive power than does $L_m$, and, therefore, (ii) to a certain extent, it became possible to develop a simple syntactic matching process as the inference procedure involving specific formulas in $L_\Sigma$. The compact expressive power of $L_\Sigma$ over $L_m$ was due to the fact that $L_\Sigma$ permitted the introduction of aggregate variables whose ranges were restricted to subsets of sort domains, something that could not be done in $L_m$. This compact expressive power of $L_\Sigma$ allowed an easy way of endowing dual semantics† to the formulas of $L_\Sigma$ à la [Kowa74], which otherwise might not have been possible. The knowledge about the data could be expressed in a special form called the $\Sigma$-Horn formula, queries were expressed in the $\Sigma$-normal form, and a syntactic matching process could be developed as the inference procedure with which the knowledge in $\Sigma$-Horn form was applied to the user queries in $\Sigma$-normal form in an inferencing manner.

† In [Mylo81], Mylopoulos mentions: "An interesting departure from logical representation schemes has been proposed by Kowalski [Kowa74] who argues in favor of a dual semantics for logical formulas of the form $B_1 \cap B_2 \cap \cdots \cap B_n \rightarrow A$. The first is the traditional Tarskian semantics. The second is a procedural semantics which interprets the formula as 'If you want to establish $A$, try to establish $B_1$ and $B_2$ and ... and $B_n$.'"

Now the significance of the one-sorted language with aggregate variables in the UWR-resolution is discussed. Its significance is that it allows the introduction, in the middle of a refutation, of variables ranging over the intersection of two specific sorts determined previously. Introducing variables over such a sort could also have been done even if the theory to be refuted were expressed in $L_m$, since all the likely-to-be-used sorts can be determined before the refutation begins, so the sort structure of the theory can be modified to include all the likely-to-be-used sorts. The problem with doing so is that additional axioms for the theory would have to be generated even though many of them would never be used. This problem is not encountered if a variable ranging over a new sort is dynamically introduced using aggregate variables.

Based on these two applications, the practical usability of the extended calculi can be generalized to a certain extent. The following situation is considered: (i) a system involving more than one category of objects is axiomatized, (ii) a need arises to introduce a variable ranging over a sort that does not exist in the sort structure determined by the categories of the objects, and (iii) it is not desired to change the sort structure determined a priori. In this situation, the system can be axiomatized based on the proposed extended calculi; variables ranging over new sorts can be introduced as needed.

APPENDICES

APPENDIX A

A Relational Database Example

An Auto Corporation Database

DIVISIONS
div#    div name    head
01AP    Buick       Patrick
01HQ    Finance     Joyol
01PP    Elextra     Shin
02AP    Pontiac     Lee
02PP    Body        Rieger
03PP    Engina      Meltzer
03PP    Trans       Frege
04AP    Frantana    Frege
05AP    Omnus       Gelperin

DEALERS
d#     address      d_type
01A    Ann Arbor    51
03A    Dearborn     30
07A    Flint        50
26M    Cleveland    20
33B    Cleveland    30
48B    Rockford     31
55L    Flint        51
65B    Detroit      20
66L    Niles        23
70A    Lansing      70

ITEMS
item#    i_name         i_type
A01      ink            stationery
A02      note pad       stationery
B47      Eland          bus
C05      blue 5         paint
C06      white 7        paint
N11      square 11"     nut
P02      distributer    engine part
P03      radiator       engine part
S01      micro proc.    elect. part
S02      battery        elect. part
V01      Astre          sedan
V03      Camaro         sedan
W09      Cabriolet      van
X77      iron 7"        plate
X89      iron 9"        plate

SALES
div#    d#     item#
01AP    01A    V01
01AP    07A    W09
01PP    55L    S01
01PP    07A    P02
02AP    01A    B47
02PP    03A    P02
03PP    01A    P03
03PP    03A    S02
04AP    01A    V03
05AP    55L    V03
05PP    55L    S02

APPENDIX B

An Intermediate L-Version of the Herbrand Theorem

An intermediate L-version of the Herbrand theorem is derived. In [KrKr67], the following form of the Herbrand theorem is given as an exercise along with its solution.

3. Refinement of the Uniformity Theorem (for predicate calculus with several types of variables).
a) Show that if $\exists x_1 \cdots \exists x_n A$, where $A$ is quantifier free, is a theorem then there is a sequence $(t_1^{(i)}, \ldots, t_n^{(i)})$ $(1 \le i \le p)$ of $n$-tuples of terms of the language of $A$ such that $A_1 \cup \cdots \cup A_p$ is a theorem, where $A_i$ is obtained by replacing $x_j$ in $A$ by $t_j^{(i)}$.

In the preceding statements the following notions are used: Let $L(A)$ be the language of $A$† with $k$ types (or sorts in this context) of objects. Then there are $k$ infinite disjoint sets $V_1, \ldots, V_k$ where the elements of $V_i$, $1 \le i \le k$, are called variables of type $i$ of $L(A)$. Then each variable $x_j$, $1 \le j \le n$, belongs to some $V_i$, $1 \le i \le k$. Let $Term$ be the set of terms of $L(A)$. $Term$ is divided into $k$ disjoint sets $Term_1, \ldots, Term_k$. Then each term $t_j^{(i)}$ $(1 \le i \le p,\ 1 \le j \le n)$ belongs to some $Term_l$, $1 \le l \le k$.

This theorem only states the necessary part of the condition. If the sufficient part of the condition is also combined, the preceding exercise can be rephrased in the following form of a theorem:

† By the language of a formula, it is meant the language whose variables are those of L and whose relations and function symbols are those which occur in formula $A$.

Theorem 1† Let $A(x_1, \ldots, x_n)$ be a quantifier free formula with free variables $x_1, \ldots, x_n$. Then $\exists x_1 \cdots \exists x_n A(x_1, \ldots, x_n)$ is a theorem if and only if there is a sequence $(t_1^{(i)}, \ldots, t_n^{(i)})$ $(1 \le i \le p)$ of $n$-tuples of terms of the language of $A$ such that $A_1 \cup \cdots \cup A_p$ is a theorem, where $A_i$ is obtained by replacing $x_j$, $1 \le j \le n$, in $A$ by $t_j^{(i)}$.

The formalism used in theorem proving is based on the notions of unsatisfiability and refutation rather than upon the notions of validity and proof. The following dual form of the theorem must therefore be derived.

Theorem 2 Let $A(x_1, \ldots, x_n)$ be a quantifier free formula with free variables $x_1, \ldots, x_n$. Then $\forall x_1 \cdots \forall x_n A(x_1, \ldots, x_n)$ is unsatisfiable if and only if there is a sequence $(t_1^{(i)}, \ldots, t_n^{(i)})$ $(1 \le i \le p)$ of $n$-tuples of terms of the language of $A$ such that $A_1 \cap \cdots \cap A_p$ is unsatisfiable, where $A_i$ is obtained by replacing $x_j$, $1 \le j \le n$, in $A$ by $t_j^{(i)}$.

Proof. It is sufficient to put $A(x_1, \ldots, x_n) = \neg B(x_1, \ldots, x_n)$ where $B(x_1, \ldots, x_n)$ is a quantifier free formula with free variables $x_1, \ldots, x_n$. Then by Theorem 1 and $B(x_1, \ldots, x_n) = \neg A(x_1, \ldots, x_n)$ it holds that $\exists x_1 \cdots \exists x_n \neg A(x_1, \ldots, x_n)$ is a theorem if and only if there is a sequence $(t_1^{(i)}, \ldots, t_n^{(i)})$ $(1 \le i \le p)$ of $n$-tuples of terms of the language of $A$ such that $\neg A_1 \cup \cdots \cup \neg A_p$ is a theorem. It immediately follows that $\forall x_1 \cdots \forall x_n A(x_1, \ldots, x_n)$ is unsatisfiable if and only if there is a sequence $(t_1^{(i)}, \ldots, t_n^{(i)})$ $(1 \le i \le p)$ of $n$-tuples of terms of the language of $A$ such that $A_1 \cap \cdots \cap A_p$ is unsatisfiable. Q.E.D.

Now, based on Theorem 9.3.1, the formula $A$ can be expressed in L. That is, the language of $A$, denoted by $L(A)$, is the language whose variables are those of L and whose relations and function symbols are those that occur in formula $A$. Finally, Theorem 2 can be rephrased in the following form.

Theorem 3 Let $A(x_1, \ldots, x_n)$ be a quantifier free formula with free variables $x_1, \ldots, x_n$. Then $\forall x_1 \cdots \forall x_n A(x_1, \ldots, x_n)$ is unsatisfiable if and only if there is a sequence $(t_1^{(i)}, \ldots, t_n^{(i)})$, $1 \le i \le p$, of $n$-tuples of terms of $L(A)$ such that $A_1 \cap \cdots \cap A_p$ is unsatisfiable, where $A_i$ is obtained by replacing $x_j$, $1 \le j \le n$, in $A$ by $t_j^{(i)}$.

† Kleene adequately points out what the Herbrand theorem indicates. Quoting Kleene [Klee67], "We may summarize Herbrand's theorem by saying that it reduces the question of the provability of a particular formula with quantifiers (in the first instance, a prenex formula) to the question of the validity (or provability) in the propositional calculus of some one of a countably infinite class of quantifier-free formulas (the Herbrand disjunctions)."
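The refutation form of the Herbrand theorem (Theorem 2) can be illustrated with a small one-sorted sketch. Everything in it is an assumption made for illustration: the matrix $A(x) = P(a) \cap (\neg P(x) \cup P(f(x))) \cap \neg P(f(f(a)))$, the function names, and the brute-force satisfiability test are not from the dissertation. The code enumerates ground terms $a, f(a), \ldots$, forms the conjunction of the ground instances $A_1 \cap \cdots \cap A_p$, and stops at the first prefix of terms for which that conjunction is propositionally unsatisfiable.

```python
from itertools import product


def herbrand_terms(depth):
    """Ground terms a, f(a), f(f(a)), ... up to the given nesting depth."""
    terms = ["a"]
    for _ in range(depth):
        terms.append(f"f({terms[-1]})")
    return terms


def instance(t):
    """Ground instance of the assumed matrix A(x) with x replaced by t.

    The instance is a list of clauses; a literal is a pair (atom, polarity).
    """
    return [
        [("P(a)", True)],
        [(f"P({t})", False), (f"P(f({t}))", True)],
        [("P(f(f(a)))", False)],
    ]


def satisfiable(clauses):
    """Brute-force propositional satisfiability over the atoms that occur."""
    atoms = sorted({atom for clause in clauses for atom, _ in clause})
    for values in product([False, True], repeat=len(atoms)):
        model = dict(zip(atoms, values))
        if all(any(model[a] == pol for a, pol in clause) for clause in clauses):
            return True
    return False


def refuting_instances(max_depth):
    """Shortest prefix of ground terms whose instances are jointly unsatisfiable."""
    for depth in range(max_depth + 1):
        terms = herbrand_terms(depth)
        clauses = [c for t in terms for c in instance(t)]
        if not satisfiable(clauses):
            return terms
    return None


print(refuting_instances(3))
```

Here the instance for $x = a$ alone is satisfiable, while the instances for $x = a$ and $x = f(a)$ together are not, so $p = 2$ ground instances suffice to witness the unsatisfiability of $\forall x\, A(x)$.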

APPENDIX C

Refutations by $R_w(\cdot)$ and $R_\Sigma(\cdot)$

Two refutations of a given many-sorted theory are shown. One is generated by $R_w(\cdot)$ and the other is generated by $R_\Sigma(\cdot)$. Consider the following many-sorted theory $\langle OA, T_\Sigma \rangle$ given in Example 12.2.3:

OA:
(0.1) $D \subset B$, $D \subset C$,
(0.2) $E \subset B$, $E \subset C$,
(0.3) $F \subset D$, $F \subset E$,
(0.4) $G \subset D$, $G \subset E$,
(0.5) $H \subset D$, $H \subset E$,
(0.6) $I \subset E$,

$T_\Sigma$:
(0.7) $P(x^B) \cup Q(x^B, z^E) \cup R(f^F(x^B), z^E) \cup W(x^B)$,
(0.8) $\neg P(x^C)$,
(0.9) $\neg Q(x^E, g^H(x^E))$,
(0.10) $\neg R(x^E, z^D)$,
(0.11) $\neg W(x^I)$.

First the refutation of $\langle OA, T_\Sigma \rangle$ generated by $R_w(\cdot)$ is shown and then the refutation generated by $R_\Sigma(\cdot)$ is shown. Remember that both $R_w(\cdot)$ and $R_\Sigma(\cdot)$ are level-saturation schemes. Each refutation is an alignment of the resolvents generated at each level, i.e., a sequence. In each sequence, the first column shows the numbering for the deduction sequence. The numbering stops when the first $\Box$ turns up. The second column contains the identifier of each resolvent. The first digit of each identifier indicates the level at which the resolvent is generated. The third column contains the identifiers of the parent clauses of their corresponding resolvent. The

204 fourth column shows the resolvents. The fifth column is used to show the dynamically introduced sorts if the refutation is the one by Rw( ). If the refutation is the one by R( ' ), then it is used to indicate useless resolvents. [1] The refutation generated by Rw( ') ded. res. parent resolvents note seq. id. clauses 1 (1.1) (0.7),(0.8) Q (z,z R)U R(fF(zK),zs )U W (z) K B C 2 (1.2) (0.7),(0.9) P (yE ) U R (fF (yEE),gH(yE)) U W(z') 3 (1.3) (0.7),(0.10) P(zB) U Q (zB,z') U W(z TB) J D n E 4 (1.4) (0.7),(0.11) P(z E) U Q (z,zE3 U R(fF(xz),z() 5 (2.1) (11),(0.9) R (f'(y ), z) U W(y=) 6 (2.2) (1.1),(0.10) Q(zxK,zx')U W(zK) K, J 7 (2.3) (1.),(0.11) Q(z,zE)UR(fF(z),z) 8 (2.4) (1.2),(0.8) same as (2.1) 9 (2.5) (1.2),(0.10) P(yE) U W(zEE) 10 (2.6) (1.2),(0.11) P(y') U R(f (y ),gH(yr )) 11 (2.7) (1.3),(0.8) same as (2.2) 12 (2.8) (1.3),(0.9) same as (2.5) 13 (2.9) (1.3),(0.11) P ( ) U Q (z",z") 14 (2.10) (1.4),(0.8) same as (2.3) 15 (2.11) (1.4),(0.9) same as (2.6) 16 (2.12) (1.4),(0.10) same as (2.9) 17 (3.1) (2.1),(0.10) W(yrE) 18 (3.2) (2.1),(0.11) R (fF(yr ),9H( yE)) 19 (3.3) (2.2),(0.9) same as (3.1) 20 (3.4) (2.2),(0.11) Q (z2,z') J 21 (3.5) (2.3),(0.9) same as (3.2) 22 (3.6) (2.3),(0.10) same as (3.4) 23 (3.7) (2.5),(0.8) same as (3.3) 24 (3.8) (2.5),(0.11) P(VJ) 25 (3.9) (2.6),(0.8) same as (3.2) 26 (3.10) (2.6),(0.10) same as (3.8) 27 (3.11) (2.9),(0.8) same as (3.4) 28 (3.12) (2.9),(0.9) same as (3.8) 29 (4.1) (3.1),(0.11) (4.2) (3.2),(0.10) 0 (4.3) (3.4),(0.9) C (4.4) (3.8),(0.8) U

205 [2] The refutation generated by R( ) ded. res. parent resolvents note seq. id. clauses 1 (1.la) (0.7),(0.8) Q (yD,xz ) u R (f (yr ),z ) u W(y-D) useless 2 (1.lb) (0.7),(0.8) Q (yE,E) U R (f(y'E),rE) U W(yrE) 3 (1.2) (0.7),(0.9) P(yv ) U R (F(y E),gH(y )) U W(ZE) 4 (1.3a) (0.7),(0.10) P(z B)U Q (zB,zF) U W(zEB) useless 5 (1.3b) (0.7),(0.10) P(z~B)U Q (ZB,zc) U W(zB) useless 6 (1.3c) (0.7),(0.10) P(z ) U Q (zB z H) U W(Z ) 7 (1.4) (0.7),(0.11) P(zxu) U Q(ZzSr) U R(fF(zU),zsE) 8 (2.1a) (1.la),(0.9) R(fF'(yV'), g (yF)) U W(yu ) useless 9 (2.1b) (1.la),(0.9) R(fF(y C),g H(yC)) U W(yEG) useless 10 (2.1c) (1.la),(0.9) R(fF(y U),gH (y )) W y(vH) useless 11 (2.1d) (1.lb),(0.9) R(f (y ), g H (y rE)) U W(yVE) 12 (2.2a) (1.la),(0.10) Q (y,z) U V(y D) useless 13 (2.2b) (l.la),(0.10) Q(y,z'Gc) W(yDW ) useless 14 (2.2c) (1.la),(0.10) Q(yDr,Z H)U W(yrD) useless 15 (2.2d) (1.lb),(0.10) Q (yrxz ) U W(yE) useless 16 (2.2e) (1.lb),(0.10) Q(y,zG) u W(yrE) useless 17 (2.2f) (1.lb),(0.10) Q(yE,zr) U W(ysE) (l.la),(0.11) not resolvable 18 (2.3) (l.lb),(0.11) Q (zr,z ) U R (f (z ),z ) 19 (2.4) (1.2),(0.9) same as (2.1d) 20 (2.5) (1.2),(0.10) P(yr) U W(zr) 21 (2.6) (1.2),(0.11) P(yr ) U R (f(y ),g ( )) 22 (2.7a) (1.3a),(0.8) same as (2.2a) 23 (2.7b) (1.3a),(0.8) same as (2.2b) 24 (2.7c) (1.3b),(0.8) same as (2.2c) 25 (2.7d) (1.3b),(0.8) same as (2.2d) 26 (2.7e) (1.3c),(0.8) same as (2.2e) 27 (2.7f) (1.3c),(0.8) same as (2.2f) (1.3a),(0.9) not resolvable (1.3b),(0.9) not resolvable 28 (2.8) (1.3c),(0.9) same as (2.5) 29 (2.9a) (1.3a),(0.11) P(za) U Q(z r,z") useless 30 (2.9b) (1.3b),(0.11) P(z ) U Q (zx,zc) useless 31 (2.9c) (1.3c),(0.11) P(a)U Q(z J,zrH) 32 (2.10) (1.4),(0.8) same as (2.3) 33 (2.11) (1.4),(0.9) same as (2.6) 34 (2.12a) (1.4),(0.10) same as (2.9a) 35 (2.12b) (1.4),(0.10) same as (2.9a) 38 (2.12c) (1.4),(0.10) same as (2.9a)

206 37 (3.1a) (2.1a),(0.10) W(y") useless 38 (3.1b) (2.1b),(0.10) W(yEC) useless 39 (3.1c) (2.1c),(0.10) W(yr ) useless 40 (3.1d) (2.1d),(0.10) WV(yE ) (2.1a),(O.11) not resolvable (2.1b),(0.11) not resolvable (2.1c),(0.11) not resolvable 41 (3.2) (2.1d),(0.11) R (fF (y ), gH (y )) (2.2a),(0.9) not resolvable (2.2b),(0.9) not resolvable 42 (3.3a) (2.2c),(0.9) same as (3.1a) 43 (3.3b) (2.2c),(0.9) same as (3.1b) 44 (3.3c) (2.2c),(0.9) same as (3.1c) (2.2d),(0.9) not resolvable (2.2e),(0.9) not resolvable 45 (3.3d) (2.2f),(0.9) same as (3.1d) (2.2a),(0.11) not resolvable (2.2b),(0.11) not resolvable (2.2c),(0.11) not resolvable 46 (3.4a) (2.2d),(0.11) Q (xr,zx') useless 47 (3.4b) (2.2e),(0.11) Q (zxr,zTC) useless 48 (3.4c) (2.2f),(0.11) Q (z,zI) 49 (3.5) (2.3),(0.9) same as (3.2d) 50 (3.6a) (2.3),(0.10) same as (3.4a) 51 (3.6b) (2.3),(0.10) same as (3.4b) 52 (3.6c) (2.3),(0.10) same as (3.4c) 53 (3.7) (2.5),(0.8) same as (3.1d) 54 (3.8) (2.5),(0.11) (yr ) 55 (3.9) (2.6),(0.8) same as (3.2) 56 (3.10) (2.6),(0.10) same as (3.8) 57 (3.11a) (2.9a),(0.8) same as (3.4a) 58 (3.11b) (2.9b),(0.8) same as (3.4b) 59 (3.11c) (2.9c),(0.8) same as (3.4c) (2.9a),(0.9) not resolvable (2.9b),(0.9) not resolvable 60 (3.12) (2.9c),(0.9) same as (3.8) (3.1a),(0.11) not resolvable (3.1b),(0.11) not resolvable (3.1c),(0.11) not resolvable 61 (4.1) (3.1d),(0.11) I (4.2a) (3.2),(0.10) 0 (3.3a),(0.10) not resolvable (3.3b),(0.10) not resolvable (3.3c),(0.10) not resolvable

207 (4.2b) (3.3d),(0.11) 0 (3.4a),(0.9) not resolvable (3.4b),(0.9) not resolvable (4.3) (3.4c),(0.9) 0 (4.4) (3.8),(0.8) U

208 APPENDIX D Alternative Approaches of Rw( ) In Part II, it has been shown how a pair of variables satisfying a certain condition can be unified over the weakest range in a E-extended L. Such idea of unifying a pair of variables satisfying a certain condition can also be embodied by alternative approaches. They include: (i) an approach in which the theory in a many-sorted language Lm is repeatedly translated into a revised language L, of Lm along the way the refutation of the theory is carried out, and (ii) an approach in which the theory to be refuted is expressed in a generalized version (LO) of an ordinary manysorted language (L, ) whose variable sets and constant sets are not necessarily disjoint. These two alternative approaches are described in this appendix. D.1. Refutation in a Revised Language L, of Lm The first alternative approach is described. A many-sorted theory TO in a many-sorted language L is considered. Let T, be the theory to be refuted. Let two clauses o, b~0 E T, contain two variables v, and Vi, respectively, and let these two clauses be resolvable if v, and vj are unified. When I IM(Ran (v,)) nIM(Ran (vi)) > 1, if the two clauses were expressed in L,, L, can be extended to unify the two variables v, and vj over the weakest possible range, i.e., the intersection of the ranges of v, and vj. When the two clauses are expressed in Lm, however, unifying v, and vj over the weakest range is not allowed, although it becomes possible if the two clauses b~, and ^~ are translated into a new language in which a variable ranging over the intersection of the ranges of

209 v, and vj can be introduced. That is, L can be revised into another manysorted language L^I that is identical with L~^ except that in L~ an additional variable set exists whose members range over the sort identical with the intersection of the ranges of vi and vj. Then T can be translated into T I in L^I including the translations of Lo and ~o into L I, say ~1 and I, respectively. A resolvent, say,1, of p 1 and t' can be derived as a clause in L. Let the overall process described so far be abbreviated by Tl2[Tmo I L 'I I.) where T rlT,,'2 L 'J means T ~ in L ~ is translated into T 1 by using the revised language L 1 of L ~ and the symbol " i —)" means the deduction process that derives 1 in L I as a resolvent of a pair of member of T, [T~ I L 11 (specifically, in the preceding example the pair would be O' and ^). The preceding example describes only a snapshot of the continuous process that is employed in this alternative approach. For example, ~I and tP1 can also be resolved in a way similar to the one used in resolving tpO and bo and if this is done, then resolving i ~ and bo means that a deduction process such as Tm2[;l | L; 2 1), 2 immediately follows the previous deduction process T I[Tm~ L\ — ) t1 ^, where the symbol ";" is used to mean "in addition to". In summary, in this alternative approach (i) a theory and a logical consequence of the theory, i.e., a resolvent of a pair of clauses in the theory, are translated into a new language, (ii) another logical consequence is derived from the theory and the logical consequence in the new language, and (iii) the processes (i) and (ii) are repeated one after another until the

210 intended logical consequence ] is derived. In general, the refutation obtained in this approach can be viewed as a sequence of deduction processes of the following form: Tn \LT\I L. ' TmP -1lT P -2;pP -2 | p-ll P - The preceding sequence of deduction processes makes it clear that in this alternative approach the inconsistency of the theory TO is not proved in L^t but in LP after various stages of theory translations in newly revised languages have been made. It must be justified whether the inconsistency proof of TP made in LP can be carried over to the inconsistency proof of the original theory T~ in L^,. Since refuting a theory by this alternative approach entails a series of theory translation in a new language and derivation of a logical consequence from the translated theory, the following theorem can be shown first: Theorem D.1 If T' [T-1;'l L4 IJ ) I,',X i >, then there is a translation of &' into Lt, denoted by 0'1[" I L-1]j, and for the translation i\ ' I\ L,^1 there is a proof procedure" I -e) "such that T.I1, I where I L Im. where +~ I>.

211 In the preceding theorem, the existence of" R?) " means that ';[l' | L^-'l is a logical consequence of T^-1;$'-' although Il[ft I L`-1J is obtained indirectly via "L n-() A and the translation of ' into L^~1. The following theorem must further be shown: Theorem D.2 When TT^['; L\, i > 0, if ' is I then [l I L'-l is also ]. By combining Theorem D.1 with the preceding theorem, it is implied that even if F is derived from T^[T'-,;1I-'IL'] in L P, for some p > 0, ] is also a logical consequence of T. D.2. Refutation in a generalized version LI of Lm In this approach, the theory to be refuted is expressed in a many-sorted language (L9) that is a more general version than an ordinary many-sorted language (L^) such as mentioned in Section 9.3 or given in [Ende72, KrKr67]. L, is more general than L, in the sense that in L, neither its variable sets nor its constant sets need to be disjoint. The language LI is first defined. A many-sorted language LI with aort index set I consists of the following: (1) I I infinite sets V1,, V II (not necessarily disjoint) where the elements of V, 1 < i < 11, are called variables of sort i; (2) 11 sets C1,, C1lI where the elements of C' 1 < i < I, are called constant symbols of sort i such that

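$C^{i_1} \cap \cdots \cap C^{i_n} \neq \emptyset$, $\{i_1, \ldots, i_n\} \subseteq I$, if and only if $V^{i_1} \cap \cdots \cap V^{i_n} \neq \emptyset$;
(3) for each $n > 0$ and each $n$-tuple $\langle i_1, \ldots, i_n \rangle$, $\{i_1, \ldots, i_n\} \subseteq I$, a set $R^{\langle i_1, \ldots, i_n \rangle}$ whose elements are called relational symbols of sort $\langle i_1, \ldots, i_n \rangle$;
(4) for each $n > 0$ and each $(n+1)$-tuple $\langle i_1, \ldots, i_n, i_{n+1} \rangle$, $\{i_1, \ldots, i_n, i_{n+1}\} \subseteq I$, a set $F^{\langle i_1, \ldots, i_n, i_{n+1} \rangle}$ whose elements are called function symbols of sort $\langle i_1, \ldots, i_n, i_{n+1} \rangle$;
(5) logical connectives $\neg$ and $\rightarrow$; and
(6) a universal quantifier $\forall$.

Definable symbols $\vee$, $\wedge$, and $\exists$ are introduced in $L_g$ as usual and the syntax rule of $L_g$ is also given as usual.

The interpretation of the formulas of $L_g$ with sort index set $I$ is given as follows. Let $MS(L_g)$ stand for a many-sorted structure associated with $L_g$. $MS(L_g)$ consists of:
(1) $|I|$ nonempty sets of objects $S_1, \ldots, S_{|I|}$ where $S_i$ is called the domain of sort $i$ of $MS(L_g)$ such that $S_{i_1} \cap \cdots \cap S_{i_n} \neq \emptyset$, $\{i_1, \ldots, i_n\} \subseteq I$, if and only if $V^{i_1} \cap \cdots \cap V^{i_n} \neq \emptyset$;
(2) for each constant symbol $c \in C^{i_1} \cap \cdots \cap C^{i_n}$, $\{i_1, \ldots, i_n\} \subseteq I$, an element $c^{MS} \in S_{i_1} \cap \cdots \cap S_{i_n}$;
(3) for each predicate symbol $P$ of sort $\langle i_1, \ldots, i_n \rangle$, a relation $P^{MS} \subseteq S_{i_1} \times \cdots \times S_{i_n}$;
(4) for each function symbol $f$ of sort $\langle i_1, \ldots, i_n, i_{n+1} \rangle$, a function $f^{MS}: S_{i_1} \times \cdots \times S_{i_n} \rightarrow S_{i_{n+1}}$.

A variable assignment function $s$ is given as follows: If $V = \bigcup_{i \in I} V^i$ where $V^i$ is a variable set of $L_g$, then $s$ is an assignment function, $s: V \rightarrow \bigcup_{i \in I} S_i$, such that for a variable $x \in V^{i_1} \cap \cdots \cap V^{i_n}$, $\{i_1, \ldots, i_n\} \subseteq I$, $s(x) = a$, where $a \in S_{i_1} \cap \cdots \cap S_{i_n}$. The assignment function for the terms of $L_g$ is defined as usual. The validity of each formula in $MS(L_g)$ is determined as usual.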
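The overlap condition on domains and the constraint on assignment functions can be checked mechanically on a small finite example. The following is a minimal sketch, not part of the original development: the variable sets of $L_g$ are infinite, so each one is represented here by a finite sample of variable names, and the sort names, helper names (`share_element`, `overlap_mismatches`, `is_valid_assignment`) and data are all hypothetical.

```python
from itertools import combinations


def share_element(family, sorts):
    """True if the sets of `family` indexed by `sorts` have a common element."""
    return bool(set.intersection(*(set(family[i]) for i in sorts)))


def overlap_mismatches(var_sets, domains):
    """List the sort subsets for which the 'if and only if' condition fails:
    the variable sets overlap but the domains do not, or the other way round."""
    sorts = sorted(var_sets)
    return [
        subset
        for n in range(1, len(sorts) + 1)
        for subset in combinations(sorts, n)
        if share_element(var_sets, subset) != share_element(domains, subset)
    ]


def is_valid_assignment(var_sets, domains, s):
    """Check that s sends each variable into every domain whose sort it carries."""
    return all(
        s[x] in domains[i]
        for i, members in var_sets.items()
        for x in members
    )


# Hypothetical finite sample: 'xy' is a variable of both 'person' and 'student'.
var_sets = {"person": {"x", "xy"}, "student": {"y", "xy"}, "course": {"z"}}
domains = {"person": {"ann", "bob"}, "student": {"bob", "cat"}, "course": {"cs101"}}
s = {"x": "ann", "y": "cat", "xy": "bob", "z": "cs101"}

print(overlap_mismatches(var_sets, domains))      # no mismatching subsets
print(is_valid_assignment(var_sets, domains, s))  # 'xy' lands in both domains
```

In this sample the shared variable `xy` must be assigned an object lying in both domains, which is the sense in which a variable of $L_g$ may carry more than one sort.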

Since $L_g$ is a more general version than $L_m$, it trivially follows that any formula in $L_m$ can be expressed in $L_g$. However, the converse must be shown to assure that $L_g$ is as legitimate as $L_m$. The converse is shown in Appendix E.

It is now shown how the second alternative approach can be carried out. Let a theory $T_o$ in a one-sorted language ($L_o$) be equivalently expressed as a many-sorted theory, say $T_m$, in an ordinary many-sorted language $L_m$ with sort index set $I$. Let the language for $T_m$ be $L_m(T_m)$. Let $S_i$ and $S_j$, $i, j \in I$, be two unary predicate symbols in $L_o$ which are defined correspondingly to sorts $i$ and $j$ of $L_m(T_m)$. An inflexible usage of sort variables of $L_m$ is displayed when another formula in $L_o$, say a logical consequence $\phi_o$ of $T_o$,

$$\phi_o = \forall x\,(S_i(x) \wedge S_j(x) \rightarrow \psi(x)) \qquad \text{(D.1)}$$

needs to be further abbreviated in $L_m(T_m)$. The syntax of $L_m$ does not allow the one-sorted expression $\phi_o$ to be abbreviated into a many-sorted expression that is compact enough to carry out the idea behind Rw( ' ), for instance, as compact as the form (D.2) below, unless $L_m(T_m)$ is appropriately revised to do so. Let $L_g(T_m)$ be the $L_g$ defined to be equivalent to $L_m(T_m)$, i.e., $L_g(T_m)$ is identical with $L_m(T_m)$ except that in $L_g(T_m)$ its variable sets and its constant sets do not need to be disjoint. Let a variable $x_{ij}$ belong to sorts $i$ and $j$, i.e., $x_{ij} \in V_g^i$ and $x_{ij} \in V_g^j$ where $V_g^i$ and $V_g^j$ are the variable sets of sorts $i$ and $j$ in $L_g(T_m)$. Then $\phi_o$ in (D.1) can be abbreviated into the form

$$\forall x_{ij}\, \psi(x_{ij}) \qquad \text{(D.2)}$$

in $L_g(T_m)$. It is noticed that abbreviating the one-sorted expression of the form

(D.1) into a many-sorted expression of the form (D.2) is the only type of abbreviation needed when embodying the idea behind Rw( ' ). Therefore an alternative approach of Rw( ' ) is obtained by expressing the theory to be refuted in $L_g$.
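The abbreviation of a relativized one-sorted formula into a formula with a single doubly-sorted variable can be illustrated on a finite structure. The sketch below assumes that (D.1) has the shape "for all x, S_i(x) and S_j(x) imply psi(x)" and that (D.2) quantifies one variable ranging over the intersection of the two sort domains; the function names, the universe and the predicates are hypothetical and chosen only for illustration.

```python
def holds_relativized(universe, S_i, S_j, psi):
    """One-sorted reading: every x in the universe that satisfies both sort
    predicates S_i and S_j also satisfies psi."""
    return all((not (x in S_i and x in S_j)) or psi(x) for x in universe)


def holds_sorted(S_i, S_j, psi):
    """Many-sorted reading: the variable x_ij ranges only over objects that
    belong to both sort domains, so no sort predicates are needed."""
    return all(psi(a) for a in S_i & S_j)


# Hypothetical finite structure: sort i = even numbers, sort j = multiples of 3.
universe = set(range(12))
S_i = {x for x in universe if x % 2 == 0}
S_j = {x for x in universe if x % 3 == 0}


def divisible_by_six(x):
    return x % 6 == 0


def positive(x):
    return x > 0


for psi in (divisible_by_six, positive):
    # Both readings are evaluated side by side for each choice of psi.
    print(psi.__name__, holds_relativized(universe, S_i, S_j, psi), holds_sorted(S_i, S_j, psi))
```

The two evaluations agree for each choice of `psi` in this example, which is the behaviour an abbreviation of this kind is expected to have; the sketch checks particular finite cases only and is not a proof.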

APPENDIX E

Translation of a Formula in $L_g$ into $L_m$

It is shown that any formula in the many-sorted language $L_g$ that was defined in Appendix D can be translated into an ordinary many-sorted language $L_m$. Showing this implies that $L_g$ is as legitimate as $L_m$, which is commonly given in various literature such as [KrKr67] and [Ende72].

Given an $L_g$ with sort index set $I$, its corresponding ordinary many-sorted $L_m$ is constructed. A few preliminary steps are given first. Let $V = \{V^i : i \in I\}$ where $V^i$ is a variable set of $L_g$. A set $V_0$ of disjoint variable sets is derived from $V$ as follows: for each $k \in I$, if $U_k = \{V^k, \bigcup V - V^k\}$ then $V_0 = \sqcap_{k \in I} U_k$ †. Let $I_0$ be the index set for $V_0$ so that each element of $V_0$ is expressed by $V_0^k$, $k \in I_0$. A relationship holds between $V$ and $V_0$: if $\xi$ is a function $\xi: I \rightarrow N^+$ where $N^+$ is the positive integer set, then for each $V^i \in V$ there exist uniquely $\xi(i)$ variable sets in $V_0$ such that

$$V^i = V_0^{j_1} \cup \cdots \cup V_0^{j_{\xi(i)}}, \quad \{j_1, \ldots, j_{\xi(i)}\} \subseteq I_0.$$

In the preceding relationship, $\{j_1, \ldots, j_{\xi(i)}\} \subseteq I_0$ is said to be the corresponding index subset of $i$, $i \in I$, denoted by $CI(i)$, of $I_0$.

The ordinary many-sorted language $L_m$ with the sort index set $I_0$ that corresponds to the $L_g$ then consists of:
(1) $|I_0|$ infinite disjoint variable sets $V_0^1, V_0^2, \ldots$, such that $\{V_0^i : i \in I_0\} = V_0$;
(2) $|I_0|$ disjoint constant sets

† The definition of "$\sqcap$" was given at Section 7.3 as follows: For two partitions $\Pi_1$ and $\Pi_2$ of a set, $\Pi_1 \sqcap \Pi_2 = \{S : S = B_1 \cap B_2$ where $B_1 \in \Pi_1$ and $B_2 \in \Pi_2$, and $S \neq \emptyset\}$. Since the commutativity and the associativity hold for $\sqcap$, let $\Pi_1 \sqcap \cdots \sqcap \Pi_n$ be written by $\sqcap_{i \in \{1, \ldots, n\}} \Pi_i$.

$C_0^1, \ldots, C_0^{|I_0|}$ such that to each $C^i$, $i \in I$, of $L_g$, if $CI(i) = \{j_1, \ldots, j_{\xi(i)}\}$ then $C^i = C_0^{j_1} \cup \cdots \cup C_0^{j_{\xi(i)}}$;
(3) to each predicate symbol $P$ of sort $\langle i_1, \ldots, i_n \rangle$, $\{i_1, \ldots, i_n\} \subseteq I$, of $L_g$ its corresponding $\xi(i_1) \times \cdots \times \xi(i_n)$ different relation symbols, whose collection is denoted by $CP(P)$, such that for each $k$, $1 \le k \le \xi(i_1) \times \cdots \times \xi(i_n)$, $P^\circ_k \in CP(P)$ is a relation symbol of sort $\langle j_1, \ldots, j_n \rangle \in CI(i_1) \times \cdots \times CI(i_n)$; and
(4) to each function symbol $f$ of sort $\langle i_1, \ldots, i_n, i_{n+1} \rangle$, $\{i_1, \ldots, i_n, i_{n+1}\} \subseteq I$, of $L_g$ its corresponding $\xi(i_1) \times \cdots \times \xi(i_{n+1})$ different function symbols, whose collection is denoted by $CF(f)$, such that for each $k$, $1 \le k \le \xi(i_1) \times \cdots \times \xi(i_{n+1})$, $f^\circ_k \in CF(f)$ is a function symbol of sort $\langle j_1, \ldots, j_n, j_{n+1} \rangle \in CI(i_1) \times \cdots \times CI(i_{n+1})$.

A few notations are introduced. Let $TERM(L_g)$ and $TERM(L_m)$ be the sets of all the terms of $L_g$ and $L_m$, respectively. For each $t \in TERM(L_g)$ its corresponding terms in $TERM(L_m)$, whose collection is denoted by $CT(t)$, are defined inductively as follows:
(i) if $t$ is a variable $x \in V^i$, $i \in I$, and $CI(i) = \{j_1, \ldots, j_{\xi(i)}\}$, then $CT(t) = \{x^\circ_{j_1}, \ldots, x^\circ_{j_{\xi(i)}}\}$ where for each $k \in \{j_1, \ldots, j_{\xi(i)}\}$, $x^\circ_k \in V_0^k$;
(ii) if $t$ is a constant $c$ of sort $i$, $i \in I$, and $CI(i) = \{j_1, \ldots, j_{\xi(i)}\}$, then $CT(t) = \{c^\circ_{j_1}, \ldots, c^\circ_{j_{\xi(i)}}\}$ where for each $k \in \{j_1, \ldots, j_{\xi(i)}\}$, $c^\circ_k \in C_0^k$;
(iii) if $t$ is a term of the form $f(t_1, \ldots, t_n)$ where $f$ is an $n$-place function symbol of sort $\langle i_1, \ldots, i_n, i_{n+1} \rangle$, $\{i_1, \ldots, i_n, i_{n+1}\} \subseteq I$, then $CT(t) = \{f^\circ(t^\circ_1, \ldots, t^\circ_n) : f^\circ \in CF(f)$ and $t^\circ_j \in CT(t_j)$, $1 \le j \le n\}$.

Now let the terms of a formula, say $\psi$, be defined as follows: (i) if $P(t_1, \ldots, t_m)$ is an atomic subformula of $\psi$ where $P$ is an $m$-place relation symbol and $t_1, \ldots, t_m$ are terms, then $t_1, \ldots, t_m$ are terms of $\psi$, and (ii) no

terms other than those identified by (i) are terms of $\psi$. When it is convenient, $\psi$ is expressed by $\psi[t_1, \ldots, t_n]$ if $t_1, \ldots, t_n$ are all the terms of $\psi$ in the order of their appearance in $\psi$ [notice that there can be duplicate terms among $t_1, \ldots, t_n$].

It is shown how a formula in $L_g$ with the sort index set $I$ is translated into the $L_m$ with the sort index set $I_0$ which was constructed correspondingly to the $L_g$. Let $\sigma_g$ be a formula in $L_g$ with the sort index set $I$. Let $\sigma_g$ be of the form

$$\sigma_g[t_1, \ldots, t_n]. \qquad \text{(E.1)}$$

Let $k = |CT(t_1)| \times \cdots \times |CT(t_n)|$. The translation of $\sigma_g$ into $\sigma^\circ$ in $L_m$ with the sort index set $I_0$ is then

$$\sigma^\circ = \bigvee_{1 \le j \le k} \sigma_g[t^\circ_1(j), \ldots, t^\circ_n(j)] \qquad \text{(E.2)}$$

where for each $j$, $1 \le j \le k$, $\sigma_g[t^\circ_1(j), \ldots, t^\circ_n(j)]$ is constructed in the following way:
(i) $t_1, \ldots, t_n$ of $\sigma_g$ are replaced by $t^\circ_1(j), \ldots, t^\circ_n(j)$, respectively, where $\langle t^\circ_1(j), \ldots, t^\circ_n(j) \rangle$ is the $j$-th element of $CT(t_1) \times \cdots \times CT(t_n)$;
(ii) if $P(t_{u_1}, \ldots, t_{u_r})$, $\{u_1, \ldots, u_r\} \subseteq \{1, \ldots, n\}$, is an atomic subformula of $\sigma_g$ such that by the step (i) the terms $t_{u_1}, \ldots, t_{u_r}$ are replaced by $t^\circ_{u_1}(j), \ldots, t^\circ_{u_r}(j)$, respectively, then $P$ is replaced by $P^\circ \in CP(P)$ of sort $\langle j_1, \ldots, j_r \rangle$, $\{j_1, \ldots, j_r\} \subseteq I_0$, where each $j_h$, $1 \le h \le r$, is the sort to which $t^\circ_{u_h}(j)$ belongs; and
(iii) if $Qx$, where $Q$ is either $\forall$ or $\exists$, is a quantifier appearing in $\sigma_g$ and by step (ii) the variable $x$ appearing in $t_1, \ldots, t_n$ of $\sigma_g$ is replaced by $x^\circ$, then $Qx$ is replaced by $Qx^\circ$ in $\sigma^\circ$.
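The construction of the disjoint variable sets, the corresponding index subsets CI(i), and the count k of disjuncts in the translation can be sketched for the simplest case, an atomic formula whose terms are all variables. This is an illustrative sketch under stated assumptions rather than the construction itself: the infinite variable sets are represented by finite samples, each disjoint piece is named by the tuple of sorts its members carry (a stand-in for the index set $I_0$), and the function names are hypothetical.

```python
from itertools import product


def disjoint_pieces(var_sets):
    """Split overlapping variable sets into disjoint pieces. Two variables land
    in the same piece exactly when they belong to the same collection of sorts,
    which is what intersecting the two-block partitions {V^k, rest} yields."""
    sorts_of = {}
    for sort, members in var_sets.items():
        for v in members:
            sorts_of.setdefault(v, set()).add(sort)
    pieces = {}
    for v, sorts in sorts_of.items():
        pieces.setdefault(tuple(sorted(sorts)), set()).add(v)
    return pieces


def corresponding_indices(pieces, sort):
    """CI(sort): names of the disjoint pieces whose union is the set of that sort."""
    return sorted(name for name in pieces if sort in name)


def translate_atom(pred, arg_sorts, pieces):
    """Enumerate the disjuncts obtained from an atom pred(x_1, ..., x_n) whose
    u-th argument is a variable of sort arg_sorts[u]. Each disjunct records the
    predicate name and the tuple of piece names chosen for its arguments."""
    choices = [corresponding_indices(pieces, s) for s in arg_sorts]
    return [(pred, combo) for combo in product(*choices)]


# Hypothetical sample: 'xij' is a variable of both sort 'i' and sort 'j'.
var_sets = {"i": {"x", "xij"}, "j": {"y", "xij"}}
pieces = disjoint_pieces(var_sets)

print(sorted(pieces))                      # names of the disjoint pieces
print(corresponding_indices(pieces, "i"))  # CI('i')
for disjunct in translate_atom("P", ["i", "j"], pieces):
    print(disjunct)
```

With two overlapping sorts there are three disjoint pieces, each sort corresponds to two of them, and a two-place atom therefore yields 2 x 2 = 4 disjuncts, one per element of the Cartesian product described in step (i).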

As far as semantics for the formula $\sigma^\circ$ of (E.2) is concerned, the structure for $L_m$, say $MS^\circ(L_m)$, can be constructed from the many-sorted structure for $L_g$, say $MS(L_g)$. Let $MS(L_g)$ be a quadruple

$$MS(L_g) = \langle \{S_i\}_{i \in I}, R, F, C \rangle$$

where $I$ is the sort index set, $\{S_i\}_{i \in I}$ the sorts of $MS(L_g)$, $R$ the relation set, $F$ the function set and $C$ the constant set. Then $MS^\circ(L_m)$ consists of:
(1) $|I_0|$ nonempty disjoint sets of objects $S_0^1, \ldots, S_0^{|I_0|}$ such that for each $S_i$, $i \in I$, if $CI(i) = \{j_1, \ldots, j_{\xi(i)}\}$, then $S_i = S_0^{j_1} \cup \cdots \cup S_0^{j_{\xi(i)}}$;
(2) for each constant symbol $c^\circ$ of sort $i$, $i \in I_0$, an element $(c^\circ)^{MS^\circ(L_m)}$ such that $(c^\circ)^{MS^\circ(L_m)} = c^{MS(L_g)}$, where $c^{MS(L_g)} \in C$;
(3) for each predicate symbol $P^\circ$ in $L_m$ of sort $\langle i_1, \ldots, i_n \rangle$, $\{i_1, \ldots, i_n\} \subseteq I_0$, if $P^\circ \in CP(P)$, then the relation $(P^\circ)^{MS^\circ(L_m)} = P^{MS(L_g)} \cap (S_0^{i_1} \times \cdots \times S_0^{i_n})$, where $P^{MS(L_g)} \in R$;
(4) for each function symbol $f^\circ$ in $L_m$ of sort $\langle i_1, \ldots, i_n, i_{n+1} \rangle$, $\{i_1, \ldots, i_n, i_{n+1}\} \subseteq I_0$, if $f^\circ \in CF(f)$, then the function $(f^\circ)^{MS^\circ(L_m)}: S_0^{i_1} \times \cdots \times S_0^{i_n} \rightarrow S_0^{i_{n+1}}$ such that $(f^\circ)^{MS^\circ(L_m)}(a_1, \ldots, a_n) = f^{MS(L_g)}(a_1, \ldots, a_n)$, where $f^{MS(L_g)} \in F$.

Finally, the following theorem is shown for the translation of the formula $\sigma_g$ of (E.1) into the formula $\sigma^\circ$ of (E.2):

Theorem E.1 A sentence $\sigma_g$ in $L_g$ is true in $MS(L_g)$ iff $\sigma^\circ$ in $L_m$ is true in $MS^\circ(L_m)$.

Proof. Proof is by induction on the length of $\sigma_g$. First, proof is given for when $\sigma_g$ is atomic. Let $\sigma_g$ be an atomic formula of the form $R(t_1, \ldots, t_n)$ where $R$ is an $n$-place relation symbol of sort $\langle i_1, \ldots, i_n \rangle$, $\{i_1, \ldots, i_n\} \subseteq I$, and

the $t_i$'s are terms. Let $\sigma_g$ be true in $MS(L_g)$ with an assignment function $s$. The translation of $\sigma_g$ into $\sigma^\circ$ in $L_m$ is the following:

$$\sigma^\circ = R^\circ_1(t^\circ_1(1), \ldots, t^\circ_n(1)) \vee \cdots \vee R^\circ_k(t^\circ_1(k), \ldots, t^\circ_n(k))$$

where $k = |CT(t_1)| \times \cdots \times |CT(t_n)|$ and for each $j$, $1 \le j \le k$, $R^\circ_j \in CP(R)$ and $\langle t^\circ_1(j), \ldots, t^\circ_n(j) \rangle \in CT(t_1) \times \cdots \times CT(t_n)$. Along with the preceding translation, an assignment function $s^\circ$ for the variables of $L_m$ is introduced as follows: $s^\circ$ is an assignment function $s^\circ: \bigcup_{i \in I_0} V_0^i \rightarrow \bigcup_{i \in I_0} S_0^i$ such that if $x$ in $\sigma_g$ is replaced by $x^\circ$ during the translation of $\sigma_g$ into $\sigma^\circ$, then $s(x) = s^\circ(x^\circ)$ [the assignment function $s^\circ$ defined here is used throughout this proof]. For notational convenience the symbol $s^\circ$ is also used for the assignment for terms. It can be trivially shown that for a unique $j$, $1 \le j \le k$,

$$\langle s(t_1), \ldots, s(t_n) \rangle = \langle s^\circ(t^\circ_1(j)), \ldots, s^\circ(t^\circ_n(j)) \rangle \qquad \ldots (1).$$

Let $L$ be the index set for $CP(R)$ so that each member of $CP(R)$ is expressed by $R^\circ_l$, $l \in L$. Then from the way that $MS^\circ(L_m)$ is defined, it follows that

$$R^{MS(L_g)} = \bigcup_{l \in L} (R^\circ_l)^{MS^\circ(L_m)} \qquad \ldots (2).$$

From the way $\sigma^\circ$ is constructed, it follows that

$$\bigcup_{l \in L} (R^\circ_l)^{MS^\circ(L_m)} = \bigcup_{1 \le j \le k} (R^\circ_j)^{MS^\circ(L_m)} \qquad \ldots (3)$$

where each $R^\circ_j$, $1 \le j \le k$, is an atomic subformula in $\sigma^\circ$. From (1) the following also holds: For each $j_h$, $1 \le j_h \le k$, if for some $j_l$, $1 \le j_l \le k$,

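$\langle s^\circ(t^\circ_1(j_h)), \ldots, s^\circ(t^\circ_n(j_h)) \rangle \in (R^\circ_{j_l})^{MS^\circ(L_m)}$, then for any $j_{l'}$, $j_{l'} \neq j_l$ and $1 \le j_{l'} \le k$,

$$\langle s^\circ(t^\circ_1(j_h)), \ldots, s^\circ(t^\circ_n(j_h)) \rangle \notin (R^\circ_{j_{l'}})^{MS^\circ(L_m)} \qquad \ldots (4).$$

Consequently, the following holds:

$\models_{MS(L_g)} R(t_1, \ldots, t_n)[s]$
$\Leftrightarrow \langle s(t_1), \ldots, s(t_n) \rangle \in R^{MS(L_g)}$
$\Leftrightarrow$ for some $j$, $1 \le j \le k$, $\langle s^\circ(t^\circ_1(j)), \ldots, s^\circ(t^\circ_n(j)) \rangle \in R^{MS(L_g)}$, from (1)
$\Leftrightarrow$ for some $j$, $1 \le j \le k$, $\langle s^\circ(t^\circ_1(j)), \ldots, s^\circ(t^\circ_n(j)) \rangle \in \bigcup_{l \in L} (R^\circ_l)^{MS^\circ(L_m)}$, from (2)
$\Leftrightarrow$ for some $j$, $1 \le j \le k$, $\langle s^\circ(t^\circ_1(j)), \ldots, s^\circ(t^\circ_n(j)) \rangle \in \bigcup_{1 \le l \le k} (R^\circ_l)^{MS^\circ(L_m)}$, from (3)
$\Leftrightarrow \bigvee_{1 \le l \le k} \{$ for some $j$, $1 \le j \le k$, $\langle s^\circ(t^\circ_1(j)), \ldots, s^\circ(t^\circ_n(j)) \rangle \in (R^\circ_l)^{MS^\circ(L_m)} \}$
$\Leftrightarrow \bigvee_{1 \le j \le k} \{ \langle s^\circ(t^\circ_1(j)), \ldots, s^\circ(t^\circ_n(j)) \rangle \in (R^\circ_j)^{MS^\circ(L_m)} \}$, from (4)
$\Leftrightarrow \bigvee_{1 \le j \le k} \{ \models_{MS^\circ(L_m)} R^\circ_j(t^\circ_1(j), \ldots, t^\circ_n(j))[s^\circ] \}$.

It is concluded that when $\sigma_g$ is atomic, $\models_{MS(L_g)} \sigma_g$ iff $\models_{MS^\circ(L_m)} \sigma^\circ$.

Suppose that the result is true for all formulas of length less than or equal to $h$. It is shown that the inductive step holds for the formulas of length $h+1$. Let $\sigma_{g,1}$ and $\sigma_{g,2}$ be formulas of length $h$, and for $\sigma_{g,1}$ and $\sigma_{g,2}$ it holds that $\models_{MS(L_g)} \sigma_{g,1}$ iff $\models_{MS^\circ(L_m)} \sigma^\circ_1$ and $\models_{MS(L_g)} \sigma_{g,2}$ iff $\models_{MS^\circ(L_m)} \sigma^\circ_2$. The inductive step is the following: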

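(Case I) Let $\sigma_g$ be $\neg\sigma_{g,1}$. It is trivial to show that $\models_{MS(L_g)} \neg\sigma_{g,1}[s] \Leftrightarrow \models_{MS^\circ(L_m)} \neg\sigma^\circ_1[s^\circ]$.

(Case II) Let $\sigma_g$ be $\sigma_{g,1} \rightarrow \sigma_{g,2}$. It is trivial to show that $\models_{MS(L_g)} \sigma_{g,1} \rightarrow \sigma_{g,2}[s] \Leftrightarrow \models_{MS^\circ(L_m)} \sigma^\circ_1 \rightarrow \sigma^\circ_2[s^\circ]$.

(Case III) Let $\sigma_g$ be $\forall x_i\, \sigma_{g,1}$, where $x_i \in V^i$. The following holds: If $CI(i) = \{i_1, \ldots, i_{\xi(i)}\}$, then

$$S_i = S_0^{i_1} \cup \cdots \cup S_0^{i_{\xi(i)}} \qquad (5)$$

and for any $i_u, i_v \in \{i_1, \ldots, i_{\xi(i)}\}$, $i_u \neq i_v$,

$$S_0^{i_u} \cap S_0^{i_v} = \emptyset \qquad (6)$$

Let $t_1, \ldots, t_n$ be the terms of $\sigma_{g,1}$, and let $\sigma^\circ_1 = \bigvee_{1 \le j \le k} \sigma_{g,1}[t^\circ_1(j), \ldots, t^\circ_n(j)]$ where $k = |CT(t_1)| \times \cdots \times |CT(t_n)|$. From the induction hypothesis, it follows that: for each $j$, $1 \le j \le k$, if $x^\circ_i(j) \in CT(x_i)$, $x^\circ_i(j) \in V_0^{i_t}$, $i_t \in \{i_1, \ldots, i_{\xi(i)}\}$, and $a^\circ(j) \in S_0^{i_t}$, then

$$\models_{MS(L_g)} \sigma_{g,1}[s(x_i \mid a^\circ(j))] \Leftrightarrow \models_{MS^\circ(L_m)} \bigvee_{1 \le j \le k} \sigma_{g,1}[t^\circ_1(j), \ldots, t^\circ_n(j)][s^\circ(x^\circ_i(j) \mid a^\circ(j))] \qquad \ldots (7).$$

Let $\eta$ be a function $\eta: \{1, \ldots, k\} \rightarrow \{i_1, \ldots, i_{\xi(i)}\}$ such that for each $a^\circ(j)$, $1 \le j \le k$, in (7), if $a^\circ(j) \in S_0^{i_t}$ then $\eta(j) = i_t$. Finally, the following holds: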

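$\models_{MS(L_g)} \sigma_g[s]$
$\Leftrightarrow \models_{MS(L_g)} \forall x_i\, \sigma_{g,1}[s]$
$\Leftrightarrow$ for any $a \in S_i$, $\models_{MS(L_g)} \sigma_{g,1}[s(x_i \mid a)]$
$\Leftrightarrow$ for any $a \in S_0^{i_1} \cup \cdots \cup S_0^{i_{\xi(i)}}$, $\models_{MS(L_g)} \sigma_{g,1}[s(x_i \mid a)]$, from (5)
$\Leftrightarrow \bigvee_{1 \le j \le k} \{$ for any $a^\circ(j) \in S_0^{\eta(j)}$, $\models_{MS^\circ(L_m)} \sigma_{g,1}[t^\circ_1(j), \ldots, t^\circ_n(j)][s^\circ(x^\circ_i(j) \mid a^\circ(j))] \}$, from (6) and (7)
$\Leftrightarrow \bigvee_{1 \le j \le k} \{ \models_{MS^\circ(L_m)} \forall x^\circ_i(j)\, \sigma_{g,1}[t^\circ_1(j), \ldots, t^\circ_n(j)][s^\circ] \}$
$\Leftrightarrow \models_{MS^\circ(L_m)} \bigvee_{1 \le j \le k} \forall x^\circ_i(j)\, \sigma_{g,1}[t^\circ_1(j), \ldots, t^\circ_n(j)][s^\circ]$.

Since $\bigvee_{1 \le j \le k} \forall x^\circ_i(j)\, \sigma_{g,1}[t^\circ_1(j), \ldots, t^\circ_n(j)]$ is $\sigma^\circ$, it follows that $\models_{MS(L_g)} \sigma_g$ iff $\models_{MS^\circ(L_m)} \sigma^\circ$.

Q.E.D.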

BIBLIOGRAPHY

[AnBl70] Anderson, R. and Bledsoe, W. W., "A linear format for resolution with merging and a new technique for establishing completeness," J. ACM, Vol. 17, pp. 525-534, 1970.
[AnBo73] Anderson, J. and Bower, G., Human Associative Memory, Winston, Washington D.C., 1973.
[Andr81] Andrews, P. B., "Theorem proving via general matings," J. ACM, Vol. 28, No. 2, pp. 193-214, April 1981.
[Aper81] Apers, P. M. G., "Redundant allocation of relations in a communication network," Proc. Berkeley Workshop on Distributed Data Management and Computer Networks, Lawrence Berkeley Lab., The University of California, Berkeley, 1981.
[BaFe81] Barr, A. and Feigenbaum, E. A., The Handbook of Artificial Intelligence, Vol. 1, 2, and 3, William Kaufmann, Inc., Los Altos, CA, 1981.
[Bern76] Bernstein, P. A., "Synthesizing third normal form relations from functional dependencies," ACM Transactions on Database Systems, Vol. 1, No. 4, Dec. 1976.
[Bern81] Bernstein, P. A. et al., "Query processing in a system for distributed databases (SDD-1)," ACM Transactions on Database Systems, Vol. 6, No. 4, pp. 602-625, March 1981.
[Bibe81] Bibel, W., "On matrices with connections," J. ACM, Vol. 28, No. 4, pp. 633-645, Oct. 1981.
[BoWi77] Bobrow, D. G. and Winograd, T., "An overview of KRL, a knowledge representation language," Cognitive Science, Vol. 1, pp. 3-36, 1977.
[Boye71] Boyer, R. S., "Locking: A restriction of resolution," Ph. D. Dissertation, The University of Texas, Austin, 1971.
[Brac78] Brachman, R. J., "A structural paradigm for representing knowledge," Rep. No. 3605, Bolt Beranek and Newman, Inc., Cambridge, MA, 1978.
[BuFe78] Buchanan, B. G. and Feigenbaum, E. A., "DENDRAL and Meta-DENDRAL: Their applications dimension," Artificial Intelligence, Vol. 11, pp. 5-24, 1978.

[BuHa79] Buckles, B. P. and Hardin, D. M., "Partitioning and allocation of logical resources in a distributed computing environment," Distributed System Design, Tutorial IEEE Catalog No. EH0261-1, 1979.
[Carb70] Carbonell, J. R., "AI in CAI: An artificial intelligence approach to computer-assisted instruction," IEEE Transactions on Man-Machine Systems, Vol. MMS-11, pp. 190-202, 1970.
[Case72] Casey, R. G., "Allocation of copies of a file in an information network," Proc. AFIPS Spring Joint Comput. Conf., Vol. 40, AFIPS Press, Arlington, VA, pp. 617-625, 1972.
[CeNW83] Ceri, S., Navathe, S. and Wiederhold, G., "Distributed design of logical database schemas," IEEE Transactions on Software Engineering, Vol. SE-9, No. 4, pp. 487-504, July 1983.
[Cham78] Champeaux, D. de, "A theorem prover dating a semantic network," Proc. AISB/GI Conf., Hamburg, West Germany, 1978.
[Chan70] Chang, C. L., "The unit proof and the input proof in theorem proving," J. ACM, Vol. 17, No. 4, pp. 698-707, Oct. 1970.
[Chan76] Chang, C. L., "DEDUCE - A deductive query language for relational data bases," Pattern Recognition and AI, Academic Press, New York, pp. 108-134, 1976.
[Chan78] Chang, C. L., "DEDUCE2: Further investigations of deduction in relational database," Logic and Databases (H. Gallaire and J. Minker, eds), Plenum Press, New York, pp. 201-236, 1978.
[Chen76] Chen, P., "The entity-relationship model -- toward a unified view of data," ACM Transactions on Database Systems, Vol. 1, No. 1, pp. 9-36, March 1976.
[Chu 69] Chu, W. W., "Optimal file allocation in a multiple computer network," IEEE Transactions on Computer, Vol. C-18, No. 10, pp. 885-889, Oct. 1969.
[Chu 73] Chu, W. W., "Optimal file allocation in a computer network," Computer-Communication Network (N. Abramson and F. Kuo, eds), Prentice-Hall, Inc., Englewood Cliffs, NJ, 1973.
[Chu 76] Chu, W. W., "Performance of file directory systems for databases in star and distributed networks," Proc. AFIPS Conf. NCC, Vol. 45, AFIPS Press, Montvale, NJ, pp. 577-587, 1976.

[Chun83] Chung, C-W., A Query Optimization for Distributed Database Systems, Ph. D. Dissertation, The University of Michigan, Ann Arbor, MI, 1983.
[Codd70] Codd, E. F., "A relational model of data for large shared data banks," Comm. ACM, Vol. 13, June 1970.
[Codd72a] Codd, E. F., "Further normalization of the database relational model," Database System (R. Rustin, ed), Prentice-Hall, Inc., Englewood Cliffs, NJ, pp. 33-64, 1972.
[Codd72b] Codd, E. F., "Relational completeness of database sublanguages," Database System (R. Rustin, ed), Prentice-Hall, Inc., Englewood Cliffs, NJ, pp. 65-98, 1972.
[Codd78] Codd, E. F., "How about recently?," Databases: Improving usability and responsiveness (B. Shneiderman, ed), Academic Press, New York, pp. 3-28, 1978.
[Codd79] Codd, E. F., "Extending the database relational model to capture more meaning," ACM Transactions on Database Systems, Vol. 4, No. 4, pp. 397-434, 1979.
[CoGP81] Coffman, E. G. Jr., Gelenbe, E. and Plateau, B., "Optimization of the number of copies in a distributed data base," IEEE Transactions on Software Engineering, Vol. SE-7, No. 1, Jan. 1981.
[Cohn83] Cohn, A. G., "Improving the expressiveness of many sorted logic," Proc. National Conf. on Artificial Intelligence, pp. 84-87, 1983.
[DaPu60] Davis, M. and Putnam, H., "A computing procedure for quantification theory," J. ACM, Vol. 7, pp. 201-215, March 1960.
[Date83] Date, C. J., An Introduction to Database Systems, Vol. 2, Addison-Wesley, Reading, MA, 1983.
[Davi72] Davis, D. J. M., "POPLER: A POP-2," Rep. No. MIP-89, School of AI, University of Edinburgh, Scotland, 1972.
[Delo78] Delobel, C., "Normalization and hierarchical dependencies in the relational data model," ACM Transactions on Database Systems, Vol. 3, No. 3, pp. 201-222, Sept. 1978.
[DePa82] De Bra, P. and Paredaens, J., "Horizontal decompositions and their impact on query solving," SIGMOD Record, Vol. 13, No. 1, pp. 46-50, Sept. 1982.

[DoFo82] Dowdy, L. W. and Foster, D. V., "Comparative models of the file assignment problem," Computing Surveys, Vol. 14, No. 2, pp. 287-313, June 1982.
[DuHN76] Duda, R. O., Hart, P. E. and Nilsson, N. J., "Subjective Bayesian methods for rule-based inference systems," Proc. National Computer Conf. (AFIPS Conf. Proc.), Vol. 45, pp. 1075-1082, 1976.
[Ende72] Enderton, H. B., A Mathematical Introduction to Logic, Academic Press, New York, 1972.
[EpSW78] Epstein, R., Stonebraker, M. and Wong, E., "Distributed query processing in a relational database systems," Proc. ACM SIGMOD Int. Conf. on Management of Data, pp. 169-180, June 1978.
[Eswa74] Eswaran, K. P., "Placement of records in a file and file allocation in a computer network," Proc. IFIP Congress (Information Processing 74), North-Holland, Amsterdam, 1974.
[FiHN72] Fikes, R. E., Hart, P., and Nilsson, N. J., "Learning and executing generalized robot plans," Artificial Intelligence, Vol. 3, pp. 251-288, 1972.
[FiHo80] Fisher, M. L. and Hochbaum, D. S., "Database location in computer networks," J. ACM, Vol. 27, No. 4, pp. 718-735, Oct. 1980.
[FiWe76] Filman, R. E. and Weyhrauch, R. W., "An FOL primer," Memo. 288, AI Laboratory, Stanford University, 1976.
[FuLa79] Fung, K. T. and Lam, C. M., "Optimal data allocation in a distributed database," Proc. Trends and Applications, National Bureau of Standards, Gaithersburg, Md., pp. 111-116, 1979.
[Furt81] Furtado, A. L., "Horizontal decomposition to improve a non-BCNF scheme," SIGMOD Record, Vol. 12, No. 1, pp. 26-32, Oct. 1981.
[GaMi78] Gallaire, H. and Minker, J. (eds), Logic and Databases, Plenum Press, New York, 1978.
[Gilm60] Gilmore, P. C., "A proof method for quantification theory: its justification and realization," IBM J. Research Dev., Vol. 4, pp. 28-35, Jan. 1960.
[GoRo77] Goldstein, I. P. and Roberts, R. B., "NUDGE, a knowledge-based scheduling program," Proc. Int. Joint Conf. on Artificial Intelligence, Cambridge, MA, pp. 257-263, 1977.

[Gree69] Green, C. C., "The application of theorem-proving to question-answering systems," Proc. Int. Joint Conf. on Artificial Intelligence, Washington, D. C., pp. 219-237, 1969.
[Hail57] Hailperin, T., "A theory of restricted quantification I," J. Symbolic Logic, Vol. 22, No. 1, pp. 19-35, March 1957.
[HaMc78] Hammer, M. and McLeod, D., "The Semantic Model: A modelling mechanism for DB applications," Proc. ACM SIGMOD Int. Conf. on Management of Data, Austin, TX, May 1978.
[HaNi79] Hammer, M. and Niamir, B., "A heuristic approach to attribute partitioning," Proc. ACM SIGMOD Int. Conf. on Management of Data, pp. 93-101, May 1979.
[Haye71] Hayes, P., "A logic of actions," Machine Intelligence, Vol. 6, Metamathematics Unit, University of Edinburgh, 1971.
[Haye77] Hayes, P. J., "In defence of logic," Proc. Int. Joint Conf. on Artificial Intelligence, Cambridge, MA, pp. 559-565, 1977.
[HaZd80] Hammer, M. and Zdonik, S. B. Jr., "Knowledge-based query processing," Proc. Int. Conf. on Very Large Data Bases, Montreal, Canada, pp. 137-147, Oct. 1980.
[Hend75] Hendrix, G. G., "Expanding the utility of semantic networks through partitioning," Proc. Int. Joint Conf. on Artificial Intelligence, Tbilisi, Georgia, USSR, pp. 115-121, 1975.
[Hend78] Hendrix, G. G. et al., "Developing a natural language interface to complex data," ACM Transactions on Database Systems, Vol. 3, No. 3, pp. 105-147, 1978.
[Hens72] Henschen, L. J., "N-sorted logic for automatic theorem proving in higher-order logic," Proc. ACM Conference, Boston, MA, 1972.
[Herb30] Herbrand, J., Recherches sur la Théorie de la Démonstration (Thèse, Paris), Warsaw (1930) Chapter 3, 1930. Also in Logical Writings (W. D. Goldfarb, ed.), D. Reidel Pub. Co., 1971.
[Hewi72] Hewitt, C., "Description and theoretical analysis (using schemata) of PLANNER, a language for proving theorems and manipulating models in a robot," Rep. No. TR-258, AI Laboratory, Massachusetts Institute of Technology, 1972.

[HeYa79] Hevner, A. R. and Yao, S. B., "Query processing in distributed database systems," IEEE Transactions on Software Engineering, Vol. SE-5, No. 3, pp. 177-187, May 1979.
[Horn51] Horn, A., "On sentences which are true of direct unions of algebra," J. Symbolic Logic, Vol. 16, pp. 14-21, 1951.
[Idel64] Idelson, A. V., "Calculi of constructive logic with subordinate variables," American Mathematical Society Translations (2), Vol. 99, 1972 - translation of Trudy Mat. Inst. Steklov. 72, 1964.
[IrKh79] Irani, K. B. and Khabbaz, N. G., "A model for combined communication network design and file allocation for distributed databases," Proc. Int. Conf. on Distributed Computing Systems, Huntsville, AL, pp. 15-21, Oct. 1979.
[King81] King, J. J., "QUIST: A system for semantic query optimization in relational databases," Proc. Int. Conf. on Very Large Data Bases, pp. 510-517, 1981.
[KiPo81] Kilov, K. I. and Popova, I. A., "Meta-database architecture for relational DBMS," SIGMOD Record, Vol. 12, No. 1, pp. 18-25, 1981.
[KoHa69] Kowalski, R. and Hayes, P., "Semantic trees in automatic theorem proving," Machine Intelligence (B. Meltzer and D. Michie, eds.), Vol. 4, American Elsevier, New York, pp. 87-101, 1969.
[KoKu70] Kowalski, R. and Kuehner, P., "Linear resolution with selection function," Metamathematics Unit, Edinburgh University, Scotland, 1970.
[Kowa74] Kowalski, R., "Predicate logic as a programming language," Proc. IFIP Congress (Information Processing 74), North-Holland, Amsterdam, pp. 569-574, 1974.
[KrKr67] Kreisel, G. and Krivine, J. L., Elements of Mathematical Logic (Model Theory), North-Holland, Amsterdam, 1967.
[Lee 72] Lee, R. C. T., "Fuzzy logic and the resolution principle," J. ACM, Vol. 19, No. 1, pp. 109-119, Jan. 1972.
[Love70] Loveland, D. W., "A linear format for resolution," Proc. IRIA Symp. on Automatic Demonstration, Versailles, France, 1968, Springer-Verlag, New York, pp. 147-162, 1970.

[Love72] Loveland, D. W., "A unifying view of some linear Herbrand procedures," J. ACM, Vol. 19, pp. 366-384, March 1972.
[Luck70] Luckham, D., "Refinements in resolution theory," Proc. IRIA Symp. on Automatic Demonstration, Versailles, France, 1968, Springer-Verlag, New York, pp. 183-190, 1970.
[MaRi76] Mahmoud, S. and Riordon, J. S., "Optimal allocation of resources in distributed information networks," ACM Transactions on Database Systems, Vol. 1, No. 1, pp. 66-78, Mar. 1976.
[MaUl83] Maier, D. and Ullman, J. D., "Fragments of relations," Proc. ACM SIGMOD Int. Conf. on Management of Data, San Jose, CA, pp. 15-22, May 1983.
[McDe82] McDermott, D., "A temporal logic for reasoning about processes and plans," Cognitive Science, Vol. 6, 1982.
[McHa69] McCarthy, J. and Hayes, P. J., "Some philosophical problems from the standpoint of artificial intelligence," Machine Intelligence (D. Michie and B. Meltzer, eds), Vol. 4, Edinburgh University Press, Edinburgh, Scotland, pp. 463-502, 1969.
[McMi77] McSkimin, J. R. and Minker, J., "The use of a semantic network in a deductive question answering system," Proc. Int. Joint Conf. on Artificial Intelligence, Cambridge, MA, 1977.
[Melt66] Meltzer, B., "Theorem-proving for computers: Some results on resolution and renaming," Computer J., Vol. 8, pp. 341-343, 1966.
[Mink78] Minker, J., "Experimental relational data base system based on logic," Logic and Databases (H. Gallaire and J. Minker, eds), Plenum Press, New York, pp. 107-148, 1978.
[Mins75] Minsky, M., "A framework for representing knowledge," The Psychology of Computer Vision (P. Winston, ed), McGraw-Hill, New York, pp. 211-277, 1975.
[MoLe77] Morgan, H. L. and Levin, K. D., "Optimal program and data locations in computer networks," Comm. ACM, Vol. 20, No. 5, pp. 315-322, May 1977.
[Mylo81] Mylopoulos, J., "An overview of knowledge representation," Proc. Workshop on Data Abstraction, Databases and Conceptual Modeling, pp. 5-12, Jan. 1981.

[NeSi72] Newell, A. and Simon, H. A., Human Problem Solving, Prentice-Hall, Englewood Cliffs, NJ, 1972.
[Nils80] Nilsson, N. J., Principles of Artificial Intelligence, Tioga Pub. Co., 1980.
[NoRu75] Norman, D. A., Rumelhart, D. E. and the LNR Research group, Explorations in Cognition, Freeman Pub. Co., San Francisco, CA, 1975.
[Ouli84] Oulid-Aissa, M., The Distribution and Materialization of Cross-Referencing Data Units in a Computer Network, Ph. D. Dissertation, The Univ. of Michigan, Ann Arbor, MI, 1984.
[Quil68] Quillian, M. R., "Semantic memory," Semantic Information Processing (M. Minsky, ed), MIT Press, Cambridge, MA, pp. 227-270, 1968.
[Quin55] Quine, W. V., "A Proof Procedure for Quantification Theory," J. Symbolic Logic, Vol. 20, pp. 141-149, 1955.
[Raph68] Raphael, B., "SIR: A computer program for semantic information retrieval," Semantic Information Processing (M. Minsky, ed), pp. 33-145, 1968.
[RaWa79] Ramamoorthy, C. V. and Wah, B. W., "The placement of relations in a distributed relational database," Proc. Int. Conf. on Distributed Computing Systems, Huntsville, AL, Oct. 1979.
[Rebo76] Reboh, R. et al., "QLISP: A language for the interactive development of complex systems," Rep. No. TN-120, AI Center, SRI Int., Inc., 1976.
[Reit71] Reiter, R., "Two results on ordering for resolution with merging and linear format," J. ACM, Vol. 18, pp. 630-646, 1971.
[Reit78a] Reiter, R., "On Reasoning by Default," Proc. TINLAP-2, The University of Illinois, Urbana, IL, July 1978.
[Reit78b] Reiter, R., "Deductive Question-Answering on relational data bases," Logic and Databases (H. Gallaire and J. Minker, eds), Plenum Press, New York, pp. 149-177, 1978.
[Reit81] Reiter, R., "On the integrity of typed first order data bases," Advances in Data Base Theory (H. Gallaire, J. Minker and J. M. Nicolas, eds), Plenum Press, New York, 1981.

[Robi65a] Robinson, J. A., "A machine-oriented logic based on the resolution principle," J. ACM, Vol. 12, No. 1, pp. 23-41, Jan. 1965.
[Robi65b] Robinson, J. A., "Automatic deduction with hyper-resolution," Int. J. Comput. Math., Vol. 1, pp. 227-234, 1965.
[RoGo77] Rothnie, J. B. and Goodman, N., "A survey of research and development in distributed database management," Proc. Int. Conf. on Very Large Data Bases, Tokyo, Japan, pp. 48-62, Oct. 1977.
[Roth80] Rothnie, J. B. et al., "Introduction to a system for distributed databases (SDD-1)," ACM Transactions on Database Systems, Vol. 5, No. 1, pp. 1-17, March 1980.
[RuDW72] Rulifson, J., Derksen, J. A. and Waldinger, R. J., "QA4: A procedural calculus for intuitive reasoning," Rep. No. TN-83, AI Center, SRI Int., Inc., 1972.
[SAMP81] Proc. ACM SIGART/SIGMOD/SIGPLAN workshop on Data Abstraction, Databases and Conceptual Modelling, Pingree Park, Col., 1981.
[Schm38] Schmidt, A., "Über deduktiven Theorien mit mehreren Sorten von Grunddingen," Mathematische Annalen, Vol. 115, pp. 485-506, 1938.
[Schm51] Schmidt, A., "Die Zulässigkeit der Behandlung mehrsortiger Theorien mittels der üblichen Prädikatenlogik," Mathematische Annalen, Vol. 123, pp. 187-200, 1951.
[Schu76] Schubert, L. K., "Extending the expressive power of semantic networks," Artificial Intelligence, Vol. 11, No. 1, 2, pp. 45-83, 1976.
[Shap79] Shapiro, S., "The SNePS semantic network processing system," Associative Networks -- The Representation and Use of Knowledge in Computers, Academic Press, New York, pp. 179-203, 1979.
[ShIr84] Shin, D. G. and Irani, K. B., "Knowledge representation using an extension of a many-sorted language," Proc. Conf. on Artificial Intelligence Applications, Denver, Col., pp. 404-409, Dec. 1984.
[Shoe67] Shoenfield, J. R., Mathematical Logic, Addison-Wesley, Reading, MA, 1967.
[Shor76] Shortliffe, E. H., Computer-Based Medical Consultations: MYCIN, North-Holland, New York, 1976.

[Siek84] Siekmann, J., "Universal unification," Proc. Int. Conf. on Automated Deduction, Napa, CA, (Lecture Notes in Computer Science, Vol. 170), Springer-Verlag, New York, pp. 1-42, 1984.
[Slag67] Slagle, J. R., "Automatic theorem proving with renameable and semantic resolution," J. ACM, Vol. 14, pp. 687-697, March 1967.
[SmSm77] Smith, J. M. and Smith, D. C. P., "Database abstractions: aggregation and generalization," ACM Transactions on Database Systems, Vol. 2, No. 2, pp. 105-123, 1977.
[SmSm78] Smith, J. M. and Smith, D. C. P., "Principle of conceptual DB design," Proc. NYU Symp. on DB Design, New York, pp. 18-19, May 1978.
[SuMc72] Sussman, G. and McDermott, D. V., "CONNIVER reference manual," Memo 259, AI Laboratory, Massachusetts Institute of Technology, 1972.
[TeFr82] Teorey, T. J. and Fry, J. P., Design of Database Structures, Prentice-Hall, Inc., Englewood Cliffs, NJ, 1982.
[Ullm80] Ullman, J. D., Principles of Database Systems, Computer Science Press, Rockville, Md., 1980.
[Walt78] Waltz, D. L., "An English language question answering system for a large relational database," Comm. ACM, Vol. 21, No. 7, pp. 526-539, 1978.
[Walt83] Walther, C., "A many-sorted calculus based on resolution and paramodulation," Proc. Int. Joint Conf. on Artificial Intelligence, Karlsruhe, West Germany, 1983.
[Walt84a] Walther, C., "Unification in many-sorted theories," Proc. European Conf. on Artificial Intelligence, Pisa, Italy, 1984.
[Walt84b] Walther, C., "A mechanical solution of Schubert's steamroller by many-sorted resolution," Proc. National Conf. on Artificial Intelligence, Austin, TX, pp. 330-334, 1984.
[Wang52] Wang, H., "Logic of many-sorted theories," J. Symbolic Logic, Vol. 17, No. 2, pp. 105-116, June 1952.
[Wang60] Wang, H., "Towards mechanical mathematics," IBM J. Research Dev., Vol. 4, pp. 2-22, 1960.

[Weyh77] Weyhrauch, R. W., "FOL, a proof checker for first-order logic," Memo AIM-235.1, Stanford Artificial Intelligence Laboratory, Stanford University, 1977.

[Whit70] Whitney, V. K. M., "A study of optimal file assignment and communication network configuration in remote-access computer message processing and communication systems," SEL Tech. Report No. 48, The Univ. of Michigan, Ann Arbor, MI, Sept. 1970.

[Wins77] Winston, P. H., Artificial Intelligence, Addison-Wesley, Reading, MA, 1977.

[WoCR64] Wos, L., Carson, D. F. and Robinson, A., "The unit preference strategy in theorem proving," Proc. AFIPS Fall Joint Computer Conf., Vol. 26, pp. 616-621, 1964.

[WoKa83] Wong, E. and Katz, R. H., "Distributing a database for parallelism," Proc. ACM SIGMOD Int. Conf. on Management of Data, San Jose, CA, pp. 23-29, May 1983.

[WoMy77] Wong, H. K. T. and Mylopoulos, J., "Two views of data semantics: A survey of data models in artificial intelligence and database management," Information, Vol. 15, No. 3, pp. 344-383, 1977.

[Wong77] Wong, E., "Retrieving dispersed data from SDD-1: A system for distributed databases," Proc. Berkeley Workshop on Distributed Data Management and Computer Networks, The Univ. of California, Berkeley, CA, pp. 217-235, May 1977.

[Wong81] Wong, E., "Dynamic re-materialization: Processing distributed queries using redundant data," Proc. of Berkeley Workshop on Distributed Data Management and Computer Networks, The Univ. of California, Berkeley, CA, pp. 3-13, 1981.

[WoRC65] Wos, L., Robinson, A. and Carson, D. F., "Efficiency and completeness of the set of support strategy in theorem proving," J. ACM, Vol. 12, pp. 536-541, March 1965.

[WoYo76] Wong, E. and Youssefi, K., "Decomposition -- A strategy for query processing," ACM Transactions on Database Systems, Vol. 1, No. 3, pp. 223-241, Sept. 1976.

[Yao 79] Yao, S. B., "Optimization of query evaluation algorithms," ACM Transactions on Database Systems, Vol. 4, No. 2, pp. 133-155, June 1979.
