THE UNIVERSITY OF MICHIGAN
COLLEGE OF LITERATURE, SCIENCE, AND THE ARTS
Computer and Communication Sciences Department

Technical Report

A SELF-DESCRIBING AXIOMATIC SYSTEM AS A SUGGESTED BASIS FOR A CLASS OF ADAPTIVE THEOREM PROVING MACHINES

Thomas H. Westerdale

supported by:
Department of Health, Education, and Welfare
National Institutes of Health
Grant No. GM-12236-03
Bethesda, Maryland
and
Office of Naval Research
Contract No. N00014-67-A-0181-0011
Washington, D. C.
and
U. S. Army Research Office (Durham)
Grant No. DA-31-124-ARO-D-483
Durham, North Carolina

administered through:
OFFICE OF RESEARCH ADMINISTRATION, ANN ARBOR

March 1969

Distribution of This Document is Unlimited.

ERRATA SHEET

Page 129, Line 12 should read: We shall now show that for any theorem of form 0 ⊃ α, the formula
Page 129, Line 13 should read: ~T(( )) is not provable. Suppose 0 ⊃ α is a theorem. Then we can
Page 129, Line 22 should read: by rule 18 from theorem ~(0 ⊃ α)
Page 129, Line 23 should be deleted.

Page 130, After Line 4 Insert: Let us call an expression anomalous if it contains a subexpression of form (β,γ,δ) in which occurs a variable ε such that ( ) holds but ε is not free in (β,γ,δ). Such anomalous expressions can never occur inside theorems. Now if α is a well formed formula which is not an anomalous expression, then 0 ⊃ α is provable. Hence we have shown that for any well formed formula α which is not anomalous, ~T(( )) is not provable. It would have been much more natural (and quite easy) to have originally defined the class of well formed expressions (and of well formed formulae) so as to exclude anomalous expressions. (We could even have defined the class of expressions so as to exclude anomalous expressions.) With such a natural definition, an expression α is well formed if and only if it occurs as a subexpression of some theorem; that is, if and only if ∃(z,x)(expression(z) ∧ variable(x) ∧ x ∘ z ∧ type(x) = exprtype( ) ⊃ T(Sf(x, ( ), z))) holds. This is what is usually meant by a well formed expression.

Page 130, After the Last Line Insert: If anom is a predicate expression such that anom(α) holds if and

Errata Sheet (Cont.)

only if α names an anomalous expression, then T(x) ⊃ ~anom(x) is provable by a tedious proof which, in outline, is something like the proof of T(x) ⊃ F(x). Thus if α is an anomalous expression, ~T(( )) is provable by the same argument as above.

Page 214, Delete Lines 16, 17, 18, and 19.

PREFACE

Throughout the original manuscript colored symbols were used to represent quoted symbols. Use of colored symbols permits a more intuitive and economical abbreviation scheme than does the use of quotation marks. The abbreviations discussed in Section 2.1.2.6 would be confusing if quotation marks were used in place of color. In this particular copy it has been impossible (for typographical reasons) to use colored symbols. We have therefore employed the following convention: Symbols blue in the original manuscript are here surrounded by a balloon. Symbols red in the original manuscript are here surrounded by a double balloon. Symbols green in the original manuscript are here surrounded by a triple balloon. The text has not been changed in this copy. Therefore symbols inside the double balloons are referred to as red symbols, etc. The reader may wish to clarify some of the more complicated formulae (particularly those in Section 2.2.2) by writing them out in the original color notation.

The description of our axiomatic system begins in Section 2.1.2. Section 2.1.1 describes the relationship of our axiomatic system to formal arithmetic. Section 2.1.1 does give an intuitive overview of the system, but some of the arguments there are more complete than is required for such an overview. The reader may wish on the first reading to skip the more tedious portions of Section 2.1.1, particularly the more tedious portions of 2.1.1.9. Evidence for claims made in Section 2.1.1 is frequently given in footnotes. Use has been made, in these notes, of notation explained in Section 2.1.2, so the reader should not be surprised to find

certain portions of these notes unintelligible until after he has read Section 2.1.2.

The conclusion of Section 2.2.1 is that our axiomatic system is consistent. The reader is assured that this fact, the consistency of our system, is the only thing from Section 2.2.1 that is used in the other sections. The reader who does not wish to read the somewhat tedious Section 2.2.1, and who can believe that the system is consistent, may proceed to the short Section 2.2.2 (where other logical properties of the system are concisely developed) without fear that he has missed something that will be referred to later. It should also be noted that the reader who has read no more than Section 1 will be able to understand (though perhaps not believe) much of the conclusion of Section 3.8.

The present work is an outgrowth of certain investigations in the theory of Adaptive Systems as developed by Professor John H. Holland. The approach used here is the approach which Professor Holland has developed for more general cases. My original inspiration was Professor Holland's Iterative Circuit Computer, which resembles the memory nets described here (taking his generators to be my formula nodes). Professor Holland provided the key to the scheme described in this paper when he suggested that heuristics might be regarded as "rules of inference with some conditions missing." The purpose of this work is to provide a scheme which allows the theory to be applied in a theorem proving environment.

I would like to thank each member of my doctoral committee for his guidance and aid. Professor Peter G. Hinman guided me through logical arguments required

in Section 2.2, the section which presents the major results of this work. It was necessary for Professor Hinman to give me a course in Logic, suggest various approaches to my problem, and demolish many of my arguments before the arguments presented in Section 2.2 could be completed. Professor Arthur W. Burks showed me the necessity of making the arguments given in Section 2.2. For example, he pointed out that if statement (B) of Section 2.2.2 holds, my system is inconsistent. Since I thought at that time that statement (B) held, I realized my argument was full of holes. Professor Bruce W. Arden and Professor Bernard A. Galler aided me in the implementation aspects of this project. They pointed out areas of difficulty which, with my limited programming experience, I might otherwise have missed. I was, for example, entirely oblivious of the difficulties discussed in Section 3.3.5 until I was forced to think in detail about storage methods.

I would like to express my gratitude for the privilege of working with the members of the Logic of Computers Group at The University of Michigan. I would like to thank the members of the Group and of the Computer and Communication Sciences Department for providing a stimulating environment, for asking many helpful questions, and for making many helpful suggestions. I am especially indebted to the late Professor Gordon Peterson for shielding me from the sort of strict calendaring of courses and exams which stifles a person's education. His advice in matters both academic and bureaucratic was extremely helpful. I would like to thank my instructors and fellow students of The University of Michigan Department of Botany for giving me the understanding of biological processes that is important for Adaptive Systems study.

I would like especially to thank Professor K. L. Jones of The University of Michigan Botany Department, who was a constant source of inspiration during the nine years he guided my education as my instructor and counselor.

This work was supported by the National Institutes of Health, the Army Research Office, and the Office of Naval Research.

TABLE OF CONTENTS

PREFACE   ii
LIST OF FIGURES   ix

1. INTRODUCTION   1
1.1 Purpose of the Paper   1
1.2 The Difficulty in Making "Small" Enough Changes   2
1.3 Our Approach: The Meta and Object Level   3
1.4 The Machine's Environment   6
1.5 Organization of the Machine's Memory, an Example   7
1.6 The Effector Acting Upon the Memory Net   13
1.6.1 Principles of Effector Action   13
1.6.2 Use of the Heuristics that are in the Net: an Example   14
1.6.3 Refining a Proof   17
1.6.3.1 The General Scheme —refineproof   17
1.6.3.2 prove —and the Problem of Loose Ends   20
1.6.4 Saving Examples of Derivations Employing Heuristics   22
1.7 Generation of Heuristics   26
1.8 Questions Beyond the Scope of the Paper   28

2. THE FORMAL SYSTEM   30
2.1 Basic Structure of the System   30
2.1.1 Overview of System   30
2.1.1.1 Plan of Discussion   30
2.1.1.2 Pertinent Properties of Formal Arithmetic   32
2.1.1.3 Addition of ι (representability of individual functions)   36
2.1.1.4 Addition of λ (ordinary-names of functions and relations)   38
2.1.1.5 Addition of label, cond, and pcond (algorithmic-names of functions and relations)   38
2.1.1.6 Elimination of + and ×   43
2.1.1.7 An Operation on S-expressions: Addition of * and nil   44
2.1.1.8 Addition of Pv, Iv, Pfvb, Ifvb, newpv, newiv, newpfvb, and newifvb; subtraction of tv, I, Pf, and If   46
2.1.1.9 Elimination of S and 0 and addition of qu   49
2.1.1.10 Our Axiom Set   55
2.1.2 Expressions and Their Abbreviation   57
2.1.2.1 Motivation for Abbreviation   57
2.1.2.2 Definitions of Classes of Expressions of our Language   58
2.1.2.3 Format for Abbreviation Rules   63
2.1.2.4 Abbreviations of a Single Color with no "Defined" Symbols (the first 12 rules)   63
2.1.2.5 "Defined" Symbols (Rule 13)   67
2.1.2.6 Abbreviations using Colored Symbols (Rules 14-17)   69
2.1.2.7 Comma vs. Dot Notation   73
2.1.2.8 Reading the Expressions in the Tables and Text   73

TABLE OF CONTENTS (Cont.)

2.1.3 Well Formedness   77
2.1.4 The Axiomatic System   80
2.1.4.1 Certain Functions and Relations   80
2.1.4.2 The Meta Level   86
2.1.4.3 The Axioms (Table 4)   91
2.1.4.4 The Rules of Inference (Table 5)   93
2.1.4.5 Proofs and Theorems   96
2.1.4.6 Summary   97
2.2 Formal Arguments   98
2.2.1 Consistency   98
2.2.1.1 Generation of Function Expressions   98
2.2.1.2 Preservation of Consistency while Making Identifications between Object and Meta Levels (Completion of Consistency Argument)   120
2.2.2 Incompleteness   126

3. IMPLEMENTATION   131
3.1 Purpose of Section 3   131
3.2 Characterization of the Subclass of Machines   132
3.3 Structure of Memory   133
3.3.1 Basic Plan of the Memory Net   133
3.3.2 Some LISP Functions on Bug Values   136
3.3.3 Condition on Derivation Nodes (Rules and Heuristics)   139
3.3.4 Net Changing Functions   142
3.3.5 Storage of Memory Nets —Difficulties   145
3.3.6 Patching   148
3.3.7 Tagging and Garbage Collection   150
3.4 Other Functions which Change Net Structure   151
3.4.1 Kinds of Functions to be Discussed   151
3.4.2 Searching   152
3.4.3 Functions on Two Nets   154
3.5 LISP Structure of the Effector   158
3.6 Refining a Proof   158
3.6.1 The Task of refineproof   158
3.6.2 The Use of T-tags and H-tags   160
3.6.3 The Task of prove   161
3.6.4 Example: A Heuristic which is a Composition of Two Rules   162
3.6.5 Less Trivial Situations   167
3.6.6 parametertreegenerate   168
3.6.7 Suppose the Model Fails at Some Point   171
3.7 General Considerations   173
3.8 Conclusion: Discussion of Adaptation   174
3.9 Postscript: Other Object Theories   179

4. TABLES   187
4.1 Table 1. Alphabet   187
4.2 Table 2. Basic Recursive Functions   188
4.3 Table 3. Defined Complete, Recursive Functions of a General Nature   190
4.4 Table 4. Axioms   196

TABLE OF CONTENTS (Cont.)

4.5 Table 5. Rules of Inference   199
4.6 Table 6. Defined Complete, Recursive Functions of a Specific Nature   203
4.7 Table 7. Definition of T and Immediate Consequences   210
4.8 Table 8. Non-Recursive Definitions Especially Useful for Meta-Theorems. Some Immediate Consequences   211
4.9 Table 9. Definitions for Handling Recursive Functions; apl   215
4.10 Table 10. Examples   218
4.10.1 Example 1. Reflexivity and Transitivity of =   218
4.10.2 Example 2. Proof of ~atom(x) ⊃ x = a(x)*d(x) (basic theorem for a and d)   218
4.10.3 Example 3. Some More Theorems about a and d; an Alternative Induction Axiom   221
4.10.4 Example 4. Course of Values Induction (and a Corollary)   222
4.10.5 Example 5. Generation of some Labeled Functions, e.g., maplist   224
4.10.6 Example 6. The Predecessor Function   234
4.10.7 Example 7. The μ Schema   236
4.10.8 Example 8. T(x) ⊃ F(x)   237
4.10.9 Example 9. Sketch of Proof of T(x) ⊃ T(apl(x))   238
4.11 Table 11. Implementation Routines —Effector Algorithms Used in Section 3   242
4.11.1 Basic Functions and Notation   242
4.11.2 Routines which Return a Bug Value   244
4.11.3 Routines which Return a Bug Value Paired with a Sequence of Pairs   247

REFERENCES   252

LIST OF FIGURES

Figure 1   9
Figure 2   10
Figure 3   11
Figure 4   12
Figure 5   16
Figure 6   23
Figure 7   24
Figure 8   25
Figure 9   143
Figure 10   144
Figure 11   148
Figure 12   160
Figure 13   164
Figure 14   165
Figure 15   170
Figure NSS1   182
Figure NSS2   182
Figure NSS3   183
Figure NSS4   183
Figure NSS5   184
Figure NSS6   184

1. INTRODUCTION

1.1 Purpose of the Paper

The overall goal of this paper is to argue for the thesis that we can arrive at a more general approach to adaptive theorem proving if we first formulate an axiomatic theory which relates the structure of computer programs to their performance. A major part of the argument will consist in exhibiting an axiomatic theory which possesses the required features.

Until now, adaptation in theorem proving machines has been rather limited. For example, Newell, Shaw, and Simon's General Problem Solver has a fixed set of heuristics (i.e., tests to see which strategy is to be followed) which are employed successively to construct a proof. The machine can adapt by changing the probabilities with which the various heuristics are employed, but the set of heuristics it may use remains the same set that the programmer put into the machine. It has no capacity (or, in later models, very limited capacity) to generate new heuristics.

In this paper we shall show how an axiomatic theory relating the structure of computer programs to their performance can be used as the core of an adaptive theorem prover which has a general capability of generating new heuristics. First we shall design a language suitable for talking about computer (LISP) programs. We shall then construct a set of axioms and rules of inference such that the theorems of this axiomatic system have natural interpretations as statements about LISP programs. In these theorems, LISP programs appear as long expressions which behave syntactically as names of partial recursive functions. Let us write ε, ζ, and ψ as abbreviations for whole LISP programs. Then ζ(α) stands for the output of the program ζ when

given the input α, just as in normal functional notation. For example, if (ε(x) = 0) ⊃ (ζ(x) = ψ(x)) is a theorem, then the statement "ζ and ψ give identical outputs for identical inputs as long as ε gives output zero for those inputs" will be a true statement. These statements include statements relating the structure of LISP programs to their performance. Finally we shall discuss one method by which this axiomatic system may be implemented as the core of an adaptive theorem proving machine.

The main section of this paper will be devoted to a formal description of the axiomatic system. We shall not go into this formalism in the introduction. We simply mention that the system is as powerful as formal arithmetic. In the main section we shall show how the axiomatic system may be derived from formal arithmetic by a series of consistency-preserving transformations. In addition to showing consistency of the system, we shall give a Gödel-type proof of incompleteness. Following the main section will be a section explaining how the system might be implemented as the core of an adaptive theorem prover. The remainder of this introduction will outline some of the significant characteristics of the sort of adaptive theorem prover we have in mind.

1.2 The Difficulty in Making "Small" Enough Changes

For a machine to adapt it must change its structure bit by bit, checking after each small change to see whether the change has been an improvement or a pejoration. A radical change is almost always pointless. But what does it mean to make a small change in structure, as opposed to a radical change? In Friedberg's program generating machine [Friedberg, 1958], the "structure" was a computer program written in an assembly language.

The structure was modified by changing various instructions. The performance before the change was then compared with the performance after the change. A "small" change for Friedberg meant a change of only one or two instructions (as opposed to a change of many instructions). Such a definition of "small" is meaningless, since his "small" structural changes tended to produce radical performance changes. What is needed is a theory of structural change which predicts which types of changes will produce "small" performance changes and which ones "large" performance changes. As yet no adaptive machines have employed any such theory. Of course the relationship between structural changes in programs and the concomitant performance changes is much too complicated to be represented as a relationship between "magnitudes" of change.

McCarthy has attacked this problem in the following manner. He has first developed a programming language (LISP 1.5) which lends itself to analysis of this sort. A LISP program is a symbolic representation of a partial recursive function. The "data" or "inputs" to the program are names for the function's arguments. The name of the value of the function for those arguments is the "output". McCarthy has then developed a theory which tells how to set up certain algorithms which, when applied to the function representations, tell us, among other things, over what sets two functions are identical. However, he gives no usable mechanical procedure for generating the various algorithms. (The set of algorithms we want is not a recursive set. In our system the set is defined by a generation procedure discussed in Section 2.2.1.1.)

1.3 Our Approach: The Meta and Object Level

Our approach will be more general. We design an axiomatic system, S, the theorems of which have natural interpretations as statements about LISP programs. These theorems, once proved, can be used in modifying LISP programs in any

desired way. (Actually we use a modified LISP, modified in several essential ways.) A machine can use these theorems as a basis on which to make adaptive changes. We shall argue that there exist useful theorem proving machines which make adaptive changes on the basis of these theorems. We shall do this by describing such a machine in sufficient detail to demonstrate its characteristic properties. This machine will make two kinds of adaptive changes. First, it will generate appropriate new heuristics. Second, it will change the probabilities with which the various heuristics are employed. It is chiefly in the ability to generate new heuristics that this machine differs from such machines as the Newell, Shaw, and Simon General Problem Solver. We shall, in this section (Sec. 1.3), describe in a general way the method by which new heuristics are generated. Later in the introduction we shall discuss in some detail how our machine changes the probabilities with which the various heuristics are employed.

Our machine will be making its adaptive changes in a theorem proving environment. An essential part of the machine will be a set of axioms and rules of inference for the system in which we wish the theorems proved. (The machine will prove theorems in a manner similar to that of Newell, Shaw, and Simon's General Problem Solver.) Let us call this the object system, and let us call the language in which the theorems are written the object language. The axioms are written in the object language. The rules of inference are like little LISP programs, each representing a function which maps theorems to theorems. The rules of inference are written in a LISP-like meta language. In addition, our machine will need other expressions written in the meta language. It will need expressions which look like rules of inference, but which are only sometimes valid. These we call heuristics. They will be associated with rules of inference in such a way that the successful

application of a heuristic will imply, for example, that one of a particular set of rules of inference is likely to be successful. A heuristic is a quick check to see whether a strategy is a good one. (The machine will, of course, assign weights to the various heuristics according to past success.) A well-adapted machine will have a well-organized hierarchy of heuristics which classify rules of inference into useful overlapping classes. We will see more complicated uses of heuristics later. (We will also see how Newell, Shaw, and Simon's heuristics can be written in our notation.) Some of the simpler heuristics may be thought of as merely rules of inference with some of the required conditions missing. The situation is not always so simple, but there will always be some useful relationship between the performance of a heuristic and the performance of the rules of inference to which it is related.

Now if our machine only had a good set of statements relating the structure of LISP programs to performance, it could generate new heuristics directly from the rules of inference by making indicated structural changes. (It could also generate new rules of inference from old rules of inference, and new heuristics from old heuristics.) Suppose now that the object system is the system S whose theorems have natural interpretations as statements about LISP programs. (Some of these statements relate the structure of LISP programs to their performance.) Then as our theorem proving machine worked, it would produce these statements, which would tell how to modify our rules of inference and heuristics in an attempt to improve the performance of our machine. In fact, we can even write rules of inference in a format such that they themselves become theorems of the system S. They will be theorems whose natural interpretations are statements of a form something like, "If such and such is a theorem, then so and so is a theorem." In fact we shall see later that

heuristics may be written in a similar format so that they become formulae (but not theorems) of S which have a form similar to the form of rules of inference. Thus the theorems of the object system (now the system S), the rules of inference, and the heuristics are all written in the same language, the object language of the system S. The language of our meta and object levels is identical. We will see how this identification of meta and object level is exactly the same as the well-known identification made in formal arithmetic by means of Gödel numbering. In our system, however, we have theorems which are statements about practical computer programs.

1.4 The Machine's Environment

Likening the machine to, say, a mathematics professor's graduate assistant, one sees that the environment of the machine is not merely the mathematical system in which the theorems are to be proved, but includes also the value placed on theorems and completed proofs by the professor (or, in the machine's case, the user) and the ordering of the problem sequence presented. This is important, for it means that the machine, in adapting to take advantage of regularities in its environment, can take advantage of regularities both in the mathematical system itself and in the mind of the user. Any machine not taking advantage of this second class of regularities is ignoring vital information and may stand little chance of learning. (This point was missed by Amarel [Amarel 1962], whose machine may have been taking advantage of regularity in the user's mind rather than, as Amarel claims, regularity in the mathematical system.) The graduate assistant takes advantage of such regularity in the professor's mind when he produces "elegant" proofs rather than tedious ones, and when he becomes proficient in techniques for proving important theorems, ignoring

whole classes of trivial, inapplicable theorems. ("Elegance" and "importance" are concepts in the professor's mind, not in the mathematical system.) The student takes advantage of such regularity when he looks first at the most recent lessons in his search for theorems to use in homework problems. (It will not be absolutely necessary that the user use programmed learning techniques in developing machine competence in the environment in question, but it may frequently be advisable. Similarly for the professor developing student competence.) One consequence of the above point of view is that the sequence of problems and the rewards given are ultimately determined by the user. Thus in a vague sense the environment will be a sequence of "to prove" problems presented by the user, together with the rewards given by the user for the completed proofs produced by the machine. The environment will be more regular, and the machine's task simpler, accordingly as the user grades the problems in difficulty and makes them interrelated.

1.5 Organization of the Machine's Memory, an Example

We seek a machine "intelligent" enough to operate in this environment. Our approach is to construct a machine which attacks problems somewhat the way humans attack them. This approach presupposes some knowledge of human thinking. Our knowledge is obviously meager, but I have found Polya's How to Solve It [Polya 1945] an invaluable aid. His suggestions are illuminating: "Do you know a related problem?... Here is a problem related to yours and solved before.... Could you use its result? Could you use its method?" The human is here taking advantage of the very regularities in the environment we have been mentioning. Our machine must then be prepared to

remember both results and methods of problems solved (together with their values, determined by rewards given by the user) and to find similarities between them. One way to do this is to remember all previous proofs (proofs which accumulate little reward are, of course, actually thrown away later) and the rules of inference used at each step. Our machine must also remember which heuristics were used to arrive at the proof, but let us ignore this for the moment. Since each line of the proof is a theorem, the machine need only remember, for each of these theorems, the rule used to derive the theorem, the theorems to which the rule was applied, and certain parameter values whose purpose we shall make clear below. In other words, for each theorem, the machine remembers its immediate derivation. We now give an example of such an immediate derivation and indicate how it is remembered inside the machine. In this example, for simplicity, the theorems and parameter values will be written in the language of the propositional calculus, and the rules will be written in English. (In our machine both theorems and rules are written in the object language of the system S, but we shall not discuss that formalism here.)

THEOREMS
Thm. 1: (p ⊃ (q ⊃ p)) ⊃ ((p ⊃ q) ⊃ (p ⊃ p))
Thm. 2: p ⊃ (q ⊃ p)
Thm. 3: (p ⊃ q) ⊃ (p ⊃ p)
Thm. 4: (p ⊃ (q ⊃ p)) ⊃ (p ⊃ p)
Thm. 5: p ⊃ p

RULES
Rule 1: (Modus ponens) If α and β are well-formed formulae of the

propositional calculus, and if both α ⊃ β and α are theorems, then β is a theorem.
Rule 2: (Substitution) If α is a theorem of the propositional calculus, β is a propositional variable, and γ is a well-formed formula of the propositional calculus, then the formula resulting from uniform substitution of γ for β in α is a theorem.

PARAMETER VALUES
Value 1: q
Value 2: q ⊃ p

Now suppose Theorem 4 was derived directly from Theorem 3 by Rule 2. For Rule 2 to have been properly applied in this case, the parameters α, β, and γ in the statement of the rule must have had the values (p ⊃ q) ⊃ (p ⊃ p), q, and q ⊃ p respectively. The machine stores this information in the net-like structure shown in Figure 1.

Figure 1.
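To make the two rules concrete, the following is a minimal sketch of them as "little LISP programs" of the kind described in Section 1.3, functions which map theorems to theorems. The sketch is in modern Common Lisp rather than the report's modified LISP, formulae are represented as S-expressions in which (IMPLIES a b) stands for a ⊃ b, and the function names are invented for illustration:

    ;; Rule 1 (modus ponens): from theorems "a implies b" and "a",
    ;; return the theorem "b"; return NIL if the rule does not apply.
    (defun rule1 (major minor)
      (if (and (consp major)
               (eq (first major) 'implies)
               (equal (second major) minor))
          (third major)
          nil))

    ;; Rule 2 (substitution): uniformly substitute formula Y for the
    ;; propositional variable V throughout theorem A.
    (defun rule2 (a v y)
      (cond ((eq a v) y)
            ((atom a) a)
            (t (cons (rule2 (car a) v y)
                     (rule2 (cdr a) v y)))))

In this representation the derivation of Theorem 4 just described is the call (rule2 '(implies (implies p q) (implies p p)) 'q '(implies q p)), whose value is Theorem 4; the net structure of Figure 1 records exactly this application, together with its parameter values.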

The circle is called a derivation node. The rectangles are called formula nodes. The dots are called antecedent nodes (the left one) or parameter value nodes (the right one). The triangles are called flags. Flags with a T on them (called tag-type flags) fly from all formula nodes whose contents are already-proved theorems or legitimate rules of inference. The antecedent node is connected to the theorem or theorems to which the rule was applied. The parameter value node is connected to the values (three of them in this case) for the parameters in the statement of the rule. The lines connecting the parameter value node to the values have flags (called parameter-type flags) which indicate which value goes with which parameter. The presence of these flags indicates that the parameter value node is indeed a parameter value node and not an antecedent node. Similarly, suppose Theorem 3 was derived directly from Theorems 1 and 2 by Rule 1; then the machine stores the structure shown in Figure 2.

Figure 2.

Note: The values for parameters α and β in Rule 1 are respectively the formula which is Theorem 2 and the formula which is Theorem 3. Similarly, suppose Theorem 5 was derived directly from Theorems 4 and 2 by Rule 1; then the machine stores the structure shown in Figure 3.

Figure 3.

Of course if the machine wants to store all three derivations it can economize on space by not repeating identical nodes as we have above. By such economy the resulting net becomes as in Figure 4. The machine stores all such derivations together in one interconnected net called the memory net. This net will contain all the axioms, rules, previously proved theorems, etc., which the machine wants to remember.

A portion of a memory net. Nodes shown here may have additional connections (not shown) to each other and to other parts of the net.

Figure 4.
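The net just described maps naturally onto record structures. The following is a minimal sketch, again in Common Lisp, of one possible representation; the structure and field names are hypothetical (the report's actual storage scheme is the subject of Section 3.3):

    ;; A formula node holds a formula, rule, or heuristic, together
    ;; with its tag-type flag (:T = legitimately proved theorem or
    ;; real rule; :H = obtained or usable only via heuristics).
    (defstruct formula-node
      contents
      tag)

    ;; A derivation node records one immediate derivation: the rule
    ;; (or heuristic) applied, the antecedent theorem nodes, and an
    ;; alist pairing each parameter with its value node (this pairing
    ;; is what the parameter-type flags express).
    (defstruct derivation-node
      rule
      antecedents
      parameter-values
      result)

The derivation of Theorem 4, for instance, would be stored as a derivation-node whose rule field points at the Rule 2 node, whose antecedent list holds the Theorem 3 node, and whose parameter alist pairs α, β, and γ with the appropriate value nodes.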

To construct a new derivation the machine must select the proper rule, theorems, and parameter values from the nodes already in the memory net, and then construct arrows and nodes connecting these so that the desired derivation is included in the net. In one sense, then, the memory net may be regarded as a passive part of the machine which is acted upon by the active part of the machine, called the effector. The effector performs the searches for the proper rules, theorems, and parameter values, and performs the construction operations which alter the structure of the net.

1.6 The Effector Acting Upon the Memory Net

1.6.1 Principles of Effector Action. A major problem, for the effector, is finding the proper nodes to connect in order to derive the desired theorems. (E.g., in Fig. 4, to derive Thm. 4, the effector must find nodes containing Rule 2, Thm. 3, Value 1, and Value 2.) We will describe a class of possible search and connection algorithms in more detail in the implementation section. For now we will merely mention some principles which guide the search technique. The first principle is that the nodes the effector needs to connect will probably already be fairly close together in the net. Thus if the effector has found a node which it suspects is one of the nodes it wants, it looks nearby to find the other nodes and then checks to see whether the proper connections can be made. What do we mean by nearby? In addition to the tag-type flags and parameter-type flags mentioned earlier, there are various sorts of value-type flags (not shown in the preceding figures) attached to nodes and to lines between nodes. These flags contain numbers which are used by the effector in searching and constructing. One such flag on a line connecting two nodes gives a measure of the "distance" between the two.

-14 "Distance" between two nodes not directly connected can be determined from the distance between successive nodes along a path connecting the two. The effector conducts several different sorts of searches. For each sort of search there is a different kind of value-type flag and hence a different "distance" measure over the net. The second principle is that the effector tends to look first at nodes which have been useful in making constructions in the past. Value-type flags attached to nodes tell the "worth" of the node, i.e., they tell how useful the node has been in the past. In addition to being useful in searches, these flags tell the effector which nodes may be forgotten when the machine runs short on storage space. The effector continually up-dates value-type flags to "reward" the nodes which are useful (by raising the number on the value-type flag attached to the node) and bring "closer together" groups of nodes which have been useful in combination with one another (by changing the numbers on the value-type flags attached to the lines on paths connecting the nodes to one another). Of course the act of completing the desired constructions provides new lines which can have flags whose values, in effect, draw together the nodes involved in the constructions. By construction of new nodes a useful corpus of theorems is built up; and by change of numbers on value-type flags a certain amount of adaptation takes place. 1.6.2 Use of the Heuristics that are in the Net: An Example. A search for the nodes required for a complicated derivation would be hopeless if each possible combination of nearby previously rewarded nodes had to be individually and completely tested until a combination was found that worked. The effector needs a way to quickly reject, at least pro

1.6.2 Use of the Heuristics that are in the Net: An Example. A search for the nodes required for a complicated derivation would be hopeless if each possible combination of nearby, previously rewarded nodes had to be individually and completely tested until a combination was found that worked. The effector needs a way to quickly reject, at least provisionally, whole classes of possible combinations. Through the use of heuristics the effector can sometimes provisionally reject a combination of nodes quickly, after examining only a few of the nodes, thus simultaneously rejecting all other combinations which use those few nodes.

Consider a simple example. Suppose the net is constructed as in Figure 4, except that the nodes below the broken line have not been constructed. Suppose that the effector wants to prove Theorem 5 and suspects that Rule 1 is the rule it wants to use. Now it is looking for two theorems to apply the rule to, and two parameter values. Its search will be simpler if Heuristic 1 (see below) is near Rule 1 in the net.

Heuristic 1: If γ is a theorem of the propositional calculus with ⊃ as its major connective and β as its consequent, then β is a theorem.

(Note that this is related to, and "simpler" than, Rule 1.) It is easier to apply Heuristic 1 than it is to apply Rule 1, because one need only look for one theorem, not two. In Figure 5, the solid lines indicate the derivation of Theorem 5 via Heuristic 1. Note that heuristics may be told from real rules of inference by the fact that their tag-type flags have an H instead of a T on them. Also, Theorem 5 has an H on its flag because it was derived via a heuristic, rather than via a rule of inference, and hence there is no guarantee that it is indeed a theorem. It is essential to remember that Heuristic 1 is near Rule 1 in the net but far from Rule 2. (This is indicated by the dotted arrow in the figure.) (In more complicated cases there is a whole class of rules close to Heuristic 1, and we must try them all until we find one that works.) Thus the construction of the solid line derivation node has moved Theorem 4 and Theorem 5 closer to Rule 1, which

is just what is needed to help the search for the proper nodes to permit the application of Rule 1. (The larger the net, the more important this moving becomes.)

Figure 5.

Thus the only really difficult search remaining is the search for Theorem 2. With each theorem examined in this search, the effector attempts to apply Rule 1 to it and to Theorem 4 so as to yield Theorem 5. Of course, there are several ways this application can be made (depending on parameter value choices, etc.), but this problem is small compared to the problem of finding Theorem 2 somewhere in the net. When finally the effector finds that Theorem 4 and Theorem 2 satisfy the conditions required by Rule 1,

then the broken line derivation node in Figure 5 is constructed and the T flag is added to the node containing Theorem 5, because Theorem 5 has now been legitimately proved via a real rule of inference. Thus the search for the proper pair of theorems has been broken into two stages. Instead of checking each pair (μ,ν) of theorems in the net, the effector first searches for the proper μ, until it finds one that satisfies the heuristic. When such a μ is found, a construction is made which moves it close to the rule to be used. Then when the effector searches for a (μ,ν) which satisfies the rule, it is almost certain to pick the μ which has been moved, thus effectively rejecting all pairs using a different μ. Of course there is no guarantee that the effector has moved the right μ. (The idea behind heuristics is only that they help the effector make good guesses.) In the example above the effector wanted Theorem 4 to be μ, but Theorem 3 would have satisfied the heuristic just as well. In that case Theorem 3 would have been moved, the effector would have gone on a wild goose chase, and eventually given up and returned to the heuristic to look for something which worked better than Theorem 3. Thus the effector might have picked Theorem 3 by mistake, but at least it would not have picked Theorem 1 or Theorem 2, because they do not satisfy the heuristic. Thus in using the heuristic the effector provisionally rejects all (μ,ν) combinations in which μ is either Theorem 1 or Theorem 2.

1.6.3 Refining a Proof.

1.6.3.1 The General Scheme —refineproof. We have just seen a simple example of replacing an illegitimate "proof" which employs a heuristic with a legitimate proof which employs a rule of inference. The proof employing the heuristic is really a sort of proof "outline", and the replacement process is called refining the proof.
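Heuristic 1 itself is easy to render in the S-expression representation used earlier. Here is a minimal Common Lisp sketch under the same hypothetical (IMPLIES a b) convention; note that, unlike the rule1 sketch above, it examines only one theorem and never checks a minor premise, which is exactly why its conclusion deserves only an H flag:

    ;; Heuristic 1: if Y has "implies" as its major connective, guess
    ;; that the consequent is a theorem; the guess is not guaranteed.
    (defun heuristic1 (y)
      (if (and (consp y) (eq (first y) 'implies))
          (third y)
          nil))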

The basic job of the effector is the job of refining proofs. A proof outline, the steps of which are steps using heuristics, is gradually refined until we have a completed legitimate proof using only rules of inference. The above discussion has given a simple example of this process of refining a proof, where the part of the proof being refined was only one step long. More interesting examples will be given in the implementation section. In more complicated examples the effector does not replace the heuristic step directly with a rule of inference step, but rather with another heuristic step which is more detailed. This is replaced by yet another heuristic step, and then another, until finally it is replaced by a rule of inference step. With each replacement, more conditions are checked, and more of the required nodes are brought in, so that at each stage the effector checks to see that it is on the right track. (If it is not, it goes back a step and refines differently.)

In the above refining process the single step of the proof "outline" becomes a single step of the final proof. In general, however, this will not be the case. In many cases a stage in the refining process will consist in replacing a single step (i.e., a single derivation node) with two (or, rarely, more) steps (i.e., two derivation nodes). Thus what was at first a single step in the proof "outline" can end up as a whole complicated derivation consisting of many steps in the final proof. We will give some examples in the implementation section.

The actual procedure for constructing a proof is to begin with a one-step proof "outline" and then refine it stage by stage. The heart of the effector is a recursively written program called refineproof. Its job is to completely refine a single step of a proof "outline". This is how it works. Suppose refineproof is presented with

a single step of a proof outline. If the step employs a rule of inference, then refineproof is finished. If the step employs a heuristic, then refineproof tries to replace this step with a new, more refined step or steps as discussed above. If this is successful, then refineproof has completed the first stage in the process of refining the single step. refineproof then calls itself recursively and applies itself to the new proof step, or successively to each of the steps if there is more than one. If the result of these applications is a complete legitimate proof, then refineproof is finished. If a complete legitimate proof is not the result, then the first stage in the process of refining the original single step was probably performed incorrectly. refineproof goes back to the original single-step proof and tries again, this time replacing the single step with a step (or steps) different from the step (or steps) used in the previous unsuccessful attempt.

Note that after each stage in the refining process, all pertinent formulae in the net, while they may not yet be proved, may be said to be "semi-proved" in the sense that the node in which they stand is at the end of a proof "outline" the steps of which may employ heuristics as well as rules of inference.

The procedure, then, for constructing a proof is to begin with a one-step proof "outline" and simply present this step to refineproof. It matters very little what the original one-step proof outline is; it is the connections between the heuristic used and the other heuristics and rules in the net that govern the effectiveness of the refineproof procedure, and hence the usefulness of the heuristic used. The following heuristic is as good as any for use in a one-step proof outline which is to be refined.

Heuristic 2: If α is a well-formed formula of the propositional calculus, then α is a theorem.
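The recursion just described can be summarized in a short sketch. This is a hypothetical Common Lisp rendering, not the report's program (which is written in the modified LISP of Section 3); rule-step-p, replacement-candidates, install-steps, and retract-steps are assumed helpers standing for the net operations discussed in the implementation section:

    ;; Completely refine one step of a proof "outline".  Returns T on
    ;; success, NIL if no replacement of the step can be refined.
    (defun refineproof (step)
      (if (rule-step-p step)
          t                                    ; already a rule of inference
          (loop for steps in (replacement-candidates step)
                do (install-steps steps)       ; a more refined step or steps
                   (if (every #'refineproof steps)
                       (return t)              ; each new step fully refined
                       (retract-steps steps))  ; wrong track: undo, try again
                finally (return nil))))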

1.6.3.2 prove —and the Problem of Loose Ends. Consider a situation in which refineproof is faced with the problem of replacing a single step of a proof outline with two steps (i.e., of replacing one derivation node with two). For this task it calls on a program called prove. Remember that if the heuristics being employed are good ones, most of the nodes which prove needs in order to construct the new two-step derivation are nearby. Suppose the step to be replaced was a step which derived formula α. prove now tries to construct a new and more detailed proof outline from nearby nodes such that the formula derived is α. In doing this it builds backwards from α. That is, it tries to build the "second" step of the new two-step outline first. If it has selected a prospective rule and parameter values for the second step, it can tell easily (and recursively) what formulae need be attached to the antecedent node to make the derivation work. (These formulae may not yet be in the net, but prove can construct them.) prove constructs the required formulae and attaches them to the antecedent node. Now these formulae may not yet have been proved. In fact, as the net stands, they may not even be "semi-proved" in the sense discussed above. They are "loose ends"; they are formulae from which something has been derived but which have not themselves been derived from anything. prove then tries to construct a proof outline of each of these in turn. Whenever a formula at one of these "loose end" nodes is also present at a nearby node which is not a "loose end" node, the solution is trivial; prove simply merges the two nodes which contain the same formula, and the "loose end" disappears. This solution, however, will not eliminate all "loose ends".
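The trivial case, a loose end whose formula already stands at a nearby derived node, admits a minimal sketch; node-formula and merge-nodes are again assumed helpers for the net operations of Section 3:

    ;; Try to eliminate each loose end by merging it with some nearby
    ;; node, not itself a loose end, holding the same formula.
    ;; Returns NIL as soon as a loose end cannot be so eliminated.
    (defun tie-loose-ends (loose-ends nearby-nodes)
      (loop for end in loose-ends
            for twin = (find (node-formula end) nearby-nodes
                             :key #'node-formula :test #'equal)
            unless twin return nil
            do (merge-nodes end twin)
            finally (return t)))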

For example, one loose end which cannot be so eliminated contains the formula β, which is supposed eventually to be derived in the first step of the two-step outline that prove is constructing. prove tries to construct a derivation of β in the same way it constructed the derivation of α, thus producing a new set of "loose ends". If prove is lucky, these "loose ends" will all contain formulae present in nearby nodes which are not "loose ends". Then prove gets rid of the "loose ends" by merging nodes containing identical formulae, as we discussed above. At last there are no more "loose ends". Then and only then is control returned to refineproof, which tries to further refine each step of the new two-step outline.

Notice that no attempt is made to refine the second step before the first step has been constructed and all "loose ends" have been tied up. It is important that this not be done. The heuristic used in the second step is supposed to be testing whether or not the machine is on the right track. Remember that the formulae at the "loose ends" were manufactured specifically to allow the heuristic to work; thus the most important part of the test is deciding whether or not the "loose ends" can be tied up. If they cannot, then any "successful" job of refining step two will leave the machine with a derivation which, while it looks fine by itself, is a derivation that should be rejected since it does not fit into the overall proof. Furthermore, this derivation is a member of the class of derivations that the heuristic used in Step 2 was specifically designed to reject. If the "loose ends" cannot be tied up, prove must reject the Step 2 that it has constructed and try to construct a different one. Thus it is important that prove keep careful track of which derivations still contain "loose ends" and which do not. For this purpose it uses the tag-type flags which contain an H. Any node which is part of a

complete proof "outline" (with no "loose ends") in the net has an H flag attached to it. Other nodes do not. For example, no "loose end" has an H flag. Thus refineproof only attempts to refine steps in which all nodes have H flags. After each successful stage in the refining process (even if refineproof is ultimately on the wrong track) all pertinent formulae have H flags.

1.6.4 Saving Examples of Derivations Employing Heuristics. After a proof outline has been successfully refined to a legitimate proof, the outline and the partially refined intermediate steps in the refining process can be forgotten (i.e., eliminated from the net), but frequently it is better not to. For one thing, each of those derivation steps helps connect the heuristic used with the heuristics or rules of inference used in the more refined version of that derivation step. For example, in Figure 5, the derivation in solid lines helps make a closer connection, via Theorem 5, between Heuristic 1 and Rule 1. Thus the successful partnership of Heuristic 1 and Rule 1 is re-enforced by remembering a case in which the partnership was useful.

There is another reason for saving previously successful proof "outline" steps: they give information about the exact way in which the heuristic used is related to the rules which replaced it when the "outline" was refined. To see how useful this information can be, consider the example shown in Figure 5. This example shows some details about a common use of Heuristic 1 (in fact, practically the only use of this heuristic). A single-step derivation using Heuristic 1 is usually replaced by a single-step derivation of the same theorem, this one using Rule 1. Usually the Heuristic 1 derivation and the Rule 1 derivation are related as follows:

(A) The parameter value for the parameter β is the same in the two derivations.
(B) The parameter value node for γ in the Heuristic 1 derivation is one of the two formula nodes attached to the antecedent node in the Rule 1 derivation.
(C) The other formula node (call it ι) attached to the antecedent node of the Rule 1 derivation is also the parameter value node for the parameter α in the Rule 1 derivation. This formula node is, in general, not part of the Heuristic 1 derivation.

Given any Heuristic 1 derivation, and with the above facts at its disposal, prove can almost construct directly the Rule 1 derivation to replace it. The only thing needed is the formula node ι, for which prove must search. prove examines the appropriate formula nodes in turn, testing each one to see whether or not it would make a legitimate ι. When it has found one that would (Theorem 2 in the Figure 5 example) it constructs the appropriate derivation. The test of a prospective ι will be efficient and direct if use is made of facts (A), (B), and (C) above. If prove does not have facts (A), (B), and (C) at its disposal, the testing of a prospective ι will be much more tedious. For each prospective ι, the ι will have to be combined in various ways with the formula nodes in the Heuristic 1 derivation. Without facts (A), (B), and (C), prove would have to try, in the example of Figure 5, ridiculous combinations such as the ones shown in Figure 6,

as well as the correct combination, which is shown in Figure 7.

Figure 7.

If prove has possession of facts (A), (B), and (C), it can limit the combinations tested to the one shown in Figure 7. How does prove obtain these facts? By reference to previously proved problems in which a Heuristic 1 derivation was replaced. If proof outlines and partially refined intermediate steps of previous problems are retained in the net, then attached to the Heuristic 1 node are various derivation nodes of Heuristic 1 derivations in previous problems. prove selects one of these. Suppose, from some previous problem's proof outline, it selects a Heuristic 1 derivation of a formula η. Now when that proof outline was refined, the Heuristic 1 derivation of η was replaced (though the Heuristic 1 derivation was retained in the net) with a Rule 1 derivation of η, and this derivation is still in the net. Thus η is surrounded by exactly the same structure that Theorem 5 is surrounded by in Figure 5. In doing a new problem, prove simply tries to mimic the structure of this previous model problem. Thus it automatically restricts its attention to derivations of the form of Figure 7, because that is the form of the derivation it is mimicking; that is what worked before. Model problems successfully mimicked are rewarded so that they become more likely to be used next time. A large part of the machine's adaptation takes place in this way.
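The choice of which model problem to mimic is one more place where the value-type flags do the work. A minimal hypothetical sketch: given the derivation nodes attached to the heuristic, and a function giving each one's worth, pick the most rewarded model:

    ;; Pick the most rewarded previous derivation employing this
    ;; heuristic; prove then mimics the structure surrounding it.
    (defun select-model (past-derivations worth-fn)
      (first (sort (copy-list past-derivations) #'> :key worth-fn)))

Since models successfully mimicked are rewarded, this simple selection rule is enough to make successful models increasingly likely to be chosen again.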

Of course such mimicking does not always work, as, for example, when a heuristic is being tried for the first time. In this case we resort to simply checking which nodes are near the heuristic, and trying to combine them as discussed earlier. In any case, it is important that the rule or rules with which the heuristic is to be replaced be close to the heuristic in the net (since this is how candidates for replacement are selected). When the mimicking technique works, the very lines which connect the heuristic with the rule or rules (thus causing them to be close together) give the relationship between the structure of the heuristic derivations and the structure of their replacements. The behavior of prove will be discussed more explicitly in the implementation section.

Thus the net contains not only completed proofs, but also successful partly refined proof outlines. The role of the heuristics in achieving the solution is thus preserved.

1.7 Generation of Heuristics

In Section 1.3 we mentioned that once the machine proves a theorem which relates the structure of LISP programs to their performance, the machine can often use the theorem to help generate a new heuristic directly from a rule of inference. The generation of a new heuristic is, in form, much like the derivation of a new theorem. In each case, a derivation node is added to the memory net, and from it an arrow points to a new formula node containing the new formula or the new heuristic (written in the object language of S), as the case may be. For example, suppose the machine were to generate Heuristic 1 directly from Rule 1. The record of the generation would look (in part) as shown in Figure 8. (In Section 3.8 we shall discuss the nodes not shown here.)

Figure 8. (Node labels: Rule 1; Theorem relating structure to performance; Hrstc. 1.)

This record looks just like a step in a proof outline, and, in fact, as we shall point out in Section 3.8, that is what it is. (We need the formalism of Sections 2.1 to 3.7 to see why it is genuinely a step in a proof outline.)
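Professor Holland's suggestion, quoted in the Preface, that heuristics may be regarded as "rules of inference with some conditions missing" can be seen directly by placing the earlier rule1 and heuristic1 sketches side by side: deleting one conjunct from the rule yields the heuristic. The juxtaposition below is purely illustrative; in the machine such a deletion is licensed by a proved theorem relating structure to performance, not performed by hand:

    ;; Rule 1, with the minor-premise condition marked:
    (defun rule1 (major minor)
      (if (and (consp major)
               (eq (first major) 'implies)
               (equal (second major) minor))  ; <- the condition dropped
          (third major)))

    ;; Heuristic 1 is Rule 1 with that condition (and the argument it
    ;; mentions) removed:
    (defun heuristic1 (major)
      (if (and (consp major)
               (eq (first major) 'implies))
          (third major)))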

Now suppose Heuristic 1 is employed in a step of a proof outline, as it was in the solid line construction of Theorem 5. The effector wishes to replace the derivation employing Heuristic 1 with a derivation employing a rule, a rule which is nearby. What rule is nearby? Since Heuristic 1 has just been generated and has never been used before, it is not connected to any rule by means of a sample problem which has been saved in the manner discussed in Section 1.6.4. Its only close connection to a rule is its connection to Rule 1 via the derivation shown in Figure 8. This connection is what was indicated cursorily by the dotted line in Figure 5; it is the connection that was, in that example, so important to the effector in deciding which rule to select in trying to replace the heuristic derivation.

The situation we have described is rather typical. When faced with the problem of replacing a derivation employing a brand new heuristic, the effector, not having any previous sample problem to look at, will look at the nearby rule, and this will be the rule from which the heuristic was generated. If our scheme for generating heuristics from rules is at all reasonable, this is just where the effector should look.

1.8 Questions Beyond the Scope of the Paper

The implementation section will show the existence of at least one interesting adaptive theorem proving machine which treats heuristics in a general way. Thus it will show that the class of such interesting machines is non-empty. Eventually one would like to discuss the performance of members of the class in various problem environments (i.e., various sequences of problems presented, and various rewards given the machine for correct answers; see Section 1.4). One would compare the performance of various machines in representative environments. From this would arise a meaningful classification of environments based on the behavior of various machines in those environments. For any two machines, there will be environments in which the first adapts more rapidly than the second, and others in which the second adapts more rapidly than the first. Any measure of adaptability is, then, environment dependent, and any meaningful classification of environments must depend on the adaptability of various machines when faced with them. Such a theory would allow us to compare machines with each other. The value of such a theory would be that it would in fact be a general theory of heuristics. It would allow us to compare one machine with another on the basis of heuristic generation methods (structure) and adaptability in various environments (performance).

The question as to whether any machines in the class are interesting is, of course, a subjective one. An answer would require establishing the adaptability of a member of the class not in the various categories of environments we discussed above, but in a large number of sample environments chosen for their intuitive interest (a characteristic not considered in the above method of classifying environments). This would have to wait for the actual construction of the machine which we claim to be interesting.

We have taken a small step in this direction. We have shown in the implementation section that there is a member of our class of machines which performs as well as Newell, Shaw, and Simon's machine in the environment used by Newell, Shaw, and Simon. (This is generally recognized to be an intuitively interesting sample environment.) We have further shown that it is easy to incorporate into this machine, in a very general way, the improvements which naturally come to mind. (Many were suggested, but not all tested, by Newell, Shaw, and Simon.) In most cases, these improvements are implicitly in our machine already, since their incorporation means not the construction of a new part of the machine, but the addition of a few heuristics to the initial memory net.

2. THE FORMAL SYSTEM

2.1 Basic Structure of the System

2.1.1 Overview of System

2.1.1.1 Plan of Discussion. We now begin the description of the formal system which is the basis of the adaptive theorem prover. The formal system has the following parts:

A. A set of symbols called the alphabet.
B. A set of formation conventions by which the members of the alphabet may be combined to form expressions of various kinds, one kind being called formulae.
C. A finite collection of formulae called axioms.
D. A collection of rules of inference which, when applied successively to the axioms, generate an infinite set of formulae called the set of theorems.
E. An infinite set of expressions called the set of well-formed expressions. This class is recursive relative to the set of theorems, and includes the set of theorems.
F. An intended interpretation for the well-formed expressions such that the theorems become true statements. (When we say that a formula holds, we will mean that it is true in the intended interpretation.)

These parts will all be described formally in later sections. We begin here with an informal description. We can think of the system as being obtained by a series of transformations on a finitely axiomatized first order formal arithmetic. Each transformation preserves consistency, so that our system will be just as consistent as the original arithmetic. The formal arithmetic will be one with propositional variables, individual variables, predicate variables and individual function variables, as well as the binary predicate constant = interpreted as equality, the individual constant 0 interpreted as zero, the

unary individual function constant S interpreted as the successor function, the binary individual function constant + interpreted as addition, and the binary individual function constant · interpreted as multiplication. We shall first discuss briefly the formal arithmetic and then discuss each transformation in turn. Each transformation is characterized by an addition or subtraction of symbols from the alphabet, together with concomitant changes in formation conventions, axioms, rules, and the set of well-formed expressions, to allow the new symbols to be incorporated into the language and to permit the intended interpretation we wish them to have. In this Section (2.1.1) we shall discuss each transformation in turn by discussing the symbols added or subtracted and their intended interpretation. We will indicate from time to time what the concomitant changes are in the formation conventions, axioms, rules, or set of well-formed expressions. We will not, however, give these concomitant changes in detail. We shall reserve such formal discussion for later sections, where we shall present our formal system explicitly. The discussion in Section 2.1.1 should give the reader an overview of our system. The overview will provide a certain motivation for the notation to be described in the later formal discussion. The notation used there might be rather confusing otherwise. As we mentioned, each transformation of the formal arithmetic is characterized by the addition or subtraction of certain symbols from the alphabet. The first transformation is characterized by addition of ι. The

second transformation is characterized by addition of λ, etc., as indicated below.

Transformation      Characterization
transformation 1    add ι
transformation 2    add λ
transformation 3    add label, cond, and pcond
transformation 4    subtract + and ·
transformation 5    add * and nil
transformation 6    add Pv, Iv, Pfvb, Ifvb, newpv, newiv, newpfvb, and newifvb;
                    subtract tv, I, Pf, and If
transformation 7    subtract S and 0 and add qu

When we have finished the last transformation we will have arrived at our system. The language of our system is explicitly self-referential, rather than being explicitly about numbers and only self-referential via a Gödel numbering as is the formal arithmetic.

2.1.1.2 Pertinent Properties of the Formal Arithmetic. We begin with a formal arithmetic with plus and times. Since we will be changing the axioms anyway, it is not crucial just which axiomatization we begin with, as long as the axiom set is finite. In order for the axiom set to be finite, we employ predicate variables and individual function variables, with rules of substitution for these variables. For example, we can use as logical axioms and rules of inference the axiomatization of Church's system F21 [pp. 218-219, Church, 1956], with suitable modification of the alphabet, formation conventions, and rule of inference *404n to include individual function variables and the various constants. (We regard *404n as a single rule.)

The additional axioms to take care of the various constants could be, following Mendelson [Mendelson, 1964]:

for equality (using infix notation):
x = x
x = y ⊃ (P(x) ⊃ P(y))   (P is a predicate variable)

for successor, the Peano axioms:
~(0 = S(x))
S(x) = S(y) ⊃ x = y
P(0) ⊃ ((∀(x)(P(x) ⊃ P(S(x)))) ⊃ ∀(x)P(x))   (where our ∀(x) is Church's (x))

for addition (using infix notation):
x + 0 = x
x + S(y) = S(x + y)

for multiplication (using infix notation):
x · 0 = 0
x · S(y) = x + (x · y)

As we mentioned, our system will use both individual function variables and predicate variables, as well as propositional variables and individual variables. The propositional variables are: p, q, r, s, p₁, q₁, r₁, s₁, p₂, .... The individual variables are: x, y, z, u, v, w, x₁, y₁, z₁, u₁, .... Each of these letters, together with its numeral subscript, is regarded as a single symbol of our alphabet. The alphabet is thus infinite. In the case of predicate variables, we shall use the subscript I to indicate the number of arguments. The I is a separate symbol of our alphabet. Unary predicate variables: PI, QI, RI, P₁I, Q₁I, R₁I, P₂I, .... Binary predicate variables: PII, QII, RII, P₁II, Q₁II, R₁II, .... Ternary predicate variables: PIII, QIII, RIII, P₁III, etc.

Thus the ternary predicate variable PIII is made up of four symbols: the so-called base symbol P, to which have been added three subscript symbols, I's. These predicate variables, then, are formed by adding subscript I's to the predicate variable bases: P, Q, R, P₁, Q₁, R₁, P₂, .... Each predicate variable base is a symbol of the alphabet. In this discussion we shall sometimes omit the subscript I's when the predicate variable occurs within a formula and the number of its arguments is clear from context. Thus, in the above axioms we have written P in place of PI. This is only an abbreviation for purposes of brevity. For example, the second axiom of equality is really x = y ⊃ (PI(x) ⊃ PI(y)), in spite of the fact that we have written, and shall continue to write, x = y ⊃ (P(x) ⊃ P(y)). The notation for individual function variables is similar to the notation for predicate variables. The individual function variable bases are f, g, h, f₁, g₁, h₁, f₂, .... To these we add the subscript I's to indicate arguments. For example, fIII is a ternary individual function variable. Again, when the arguments are clear from context we shall sometimes omit the I's. The predicates differ from individual functions in that their range is the class of truth values rather than the class of individuals (in this case, numbers). In both the predicates and individual functions discussed above, the domain for each argument was the class of individuals. We can naturally extend the notion of predicate and individual function to allow a domain to be the class of truth values, or even the class of individual functions or predicates. Thus ⊃ names a binary predicate, each of whose arguments is to be a truth value. Ptv,tv is a binary predicate variable ranging over such predicates. The use of the subscript tv (a separate

symbol of the language) in place of I indicates the truth-value nature of the arguments. Thus we use I to indicate individual-type arguments and tv to indicate truth value-type arguments. We also use Pf to indicate predicate-type arguments and If to indicate individual function-type arguments. Consider, for example, an individual function like the LISP function maplist. This is a function of two arguments. The second argument is to be an individual (in this case a so-called S-expression). The first argument, however, is to be an individual function of one argument. More specifically, it is to be an individual function whose single argument is to be an individual. fIfI,I is an individual function variable ranging over individual functions of the maplist type. The second subscript is I, indicating that the second argument is to be an individual. The first subscript is If, indicating that the first argument is to be an individual function. Since the single argument of this function is to be an individual, the subscript If is itself subscripted with a single I. This subscripting of subscripts may be repeated any number of times, so that the subscripting on variables may become rather complicated. However, the substitution rules for these variables are quite straightforward and are rather obvious extensions of Church's rule *404n; hence we shall not give the substitution rules here, but wait until Section 2.1.4.4, when we can give them in our final notation. As we said above, we shall often abbreviate predicate variables and individual function variables by writing the predicate variable base or individual function variable base without subscripts whenever the needed arguments are clear from the context. For example, we can abbreviate ftv,IfPfI,I(Ptv,I(p,x), gPfI,I) as f(P(p,x), g).
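For readers unfamiliar with LISP 1.5, a concrete picture of maplist may help. The following sketch is in Python rather than in our notation, and the rendering (Python lists for S-expressions, Python functions for individual functions) is ours, not part of the system:

    def maplist(f, x):
        # Apply f to successive tails of x, as LISP 1.5's maplist does.
        # f is an individual function of one argument; x is a list.
        if not x:                      # the nil case
            return []
        return [f(x)] + maplist(f, x[1:])

    # maplist(len, [10, 20, 30]) => [3, 2, 1]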

Another useful abbreviation we shall employ is our abbreviation for the so-called numerals. A numeral is any one of the following sequence of expressions: 0, S(0), S(S(0)), S(S(S(0))), ... etc. In our discussion, we shall often abbreviate 0 as 0̄, S(0) as 1̄, S(S(0)) as 2̄, S(S(S(0))) as 3̄, etc. If k is a natural number, we call k̄ a numeral. The k̄'s are not new symbols of our language, merely abbreviations we will use in this discussion. This abbreviation makes it easy to state our representability property. If A is a formula with free individual variables x₀, ..., xₙ, then let A[k̄₀, ..., k̄ₙ] stand for the formula derived from A by uniform replacement of k̄₀, ..., k̄ₙ for the free occurrences of x₀, ..., xₙ. We say an (n+1)-ary relation on integers is weakly representable whenever there is a formula A such that, for any natural numbers k₀, ..., kₙ, the relation holds on k₀, ..., kₙ if and only if A[k̄₀, ..., k̄ₙ] is provable. In formal arithmetic with plus and times, every recursively enumerable relation is weakly representable. [Mendelson, 1964]

2.1.1.3 Addition of ι (representability of individual functions). The graph of every n-ary partial recursive individual function is an (n+1)-ary recursively enumerable relation, and is hence weakly representable. If the partial recursive individual function is a polynomial function, then its graph is weakly representable by a formula of form x₀ = α, where x₀ is the individual variable for which is to be substituted the name of the element of the function's range, and α is a term not containing a free

x₀. When this is the case, I shall say that the term α mimics the individual function. For each partial recursive individual function, we would like to find a term which mimics the function. In formal arithmetic with plus and times, this can be done only for polynomial functions. Let us add to the system of formal arithmetic the new variable binder ι. If A is a formula with free variables x₀, ..., xₙ, then ι(x₀)A is a term which means, intuitively: the unique value for x₀ such that A holds, when such a value exists. When none exists, the term will have an undefined value. Suppose we add axioms to the system to implement this meaning. Now given any n-ary partial recursive function, its graph is weakly representable by some formula A. Hence the formula x₀ = ι(x₀)A is true for values satisfying the graph. Whether this second formula weakly represents the graph depends on whether the formula is provable in the cases mentioned above where it is true. Proving the second formula for values satisfying the graph is harder than simply proving A for these values, because we now have to prove the uniqueness of the x₀ for each set of x₁, ..., xₙ in the domain of the function. Because of the incompleteness of formal arithmetic, then, there will be formulae A such that: A weakly represents the graph of a partial recursive function φ, but the formula x₀ = ι(x₀)A does not weakly represent the graph of φ. The following, however, is true: given a partial recursive function φ, there is a formula A such that both A and x₀ = ι(x₀)A

weakly represent the graph of φ. (In most practical cases, if A weakly represents the graph of a partial recursive function φ, so does x₀ = ι(x₀)A.) Thus we can mimic any partial recursive function we want to.

2.1.1.4 Addition of λ (ordinary-names of functions and relations). So far our system contains names for only a few individual functions (e.g., plus and times). We shall introduce names for all the partial recursive functions. We shall do this via the Church λ notation. If β is a term which mimics an individual function, then the expression (λ, (x₁, ..., xₙ), β) is a name for the function. (Similarly, if A is a formula which weakly represents a recursively enumerable relation, then the expression (λ, (x₀, ..., xₙ), A) is a name for that relation.) We shall add axioms to the system to implement this meaning in the proof structure. Since for any partial recursive function there is a term which mimics it (and for any recursively enumerable relation there is a formula which weakly represents it), we have, in the above way, provided an expression naming each partial recursive function and recursively enumerable relation. We shall call such expressions ordinary-names of the functions and relations.

2.1.1.5 Addition of label, cond, and pcond (algorithmic-names of functions and relations). Our interest in such functions and relations stems from the fact that they possess algorithmic evaluation procedures. When we can find such an algorithm we will write an expression which describes the algorithm. This expression will be in a LISP-like notation (my LISP is like LISP 1.5 [McCarthy, 1962], except that

I have made some minor changes in notation and one significant change in the evaluation procedure) and will be called an algorithmic-name for the function. The structure of such an expression reflects the particular algorithm it describes. The notation of our system is so close to LISP notation that any polynomial function has an ordinary-name which is also an algorithmic-name. The only added symbols needed to complete the LISP repertoire for writing functions defined over non-negative integers are the symbols label, cond, and pcond. (pcond is an alternative spelling for cond. It is used in certain cases described in a later section.) label is the symbol which is used to indicate explicit definition by recursion. cond and pcond are especially useful in such definitions. Our usage is almost identical to LISP usage. We will not describe the usage in detail here, but the reader unfamiliar with LISP usage may benefit from the following example. The expression (cond, (α₁,β₁), (α₂,β₂), ..., (αₙ,βₙ)), which we abbreviate to [α₁ → β₁; α₂ → β₂; ...; αₙ → βₙ], has the same value (for a particular set of values for its variables) as does βⱼ, where j is the smallest number such that αⱼ has value true. We use label, combined with cond, to write an algorithmic-name such as this one for the primitive recursive function factorial: (label, f, (λ, (x), [x = 0̄ → 1̄; → x · f(x − 1̄)])). For purposes of illustration we have used here a subtraction symbol which we have not yet defined. Note that this expression reflects the usual primitive recursive definition of factorial. The dummy function variable f which follows the label is to be regarded, inside the definition, as naming the function we are trying to define. One can use label in recursions which are not primitive recursions.
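To make the label and cond machinery vivid, here is the factorial name above rendered as a minimal Python sketch. The rendering is ours and only an analogy: Python resolves the recursive reference through the function's own name, which is exactly the role the dummy variable f plays after label:

    def factorial(x):
        # (label, f, (lambda (x) [x = 0 -> 1; -> x * f(x - 1)]))
        # The value is that of the first clause whose test holds.
        if x == 0:                     # clause x = 0 -> 1
            return 1
        return x * factorial(x - 1)   # final (default) clause

    # factorial(5) => 120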

We need to do more than merely apply algorithmic-names, LISP fashion. We need to put rules in the system which allow us to process and modify these algorithmic-names according to various rules of inference, so that we can compare them with each other and with ordinary-names. An example of such modification would be: substitution of argument terms into the matrix of a λ expression according to one of the rules of inference (analogous to a stage of LISP application of a function). An example of comparison would be: if φ is an ordinary-name and ψ is an algorithmic-name, then we will want to prove φ(x) = ψ(x) if we can, so that we can then substitute ψ for φ in theorems, thus turning these theorems into statements about the performance of the LISP program ψ. Unfortunately, if a function or relation is not a total function or relation, it may have an algorithmic-name which, when processed in the ways we want to process algorithmic-names, would produce a contradictory statement and ruin the consistency of the system. For example, consider the function name (label, f, (λ, (x), (f(x) + 1̄))), which we abbreviate as ψ. Then ψ(y) (by the recursive definition of ψ) evaluates to ψ(y) + 1̄. We can even prove ψ(y) = ψ(y) + 1̄ from ψ(y) = ψ(y) by the process of partial evaluation of the right hand side. Hence ψ(y) ≠ ψ(y) is provable and the consistency is ruined. We want to declare such "contradictory" algorithmic-names illegal. It turns out, luckily, that we have no need for these contradictory algorithmic-names, because every partial recursive function has at least one non-contradictory algorithmic-name.
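The function named above is defined nowhere: any attempt to evaluate it descends forever, which is why treating the name as if it denoted a total function wrecks consistency. A small Python sketch (ours) of the same name makes the divergence visible:

    import sys

    def f(x):
        # (label, f, (lambda (x) (f(x) + 1))): the body re-invokes f on
        # the same argument, so evaluation never terminates.
        return f(x) + 1

    sys.setrecursionlimit(100)         # fail quickly instead of hanging
    try:
        f(0)
    except RecursionError:
        print("f(0) never halts; a system treating f as total "
              "could prove f(0) = f(0) + 1, hence 0 = 1")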

How do we set up the formalism so the contradictory names are never used? All contradictory names contain label, but beyond this they have no easily recognizable structural characteristic. We don't want to limit ourselves to only complete functions. For example, we can easily allow all ordinary-names to be used. They are all non-contradictory. Here is what we do. We carefully control which label-containing names are used. We start with a finite set of such names called the initial set of algorithmic names. (These will be the names occurring in the axioms.) We then write the rules of inference so that no new label is introduced unless either: (1) it is part of a name which has appeared previously in a theorem; or (2) it is part of a new name which has been generated from an already occurring name according to a special procedure which guarantees that the new name will be non-contradictory. This is an oversimplified description of a scheme which will be described in detail in a later section. When a theorem is proved which contains a function name not occurring in any previous theorem (whether or not the proof employs (2) above to introduce a new label), then we say that that function name is generated.

A function name which can be generated is called generatable. (The initial set of algorithmic names is thought of as already generated, and hence trivially generatable.) We have now arrived at the following: given any n-ary partial recursive function with ordinary-name φ, we can write a LISP program ψ which calculates its value. ψ is then an algorithmic-name of the function. We can even write the LISP program such that ψ is a non-contradictory algorithmic-name. Then ∀(x₁) ... ∀(xₙ) (φ(x₁, ..., xₙ) = ψ(x₁, ..., xₙ)) is true, but ψ might not be generatable, and even if it is, the above formula might not be provable.

Footnote: Sections 4.10.6 and 4.10.7 will illustrate situations where ψ is generatable, but not generatable from φ, and where ∀(x₁) ... ∀(xₙ) (φ(x₁, ..., xₙ) = ψ(x₁, ..., xₙ)) is probably not provable.

However, by picking the proper axiom set we can ensure that every partial recursive function has a generatable algorithmic-name.

Footnote: We need only be sure that the class of functions which have generatable algorithmic-names is closed under primitive recursion and the μ operator. Section 4.10.7 will indicate how to show closure under the μ operator.

Similarly, given any n-ary recursively enumerable relation with ordinary-name φ, we can write a LISP program ψ such that ψ is a non-contradictory algorithmic-name and ∀(x₁) ... ∀(xₙ) (φ(x₁, ..., xₙ) ≡ ψ(x₁, ..., xₙ)) is true.
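The μ operator mentioned in the footnote is ordinary unbounded minimization: the least argument at which a given predicate first holds. A minimal Python sketch (ours), partial exactly where the footnote warns it must be:

    def mu(p):
        # Least y >= 0 with p(y) true; loops forever if no such y exists.
        y = 0
        while not p(y):
            y += 1
        return y

    # Example: the integer square root of 10, by minimization.
    # mu(lambda y: (y + 1) * (y + 1) > 10) => 3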

Again, ψ might not be generatable, and even if it is, the above formula might not be provable. However, by picking the proper axiom set we can ensure that every recursively enumerable relation has a generatable algorithmic-name. When the above equivalence statements have been proved, we can use the substitutivity of = and ≡ (subject to the proper restrictions made explicit in our rules of inference) to change theorems using ordinary-names into theorems using algorithmic-names. (Theorems using algorithmic-names can often be proved more easily without use of the above equivalence statements.) Thus we can prove a class of true statements about LISP programs. Of course, we cannot prove all true statements, because the formal arithmetic we began with was incomplete, and the incompleteness is retained through each of our transformations of the system. (We shall prove the incompleteness of the final system.) This incompleteness arises from the self-describing capability of all these systems, a capability of which we shall make explicit use in our final system.

2.1.1.6 Elimination of + and ·. Note that it is possible for the names in the initial set of algorithmic names to be names of functions which have no ordinary-names. For the system as described so far, the initial set of algorithmic names is empty. Let us write algorithmic-names for plus and times such that the symbols + and · are not used in the names. (The symbol S is used. The names are simply the LISP programs implementing the primitive recursive definitions of plus and times in terms of successor!) Let us abbreviate these two names as ⊕ and ⊙. (These are not new symbols, just abbreviations we will use here to represent the algorithmic names.) These names are non-contradictory since they are names of complete functions.
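The names abbreviated ⊕ and ⊙ are nothing more than the familiar primitive recursions. A Python sketch of the same two definitions (ours; succ stands in for S):

    def succ(x):
        return x + 1

    def plus(x, y):
        # x + 0 = x;  x + S(y) = S(x + y)
        return x if y == 0 else succ(plus(x, y - 1))

    def times(x, y):
        # x * 0 = 0;  x * S(y) = x + (x * y)
        return 0 if y == 0 else plus(x, times(x, y - 1))

    # plus(3, 4) => 7;  times(3, 4) => 12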

Then let us put ⊕ and ⊙ in the initial set of algorithmic names. (This could be done by adding the axiom ∃(x)(x = ((y ⊙ z) ⊕ w)). In this section (2.1.1) we will mention many algorithmic names which we want in the initial set. We shall wait until we have a long list of such names before we discuss which axioms are to be added to include the names in the initial set.) It is easy to see, from the meanings of λ and label discussed above and expressed explicitly by our rules of inference, that if ⊕ and ⊙ are in the set of initial algorithmic names, then for each axiom A using + and ·, and hence for each theorem A using + and ·, there is a theorem A°, formed from A by uniform replacement of ⊕ for + and ⊙ for ·. Furthermore, none of the formulae in the proof of A° contain + or ·. Conversely, for any theorem A° using ⊕ and ⊙, the formula A is also a theorem. Let us eliminate the symbols + and · from the language, and all the formulae which use them. We thus remove the axioms which defined + and ·. Now any relation which was weakly representable by a formula using + and · is still weakly representable by a new formula obtained from the old by replacement of + by ⊕ and · by ⊙. Thus any function or relation which had a name before still has a name. We have eliminated + and · from the language without destroying the representability of any function or relation. The only individual function constant symbol remaining in the language is S.

2.1.1.7 An Operation on S-expressions: Addition of * and nil. Let us consider a Gödel numbering of the expressions in our language. We first state what we mean by an S-expression (as in LISP): (a) each symbol (or atom) in our alphabet is an S-expression; (b) if α and β are S-expressions, so is (α.β), which is called the "cons" of α and β. If γ is the cons of α and β, then α is called the "car" of γ and β is called the "cdr" of γ. (Note: in this scheme, the parentheses and dot are

not called members of the alphabet, but are introduced into S-expressions as part of the process of concatenating members of the alphabet.) By writing our syntax rules carefully (see Section 2.1.2.2), we can arrange matters so that all the expressions in our language are S-expressions, though, of course, not all S-expressions are expressions of our language. (Then the expressions we are writing in this text may be regarded as abbreviations of the appropriate S-expressions. The abbreviation conventions are specified in Section 2.1.2.) In order to make it easier to write our syntax rules, we add the symbol nil to our alphabet. This symbol will be used as a sort of delimiter in constructing the expressions of our language. Its precise use is explained in Section 2.1.2.2. Counting variables, our alphabet is countably infinite. Let us order the alphabet (excepting nil) in some appropriate order, and let a symbol's index be its number in this ordering. Now let us Gödel number the S-expressions. To be specific, we could do it as follows. Suppose γ is an S-expression: if γ is nil, then its Gödel number is zero; if γ is any other atom with index n, then its Gödel number is 2n − 1; if γ is the cons of α and β, and if n is the Gödel number of α and m is the Gödel number of β, then the Gödel number of γ is (2n + 1)·2^(m+1). The important point is that for each number n, there is an S-expression with Gödel number n. Let us write the abbreviation ᾱ for n̄ whenever n is the Gödel number of α. Now, via an algorithmic name for the exponential function, we can easily write an algorithmic name (which we abbreviate as ⊛) for the function cons. That is, if α has Gödel number k, β has Gödel number j, and the cons of α and β has Gödel number n, then n̄ = (k̄ ⊛ j̄) is provable. Or, equivalently, γ̄ = (ᾱ ⊛ β̄) is provable, where γ is the cons of α and β. Let us put ⊛ in the initial

set of algorithmic names. Now let us add the binary individual function symbol * to the language. It is defined by the new axiom x * y = x ⊛ y. * is thus a symbol for the function cons, in the same sense that ⊛ is an algorithmic name for the function cons. It is trivial now to write ordinary-names (using ι) for the functions car and cdr. (car and cdr of an atom will be the atom itself.) We will abbreviate these names as a and d respectively. Thus whenever, as before, n̄ = (k̄ * j̄) is a theorem, so are k̄ = a(n̄) and j̄ = d(n̄). The names a and d contain the symbol *, but do not contain the symbol S.

2.1.1.8 Addition of Pv, Iv, Pfvb, Ifvb, newpv, newiv, newpfvb, and newifvb; subtraction of tv, I, Pf, and If. We would next like to eliminate S from the language. We eliminated + and · by re-defining them in terms of S. Now that we have introduced *, we would like to eliminate S by re-defining it in terms of *. Unfortunately, we are foiled by the fact that our alphabet is infinite, because of the four infinite sets of variables and variable bases included in it. As we mentioned in Section 2.1.1.2, these sets are:

1. The propositional variables {p, q, r, s, p₁, q₁, ...}
2. The individual variables {x, y, z, u, v, w, x₁, y₁, ...}
3. The predicate function variable bases {P, Q, R, P₁, Q₁, ...}
4. The individual function variable bases {f, g, h, f₁, g₁, ...}

If we eliminate S we will have no way of handling these sets. In order to handle these infinite sets and the Gödel numbers of their members, we need predicates Pv, Iv, Pfvb, and Ifvb, true respectively on Gödel numbers of members of the above four sets.
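Before continuing, the Gödel numbering of Section 2.1.1.7 is easy to make concrete. The Python sketch below (ours; the atom indexing is an assumed sample) follows the scheme exactly: nil receives 0, an atom of index n receives 2n - 1, and a cons receives (2n + 1)·2^(m+1):

    # Atoms are modelled as Python strings; a cons as a pair (car, cdr).
    index = {"p": 1, "q": 2, "x": 3}   # assumed sample indexing of atoms

    def gn(s):
        if s == "nil":
            return 0
        if isinstance(s, str):                 # any other atom
            return 2 * index[s] - 1
        n, m = gn(s[0]), gn(s[1])              # numbers of car and cdr
        return (2 * n + 1) * 2 ** (m + 1)

    # gn(("p", "nil")) => 6, and every natural number is the Gödel
    # number of exactly one S-expression.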

We also need individual functions newpv, newiv, newpfvb, and newifvb, with the following properties. For any S-expressions α and β: if newpv(α) = β holds, then β is a propositional variable not in α; if newiv(α) = β holds, then β is an individual variable not in α; if newpfvb(α) = β holds, then β is a predicate function variable base not in α; if newifvb(α) = β holds, then β is an individual function variable base not in α. We add these predicates and functions in the same way we added *. We first define algorithmic names for the four predicates and four individual functions. We then introduce eight new symbols to be equivalent to these. We proceed as follows. It is easy to use S, *, and ι to define the following algorithmic names.

Abbreviation for Algorithmic Name    Predicate Named
Pv      true on Gödel numbers of propositional variables
Iv      true on Gödel numbers of individual variables
Pfvb    true on Gödel numbers of predicate function variable bases
Ifvb    true on Gödel numbers of individual function variable bases

Similarly, we can define algorithmic names abbreviated newpv, newiv, newpfvb, and newifvb. The individual functions they name can be described as follows. (Remember that we have an ordering of the variables by their indices.)

newpv(ᾱ) = β̄ holds iff β is the first propositional variable of index > those in the S-expression α.
newiv(ᾱ) = β̄ holds iff β is the first individual variable of index > those in the S-expression α.
newpfvb(ᾱ) = β̄ holds iff β is the first predicate function variable base of index > those in the S-expression α.
newifvb(ᾱ) = β̄ holds iff β is the first individual function variable base of index > those in the S-expression α.
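In modern terms the newpv family is a "gensym": find the largest index of the appropriate sort occurring in the argument and step past it. A Python sketch of newpv alone (ours; propositional variables are modelled as the strings p1, p2, ...; the footnote below gives the system's own definition):

    def newpv(s):
        # First propositional variable of index greater than any in s,
        # where s is an atom (string) or a pair (car, cdr).
        def max_index(e):
            if isinstance(e, str):
                return int(e[1:]) if e[:1] == "p" and e[1:].isdigit() else 0
            return max(max_index(e[0]), max_index(e[1]))

        return "p%d" % (max_index(s) + 1)

    # newpv((("p1", "x"), "p3")) => "p4"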

Footnote: An example of how this can be done is seen below. We use here notation and abbreviations which are explained in Section 2.1.2.

maxx(x, u, y, v, f) =: [u = x → y; → f(x, S(u), y, v)]
maxy(x, u, y, v) =: [v = y → x; → maxx(x, u, y, S(v), maxy)]
(maxy causes no problems even though it is not complete.)
max(x, y) =: maxx(x, 0, y, 0, maxy)
evenn(x, y) =: [x = y → 0 = 0; x = S(y) → ~(0 = 0); → evenn(x, S(S(y)))]
(similarly this causes no trouble)
even(x) =: evenn(x, 0)
maxpv(x) =: [x = 0 → 0; Pv(x) → x; ~even(x) → 0; → max(maxpv(a(x)), maxpv(d(x)))]
nextpv(x) =: [Pv(x) → x; → nextpv(S(x))]
newpv(x) =: nextpv(S(maxpv(x)))

As we introduced the new symbol * by adding the axiom x * y = x ⊛ y, we now introduce eight new symbols by adding the following eight axioms. (On the left of each axiom stands the new symbol; on the right stands the algorithmic-name abbreviation defined above.)

Pv(x) ≡ Pv(x)
Iv(x) ≡ Iv(x)
Pfvb(x) ≡ Pfvb(x)
Ifvb(x) ≡ Ifvb(x)
newpv(x) = newpv(x)
newiv(x) = newiv(x)
newpfvb(x) = newpfvb(x)
newifvb(x) = newifvb(x)

We shall call these symbols the eight special function symbols, and these axioms the eight special function axioms. Now that we have the symbols Pv, Iv, Pfvb, and Ifvb available, we can use them for an additional task. We can use them for the subscripts on

predicate and individual function variables, in place of the symbols tv, I, Pf, and If which we have been using. These last four symbols can then be eliminated from the alphabet. Thus a symbol like Iv has two uses. It is sometimes used as a predicate name and sometimes as a subscript in variables. Confusion will never arise between the two uses, since the use is always clear from context. As before, we will often abbreviate by omitting subscripts in variables when the argument-types are clear from context.

2.1.1.9 Elimination of S and 0 and Addition of qu. We can now proceed with the elimination of S. S occurs both as a component of numerals of the form n̄ and as a symbol not part of a numeral. We shall first introduce an alternative notation for numerals which allows us to write numerals without employing an S. Then we shall find a function expression (which we shall abbreviate as S′) which does not contain S and which we can use as a replacement for S, in the same way that we earlier used ⊕ as a replacement for + and ⊙ as a replacement for ·. The new way of expressing numerals will depend on the fact that each numeral is the Gödel number of some S-expression. We will express the numeral n̄ by writing (qu,α), where α is the S-expression whose Gödel number is n. qu is a new symbol of the language. In other words, we will allow ourselves to write ᾱ as (qu,α). We thus avoid the use of S unless α contains an S. (qu,α) may be thought of as an individual constant which names α. In this sense, the qu fulfills the same function that quotation marks fulfill in some logical systems and that the symbol quote fulfills in LISP. When we finish this section, the form (qu,α) will be the only acceptable form for numerals to take in the expressions of our language. Let us begin by insisting that 0 (i.e., 0̄) be written in that form wherever it occurs. Since zero is the Gödel number of nil, this means we now replace 0 with (qu,nil) wherever 0 occurs in expressions. Thus 0 may be

eliminated from our alphabet. In our discussion, however, we shall continue to write 0 as an abbreviation for (qu,nil), though 0 is no longer a symbol of our alphabet. For the (qu,α) notation to work for all numerals as well as the old notation did, we want a theorem of form ᾱ = (qu,α) for each S-expression α. Now in our system, formulae of the following types are theorems, where α and β are S-expressions:

ᾱ * β̄ = γ̄ (where γ is (α.β));
newpv(ᾱ) = β̄ (where β is the first propositional variable of index > those in α);
newiv(ᾱ) = β̄ (where β is the first individual variable of index > those in α);
newpfvb(ᾱ) = β̄ (where β is the first predicate variable base of index > those in α);
newifvb(ᾱ) = β̄ (where β is the first individual function variable base of index > those in α).

If ᾱ = (qu,α) is to be provable for all S-expressions α, then the counterparts of the above theorems using the (qu,α) notation should also be provable. These counterparts are:

A. (qu,α) * (qu,β) = (qu,(α.β));
B. newpv((qu,α)) = (qu,β) (where β is the first propositional variable of index > those in α);
C. newiv((qu,α)) = (qu,β) (where β is the first individual variable of index > those in α);
D. newpfvb((qu,α)) = (qu,β) (where β is the first predicate variable base of index > those in α); and
E. newifvb((qu,α)) = (qu,β) (where β is the first individual function variable base of index > those in α).

Now let us not add the formulae of form ᾱ = (qu,α) to the theorem set yet. (These formulae contain S, and we are trying to eliminate S.) Let us first add the formulae of types A through E to the theorem set (they don't contain S, except perhaps inside the α or β), by adding to our rules of inference five rules which generate theorems of these forms. We call these five rules, rules A through E. (The reason we need these now is indicated below.)

Footnote: Using the notation for rules of inference described in Section 2.1.4.2, and used in Table 5, rule A licenses, for any S-expressions α and β, the theorem (qu,α) * (qu,β) = (qu,(α.β)); rule B is, roughly, of the form newpv(x) = y ⊃ T(ζ), where ζ names the corresponding instance of form B; and rules C, D, and E are the analogues for newiv, newpfvb, and newifvb.

We are now in a position to define the function expression which we shall use to replace S. This expression will be abbreviated as S′. The symbol S does not appear anywhere in it, though the numerals 0, (qu,p), (qu,(p)), and (qu,(p,p)) do appear in it. We add S′ to the initial set of algorithmic names.

Footnote: Using the notation explained in Section 2.1.2, one way of writing such an S′ would be as follows. Let us abbreviate an S-expression of the form (p.(p.(p. ... (p.nil) ...))), which contains n p's, as n̂. Note that 0̂ is nil. Now let us define a new function with an algorithmic function name which we shall abbreviate as gn. This function shall have the property that, for any S-expression α, if n is the Gödel number of α, then gn((qu,α)) = (qu,n̂) holds. (Note: (qu,0̂), (qu,nil), and 0 are abbreviations of the same expression.)

gn could be defined as follows (we write ⊞ and ⊠ here for the addition and multiplication names on these n̂ representations):

x ⊞ y =: [x = 0 → y; → (qu,p) * (d(x) ⊞ y)]
x ⊠ y =: [x = 0 → 0; → y ⊞ (d(x) ⊠ y)]
exp(x, y) =: [y = 0 → (qu,(p)); → x ⊠ exp(x, d(y))]

For propositional variables we define

convert(y) =: [y = 0 → 0; → (qu,p) * convert(d(y))]
pvindexx(x, y) =: [~Pv(x) → 0; a(y) = x → convert(y); → pvindexx(x, newpv(y) * y)]
pvindex(x) =: pvindexx(x, 0)

We similarly define: ivindex for individual variables, pfvbindex for predicate function variable bases, and ifvbindex for individual function variable bases. From this it is easy to define index such that if an atom α has index m, then index((qu,α)) = (qu,m̂) is provable.

gn(x) =: [x = 0 → 0; atom(x) → d((qu,(p,p)) ⊠ index(x)); → ((qu,p) * ((qu,(p,p)) ⊠ gn(a(x)))) ⊠ exp((qu,(p,p)), (qu,p) * gn(d(x)))]

Similarly, the inverse function gn⁻¹ could be constructed as follows:

pvconvert(x) =: [x = 0 → 0; → newpv(pvconvert(d(x))) * pvconvert(d(x))]
pvindex⁻¹(x) =: a(pvconvert(x))

similarly for ivindex⁻¹, pfvbindex⁻¹, ifvbindex⁻¹. Then we can easily define index⁻¹.

half(x) =: [x = 0 ∨ x = (qu,(p)) → 0; → (qu,p) * half(d(d(x)))]
even(x) =: x = half(x) ⊞ half(x)
oddfactor(x) =: [~even(x) → x; → oddfactor(half(x))]
twosexpp(x, y) =: [~even(x) → y; → twosexpp(half(x), (qu,p) * y)]
twosexp(x) =: twosexpp(x, 0)

gn⁻¹(x) =: [x = 0 → 0; ~even(x) → index⁻¹(half((qu,p) * x)); → gn⁻¹(half(d(oddfactor(x)))) * gn⁻¹(d(twosexp(x)))]

Then we can define S′ by

S′(x) =: gn⁻¹((qu,p) * gn(x))

Now, by virtue of the new rules that we have added (rules A through E above), if α and β are S-expressions, then S(ᾱ) = β̄ holds if and only if S′((qu,α)) = (qu,β) is a theorem. In such a case, S′((qu,α)) = (qu,β) is proved by successive transformations of the right side of the theorem S′((qu,α)) = S′((qu,α)), according to our rules of inference. This process of transforming the right side into a constant (i.e., into the form (qu,β)) is called "evaluating" the right side. The evaluation could not proceed to completion without the presence in the system of the theorems generated by the five rules of inference A through E that we just added to the system. The only reason we added those five rules when we did was specifically to permit the evaluation of S′((qu,α)). So S′ behaves like a successor function name when it is applied to constants. To see that S′ is really a successor function, consider the following. Definition: let α̃ stand for the S-expression α with S′ uniformly replaced by S, and then each sub-S-expression of form (qu,β) uniformly replaced by β̄. Now if α is a theorem, then α̃ is a theorem, and can be proved in a proof in which qu doesn't appear except as (qu,nil). That this is so can be seen by replacing S′ with S, and (qu,β) with β̄, in the proof of α. The result can easily be converted into a proof of α̃ by intercalating lines here and there. (The only interesting intercalation required is in a step of the proof of α in which something of form S′(γ) was expanded. But if S′(γ) = σ is a theorem (σ being an expansion of S′(γ)), then S(γ̃) = σ̃ is going to be a true statement in arithmetic, and will be provable without use of qu. We can then use this theorem in the required intercalation.)
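The detour through gn and its inverse is easier to follow in miniature: encode, add one, decode. The Python sketch below (ours; it reuses the sample numbering from the earlier sketch) is the analogue of S′(x) =: gn⁻¹((qu,p) * gn(x)), and it steps through all S-expressions in the order the numbering induces:

    index = {"p": 1, "q": 2, "x": 3}
    atoms = {n: s for s, n in index.items()}   # inverse of the indexing

    def gn(s):
        if s == "nil":
            return 0
        if isinstance(s, str):
            return 2 * index[s] - 1
        return (2 * gn(s[0]) + 1) * 2 ** (gn(s[1]) + 1)

    def gn_inv(n):
        if n == 0:
            return "nil"
        if n % 2 == 1:                         # odd numbers are atoms
            return atoms[(n + 1) // 2]
        m = 0
        while n % 2 == 0:                      # strip the power of two
            n, m = n // 2, m + 1
        return (gn_inv((n - 1) // 2), gn_inv(m - 1))

    def succ(s):
        return gn_inv(gn(s) + 1)

    # succ("nil") => "p"; succ("p") => ("nil", "nil"); then "q", and so on.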

Since S′ names a successor function, we see that S′ induces a linear ordering on the set of S-expressions, an ordering that reflects the particular Gödel numbering we decided to use. Had we used a different Gödel numbering, we would have had a different S′. For example, suppose we pick a different Gödel numbering which is just like our old Gödel numbering except that it is based on an indexing of the atoms which, while being otherwise the same indexing we had before, has the indices for the atoms ⊃ and ∀ switched. Suppose we defined an S″ based on this Gödel numbering. This also would behave like a successor function. Now, the basic successor function of our system is S. This is the function about which we are proving theorems. How does it behave when applied to constants? This we have not specified. We have specified some things about its behavior on constants when we added rules A through E. But these things are true of both S′ and S″ above. Let us add enough axioms so that S′(x) = S(x) is provable. This will end the ambiguity. Let's do this in a perhaps inelegant but certainly straightforward manner, by adding S′(x) = S(x) to the set of axioms. Now ᾱ = (qu,α) becomes a theorem for each S-expression α. Also, it is still true that if α is a theorem, then so is α̃, where the proof of α̃ does not employ a qu, except in (qu,nil), and also does not use the axiom S′(x) = S(x). Thus we have added no new theorems except those employing a (qu,β) where β is not nil. Hence our system must still be consistent. Now, since S′(x) = S(x) is a theorem, we can replace S with S′ in all our axioms without changing the theorem set at all. Now let's remove S′(x) = S(x) from the axiom set. (Deletion of an axiom can never ruin consistency.) Now no axioms contain S. S has become a useless symbol, so we eliminate it from the alphabet.

We now have an explicitly self-describing system. We shall no longer regard (qu,α) as a numeral, but rather as an individual constant which names the S-expression α. We think of the individual variables as ranging over S-expressions, rather than over numbers. Domains and ranges of functions are no longer ever sets of numbers, but are instead sets of S-expressions. Finally, the expressions of our language may all be thought of as themselves S-expressions.

2.1.1.10 Our Axiom Set. This completes our overview of the system and its derivation from formal arithmetic by additions to and subtractions from the alphabet. In the following sections we shall give a more precise description of the system. We shall specifically state the axioms and rules of inference of the system. It will be clear then how the intuitive meanings of the symbols introduced in this section are implemented. In addition to making our description of the axiom system precise, we shall simplify some axioms and rules of inference somewhat, so that the properties in which we are interested will be more evident. (This simplification, while convenient, is not necessary. We could easily implement the suggestions of this section directly. How this would be done will be obvious after looking at our simplified axiom system. Our simplified axioms and rules are listed in Tables 4 and 5. (The format of the tables will be discussed more fully later.)) For example, the Peano axioms using S′ have been simplified to their analogues using *. Along with this, we have suppressed the ordering of the S-expressions via the Gödel numbering. This ordering is useless to us, and we only introduced it in order to make the formal connection between our (qu,α) notation and formal arithmetic. The suppression of the ordering has allowed us to dispense with several old axioms and rules and replace them with new, simpler ones. These new axioms and rules

were theorems and meta-theorems in the old system, so the replacement creates no inconsistencies.

Footnote: The changes involved in the above simplification of the axioms and rules may be summarized as follows:

Axioms deleted:                         Axioms added:
Peano axioms for S′                     Peano axiom analogues for * (axioms 8-10, Table 4)
x * y = x ⊛ y                           disjointness of atom classes (axiom 15, Table 4)
the eight special function axioms       new variables are variables (axiom 16, Table 4)
  (Section 2.1.1.8)                     new variables are new (axiom 17, Table 4)

Rules deleted:                          Rules added:
rules B-E (Section 2.1.1.9)             variables are variables (rule 14, Table 5)
                                        different atoms are unequal (rule 15, Table 5)

The axioms and rules on the right are theorems when the set of axioms and rules includes those on the left. The axioms added can be proved from the axioms deleted. Roughly speaking, rule 14, Table 5 can be proved, as a meta-theorem, from rules A-E of Section 2.1.1.9, the eight special function axioms, and the Peano axioms for S′; rule 15, Table 5 can be proved, as a meta-theorem, from rules A-E of Section 2.1.1.9 and the Peano axioms for S′.

Actually, in addition to making the simplification discussed above, we shall also introduce a complication. We add two new rules of inference (we will call them rules 18 and 19), and we also add a procedure for continually adding more rules of inference. This procedure will take advantage of the system's self-describing capability. The procedure will provide a means to implement the suggestions made in the introduction section with regard to heuristic generation. Since this addition is not a simplification, we will have to show that it does not introduce any inconsistencies. In future sections we

will show this, and will also demonstrate certain incompleteness properties which follow from our consistency-preserving techniques.

2.1.2 Expressions and Their Abbreviation

2.1.2.1 Motivation for Abbreviation. As we mentioned, we find it convenient to define a set of so-called S-expressions and to write our syntax rules so that the expressions of our language are all S-expressions. This practice allows us to state our meta-theorems and rules of inference more simply. In our discussion, however, we will not want to write out expressions of our language as S-expressions. We will want to abbreviate our expressions into a notation similar to the usual notation for an applied first order predicate calculus system like ours. (We have used this more usual notation in the preceding discussion.) Now of course we won't abbreviate all S-expressions. We do abbreviate those which are expressions of our language. We may abbreviate more or less as suits the purposes of our discussion. (E.g., p ∨ (~p) is an expression which is already in abbreviated notation, but it may be further abbreviated to p ∨ ~p if we wish.) The reader who is familiar with the usual first order predicate calculus notation is already familiar with most of our abbreviated notation. The reader has already been introduced to our use of the Church λ. Our use of label, cond (alternatively pcond), * (LISP cons), a (LISP car), and d (LISP cdr) is the same as their use in LISP, and our abbreviation of cond is just like that in LISP m-notation. (A brief example of the use of cond and label was given in Section 2.1.1.5.) We will want to be very explicit in stating our syntax rules, so we will state them for unabbreviated expressions. Since we will be writing the expressions in abbreviated form, we need first to specify a procedure by which the reader can unambiguously arrive at the unabbreviated expression

if he is given the abbreviated form. Before we do this, we shall give an explicit definition of the set of expressions of our language.

2.1.2.2 Definitions of Classes of Expressions of our Language. The symbols of our alphabet are called atoms; they are given in Table 1. The structures we will be concerned with are S-expressions (not to be confused with the expressions or wf-expressions which we shall define later). As in LISP, ξ is an S-expression iff it is an atom or of the form (α.β), where α and β are S-expressions. As in LISP, we may abbreviate (α.nil) as (α) wherever it occurs in an S-expression, and (α.(β₁,β₂,...,βₙ)) as (α,β₁,β₂,...,βₙ) wherever it occurs in an S-expression. An S-expression not abbreviated this way is said to be written in dot notation. An S-expression abbreviated in this way is said to be in comma notation. (We shall make a practice of using comma notation.) An S-expression which can be abbreviated in this way into a form without dots is called a list. I shall use lower case Greek letters to indicate strings of symbols with properly paired parentheses and brackets. (This is the way I used them above.) A type is an S-expression of one of the following forms:

1. Pv
2. Iv
3. (Pfvb, α₁, α₂, ..., αₙ), where each αᵢ is a type
4. (Ifvb, α₁, α₂, ..., αₙ), where each αᵢ is a type

We define a predicate, whose name we abbreviate as typep, true on any S-expression of one of the above four forms. We now define the subset of the S-expressions which follow the syntactic rules of our language. These we shall call expressions. With each expression we will associate an S-expression of one of the above four forms. This S-expression will be called the expression's type.
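The comma/dot correspondence is purely mechanical; a short Python sketch (ours) shows the S-expression that a comma-notation list abbreviates and renders it back in dot notation:

    def from_list(items):
        # Build the S-expression abbreviated by (i1, i2, ..., in).
        s = "nil"
        for item in reversed(items):
            s = (item, s)
        return s

    def to_dots(s):
        # Write a nested-pair S-expression in full dot notation.
        if isinstance(s, str):
            return s
        return "(%s.%s)" % (to_dots(s[0]), to_dots(s[1]))

    # to_dots(from_list(["a", "b", "c"])) => "(a.(b.(c.nil)))"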

An expression of type Pv is called a formula-type expression. An expression of type Iv is called a term-type expression. An expression whose type is of form (Pfvb, α₁, α₂, ..., αₙ) is called a predicate function-type expression. An expression whose type is of form (Ifvb, α₁, α₂, ..., αₙ) is called an individual function-type expression. Both predicate function-type expressions and individual function-type expressions are called function-type expressions. The class of expressions is that class of S-expressions in which we will be interested. It is the class of "expressions of our language." We first define a subclass of the class of expressions, called the class of variables. There are four sorts of variables.

Sorts of Variables                                            Type of the Variable
1. Propositional variables. These are atoms. (See Table 1.)   Pv
2. Individual variables. These are atoms. (See Table 1.)      Iv
3. Predicate variables; these are of form                     (Pfvb, α₁, α₂, ..., αₙ)
   (Ψ, α₁, α₂, ..., αₙ), where Ψ is a predicate variable
   base (see Table 1) and each αᵢ is a type.
4. Individual function variables; these are of form           (Ifvb, α₁, α₂, ..., αₙ)
   (φ, α₁, α₂, ..., αₙ), where φ is an individual function
   variable base (see Table 1) and each αᵢ is a type.

An expression is an S-expression of one of the following forms.

Form of the Expression                                        Type of the Expression
1. α (where α is a propositional variable)                    Pv
2. α (where α is an individual variable)                      Iv
3. (qu,α) (where α is an S-expression)                        Iv
4. ⊃                                                          (Pfvb, Pv, Pv)
5. =                                                          (Pfvb, Iv, Iv)
6. Pv                                                         (Pfvb, Iv)
7. Iv                                                         (Pfvb, Iv)
8. Pfvb                                                       (Pfvb, Iv)
9. Ifvb                                                       (Pfvb, Iv)
10. ∨                                                         (Pfvb, Pv, Pv)
11. ∧                                                         (Pfvb, Pv, Pv)
12. (Ψ, α₁, α₂, ..., αₙ) (where Ψ is a predicate function     (Pfvb, α₁, α₂, ..., αₙ)
    variable base and each αᵢ is a type; this expression
    is a predicate variable)
13. *                                                         (Ifvb, Iv, Iv)
14. newpv                                                     (Ifvb, Iv)
15. newiv                                                     (Ifvb, Iv)
16. newpfvb                                                   (Ifvb, Iv)
17. newifvb                                                   (Ifvb, Iv)
18. (φ, α₁, α₂, ..., αₙ) (where φ is an individual function   (Ifvb, α₁, α₂, ..., αₙ)
    variable base and each αᵢ is a type; this expression
    is an individual function variable)

19. (π, α₁, α₂, ..., αₙ) (where π is an expression of type    Pv
    (Pfvb, β₁, β₂, ..., βₙ) and each αᵢ is an expression
    of type βᵢ)
20. (θ, α₁, α₂, ..., αₙ) (where θ is an expression of type    Iv
    (Ifvb, β₁, β₂, ..., βₙ) and each αᵢ is an expression
    of type βᵢ)
21. (pcond, (α₁, β₁), (α₂, β₂), ..., (αₙ, βₙ)) (where each    Pv
    αᵢ and βᵢ is an expression of type Pv)
22. (cond, (α₁, β₁), (α₂, β₂), ..., (αₙ, βₙ)) (where each     Iv
    αᵢ is an expression of type Pv and each βᵢ is an
    expression of type Iv)
23. (∀, (η₁, η₂, ..., ηₙ), α) (where each ηᵢ is an            Pv
    individual variable and α is an expression of type Pv)
24. (∃, (η₁, η₂, ..., ηₙ), α) (where each ηᵢ is an            Pv
    individual variable and α is an expression of type Pv)
25. (∃!, (η), α) (where η is an individual variable and α     Pv
    is an expression of type Pv)

26. (ι, (η), α) (where η is an individual variable and α      Iv
    is an expression of type Pv)
27. (λ, (η₁, η₂, ..., ηₙ), α) (where each ηᵢ is a variable    (Pfvb, β₁, β₂, ..., βₙ)
    of type βᵢ, and α is an expression of type Pv)
28. (λ, (η₁, η₂, ..., ηₙ), α) (where each ηᵢ is a variable    (Ifvb, β₁, β₂, ..., βₙ)
    of type βᵢ, and α is an expression of type Iv)
29. (label, ζ, (λ, (η₁, ..., ηₙ), α)) (where each ηᵢ is a     (Pfvb, β₁, β₂, ..., βₙ)
    variable of type βᵢ, α is an expression of type Pv,
    and ζ is a variable of type (Pfvb, β₁, β₂, ..., βₙ))
30. (label, ζ, (λ, (η₁, ..., ηₙ), α)) (where each ηᵢ is a     (Ifvb, β₁, β₂, ..., βₙ)
    variable of type βᵢ, α is an expression of type Iv,
    and ζ is a variable of type (Ifvb, β₁, β₂, ..., βₙ))
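The thirty forms above amount to a small type system, and an expression's type can be computed by direct recursion on the table. A Python sketch of such a computation for a few representative forms, namely variables, qu, and application (forms 1-3 and 19-20); the encoding and names are ours:

    def type_of(e, env):
        # Types are "Pv", "Iv", or tuples ("Pfvb", t1, ..., tn) and
        # ("Ifvb", t1, ..., tn), mirroring the four forms of types.
        if isinstance(e, str):
            return env[e]                     # forms 1-2: a variable
        if e[0] == "qu":
            return "Iv"                       # form 3: a constant
        head, args = e[0], e[1:]              # forms 19-20: application
        t = type_of(head, env)
        assert t[0] in ("Pfvb", "Ifvb") and len(args) == len(t) - 1
        for arg, want in zip(args, t[1:]):
            assert type_of(arg, env) == want  # argument types must match
        return "Pv" if t[0] == "Pfvb" else "Iv"

    env = {"x": "Iv", "P": ("Pfvb", "Iv")}
    # type_of(("P", "x"), env) => "Pv": a formula-type expression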

2.1.2.3 Format for Abbreviation Rules. Although we are mainly interested in abbreviating expressions, there are a few situations when we may want to abbreviate an S-expression which, although composed of expressions, is not itself an expression. For this reason our procedures will be given for handling abbreviations of any S-expression. We shall give a recursive procedure for deriving an S-expression from an abbreviated S-expression. This will be called the unabbreviating procedure. The procedure will be given recursively; i.e., we shall assume that we know how to unabbreviate any abbreviation shorter than the abbreviation we are working with. The procedure for unabbreviating single symbols will be given explicitly. By the unabbreviating procedure we shall give in Sections 2.1.2.4, 2.1.2.5, and 2.1.2.6, an abbreviation is converted into an S-expression written in comma notation. (The comma notation was introduced at the beginning of Section 2.1.2.2 and used throughout 2.1.2.2 in defining the classes of expressions of our language.) In Section 2.1.2.7 we shall give the procedure for converting comma notation into dot notation.

2.1.2.4 Abbreviations of a Single Color With No "Defined" Symbols (The First 12 Rules). The symbols used in abbreviated S-expressions are the symbols of the alphabet, together with the symbols

( ) [ ] → ; 0 a d

(Certain symbols will appear as subscripts in an abbreviation.) Other symbols may be used if already "defined." We shall ignore these for now and discuss them later. (A single word is, for our purposes, regarded as a single symbol; e.g., nil.) The symbols may be in any number of colors. For now we shall limit ourselves to S-expression abbreviations written all in one color. Abbreviated formula-type expressions look just like formulae in an applied first order predicate calculus (which, in fact, they are). As is usual in such formulations, we may drop sets of parentheses, relying on precedence conventions to give us the information which the parentheses normally would. Our first task, then, is to replace these parentheses. A substring of the string of symbols forming the abbreviation is called a sub-S-expression candidate if its parentheses match, if it can be unambiguously unabbreviated by our procedure, and if the result is an S-expression. A substring of the string of symbols forming the abbreviation is called a subexpression candidate if its parentheses match, if it can be unambiguously unabbreviated by our procedure, and if the result is an expression. By our induction assumption, we see that we can determine all sub-S-expression candidates and subexpression candidates of our expression. In the rest of Section 2.1.2, ξ denotes our abbreviation; σ, σ₁, σ₂, ... etc. denote sub-S-expression candidates. Greek letters other than ξ, λ, or σ denote subexpression candidates. A letter or string with a bar over it denotes the unabbreviated form of whatever is under the bar. ⌐ ¬ indicates optional parentheses. We unabbreviate our abbreviation ξ according to the following rules, which we try to apply in order:

1A. If ξ is of form (σ₁, σ₂, ..., σₙ), n ≥ 1, then ξ̄ is (σ̄₁, σ̄₂, ..., σ̄ₙ).

1B. If ξ is of form (σ₁.σ₂), then ξ̄ is (σ̄₁.σ̄₂).
2. If ξ is of form τ(α₁, α₂, ..., αₙ) and τ̄ is a function-type expression, then ξ̄ is (τ̄, ᾱ₁, ᾱ₂, ..., ᾱₙ).
3. If ξ is of form [α₁ → β₁; α₂ → β₂; ...; αₙ → βₙ] and β̄₁ is a formula-type expression, then ξ̄ is (pcond, (ᾱ₁, β̄₁), (ᾱ₂, β̄₂), ..., (ᾱₙ, β̄ₙ)).
4. If ξ is of form [α₁ → β₁; α₂ → β₂; ...; αₙ → βₙ] and β̄₁ is a term-type expression, then ξ̄ is (cond, (ᾱ₁, β̄₁), (ᾱ₂, β̄₂), ..., (ᾱₙ, β̄ₙ)).
5. If ξ is of form ⌐α₁ ∨ α₂ ∨ ... ∨ αₙ¬, where each ᾱᵢ is a formula-type expression and no αᵢ has a ∨ outside of parentheses, then ξ̄ is (∨, ᾱ₁, (∨, ᾱ₂, ..., (∨, ᾱₙ₋₁, ᾱₙ) ...)).
6. If ξ is of form ⌐α₁ ∧ α₂ ∧ ... ∧ αₙ¬, where each ᾱᵢ is a formula-type expression and no αᵢ has a ∧ outside of parentheses, then ξ̄ is (∧, ᾱ₁, (∧, ᾱ₂, ..., (∧, ᾱₙ₋₁, ᾱₙ) ...)).
7. If ξ is of form ⌐α₁ ρ α₂¬, where ρ is an infix binary function name abbreviation (e.g., ⊃; see the definition in Section 2.1.2.5) and where the type of ρ̄ is of the form (τ, β₁, β₂), where β₁ is the type of ᾱ₁ and β₂ is the type of ᾱ₂, then ξ̄ is (ρ̄, ᾱ₁, ᾱ₂).
8. If ξ is of form ~α, then ξ̄ is (~̄, ᾱ).

9. If ξ is of form ⌐η(γ₁, γ₂, ..., γₙ) δ¬, where η̄ is a listbinder (see Table 1), then ξ̄ is (η̄, (ε₁, ε₂, ..., εₙ), δ̄), where for each i: if γᵢ is a predicate variable base or an individual function variable base, and if γᵢ(α₁, α₂, ..., αₘ) or (γᵢ, α₁, α₂, ..., αₘ) is a subexpression candidate of δ, with βⱼ being the type of ᾱⱼ, then εᵢ is (γᵢ, β₁, β₂, ..., βₘ); and otherwise εᵢ is γᵢ.
10. If ξ is of form Π(α₁, α₂, ..., αₙ), where Π is a predicate variable base or individual function variable base, and βᵢ is the type of ᾱᵢ, then ξ̄ is ((Π, β₁, β₂, ..., βₙ), ᾱ₁, ᾱ₂, ..., ᾱₙ).
11. If ξ is of form Πτ₁τ₂...τₙ (Π with subscripts τ₁, ..., τₙ), where Π is either a predicate variable base, or an individual function variable base, or one of the four atoms Pv, Iv, Pfvb, or Ifvb, and where each τᵢ is a type, then ξ̄ is (Π, τ₁, τ₂, ..., τₙ).
12A. If ξ is an atom, then ξ̄ is ξ.
12B. If ξ is 0, then ξ̄ is (qu, nil).
13. If ξ is a single symbol, not an atom, ... this situation will be discussed in Section 2.1.2.5.

Note: ∀(x)(P(x)) is not an abbreviated expression; it unabbreviates to (∀, (x), (((P, Iv), x))). ∀(x) P(x) is an abbreviated expression; it unabbreviates to (∀, (x), ((P, Iv), x)).

Since rule 7 precedes rule 9, ∀(x) P(x) ⊃ P(x) unabbreviates to (⊃, (∀, (x), ((P, Iv), x)), ((P, Iv), x)), as does (∀(x) P(x)) ⊃ P(x). However, ∀(x)(P(x) ⊃ P(x)) is different; it unabbreviates to (∀, (x), (⊃, ((P, Iv), x), ((P, Iv), x))).

-68 Rule 13 now can be written: 13A. If i is a single "defined" symbol ) then, If ) is defined by a statement of form 1, i is. If ) is defined by a statement of form 2, i is (X, (al' a2''' a ),') If 4 is defined by a statement of form 3, 5 is (X, (a1' o2),' ) If ) is defined by a statement of form 4, i is (label, (e, 61' 62'' 6n, (x, ( al' a2'' n a), )) Where 6 is the type of a. and: (1) If B is a formula-type expression then e is the first predicate function variable base not occurring in 8; (2) If B is a term-type expression then 0 is the first individual function variable base not occurring in-; and: y is B with each occurrence of 4 replaced by (8, 6, 62..,6 ). If ) is defined by a statement of form 5, i is as for form 4 with n = 2. (With respect to rule 7: ) is an infix binary function name abbreviation if and only if it is, =, *, or a "defined" symbol whose definition statement is of form 3 or 5 above.) The reader may note the striking similarity between our abbreviated definition statements and LISP definition statements in m-notation. Obviously, the form of abbreviations is quite dependent on preceding definition statements. 13B. If i is of form addad (a), then i is (a,, (d, (, (, (d,a))))). Similarly for any string composed of a's and d's 13C. If; is of form addad, then i is (X, (x), (a, (d, (d, (a, (d, x)))))). Similarly for any string composed of a's and d's.

Example: ¬ is an abbreviation for (λ, (p), (⊃, p, ∅)). (The definition statement of ¬ is given in Table 3.) ¬p is an abbreviation for ((λ, (p), (⊃, p, ∅)), p).

2.1.2.6 Abbreviations Using Colored Symbols (Rules 14 - 17). In abbreviations we may use colored symbols to indicate a quote operation (i.e., to indicate the existence of a qu). In this copy, where the colors themselves cannot be printed, a singly ballooned (blue) string is written ⟨ ⟩, a doubly ballooned (red) string ⟨⟨ ⟩⟩, and a triply ballooned (green) string ⟨⟨⟨ ⟩⟩⟩.

Color of Symbol     Meaning
Black               Unquoted symbol
Blue                Quoted symbol
Red                 Doubly quoted symbol
Green               Triply quoted symbol

We will use colored Greek letters (except λ) in our rules in the following way: If α stands for a string of symbols, then ⟨α⟩ stands for the same string of symbols but with the following color changes: symbols black in α are blue in ⟨α⟩; symbols blue in α are red in ⟨α⟩; symbols red in α are green in ⟨α⟩; etc., etc. Similarly with other Greek letters used to represent strings. If α is β γ then we will sometimes write ⟨β⟩⟨γ⟩ for ⟨α⟩; ⟨x⟩ by itself unabbreviates to (qu, x). Note that ⟨α⟩ stands for a string which has no black symbols. If β stands for such a string, then ⟨β⟩ stands for the same string but with the following color change: symbols blue in β are red in ⟨β⟩; symbols red in β are green in ⟨β⟩; etc., etc.

These color conventions given for α will be the same for other Greek letters (except λ), including ᾱ. To handle colored abbreviations, we add the following rules.

14. If ξ is of form ⟨α⟩, where α is a single symbol, then ξ̄ is (qu, ᾱ).

15. If ξ is of form ⟨α⟩ β, then ξ̄ is (*, ⟨α⟩, β̄).

16. If ξ is of form α1 α2 α3 ... αn (a string of colored and black parts), then ξ̄ is (*, ᾱ1, (α2 α3 ... αn)‾).

17. Suppose there exist two sequences: δ1, δ2, ..., δn, a sequence of term-type expressions, and ε1, ε2, ..., εn, a sequence of variables, such that no εi occurs in ξ and such that a uniform simultaneous substitution of the εi's for the δi's in ξ yields ⟨α⟩, where α is an expression. Now let θ be the result of uniformly simultaneously substituting the δi's for the εi's in ⟨α⟩. Then ξ̄ is θ̄.

Example 1. ⟨¬x⟩, where x is black. (There is a definition statement for ¬ in Table 3.) Use rule 17 with δ1 = x and ε1 = p. Then θ is ⟨¬p⟩ with x substituted back for the blue p, and the answer is θ̄, which we get by rules 16, 14, and 15:

(*, (qu, (λ, (p), (⊃, p, ∅))), (*, x, ∅))

and then by rule 1 several times and rule 12 several times:

(*, (qu, (λ, (p), (⊃, p, ∅))), (*, x, (qu, nil)))

This is a term-type expression, and is of type Iv. Note that ⟨¬⟩x is the same expression as ⟨¬x⟩, so we say ⟨¬⟩x is ⟨¬x⟩, since they are abbreviations of the same expression. We shall frequently talk in this way. We can also say ⟨¬x⟩ is (*, ⟨¬⟩, (*, x, ∅)).

Example 2. ⟨¬⟩x * y. By Example 1 we see that ⟨¬⟩x is a subexpression candidate of type Iv. Hence rule 7 operates, and then rule 12, giving the same as (see Example 1)

(*, (*, ⟨¬⟩, (*, x, ∅)), y)

Example 3. ⟨¬ x * y⟩, where x * y is black. Now rule 7 doesn't work, but rule 17 does, as in Example 1 with δ1 = x * y. The result is the same as (see Example 1)

(*, ⟨¬⟩, (*, (*, x, y), ∅))

This is quite different from Example 2. Note that ⟨¬⟩(x * y) gives the same result, so we can also say ⟨¬ x * y⟩ is ⟨¬⟩(x * y).

Example 4. ⟨x ⊃ y⟩, where x and y are black. By rule 17 with δ1 = x, δ2 = y, ε1 = p, ε2 = q, and then rules 16, 15, and 12:

(*, ⟨⊃⟩, (*, x, (*, y, ∅)))

Rules 14 and 12:

(*, (qu, ⊃), (*, x, (*, y, (qu, nil))))

Note: ⟨x ⊃ y⟩ is ⟨⊃⟩ * (x * (y * ∅)).

Example 5. T(⟨T(⟨⟨p ⊃ p⟩⟩)⟩). (There is a definition statement for T in the tables.) By rule 2 and rule 14 alternately, we get:

(T, ⟨T(⟨⟨p ⊃ p⟩⟩)⟩)
(T, (qu, T(⟨p ⊃ p⟩)))
(T, (qu, (T, ⟨p ⊃ p⟩)))
(T, (qu, (T, (qu, p ⊃ p))))
(T, (qu, (T, (qu, (⊃, p, p)))))

We then use rule 13.

We must stress that all 17 unabbreviating rules are to be used as a unit. A symbol such as ᾱ appearing in any rule means the result of successively applying all 17 rules to α.

2.1.2.7 Comma vs Dot Notation. The unabbreviating procedure given in Sections 2.1.2.4, 2.1.2.5, and 2.1.2.6 gives a result written in the so-called comma notation. This is a perfectly good notation and it is the one we shall generally employ (in fact, it is the notation we employed in Section 2.1.2.2 when we gave definitions of the classes of expressions), but it is itself a sort of abbreviation. (It is the result of making the kind of abbreviation mentioned at the very beginning of Section 2.1.2.2.) In other words, after applying the unabbreviating procedure of Sections 2.1.2.4, 2.1.2.5, and 2.1.2.6 to an abbreviated S-expression, one still has an abbreviated S-expression. In order to completely unabbreviate an S-expression one must follow the above unabbreviating procedure with a second unabbreviating procedure which is given below. The notations here are the same as in the previous procedure. There are three rules.

1. If ξ is an atom, then ξ̄ is ξ.
2. If ξ is of form (α), then ξ̄ is (ᾱ . nil).
3. If ξ is of form (α1, α2, ..., αn), then ξ̄ is (ᾱ1 . (α2, ..., αn)‾).

Note: These 3 rules are distinct from the preceding 17 rules. ᾱ occurring in these 3 rules means the result of applying only these 3 rules to α; ᾱ occurring in the preceding 17 rules means the result of applying only the preceding 17 rules to α. (A sketch below renders these three rules as a small program.)

2.1.2.8 Reading the Expressions in the Tables and the Text. In Section 2.1.2.5 we pointed out that the form of an abbreviated expression depends very much on what definition statements precede it. We will want to use many "defined" symbols in the following sections. We could either defer each definition statement until the "defined" symbol is needed, or we could list all the definition statements now. We shall adopt the latter course for most "defined" symbols.
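Since the three rules are just the familiar relation between LISP list notation and dotted pairs, they can be stated as a one-line recursion. The following sketch (modern Lisp; dot-notation is our name) prints the fully unabbreviated dot notation of an S-expression given in comma (i.e. list) notation:

(defun dot-notation (x)
  ;; rule 1: an atom stands for itself; rules 2 and 3: a list
  ;; (a1, a2, ..., an) becomes (a1 . (a2 . ( ... . nil)))
  (if (atom x)
      (string-downcase (princ-to-string x))
      (format nil "(~a . ~a)"
              (dot-notation (car x))
              (dot-notation (cdr x)))))

;; (dot-notation '(a b)) => "(a . (b . nil))"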

However, instead of actually inserting the block of definition statements into the text at this point, we have placed the block of definition statements in the tables in Section 4. We refer the reader to these definition statements now and warn the reader that we shall in the future freely use the symbols "defined" by the definition statements in Section 4 (except for those of Section 4.11). To aid the reader in deciphering the tables of Section 4, we shall make the following remarks about their organization: The definition statements are all contained in Tables 3, 6, 7, 8, and 9. Tables 3, 6, and 9 consist entirely of definition statements. Since the definition statements are gathered in the tables for easy reference, the reader need not learn any of the definitions until he feels it is necessary. In fact, we shall (in Section 2.1.4.1) give English statements of the meanings of the important "defined" symbols. The reader will find these English statements sufficient for most purposes, and he may decide that careful scrutiny of the definition statements is unnecessary.

Table 1 lists our alphabet. Table 2 lists those function expressions which name our basic recursive functions, each of which has its own special evaluation procedure (see the note in Section 2.1.4.2). (In any machine employing our system each such procedure would be stored as a separate subroutine.) The meanings of the function expressions are given using the notation of Section 2.1.4.1.

Table 3: This is a basic list of definition statements for symbols which are abbreviations for the names of recursive functions. As in all tables except Table 11, each definition statement employs only those "defined" symbols which have been defined in previous definition statements. Table 3 gives definitions for all the "defined" symbols used in Tables 4 and 5 except T and Pfstep. We shall see in Section 2.1.4.2 that each of the function names given in Table 3

is an especially nice kind of function expression called a complete recursing function expression.

Tables 4 and 5: There are no definition statements in these two tables. Each of these tables is a sequence of abbreviated expressions, using the "defined" symbols which were defined in Table 3 (and also using Pfstep and T, "defined" in Tables 6 and 7; see the apology below). The expressions abbreviated in Table 4 are the axioms of our system. We shall be discussing these in Section 2.1.4.3 and later sections. We shall see in Section 2.1.4.2 that our rules of inference can also be written as expressions in our language. Table 5 gives the abbreviations of these expressions. We shall be discussing them in Section 2.1.4.4 and later sections.

Table 6: Like Table 3, this table consists entirely of definition statements for symbols which are abbreviations of that especially nice kind of function expression (which we shall discuss in Section 2.1.4.2) called a complete recursing function expression. As "defined" in this table, each Rulei symbol has an obvious relationship to the i'th rule of inference in Table 5. At the end of the table, the Rulei's are used in a definition statement which "defines" the symbol Pfstep, from which the symbol Proof (which names the predicate true on proofs) is immediately "defined." From this, in Table 7, the symbol T (which names the predicate true on theorems) is "defined." We shall discuss the meanings of Pfstep, Proof, and T in Section 2.1.4.1 and later sections. The whole purpose of the definition statements in Table 6 is to make simpler the definition statements for the above three predicate expressions.

Apology: Since the definitions in Table 6 can most easily be thought of as being derived in a natural way from the rules of inference in Table 5, we have placed the Table 6 definitions after the rules of inference. The reader

will find this arrangement more convenient than the reverse; but strictly speaking, since the expression T is used in the rules of inference, the definition statement for T, and hence all the statements in Table 6, should precede the rules of inference (in which T is used), and should also precede the axioms (in which Pfstep is used).

Table 7: With this table we leave the realm of those nice complete recursing functions that we shall be discussing in Section 2.1.4.2. The definition statements in the remainder of the tables give abbreviations for functions that are in general not complete recursing. The only definition statement in Table 7 defines T, the predicate expression which names the predicate true on theorems. (We shall be discussing T at the end of Section 2.1.4.1.) Following the definition statement for T are several abbreviated expressions which use T. It will turn out that these expressions are theorems of our system and that their intended interpretations are virtually the same as the intended interpretations of the expressions in Table 5. It is intended that the reader defer examination of these theorems, and the other theorems in the tables, until after we have discussed the axioms and rules of inference in Section 2.1.4.

Table 8: Here we have several definition statements and interspersed theorems (proofs not given) which will be useful as a formal counterpart to our discussion of well-formedness in Section 2.1.3.

Table 9: This table consists entirely of definition statements which are needed for the definition of apl, which is given at the end of the table. This function expression abbreviation will be discussed in Section 2.1.4.2.

Table 10: In this table are given several sample proof outlines. It is intended that the reader defer examination of this table until after we have

discussed the axioms and rules of inference in Section 2.1.4.

Table 11: This table has nothing to do with our axiomatic system. It contains some LISP-like routines for implementing an adaptive theorem prover as suggested in Section 3.

2.1.3 Well Formedness. In most axiomatic systems there is a rule of inference which allows uniform substitution of a well-formed formula for a propositional variable. What will be our analogue of this rule? We cannot allow substitution of any formula-type expression for a propositional variable because, according to our definition, a formula-type expression may contain those "contradictory" algorithmic names that we promised to exclude from theorems. For this reason we define a subset of the set of formula-type expressions, called the set of well-formed formulae. This set consists of just those formula-type expressions which contain no algorithmic names except those which are generatable according to the procedure we discussed earlier. Thus the well-formed formulae will contain no "contradictory" algorithmic names. Because of the nature of our procedure for generating algorithmic names, the set of well-formed formulae is not a decidable set.

Consider the predicate expression abbreviated as F. (The symbol F is "defined" in Table 8.) This expression names the predicate true on S-expressions which are well-formed formulae. Although, in form, F looks like an algorithmic name, the definition contains a function-type expression (namely T) which can only be regarded as an ordinary name since it contains a ∃. Thus F is not a pure algorithmic name but a sort of hybrid, built up according to our rules of formation of predicate-type expressions from names, some of which are not algorithmic. Such hybrid names of individual functions and predicates are so common in our system that the distinction between ordinary names and algorithmic names, which we made earlier, is

actually of little use to us once we take the more complicated cases into consideration. We can still speak, however, of purely algorithmic names. Such names must reflect their algorithmic evaluation procedures. Thus their unabbreviated forms can contain no ∀, ∃, ∃!, or ι, except inside the expressions

ι(y)((atom(x) ∧ y = x) ∨ ∃(z) y * z = x)    and    ι(y)((atom(x) ∧ y = x) ∨ ∃(z) z * y = x)

which we are abbreviating as a and d respectively. (These two expressions are permitted only because any machine using our system would have stored a special algorithmic evaluation procedure for a and d; see the note in Section 2.1.4.2.) F couldn't possibly be a purely algorithmic name since, being true on an undecidable set, it has no algorithmic evaluation procedure. Of course we can freely substitute well-formed formulae for propositional variables. That is: If α is a theorem, π is a propositional variable, and β is a well-formed formula, then the expression obtained by uniform substitution of β for π in α is also a theorem. This meta-theorem looks very much like the rule of inference we want to use. Close examination of the meta-theorem, however, shows that it cannot be a rule of inference, because there is no effective way of using it. How does one decide whether or not β is a well-formed formula? The set of well-formed formulae is not decidable. Yet we need some rule of inference similar to the above. We define a subset of the set of well-formed formulae called the set of simple formulae. A simple formula is a well-formed formula whose unabbreviated expression contains no function-type expressions except atoms, predicate variables, or individual function variables. Note: all the "contradictory" algorithmic names (pure algorithmic names or not) are in the set of excluded function-type expressions.
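Under the intended interpretation, a and d are just LISP's car and cdr, and rules 13B and 13C of Section 2.1.2.5 read a string of a's and d's as their composition. A small sketch (modern Lisp; ad-chain is our name for the composition that rule 13B spells out explicitly):

(defun a (x) (car x))   ; a: the left part of a dotted pair
(defun d (x) (cdr x))   ; d: the right part of a dotted pair

(defun ad-chain (letters x)
  ;; apply a chain such as (a d d a d) with the rightmost letter
  ;; innermost, as in rule 13B: addad(x) = a(d(d(a(d(x)))))
  (if (null letters)
      x
      (funcall (if (eql (first letters) 'a) #'a #'d)
               (ad-chain (rest letters) x))))

;; (ad-chain '(a d) '((1 2) 3 4)) computes a(d(((1 2) 3 4))) => 3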

The set of simple formulae is a decidable set, so the following can be a rule of inference: If α is a theorem, π is a propositional variable, and γ is a simple formula, then the expression obtained by uniform substitution of γ for π in α is also a theorem. If we have rules of inference which allow us to substitute already generated predicate function-type expressions for predicate variables and to substitute already generated individual function-type expressions for individual function variables, we can, in fact, substitute any well-formed formula uniformly for a propositional variable by first substituting the appropriate simple formula and then replacing its predicate variables and individual function variables by the appropriate function-type expressions already generated. Later we will examine in detail the rules which allow us to do this.

We have taken a set of expressions (the set of formula-type expressions) and created two subsets: The first was created by excluding all expressions containing function-type expressions which were not generatable. (Non-generatable function-type expressions are all algorithmic names, but are not necessarily purely algorithmic.) The second was created by excluding all expressions containing function-type expressions which were neither atoms, predicate variables, nor individual function variables. We can similarly create two subsets for other particular sets of expressions, as the chart indicates.

Particular Set of Expressions          First Subset                      Second Subset
Expressions                            Well-formed expressions           Simple expressions
Form-type expressions                  Forms                             Simple forms
Formula-type expressions               Well-formed formulae              Simple formulae
Term-type expressions                  Terms                             Simple terms
Function-type expressions              Function expressions              Simple formations
Predicate function-type expressions    Predicate expressions             Simple predicates
Individual function-type expressions   Individual function expressions   Simple individual functions

For each line of the chart the following facts hold: The first subset is a subset of the set in column one. The second subset is a subset of the first subset. The set in column one and the second subset are both decidable. The first subset is not decidable. (Note that in our terminology a function expression is either an individual function expression or a predicate expression. Instead of "well-formed formula" we shall often write simply "formula.")

2.1.4 The Axiomatic System

2.1.4.1 Certain Functions and Relations. Having discussed the various kinds of expressions in which we shall be interested, we shall now proceed to discuss our axiomatic system in more detail. We shall specify the axioms and rules of inference of our system. In specifying these we shall make free use of the predicate and individual function symbols defined in the definition statements in the tables. Before specifying our axioms and rules we will briefly discuss the defined predicate and individual function symbols that we shall be using in

abbreviations of the axioms and rules. The brief discussion will supplement the definitions in the tables by giving the intended interpretations of expressions whose abbreviations employ the defined individual function and predicate symbols. To do this economically we shall observe the following conventions: When a formula is true under the intended interpretation, we shall say that the formula holds. We shall say that an individual constant of form (qu, β) names the S-expression β. And when a formula of form α = (qu, β) holds, we shall say that α names the S-expression β even though α is not an individual constant. (For example, we say that a((qu, (α . β))) names α since a((qu, (α . β))) = (qu, α) holds.) For any term-type expression α, we shall, in our discussion, write ᾱ to mean the S-expression which α names. (E.g., we say that a((qu, (α . β)))‾ is α. Note that the bar is merely a convenience for purposes of discussion and is never part of an S-expression or S-expression abbreviation.)

In Section 2.1.3 we defined several useful classes of S-expressions. For each such class we can think of the unary predicate true on the class. Now the predicates of our language are defined over S-expressions, so it is not surprising to find that we can write, in our language, algorithmic names for each of the above predicates. The tables give definition statements for defined symbols which are abbreviations for these algorithmic names.

We give below a chart which indicates intended interpretations of formulae whose abbreviations employ these defined symbols.

Formula             It Holds If and Only If α Names
expression(α)       an expression
wfexpression(α)     a well-formed expression
simplexpr(α)        a simple expression
formtp(α)           a form-type expression
form(α)             a form
Ftp(α)              a formula-type expression
F(α)                a well-formed formula
simpleformula(α)    a simple formula
Tmtp(α)             a term-type expression
Tm(α)               a term
simpleterm(α)       a simple term
functiontp(α)       a function-type expression
function(α)         a function expression
Pfetp(α)            a predicate function-type expression
Pfe(α)              a predicate expression
Ifetp(α)            an individual function-type expression
Ife(α)              an individual function name

Similarly, the several charts below give interpretations of formulae and terms whose abbreviations employ other defined symbols. (Note that an exact definition may be found of, for example, the set of simple terms, by first looking at the above chart and noting that the predicate true on simple terms is simpleterm, and then referring in Table 8 to the definition statement for simpleterm.) Also, two charts in Table 2 give interpretations

of formulae and terms which employ the basic complete recursing function expressions, which are listed in Table 2.

Use of Some Logical Connectives (Predication of Type (Pfvb, Pv) or (Pfvb, Pv, Pv)):

Formula     It Holds If and Only If
¬α          α does not hold.
α ∧ β       α holds and β holds.
α ∨ β       α holds or β holds.
α ≡ β       α and β both hold or else neither holds.

Use of Some Other Predicates:

Formula               It Holds If and Only If
α ≠ β                 α and β name different S-expressions.
α ⊲ β                 α names a sub-S-expression of the S-expression named by β.
α ⊲f β                α ⊲ β holds and, if α names a variable, then it occurs free in the S-expression named by β. (This only makes sense when β names an expression.)
freecheck(α, β, γ)    ᾱ is a variable and γ̄ is an expression and there is no free occurrence of ᾱ in γ̄ occurring in any subexpression of γ̄ of form (label, τ, δ), and no free occurrence of ᾱ in γ̄ lies inside the scope of a binder which binds a variable free in β̄. (It also holds in some cases when ᾱ is not a variable or γ̄ is not an expression.)

variable(α)           α names a variable.
nonvaratom(α)         α names an atom which is not a variable.
α ε β                 β names a list of form (α1, α2, ..., αn) and ᾱ is αi for some n and i such that 1 ≤ i ≤ n. (It may also hold in some cases when β doesn't name a list.)

Use of Some Individual Functions:

Term                It Names
exprtype(α)         The type of ᾱ, whenever α names an expression.
type(α)             The type of ᾱ if α names a variable; ∅ if it doesn't.
args(α)             The list (β1, β2, ..., βn), whenever α names a function-type expression of type (Π, β1, β2, ..., βn).
newvarex(α, β)      The variable γ, where γ is of the same type as ᾱ and does not occur in β̄.

Use of Predicates Defined on Lists:

Formula                    It Holds If and Only If
andlista(Π, α)             Π(β) holds for every β such that β ε α holds. (So ᾱ is meant to be a list.) (The type of Π must be (Pfvb, Iv) or this isn't a formula.)
andlistlista(Π, α, β)      α and β name lists of form (α1, α2, ..., αn) and (β1, β2, ..., βn) respectively, and Π(γ, δ) holds whenever γ names αi and δ names βi, for each i ≤ n. (It may also hold sometimes when α or β don't name lists as indicated.) (The type of Π must be (Pfvb, Iv, Iv) or this isn't a formula.)
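The occurrence conditions and list predicates in the charts above have straightforward algorithmic readings. The sketch below (modern Lisp) is a toy version under simplifying assumptions: only the λ binder is handled, expressions are plain trees, and occurs-free-p, andlista, and andlistlista are our renderings, not the defined symbols themselves:

(defun occurs-free-p (x e)
  ;; does the variable X occur free in the tree E?
  (cond ((equal x e) t)
        ((atom e) nil)
        ((and (eql (car e) 'lambda)        ; a binder that rebinds X:
              (consp (cdr e))
              (listp (cadr e))
              (member x (cadr e) :test #'equal))
         nil)                              ; no free occurrences below it
        (t (or (occurs-free-p x (car e))
               (occurs-free-p x (cdr e))))))

(defun andlista (pi-fn a)
  ;; pi-fn holds of every element of the list a
  (every pi-fn a))

(defun andlistlista (pi-fn a b)
  ;; pi-fn holds pairwise along two lists of equal length
  (and (= (length a) (length b))
       (every pi-fn a b)))

;; (occurs-free-p 'x '(lambda (y) (f x))) => T
;; (andlistlista #'< '(1 2) '(3 4))       => T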

Use of an Individual Function Defined on a List:

Term                 It Names
maplistcar(Π, α)     The list (β1, β2, ..., βn), where each βi is the value of Π applied to αi, when α names a list of form (α1, α2, ..., αn). (The type of Π must be (Ifvb, Iv) or this isn't a term.)

Use of Individual Functions Which Perform Substitutions:

Term              It Names the Expression Obtained by
S(α, β, γ)        Uniform substitution of β̄ for ᾱ in γ̄.
Sf(α, β, γ)       Uniform substitution of β̄ for all free occurrences of the variable ᾱ in γ̄.
Snf(α, β, γ)      Uniform substitution of β̄ for all occurrences of ᾱ in γ̄ except those occurrences inside sub-expressions which are function-type expressions.
Ssl(α, β, γ)      (Assuming α and β name lists of form (α1, α2, ..., αn) and (β1, β2, ..., βn) respectively) simultaneous uniform substitution of each βi for αi in γ̄.
Ssfl(α, β, γ)     (Assuming α and β name lists of form (α1, α2, ..., αn) and (β1, β2, ..., βn) respectively, and each αi is a variable) simultaneous uniform substitution of each βi for all free occurrences of αi in γ̄.
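Uniform substitution and substitution for free occurrences only differ exactly in their treatment of binders. A hedged sketch of S and Sf over plain Lisp trees, in the argument order the rules use, Sf(x, z, y): substitute z for x in y (only the λ binder is treated and label is ignored, so this is an approximation, not the defined functions):

(defun s-subst (x z y)
  ;; S: replace every occurrence of X in Y by Z
  (cond ((equal y x) z)
        ((atom y) y)
        (t (cons (s-subst x z (car y))
                 (s-subst x z (cdr y))))))

(defun sf-subst (x z y)
  ;; Sf: like S, but a subtree that rebinds X is left untouched
  (cond ((equal y x) z)
        ((atom y) y)
        ((and (eql (car y) 'lambda)
              (consp (cdr y))
              (listp (cadr y))
              (member x (cadr y) :test #'equal))
         y)
        (t (cons (sf-subst x z (car y))
                 (sf-subst x z (cdr y))))))

;; (s-subst 'x '(qu a) '(f x (lambda (x) x)))
;;   => (F (QU A) (LAMBDA ((QU A)) (QU A)))   ; uniform, even under the binder
;; (sf-subst 'x '(qu a) '(f x (lambda (x) x)))
;;   => (F (QU A) (LAMBDA (X) X))             ; free occurrences only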

We have the three further predicates Pfstep, Proof, and T. Pfstep(α) holds if and only if α names a list of form (α1, α2, ..., αn) where α1 is an axiom of our system or is derivable from the other αi's by means of our rules. Proof(α) holds if and only if α names a list of form (α1, α2, ..., αn) where αn, αn-1, ..., α2, α1 is a sequence of expressions forming a proof in our system. Now, except for some in the first chart in this section (2.1.4.1), all predicate and individual function names we have discussed so far in this section are purely algorithmic names ("purely algorithmic" is defined in Section 2.1.3). We define

T(x) =: ∃(y)(Proof(y) ∧ a(y) = x)

T does not have a purely algorithmic name. T(α) holds if and only if α names a theorem of our system.

2.1.4.2 The Meta Level. Consider a term φ(α1, α2, ..., αn) where φ is an individual function expression without free variables and each αi is either a function name, an individual constant (i.e. of form (qu, α)), or one of the two so-called propositional constants, ⊤ and ∅. Suppose this term names the S-expression β. Using LISP terminology, one could say that the term has value β or that the function expression φ gives value β when applied to the arguments α1, α2, ..., αn. A LISP program [McCarthy 1962] is a function-type expression much like φ. Such a program is presented to the LISP interpreter, which applies the program to the data, written in the form of arguments like α1, α2, ..., αn above. The task of the interpreter is to evaluate terms like φ(α1, α2, ..., αn) to produce the S-expression named by the term.
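The three predicates above have a direct procedural reading, and the first two are decidable while T is not. A minimal proof-checking sketch (modern Lisp; axiomp and derivablep are hypothetical stubs standing for the Table 4 axioms and the Table 5 rules, and theoremp checks T only relative to an explicitly given proof, since T itself has no algorithmic evaluation procedure):

(defvar *axioms* '())  ; to be filled with the (unabbreviated) axioms of Table 4

(defun axiomp (e)
  (member e *axioms* :test #'equal))

(defun derivablep (line earlier)
  ;; stub: a real version would attempt each rule of Table 5 on EARLIER
  (declare (ignore line earlier))
  nil)

(defun pfstep-p (lines)
  ;; Pfstep: the first element is an axiom or follows from the rest
  (and (consp lines)
       (or (axiomp (first lines))
           (derivablep (first lines) (rest lines)))))

(defun proof-p (lines)
  ;; Proof: the list, read last element first, is a proof; equivalently,
  ;; every non-empty tail satisfies Pfstep
  (or (null lines)
      (and (pfstep-p lines)
           (proof-p (rest lines)))))

(defun theoremp (x proof)
  ;; T relative to a witness: the proof checks and a(proof) = x
  (and (proof-p proof)
       (equal (first proof) x)))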

In our system we shall have a similar evaluation procedure. In a moment we shall direct the reader to the definition of our procedure. First, let us say that it differs from the LISP procedure in two ways:

A. In applying λ expressions, one substitutes the argument expressions directly into the matrix of the λ expression and the result is evaluated. (LISP would evaluate the arguments and substitute their values. Our scheme makes the LISP symbol FUNCTION, and all its complications, unnecessary.)

B. The result of evaluation is a constant. That is, it is ⊤ or ∅ or a quoted expression, not the expression itself. Thus, in the above example, the result of our evaluating φ(α1, α2, ..., αn) is (qu, β), where (qu, β) names β. We say φ(α1, α2, ..., αn) evaluates to (qu, β).

Now the evaluation procedure is a recursive procedure, so if φ is the name of a non-recursive function, there will be choices of the arguments for which the evaluation procedure does not yield an individual constant which names φ(α1, α2, ..., αn)‾. In such cases, either the procedure does not terminate, or it yields another term γ which names φ(α1, α2, ..., αn)‾ but which is not an individual constant. In the second case we still say φ(α1, α2, ..., αn) evaluates to γ. Our evaluation procedure is such that if φ is any purely algorithmic individual function name (defined in Section 2.1.3) then for any choice of arguments α1, α2, ..., αn (i.e. any choice of αi's such that φ(α1, α2, ..., αn) is a term without free variables, each αi which is a term is an individual constant, and each αi which is a well-formed formula is a propositional constant), φ(α1, α2, ..., αn) will either evaluate to an individual constant or the evaluation procedure will not terminate. Such a φ(α1, α2, ..., αn) will never evaluate to a term which is not an individual constant.
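Differences A and B can be made concrete with a toy evaluator (modern Lisp). This sketch handles only quoted constants and λ application; apply-lambda substitutes the argument expressions themselves, unevaluated, into the matrix, and a successful evaluation bottoms out in a constant of form (qu ...). The names are ours, and all other cases of the real procedure are elided:

(defun apply-lambda (lam args)
  ;; difference A: substitute the argument *expressions* into the
  ;; matrix of (lambda (v1 ... vn) matrix), then evaluate the result
  (let ((matrix (third lam)))
    (loop for v in (second lam)
          for e in args
          do (setf matrix (subst e v matrix :test #'equal)))
    (simple-eval matrix)))

(defun simple-eval (form)
  ;; difference B: the result of evaluation is a constant (qu ...),
  ;; never the named expression itself
  (cond ((and (consp form) (eql (first form) 'qu)) form)
        ((and (consp form)
              (consp (first form))
              (eql (first (first form)) 'lambda))
         (apply-lambda (first form) (rest form)))
        (t form)))   ; every other case of the real procedure is elided

;; (simple-eval '((lambda (x) x) (qu a))) => (QU A)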

If φ(α1, α2, ..., αn) evaluates to an individual constant for every such choice of arguments α1, α2, ..., αn, then we say that φ is a complete recursing individual function expression. Any complete recursing individual function expression is a purely algorithmic individual function name. Our evaluation procedure handles not only individual function expressions but also predicate expressions, and analogous statements are true of them. If Φ(α1, α2, ..., αn) is a well-formed formula, then either it evaluates to a formula γ which holds if and only if Φ(α1, α2, ..., αn) holds, or else the evaluation procedure does not terminate. If Φ is any purely algorithmic predicate name, then for any choice of arguments α1, α2, ..., αn (i.e. any choice of αi's such that Φ(α1, α2, ..., αn) is a well-formed formula without free variables, each αi which is a term is an individual constant, and each αi which is a well-formed formula is a propositional constant), Φ(α1, α2, ..., αn) will either evaluate to a propositional constant or the evaluation procedure will not terminate. If Φ(α1, α2, ..., αn) evaluates to a propositional constant for every such choice of arguments α1, α2, ..., αn, then we say that Φ is a complete recursing predicate expression. The complete recursing individual function expressions and the complete recursing predicate expressions are the so-called complete recursing function expressions.

Note: Each of the basic function expressions listed in Table 2 has a special recursive evaluation procedure. A machine employing our system would have a special little evaluation subroutine (analogous to LISP SUBR's) for each of the function names in Table 2. Thus, if φ is in Table 2, it is a complete recursing function expression. The function expressions given in Table 3 are built from those in Table 2 by operations analogous to composition and primitive recursion, so it is not hard to see that the function names given in Table 3 are complete recursing function expressions, as are the names given in Table 6. Of course, there are plenty of function expressions whose

abbreviations are given in Tables 7, 8, and 9 that are not complete recursing. Our evaluation procedure will be written in a general way, so that it will apply to any term or well-formed formula. If α is a form to which we apply our procedure, and if all function expressions in the form α are purely algorithmic names (then they look like LISP programs), and if α has no free variables, no quantifiers, and no ι, then either the evaluation procedure does not terminate or α evaluates to a constant (individual constant or propositional constant). If all function expressions in such a form α are complete recursing function expressions, then α evaluates to a constant.

Consider the purely algorithmic individual function name which we shall abbreviate as apl. The definition statement of this defined symbol is given in Table 9. The intended interpretation is given below.

Term       It Names
apl(α)     The expression which results from applying our evaluation procedure to ᾱ. (Note: this expression will not always be a constant.) (apl is analogous to the LISP eval function.)

Thus, the reader will find in the tables the specification of our evaluation procedure. It is written there in the guise of a definition statement for apl. (So we describe our evaluation procedure in much the same way that the LISP 1.5 manual describes the LISP evaluation procedure.) The following meta-theorems will hold in our system.

META-THEOREMS: If α is a term which evaluates to β, then α = β is a theorem of our system. If α is a well-formed formula which evaluates to β, then α ≡ β is a theorem of our system.
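Read constructively, the first meta-theorem is a theorem generator: whenever the evaluator reduces a closed term to a constant, the corresponding equation may be asserted. A sketch using the simple-eval toy from the previous sketch (the atom = here merely stands for the system's equality symbol):

(defun evaluation-theorem (term)
  ;; if TERM evaluates to a constant (qu ...), emit "term = constant"
  (let ((value (simple-eval term)))
    (when (and (consp value) (eql (first value) 'qu))
      (list '= term value))))

;; (evaluation-theorem '((lambda (x) x) (qu a)))
;;   => (= ((LAMBDA (X) X) (QU A)) (QU A))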

Since our meta-theorems are statements about classes of S-expressions, we can write them as expressions in our language. For example, the first half of the above meta-theorem can be written

(Tm(x) ∧ apl(x) = y) ⊃ T(⟨x = y⟩)

where x̄ is to be α and ȳ is to be β. Since rules of inference are meta-theorems, we can write them in the same way. Modus ponens is

(T(⟨x ⊃ y⟩) ∧ T(x)) ⊃ T(y)

Now a rule of inference is a meta-theorem of form α ⊃ T(β), where α is a well-formed formula and β is a term. To apply the rule to a set of theorems {η1, η2, ..., ηm} we substitute constants for all free variables (these constants are the parameter values referred to in Section 1.5 in the introduction), obtaining something of form γ ⊃ T(δ). γ is then evaluated in the normal way, except that whenever something of the form T(ζ) is encountered it is replaced by ⊤ if and only if the evaluation of ζ terminates in something of form (qu, ηi). If this procedure terminates and the result is ⊤, and if the evaluation of δ terminates in something of form (qu, μ), then we say that a result of applying rule α ⊃ T(β) to the antecedent set {η1, η2, ..., ηm} is the theorem μ. If any of the above conditions fail to be met, we say the attempt to apply the rule fails. With this application procedure in mind, we have used the above notation to write, in Table 5, the complete list of the initial rules of inference for our system. (Remember, we have a procedure for adding new rules of inference. This is explained in Section 2.2.1.2.) Note that a rule will be more useful the fewer function expressions it contains which are not complete recursing, since if it contains many function expressions which are not complete recursing function expressions, the evaluation procedures will tend not to terminate or to yield something other than a constant.
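As a concrete instance of this application procedure, modus ponens can be applied to an antecedent set by doing the T(ζ) replacement directly as membership in the theorem set. A sketch (modern Lisp; theorems are S-expressions with implications written (implies x y), which is our rendering, not the system's notation):

(defun modus-ponens (theorems)
  ;; collect every y such that (implies x y) and x are both theorems
  (loop for th in theorems
        when (and (consp th)
                  (eql (first th) 'implies)
                  (member (second th) theorems :test #'equal))
          collect (third th)))

;; (modus-ponens '((implies p q) p)) => (Q)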

2.1.4.3 The Axioms (Table 4). Furnished with the meanings of the defined functions and predicates atom, nonvaratom, ¬, ∧, ∨, ≡, ≠, and Pfstep, as well as the meanings of the primitive symbols of the language, we are ready to examine our set of axioms listed in Table 4. For purposes of our examination we divide the axioms into several groups.

Axioms 1 - 5: This is simply one formulation of the axioms for a finite axiomatization of the pure first order predicate calculus. These are just as they might be in a formal arithmetic.

Axioms 6 - 7: These are the axioms of equality. Again these are just as they might be in formal arithmetic. Axiom 7b is needed for the substitutivity of equivalence discussed in Section 2.2.1.1. The only reason this ≡ counterpart of 7a appears in our system, but not in most formal arithmetics, is that we have predicates with domain of truth values as well as predicates with domain of individuals.

Axioms 8 - 10: These are our analogues of the Peano axioms, where our axioms are based on * rather than on S. Recall that we made this simplification with assurance that we were only decreasing our power by doing so. Had we retained the more powerful axioms based on S, they would have been:

¬(0 = S(x))
S(x) = S(y) ⊃ x = y
(P(0) ∧ ∀(x)(P(x) ⊃ P(S(x)))) ⊃ ∀(x) P(x)

Axioms 11 - 13: We could regard ∅, ∃, and ∃! as abbreviations and not actual symbols of our language. However, we do actually introduce them as symbols, and these axioms may be regarded as their definitions.

Axiom 14: This is the definition of ι. ι cannot be regarded as an abbreviation. We need the actual symbol or we lose power to represent the functions we want represented. Note that we have only formalized an interpretation of ι(x) P(x) when in fact ∃!(x) P(x) holds. We could formalize an interpretation in the other cases by picking an arbitrary interpretation, say nil, and adding the axiom

¬∃!(x) P(x) ⊃ ι(x) P(x) = ∅

Axioms 15, 16, and 17: In Section 2.1.1.10 we removed from our system the ordering of the atoms via indexing. We retained, however, a few axioms (previously theorems) and rules which gave the minimal amount of information about the atoms needed to carry out the proofs we want. These three are the axioms retained.

Axiom 18: Remember that we need certain algorithmic names in our initial set of algorithmic names. This set is the set of algorithmic names which appear in the axioms. Now all algorithmic names which we need in the initial set are in the unabbreviated expression which we abbreviate as Pfstep. Axiom 18 ensures that these algorithmic names are all in the initial set.

2.1.4.4 The Rules of Inference (Table 5). Furnished with the meanings of those defined functions and predicates listed in the charts of Section 2.1.4.1, as well as the meanings of the primitive symbols of the language, we are ready to examine the rules of inference listed in Table 5. These are written in the notation explained in Section 2.1.4.2. In Rules 2, 6, 16, and 17, the predicate freecheck is used to make sure that in no theorem is there, occurring inside a subexpression of form (label, τ, δ), a variable bound by a binder located outside that subexpression. The freecheck in Rule 6 has another purpose too, which we discuss below. Note that Rules 3, 4, 5, 16, and 17 are the only rules which can generate new expressions of form (label, τ, δ). (I.e., they are the only rules that generate new algorithmic names.)

Rule 1: Modus ponens.

Rule 2: Generalization.

Rule 3: Change of bound variable.

Rule 4: Substitution of a simple expression for a variable of the same type. Recall, we substitute simple expressions instead of well-formed

expressions because, by using this procedure in conjunction with Rule 5, we lose no power, and we have a rule we can effectively apply.

Rule 5: Substitution of a function expression for a function variable. This is written in a way that takes advantage of the following property of the function exprtype (defined in Table 3). (This property is not mentioned in the discussion of Section 2.1.4.1.) If T(β) and α ⊲ β hold, then ᾱ will actually be a function expression whenever a(exprtype(α)) names either Pfvb or Ifvb. (The analogous statement for forms does not hold: i.e., simply having T(β) ∧ α ⊲ β hold and having exprtype(α) name Pv or Iv does not ensure that ᾱ is a form.) Thus, for Rule 5 to be successfully applied, the constant (i.e., parameter value) substituted for the free variable z in the rule must name a legitimate function expression.

Rule 6: Application of a λ function to its arguments. Axiomatizations of formal arithmetic which do not have a λ notation frequently have one rule which is, in effect, Rule 5 followed immediately by Rule 6 (so the λ disappears as soon as it is substituted in). Rule 6 and Rule 7 are consistent with our modifications of the LISP evaluation procedure. Examining in detail the statement of Rule 6, suppose η, μ, and ν are the constants (i.e. parameter values) to be substituted for y, u, and v respectively in the application of the rule. If η is not a legal term then neither of the two alternatives in the rule's condition can hold, and thus, for the rule to be successfully applied, μ and ν must be identical expressions. Rule 6 allows us to simplify the expression ((λ, (x), x), y) to y inside a theorem. However, we don't want to simplify maplistcar((λ, (x), x), y) (which we can also write as (maplistcar, (λ, (x), x), y)) to (maplistcar, y) inside a theorem. The result is not even an expression. To prevent this sort of misuse of functional arguments, the condition expression(v) is added to the statement of Rule 6. The condition

(y ⊲ u ∨ (andlista((λ(z) z ⊲ adda(y)), ada(y)) ∧ andlistlista((λ(x, z) z ⊲ x * adda(y)), ada(y), d(y))))

in Rule 6 ensures that no new expressions of form (label, τ, δ) are generated.

Rule 7: Function recursion. This shows the use of label in functions defined recursively. The ad(y) ⊲ add(y) term in the rule assures us that no new label expressions can be generated by this rule.

Rules 8, 9, 10, and 11: Conditional expressions. When combined with the propositional calculus rules, Rule 11 says one can replace α with [⊤ → α]; Rule 9 says that from ¬α ⊃ (... [β1 → γ1; β2 → γ2; ...; βn → γn] ...) and α ⊃ (... δ ...) we can infer (... [α → δ; β1 → γ1; β2 → γ2; ...; βn → γn] ...). The rules also let us go in the reverse direction. In LISP a conditional expression (beginning with a cond) can be, in effect, either a formula-type expression or a term-type expression. In our system, we use the atom pcond in the formula-type expressions and reserve the atom cond for the term-type expressions.

Rule 12: Listbinder notation. This lets us write ∀(ξ1) ∀(ξ2) ... ∀(ξn) α as ∀(ξ1, ξ2, ..., ξn) α. Similarly for ∃.

Rule 13: Definition of qu. See Section 2.1.1.9 for discussion.

Rules 14 and 15: Rules about atoms, preserved along with Axioms 15, 16, and 17 when, in Section 2.1.1.10, we got rid of the ordering on atoms.

Rules 16 and 17: These are the rules which generate new algorithmic names. They implement the procedures suggested in Section 2.1.1.5 for avoiding "contradictory" algorithmic names. We shall discuss these rules in detail in Section 2.2.1.1. Note the following with regard to the specific form of the rules: Suppose ζ and ω are the constants (i.e. parameter values) that we substitute for z and v respectively in applying these rules. Then Rules 16 and 17 require us to replace every occurrence of ζ in ω. This may seem like a restriction, but it is not, since

we can always trivially re-code the ζ's we want to replace and then replace only the re-coded ζ's. This re-coding can be merely a change of a bound variable, accomplished via Rule 3. Note that any function expression inside the generated function expression must have previously been generated.

Rules 18 and 19: These rules are the central feature of our system. They allow us to shift theorems from the object level to the meta level and back again. We shall, in Section 2.2.1.2, discuss how these rules operate and why they don't destroy consistency. We shall constantly add new rules to the system according to a scheme which we shall discuss in Section 2.2.1.2.

2.1.4.5 Proofs and Theorems. Table 10 consists of some sample proofs which utilize the various rules of inference. We shall be referring, from time to time, to lines in these sample proofs which illustrate interesting applications of the various rules. Accompanying almost every line of proof in the table is an English phrase indicating how the line was derived. This phrase usually gives the rule used and the previous lines to which the rule was applied. It may also state the constants (i.e. parameter values) which were substituted for the various free variables in the rule. These are given by writing strings of form ξ = η, where ξ is an individual variable occurring in the statement of the rule and η is the individual constant which is to be substituted for it in this application of the rule. (Such strings are, of course, not definition statements, in spite of their appearance.) For the sake of brevity, many lines have been omitted from the proofs in Table 10, so that Table 10 actually consists of proof outlines rather than complete proofs. In cases where many lines have been left out, an English phrase will indicate the nature of the omitted development. Sometimes we just indicate the key rule or key axiom used in the development. The phrase "by

P.C." indicates that the omitted development employs only those rules which our system shares with the pure first order predicate calculus. The phrase is used rather loosely and really is a signal that the omitted development is trivial, uninteresting, and employs no techniques which are distinctive to our system. In some of the later sample proof outlines, major sections of the proof have been replaced by an English discussion of the principles involved. Use of Greek letters to stand for any one of a class of expressions makes these outlines into proof outline schemata, the proof outline being good no matter which particular expressions are substituted for the Greek letters. It is not intended that the reader necessarily read through all of Table 10. We shall, however, frequently refer to parts of Table 10 which illustrate interesting points. The proof outlines in that table will be a useful source of examples in the coming discussion. Since the lines of these proof outlines are theorems, we shall freely mention them as theorems in the text. We shall frequently mention other theorems in the text without giving an outline of the proof of each one. The methods one would use in constructing proofs of such theorems would not differ substantially from the methods already illustrated in Table 10. In Tables 7 and 8 we similarly mention some theorems without giving outlines of their proofs.

2.1.4.6 Summary. As promised in Section 2.1.1.1, we have given a formal description of all parts of our formal system. Our alphabet and formation conventions were described in Section 2.1.2. The axioms and rules of inference were given in Section 2.1.4. The well-formed expressions were given in Section 2.1.3. The intended interpretation was given in Section 2.1.1, especially Section 2.1.1.9. In Section 2.1.1 we described many of the formal properties of our

system. In doing so we outlined the reason that our system is consistent (it can be derived from formal arithmetic by consistency preserving transformations). Unfortunately, the discussion in Section 2.1.1 preceded the description of our notation (given in Section 2.1.2) and of our axioms and rules of inference (given in Section 2.1.4). Thus the discussion in Section 2.1.1 was necessarily incomplete, since we did not have the notation of the system available for use in the discussion. Our remaining task is to discuss in detail some formal properties of our system which we were not able to discuss until now because we lacked the notation. The first such property is consistency.

2.2 Formal Arguments

2.2.1 Consistency

2.2.1.1 Generation of Function Expressions. As we said in the last section, most of the consistency argument was made in Section 2.1.1. We need only review the parts of that argument which were vague. These are the two parts we will review:

(1) The procedure for generating new function expressions described in Section 2.1.1.5 for a modified formal arithmetic is followed essentially unchanged in our formal system. We shall discuss this in more detail in this section (2.2.1.1), and show in detail why the procedure introduces no inconsistencies.

(2) In Section 2.1.1.10, we introduced Rules 18 and 19, which allow us to shift theorems from the object level to the meta level and back again. We also introduced a procedure for constantly adding more rules of inference. As yet we have done no more than mention this procedure and Rules 18 and 19. The procedure and the rules will be fully specified in Section 2.2.1.2, and the consistency argument will be completed.

Rules 18 and 19, together with the procedure for adding new rules, are the crucial features needed for use in an adaptive theorem prover of the type we are discussing. Discussion of these crucial features is deferred until after the discussion, in Section 2.2.1.1, of the rest of the consistency argument. That is, in Section 2.2.1.1 we shall be discussing the consistency not of our entire system but rather of a limited system which is without Rules 18 and 19 and without the procedure for adding rules, but which is otherwise just like our entire system. Once we have completed the consistency argument for the limited system, we shall show (in Section 2.2.1.2) how Rules 18 and 19 and our procedure for adding rules can be added to the limited system without destroying consistency.

In Section 2.2.1.1, then, we shall be discussing the procedure for generating new function expressions as it operates in the limited system. We can divide function-type expressions into those of form (λ, (ξ1, ξ2, ..., ξn), β), which we shall call λ expressions, and those of form (label, τ, δ), which we shall call label expressions. λ expressions are easily generatable via Rule 6. It is the procedure for generating label expressions that chiefly concerns us here. This procedure was briefly discussed, for a modified formal arithmetic, in Section 2.1.1.5. (The conclusions given there about generatability still hold in our limited system.) The basis of the procedure is the use of Rules 16 and 17. Examples of the use of Rules 16 and 17 may be seen in Section 4.10.5 in several different lines.

What might cause consistency problems in the limited system? Generation of λ expressions causes no consistency problems. It is the generation of label expressions (i.e., function-type expressions of form (label, τ, δ)) that we have to worry about. Note: Any λ expression in which no label expression occurs is easily generatable via Rule 6. If no label expressions were ever generated, we could transform all our theorems by application of all the λ expressions to their arguments (e.g. by repeated application of Rule 6). This eliminates all λ expressions from the theorem. Such a transformation transforms a proof into a proof in a simpler system which uses no λ's. In the simpler system the rule for substitution for a function variable is like the more familiar formulations, such as the one in Church [Church 1956]. This simpler system is familiar to us and is consistent by the standard model theoretic proof. The above argument shows that our limited system is consistent if we promise never to generate a label expression. Now, because of the extensive use of Sf and Snf in our rules, label expressions can be generated in the limited system only by (A) use of Axiom 17 or 18, in which label expressions occur, (B) use of Rule 16 or 17, (C) change of bound variable in a label expression via Rule 3, or (D) substitution for a variable free in a label expression via Rule 4 or 5.

Consider a system which is identical to the limited system except that Axioms 17 and 18 and Rules 16 and 17 have been eliminated. Let us call this the basic system. By the above argument, the basic system is consistent. It is easy to see that (but for the lack of Axiom 17) the basic system is identical to the limited system except that no label expressions may be generated.

We shall prove the consistency of the limited system relative to the basic system. We postulate a sequence of systems W1, W1', W2, W3, W3', W4', W4, where W4 is the limited system and W1 is the basic system (with a minor addition). We shall show that each system in the sequence is consistent if the preceding system is, either by showing that a proof of ∅ in the former can be converted into a proof of ∅ in the latter, or by showing that the two systems have the same theorem set, or by showing that the former is identical to the latter but for an added axiom which cannot ruin consistency. We shall now define the systems W1, W1', W2, W3, W3', W4', and W4 by stating how each one differs from the limited system: (1) which axioms and rules the limited system has that it has not, and (2) which axioms and rules it has that the limited system has not. (All these systems have the same alphabet.) Axioms and rules in category (2) above are selected from the following list:

Axiom 14': ¬∃!(x) P(x) ⊃ ι(x) P(x) = ∅

Rule 5'': (T(y) ∧ T(u) ∧ z ⊲ u ∧ variable(x) ∧ type(x) = exprtype(z) ∧ freecheck(x, z, y)) ⊃ T(Sf(x, z, y))

Rule 4u: (T(y) ∧ variable(x) ∧ x ⊲ y ∧ simplexpr(z) ∧ type(x) = exprtype(z) ∧ freecheck(x, z, y)) ⊃ T(Sf(x, z, y))

Rule 5u: (T(y) ∧ T(u) ∧ z ⊲ u ∧ (Pfv(x) ∨ Ifv(x)) ∧ x ⊲ y ∧ type(x) = exprtype(z) ∧ freecheck(x, z, y)) ⊃ T(Sf(x, z, y))

Rule 5l: (T(y) ∧ T(u) ∧ z ⊲ u ∧ variable(x) ∧ x ⊲ y ∧ type(x) = exprtype(z) ∧ freecheck(x, z, y)) ⊃ T(Sf(x, z, y))

Rule 20: (This one we shall state in English.) If β results from α by substitution of μ for ν at zero or more places (not necessarily all occurrences of ν in α), and if none of these substitutions occurred within a function expression, and if β is an expression, and if α is a theorem, and if either μ = ν or μ ≡ ν is a theorem, then β is a theorem.

System   Differs from limited system in
         Deletion of                     Addition of
         Axioms      Rules               Axioms      Rules
W1       17, 18      16, 17              14'         none
W1'      17, 18      16, 17              14'         20
W2       none        16, 17              14'         20
W3       none        4, 5                14'         4u, 5u, 20
W3'      none        4, 5                14'         4u, 5l, 20
W4'      none        5                   14'         5'', 20
W4       none        none                none        none

We shall now show the consistency of each of these systems by the methods we have mentioned.

Consistency of W1: Except for the addition of Axiom 14', the system W1 is identical to the basic system, which is consistent by the standard model theoretic proof. The introduction of Axiom 14' does not destroy this model theoretic proof, for, if ι(x) Π(x) is interpreted to mean the unique x such that Π(x) holds if such a unique x exists (this interpretation makes Axiom 14 true and says nothing about Axiom 14'), and if ι(x) Π(x) is interpreted to mean nil if such a unique x does not exist (this interpretation makes Axiom 14' true and

says nothing about Axiom 14), then the interpretation is still consistent with the other axioms and the rules of inference.

Consistency of W1': W1' differs from W1 only in the addition of Rule 20. We can show that the theorem set of W1' is identical to the theorem set of W1 by showing that Rule 20 may be proved as a meta-theorem for the system W1. Rule 20 is our substitutivity of equivalence rule, and it can be proved for W1 by the standard induction technique. For example, it can be proved by a method analogous to the method of Church's proof of his Corollary *342 [page 190, Church 1956]. We shall just mention a minor way in which our proof must differ from Church's. Church proves Corollary *342 from Theorem *340. Theorem *340 he proves by induction on the size of the formula. We do something similar. But because of the way W1 uses ι and cond, well-formed formulae may occur inside terms inside theorems. Hence, where Church uses ≡ in the statement of his Theorem *340, we must allow for either ≡ or = in order for the induction to work. Axiom 14' is needed to carry the induction through for the case where Church's A begins with ι.

Consistency of W2: W2 differs from W1' only in the addition of Axioms 17 and 18. The effect of Axiom 18 is to introduce certain label expressions into the language, namely those which we wish to be in our initial set, and those identical to initial set members but for a change in bound variable. Addition of any one of the label expressions in the initial set is very similar to addition of a new function constant. To see that the addition of the label expression causes no inconsistencies, let us consider the analogous problem of adding any new function constant symbol to the system.

One could modify the system W1' by adding function constants in the way we shall describe. Suppose one wishes to add a symbol φ to the system, and suppose one wishes it to be a function constant of type ρ, naming a certain function. The addition can be accomplished in two steps. First, one adds φ to the alphabet and changes the expressions Pfatom, Ifatom, and exprtype as they appear in the rules of inference so that they regard φ as an atom of type ρ. (Specifically, change exprtype by the addition of x = φ → ρ at the beginning of the main conditional. Also, if φ is a predicate constant, add x = φ to the disjunction in Pfatom; if φ is an individual function constant, add x = φ to the disjunction in Ifatom.) (Among other things, these changes allow the substitution of φ for the proper function variables.) Second, one adds a single axiom to characterize φ. It must be a formula of form φ(ξ1, ξ2, ..., ξn) ≡ α if φ is a predicate function constant, or of form φ(ξ1, ξ2, ..., ξn) = α if φ is an individual function constant, where the ξi's are variables, and they are the only variables free in α. We shall always insist that φ occur somewhere in α. For example, if one wants φ to stand for the LISP function maplistcar, one adds the axiom

φ(f, x) = [atom(x) → x; ⊤ → f(a(x)) * φ(f, d(x))]

If one makes sure that the axiom holds when φ is given the desired interpretation, then no inconsistencies are introduced. If it also holds when φ has other interpretations, no problems arise; but if it does not hold, whatever the interpretation of φ, then an inconsistency might be introduced. Thus, if instead of the above axiom one adds the axiom φ(f, x) = φ(f, x), no inconsistencies are introduced. But if one adds φ(f, x) = φ(f, x) * ∅, an inconsistent system results.
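Under the intended interpretation the characterizing axiom above is precisely the familiar recursive program; a and d are car and cdr, and * is cons. Rendered as a modern Lisp definition (an illustration of the axiom, not part of the system):

(defun maplistcar (f x)
  ;; phi(f, x) = [atom(x) -> x; T -> f(a(x)) * phi(f, d(x))]
  (if (atom x)
      x
      (cons (funcall f (car x))
            (maplistcar f (cdr x)))))

;; (maplistcar #'1+ '(1 2 3)) => (2 3 4)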

By virtue of Rule 20, the addition of the axiom φ(ξ1, ξ2, ..., ξn) = α or φ(ξ1, ξ2, ..., ξn) ≡ α allows one to substitute α' for φ(η1, η2, ..., ηn), where α' is derived from α by simultaneously substituting η1, η2, ..., ηn for ξ1, ξ2, ..., ξn (as long as the substitution is not inside a function expression). And Rule 6 allows one to substitute ((λ(ξ1, ξ2, ..., ξn) α), η1, η2, ..., ηn) for α' (in the same situations). Thus one can directly substitute (λ(ξ1, ξ2, ..., ξn) α) for φ any place φ occurs (outside function expressions) other than as a functional argument. We can regard φ and (λ(ξ1, ξ2, ..., ξn) α) as alternative and interchangeable names for the same function. Suppose we were to allow interchangeability of φ and (λ(ξ1, ξ2, ..., ξn) α) everywhere outside function expressions, even as functional arguments. This would introduce no inconsistencies, since it would only mean, in effect, that we have two names for the same function which, as we have seen above, are already interchangeable whenever it matters, i.e., whenever they are applied to arguments. We can achieve this effect by using, instead of the atom φ, the expression (label, τ, (λ(ξ1, ξ2, ..., ξn) α'')) everywhere, where τ is a variable of the same type as φ not occurring in α, and α'' can be derived from α by substitution of τ for φ. Let us call this label expression φ'. It is the analogue of the atom φ. φ' has no free variables and behaves just like φ did, in that it can be substituted for function variables by virtue of Rule 5 and the characterizing axiom φ'(ξ1, ξ2, ..., ξn) = α''' or φ'(ξ1, ξ2, ..., ξn) ≡ α''' (where α''' can be derived from α by substitution of φ' for φ). Furthermore, we can substitute (λ(ξ1, ξ2, ..., ξn) α''') for φ' anywhere outside of a function expression by virtue of Rule 7. Thus φ' behaves just as φ did (i.e., any proof using φ' can be converted into a similar proof using φ simply by replacing all occurrences of (λ(ξ1, ξ2, ..., ξn) α''') and φ' in the proof by φ), with the one addition we wanted and the further addition

that, by virtue of Rule 3, we can substitute for bound variables in φ'. Let us say two expressions are almost identical if they are identical or if they differ only by changes of bound variables. Any theorem δ in the system with φ has a counterpart δ' in the system with φ', where δ' is derived from δ by replacing all occurrences of expressions almost identical to (λ(ξ1, ξ2, ..., ξn) α) with φ' and then replacing all remaining occurrences of φ with φ'. Thus, we have introduced no inconsistency and have added, in effect, a function constant without adding to the alphabet. We can introduce any label expression in this way by introducing its characteristic axiom as a new axiom, so long as the characteristic axiom holds under the intended interpretation for the label expression. The result will be a new system different from W1' but still consistent.

Consider the label expressions whose abbreviations are given in Tables 3 and 6. It is easy to see that the characteristic axiom of each of these label expressions holds if the label expression is interpreted as the function defined by the LISP program that it resembles. Thus, the characteristic axioms of these label expressions may be added to the axiom set without destroying consistency. Actually, since the characteristic axioms are needed only by Rule 5, and not by Rule 7, any axiom containing the label expression will do as well as the characteristic axiom, so long as the label expression occurs in it and so long as it is true under the intended interpretation. We have added Axiom 18 to serve for all the label expressions in Tables 3 and 6.

The characteristic axioms now appear as theorems. E.g., consider a φ′ from these tables which is an individual function expression:

(1) x = x                                                             axiom
(2) f(ξ₁, ξ₂, ..., ξₙ) = f(ξ₁, ξ₂, ..., ξₙ)                           from (1) by Rule 4
(3) … ⊃ Pfstep(x)                                                     axiom
(4) φ′(ξ₁, ξ₂, ..., ξₙ) = φ′(ξ₁, ξ₂, ..., ξₙ)                         from (2) and (3) by Rule 5
(5) φ′(ξ₁, ξ₂, ..., ξₙ) = ((λ(ξ₁, ξ₂, ..., ξₙ) α‴)(ξ₁, ξ₂, ..., ξₙ))  from (4) by Rule 7
(6) φ′(ξ₁, ξ₂, ..., ξₙ) = α‴                                          from (5) by Rule 6

With axiom 18 added, the label expressions abbreviated in Tables 3 and 6 are in the initial set, and only expressions almost identical to them are generatable. Since axiom 18 holds under our intended interpretation, the addition of this axiom can add no inconsistencies. Note that none of the label expressions appearing in theorems of W2 have any free variables.

Consistency of W3

The system W3 is just like the limited system except for the addition of Rule 20 and the changing of Rules 4 and 5 so as not to allow substitution for variables free in label expressions. We shall show consistency of W3 relative to W2 by showing that a proof of θ in W3 may be transformed into a proof of θ in W2. Suppose α₁, α₂, ..., αₙ is a proof of θ in W3. We shall transform the proof successively. After each transformation we will still have a proof of θ in W3. After the last transformation we will have a proof of θ in W2. The first transformation is accomplished as follows: First, divide into equivalence classes the set of all label expressions appearing in α₁, α₂, ..., αₙ by putting two label expressions in the same class if and only if they are almost identical (i.e., identical but for

changes in bound variables). Let Z be the set of all such equivalence classes which don't contain a member of the initial set (i.e., don't contain a label expression occurring in axiom 18). If Z is empty, then our proof can be trivially converted into a proof of θ in W2 by replacing any line derived by Rule 16 or 17 (such a line contains only label expressions almost identical to those in axiom 18) by a derivation of that line from axiom 18 by P.C. and Rules 5 and 3. Otherwise, to each member Π of Z we attach an integer called the rating of Π. The rating of Π is defined to be the smallest integer I such that a label expression in Π occurs in αI. (No two members of Z have the same rating, because each use of Rule 16 or 17 generates only one new label expression, because of the } in the statement of the rule.) Let Π be the member of Z which has the largest rating. Then no member of Π occurs in our proof as a proper subexpression of another label expression. Now consider the line αI, where I is the rating of Π. αI must have been derived via Rule 16 or 17 from some line, say αⱼ. αⱼ must be of form ψ(ξ₁, ξ₂, ..., ξₘ) ≡ β′ or ψ(ξ₁, ξ₂, ..., ξₘ) = β′. Let η₁, η₂, ..., ηₖ be the variables free in Π whose free occurrences in αⱼ are all in subexpressions belonging to Π. Let ζ₁, ζ₂, ..., ζₖ be a sequence of variables occurring nowhere in α₁, α₂, ..., αₙ, with ζᵢ and ηᵢ of the same type for all i. Let αⱼ′ be the result of simultaneous substitution of ζ₁, ζ₂, ..., ζₖ for the free occurrences of η₁, η₂, ..., ηₖ in αⱼ. For any i, let αᵢ′ be the result of substituting ζ₁, ζ₂, ..., ζₖ for all occurrences of η₁, η₂, ..., ηₖ in αᵢ. Then α₁′, α₂′, ..., αⱼ′ can be easily converted into a proof of αⱼ′ in W3 by prefixing certain lines. For each αᵢ′ (1 ≤ i ≤ j) which is an axiom, we prefix lines deriving αᵢ′ from αᵢ via Rules 3 and 4.

If δ₁, δ₂, ..., δₕ are the lines we prefix to do this, then δ₁, δ₂, ..., δₕ, α₁′, α₂′, ..., αⱼ′, ε₁, ε₂, ..., εₚ, αⱼ′ is a proof of αⱼ′, where each εᵢ is derived from εᵢ₋₁ (and ε₁ from the first αⱼ′, and the final αⱼ′ from εₚ) by Rule 3. We insert this proof into α₁, α₂, ..., αₙ immediately after the line αⱼ. Note that this addition does not increase the number of classes of label expressions appearing in the proof. Now αⱼ′ is of form φ′(ξ₁, ξ₂, ..., ξₘ) ≡ β″ or φ′(ξ₁, ξ₂, ..., ξₘ) = β″, where φ′ is a label expression. Also, all members of Π occur in the proof only after αⱼ′, i.e., in lines αⱼ₊₁, αⱼ₊₂, ..., αₙ.

Now, a member of Π is of form (label, τ, γ). If γ′ is the result of replacing all free occurrences of τ in γ with (label, τ, γ), then we call γ′ the expanded form of (label, τ, γ). Two label expressions with the same expanded form must be the same label expression, for otherwise they would each occur as a proper subexpression of the other. Let Π′ be the set of expanded forms of members of Π, and let φ″ be the expanded form of φ′. Let αᵢ″, for j+1 ≤ i ≤ n, be the formula formed first by replacing all occurrences of members of Π′ in αᵢ with φ″, and then replacing all remaining occurrences of members of Π with φ′. Note that in the replacement no new variables become bound, since those free variables of φ′ and φ″ which do not occur free in the member of Π or Π′ being replaced (namely the ζᵢ's) occur nowhere in αᵢ. Now we can easily insert lines in the sequence α₁, α₂, ..., αⱼ, δ₁, δ₂, ..., δₕ, α₁′, α₂′, ..., αⱼ′, ε₁, ε₂, ..., εₚ, αⱼ′, αⱼ₊₁″, αⱼ₊₂″, ..., αₙ″ to make it a proof in W3 in which no member of Π appears. It is clearly a proof up to the αⱼ′. Suppose αᵢ″ is some line after αⱼ′.

If αᵢ was an axiom or was derived via any rule but 6, 16, or 17, we add no lines. In such a case, either αᵢ″ is αᵢ, or αᵢ was derived from αK, or from αK and αL, via Rule 1, 2, 3, 4u, 5u, 7, 12, or 20. Call the rule used ρ. In such a case we must have K > j or (in the case of Rule 1, 5u, or 20) K > j and L > j. Then αᵢ″ is derivable directly from αK″, or from αK″ and αL″, via Rule ρ. (Note, if we were using Rules 4 and 5 instead of 4u and 5u, this would fail.)

If αᵢ was derived from αK via Rule 6, then we must examine the value of the parameter γ in the Rule 6 application. It must be of form μ(η₁, η₂, ..., ηᵣ).

(A) If μ is not a member of Π′, then we add nothing, and αᵢ″ is αᵢ, or else K > j and αᵢ″ is derivable from αK″ via Rule 6. For any member of Π or Π′ occurring inside μ can contain no dummy variable free (because in W3, as in the limited system, and as in our whole system, no variable in a theorem occurs bound inside a label expression where the binder binding it is outside the label expression; see Section 2.1.4.4), and hence the members of Π and Π′ remain unchanged when the ηᵢ's are substituted for the dummy variables. Thus the Rule 6 application still works when φ″ and φ′ are substituted for the members of Π′ and Π.

(B) If μ is a member of Π′, then we intercalate lines deriving, from αⱼ′, a theorem αᵢ‴ of form φ′(η₁, η₂, ..., ηᵣ) ≡ β‴ or φ′(η₁, η₂, ..., ηᵣ) = β‴, by substitution of the ηᵢ's for the ξᵢ's in αⱼ′. Then αᵢ″ can be derived from αK″ and αᵢ‴ via Rule 20.

If αᵢ was derived from αK via Rule 16, then αᵢ is of form ω ⊃ ψ(ξ₁, ξ₂, ..., ξₙ). If ψ is not in Π, then neither αᵢ nor αK contains an occurrence of a member of Π, and αᵢ″ is αᵢ and is still derivable via Rule 16 just as before. Hence we need add nothing. If ψ is in Π, then αᵢ″ is ω″ ⊃ φ′(ξ₁, ξ₂, ..., ξₙ), whose proof, via Rules 4 and 5 from αⱼ′ and αK″, we can easily intercalate. If αᵢ was derived from αK via Rule 17, we proceed as in the Rule 16 case.

After all the required intercalations, we end up with a proof of θ in W3 which is much like our original proof. But now all members of Π have been eliminated, and yet we have added no new equivalence classes of label expressions (no new members of Z). The set Z for this new proof has one fewer member than the set Z for the old proof. We can now transform the new proof in the same way we transformed the old proof, giving us a still smaller Z. We can repeat this until we get an empty Z, and the proof thus arrived at can be trivially converted (as we said above) into a proof of θ in W2. Hence if θ is provable in W3, it is provable in W2. Then, since W2 is consistent, W3 is consistent.

Consistency of W3′

The system W3′ is identical to W3 except for the change of Rule 5u to 5′u. At first sight it might appear that this change adds some power to the rules, but it is clear upon reflection that any proof in W3′ can easily be converted into a proof of the same theorem in W3, simply by replacing each step which uses Rule 5′u by the proper sequence of steps using Rules 4u and 5u. Hence the theorem sets of W3 and W3′ are identical, and W3′ is consistent because W3 is consistent.

Consistency of W4′

W4′ differs from W3′ in the substitution of Rules 4 and 5′ for 4u and 5′u. We shall show the consistency of W4′ by showing that W4′ and W3′ have the same theorem set. We shall do this by showing that a proof of α in W4′ can be transformed into a proof of α in W3′. We shall, then, be considering transformations of proofs in W4′. We shall begin by considering transformations of a special kind of W4′ proof, which we shall call a single substitution proof. A single substitution proof is a proof in W4′ which becomes a proof in W3′ upon deletion of its last line. We shall first show that we can transform any single substitution proof into a W3′ proof which contains all the lines of the single substitution proof. The transformation will be in two steps, the first step being what we shall call preprocessing.

To see what preprocessing is, we need to discuss some transformations of proofs in W3′. If α₁, α₂, ..., αₙ is a proof in W3′, and η and ζ are variables of the same type, ζ not occurring anywhere in the proof, and if αᵢ′ is the result of replacing η by ζ in all bound occurrences in αᵢ, then there is a proof β₁, β₂, ..., βₘ in W3′ in which each of the αᵢ′ appears as a line and in which η never appears bound except perhaps in formulae which are almost identical to axioms (i.e., identical but for change of bound variables). We construct the proof β₁, β₂, ..., βₘ by intercalating lines in the pseudo-proof α₁′, α₂′, ..., αₙ′. Now α₁′, α₂′, ..., αₙ′ is almost a proof as it stands. Certain of the steps, however, are "illegal." I.e., they would be legitimate steps in W3′ if it weren't for an unusual change of bound variables. We shall show how to add lines to successively reduce the number of "illegal" steps.

Let αⱼ′ be the consequent of one of the "illegal" steps. It is not hard to convince ourselves that αⱼ must either be an axiom or be the result of an application of Rule 2, 16, or 17. If αⱼ is an axiom, then we insert before αⱼ′ in the pseudo-proof steps deriving αⱼ′ from αⱼ by successive applications of Rule 3. If αⱼ is the result of an application of Rule 2, 16, or 17, then we modify α₁, α₂, ..., αⱼ by changing all η into ζ (and then adding lines so that the axioms with ζ substituted for η are legitimately derived from the real axioms via Rules 3 and 4). The resulting sequence is inserted into the pseudo-proof before αⱼ′. This puts, before αⱼ′, a theorem identical to αⱼ′ except that free η's are replaced by ζ's. This theorem will be identical to αⱼ′ if αⱼ was originally generated via Rule 2. Otherwise αⱼ′ may be legitimately derived from this theorem via Rule 4, since none of the free η's in αⱼ′ are inside a label expression, αⱼ′ being in the special form taken by the results of Rules 16 and 17. This eliminates one "illegal" step. We proceed in this way to eliminate each "illegal" step. This finally gives us a proof in W3′ in which η does not appear bound, and in which each αᵢ′ is present.

Suppose α₁, α₂, ..., αₙ is a proof in W3′; we can convert this proof with respect to an expression β by successively transforming the proof as above, such that for each variable free in β, this variable does not appear bound in the converted proof, and such that for each αᵢ there is a line in the converted proof almost identical to αᵢ (i.e., identical to αᵢ but for a possible change of bound variables).

We are now ready to discuss single substitution proofs. A proof ν₁, ν₂, ..., νₖ in W4′ is called a single substitution proof if ν₁, ν₂, ..., νₖ₋₁ is a proof in W3′. If the single substitution proof is not itself a proof in W3′, then νₖ must have been derived by substitution of an expression β for a free variable ξ via Rule 4 or 5′.

We shall first show we can transform all such single substitution proofs into proofs in W3′. We do this in two steps. The first step is preprocessing, which is accomplished as follows: We first convert ν₁, ν₂, ..., νₖ₋₁ with respect to β, giving α₁, α₂, ..., αₙ₋₁. Then there must be a formula αₙ almost identical to νₖ such that α₁, α₂, ..., αₙ is a single substitution proof, and such that αₙ is derived by substitution of β′ for ξ, where β′ is identical to β except for certain changes in bound variables. Pick one such αₙ. Then α₁, α₂, ..., αₙ is a preprocessed form of ν₁, ν₂, ..., νₖ. α₁, α₂, ..., αₙ is a single substitution proof of a special kind, for not only is its last step derived by substitution of an expression β′ for a free variable ξ (as is the case for all single substitution proofs), but variables free in β′ occur bound nowhere in α₁, α₂, ..., αₙ. Such a single substitution proof is called a preprocessed single substitution proof. (A single substitution proof which is already a proof in W3′ is also called a preprocessed single substitution proof.)

If αₙ in such a preprocessed single substitution proof is derived by Rule 4 or 5′ from line αᵢ by substitution for a variable free in a label expression in αᵢ, then we say the rating of the proof is i. Otherwise we say the rating is 1. (Note: the first line of a proof, being an axiom, contains no label expressions with free variables.)

We shall now show we can transform every preprocessed single substitution proof α₁, α₂, ..., αₙ into a proof in W3′ in which each αᵢ (for 1 ≤ i ≤ n) occurs as a line. We shall do this inductively. Proofs of rating 1 can be transformed trivially. Suppose we know how to transform all preprocessed single substitution proofs of rating less than j. We shall show how to transform an arbitrary single substitution proof of rating j.

Suppose α₁, α₂, ..., αₙ is such an arbitrary proof. Since the proof has rating j, we know αₙ is derived via Rule 4 or 5′ by substitution of β for ξ in the line αⱼ, where ξ occurs free in αⱼ inside a label expression. Hence αⱼ is not an axiom, nor is it a theorem which is almost identical to an axiom, since such formulae have no variables free inside label expressions. We also know that αⱼ

wasn't derived by Rule 8, 9, 10, 11, 13, 14, or 15, since theorems from these rules have no labels.

Suppose αⱼ was derived from αₖ (k < j) by Rule ρ, where Rule ρ is either Rule 2, 3, 6, 7, 12, 16, or 17. In that case let γ be the result of substituting β for ξ in αₖ (this substitution is always legal, because variables free in β don't occur bound in αₖ unless αₖ is almost identical to an axiom, and we said this wasn't the case). Consider the proof α₁, α₂, ..., αₙ₋₁, γ. This is a preprocessed single substitution proof of rating k, and by the induction assumption it can be converted into a proof β₁, β₂, ..., βₘ (m > n) in W3′ in which γ and each αᵢ (for 1 ≤ i ≤ n−1) occurs as a line. Now αₙ can be derived from γ via Rule ρ, where the parameter values are modified by substitution of β for ξ where appropriate. (Some detailed checking is needed to verify this; we shall not reproduce the details here.) Also, γ occurs in β₁, β₂, ..., βₘ, so β₁, β₂, ..., βₘ, αₙ is our desired converted proof of α₁, α₂, ..., αₙ.

We can make a similar argument if αⱼ was derived from αₖ and αₖ′ by Rule 1 or 20. Let γ and γ′ be the results of substituting β for ξ in αₖ and αₖ′ respectively. Then by our induction assumption we can convert α₁, α₂, ..., αₙ₋₁, γ and α₁, α₂, ..., αₙ₋₁, γ′ into proofs β₁, β₂, ..., βₘ and β′₁, β′₂, ..., β′ₘ′ in W3′. Then β₁, β₂, ..., βₘ, β′₁, β′₂, ..., β′ₘ′, αₙ is our desired converted proof of α₁, α₂, ..., αₙ; αₙ is derived from γ and γ′ by Rule 1 or 20, as is appropriate.

If αⱼ was derived from αₖ and αₖ′ via Rule 5′, substituting β″ for η in αₖ, then if η is different from ξ we can handle this situation just like the Rule 1 case. If η and ξ are identical, our final proof is β₁, β₂, ..., βₘ, αₙ, where αₙ is derived from αₖ′ and γ by Rule 5′.

If αⱼ was derived from αₖ via Rule 4, substituting β″ for η in αₖ, then we first construct a few lines ε₁, ε₂, ..., ε_z of a W3′ proof which contains no labels, with ε_z being a theorem containing β″. (Since β″ is a simple expression, this is trivial.) Let γ and γ′ be the results of replacing free ξ with β′ in αₖ and ε_z respectively. By the induction assumption we arrive at a proof β₁, β₂, ..., βₘ in W3′ containing γ, as before. Then β₁, β₂, ..., βₘ, ε₁, ε₂, ..., ε_z, γ′, αₙ is our converted proof, with αₙ derivable from γ and γ′ by Rule 5′, and γ′ derivable from ε_z by substitution of β′ for free ξ.

Thus we have shown inductively that we can convert every preprocessed single substitution proof into a proof in W3′. Suppose ν₁, ν₂, ..., νₖ is any single substitution proof, with νₖ resulting by substitution of β for ξ. We preprocess ν₁, ν₂, ..., νₖ to get a preprocessed single substitution proof α₁, α₂, ..., αₙ, where αₙ is almost identical to νₖ. Convert α₁, α₂, ..., αₙ into a proof β₁, β₂, ..., βₘ in W3′ in the manner we have shown. Let αₙ, ε₁, ε₂, ..., εₕ, νₖ be a derivation of νₖ from αₙ by successive changes of bound variables. Since αₙ is among β₁, β₂, ..., βₘ, the following is a proof in W3′: ν₁, ν₂, ..., νₖ₋₁, β₁, β₂, ..., βₘ, ε₁, ε₂, ..., εₕ, νₖ. Thus we have shown that we can convert all single substitution proofs into proofs in W3′ containing all the lines of the single substitution proof.

This means we can convert any proof in W4′ into a proof in W3′ as follows: Suppose α₁, α₂, ..., αₙ is a proof in W4′, and suppose ℓ of the lines are derived by substitution for a variable free inside a label expression. Suppose αⱼ is the first of these undesirable lines. Then α₁, α₂, ..., αⱼ is a single substitution proof, which we can convert into a proof β₁, β₂, ..., βₘ in W3′. Then β₁, β₂, ..., βₘ, αⱼ₊₁, αⱼ₊₂, ..., αₙ is a proof in W4′ with only ℓ−1 undesirable lines. Proceeding in this way, we eliminate all undesirable lines, giving us a proof in W3′. Since all of the original αᵢ's appear in the converted proof, anything provable in W4′ is provable in W3′; W4′ and W3′ have the same theorem set. Therefore, W4′ is consistent since W3′ is.

Consistency of W4

W4 differs from W4′ in the elimination of axiom 14′ and Rule 20 and the substitution of Rule 5 for Rule 5′. Now Rule 5 is just a special case

of Rule 5′. Thus any proof in W4 is a proof in W4′. Hence if θ is provable in W4, it is provable in W4′. By the consistency of W4′, then, W4 is consistent. Since W4 is the limited system, we have shown the consistency of the limited system, as promised.

2.2.1.2 Preservation of Consistency While Making Identifications Between Object and Meta Levels. (Completion of Consistency Argument.)

We have shown the consistency of the limited system; we must now show the consistency of our entire system. Our entire system differs from the limited system in the addition of Rules 18 and 19 and the addition of a procedure for adding still more rules. We shall first show that the addition of Rules 18 and 19 to the limited system will not destroy consistency, and indeed will not even add any new theorems to the system's theorem set.

As we said in Section 2.1.4.2, Rules 1-17 may be written as formulae in the object language. (It is not hard to show that T is a generatable function expression.) In fact, so written, they happen to be theorems of the limited system. (This dual nature of rule-expressions is basic to our system: in the object language these expressions are theorems; in the meta language they are rules of inference. The exception, Rule 19, is discussed below.) A completely equivalent form for Rules 1-17 is given in Table 7. A glance at the Table 7 formulation of Rules 1-17 and at the definition of T shows us clearly that the following two meta-theorems hold for the limited system.

(1) For any S-expression α, if α is a theorem then T(⌜α⌝) is a theorem.

(2) For any S-expression α, if T(⌜α⌝) is a theorem then α is a theorem.

Thus the addition of Rules 18 and 19 (which merely state these meta-theorems) to the limited system as rules of inference adds no new theorems and so can't destroy consistency.

Are Rules 18 and 19 also theorems of the limited system, the way Rules 1-17 are? We have a recursive procedure to convert, for any α, any proof of α into a proof of T(⌜α⌝). Thus it is not hard to prove Rule 18

as a theorem. There is, however, no recursive procedure which will take us, for any α, from a proof of T(⌜α⌝) to a proof of α. It turns out that Rule 19, as a theorem, is not provable, and it is the only rule that is not so provable. We could increase the set of axioms by adding Rule 19 as an axiom. This would not destroy consistency; but since the T in the new axiom refers to theoremhood in the old system, and not theoremhood in the new system with the axiom added, we just have the same thing to do over again, this time with a T₁ which means theoremhood in the new system, etc., etc. Can we rewrite Rule 19, replacing T with a function which means provable in the system to which this re-written Rule 19 has been added, thus solving the problem once and for all? Yes, it is possible so to re-write Rule 19; but whether the addition of this re-written rule as an axiom destroys consistency, I do not know.

Now we have shown that the limited system with Rules 18 and 19 added is consistent. Consider the class of theorems in this system of the form α ⊃ T(β). Any theorem in this class may be regarded, as described in Section 2.1.4.2, as the description of a possible rule of inference. Any such rule of inference could be added to our system without destroying consistency, without even adding anything to the set of theorems of the system.

To see why this is so, it is only necessary to see how a proof step using this new rule could be replaced by steps using the old rules. Consider a proof step in which the new rule α ⊃ T(β) is used to derive theorem ω from theorems γ₁, γ₂, ..., γₙ, with certain parameter values being used for the free variables ξ₁, ξ₂, ..., ξₘ. (See Section 2.1.4.2 for a description of how the new rule is applied and for an explanation of our terminology.) We shall show how to derive ω without the new rule.

By Rule 18 we can derive theorems T(⌜γ₁⌝), T(⌜γ₂⌝), ..., T(⌜γₙ⌝). By hypothesis, α ⊃ T(β) is an already proved theorem. We can use Rule 4 to substitute the parameter values for the free variables in this theorem, giving a new theorem ᾱ ⊃ T(δ). We shall make use of the following meta-theorem.

(A) (F(x) ∧ oneapl(x) = y) ⊃ T(x ⌜≡⌝ y)

(The proof is by induction on the Gödel number of the expression named by x.) oneapl is that function from which apl is built by iteration. Its definition is in Table 9, and it is generatable. The above meta-theorem is the meta-theorem from which one would prove the meta-theorem

(B) (F(x) ∧ apl(x) = y) ⊃ T(x ⌜≡⌝ y)

which we stated in English in Section 2.1.4.2.

Let oneapl′ be defined exactly as oneapl is, except that oneapl′(⌜T(⌜γᵢ⌝)⌝) names ⌜⊤⌝ for all γᵢ. I.e., the definition is:

oneapl′(x) =: [x = ⌜T(⌜γ₁⌝)⌝ ∨ x = ⌜T(⌜γ₂⌝)⌝ ∨ ... ∨ x = ⌜T(⌜γₙ⌝)⌝ → ⌜⊤⌝;
    ~mol(x) → x;
    a(x) = ⌜qu⌝ → oneapl(ad(x));
    a(x) = ⌜cond⌝ ∨ a(x) = ⌜pcond⌝ → [atom(d(x)) → x;
        aad(x) = ⌜⊤⌝ → adad(x);
        aad(x) = ⌜⊥⌝ → a(x) * dd(x);
        ⊤ → a(x) * ((oneapl′(aad(x)), adad(x)) * dd(x))];
    a(x) = ⌜⊃⌝ ∨ a(x) = ⌜=⌝ ∨ a(x) = ⌜*⌝ → apltwoatoms(a(x), oneapl′(ad(x)), oneapl′(add(x)));
    a(x) = ⌜a⌝ ∨ a(x) = ⌜d⌝ ∨ Pfatom(a(x)) ∨ Ifatom(a(x)) → aploneatom(a(x), oneapl′(ad(x)));
    atom(a(x)) → x;
    aa(x) = ⌜λ⌝ → Sffixt(ada(x), d(x), adda(x));
    aa(x) = ⌜label⌝ → Sffix(ada(x), a(x), adda(x) * d(x));
    ⊤ → x]

Let Z be the class of well-formed expressions defined by the statement that a well-formed expression μ is in Z if and only if one of the following holds:

1. μ is of form T(⌜γᵢ⌝) for one of the γᵢ;
2. μ is of form (cond, (ρ₁, σ₁), (ρ₂, σ₂), ..., (ρₖ, σₖ)) or of form (pcond, (ρ₁, σ₁), (ρ₂, σ₂), ..., (ρₖ, σₖ)), where ρ₁ is in Z;
3. μ is of form (⊃, ρ, σ) or of form (=, ρ, σ) or of form (*, ρ, σ), where either ρ or σ is in Z;
4. μ is of form (a, σ) or of form (d, σ), where σ is in Z;
5. μ is of form (φ, σ), where φ is a function symbol of our alphabet and σ is in Z.

Note that each μ in Z contains certain occurrences of T(⌜γᵢ⌝)'s such that if these occurrences are replaced by ⊤, the resulting expression, which we shall designate as μ̄, is not in Z, and such that replacement of any

fewer occurrences of T(⌜γᵢ⌝)'s results in an expression still in Z. Note also that if μ is a well-formed formula in Z, then μ ≡ μ̄ is a theorem. This is because the occurrences replaced are all outside function expressions and not within the scope of any quantifiers, and also because the T(⌜γᵢ⌝)'s are theorems. Now we can prove the meta-theorem

(C) (F(x) ∧ oneapl′(x) = y) ⊃ T(x ⌜≡⌝ y)

from meta-theorem (A). We do this as follows. Suppose μ is the well-formed formula named by x in the statement of meta-theorem (C). Let ν be the expression such that ⌜ν⌝ = oneapl′(⌜μ⌝) holds. Then meta-theorem (C) claims μ ≡ ν is provable. Can we show this? If oneapl(⌜μ⌝) = ⌜ν⌝ holds, we get μ ≡ ν by meta-theorem (A). If oneapl(⌜μ⌝) = ⌜ν⌝ does not hold, then μ is in Z, in which case either ⌜ν⌝ = ⌜μ̄⌝ holds or oneapl(⌜μ̄⌝) = ⌜ν⌝ holds. In either of these latter cases we can prove μ ≡ ν from meta-theorem (A) and μ ≡ μ̄. Thus we have proved meta-theorem (C).

Now return to the theorem ᾱ ⊃ T(δ) which we derived above. We know that repeated application of oneapl′ to ⌜ᾱ⌝ eventually yields ⌜⊤⌝. (This is because repeated application of oneapl′ is exactly what we did when we applied the new rule (see the procedure for rule application outlined in Section 2.1.4.2), and the application of the rule was successful.) Hence by meta-theorem (C) we can prove ᾱ ≡ ⊤, and hence we can prove T(δ). Now apl(δ) = ⌜ω⌝ holds (since the application of the new rule was successful), so by meta-theorem (B) we can prove T(δ ⌜≡⌝ ⌜ω⌝). From axiom 7b we get P(δ) ⊃ P(⌜ω⌝), and thus T(δ) ⊃ T(⌜ω⌝), and T(⌜ω⌝). Then we get ω via Rule 19. Thus we have proved ω without the use of the new rule. Thus the addition of the rule α ⊃ T(β) to the set of rules of inference did not add any new theorems to the system and so did not destroy consistency.

Our procedure for adding rules of inference to our system is simply to add, as a rule, any theorem of form α ⊃ T(β) that one wishes to. Our system, then, can be thought of as being identical to the limited system except for the addition both of Rules 18 and 19 and of a procedure for adding an unlimited (but always finite) number of additional rules of inference. We have shown that these additions add nothing to the theorem set. Hence our system is consistent since the limited system is.

2.2.2 Incompleteness.

Having shown the consistency of our system, we shall now show some of its other formal properties, specifically those properties related to incompleteness. We shall need to use the theorem

(A) T(y ⌜⊃⌝ x) ⊃ T(y ⌜⊃⌝ apl(x))

This is a generalization of the theorem T(x) ⊃ T(apl(x)), whose tedious proof is sketched in Section 4.10.9. A similar technique gives us a proof of (A). Now theorem (A) may be used as a new "derived" rule of inference, as explained in Section 2.2.1.2. When we use it this way we shall refer to it as Rule (A).

Now define Θ(x) =: Sf(⌜z⌝, qu(x), x).
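The diagonalizing effect of Θ can be seen in miniature in ordinary LISP. The following sketch is purely illustrative: it assumes that the built-in SUBST can play the role of Sf, and the function QU below is a stand-in for qu, not the document's definition.

    (defun qu (x)
      ;; illustrative stand-in for qu: build the quotation-name of x
      (list 'QUOTE x))

    (defun theta (x)
      ;; substitute the name of x for the variable Z throughout x,
      ;; playing the role of Sf(<z>, qu(x), x)
      (subst (qu x) 'Z x))

Applying theta to the expression (NOT (T (THETA Z))) yields (NOT (T (THETA (QUOTE (NOT (T (THETA Z))))))), an expression which speaks about its own name; this is the fixed-point construction used in the derivation that follows.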

As an example of the use of Rule (A), consider the following derivation. (Throughout, we abbreviate the formula ~T(Θ(z)) as γ, so that ⌜γ⌝ names it.)

(1) P(Q(f(w))) ⊃ P(Q(f(w)))                                  by P.C.
(2) ~T(Θ(⌜γ⌝)) ⊃ ~T(Θ(⌜γ⌝))                                  from (1) by Rules 5 and 4 several times
(3) ~T(Θ(⌜γ⌝)) ⊃ ~T(⌜~T(Sf(⌜z⌝, qu(⌜γ⌝), ⌜γ⌝))⌝)             from (2) by Rule (A)
(4) ~T(Θ(⌜γ⌝)) ⊃ ~T(⌜~T(Θ(⌜γ⌝))⌝)                            from (3) by Rule 6
(5) T(Θ(⌜γ⌝)) ⊃ T(Θ(⌜γ⌝))                                    from (2) by P.C.
(6) T(Θ(⌜γ⌝)) ⊃ T(⌜~T(Θ(⌜γ⌝))⌝)                              from (5) by Rule (A)

We shall show that our system is incomplete by exhibiting a well-formed formula which is not provable and whose negation is not provable. The formula is

(7) T(Θ(⌜~T(Θ(z))⌝))

whose negation is

(8) ~T(Θ(⌜~T(Θ(z))⌝))

Now if (7) is provable, then

(9) T(⌜~T(Θ(⌜γ⌝))⌝)

follows from (6) and (7) via modus ponens. From this, (8) follows via Rule 19. Further, if (8) is provable, then (9) follows via Rule 18, and (7) follows from (4) and (9) by P.C. Hence if either (7) or (8) is provable, they both are, and our system is inconsistent. But we showed our system was consistent, and so neither (7) nor (8) is provable. Hence our system is incomplete. Clearly, then, (9) is not provable either, nor is its negation

(10) ~T(⌜~T(Θ(⌜γ⌝))⌝)

provable. For if (10) were provable, then (8) would follow from (6) and (10) by P.C.

Consider the following statement:

(B) "If α is a well-formed formula then T(⌜α⌝) ⊃ α is provable."

This statement does not hold for our system. Consider the case when α is (8) above. Then the statement claims we can prove

(11) T(⌜~T(Θ(⌜γ⌝))⌝) ⊃ ~T(Θ(⌜γ⌝))

But if we could prove (11), then from (6) and (11) we could prove (8) by the propositional calculus. Since (8) is not provable, neither is (11).

Note the vast difference between the false statement (B) above and the statements of Rules 18 and 19, either as rules or as formulae in the object language. (As formulae, of course, Rule 18 is a theorem and Rule 19 is not.) The statement (B), in our notation, is

F(x) ⊃ T(⌜T⌝(qu(x)) ⌜⊃⌝ x)

This is not a theorem, and it is false under the usual interpretation. Note: if α is statement (8) then ~T(⌜α⌝) is statement (10). In such a case α is not provable, and yet ~T(⌜α⌝) is not provable either. Hence it is clear that although T weakly represents the set of theorems, it does not strongly represent the set of theorems, nor does it "express" provability in the sense of Mendelson [Mendelson, 1964, p. 177 ff].

We shall now show the non-provability of consistency within the system. We shall show that

(12) F(x) ⊃ (~T(x) ∨ ~T(⌜~⌝ * x))

is not provable. We first consider the following proof:

1. ~T(Θ(⌜γ⌝)) ⊃ ~T(⌜~T(Θ(⌜γ⌝))⌝)           this is (4), which we proved above
2. T(⌜T(⌜~T(Θ(⌜γ⌝))⌝) ⊃ T(Θ(⌜γ⌝))⌝)        from 1. by propositional calculus and then Rule 18
3. T(⌜T(⌜~T(Θ(⌜γ⌝))⌝)⌝) ⊃ T(⌜T(Θ(⌜γ⌝))⌝)   from 2. and Rule 1 used as a theorem

4. T(x) ⊃ T(⌜T⌝(qu(x)))                      Rule 18 used as a theorem
5. T(⌜~T(Θ(⌜γ⌝))⌝) ⊃ T(⌜T(⌜~T(Θ(⌜γ⌝))⌝)⌝)   from 4. via Rule 4
6. T(⌜~T(Θ(⌜γ⌝))⌝) ⊃ T(⌜T(Θ(⌜γ⌝))⌝)         from 3. and 5. by propositional calculus

Now if (12) were provable we could prove

(13) ~T(⌜T(Θ(⌜γ⌝))⌝) ∨ ~T(⌜~T(Θ(⌜γ⌝))⌝)

from (12) and the clearly provable F(⌜T(Θ(⌜γ⌝))⌝). Then from line 6. and (13) we could prove (10), which we know is not provable. Hence (12) is not provable.

We shall now show that for any well-formed formula α, the formula ~T(⌜α⌝) is not provable. Let α be any well-formed formula. Then we can construct the following proof:

1. ~∀(x)(F(x) ⊃ (~T(x) ∨ ~T(⌜~⌝ * x))) ⊃ ∃(x)(F(x) ∧ T(x) ∧ T(⌜~⌝ * x))    by P.C.
2. (T(x) ∧ T(x ⌜⊃⌝ ⌜~(α ⊃ α)⌝)) ⊃ T(⌜~(α ⊃ α)⌝)    from Rule 1 used as a theorem
3. ∃(x)(F(x) ∧ T(x) ∧ T(⌜~⌝ * x)) ⊃ T(⌜~(α ⊃ α)⌝)    from 2. and T(⌜~⌝ * x) ⊃ T(x ⌜⊃⌝ ⌜~(α ⊃ α)⌝) (the latter from Rule 6 used as a theorem; tedious proof)
4. T(⌜~(α ⊃ α) ⊃ α⌝)    by propositional calculus and Rules 4 and 5, followed by Rule 18. (This is where we use the fact that α is a well-formed formula.)
5. T(⌜~(α ⊃ α)⌝) ⊃ T(⌜α⌝)    from 4.
6. ~∀(x)(F(x) ⊃ (~T(x) ∨ ~T(⌜~⌝ * x))) ⊃ T(⌜α⌝)    from 1., 3., and 5.

7. T(⌜α⌝) ∨ ∀(x)(F(x) ⊃ (~T(x) ∨ ~T(⌜~⌝ * x)))    from 6. by P.C.

Hence if ~T(⌜α⌝) is provable, we can derive (12) from line 7. above. Since (12) is not provable, as we have already shown, ~T(⌜α⌝) is not provable.

Are there any S-expressions α for which ~T(⌜α⌝) is provable? Yes. We can show that if α is not a formula type expression, then ~T(⌜α⌝) is provable. We do this as follows: Suppose α is an S-expression which is not a formula type expression. Then, since Ftp is a complete recursing function expression, we can derive the following lines:

1. Ftp(⌜α⌝) ≡ Ftp(⌜α⌝)    by P.C. and Rule 5
2. ~Ftp(⌜α⌝)              from 1. by Rule (A)
3. T(x) ⊃ Ftp(x)          by the same tedious proof we referred to for the proof of T(x) ⊃ F(x) earlier
4. T(⌜α⌝) ⊃ Ftp(⌜α⌝)      from 3.
5. ~T(⌜α⌝)                from 4. and 2. by P.C.

3. IMPLEMENTATION

3.1 Purpose of Section 3.

In Section 3 I shall show the existence of a class of interesting machines which utilize the language described in the last section. These machines, when regarded as adaptive theorem provers, do not possess those limitations and obvious shortcomings which have plagued previous adaptive theorem proving machines. The class of machines discussed here is diverse. Machines which possess identical memory structure may differ in their search algorithms. Machines whose memory structure and search algorithms are essentially the same may differ in the specific reward schemes used. Now the adaptability and rate of growth of a machine in a particular problem environment depend on the reward scheme. I will not in this paper attempt to discuss and compare various reward schemes for various machines. A meaningful discussion along these lines would have to include a more precise definition and meaningful classification of the various problem environments. This is beyond our reach as yet. It is hoped that the expected behavior of machines of the kind discussed here might become the basis for such a classification in the future.

Since demonstrations of growth rate and degree of adaptability must await either this kind of detailed study or actual programming, the current attraction of this class of machines comes chiefly from the interesting hierarchical method of handling heuristics, which bears a resemblance to the techniques of Polya, and which avoids the limitations of previous adaptive methods, and from the fact that comparison with other hierarchical systems (e.g., [Holland 1961]) suggests that this class of machines contains members with an acceptably limited growth rate. Although the growth rate may be acceptably limited, the absolute size required is probably very large for interesting members of this class of machines. The construction of these

machines may be beyond the range of present day technology. Simpler members of the class can be programmed on present day computers. One machine in the class, as a result of a very restricted reward scheme, acts just like Newell, Shaw, and Simon's General Problem Solver [Newell, Shaw, Simon 1961]. I will indicate, by way of example, how this machine operates. The discussion of such examples will be preceded by a characterization of a subclass of the class of machines. This subclass is large enough to include many of the interesting adaptive machines. The discussion of the adaptive properties will necessarily be imprecise, but it will show the operation of the hierarchy and it will indicate several rather human aspects of the problem solving apparatus. The discussion is not designed to specify the properties of any particular machine, but only to indicate the usefulness of the language described in Section 2.

3.2 Characterization of the Subclass of Machines.

A typical machine in the subclass consists of two parts, which I shall call the memory and the effector (or central processor). The memory is merely a storage section in which we store already proved theorems, useful formulae, and other items mentioned below. The effector is a set of algorithms which operate on the memory one after the other. Some of these algorithms are indicated in the following typical sequence of effector operations. Suppose the user presents the machine with a theorem to be proved. The "insertproblem" algorithm places the theorem in the memory at a special spot and returns a pointer to that spot. This pointer is handed to the "refineproof" algorithm (the only one which we shall discuss in detail), which alters the memory to produce, inside the memory, a proof of the theorem. Another algorithm reads this proof to the user, who then says how much he thinks the proof is worth. A fourth algorithm takes this rating and makes certain

rewards of formulae in the memory on its basis (we shall indicate how this is done). After this, other effector algorithms may juggle rewards within the memory, cause certain formulae to be forgotten, and cause other formulae to be produced on the basis of the rewards. We shall first discuss the structure of the memory. Then the refineproof algorithm will be outlined. After that we will discuss, in general terms, the other algorithms and further possible sophistications.

3.3 Structure of Memory

3.3.1 Basic Plan of the Memory Net.

The memory may be thought of as containing a finite number of entities called memory nets. (Later we will see how to arrange things so that we need only store one net.) The structure of memory nets was partially explained in Section 1. A memory net may be diagrammed as a set of nodes connected by arrows (called paths). The direction an arrow points is only a reference direction; the algorithms are able to follow a path in the direction of the arrow or in the reverse direction. Some arrows are thick (drawn as a double line) and some thin. Various triangular flags are attached to nodes and paths. There are three types of nodes: join points, formula nodes, and derivation nodes. These are diagrammed respectively as a dot, a rectangle, and a circle. Each join point is attached to only one outgoing arrow. Arrowheads for arrows coming into join points are not drawn. Arrows coming into join points are thin arrows originating at formula nodes. An arrow leaving a join point is thin and terminates at a derivation node. Thick arrows always originate at a formula node and terminate at a derivation node. Each derivation node has two thin arrows coming in from join points, one thick arrow coming in from a formula node, and one thin arrow leaving to a formula node. (A typical neighborhood of a derivation node is seen in Figure 2,

Section 1.) Each formula node has any number of thin arrows coming in from derivation nodes, any number of thick arrows going out to derivation nodes, and any number of thin arrows going out to join points. Each formula node contains an S-expression of the language of Section 2. (The S-expression may be thought of as written inside the rectangle.) The triangular flags are of three types. The tag type contains a T or an H. The parameter type contains an individual variable of the language of Section 2. The value type contains a number.

Not all entities satisfying the above conditions are memory nets, but it will be easier to state the other conditions if we first define "bug value." "Bug value" can be most easily defined if we consider how a memory net might be stored in a digital computer. One convenient way of storing a memory net in a digital computer is as a memory structure of the LISP type [McCarthy 1962]. We first convert the memory net and all its contained S-expressions into one huge special S-expression called a net-expression. This is then stored as an S-expression is stored in LISP. If we do this, then we can write the algorithms of the effector as LISP programs. (We shall indicate the general outline of a LISP program for the refineproof algorithm.) Each algorithm then becomes a LISP function whose arguments are to be memory nets. Dummy variables used in these functions will be called bugs, to distinguish them from the individual variables of the language of the preceding section, whose values were normal S-expressions (not net-expressions) rather than nets. (Definition: A sequence will mean an expression of the form (a₁, a₂, ..., aₙ). The aᵢ are said to be members of the sequence.)

A net may be written as a net-expression in many ways. One way is to select one node to begin with and write a sequence, the first member of which

is the contents of the node (if it's a formula node), the second a sequence of contents of the various flags of the node, and the others data pertinent to the various arrows entering and leaving that node: data such as direction, thickness, contents of any flags on the arrow, and a sub-S-expression in sequence form which describes the node at the far end of the arrow in the same way that the whole sequence describes the original node. By a recursive argument, we see that the whole sequence gives the structure of the entire memory net seen from the point of view of the originally selected node. The members of the sequence, except the first two, contain sequences which give the structure of the memory net from the point of view of each of the neighbors of the originally selected node. Hence if one begins with a sequence which describes a net from the point of view of one node, and wishes to derive from it a sequence which describes the same net from the point of view of a neighboring node, one has only to pick the appropriate sub-S-expression of the original sequence. And to return from this new sequence to the original, we once again take the appropriate sub-S-expression.

Notice that the net-expression, as we have described it, is an infinitely deep S-expression. I.e., such a representation as we have described above results in an infinite S-expression when we try to represent any net of two or more connected nodes, because each node is a neighbor of its neighbors. However, by using the LISP functions RPLACA and RPLACD in the proper places we can construct a net-expression that, while finite and with no such redundancies, curls back on itself in such a way that it appears to any LISP function operating on it to be the infinitely deep S-expression we have described above.
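A minimal sketch of this "curling back" in LISP (the two-element node layout here is illustrative, not the actual net-expression format): two nodes are built, and RPLACA is then used to make each node a member of the other's description.

    (setq node-a (list 'FORMULA-A nil))   ; second element will hold the neighbor
    (setq node-b (list 'FORMULA-B nil))
    (rplaca (cdr node-a) node-b)          ; node-a now contains node-b
    (rplaca (cdr node-b) node-a)          ; node-b now contains node-a
    ;; (cadr (cadr node-a)) is node-a itself: a function that only reads
    ;; the structure sees an infinitely deep S-expression, though only
    ;; two cells of storage are involved.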

Such apparently infinite net-expressions will be said to describe a memory net from the point of view of a particular node. These net-expressions will be named by the arguments to which we apply the algorithms in the effector, and they will be the values over which the bugs range. Hence a bug value is an S-expression which describes a particular net from the point of view of a designated node. Each bug value describes a net and designates a node of that net.

Let us make some informal definitions. Suppose ξ is a bug value. Then let us refer to the net described by ξ as [ξ]. Similarly, let us refer to the node designated by ξ as ξ̂. Now certain thin arrows may terminate at ξ̂. The origins of these arrows are, then, certain nodes in [ξ]. These certain nodes are called thin-predecessors of ξ̂. Now other thin arrows may originate at ξ̂. The terminations of these arrows are, then, certain other nodes in [ξ]. These certain other nodes are called thin-successors of ξ̂. We similarly define thick-predecessors of ξ̂ and thick-successors of ξ̂ by replacing the word "thin" by "thick" in the above two definitions.

3.3.2 Some LISP Functions on Bug Values

We shall now mention several LISP functions defined on bug values. The precise definition of these functions depends on the precise method of storing memory nets.

Definition of derivations and derivationsusing: Suppose ξ is a bug value and ξ̂ is a formula node. Then derivations(ξ) and derivationsusing(ξ) both name sequences of bug values; each of these bug values describes [ξ]. The members of the sequence named by derivations(ξ) designate the various thin-predecessors of ξ̂. The members of the sequence named by derivationsusing(ξ) designate the various thick-successors of ξ̂.

We will want these sequences to be examined by other LISP functions. In

doing so, it will be important to examine and modify the contents of value-type flags attached to the arrows connecting ξ̂ to the nodes designated by the members of these sequences. This means that the arrow in question must be specially marked in each member of the sequences. We will assume this has been done; any number of methods are possible. (One method is to generate, instead of the sequences of bug values, sequences of pairs, each pair consisting of a bug value and a pointer to the relevant flag in the bug value. In this way the net itself is not changed. I shall assume for simplicity that the members of the sequences are the bug values, but in a specific implementation scheme this need not be so.)

Definitions of containedformula, T-tagged, and H-tagged: If ξ is a bug value and ξ̂ is a formula node, then containedformula(ξ), which we write as ⟨ξ⟩, names the S-expression contained in ξ̂. T-tagged and H-tagged are LISP predicate expressions. T-tagged(ξ) holds if and only if ξ̂ has a triangular flag containing a T. H-tagged(ξ) holds if and only if ξ̂ has a triangular flag containing an H.

Definitions of down and rule: Suppose δ is a bug value and δ̂ is a derivation node. Then down(δ) and rule(δ) both name bug values describing [δ]. The bug value named by down(δ) designates the single thin-successor of δ̂. The bug value named by rule(δ) designates the single thick-predecessor of δ̂.

Another condition on memory nets: Recall that if δ is a bug value and δ̂ a derivation node, then there are only two thin-predecessors of δ̂, and that these are both join points. One is called the antecedent node associated with δ̂. The other is called the parameter values node associated with δ̂. They may be told apart as follows. Each arrow terminating at the parameter values node has a parameter-type flag. The parameter-type flags on the various arrows coming into such a join point are all

different from one another. Arrows terminating at the antecedent node have no such flags.

Definition of antecedents and parameters: Suppose δ is a bug value and δ̂ is a derivation node. Then antecedents(δ) names a sequence of bug values, each describing [δ]. These bug values designate the various thin-predecessors of the antecedent node associated with δ̂. Again, suppose δ is a bug value and δ̂ is a derivation node. Then parameters(δ) names a sequence of pairs, each of the form (ηᵢ, βᵢ), where each ηᵢ is an individual variable and each βᵢ is a bug value describing [δ]. The βᵢ's designate the various thin-predecessors of the parameter values node associated with δ̂. For each i, ηᵢ is the variable written on the parameter-type flag attached to the thin arrow which goes from β̂ᵢ to the parameter values node associated with δ̂.

The functions we have defined allow us to move from one node to another of a net. We can make statements such as: if x designates a derivation node, then x ∈ derivations(down(x)) holds. (Unless the pair scheme of tagging is used which I mentioned.) In this statement, ∈ was meant to be the LISP function of that name. However, if we ignore the order of members in a sequence and regard it as a set, then ∈ can be thought of in the set-theoretic sense. We will frequently use such set-theoretic abbreviations on sequences the order of whose members is unimportant. So instead of x * derivations(x) we may write derivations(x) ∪ {x}; we may write x ⊆ y instead of andlista((λ(z) z ∈ y), x); etc.
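Under the stated conventions these set-theoretic abbreviations are easily supplied as LISP functions (a sketch; the two auxiliary names are ours):

    (defun andlista (p x)
      ;; the predicate p holds for every member of the sequence x
      (every p x))

    (defun member-of (z y)
      ;; z is a member of y, ignoring the order of members
      (member z y :test #'equal))

    (defun subset-of (x y)
      ;; x is a subset of y, i.e. andlista((lambda (z) z in y), x)
      (andlista (lambda (z) (member-of z y)) x))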

We shall need some more LISP functions, which we can define by definition statements in LISP m-notation as described in the LISP 1.5 manual [McCarthy 1962]. However, instead of using exactly the m-notation, we shall use a modified m-notation which is identical to the notation used in Section 2.1.2 in our definition statements for the function type expressions. For example, we define the following:

pair(x,y) =: [x = 0 → 0; ⊤ → (a(x), a(y)) * pair(d(x), d(y))]

proj1(x) =: maplistcar(a, x)

proj2(x) =: maplistcar(ad, x)

find(x,y) =: [y = 0 → 0; aa(y) = x → a(y); ⊤ → find(x, d(y))]

and variablesin is defined such that variablesin(⌜α⌝) names a sequence of the free variables in the well-formed expression α. Although the notation here is the same as that of Section 2.1.2, the definition statements here imply the actual definition of a new LISP function, whereas the definition statements of Section 2 were merely summaries of our abbreviation conventions. Note, we use =: as in Section 2.1.4.1.
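In ordinary LISP these definition statements correspond directly to function definitions such as the following sketch (here a is CAR, d is CDR, and 0 is NIL, per the conventions above; find is renamed find-pair to avoid the modern built-in FIND):

    (defun pair (x y)
      ;; pair corresponding members of two sequences
      (if (null x)
          nil
          (cons (list (car x) (car y))
                (pair (cdr x) (cdr y)))))

    (defun proj1 (x) (mapcar #'car x))    ; first members of the pairs
    (defun proj2 (x) (mapcar #'cadr x))   ; second members of the pairs

    (defun find-pair (x y)
      ;; the first pair in y whose first member is x (NIL if there is none)
      (cond ((null y) nil)
            ((equal (caar y) x) (car y))
            (t (find-pair x (cdr y)))))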

3.3.3 Condition on Derivation Nodes (Rules and Heuristics).

In Section 2.1.4.2 we saw that a rule of inference was simply a theorem of form α ⊃ T(β). We specified, in that section, the procedure for applying such a rule of inference. Now a heuristic, in our scheme, is formally a formula type expression (not necessarily a theorem) of form α ⊃ T(β). The procedure for applying a heuristic is the same as the procedure, described in Section 2.1.4.2, for applying a rule of inference. Of course, the usefulness of a heuristic depends not on its form as an expression but rather on its position in a memory net. The significance of its position in a memory net was outlined in Section 1. In Section 1 we did not have available the notation to discuss the actual form of heuristics and rules of inference. We have now specified that form and specified the application procedure. We shall now review the application procedure and re-state, this time with the aid of the notation we have developed, the way in which the connections to a particular derivation node reflect an application of a rule of inference or heuristic. In the rest of Section 3 we shall use the word rule to mean something which is either a rule of inference or a heuristic. Thus, a rule of inference is a rule which, as an S-expression, is a theorem.

Consider a particular rule of form α ⊃ T(β). Suppose γ is a sequence of pairs such that all the free individual variables in the rule are contained in proj1(γ), and, for every μ, if μ ∈ proj1(γ) holds, then ad(find(μ, γ)) names a bug value. Then β can be evaluated as follows: (1) For each individual variable μ which is free in β, substitute the S-expression ⟨ad(find(μ, γ))⟩ for all free occurrences of μ in β. (2) Then apply the function apl to the result. This application will terminate in time τ and yield a constant, or it won't. If it does, we return the result (call it ρ). If it doesn't, we return 0. The LISP function that does all this is result: If ξ names α ⊃ T(β), η names γ, and ζ names a number such that τ = χ(ζ) holds for some specified function χ, then result(ξ, η, ζ) names ρ if the application terminated in time τ, and names 0 otherwise.

In a similar way, we could make the same substitutions in α and apply apl. However, here we will be likely to encounter a T. Let us simply keep track of our T encounters in the following way. Each time we try to evaluate apl of a subexpression of form ⌜T(δ)⌝, let us return the value ⌜⊤⌝, but add the S-expression named by the evaluation of apl(δ) to a special sequence σ which we keep for the purpose. (In evaluating the apl(δ) we return ⌜⊤⌝ when a T(δ′) is encountered, adding the S-expression named by apl(δ′) to σ, etc., etc.) If this process terminates before the time limit τ and yields ⌜⊤⌝, then return σ; otherwise return 0. The function which does this is requiredantecedents:

If ξ, η, and ζ are as before, then requiredantecedents(ξ, η, ζ) names σ if the process terminated in time τ, and names 0 otherwise.

Thus we can summarize the idea of a derivation via a rule of inference as follows: If ξ names a theorem of form α ⊃ T(β), η names a sequence of pairs as above, and ζ names a number, and if andlista(T, requiredantecedents(ξ, η, ζ)) holds, then T(result(ξ, η, ζ)) holds. We have described an application of the rule of inference α ⊃ T(β).

This idea of a derivation is what is to be summarized by a derivation node in the net. Hence we want one more condition to hold on memory nets: For every bug value δ which designates a derivation node, there exists a number κ such that

    result(⟨rule(δ)⟩, parameters(δ), κ) = ⟨down(δ)⟩

holds and

    requiredantecedents(⟨rule(δ)⟩, parameters(δ), κ) ⊆ maplistcar(containedformula, antecedents(δ))

holds. When ⟨rule(δ)⟩ is a theorem, the derivation node summarizes the application of a rule of inference. Otherwise it summarizes the application of a heuristic.
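This condition can be stated as a LISP predicate. The following sketch assumes that result, requiredantecedents, rule, down, parameters, antecedents, and containedformula behave as described above (they are not defined here), and it checks a single candidate κ:

    (defun derivation-node-ok-p (delta kappa)
      ;; does the derivation node delta summarize a correct application,
      ;; within time chi(kappa), of the rule contained in rule(delta)?
      (and (equal (result (containedformula (rule delta))
                          (parameters delta)
                          kappa)
                  (containedformula (down delta)))
           (subset-of (requiredantecedents (containedformula (rule delta))
                                           (parameters delta)
                                           kappa)
                      (mapcar #'containedformula (antecedents delta)))))

Here subset-of is the abbreviation sketched in Section 3.3.2 above.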

3.3.4 Net Changing Functions

All the functions so far discussed whose range is bug values, or sequences of bug values, or sequences of pairs of variables and bug values, have one feature in common. If μ is an expression naming such a function and ζ is a bug value which is a subexpression of μ(η₁, η₂, ..., ηₙ), then there exist a bug value ηᵢ and an integer i such that ζ is a subexpression of ηᵢ and [ζ] is the same net as [ηᵢ]. In other words, no new memory net is produced by μ; it just produces a new pointer (in the LISP sense) to an old net.

We shall now consider functions which actually produce a new or different memory net. We shall consider later how such a new net is stored. For now, we simply specify its net structure. The most obvious types of functions, of the sort which produce a new net, are those that erase nodes (and their connections) from an old net, or attach a new node to an old net. We will need both types. We will specify the operation of the second type in some detail. The operation of the first type is no different in principle.

Definition of constructderivation: Let us suppose ρ is a bug value; ω is a list of pairs such that variablesin(⟨ρ⟩) ⊆ proj1(ω) holds and proj2(ω) names a sequence of bug values; σ is a sequence of S-expressions; and ξ is a bug value. Let us write ω as ((η₁, β₁), (η₂, β₂), ..., (ηₙ, βₙ)) and write σ as (σ₁, σ₂, ..., σₘ). In the cases we are interested in, all the above bug values describe the same net and they all designate formula nodes. (We could define this for cases of bug values not describing the same net, but it is not necessary.) Thus we could diagram a section of the described net by the solid lines in Figure 9. (I have omitted many possible flags.) Then constructderivation(ρ, ω, σ, ξ) names a bug value υ such that [υ] is the net in Figure 9 with the broken lines added. υ̂ is indicated in Figure 9,

as are the new nodes γ̂₁, ..., γ̂ₘ, where ⟨γᵢ⟩ = σᵢ holds for all i.

[Figure 9. The neighborhood of the new derivation node υ̂: the rule node ρ̂, the parameter value nodes β̂₁, ..., β̂ₙ, the new antecedent nodes γ̂₁, ..., γ̂ₘ, and the node ξ̂. Solid lines indicate nodes and paths already present; broken lines indicate the nodes and paths added by constructderivation. In the diagram the boxes contain node names rather than the contained formulae.]

Of course, before making such a construction we would want to check to be sure there is a κ such that result(⟨ρ⟩, ω, κ) = ⟨ξ⟩ holds and requiredantecedents(⟨ρ⟩, ω, κ) ⊆ σ holds.

Notice that the function constructderivation really only constructed the nodes υ̂ and γ̂₁, ..., γ̂ₘ. The others were there to start with. For some purposes we will want to use the similar function constructparameterderivation. In this case the nodes γ̂₁, ..., γ̂ₘ are available to start with, as are ρ̂ and β̂₁, ..., β̂ₙ. The function must construct nodes υ̂ and ξ̂.

Definition of constructparameterderivation: Here suppose ρ and ω are as before, and suppose σ is a sequence of bug values (γ₁, ..., γₘ). As before, assume the nets [ρ], [βᵢ], and [γᵢ] are identical for each i. Let χ be an S-expression. Then constructparameterderivation(ρ, ω, σ, χ) names a bug value υ, much as before, describing the total net in Figure 9 (though in this case the γ̂ᵢ's can have other connections, just as the β̂ᵢ's do), where ⟨ξ⟩ = χ holds.

Another sort of function which produces a new net is the join function, which combines two nodes of a net. We shall give its definition by a general description in English and by an example. Suppose α and β are two bug values, [α] is the same as [β], α̂ differs from β̂, α̂ and β̂ are both formula nodes, and ⟨α⟩ = ⟨β⟩ holds. Then join changes the net [α] into a new net [γ] by combining the two nodes α̂ and β̂ into one node γ̂. ⟨γ⟩ = ⟨α⟩ holds, and all arrows which entered or left α̂ or β̂ in the old net are now diverted so that they enter or leave γ̂. E.g.:

[Figure 10. Two formula nodes α̂ and β̂ containing the same formula are combined into a single node γ̂; all arrows which entered or left α̂ or β̂ now enter or leave γ̂.]

We say γ = join(α, β) holds.
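On a toy representation of nodes, the combining step of join might be sketched as follows (the names and the plist layout are illustrative). Note that the sketch only builds the combined node; it omits the redirection of the arrows at their far ends, which is exactly the part that requires the pseudo-functions discussed in the next section.

    (defun join-nodes (a b)
      ;; a and b are plists (:formula <s-expr> :arrows <list>) that
      ;; contain the same formula; return the combined node gamma,
      ;; carrying all the arrows of both
      (list :formula (getf a :formula)
            :arrows (append (getf a :arrows) (getf b :arrows))))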

3.3.5 Storage of Memory Nets - Difficulties

By way of example, we have indicated three functions (constructderivation, constructparameterderivation, and join) which produce new nets. We now consider an economical way of storing these new nets. The various algorithms of the effector will be written recursively as LISP programs. The process of following the algorithms will then consist in evaluating an expression of the form φ(η), where φ is a LISP function expression (or algorithm) in the effector and η is a bug value describing a net in the memory. It makes no difference whether we do the evaluation in LISP fashion (with an association list and the LISP eval function) or in the manner of the evaluations in Section 2 (using substitution via our apl function). In either case we will, during the evaluation, have to keep track of many bug values simultaneously. For example, in the LISP-type evaluation, these bug values will be stored on the association list together with the variables (bugs) which currently have those values. Thus at a given moment we may have to have, in memory, a large number of bug values, just as in normal LISP we have to have a large number of S-expressions. Let us call the set of bug values we must remember at the current time the current bug value set.

Now an S-expression can be thought of as a pointer to a spot in a list structure memory. A bug value can be thought of as a pointer to a node in a memory net. As long as the members of the current bug value set all describe the same net, the storage problem is easily solved. The memory need only hold that one memory net, and the members of the current bug value set are simply pointers to various nodes of that net. Creation of new bug values of the same type causes no problem. Suppose we execute, inside a LISP PROG, y := down(x), where x and y are bugs and the bug value paired with x on the association list is α. This execution adds y to the

association list, paired with a pointer to a node in the net [α]. We already have the net [α] in memory, and a pointer to the node α̂. That pointer is already paired with x on the association list. The pointer to be paired with y simply points to the neighbor of α̂ which is at the end of the thin arrow leaving α̂. Though we have created a new bug value, we have not created a new net. The storage problem is simple here. y := φ(x) (where x and y are bugs) causes no problem, then, unless φ actually creates a new net structure.

Suppose now that φ creates a new net structure and we try to execute y := φ(x), where x and y are bugs and the bug value paired with x on the association list is α. (For simplicity we consider a φ with a single argument, though the three examples we gave of such a φ had 2 or 4 arguments.) In a normal LISP program such an instruction causes no problems and is handled automatically by the LISP interpreter. It is the fact that our variables are not normal variables, but bugs, that causes trouble. To see why this is so, let's consider the analogue in a normal LISP program (i.e., suppose α is a normal S-expression, not a net-expression). Consider, for example, a φ which adds something onto α. In a normal LISP program, an example would be φ = (λ(u) ⌜A⌝ * u). In executing the instruction y := φ(x), the LISP interpreter has only to take the pointer paired with x, use it to construct a new pointer to pair with y, and save both the old and new pointers. This works because the value α of x (old value) is to be a subexpression of the value of y (new value). The situation is quite different if x and y are bugs, and φ adds nodes to the net described by α. We add the nodes, create the new net-expression, and save the proper pointer to one of the nodes (presumably a new one) to be the bug value for y. But what are we to do with the old

The perpetrators of this unfortunate state of affairs are the pseudo-functions RPLACA and RPLACD hidden inside φ. (If there were no LISP pseudo-functions in φ, such a thing could never happen, but we need them to make the new nodes into neighbors of their neighbors. Thus any function which constructs new nodes contains such pseudo-functions.) LISP pseudo-functions are notorious for changing the value of a variable behind the variable's back (see the LISP 1.5 manual [McCarthy 1962]).

We would like to be able to write LISP programs for the effector algorithms in the normal recursive LISP style and assume that no bug will change its value unless we tell it to by a statement such as x := φ(x) or some such. We want to be able to retrieve the old x value if we write y := φ(x) followed by [H(y) → return(y); ⊕ → return(ψ(x))], where H names a predicate over bug values and ψ is another function from bug values to bug values. To permit this we must write those functions which change net structure in such a way that the old bug values remain unchanged.

One way of accomplishing this would be to copy the old net, making the required change on the new copy. Thus a new memory net would be added to memory each time an instruction was executed which changed net structure. A memory net would disappear from memory only when all pointers to it had disappeared; then it would be snapped up by the LISP garbage collector in the normal LISP fashion. This procedure of duplicating nets has many disadvantages, not the least of which is the huge storage space requirement, which grows ridiculously for exactly the sorts of procedures we want to reward the system for using.
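The difficulty can be made concrete in present-day LISP terms. The sketch below is not from the original; add-node! and add-node-copying are hypothetical names, and a net is abbreviated to a bare adjacency list. The destructive version changes the structure the old bug value points into, while the copying version preserves it at the cost of duplicating the whole net.

    ;; A "net" abbreviated to an alist of (node . neighbor-list).
    (defun make-net () (list (cons 'a (list 'b)) (cons 'b nil)))

    ;; Destructive version: splices a new neighbor in place, RPLACD-style.
    ;; Every bug value pointing into this net now sees the new node.
    (defun add-node! (net at new)
      (let ((entry (assoc at net)))
        (rplacd entry (cons new (cdr entry)))   ; mutation behind the bug's back
        net))

    ;; Copying version: rebuilds the net, leaving the old one intact.
    ;; Old bug values keep their old meaning, but the whole net is duplicated.
    (defun add-node-copying (net at new)
      (mapcar (lambda (entry)
                (if (eq (car entry) at)
                    (cons at (cons new (cdr entry)))
                    (cons (car entry) (copy-list (cdr entry)))))
              net))

Patching, introduced next, gets the best of both: old bug values keep their meaning, yet only the changed region of the net is duplicated.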

3.3.6 Patching

Why copy a whole net when we only want to change a tiny part of it? It is more sensible to put a patch over that portion of the net we wish to change. To illustrate, let us use the example we drew before in Figure 10. Suppose we begin with bug values α and β describing the same net (so that [α] is the same as [β]), and such that the neighborhoods of α and β are as shown in the left hand portion of Figure 10. Suppose we execute γ := join(α, β). Then the resulting net will be (in the neighborhoods of α, β, and γ) as in Figure 11.

Figure 11.

The paths which have been added are drawn as broken lines. They are to be distinguished from all other lines, for they are members of the patch. The outer dotted line indicates the edge of the patch. Any solid line within the edge is covered by the patch. Lines crossing the edge of the patch are double, having a broken and a solid component.

Let us regard the bug values α, β, and γ as pointers to nodes in [α], [β] (or [α]), and [γ] respectively. I have indicated these pointers in Figure 11 by the curvy arrows. A curvy arrow is solid if it is to be regarded as pointing to a node in [α], and broken if it is to be regarded as pointing to a node in [γ]. Now let us indicate the bug values δ and ν defined such that δ = a(derivationsusing(β)) and ν = a(derivationsusing(γ)) hold. The pointers for these bug values are also indicated in Figure 11. They seem to point to the same node, but one is solid and one broken, so we see that δ is meant to point to a node in [α], and ν is meant to point to a node in [γ].

The consequences of this are most clearly seen by attempting to find rule(δ) and rule(ν). In either case we start with the node pointed to by the curvy arrow labeled δ or ν. In either case we move backwards along the thick arrow which terminates there. But that arrow has a broken and a solid component. In the case of rule(δ) the curvy arrow was solid, so we pick the solid component and move under the patch to the node the old net provides. In the case of rule(ν) the curvy arrow was broken, so we pick the broken component and move up onto the patch to the node the patch provides. Thus rule(δ) and rule(ν) name different nodes, just as if we had saved two whole nets. Instead of saving two nets with curvy arrows on each, we save one net with a patch, with two sets of curvy arrows (solid and broken); the members of one set "see" the patch, and the members of the other don't "see" the patch as they move over the net.

If, at any time, all bug values whose curvy arrows are broken have disappeared from the association list, then we can erase the patch with impunity. Similarly, if at any time all bug values whose curvy arrows are solid have disappeared from the association list, then we can, with impunity, turn the broken curvy arrows into solid ones, erase the solid stuff under the patch, and make the broken stuff that is on the patch solid.

3.3.7 Tagging and Garbage Collection

Suppose, instead of distinguishing the patch by broken vs. solid, we just tag the "broken" component of the double arrows crossing the edge of the patched area with a (1). We similarly tag the "broken" curvy arrows with a (1). We say the patch is tagged with a (1). Thus our patched net stands for two nets, an untagged net and a 1-tagged net.

Now suppose we want to put on another patch. We just tag the new patch with a different number. The details depend on whether we want this new patch to go on the untagged net or the 1-tagged net. In the first case, the new patch gets tag (2), and if it overlaps the old patch it is placed upon the stuff under the old patch. In the second case the new patch gets tag (1, 2) and is placed upon the stuff written on the old patch. The curvy arrows get the same tag as the patch. Each new patch gets a tagging sequence ending in a number not in any other sequence in the association list. In this way we develop a hierarchy of patches upon patches.

As in LISP, we periodically garbage collect. We examine the association list for redundancies. E.g., suppose all curvy arrows whose tag sequence contains a 3 have a 7 immediately preceding it. In that case the 7 can be dropped and the 7 patch made a part of the net it patched (i.e., of the uppermost patch it patched). The stuff under the 7 patch is then cut loose from the net and is picked up by the regular LISP garbage collector. Conversely, if 7 appears in no curvy arrow tag sequence, then the 7 patch itself can be cut loose and collected instead.
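In present-day LISP one might realize the tag scheme as follows. This is only an illustrative sketch: sees-p and follow are hypothetical names, a double arrow is abbreviated to an alist of (tag-sequence . target) components, and the untagged component carries NIL.

    ;; A curvy arrow "sees" a component when the component's tag
    ;; sequence is a prefix of the curvy arrow's own tag sequence.
    (defun sees-p (component-tags arrow-tags)
      (let ((n (length component-tags)))
        (and (<= n (length arrow-tags))
             (equal component-tags (subseq arrow-tags 0 n)))))

    ;; Follow a double arrow: among the components visible to this
    ;; curvy arrow, the deepest patch (longest tag sequence) wins,
    ;; i.e. the uppermost patch covers what lies under it.
    (defun follow (components arrow-tags)
      (let ((visible (remove-if-not (lambda (c) (sees-p (car c) arrow-tags))
                                    components)))
        (cdr (first (sort (copy-list visible) #'>
                          :key (lambda (c) (length (car c))))))))

For example, (follow '((nil . under) ((1) . on-patch)) '(1)) yields ON-PATCH, while (follow '((nil . under) ((1) . on-patch)) nil) yields UNDER: the untagged curvy arrow does not "see" the 1-patch.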

Because of the recursive structure of the effector algorithms, the number of patches will drop drastically upon completion of a subroutine. In the examples we shall consider, the number of patches will drop to zero upon completion of a "to prove" problem presented to the machine. At any given time, then, the memory of the machine consists of a single patched memory net, this being equivalent to many unpatched nets.

3.4 Other Functions Which Change Net Structure

3.4.1 Kinds of Functions to be Discussed

We have yet to specify in more detail certain important algorithms of the effector. These will be built up from primitive functions (on bug values), both of the type which leave net structure unchanged (we have already defined these) and of the type which change net structure. We have discussed only three of the latter type; we require several more. We will describe how they change net structure; but remember that what is actually accomplished is the creation of a patch, as described in Section 3.3.6.

In addition to functions which add nodes to the net, like those discussed above, other functions erase nodes. I shall not describe these in detail. Simpler than either of these kinds of functions are functions which merely change flags. T-tag simply attaches a flag containing a T to the node designated by the argument. I.e., T-tag(α) names the bug value η, where [η] can be formed from [α] by attaching a new flag containing a T to the node α, and where η is that node to which the new flag has been added. H-tag similarly attaches a flag containing an H to the node designated by the argument.

Recall that various value-type flags may be attached here and there to nodes and arrows. We will want functions which read the values on these flags, raise them, and lower them. We will be looking at these values usually in one of three contexts: searching, rewarding, or punishing.

3.4.2 Searching

The simplest search pattern is the search of a sequence of nodes. Suppose ξ1, ξ2, ..., ξi are bug values which all describe the same net. Now let us suppose that with each ξj we can unambiguously associate a number value from some nearby value-type flag. Then we can define a function bestlist which makes a probabilistic choice from among the ξj on the basis of the size of these values. The bug value returned would then be the ξj selected. Actually, it will be convenient to raise, at this time, the number on the flag we looked at when we obtained the value to associate with ξj. (I.e., "Unto him that hath shall be given", since a large value means a higher probability of being chosen, and being chosen raises this very value.)

A more detailed description of the procedure follows. Suppose η is a sequence of bug values all describing the same net, together with a designation, for each bug value, of a value-type flag. (Examples of such a sequence would be derivations(ζ) or derivationsusing(ζ) for some bug value ζ. Recall that we decided not to specify in these cases the exact method of distinguishing the value-type flag.) Suppose k is a number. Then bestlist(η, k) names a bug value ξ determined as follows. A probabilistic choice is made among the bug values in η on the basis of the numbers on their associated value-type flags. If none of these numbers is high enough with respect to k, return 0. Otherwise return the chosen bug value and raise the number that was the cause of that bug value being chosen (i.e., raise the number on the flag). This converts the old bug value into a new one, ξ, which describes a slightly changed net (one flag is changed).
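A minimal sketch of bestlist in present-day LISP. The weighting (probability proportional to flag value), the threshold test (the largest value must exceed k), and the raise amount (increment by one) are all my assumptions; the text deliberately leaves them unspecified. Each candidate is abbreviated to a (name . flag-value) pair, and NIL plays the role of the 0 returned on failure.

    (defun bestlist (candidates k)
      (let ((total (reduce #'+ candidates :key #'cdr)))
        (when (and candidates
                   (> (reduce #'max candidates :key #'cdr) k))
          ;; probabilistic choice, weight proportional to flag value
          (let ((r (random (float total))))
            (dolist (c candidates)
              (decf r (cdr c))
              (when (<= r 0)
                (incf (cdr c))      ; "unto him that hath shall be given"
                (return c)))))))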

The function bestlist is, in a sense, a model for all the search functions we shall use. The principle of the other search functions is the same: select a bug value or pair of bug values from a set of bug values or pairs of bug values; make this selection on the basis of numbers on value-type flags; raise the number on the value-type flag used to select the winning candidate; return 0 if the numbers on all the value-type flags are low with respect to a parameter k; otherwise return the winning candidate, complete with modified flag.

The function bestproductinlist is the same as bestlist except that the sequence is a sequence of pairs of bug values, each member of the pair with an associated value-type flag. The value of the pair is to be regarded as the product of the numbers on the two value-type flags associated with the members of the pair. Selection is based on this value. The pair selected is modified by raising, in each bug value, the values on both flags used; i.e., the two bug values in the pair returned are modified so that they end up still describing the same net.

Certain parts of our algorithms are designed to handle very bad situations, when no obvious heuristic seems to work. In these cases, searches over a whole net are required in order to find the right heuristic to use. These complicated and, for our purposes here, uninteresting searches can be designed in various ways. If ξ is a bug value and k a number, then bestnetrule(ξ, k) names nil or a bug value η constructed as follows. A node ρ of [ξ] is selected probabilistically on the basis of numbers on certain value-type flags attached to the nodes of [ξ]. (ρ is a bug value and [ρ] is [ξ].) These numbers are, however, not the sole criterion used in the choice.

Nodes "closer" to ξ are weighted as more likely to be chosen than nodes farther away. The "distance" between a node and ξ is calculated with the aid of the numbers on value-type flags attached to arrows along paths connecting the node and ξ. If the value-type flag on the chosen node ρ does not have a high enough number with respect to k, or if ρ is too "far" from ξ with respect to k, then bestnetrule(ξ, k) names nil. Otherwise ρ is modified to form η by raising the number on the value-type flag attached to ρ. In this case bestnetrule(ξ, k) names η.

Bestnetparameterrule is like bestnetrule, but different value-type flags are used to guide the choice of node. Actually the operation of bestnetparameterrule is a bit more complicated and will be explained in more detail later, as will bestnetparametervalue.

In eliminating patches, etc., in garbage collection, it could happen that the patched net is split into several parts not connected with one another. This creates a small problem. Ad hoc provision will have to be made so that all parts are saved. The three above functions (and other similar ones) will have to be able to look at all saved parts of the net.

3.4.3 Functions on Two Nets

Now we take advantage of the fact that memory is really a single patched net and not several different nets. It is possible, for example, that although [ξ] and [η] are two different nets, ξ and η are the same node of the patched net. That is, the curvy arrows for ξ and η point to the same node, but they have different tags. Thus we have a one to one correspondence between certain nodes in [ξ] and certain nodes in [η]. Similarly, if [ξ] and [η] are two different nets, and ξ and η are not the same node of the patched net, then via such a one to one correspondence

there may be a node ζ of [ξ] such that ζ and η are the same node of the patched net. We say ζ = jumpback(ξ, η) holds. If no such node ζ exists, then jumpback(ξ, η) = 0 holds. In other words, the curvy arrow representing jumpback(ξ, η) points where the curvy arrow representing η points, but is tagged as the curvy arrow representing ξ is, whenever such a tag makes sense.

We will need some way of punishing a wrong choice in a search. Suppose we have selected a certain node ζ in [ξ] by means of one of the search functions, raised the number τ on the appropriate flag, and perhaps even constructed some new nodes. Suppose the net [η] which is the result of all this turns out to be all wrong and we wish to return to [ξ] and choose a different node. But how are we to keep from choosing ζ again? We want the flag punished by lowering its value so that there is less chance of picking ζ again.

Suppose ξ and η are the same nodes of the patched net in memory. Then erasepunish(ξ, η) names a bug value ω which is just like ξ but with values on value-type flags altered as follows. For each value-type flag in [ξ] we find the corresponding one

in [η]. We examine the pair (τ, π) of numbers on the two flags. (π is 0 if there exists no corresponding flag in [η].) Set ν equal to τ²/π. (Or perhaps not this exact function. We need a ν equal to φ(τ, π), where φ is a function such that for any numbers τ, π, and ρ: (1) τ = φ(τ, τ); (2) τ > φ(τ, π) > φ(τ, ρ) whenever τ < π < ρ; and (3) 0 < φ(τ, π).) Now the value-type flag on [ω] which corresponds to the flag we examined on [ξ] is to be given the value ν. Note that this function can be executed very quickly, since actual work is needed only on areas of the patched net where the patches differ for ξ and η.

punish is just like erasepunish, except that we require the nets [ξ] and [η] to be identical except for values on value-type flags.

reward(ξ, η) names the same bug value as erasepunish(ξ, η), except that, with τ and π defined as before and ν equal to ψ(τ, π), we now require, for all numbers τ, π, and ρ: (1) τ = ψ(τ, τ); (2) τ < ψ(τ, π) < ψ(τ, ρ) whenever τ < π < ρ; and (3) 0 < ψ(τ, π).
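A quick check, not in the original, that ν = τ²/π meets the three punishment conditions, together with one symmetric candidate for the reward function ψ (all values assumed positive):

    φ(τ,π) = τ²/π :  φ(τ,τ) = τ ;  τ < π < ρ  implies  τ > τ²/π > τ²/ρ ;  and  τ²/π > 0.
    ψ(τ,π) = π²/τ :  ψ(τ,τ) = τ ;  τ < π < ρ  implies  τ < π²/τ < ρ²/τ ;  and  π²/τ > 0.

Thus this punishment lowers the flag below its original value by the same ratio by which the flag had been raised, and the suggested ψ overshoots upward symmetrically.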

If some of the flags we are to be rewarding or punishing are to be attached to new derivation nodes, then they had better be added to the new nodes by constructderivation. I shall not specify just which flags are to go on the new derivation nodes. Suffice it to say here that the definition I gave of constructderivation should be modified so that the new node receives the proper value-type flags, with values which are appropriate monotonic functions of nearby value-type flags. Similarly, the definition of join should be modified so that each of the values on the flags on the new nodes is the sum of the two corresponding values on the corresponding flags on the two old nodes which have been joined.

We shall make frequent use of the term operate. If we write an operate statement such as ξ := operate(α), it introduces a mapping ζ for the rest of this program (i.e., LISP PROG): if a names a bug value, then ζ(a) is to mean jumpback(a, ξ); if atom(a) holds, then ζ(a) is to mean a; and if a names neither a bug value nor an atom, ζ(a) is to mean ζ(a(a)) * ζ(d(a)). We shall find this abbreviation very useful. Any Greek letter may be used on the left of the operate statement. In place of α I may write any term which names a given bug value.
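In present-day LISP the mapping introduced by an operate statement might be sketched as follows; bug-value-p is a hypothetical recognizer, and jumpback is the function defined above.

    ;; ZETA carries every bug value embedded in an S-expression over to
    ;; the net described by XI: bug values are moved by JUMPBACK, atoms
    ;; pass through unchanged, and conses are rebuilt recursively.
    (defun zeta (a xi)
      (cond ((bug-value-p a) (jumpback a xi))
            ((atom a) a)
            (t (cons (zeta (car a) xi)
                     (zeta (cdr a) xi)))))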

3.5 LISP Structure of the Effector

The effector may be thought of as a set of LISP programs, each describing a LISP function. The LISP functions are defined over S-expressions and bug values (remember, a bug value is really a specialized S-expression, though we are treating it as something different). These LISP functions are built up from one another in the normal LISP programming fashion. Included in the effector are the following functions: the primitive functions of the LISP 1.5 interpreter; the LISP functions analogous to the function type expressions whose abbreviations are given by the definition statements in Sections 4.3, 4.6, 4.7, 4.8, and 4.9; and the LISP functions described above in this section (these are defined over bug values). Other functions in the effector are built from these, until finally we arrive at certain functions, such as refineproof, which are described recursively in LISP fashion and are the so-called effector algorithms which we have mentioned earlier and which operate on the memory during machine operation. (Since the functions are defined over bug values, we must guard carefully against infinite recursion. If α and β are bug values, then the evaluation of α = β recurses infinitely, at least in some circumstances. I will never evaluate such a form unless it is of the form α = 0.) We are now in a position to describe certain effector algorithms. They will be specified by LISP programs built up from already defined functions. (But we will continue to use some of our notation from Section 2: we write * for cons and a for car, etc.)

3.6 Refining a Proof

3.6.1 The Task of refineproof

We shall describe only one of the effector algorithms in some detail, namely refineproof. This algorithm is used when the machine is attempting to

produce a proof of an already given formula. Such a problem might arise, for example, if the user presented the machine with a supposed theorem and asked the machine to prove it. Presented with this sort of problem, the machine makes a trial proof outline, and then attempts alternately to refine and modify the outline until a complete proof is achieved. The steps of an outline are not necessarily made via rules of inference (i.e., by theorems in rule form). Usually they are made via so-called heuristics. A heuristic may be thought of as a rough approximation to a summary of the rules used in a line of reasoning or sequence of rule of inference applications. How this is so was discussed in Section 1. Technically, a heuristic is merely a formula type expression in the form of a rule (a heuristic need not be a theorem). In refining the proof outline, a step using a heuristic is replaced by one or more steps which use rules of inference or more detailed heuristics. All these proof outlines will be constructed in the memory net by means of constructderivation.

We described the refining process in Section 1. We shall give a slightly more formal description of that process here. The first task of the effector is to take the formula to be proved and construct a simple trial proof outline. The simplest would be a one-step outline. Suppose that at this time the memory holds the single unpatched net [μ]. The effector searches the net [μ] for a heuristic which will yield the required formula in one step. Such a search is similar to those we have discussed and will discuss. It is especially simple if the net contains a node η containing the formula 0 ⊃ T(x), since this heuristic will always yield the required formula. We have but to add the broken line construction below to the net [μ].

Figure 12.

Note: the new formula node contains the formula to be proved. The antecedent node has no arrows coming into it and needs none. From this point on we proceed by successive refinements of this proof outline. The refining procedure is accomplished by refineproof. The definition of refineproof, or rather an outline of its definition (along with outlines of definitions of other LISP function names we shall be using), is given in Section 4.11. Refineproof makes constant use of T- and H-tags.

3.6.2 Use of T-tags and H-tags

T-tags and H-tags appear only on formula nodes. The axioms of the system, and the rules of inference which are theorems, are contained in formula nodes which are T-tagged and H-tagged. Certain initial non-theorems (formally only one is required) are H-tagged. Any other T-tagged node α must initially meet the condition that the following holds for α:

∃(γ) (γ ∈ derivations(α) ∧ (T-tagged(rule(γ)) ∨ rule(γ) names rule of inference 19) ∧ andlista(T-tagged, antecedents(γ))).

However, if this condition later ceases to be met due to erasures, the T-tag remains. Note: we specifically allow the rule to be rule of inference 19, since this is the only rule of inference which is not a theorem. By induction, only

theorems may be T-tagged, and hence rule(γ) must name a rule of inference. Similarly, any other H-tagged node α must meet the condition that the following holds for α:

T-tagged(α) ∨ ∃(γ) (γ ∈ derivations(α) ∧ andlista(H-tagged, antecedents(γ))).

If this condition later ceases to be met because of erasures, the H-tag is erased. An H-tag on a node means that the node is the termination of a proof outline (from theorems) via heuristics.

3.6.3 The Task of prove

refineproof takes the bug value ξ and the number κ and tries, by performing constructions on [ξ], to produce a new bug value designating the same node as ξ but carrying a T-tag. κ is a positive real number which tells how hard to try (how many unlikely possibilities to try). refineproof first checks to see if ξ is already T-tagged. If not, it tries to make more precise the various derivations of ξ. With each derivation it looks at, it first calls itself recursively to see if it can get T-tags on the proper nodes to permit T-tagging of ξ. If, for a given derivation, all this fails, it assumes the rule used was a heuristic and calls expandheuristic. This tries to replace the old heuristic-type derivation with a brand new derivation. It searches out new rules, parameter values, and antecedents to use. At worst the search is random; at best the old derivation will give a great deal of information as to what the new derivation should be like. Hopefully the rules needed will be nearby in the net (otherwise it was a bad heuristic, or at least refineproof is on the wrong track). Perhaps a similar problem has been encountered before (this, of course, is usually the case for any given subproblem). In that case there will be a record in the net that the heuristic was used to prove theorem X, and there will also be a record of how X was finally proved. It is the job of expandheuristic to find the right X.
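The control flow just described can be summarized in the following sketch. It is a paraphrase, not the Section 4.11 outline: t-tagged-p, derivations, rule, antecedents, expandheuristic, and smaller are stand-ins for the corresponding machinery, and the threading of new bug values and ℓ sequences through the recursion is suppressed.

    ;; REFINEPROOF: succeed at once on a T-tagged node; otherwise try,
    ;; for each derivation, to T-tag its rule and all its antecedents
    ;; recursively; failing that, treat the rule as a heuristic.
    (defun refineproof (xi kappa)
      (if (t-tagged-p xi)
          xi
          (dolist (d (derivations xi))
            (when (or (and (refineproof (rule d) (smaller kappa))
                           (every (lambda (ant)
                                    (refineproof ant (smaller kappa)))
                                  (antecedents d)))
                      (expandheuristic d (smaller kappa)))
              (return xi)))))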

Then the effector attempts to construct a derivation of ξ which mimics that of X. This construction is done recursively by the function prove. It uses a sequence of pairs named by the dummy variable ℓ. This sequence (or set) of ordered pairs may be regarded as a mapping from the X derivation being mimicked to the derivation being constructed. Corresponding meta level nodes in the two derivations are identical. All subroutines return a dotted pair (α . β) where α is the current bug value and β is the current ℓ sequence. Thus it is the function prove which has the task of deciphering the meaning of a heuristic, by first trying to mimic a previous use of the heuristic.

3.6.4 Example: A heuristic which is a composition of two rules. (See Figure 13, solid portion.)

Here we see a simple example of a portion of a net illustrating use of a heuristic, namely T(p(x)) ⊃ T(e(x)), to prove the formula named by e(i), i.e., the formula one gets by applying e to i. This formula, as we see, was first "proved" directly from p(i) via the heuristic. Later (that it was later rather than earlier is not obvious from the net) this single step was refined by the creation of the two-step derivation via an intermediate formula. If this was the first use of the heuristic, the refining process must have been fairly difficult. (Locating the two rules to use would have taken place in randomprove via bestnetrule(ζ, κ), where ζ names our heuristic. bestnetrule would have found the right rules by tracing the derivation of the heuristic via ζ.) However, after such a refined proof has once been achieved, it is not too difficult to refine a second use of that heuristic in the same way. Suppose the portion of the net shown solid in Figure 13 is already constructed, and suppose the heuristic is now used to "prove" another theorem e(i′).

(See the broken-line derivation of e(i′) via the heuristic in Figure 13.) Let β3 be the node containing e(i′). We can see how refineproof(β3, κ) would evaluate here if we refer to Section 4.11 and the effector function definitions (κ is a number). These definitions are not complete but are merely outlines of the sort of definitions required. (E.g., κ′ stands there for the name of a number related to κ but smaller than κ. I have not specified in these cases how the new number is calculated.) (Some obvious improvements in the definitions come immediately to mind, such as diagonalization of sets of searches that are here handled consecutively.)

I have made in these definitions one violent simplifying assumption: that each rule contains only one free variable. This is ridiculous, of course, but it allows the definitions to be written much more simply, since the searches for parameter values are straightforward searches rather than complicated diagonal ones. The simplification is made only to make the principles of search easier to see, and could not be made in a real machine. In a real machine these searches would be diagonalized in one of the obvious ways. Happily, the simplifying assumption just happens to be true for the example of Figure 13. The function parameters is changed to parameter, and returns a pair instead of a sequence of pairs. Hence we can use the definitions in Section 4.11 to clarify the evaluation of refineproof(β3, κ).

Figure 13.

Figure 14.

The following hold: …

The evaluation proceeds through the following stages (assume β1 and β3 are H-tagged; the nets described by the bug values are indicated periodically; the bug values in (1), (2), and (3) describe Figure 13, including the broken line and dotted line portions):

(1) Refineproof(β3, κ)

(2) Expandheuristic(…)

(3) Refinebyexample(…)

Now the effector makes erasures to get Figure 14, solid portion. The bug values below describe the solid and dotted (....) line portions of Figure 14.

(4) Prove(…, …, {…}) (The third argument is a sequence, but we abbreviate it as a set so that we need write repeated members only once.)

(5) Parametergenerate(…) (giving w := … in the parametergenerate program; see the definition of parametergenerate.) (Assume: result(…, x, …) = … holds.)

(6) Parametercheckgen(…)

Now the effector constructs the broken line (----) part of Figure 14. The bug values below describe the solid, dotted, and broken line portions of Figure 14. (requiredantecedents(…) = … holds.)

(7) Completederiv(…) (define Δ =: …)

(8) Prove(…)

(9) Parametergenerate(…) (giving w := … in the parametergenerate program.)

(10) Parametercheckgen(…)

The effector now constructs the circled line (oooo) part of Figure 14. The bug values below describe all portions of Figure 14. (define Ω =: {…})

(11) Completederiv(…)

(12) Prove(…)

This time the effector executes join(…, …) to combine two nodes of Figure 14. Since β1 has an H-tag, so does the joined node, and, defining ζ := operate(join(…, …)), prove returns the constant naming ζ(…) * ζ(…).

Undoing the recursions, all the H-tagged(a(ζ)) occurrences hold because completederiv H-tags β2 and β3 in turn. Prove finally gives an answer of the form (β3 . ℓ) (for some sequence ℓ) where an H-tag is on β3 and the net constructions we have mentioned have been made. This is returned to refinebyexample, which restores the erased connections and calls refineproof(β3′, κ′), where β3′ is the new β3 with the new constructions. The original problem (refining the α step) has been broken into two subproblems (refining the β2 step and refining the β1 step).

3.6.5 Less Trivial Situations

In the example we had an entire simple model laid out before us to mimic. When this is not the case, or when the right model is buried among incorrect models, the task is harder and the chances for error greater. Such cases require random searches through the net to find the required formula. Such search algorithms can be sophisticated or simple. We have only indicated where they are used. We have used them in a sequential rather than a diagonal manner, since we are not trying to set up the ultimate in search algorithms here, but only to indicate the principles by which they operate. The random search algorithms are Rcheck, overallcheck, and randomprove. Each searches for a different kind of formula. Details are discussed below.

3.6.6 parametertreegenerate

Even when an entire model is available and no random net searches are required, the situation can be rather complicated. Good heuristics, however, give us simple situations. In the previous example we had to break up a single-step heuristic derivation into a two-step one. Sometimes one must break a single step into three steps, but a good heuristic would be in the net in such a way that this process is done in two refineproof stages, first breaking the single step into two, and then breaking one of these into two more. In general, the complicated entire-model situations are handled just like the example. There is one situation, however, which is common but did not arise in the example. When the effector arrived at parametergenerate, it always found that ad(v) ∈ proj1(ℓ) held, for the current values of the dummy variables v and ℓ. One can't always count on this. Suppose the model were as in Figure 15. Here ad(v) names γ. γ is not in proj1(ℓ), but it is a derivative of a member of proj1(ℓ), the derivation being via a certain rule. We call such a rule a parameter rule, because it generates a parameter value.

Such rules are easy to recognize, since they are all of form

(T(x1) ∧ T(x2) ∧ ... ∧ T(xn)) ⊃ T(ψ(x1, x2, ..., xn))

for some individual function ψ. Hence it is trivial to follow a model back through one of these rules and mimic it in the new proof. This is the job of parametertreegenerate. etacheck then tries to use the generated parameter value for the job it is supposed to do. Of course, our definitions are simplified, since we are assuming the rule contains only one free variable, only one parameter.
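A recognizer for this form is indeed trivial to write. In the sketch below, which is mine and not the report's, a rule is abbreviated to an S-expression (IMPLIES (AND (T x1) ... (T xn)) (T (psi x1 ... xn))); the operator names are representation choices, not the system's notation.

    (defun t-of-variable-p (f)
      ;; a conjunct of form (T xi) with xi a bare variable
      (and (consp f) (eq (car f) 'T)
           (symbolp (cadr f)) (null (cddr f))))

    (defun parameter-rule-p (rule)
      (and (consp rule) (eq (car rule) 'IMPLIES)
           (let ((ante (cadr rule))
                 (conseq (caddr rule)))
             (and (consp ante) (eq (car ante) 'AND)
                  (every #'t-of-variable-p (cdr ante))
                  (consp conseq) (eq (car conseq) 'T)
                  (consp (cadr conseq))              ; psi applied to arguments
                  (equal (cdr (cadr conseq))         ; arguments are x1 ... xn
                         (mapcar #'cadr (cdr ante)))))))

For example, (parameter-rule-p '(IMPLIES (AND (T x1) (T x2)) (T (psi x1 x2)))) is true, while an ordinary rule of inference, whose consequent is not T of a function of the antecedent variables, is rejected.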

Figure 15.

It is important to distinguish parameter rules from other rules, since they perform quite a different function and are really only individual function type expressions written in rule form. We make the simplifying assumption that it will never be necessary to search through a cascaded series of parameter rules to get to the member of proj1(ℓ) we want. In general this makes a lot of sense, because we can automatically arrange our construction rules to collapse a cascaded series down into one rule, the process being trivial, recursive, and reversible. The result merely shows composed individual functions to the right of the implication sign. I have not taken the trouble to arrange for the collapsing, but I have arranged the random search functions (which generate these rules) in such a way that cascades do not develop. The parameter rules are treated like individual functions, not like real rules.

3.6.7 Suppose the Model Fails at Some Point

Suppose the effector is looking for a formula to fit in the proof, and none of the corresponding parts of the model proof lead it to members of proj1(ℓ) (with ℓ having the current value of the dummy variable). The model has failed at this point. Before the effector rejects the whole model, it might be a good idea for it to take a look at nearby formulae to see if they are what it is looking for. When no model exists, the whole proof attempt must consist of this sort of search. The situation is not as bad as it seems. In Figure 13, for example, even if none of the solid lines existed (i.e., if there were no model), a search of the net near α would quickly bring the effector to the correct rules of inference.

The random search, then, begins at an appropriate node or nodes and searches nearby nodes first. It should search first those nearby nodes with high values on value-type flags. (Each random search algorithm looks at a

different set of flags.) The algorithm looks at a number parameter κ, which tells it, in effect, how thoroughly to search. By the rewarding and punishing routine, the search avoids repeating a trial too often. (We can fix things so it never repeats a trial, if we wish.) The algorithm carries along the ℓ sequence. The algorithm will be constructing new nodes and hooking them onto old ones. If these old ones are in proj2(ℓ), then this is good to know, because at that point the effector can stop its random search and go back to the model. (I have indicated this in the algorithms by a recursion back to prove and hence to checklistprove, but I have indicated it only for antecedents.)

Theoretically, given a high enough κ, these random searches could search the whole net. But the net could easily have been split in two parts by erasure of certain nodes. We must keep track of both halves, not only to prevent one part from being garbage collected, but to allow the random search algorithms to search both halves.

I have indicated three random-search algorithms.

    Type of Formula Sought    Algorithm Name    Selecting Function Used
    normal rule               randomprove       bestnetrule
    parameter rule            overallcheck      bestnetparameterrule
    parameter value           Rcheck            bestnetparametervalue

The first is more general and incorporates the other two. It can be used when there is no model at all. The selecting functions are like bestlist, but instead of selecting nodes from a sequence they select them from the whole net, weighting them as we discussed above. The first argument of a selecting function names a bug value (or a list of bug values, in bestnetparametervalue) designating a node (or nodes) around which the effector will concentrate its search. The second argument names the number parameter.
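One way to realize such a selecting function, assuming much that the report leaves open (breadth-first graph distance, a geometric decay factor, and reuse of the bestlist sketch given earlier):

    ;; Distances from the focus node by breadth-first search.
    ;; NEIGHBORS maps a node to its adjacent nodes.
    (defun distances-from (focus neighbors)
      (let ((dist (list (cons focus 0)))
            (queue (list focus)))
        (loop while queue do
          (let* ((n (pop queue))
                 (d (cdr (assoc n dist))))
            (dolist (m (funcall neighbors n))
              (unless (assoc m dist)
                (push (cons m (1+ d)) dist)
                (setf queue (nconc queue (list m)))))))
        dist))

    ;; BESTNETRULE-like selector: weight = flag value times decay^distance,
    ;; then choose probabilistically as BESTLIST does.  Note that BESTLIST
    ;; here raises only a temporary weight; a fuller version would write
    ;; the raise back through the flag itself.
    (defun bestnetrule (focus k neighbors flag-value &optional (decay 0.5))
      (bestlist (mapcar (lambda (nd)
                          (cons (car nd)
                                (* (funcall flag-value (car nd))
                                   (expt decay (cdr nd)))))
                        (distances-from focus neighbors))
                k))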

As we mentioned, bestnetparameterrule is rigged to discourage cascading. If it doesn't immediately find what it wants, a nearby parameter rule with high value, it constructs one. Remember that such a rule is just an individual function. Usually the required individual function will be buried in a nearby node (e.g., see Figure 8, where the required individual function is buried in one of the nodes shown). Bestnetparameterrule actually searches through selected rules for the proper individual function and constructs the required parameter rule via the obvious meta rule. It returns a bug value designating the constructed node. Since the effector will never want to T-tag the rule, this odd generation causes no problems. If this sort of search fails, bestnetparameterrule searches simultaneously (and diagonally) for several individual functions (the search for each being like the above search) and composes them to create the required parameter rule. Unless κ is rather high, this search will be quickly given up.

3.7 General Considerations

A more detailed discussion of the algorithms here presented would be pointless, since their definitions are only a simplified outline of one possible approach to effector interpretation of heuristics. Much more sophisticated algorithms are possible. The more sophisticated the effector algorithms, the simpler the heuristics and memory net structure need be. Simple algorithms imply complicated models, with many intermediate stages in proof-refining between the original heuristic proof outline and the final rule of inference proof. Since we are interested in adaptation of the net, we presumably would like rather simple effector algorithms (i.e., we don't want to program too many techniques into the machine; we want the machine to discover them for itself), but not as simple as the algorithms presented here. The prove algorithm presented here does exhibit an important property shared with its more complicated relatives. We described this property in Section

1.6.3.2. In terms of our notation here, we can summarize the property by saying that prove never looks at T-tags, only at H-tags.

3.8 Conclusion: Discussion of Adaptation

Notice that by the reward and punishment scheme, not only is the effector able to make random searches without always repeating itself, but when it has completed a proof (if we write the algorithms correctly) it will have rewarded (or punished, if it was unsuccessful) just those nodes we would want it to try (or avoid) next time; those strategies which work well get higher reward and are more likely to be tried in the future. The exact reward for each of the rewarded nodes can be modified by a multiplier (the same multiplier for each node) set by the user on the basis of how good a proof was produced. Other effector algorithms can then help further distribute the reward. The idea, of course, is to erase, from time to time, very poorly rewarded nodes. Thus the net adapts slightly with each problem it solves or fails to solve.

There is a special effector algorithm which, after completion of a "to prove" problem, makes random applications of rules in the net to formulae in the net, thus generating new formula nodes. We shall call this algorithm the random generation algorithm. This algorithm makes a random choice of a formula node containing a rule (weighting as more likely to be chosen those nodes which have accumulated more reward). It then makes a random choice of formula nodes to apply the rule to (weighting as more likely to be chosen those nodes which have accumulated more reward and are closer to the rule chosen). The algorithm attempts to apply the chosen rule to the chosen formulae. If the application is successful, the algorithm adds to the net a formula node containing the resulting formula and a derivation node whose connections show how the new formula was derived.

Thus the random generation algorithm is very simple. Nevertheless, it allows the machine great flexibility in adaptation. By its operation, formulae are added to the net in areas which the machine itself has determined (via its reward distribution algorithms) are important areas. The formula nodes created are not for use in the solution of a current problem, but rather for use in the solution of future problems similar to those problems in whose solution the formula nodes chosen by the algorithm participated.

One way in which these new formulae might be used in future problems would be as rules. If the new formula generated is in the form of a rule, it can easily be employed as a rule in future problems. If such a new formula was generated from theorems by a rule of inference, it can be used as a new "derived" rule of inference. If not, and this is the more common case, it can still be used as a heuristic. In Section 1.7 we discussed the generation of new heuristics by the machine. It is usually through the random generation algorithm that such a generation takes place.
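A minimal sketch of one generation step, under stated assumptions: every name here (rule-nodes, formula-nodes, reward-of, arity, try-apply-rule, add-formula-node, add-derivation-node) is a hypothetical stand-in, the distance weighting of the argument choice is omitted, and try-apply-rule abbreviates the whole business of matching a rule against the chosen formulae.

    (defun weighted-choice (items weight-fn)
      ;; probability proportional to weight, as in BESTLIST
      (let* ((total (reduce #'+ items :key weight-fn))
             (r (random (float total))))
        (dolist (item items)
          (decf r (funcall weight-fn item))
          (when (<= r 0) (return item)))))

    (defun random-generation-step (net)
      (let* ((rule (weighted-choice (rule-nodes net) #'reward-of))
             (args (loop repeat (arity rule)
                         collect (weighted-choice (formula-nodes net)
                                                  #'reward-of)))
             (result (try-apply-rule rule args)))
        (when result
          ;; record the success: a new formula node plus a derivation
          ;; node showing how it was obtained
          (let ((new-node (add-formula-node net result)))
            (add-derivation-node net new-node rule args)
            new-node))))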

Figure 8 diagrammed the result of such a generation. We can now see that the derivation diagrammed in Figure 8 is no different from any step in a proof outline. It simply happens that the formula generated is in the form of a rule. We see now that the rule generated (Heuristic 1 in Figure 8) may be either closely related or quite unrelated to the rules (Rule 1 in Figure 8) and to the other formulae (the * in Figure 8) from which it was generated; the degree and nature of the relationship is determined by the rule which was used as a rule in the derivation. (The node containing this rule is not shown in Figure 8. It is the node from which emanates the lower of the two thick arrows in Figure 8.) We can call this last rule a meta rule, since it was used to generate a rule. A rule which is used to generate a meta rule can be called a meta meta rule. Thus we can think of a rule as being used at a particular level: the object level, the meta level, the meta meta level, or a still higher level.

Now a single rule may be used successively at any number of different levels. Thus it is not the rule which is at a particular level, but only a particular use of it. Of course, there will be some rules useful at a particular level and useless at other levels. This specialization is essential to proper adaptation. Efficient generation of new rules at any particular level depends on the presence of rules useful at the next higher level. As a corpus of rules useful at a particular level ℓ is developed, the generation of rules at the next lower level (level ℓ - 1) becomes efficient. Until this corpus is developed (through reward of rules at level ℓ which produce rules rewarded at level ℓ - 1), the generation of rules at level ℓ - 1 is inefficient. Note that though the generation of rules at level ℓ - 1 is inefficient to begin with, it still takes place, since any rule in the net can be used at level ℓ. (In fact, we suspect that if level ℓ is a very high level, any rule useful at level ℓ - 1 is useful at level ℓ.) Thus the net is never in a position of having no rules to use at a given level, since

any rule can be used at any level. It is only after a good deal of adaptation has taken place that rules are developed which are specialized for use at a particular higher level. Rules specialized for use at the lower levels will appear first (at the object level almost immediately), and only later will rules specialized for use at the higher levels appear. Thus a hierarchy of rules is gradually built up. (We have been talking as though, given a particular use of a rule, it were always possible to tell at what level the use occurred. This is, of course, not always the case, especially before the hierarchy is built up. Suppose Rule A is used to generate Rule B. Then suppose Rule B is used once at the meta level and once at the meta meta level. At what level was Rule A used when it was used to generate Rule B? However, it is useful to think of the use of a rule as if it occurred at a particular level.)

Now a generated rule may be either a heuristic or a rule of inference. If it was a heuristic, its "proof" must have been only a proof outline (it can never be fully refined). Even if the generated rule was a rule of inference (and hence a theorem), its "proof" might not be fully refined. (This was almost certainly the case when the rule was first generated.) In such a case the rule is a rule of inference, but, since its node does not have a T-tag, the effector does not know that it is anything more than a heuristic. If it is a useful heuristic, however, the effector will spend time trying to refine its proof. This is because the rule will accumulate reward, and thus there will be many attempts to use it. At each attempt, the effector makes another try at refining the rule's proof outline. (For the object level, see the command to calculate ρ in the refineproof routine, Section 4.11.2. For the meta level and higher levels we need other effector algorithms.)

By allowing any given rule to be used on any one level or on several different levels, we can start the machine with an especially simple memory, one with a single corpus of rules to be used on all levels, and then allow the corpus of rules to specialize. (Alternatively, we could have restricted each rule to a given level, beginning each level with the same original corpus of rules. This plan, however, does not allow a new rule useful at one level to be tried out at another level.)

Our general procedure for generating new rules is what permits the development of a hierarchy of rules. That the rules generated may be not only heuristics but even rules of inference depends on the procedure (discussed in Section 2.2.1.2) for adding new "derived" rules of inference to the axiomatic system. (The procedure for generating heuristics is merely a generalization of the procedure for generating "derived" rules of inference.) This procedure in turn depends both on the procedure for generating algorithmic names of functions and on the procedure for using Rules 18 and 19 to shift a theorem from one level to another in our language. All these procedures were explained in Section 2, and it was shown in Section 2.2.1 that the addition of these procedures to our axiomatic system does not destroy consistency. By showing, in Section 2, the existence of a consistent self-describing system which incorporates these procedures, we have shown that there exists an axiomatic system which may be used as a basis for the sorts of adaptive theorem proving machines discussed in Sections 1 and 3.

The complete specification of one such adaptive theorem prover is not possible until various reward plans have been investigated and compared. We are only beginning this investigation, and it promises to be a long one. Our initial work will be on the so-called "reproductive plans" which Dr. Holland has investigated (see description in [Holland, 1969]).

3.9 Postscript: Other Object Theories

Formally, the machine proves theorems in a particular axiomatic system. We can cause it to prove theorems in another system simply by adding that system's axioms and rules of inference to the memory net in the following way. We define a predicate Sys in the same way we defined T, except that the definition uses the axioms and rules of inference of the new system instead of the old system. Then Sys is true on theorems in the new system. If we want the machine to prove a theorem α of the new system, we simply ask the machine to prove Sys(α). The old system is present, then, "overseeing" the new system and operating at the meta level to generate new rules of inference and heuristics for the new system. The new system, then, is "subordinate" to the old. There is no counterpart to Rule 19 for Sys, so the new system stays at the object level, as it were. Rules of inference for the new system are theorems of the old system, not theorems of the new system.

By the above method we can have the machine prove theorems in trigonometric identities, group theory, or propositional calculus. In the case of propositional calculus, the new system is a subsystem of the old system, so for any α, Sys(α) ⊃ T(α) holds. As long as all the special propositional calculus rules of inference require the generated theorem to be a formula of propositional calculus (a decidable question, and thus easy to add as a condition; see below), we can have the machine use T instead of Sys throughout.

Let us consider the Newell, Shaw, and Simon [Newell, Shaw, and Simon, 1961] formulation of propositional calculus. (Ignore for now their so-called abstract operators.) They formulate problems in terms of transforming a

propositional calculus formula α into β by means of (reversible) legal transformations. These transformations, or operators, we may write as individual functions op1, op2, ..., opn. Since our machine works with theorems, we shall attempt to transform the propositional calculus theorem α ≡ β into the theorem α ≡ α by means of applying the opj's to the right hand side. To decide what operation to use, Newell, Shaw, and Simon have a set of difference tests, which we can regard as a set of binary predicates D1, D2, ..., Dm. Let us make the following definitions:

D0(x, y) =: ~(D1(x, y) ∨ ... ∨ Dm(x, y))

prop(α) holds iff α names a well formed formula of propositional calculus.

diff(x, y) =: [prop(x) ∧ ~D0(x, y) → x ≡ y; atom(x) → 0; diff(a(x), a(y)) ≠ 0 → diff(a(x), a(y)); ⊕ → diff(d(x), d(y))]

There are many ways of embedding a fixed scheme such as Newell, Shaw, and Simon's in our system. One is to code it all into one massive rule of inference. This would only be interesting if the machine then dissected it into a set of smaller rules. Let's begin with such a set of smaller rules. We shall ignore, in this illustration, their three rules which use two antecedents. Since the second antecedent is found by random search, the models for these rules are essentially the same as for other rules. (E.g., see Fig. 5, Sec. 1.) (If these three rules are also to operate on subexpressions, further modification is necessary.)
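In LISP terms, diff walks the two formulae in parallel, car before cdr, and reports the outermost, leftmost pair of corresponding subformulae at which some difference test fires. The sketch below is mine: formulae are plain S-expressions, prop-p and *difference-tests* are hypothetical stand-ins, NIL plays the role of the 0, and the result (EQUIV u v) stands for the subgoal formula u ≡ v.

    (defun diff (x y)
      (cond ((and (prop-p x)
                  (some (lambda (d) (funcall d x y)) *difference-tests*))
             (list 'EQUIV x y))          ; a difference test fires here
            ((atom x) nil)               ; atoms either match or are hopeless
            ((diff (car x) (car y)))     ; a non-NIL car result passes through
            (t (diff (cdr x) (cdr y)))))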

The basic rules of inference are, for each j such that 1 ≤ j ≤ n,

πj: (prop(x) ∧ prop(y) ∧ T(x ≡ opj(y))) ⊃ T(x ≡ y).

In addition we need

μ: (prop(x) ∧ x = y) ⊃ T(x ≡ y).

The associated heuristics will be

υ: a(z) = ≡ ⊃ T(z)   (like 0 ⊃ T(x) in the old system)
ω: 0 ⊃ T(x ≡ y)
σi: Di(x, y) ⊃ T(x ≡ y)   (one for each i such that 0 ≤ i ≤ m)
τij: (Di(x, y) ∧ T(x ≡ opj(y))) ⊃ T(x ≡ y)   (one for each i, j such that 1 ≤ i ≤ m and 1 ≤ j ≤ n).

When D0(α, β) holds, Newell, Shaw, and Simon stop trying to transform the β into α directly by the opj's, and begin trying to similarly transform its sub-expressions. We can generate such subgoals by properly employing the following rule of inference:

ρ: (prop(x) ∧ prop(y) ∧ T(diff(x, y)) ∧ T(S(add(diff(x, y)), ad(diff(x, y)), x) ≡ S(add(diff(x, y)), ad(diff(x, y)), y))) ⊃ T(x ≡ y).

These rules (and no axioms) are sufficient for our machine essentially to simulate the Newell, Shaw, and Simon techniques. The simulation will be most efficient if the net contains various "models" of the sort we discussed above. The models in Figures NSS1-NSS6 are sufficient to force the machine to follow a probabilistic Newell, Shaw, and Simon algorithm. (In these figures, the symbol in a node indicates the node's contents, not its name.) Adaptation causes changes in the value-type flags, and hence changes in the strengths of connection between the Di's and the opj's. (Of course, we can prevent such adaptation by simply returning the net to its original state after each problem.) In these models the rules frequently have two free variables, so the simplified refineproof algorithm of the last example won't work here. There is no essential difference, however, between the model-following technique of this example and that of the last.

One model:

Figure NSS 1.

A model for each σi, even for i = 0:

Figure NSS 2.

A model for each τij (i ≠ 0):

Figure NSS 3.

A model for each τij (i ≠ 0):

Figure NSS 4.

There are two models for ρ. Here is one:

Figure NSS 5.

Here is the second model for ρ:

Figure NSS 6.

Thus a simple net may be constructed which simulates the Newell, Shaw, and Simon system, and this construction does not need most of the axioms and rules of the old system. If these are added, however, then we have the possibility of adding new rules of inference and heuristics. Newell, Shaw, and Simon point out that if for many α's, Π(α, opj(α)) holds, then Π is a good candidate for a new Di. If we have our old system in the net, the machine can prove statements of the form Π(x, opj(x)). The following meta rule will convert such a statement into a rule like the τij's:

(operator(y) ∧ T(…)) ⊃ T(…),

where we define operator(y) =: … (i.e., y doesn't have to be one of the original operators; it can be a new one we've generated). In practice one would want Π(x, opj(x)) to be a theorem for more than one j. It need not, however, hold for all x, but perhaps only for a subset of all x. If Δ defines the subset, then instead of Π(x, opj(x)) we would perhaps only require that Δ(x) ⊃ (Π(x, opj(x)) ∧ Π(x, opk(x))) be a theorem.

Much less ambitious than this would be simply adding meta rules to combine the Di's and opj's in various ways to derive new Di's and opj's. Also, we can introduce abstract operators. These can be simply introduced as intermediate rules between the σi's and the πj's. Some care must be taken, however, in setting up the models. The power of the abstract operators is that one can't H-tag the formula one is trying to prove until one has a complete derivation via abstract operators. Hence the machine must not call in υ or ω in the models, because they give a false H-tag. (This is a general problem, not limited to

the Newell, Shaw, and Simon net.) This can be fixed by more complicated modeling and by low values on the nodes containing υ and ω, so that bestnetrule won't find them. This is an example of the sort of difficulty that could be solved by diagonalization of refineproof, so that instead of always refining the last illegitimate step of a proof outline, it refined the most illegitimate step, as measured by the value-type flags on the heuristics.

4. TABLES

4.1 Table 1. Alphabet

p, q, r, s, p′, ...        (propositional variables)
x, y, z, u, v, w, x′, ...  (individual variables)
f, g, h, f′, ...           (individual function variable bases)
P, Q, R, P′, ...           (predicate function variable bases)

a, d, nil, 0
⊃, =, Pv, Iv, Ifvb, Pfvb
*, newpv, newiv, newifvb, newpfvb
cond, pcond
qu
∀, ∃, λ, ι, label          (listbinders)

4.2 Table 2. Basic Recursive Functions

The basic function expressions which are complete recursing by virtue of special evaluation procedures. (Any machine employing our system would have these procedures stored in it in a manner similar to the way LISP SUBR's are stored.)

    function expression                  type
    ⊃                                    (Pfvb, Pv, Pv)
    =                                    (Pfvb, Iv, Iv)
    Pv                                   (Pfvb, Iv)
    Iv                                   (Pfvb, Iv)
    Pfvb                                 (Pfvb, Iv)
    Ifvb                                 (Pfvb, Iv)
    *                                    (Ifvb, Iv, Iv)
    a    ("defined" in Table 3)          (Ifvb, Iv)
    d    ("defined" in Table 3)          (Ifvb, Iv)
    newpv                                (Ifvb, Iv)
    newiv                                (Ifvb, Iv)
    newpfvb                              (Ifvb, Iv)
    newifvb                              (Ifvb, Iv)

Meanings of the predicate expressions:

    formula      holds if and only if
    α ⊃ β        β holds or α does not hold
    α = β        α and β name the same S-expression
    Pv(α)        α names a propositional variable
    Iv(α)        α names an individual variable
    Pfvb(α)      α names a predicate function variable base
    Ifvb(α)      α names an individual function variable base

Meanings of the individual function expressions:

    term         names
    α * β        the cons of α and β
    a(α)         the car of α
    d(α)         the cdr of α
    newpv(α)     the first propositional variable of index greater than those in α
    newiv(α)     the first individual variable of index greater than those in α
    newpfvb(α)   the first predicate function variable base of index greater than those in α
    newifvb(α)   the first individual function variable base of index greater than those in α

4.3 Table 3. Defined Complete Recursive Functions of a General Nature

Complete recursing function expressions:

~p =: p ⊃ 0
p ∨ q =: ~p ⊃ q
p ∧ q =: ~(p ⊃ ~q)
p ≡ q =: (p ⊃ q) ∧ (q ⊃ p)
listbinder(x) =: x = ∀ ∨ x = ∃ ∨ x = λ ∨ x = ι ∨ x = label
predatom(x) =: x = Pv ∨ x = Iv ∨ x = Pfvb ∨ x = Ifvb
newfatom(x) =: x = newpv ∨ x = newiv ∨ x = newpfvb ∨ x = newifvb
Pfatom(x) =: predatom(x) ∨ x = ⊃ ∨ x = =
Ifatom(x) =: newfatom(x) ∨ x = *
nonvaratom(x) =: x = cond ∨ x = pcond ∨ x = qu ∨ listbinder(x) ∨ Pfatom(x) ∨ Ifatom(x) ∨ x = nil ∨ x = 0
atom(x) =: Pv(x) ∨ Iv(x) ∨ Pfvb(x) ∨ Ifvb(x) ∨ nonvaratom(x)
a(x) =: ι(y) ((atom(x) ∧ y = x) ∨ ∃(z)(y*z = x))
d(x) =: ι(y) ((atom(x) ∧ y = x) ∨ ∃(z)(z*y = x))
x ≠ y =: ~(x = y)
last(x) =: [atom(x) → x; ⊕ → last(d(x))]
length(x) =: [atom(x) → 0; ⊕ → nil*length(d(x))]
andlistcar(PIv, x) =: atom(x) ∨ (PIv(a(x)) ∧ andlistcar(PIv, d(x)))
andlista(PIv, x) =: andlistcar(PIv, x) ∧ last(x) = 0
andlistlistcar(PIvIv, x, y) =: x = 0 ∨ (~atom(x) ∧ ~atom(y) ∧ PIvIv(a(x), a(y)) ∧ andlistlistcar(PIvIv, d(x), d(y)))
andlistlista(PIvIv, x, y) =: andlistlistcar(PIvIv, x, y) ∧ length(x) = length(y) ∧ last(x) = 0 ∧ last(y) = 0
ast(x) =: [atom(d(x)) → a(x); ⊕ → ast(d(x))]

maplistcar(fIv, x) =: [atom(x) → x; ⊕ → fIv(a(x))*maplistcar(fIv, d(x))]
contains(P, x, y) =: x = y ∨ (~atom(y) ∧ ~P(y) ∧ (contains(P, x, a(y)) ∨ contains(P, x, d(y))))
Subst(P, x, y, z) =: [x = z → y; atom(z) ∨ P(z) → z; ⊕ → maplistcar((λ(u) Subst(P, x, y, u)), z)]
reverseconc(x, y) =: [atom(x) → y; ⊕ → reverseconc(d(x), a(x)*y)]
orlistcar(PIv, x) =: ~andlistcar((λ(x) ~PIv(x)), x)
x ∈ y =: orlistcar((λ(u) u = x), y)
typep(x) =: x = Pv ∨ x = Iv ∨ ((a(x) = Pfvb ∨ a(x) = Ifvb) ∧ andlista(typep, d(x)))
typelist(x) =: andlista(typep, x)
Pfv(x) =: Pfvb(a(x)) ∧ typelist(d(x))
Ifv(x) =: Ifvb(a(x)) ∧ typelist(d(x))
variable(x) =: Pv(x) ∨ Iv(x) ∨ Pfv(x) ∨ Ifv(x)
varlist(x) =: andlista(variable, x)
type(x) =: [Pv(x) → Pv; Iv(x) → Iv; Pfv(x) → Pfvb*d(x); Ifv(x) → Ifvb*d(x); ⊕ → 0]
newvart(x, y) =: [x = Pv → newpv(y); x = Iv → newiv(y); atom(x) → 0; a(x) = Pfvb → newpfvb(y)*d(x); a(x) = Ifvb → newifvb(y)*d(x); ⊕ → 0]

-192 exprtype(x) = [x = v x = -p; predatom(x) v newfatom(x) + Pf; X = v f; x =0 ); X = + Pv)P variable(x) + type(x); atom(x) + 0; a(x) = I( (; a(x) =a(x) a (x) = v a(x) = a(x) = (6)v a(x) = (); a(x) =(: + [exprtype(add(x)) = ) - ( * mapZistcar(type, ad(x)); exprtype(add(x)) = v f * maplistcar(type, ad(x)); +- 0 ]; a(x) = ) type(ad(x)) a (exprtype(a(x))) = (); a (exprtype(a (x))) = ~ - 0 ] args(x) = d(exprtype(x)) newvarex(x, y) = newvart (exprtype (x), y) mol(x) =:atom(x) v a(x) = v variable(x) x < y = contains(moZ, x,y) xa y=: x y A X ~ y x. y = contains((X(y) (moZ(y) v (listbinder(a(y)) A x e ad(y)) v (a(y) = x = ad(Y)))), x,y) x y y = x, y A x y x. y = [ x y + 0; a(x)= + (; + x a(Y) v x d(y) ]

-193 S (x,y,z) =:Subst.(moZ,x,y,z) Sf(x,y,z) =:Subst((X(u) ( moZ(u) v (Zistbinder(a(u)) v x e ad(u)) v (a(u) = ( x == ad(u)))), x,y,z) Snf(x,y,z) = Subst((X(u) (mol(u)v a(u) =Qv a(u) =)) x,y,z) SsZ(x,y,z) = [atom(x) + z; 6 + S (newvarex(a(x), (x*(y*z))), a (y), SsZ (d(x), d(y), S (a(x), newarex (a (x), (x* (y*z))) ),)))] SsfZ(x,y,z) = [atom(x) -+ z; +Sf(newvarex(a(x), (x*(y*z))), a(y), SsfZ(d(x), d(y), Sf(a(x), newvarex (a (x), (x*(y*z))), z))) ) freecheck(x,y,z) = [mol(z) v x z z; Zistbinder (a (z))- andZista ( (X(u) ( u u y)), ad(z)) freecheck(x,y,add(z)); a(z) = (a + ad(z) - y A freecheck(x,y, add(z)); ~ - freecheck(x,y, a(z)) A freecheck(x,y,d(z)) ] formsimp(Piv,x) = [ atom(x)+ Pv(x) v x = v x =; a(x) = (p -+a( ast(x)) = ^ andlista((X(y) (formsimp(P I, a(y)) ^formsimp(PIv,ad(y)) A dd(y) = 0)), d(x)) a(x) = A a(x) = Aa(x) ( = ((x) dad(x) = 0) A andZista(Iv,ad(x)) A formsimp(PIv, add(x))Addd(x) = 0; + -* (Pfatom(a(x)) v Pfv(a(x))) A andZistZista((X(u,v) u = exprtype(v)), args(a(x)),d(x)) A andZista(PIv, d(x)) ]

-194 termsimp(P I,x) =: [atom(x) + Iv(x); a(x) = A^ dd(x) = 0 + ~; a(x) = (3a ( ast(x)) =)A andlista((X(y) (formsimp(PI,a(y)) A termsimp(P Iad(y)) A dd(y) = 0)), d(x)); a(x) = IQ Iv(aad(x)) A formsimp(PIv, add(x)) A dad(x) = 0 A ddd(x) = 0 + -* (Ifatom(a(x)) v Ifv(a(x))) A andlistlista((X(u,v) u = exprtype(v)), args(a(x)),d(x)) A andZista(PIv, d(x)) ] Simplexpr(x) = Pfatom(x) v Pfv(x) v Ifatom(x) v Ifv(x) v formsimp(Simplexpr, x) v termsinmp(Simplexpr, x) Simpleformula (x) = formsimp (S'impZexpr, x) Simpteterm(x) =: termsimp (Simptexpr, x) formtofunctiontp (P,x) = Pfatom(x) v Pfv(x) v Ifatom(x) v Ifv(x) v (a(x) =() varlist(ad(x)) A P(add(x)) ^ (exrtpe(add(x)v eprtype(add(x)) = prtype(add(x)) = ) ddd(x) = ) v (a(x) = (b A (Pfv(ad(x)) v Ifv(ad(x))) A type(ad(x))=exprtype(add(x)) A aadd(x) = ^ varlist(adadd(x)) A P(addadd(x)) A (exprtype(addadd(x)) = v exprtype(addadd(x) = 9) A ad(x) ^ addadd(x) A ad(x) Snf(ad(x), 0,addadd(x)) A ddd(x) = 0 A dddadd(x) = 0 )

-195 expression (x) = formtofunctiontp (expression, x) A [atom (x) - Pv(x) v x = v x = vIv (x); a(x) = A dd(x) = 0; a(x) = ( ) -c a ast(x)) = andZista ( ((y) (expression(a(y)) A expression(ad (y)) A dd(y) = 0 A exprtype(a(y)) = A exprtype(ad(y)) = )),d(x)); a(x) = ( a( ast(x)) =)^andZista((h(y) (expression(a(y)) A expression(ad(y)) A dd(y) = 0 A exprtype(a(y)) = (P A exprtype(ad(y)) = )),d(x)); a(x)=:ava(x) =) v a(x a andZista(Iv,ad(x)) A epression(add(x)) A(a(x) = D) O dad(x) = 0) A ddd(x) = 0 A exprtype(add(x)) = (; a(x) =C- Iv(add(x)) A expression(add(x)) A dad(x) = 0 A ddd(x) = 0 A exprtype(add(x)) = ();0 formtofunctiontp (expression, a (x)) A andZistlista(( (u,v) u = exprtype(v)), args (a (x)), d (x)) A andZis ta (expression,d(x)) ]
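For orientation, here are three of the Table 3 definitions transcribed into Common Lisp under the reading a = CAR, d = CDR, * = CONS (a sketch only; LAST* is our spelling, chosen to avoid the built-in LAST):

    (defun last* (x)                 ; last(x): the final atom of a list
      (if (atom x) x (last* (cdr x))))

    (defun reverseconc (x y)         ; reverseconc(x,y): reverse of x, then y
      (if (atom x) y (reverseconc (cdr x) (cons (car x) y))))

    (defun maplistcar (f x)          ; maplistcar(f,x): apply f to each car
      (if (atom x) x (cons (funcall f (car x)) (maplistcar f (cdr x)))))

For example, (maplistcar #'1+ '(1 2 3)) returns (2 3 4), and (reverseconc '(1 2 3) nil) returns (3 2 1).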

4.4 Table 4. Axioms

Predicate calculus axioms
1. p ⊃ (q ⊃ p)
2. (s ⊃ (p ⊃ q)) ⊃ ((s ⊃ p) ⊃ (s ⊃ q))
3. ((p ⊃ 0) ⊃ 0) ⊃ p
4. ∀(x) (p ⊃ P(x)) ⊃ (p ⊃ ∀(x) P(x))
5. ∀(x) P(x) ⊃ P(x)

6. Reflexivity of =
   x = x
7. Replaceability of = and ≡
   a. x = y ⊃ (P(x) ⊃ P(y))
   b. (p ≡ q) ⊃ (P(p) ⊃ P(q))
8. Peano axiom 3
   ~atom(x*y)
9. Peano axiom 4
   x*u = y*v ⊃ (x = y ∧ u = v)
10. Peano axiom 5, induction
   (∀(x) (atom(x) ⊃ P(x)) ∧ ∀(x,y) ((P(x) ∧ P(y)) ⊃ P(x*y))) ⊃ ∀(x) P(x)
11. 0 definition
   0 ≡ ~(0 ⊃ 0)
12. ∃ definition
   ∃(x) P(x) ≡ ~∀(x) ~P(x)
13. ∃! definition
   ∃!(x) P(x) ≡ (∃(x) P(x) ∧ ∀(x,y) ((P(x) ∧ P(y)) ⊃ x = y))
14. ι definition
   ∃!(x) P(x) ⊃ P(ι(x) P(x))
15. Disjointness of atom classes
   a. (Iv(x) ∨ Pfvb(x) ∨ Ifvb(x) ∨ nonvaratom(x)) ⊃ ~Pv(x)
   b. (Pv(x) ∨ Pfvb(x) ∨ Ifvb(x) ∨ nonvaratom(x)) ⊃ ~Iv(x)
   c. (Pv(x) ∨ Iv(x) ∨ Ifvb(x) ∨ nonvaratom(x)) ⊃ ~Pfvb(x)
   d. (Pv(x) ∨ Iv(x) ∨ Pfvb(x) ∨ nonvaratom(x)) ⊃ ~Ifvb(x)

16. New variables are variables
   a. Pv(newpv(x))
   b. Iv(newiv(x))
   c. Pfvb(newpfvb(x))
   d. Ifvb(newifvb(x))
17. New variables are new
   a. ~ newpv(x) ⊴ x
   b. ~ newiv(x) ⊴ x
   c. ~ newpfvb(x) ⊴ x
   d. ~ newifvb(x) ⊴ x
18. Generatability of certain functions. (See Table 6 for definition of Pfstep.)
   0 ⊃ Pfstep(x)

-199 4.5 Table 5. Rules of Inference Rules of inference: (T is defined in Table 7) 1. Modus Ponens (T(x y) A T(x)) DT(y) 2. Generalization (T(y) A Iv(x) ^ - x ) y) D T( x y ) 3. Change of bound variable (T(y) A variable(x) A type(x) = type(z)A~x u A^ z.. u A S(u,S(x,z,u),y) = S(u,S(x,z,u),v)) D T(v) 4. Substitution of simple expression for variable (T(y) A variable (x) A SimpZexpr(z) A type(x) = exprtype(z) A freecheck(x,z,y)) D T(Sf(x,z,y)) 5. Substitution of function expression for function variable (T(y) A T(u) A z-Q u A (Pfv(x) v Ifv(x))A type(x) = exprtype(z) A freecheck(x,z,y)) D T(Sf(x,z,y)) 6. Application of a X expression to arguments (where the X expression is not inside another function expression). (T(u) A expression(v) A aa(y) = A andZistZista((X(x,z)freecheck(x,z, adda(y))),ada(y),d(y)) A Snf(y,SsfZ(ada(y),d(y),adda(y)),u)= Snf(y,SsfZ(ada(y),d(y),adda(y)),v) A (y u v (andZista((X(z) ~ z adda(y)),ada(y)) A andZistZista((X(x,z) a V z D x ^ adda(y)),ada(y),d(y))))) T(v)

-200 7. Function recursion (T(u) A expression(v) A a(y) = la ^ ad(y) ^ add(y) ^ freecheck(ad(y),y,add(y)) A Snf(y,Sf(ad(y),y,add(y),u) = Snf(y,Sf(ad(y),y,add(y)),v)) D T(v) 8. pcond rule (Simpleformula(y) A Pv(w) A andZista((X(x) (SimpleformuZa(x) A a(x) = c ^ (x) = w)),z) ^ u = SsZ(z,maplistcar(adad,z),y) A v = SsZ(z,mapZistcar((X(x) (pcon*dd(x))),z),y) A ~ w u A w Q v) D T (y w u w( v() ) 9. cond rule (SinplZeformuta(y) A Pv(w) A andlista((X(x) (Simpteterm(x) A a(x) = c) A aad(x) = w)),z) A u = SsZ(z,maplistcar( adad,z),y) A v = Ssl(z,maplistcar((W(x) ((cn *dd(x))),z),y) A ^- w Q u A - w Q v) D T(y (43 w Gu A w w v()) 10. pcond initiation (Simple formula(y) A Simpleformula(z) A S(z, (pcond ) z,u) = S(z, (pcond, z,y)) D T(y u) 11. cond initiation (SimpleformuZa(y) A Simpleterm(z) A S(Z, conC z,u) S(z,,cond, (, z,y)) D T(y u)

-201 12. Listbinder notation (T(u) A SimrpZeformuta(x) A (a(x) = (v a(x) = ) A dad(x) 0 A nf(x, (a(x) Qaad(x) a(ax) x dad(x) Q add(x),u) = Snf(x, (a(x) ()aad(x) (Q )a(x) 9dad(x) padd(x) $,v)) D T(v) 13. Definition of qu T( ( y* quy (x*y)Q) 14. Variables are variables a. Pv(x) D T(v( x0)) b. Iv(x) D T( x() ) c. Pfvb(x) D T( (Pfvb((qu ) d. Ifvb(x) D T ( x() 15. Different atoms are unequal (atom(x) A atom(y) A x $ y) D T( x q Y)) 16. Generation of Predicate function expressions (T((z * u) v) A a(z) =(A^ Z V A z v j1 A varzist(u) A andZista((X(x)'- x D v),u)) T( D (lae) newvarex(z,((z*u) *v)) ),u S(z,newvarex(z, ((z*u)*v)),v) () *u)) 17. Generation of individual function expressions (T((z*u) (v) A a(z) = A^ z v A,z v A varlist(u) A andZista((X(x)' x 4 v),u)) DT( )newiv((z*u)*v) ( newiv((z*u)*v) () ( (lael newvarex(z, ((z*u)*v)) u 0 S(z, newvarex(z, ((z*u) *v) ),v) *u) )

18. Dropping one level
   T(x) ⊃ T( (T, (qu, x)) )
19. Raising one level
   T( (T, (qu, x)) ) ⊃ T(x)

- 203 4.6 Table 6. Defined Complete Nature Recursive Functions of a Specific Rulel(x,y,z) =: a(y) =(^ z = ad(y) A x = add(y) Rule2(x,y) = a(x) = (A Iv( aad(x) A dad(x) = 0 A y = add(x) A ddd(x) = 0 A c aad(x) 4 add(x) RuZe3 (v,y,x,z,u) =: variable (x) A type(x) = type(z) A ~ x 4 u ^ v z Q u A S(u,S(x,z,u),y) = S(u,S(x,z,u),v) ) findx3(v,y) = [moZ(y) - y; a(v) = a(y) - findx3(d(v),d(y));'+ findx3(a(v),a(y)) ] findz3(v, y) =: findu3 (v,y,x) [moZ(y) -+ v; ac(v) = a(y) -+ findz3(d(v),d(y)); ~ + findz3(a(v),a(y)) ] = [atom(y)+ 0;Zistbinder(a(y)) A ad(y) # ad(v)[x e ad(y) + y; ~ + v]; a(y) - A ad(y) $ ad(v)+ [x = ad(y) -+ y; 6 -+ v]; findu3(a(v), a(y), x) # 0 -- findu3(a(v),a(y),x); + findu3(d(v),d(y),x)] RuZe3(v,y) =:RuZe3(v,y, findx3(v,y),findz3 (v,y), findu3(v,y, findxs(v,y))) RuZe4(u,y,x,z) =, variable(x) A simplexpr(z) A type(x) = exprtype(z) A freecheck(x,z,y) A u = Sf(x,z,y) findx4 (u,y) =: findx3 (u,y) findz4 (u,y) =:findz3 (u,y) Rue 4 (u,y) = RuZe4(uy, findx (u,y),findz(u,y)) Rule (v,y,u,x,z) =' z u A (Pfv(x) v Ifv(x)) A type(x) = exprtype(z) A freecheck(x,z,y) A v = Sf(x,z,y)

-204 findx (v,y) = findx3 (v,y) findz5(v,y) = findz (v,y) Rule (v,y,u) = RuleS (v,y,u, findx (v,y), findz (v,y) RuleS(v,u,y) = expression(v) A aa(y) = ) andZistlista((X(x,z) freecheck(x,z,adda(y))), ada(y), d(y)) A Snf(y, Ssfl(ada(y),d(y), adda(y)),u) = Snf(y, SsfZ(ada(y), d(y), adda(y)),v) A (y - u v (andZista((X(z) ~ z 0 adda(y)), ada(y)) A andisttista((X(x,z)O l z Dx ^ adda(y)), ada(y), d(y)))) findy 6(v,u) = [(atom(u) A atom(v))+ 0; aa(v) = A u = SsflZ(ada(v), d(v), adda(v)) - v; aa(u) = A v = SsfZ(ada(u), d(u),adda(u)) + u a(v) ~ a(u) + findy6(a(v), a(u)); -+ findy6(d(v), d(u)) ] Rule6(v,u) =: Rule6(v,u, findy (v,u)) Rule' (v,u,y) = expression (v) A a(y) = ( )ad (y) 4 add(y) A freecheck(ad(y), y, add (y)) A Snf(y, Sf(ad(y), y, add(y)),u) = Snf(y, Sf(ad(y), y,add(y)),v) findy7(v,u) =: [ (atom(u) A atom(v))+ 0; a(v) = (a A v ~ u + v; a(u) = (abel u v + u; a(v) $ a(u) -* findy7(a(v),a(u)); ~ + findy7(d(v), d(u)) ] Rue7 (v,u) =; RuZe (v,u, findy (v,u)) 7 7 ~~~~~~~7

-205RuZe8(x,z,y,w,u,v) = Simpleformula(y) A Pv(w) A andZista((X(x) (SimpleformuZa(x) A a(x) =AA aad(x) = w)), z) A u = SsZ(z, maplistcar(adad, z), y) A v = SsZ(z,maplistcar ((X(x) ( *nd*dd(x)), z),y) A W UA Wq VAX = Q y w u A( wC v findz8(y,u) =: [atom(y) v u = y -+ 0; a(y) = ( A adad(y) = u +Qy; ~~ + reverseconc(findz8(a(y), a(u)), findz8(d(y), d(u))) ] RuZe'.(0x,y,w,u,v) =:RuZe (x, findz8(y,u), y,w,u,v) Rule8 (x)=:RuZe8(x, ad(x), adadadd(x),addadadd(x), addaddadd(x)) RuZle(x,z,y,w,u,v) = Simpleformula(y) A Pv(w) A andlista((X(x) (Simpleterm(x) A a(x) = A ^ aad(x) = w)), z) A u = SsZ(z,mapZistcar(adad,z), y) A v = Ssl(z,mapZistcar((X(x)( nd *dd(x))), z),y) A -W. UA -W u. V x=y wX( u Y'^ W U) WDv)Q findz9(y,u) =: [atom(y) v u = y -+ 0 a(y) = (j)A adad(y) = u-+)y,; + - reverseconc(findz9(a(y), a(u)),findz9, (d(y),d(u))) ] Rule (x,y,w,u,v) ='RuZe'(x,findz (y,u), y,w,u,v) RuZeg (x) =: Rue (x,ad(x),aadadadd(x),addadadd(x),addaddadd(x)) Rule0 (x,z,y,u)'SSimpleformula (Y) Simpleformula(z) A S(z, u)= S(z, (pcond,)y) A x = (yQu)

-206findz10 (yu) = [atom(Y) v u = y -+ 0; y = ( pcon u =(pcond, a(y) = a(u) + findz10(d(y), d(u)); ~ + findzlo(a(y), a(u)) ] Rule (x,y,u) =, Rue (xfindz 0y,u) 10 io~x~findz10 10 u) Rulel (x) = RuZelo(x,ad(x),add(x)) Rulel (x,z,y,u) =; Simpleformula(y) A SimpZeterm(z) AS (z, ( onz,u) = S(Zc,) A x = (y u) findzll(y,u) = [atom(y) v u = y - 0; y = ci;i: u)) - u; U y Y; u = -conjp y + -; a(y) = a(u) - findzl (d(y),d(w)); + - findzll(a(y),a(u)) ] Rulel (x,y,u) = Rule (x,findzl (y,u)y,u) Rule (x) =t RuZel (x,ad(x),add(x)) RuZe12(v,u,x) = SimpleformuZa(x) A (a(x) = v a(x)= )O dad(x) 0 A Snf(x,a(x) acad(x)~ ) a(x) dad(cx) < add(x) u) Snf(x, a(x) ( aad(x) a (x)Q dad(x) add (x),v) findx12(v,u) =- [atom(u) v u = v 0; a (u) = v a(u)=: [ad(u) = (aad(v) v; ad(v) = aad(u) u; 5 -+ findx 2(add(v),add(u)) ] a(v) = a(u) -+findxl12(d(v), d(u)); ~ +- findxl2(a(v), a(u)) ] RuZe12(v,u) =: Rule12(v,u, findx (v,u))

-207 Rue 13(v) = v = ( adadad(v) adaddad(v) = (qu)adadad(v) *adaddad(v) ) ) Rue14 (u) =; (u = P adad(u) ( Pv(adad(u))) v (u = ( adad(u7) A Iv(adad(u))) (u = b((q adad(u) QA Pfvb(adad(u))) v (u = fv adad(u) A Ifvb(adad(u))) RuZle5(u) = u = adad(u)( ^ )q adadd(u) Aatom (adad(u)) A atom (adadd(u)) A adad(u) Z adadd (u) Rule 6(w,y,u,v,z) = y = ((z*u) v) A a(x) = z v ^ z ^A varZist(u) ^ andlista((X(x) x i v),u) A w = ( 1 l newvarex(z,((z*u)*v)) j u 9 S(z,newvarex(z, ((z*u)*v)),v) ) *u) Rule6 (w,y) = Rule6 (w,y,dad(y),add(y),aad(y)) Rule'7(w,y,u,v,z) =? y = ((z*u) v) A a(z) = Ao Z^ vA^ v A varZist(u) A andZista((X(x) ~ x v),u) A w = newiv((z*u)*v) ()newiv((z*u)*v) ) ( (a newvarex(z, ((z*u)*v)) ( 7) u s (z,newvarex(z,( (z*u) *v)),v) Q *u) Rule7 (w,y)=: RuZe 7(w,y, dad(y), add(y), aad(y))

-208 Pfstep(x) = ~ atom(x) A (orlistcar((X(y) orlistcar((X(z) Rulel(a(x),z,y)),d(x))), d(x)) v orllstcar((X(y) Rule2 (a(x),y)), d(x)) Y orzistcar((X(y) Rule3 (a(x),y)), d(x)) v orZistcar((X(y) Rule4 ( a(x),y)), d(x)) v orlistcar((X(y)orZistcar((X(z) Rule (a(x), z,y)), d(x))), d(x)) v orlistcar((X(y) Rule6 (a(x),y)), d(x)) v orZistcar((X(y) Rule7(a(x),y)), d(x)) v RuZe8(a(x)) vRule9(a(x)) v Rule 10 (a (x) Rule 1 (a (x)) vorZistcar((X(y)RuZe 12 (a (x), y) d(x)) v Rule 13(a(x)) Rule14 (a(x)) v Rule 1(a(x)) vorlistcar((X(y) RuZel6(a(x),y)), d(x)) v OrZistcar ( (X (y)RuZe 17 (a(x),y)), d(x)) ) Pcaxiom (x) x = q D (qDp) x = (s D (p D q)) D ((s:D p) D (s D q)) x = ((p D ) D )D p x = V(x) (p D P(x)) (p D V(x) P(x)) x = V() (x) D P(x)

-209 Eqaxiom (x) x= = v x = x = y D (P(x) D P(y)) v x = (p q) D (P(p) D P(q)) Peanoaxiom (x) =: x =: atom (x*y) v x = X*u = y*v D (x = y A u = v) v x =( V (x) (atom(x) D P(x)) A V(x,y) ((P(x) P(y))D P(x*y))) D (x) P(x) Definitionaxioms (x) = x = 0 ^ ( @ D @ ) v x = 3(x) P(x) E- V(x) -P(x) x =! (x) P(x) - ( (x) P(x) A V(x,y) ((P(x) A P(y)) D x = y)) v x =!(x) P(x) D P(i(x) P(x)) Atbmkindaxiom (x) =, x = /Iv(x) v Pfvb(x) v Ifvb(x) v nonvaratom(x)) D -Pv(x) v x = (Pv(x) v Pfvb(x) v Ifvb(x) v nonvaratom(x)) D' Iv(x) v x = (Pv(x) v Iv(x) v Ifvb(x) v nonvaratom(x)) D - Pfvb(x) v x = (Pv(x) v Iv(x) v Pfvb(x) v nonvaratom(x)) D Ifvb(x) v x = Pv(newpv(x)) v x = Iv(newiv(x)) v x = Pfvb(newpfvb(x)) v x = Ifvb(newifvb(x)) v x = ~newpv(x) Q x v x = ~newiv(x) v x v x = -newpfvb(x) 4 x v x = ~newifvb(x). x,. _____/

Axiom(x) =: Pcaxiom(x) ∨ Eqaxiom(x) ∨ Peanoaxiom(x) ∨ Definitionaxiom(x) ∨ Atomkindaxiom(x) ∨ x = (0 ⊃ Pfstep(x))

Proof(x) =: (Pfstep(x) ∨ Axiom(a(x))) ∧ ~atom(x) ∧ (d(x) = 0 ∨ Proof(d(x)))

4.7 Table 7. Definition of T and Immediate Consequences

T(x) =: ∃(y) (Proof(y) ∧ a(y) = x)

Theorems in the form of rules of inference:
1. (T(y) ∧ T(z) ∧ Rule1(x,y,z)) ⊃ T(x)
2. (T(y) ∧ Rule2(x,y)) ⊃ T(x)
3. (T(y) ∧ Rule3(x,y)) ⊃ T(x)
4. (T(y) ∧ Rule4(x,y)) ⊃ T(x)
5. (T(y) ∧ T(z) ∧ Rule5(x,y,z)) ⊃ T(x)
6. (T(y) ∧ Rule6(x,y)) ⊃ T(x)
7. (T(y) ∧ Rule7(x,y)) ⊃ T(x)
8. Rule8(x) ⊃ T(x)
9. Rule9(x) ⊃ T(x)
10. Rule10(x) ⊃ T(x)
11. Rule11(x) ⊃ T(x)
12. (T(y) ∧ Rule12(x,y)) ⊃ T(x)
13. Rule13(x) ⊃ T(x)
14. Rule14(x) ⊃ T(x)
15. Rule15(x) ⊃ T(x)
16. (T(y) ∧ Rule16(x,y)) ⊃ T(x)
17. (T(y) ∧ Rule17(x,y)) ⊃ T(x)
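For intuition, a Common Lisp sketch of the proof predicate, assuming (our simplification) that modus ponens is the only rule of inference and that the axiom test is supplied by the caller; as in Tables 6 and 7, a proof is a list whose head either is an axiom or follows by a rule from later members, and whose tail, if non-null, is itself a proof:

    ;; Sketch only.  A list (IMPLIES z x) stands for the quoted implication.
    (defun rule1p (x y z)            ; x follows from y and z by modus ponens
      (and (consp y) (eq (car y) 'implies)
           (equal z (cadr y)) (equal x (caddr y))))

    (defun pfstepp (pf)              ; head of PF follows from later members
      (some (lambda (y)
              (some (lambda (z) (rule1p (car pf) y z)) (cdr pf)))
            (cdr pf)))

    (defun proofp (pf axiomp)        ; cf. the definition of Proof above
      (and (consp pf)
           (or (pfstepp pf) (funcall axiomp (car pf)))
           (or (null (cdr pf)) (proofp (cdr pf) axiomp))))

T(x) then amounts to the existence of some list pf with (proofp pf axiomp) true and (car pf) equal to x; e.g. (proofp '(q (implies p q) p) (lambda (f) (member f '(p (implies p q)) :test #'equal))) returns true.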

-211 4.8 Table 8. Non-Recursive Definitions Especially Useful for Meta-Theorems. Some Immediate Consequences. Pfe(x) = a(exprtype(x)) =( A ^ 3(y) (T(y) A x. y) Ife(x) = a(exprtype(x)) = A ^ (y) (T(y) A x 4 y) function(x) =: Pfe (x) v Ife(x) wfexpression(x) function(x) v [atom (x) + Pv(x) v x = v x = v Iv(x); a(x) = A dd(x) = 0; a(x) = a(ast(x)) =(^ andlista((A(y) (wfexpression(a(y)) A wfexpression(ad(y)) A dd(y) = 0 A exprtype(a(y)) = ( A exprtype(ad(y)) = ) )),d(x)) a(x) = n a(ast(x)) = A andlista((X(y) (wfexpression(a(y)) A wfexpression(ad(y)) A dd(y) = 0 A exprtype(a(y)) = () exprtype(ad(y)) = )),d(x)); a(x) = v a(x) = x) a(x) = (-*andlista(Iv,ad(x)) A wfexpression(add(x)) A (a(x) = ) dad(x) = 0)A ddd(x) = 0 A exprtype(add(x)) = a(x) = +- Iv(aad(x)) prss(x)) wfpre on x)) A dad(x) = 0 A ddd(x)= 0 A exprtype(add(x)) = (; 3 -+ function (a(x)) A andlistZista((X(u,v) u = exprtype(v)),args(a(x)),d(x)) A andlista (wfexpression,d(x)) F(x) = wfexpression(x) A exprtype(x) = Tm(x) =, wfexpression(x) A exprtype(x) = form(x) = F(x) v Tm(x) wffragment(x) =: a(y) (wfexpression(y) A x Q y)

-212 Theorems: wfexpression(x) - (form(x) v function(x)) F(x) - [atom(x) + Pv(x) v x = v x =; a(x) = ) a(ast(x)) == ^andlista((X(y) (F(a(y)) A F(ad(y)) ^ dd(y) = 0)),d(x)); a(x) = v a(x) = a(x) = *) + andZista(Iv,ad(x)) A F(add(x)) A (afx) = (3 dad(x) = 0) A ddd(x)=0; ~ -*Pfe(a (x))AandZistZista((X(u,v) u = exprtype(v)),args(a(x)),d(x)) A andlista (wfexpression, d (x)) ] Tn(x) = [atom(x) + Iv(x);a(x) = ( dd(x) = 0 -; a(x) =( ) a(ast'(x)) = andlista((h(y) (F(a(y)) A'm(ad(y)) A dd(y) = 0)),d(x)); a(x) = + Iv(aad(x)) A F(add(x)) A dad(x) = 0 A ddd(x) = 0; + Ife(a(x)) A andZistlista((X(u,v) u = exprtype(v)),args(a(x)),d(x)) A andlista (wfexpression,d (x)) ] Pfe(x) _ (Pfatom(x) v Pfv(x) v (a(x) = ^ varlist (ad(x) A F(add(x)) A ddd(x) = 0) v (a(x) = ae A Pfv(ad(x)) A Pfe(add(x)) A ad(x) 4 add(x) A ~ad(x). Snf(ad(x), 0,add(x))A ddd(x) = 0 A type(ad(x)) = exprtype(add(x)) A 3(y) (T(y) A x Q y) ) ) Ife (x) (Ifatom(x) v Ifv(x) v (a(x) = A varlist(ad(x) ) A Tm(add(x)) A ddd(x) = 0) v (a(x) = ae Ifv (ad (x))A Ife(add(x)) A ad(x) ~ add(x) A ~ad(x) 4 Snf(ad(x), 0,add(x)) A ddd(x) = 0 A type(ad(x)) = exprtype(add(x)) A 3(y) (TCy) A x y) ) )

(T(y) ∧ variable(x) ∧ wfexpression(z) ∧ type(x) = exprtype(z) ∧ freecheck(x,z,y)) ⊃ T(Sf(x,z,y))

The above theorem, or meta-theorem, summarizes Rules 4 and 5. It could be used as a rule, but unlike our original 19 rules, it contains function expressions which are not complete recursing. For that reason it is not equivalent to Rules 4 and 5 as a rule; it is only equivalent as a meta-theorem.

Let us combine the above theorems on Pfe and Ife and make some minor changes to obtain:

function(x) ≡ (Pfatom(x) ∨ Pfv(x) ∨ Ifatom(x) ∨ Ifv(x)
  ∨ (a(x) = λ ∧ varlist(ad(x)) ∧ form(add(x)) ∧ ddd(x) = 0)
  ∨ (a(x) = label ∧ (Pfv(ad(x)) ∨ Ifv(ad(x))) ∧ type(ad(x)) = exprtype(add(x)) ∧ aadd(x) = λ ∧ varlist(adadd(x)) ∧ form(addadd(x)) ∧ ad(x) ⊴ addadd(x) ∧ ~ ad(x) ⊴ Snf(ad(x), 0, addadd(x)) ∧ ddd(x) = 0 ∧ dddadd(x) = 0 ∧ ∃(y) (T(y) ∧ x ⊴ y)))

Note the similarity of this theorem to the definition statement of formtofunctiontp. In fact, the similarity is shown by the following theorem:

function(x) ≡ (formtofunctiontp(wfexpression, x) ∧ (a(x) = label ⊃ ∃(y) (T(y) ∧ x ⊴ y)))

Bearing this theorem in mind, we see that expression and wfexpression are almost identical, the only difference being that the ∃(y)(T(y) ∧ x ⊴ y) condition is added in a couple of places in wfexpression. The following theorem indicates this relationship.

wfexpression(x) ≡ (expression(x) ∧ ∀(z) ((z ⊴ x ∧ a(z) = label) ⊃ ∃(y) (T(y) ∧ z ⊴ y)))

Unlike wfexpression, expression is recursive. We have the theorem

wfexpression(x) ⊃ expression(x)

We now define:

Pfetp(x) =: expression(x) ∧ a(exprtype(x)) = Pf
Ifetp(x) =: expression(x) ∧ a(exprtype(x)) = If
functiontp(x) =: Pfetp(x) ∨ Ifetp(x)
Ftp(x) =: expression(x) ∧ exprtype(x) = Pv
Tmtp(x) =: expression(x) ∧ exprtype(x) = Iv
formtp(x) =: Ftp(x) ∨ Tmtp(x)

And now we have these theorems:

functiontp(x) ≡ formtofunctiontp(expression, x)
function(x) ≡ (functiontp(x) ∧ ∃(y) (T(y) ∧ x ⊴ y))
wfexpression(x) ≡ (expression(x) ∧ ∀(z) ((z ⊴ x ∧ functiontp(z)) ⊃ ∃(y) (T(y) ∧ z ⊴ y)))
wfexpression(x) ≡ (expression(x) ∧ ∃(y) (T(y) ∧ x ⊴ y))
form(x) ≡ (formtp(x) ∧ ∃(y) (T(y) ∧ x ⊴ y))
F(x) ≡ (Ftp(x) ∧ ∃(y) (T(y) ∧ x ⊴ y))
Tm(x) ≡ (Tmtp(x) ∧ ∃(y) (T(y) ∧ x ⊴ y))
Pfe(x) ≡ (Pfetp(x) ∧ ∃(y) (T(y) ∧ x ⊴ y))
Ife(x) ≡ (Ifetp(x) ∧ ∃(y) (T(y) ∧ x ⊴ y))

-2154.9 Table 9. Definitions for Handling Recursive Functions; apZ C= y =; [mo(x) - x = y; listbinder(a(x))- a(x) = a(y)A SsfZ(ad(x),ad(y),add(y)) c add(x); a(x) = a(x) = a(y) A Sf(ad(x),ad (y),add (y))c add(x) + -+ andZistZista( cm,x,y)] repZacefx(v,y,u) =: [atom(v) - u; a(v) e y -* repZacefx(d(v),y,Sf(a(v),newvarex(a(v),y*u),u)); 3 - replacefx(d(v),y,u) ] freefix(x,y, z) =s [moZ(z) v - x z + z; listbinder(a(z)) + a(z) *replacefx(ad(z),y,freefix(x,y,d(z))) a(za(= a a(z) *replacefx( ad(z) ),y,freefix(x,y,d(z)))) + freefix(x,y,a(z)) * freefix(x,y,d(z)) ] freefixZists (x,y, z) = [atom(x) + z; ~ +- freefixlists(d(x), y,freefix(a (x),y,z))] Sffix(x,y, z) =:Sf(x,y,freefix(x,y, z)) SsffixZ (x,y,z) =: Ssfl (x,y, freefixlists(x,y, z))

-216 aploneatom(x) =: [aad(x) Z qu -+ x; a(x) = Pv - [ Pv(adad(x))'); a(x) = Iv + [ Iv(adad(x)) +); a(x) = Pfvb + [ Pfvb(adad(x))+(; a(x) = Ifvb + [ Ifvb(adad(x)) ); a (x) = newpv - qu, newpv(adad(x)); a(x) = newiv (qu, newiv(adad(x)); a(x) = newpfvb + (qu, newpfvb (adad(x)); a(x) = newifvb + (qu, newifvb(adad(x)) ) a(x) = a a (adad(x) )); a(x) = l(qu, d(adad(x)) + x] +@]; e + ]; ~ + ]; ~+(g]; apltwoatoms(x) = [a(x) = ( + [ad(x) = ~ v add(x) = (g + (;ad(x) =@ * add(x); -+ x]; aad(x) ~ I) v aadd(x) # + x; a(x) = ( + [ adad(x)= adadd(x) +*;~+ ]; a(x) = ( (cdad(x) * adadd(x)) ); 0 + x]

-217oneapZ (x) [mo(x) + x; a(x) = o)neapZ(ad(x)) ); a(x) = ( ) v a(x) = c) [atom(d(x)) - x; aad(x) = + adad(x); aad(x) = a(x) * dd(x); + a(x) *( (oneapZ(aad(x)) O adad(x) * dd(x)) ]; a(x) = (v a(x)= v a(x) = - apttwoatoms( Oa(x) Q oneapl(ad(x)) 9 oneapZ(add(x)) ); a(x) = v a(x) = v Pfatom(a(x)) v Ifatom(a(x)) + aploneatom( a (x) g oneapZ(ad(x)) ); atom(a(x)) x; aa(x) = () + SsffixZ(ada(x),d(x),adda(x)); aa(x) = ( e + Sffix(ada(x),a(x),adda(x))*d(x); ~+ x] apl(x) =s [oneapZ(x) = x -+ x; 0 + apZ(oneapZ(x)) ]

4.10 Examples

4.10.1 Example 1. Reflexivity and Transitivity of =

1. x = x                                                          Axiom 6
2. ((λ(y) y = x), x)                                              Rule 6
3. x = y ⊃ (((λ(y) y = x), x) ⊃ ((λ(y) y = x), y))                Axiom 7, Rule 5
4. x = y ⊃ (x = x ⊃ y = x)                                        Rule 6
5. x = y ⊃ y = x                                                  By P.C.
6. y = z ⊃ (((λ(y) y = x), y) ⊃ ((λ(y) y = x), z))                Axiom 7, Rule 5
7. y = z ⊃ (y = x ⊃ z = x)                                        Rule 6
8. z = y ⊃ (y = x ⊃ z = x)                                        From Lines 5 & 7, by P.C.

In future examples, where I use substitution instances of Lines 5 & 8 followed by P.C. operations, I will merely write the result with the comment "properties of =."

4.10.2 Example 2. Proof of ~atom(x) ≡ x = a(x)*d(x) (basic theorem for a and d)

consedpred(x) =: ~atom(x) ⊃ ∃(y) ∃(z) y*z = x

1. p ⊃ (~p ⊃ q)                                                   By P.C.
2. f(x) ⊃ (~f(x) ⊃ ∃(y) ∃(z) y*z = x)                             By Rule 4 twice
3. atom(x) ⊃ (~atom(x) ⊃ ∃(y) ∃(z) y*z = x)                       By Rule 5
4. atom(x) ⊃ consedpred(x)                                        By Rule 6
5. ∀(x) (atom(x) ⊃ consedpred(x))                                 By Rule 2

Since consedpred does not recurse, we were able to generate it without use of Rules 16 or 17. In future examples I will not bother to explicitly generate such functions, since it is always trivial via Rule 6.

The axioms and rules used to generate Line 3 are those of the Predicate calculus, followed by Rules 4 and 5. In the future, statements so simply derived will not be explicitly derived but will be merely stated with the comment "by P.C."

6. u * v = u * v                                                  Axiom 6
7. ∃(y) ∃(z) y * z = u * v                                        By P.C.
8. consedpred(u * v)                                              By P.C. & Rule 6
9. ∀(x,y) ((consedpred(x) ∧ consedpred(y)) ⊃ consedpred(x * y))   By P.C.
10. ∀(x) consedpred(x)                                            By substitution of consedpred for P in Axiom 10; and P.C. with Lines 5 & 9
11. ~atom(u) ⊃ ∃(x) ∃(z) x * z = u                                By P.C. & Rule 6
12. ∃(z) x * z = u ⊃ ((atom(u) ∧ x = u) ∨ ∃(z) x * z = u)         By P.C.
13. ∃(x) ∃(z) x * z = u ⊃ ∃(x) ((atom(u) ∧ x = u) ∨ ∃(z) x * z = u)   By P.C.
14. ~atom(u) ⊃ ∃(x) ((atom(u) ∧ x = u) ∨ ∃(z) x * z = u)          From Lines 11 & 13 by P.C.

temponeu(x) =: ((atom(u) ∧ x = u) ∨ ∃(z) x * z = u)

15. ~atom(u) ⊃ ∃(x) temponeu(x)                                   Rule 6
16. (~atom(u) ∧ temponeu(x) ∧ temponeu(y)) ⊃ (∃(z) x * z = u ∧ ∃(z) y * z = u)   By P.C. & Rule 6
17. (x * z = u ∧ y * w = u) ⊃ x * z = y * w                       Properties of =
18. x * z = y * w ⊃ x = y                                         Axiom 9
19. (∃(z) x * z = u ∧ ∃(z) y * z = u) ⊃ x = y                     From Lines 17 & 18, by P.C.
20. (~atom(u) ∧ temponeu(x) ∧ temponeu(y)) ⊃ x = y                From Lines 16 & 19
21. ~atom(u) ⊃ ∀(x,y) ((temponeu(x) ∧ temponeu(y)) ⊃ x = y)       By P.C.

22. ~atom(u) ⊃ ∃!(x) temponeu(x)                                  From Axiom 13 and Lines 15 & 21 by P.C.
23. ~atom(u) ⊃ temponeu(ι(y) temponeu(y))                         From Axiom 14 and Line 22 by P.C.
24. ~atom(u) ⊃ temponeu(a(u))                                     Rule 6 twice
25. ~atom(u) ⊃ ∃(z) a(u) * z = u                                  Rule 6 & P.C.

We can, from Line 11, similarly derive

26. ~atom(u) ⊃ ∃(z) z * d(u) = u
27. (a(u) * w = u ∧ z * d(u) = u) ⊃ a(u) = z                      Properties of = and Axiom 9
28. (a(u) * w = u ∧ z * d(u) = u) ⊃ a(u) * d(u) = u               From Line 27 by P.C.
29. (∃(z) a(u) * z = u ∧ ∃(z) z * d(u) = u) ⊃ u = a(u) * d(u)     From Line 28 by P.C.
30. ~atom(u) ⊃ u = a(u) * d(u)                                    From Lines 25, 26, & 29
31. ~atom(x) ⊃ x = a(x) * d(x)                                    Rule 4
32. x = a(x) * d(x) ⊃ (atom(x) ⊃ atom(a(x) * d(x)))               Axiom 7
33. ~atom(a(x) * d(x))                                            Axiom 8
34. x = a(x) * d(x) ⊃ ~atom(x)                                    From Lines 32 & 33 by P.C.
35. ~atom(x) ≡ x = a(x) * d(x)                                    From Lines 31 & 34
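Line 35 can be checked mechanically on concrete S-expressions. A throwaway Common Lisp test of the identity (our harness, not part of the system):

    ;; Checks x = a(x)*d(x) at every non-atomic node of x.
    (defun reconsp (x)
      (if (atom x)
          t                                    ; vacuous for atoms
          (and (equal x (cons (car x) (cdr x)))
               (reconsp (car x))
               (reconsp (cdr x)))))
    ;; (reconsp '((a . b) c (d))) => T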

4.10.3 Example 3. Some More Theorems About a and d; an Alternative Induction Axiom

By a technique similar to that of Example 2, we can prove

36. atom(x) ⊃ x = a(x)
and
37. atom(x) ⊃ x = d(x)

Corollaries to Line 35 are:

38. x*y = a(x*y) * d(x*y)                       Substitute x*y for x in Line 35 & use Axiom 8
39. x = a(x*y)                                  Axiom 9
40. y = d(x*y)                                  Axiom 9
41. x*y = u ⊃ (x = a(x*y) ⊃ x = a(u))           Axiom 7
42. u = x*y ⊃ x = a(u)                          From Lines 39 & 41
43. u = x*y ⊃ y = d(u)                          Similarly

All these theorems will be referred to below as simply properties of a & d. In view of 42 and 43, we can show the following:

44. u = x*y ⊃ (((P(a(u)) ∧ P(d(u))) ⊃ P(u)) ⊃ ((P(x) ∧ P(y)) ⊃ P(x*y)))          Via Axiom 7
45. ((P(a(u)) ∧ P(d(u))) ⊃ P(u)) ⊃ (u = x*y ⊃ ((P(x) ∧ P(y)) ⊃ P(x*y)))          By P.C.
46. ∀(u) (((P(a(u)) ∧ P(d(u))) ⊃ P(u)) ⊃ (u = x*y ⊃ ((P(x) ∧ P(y)) ⊃ P(x*y))))   By P.C.
47. u = x*y ⊃ (∀(u) ((P(a(u)) ∧ P(d(u))) ⊃ P(u)) ⊃ ((P(x) ∧ P(y)) ⊃ P(x*y)))     By P.C.
48. ∃(u) u = x*y ⊃ (∀(u) ((P(a(u)) ∧ P(d(u))) ⊃ P(u)) ⊃ ((P(x) ∧ P(y)) ⊃ P(x*y)))   By P.C.
49. x*y = x*y                                   Axiom 6
50. ∃(u) u = x*y                                From 49 via P.C.

51. ∀(u) ((P(a(u)) ∧ P(d(u))) ⊃ P(u)) ⊃ ((P(x) ∧ P(y)) ⊃ P(x*y))     From Lines 50 & 48
52. ∀(x) ((P(a(x)) ∧ P(d(x))) ⊃ P(x)) ⊃ ∀(x,y) ((P(x) ∧ P(y)) ⊃ P(x*y))

From this and Axiom 10, we get an alternative induction axiom

53. (∀(x) (atom(x) ⊃ P(x)) ∧ ∀(x) ((P(a(x)) ∧ P(d(x))) ⊃ P(x))) ⊃ ∀(x) P(x)

4.10.4 Example 4. Course of Values Induction (and a Corollary)

Prove (∀(x) (∀(z) (z < x ⊃ P(z)) ⊃ P(x))) ⊃ (∀(x) P(x)). Define, for this example,

B(x) =: ∀(z) (z ≤ x ⊃ P(z))
Φ(x) =: ∀(z) (z < x ⊃ P(z)) ⊃ P(x)

The problem is to prove (∀(x) Φ(x)) ⊃ (∀(x) P(x)). (Note: mol(x) ⊃ (Φ(x) ≡ P(x)) holds.) The proof proceeds as follows:

54. z ≤ x ≡ (z < x ∨ z = x)
55. z < x ≡ (z ≤ a(x) ∨ z ≤ d(x))                                 From defn. of contains
56. B(x) ⊃ (z ≤ x ⊃ P(z))
57. (B(a(x)) ∧ B(d(x))) ⊃ ((z ≤ a(x) ⊃ P(z)) ∧ (z ≤ d(x) ⊃ P(z)))
58. (B(a(x)) ∧ B(d(x))) ⊃ ((z ≤ a(x) ∨ z ≤ d(x)) ⊃ P(z))
59. (B(a(x)) ∧ B(d(x))) ⊃ (z < x ⊃ P(z))                          From 55 and 58
60. (B(a(x)) ∧ B(d(x))) ⊃ ∀(z) (z < x ⊃ P(z))
61. (Φ(x) ∧ B(a(x)) ∧ B(d(x))) ⊃ P(x)
62. P(x) ⊃ (z = x ⊃ P(z))
63. (B(a(x)) ∧ B(d(x)) ∧ P(x)) ⊃ ((z < x ⊃ P(z)) ∧ (z = x ⊃ P(z)))    From 59 and 62

64. ((z < x ⊃ P(z)) ∧ (z = x ⊃ P(z))) ⊃ (z ≤ x ⊃ P(z))            From 54
65. (B(a(x)) ∧ B(d(x)) ∧ P(x)) ⊃ (z ≤ x ⊃ P(z))                   From 59, 62, & 64
66. (B(a(x)) ∧ B(d(x)) ∧ P(x)) ⊃ B(x)
67. Φ(x) ⊃ ((B(a(x)) ∧ B(d(x))) ⊃ B(x))                           From 61 and 66
68. (∀(x) Φ(x)) ⊃ (∀(x) ((B(a(x)) ∧ B(d(x))) ⊃ B(x)))
69. atom(x) ⊃ ~ z < x                                             From definition of contains
70. atom(x) ⊃ (z < x ⊃ P(z))
71. atom(x) ⊃ ∀(z) (z < x ⊃ P(z))
72. (Φ(x) ∧ atom(x)) ⊃ P(x)
73. (atom(x) ∧ P(x)) ⊃ (z ≤ x ⊃ P(z))                             From 62, 64, and 70
74. (atom(x) ∧ P(x)) ⊃ B(x)
75. Φ(x) ⊃ (atom(x) ⊃ B(x))                                       From 72 and 74
76. (∀(x) Φ(x)) ⊃ (∀(x) (atom(x) ⊃ B(x)))

By Line 53 we get

77. (∀(x) Φ(x)) ⊃ (∀(x) B(x))                                     From 68 and 76

but we also get

78. B(x) ⊃ (x ≤ x ⊃ P(x))

so we have the following:

79. B(x) ⊃ P(x)
80. (∀(x) B(x)) ⊃ (∀(x) P(x))
81. (∀(x) Φ(x)) ⊃ (∀(x) P(x))                                     From 77 and 80
82. (∀(x) (∀(z) (z < x ⊃ P(z)) ⊃ P(x))) ⊃ (∀(x) P(x))             Rule 6

Other induction rules may then be derived as corollaries as follows.

83. ~atom(x) ⊃ d(x) < x
84. (∀(z) (z < x ⊃ P(z))) ⊃ (d(x) < x ⊃ P(d(x)))
85. ~atom(x) ⊃ ((∀(z) (z < x ⊃ P(z))) ⊃ P(d(x)))

86. ~atom(x) ⊃ ((P(d(x)) ⊃ P(x)) ⊃ Φ(x))
87. (P(d(x)) ⊃ P(x)) ⊃ (~atom(x) ⊃ Φ(x))
88. (∀(x) (P(d(x)) ⊃ P(x))) ⊃ (~atom(x) ⊃ Φ(x))
89. P(x) ⊃ Φ(x)
90. (atom(x) ⊃ P(x)) ⊃ (atom(x) ⊃ Φ(x))
91. (∀(x) (atom(x) ⊃ P(x))) ⊃ (atom(x) ⊃ Φ(x))
92. ((∀(x) (atom(x) ⊃ P(x))) ∧ (∀(x) (P(d(x)) ⊃ P(x)))) ⊃ Φ(x)
95. ((∀(x) (atom(x) ⊃ P(x))) ∧ (∀(x) (P(d(x)) ⊃ P(x)))) ⊃ (∀(x) P(x))

(The same proof works with d replaced by addadd, for example.)

4.10.5 Example 5. Generation of Some Labeled Functions, e.g., maplist

This may not be the most efficient generation of these functions.

nestp'(x) =: andlistlistcar((λ(y,x) (~atom(x) ∧ y = d(x))), d(x), x)     Easily generatable

1. nestp'(x) ≡ nestp'(x)                        Axiom 6
2. nestp'(x) ≡ (d(x) = 0 ∨ (~atom(d(x)) ∧ ~atom(x) ∧ ~atom(a(x)) ∧ ad(x) = da(x) ∧ andlistlistcar((λ(y,x) (~atom(x) ∧ y = d(x))), dd(x), d(x))))     Rules 6 & 7 several times

-225 3. nestp'(x) =(d(x) = 0 v (~atom(d(x)) A ~atom(x) A ~atom(a(x)) A ad(x) = da(x) nestp'(d(x))))) Rule 6 nestp(x) =: d(x) = 0 v (~atom(x) A atom(d(x))A ~atom(a(x)) A ad(x) = da(x) nestp (d(x))) Now apply Rule 16 to Line 3 with z = nestp U =; (X) v =: d(x) v ( =atom(d(x)) A vatom(x) A ad(x) = da(x) nestp'(d(x)))) to get 4. D nestp(x) 5. orlistcar(P,x) - orlistcar(P,x) 6. orlistcar(P,x) -= andlistcar((X(x) P (x)),x) 7. orlistcar(P,x) ~(atom(x) v ( ~P(a(x)) A andlistcar((X(x)'P(x)),d(x)))) Rule 7 once & 6 twice 8. orlistcar(P,x) (- atom(x) A (P(a(x)) v ~andlistcar((x(x) P (x)),d(x)))) By P.C. 9. orlistcar(P,x) ( ( ~atom(x) A (P(a(x)) v orZistcar(P,d(x)))) Rule 6 10. x e y orlistcar((X(u) u = x),y) Rule 6 11. x e y ( (~atom(y) A (a(y) = x v orlistcar((X(u) u = x),d(y)))) Line 9 and Rule 6 12. x e y ( (~atom(y) A (a(y) = x v x e d(y))) Rule 6 nestpp'(x) = nestp(x) A 3(y) (atom(y) A y e x) (generate this via Rule 6)

-226 13. nestpp'(x) - (d(x) = 0 v (-atom(x) A ~atom(d(x)) A - atom(a(x)) 3(y) (atom(y) ^ (~ atom(x) 14. 3(y) (atom(y) ^ (-atom(x) A (a(x) = y v (3(y) (atom(y) A a(x) = y) v 3(y) A ad(x) = da(x) A nestp(d(x)))) A A (a(x) = y v y e d(x)))) Rules 7 & 6 several times x e d(y)))) = (~ atom(x) A (atom(y) A y e d(x)))) By P.C. 15. 3(y) (atom(y) A a(x) = y) E atom(a(x)) By P.C. 16. nestpp'(x) E (d(x) = 0 v (aatom(x) A ~atom(d(x)) A atom(a (x) A ad(x) = da(x) A nestp(d(x)))) A atom(x) A (atom(a(x)) v 3(y) (atom(y) A y e d(x)) From Lines 13, 14 and 15 by P.C. 17. y E d(x) = atom(d(x)) From Line 12 18. nestpp (x) = ( atom(x) A ( (atom(a(x)) A d(x) = 0 ) v (-atom(d(x)) A -atom(a(x)) A ad(x) = da(x) A nestp(d(x)) 3(y) (atom(y) y e d(x))))) A From Lines 16 and 17 by P.C. 19. nestpp (x) = ( atom(x) A ((atom(a(x)) A d(x) = 0 ) v ( atom(d(x)) A ^atom(a(x)) A ad(x) = da(x) A nestpp (d(x))))) By Rule 6 nestpp(x) =: -atom(x) A ((atom(a(x)) A d(x) = 0 ) v (~atom(d(x)) A ^atom(a(x)) A ad(x) = da(x) A nestpp(d(x))))) 20. ~ > nestpp(x) By Rule 16 nestpred(x,y) =: a(y) = x A nestpp(y)

-227 21. nestpred(x,y) = nestpred(x,y) By P.C. 22. nestpred(x,y) - (a(y) = x ^ ~atom(y) A ((atom (a(y)) A d(y) = 0 ) v (~atom(d(y)) ^ atom(a(y)) A ad(y) = da(y) A nestpp(d(y))))) Rules 7 and 6 and P.C. 23. nestpred(x,y) - (a(y) = x A -atom(y) A ((atom(x) A d(y) = 0 ) v (~atom(d(y)) A ~atom(x) A ad(y) = d(x) A nestpp(d(y))))) Via Axiom 7 24. nestpred(x,y) - (~atom(y) A a(y) = x A ((atom(x) A d(y) = ) v (-atom(x) A ~atom(d(y)) A nestpred(d(x),d(y)))) Rule 6 25. nestpred(x,y) ~ ~ atom(y) By P. C. 26. nestpred(d(x),d(y)) ~ -atom(d(y)) Rule 4 and 5 27. (a(y) = x A d(y) = 0 ) - a(y) *d(y) = x * 0 Property of = 28. ~atom(y) E y = a(y) *d(y) Basic theorem for a & d 29. (~atom(y) A a(y) = x A d(y) = 0) y = x * 0 From Lines 27 and 28 by P.C. 30. nestpred(x,y) - ((atom(x) A y = x * 0) v (~atom(x) A ~atom(y) A x = a(y) A nestpred(d(x),d(y)))) From Lines 24, 26, and 29 by P.C. nest (x) = l(y) nestpred(x,y) 31.! (y)nestpred(x,y) m nestpred(x, (y)nestpred(x,y)) Substitution into Axiom 14,plus Rule 6 32. 3! (y)nestpred(x,y) m nestpred(x,nest'(x)) Rule 6 Thus nest' has been generated.

-228 33. nestpred(x,x*y) - ((atom(x) A x*y = x * 0) v ( ~atom(x) A ~atom(x*y) A x = a(x*y) A nestpred(d(x),d(x*y)))) From Line 30 by Rule 4 34. (~atom(x) A nestpred(d(x),y)) = nestpred(x,x*y) By P.C. & properties of * and a 35. nestpred(x,x*y): 3(y)nestpred(x,y) By P.C. 36. (~atom(x) A nestpred(d(x),y)): 3(y)nestpred(x,y) From 34 and 35 by P.C. 37. -atom(x): ( 3(y)nestpred(d(x),y): 3(y)nestpred(x,y)) By P.C. 38. atom(x) z d(x) = x Already proved 39. atomr(x) (3(y)nestpred(d(x),y) D 3(y)nestpred(x,y)) Via Axiom 7 40. V(x) ( 3(y)nestpred(d(x),y) D 3(y)nestpred(x,y)) From 37 and 39 by P.C. 41. atom(x) nestpred(x,x * 0) Put 0 for y in Line 33 and use P.C. 42. V(x) (atom(x) 3a(y)nestpred(x,y)) By P.C. 43. V(x) 3(y)nestpred(x,y) From 40 and 42 via substitution into induction rule corollary, Section 4.10.4 44. 3(y)nestpred(x,y) By P.C. 45. (-atom(x) A nestpred(x,y)): (nestpred(d(x),d(y)) A x = a(y)) From Line 30 via P.C. 46. ( atom(x) A nstpred(x,y) A nestpred(x,z)) (nestpred(d(x),d(y)) A nestpred(d(x),d(z)) A a(Y) = a(z)) By P.C. and properties of =

-229 47. V(y,z) ((nestpred(d(x),y) A nestpred(d(x),z)) D y = z) = ((nestpred(d(x),d(Y)) A nestpred(d(x),d(z))) = d(Y) = d(z)) By P.C. 48. (-atom(x) A V(y,z) ((nestpred(d(x),y) A nestpred(d(x),z)) m y = z)) = ((nestpred(x,y) A nestpred(x,z)) = (a(Y) = a(z) A d(y) = d(z))) From 46 and 47 by P.C. 49. ~atom(x) D ( V(y,z) ((nestpred(d(x),) A nestpred(d(x),z)) y = z) = V (y,z) ((nestpred(x,y) A nestpred(x,z)) y = z)) By P.C. and properties of = 50. atom(x) d(x) = x Already proved 51. atom(x)W (V(y,z) ((nestpred(d(x),Y) A nestpred(d(x),z)) Y = z) D V(y,z) ((nestpred(x,y) A nestpred(x,z)) y = z)) Properties of 52. V(x) (V(y,z) ((nestpred(d(x),y) A nestpred(d(x),z)) z y = z) z V(y,z) ((nestpred(x,y) A nestpred(x,z)) y = z)) From 49 and 51 by P.C. 53. (atom(x) A nestpred(x,y)) ) y = x * 0 From Line 30 via P.C. 54. atom(x) ((nestpred(x,y) A nestpred(x,z)) 3 y = z By P.C. & properties of = 55. V(x)(atom(x) V(y,z) ((nestpred(x,y) A nestpred(x,z)) > y = z)) By P.C. 56. V(x) V(y,z) ((nestpred(x,y) A nestpred(x,z)) D y = z) From 52 and 55 via substitution into induction rule corollary, Section 4.10.4. 57. V(y,z) ((nestpred y) tpred,) red(x,z))= y = z) By P.C.

-230 58. 3! (y)nestpred(x,y) From 44 and 57 via substitution into Axiom 13 59. 3! (y)nestpred(x,y), nestpred(x,nest'(x)) Repeat of Line 32 60. nestpred(x,nest'(x)) Modus ponens 61. (nestpred(x,y) A nestpred(x,z)) z y = z From 57 by P.C. 62. (nestpred(x,nest'(x))A nestpred(x,y)) = y = nest'(x) Rules 4 and 5 63. nestpred(x,nest'(x)) Repeat of Line 60 64. nestpred(x,y) D y = nest'(x) From 62 and 63 by P.C. 65. y = nest'(x) D (nestpred(x,nest'(x)) > nestpred(x,y)) Axiom 7 66. y = nest'(x) nestpred(x,y) From 63 and 65 by P.C. 67. nestpred(x,y) E y = nest'(x) From 64 and 66 68. (atom(x) A nest'(x) = x * 0) v (~atom(x) ^A atom(nest'(x)) A x = a(nest(x))^ nestpred(d(x),d(nest'(x)))) From Line 30 and 60 69.,atom(x) ( atom(nest(x) ) x = a(nest (x)) nestpred(d(x),d(nest (x)))) By P.C. 70. nestpred(d(x),d(nest'(x)))- d(nest'(x))=nest (d(x)) Substitution into Line 67 71. atom(nest (x))- nest (x) = a(nestC(x)) *d(nest'(x)) Properties of a and d 72. catom(x): (nest'(x)= a(nest'(x)) *d(nest(x)) A a(nest'(x)) x A d(nest(x)) = nest'(d(x))) From 69, 70, and 71 by P.C. 73. ~-atom(x) = nest'tx) = x *neest(d(x)) Properties of = and P.C.

-231 74. ( xg(x) 3 f(x) = x * f(d(x))) E (-g(x) f(x) = [ -* x * f(d(x))]) Via Rule 11 with z = ( (x 75. ( atom(x) = nest'(x) = x * nest'(d(x))) = (~atom(x): nest'(x) = [ -+ x * nest(d(x))]) Rule 5 76. -~atom(x) = nest'(x) = [ - x * nest (d(x))] From 73 and 75 by P.C. 77. atom(x) = nest'(x) = x * 0 From 68 by P.C. 78. f(x) = [g(x) - x * 0; 0 + x * f(d(x))] ((g(x) = f(x) = x * 0) A (- g(x) f(x) = [ e + x * f(d(x))])) Rule 9 with z =' QCg(x) - x * 0; - x * f(d(x))]) 79. nest'(x) = [atom(x) + x * 0; +- x * nest(d(x))] ((atom(x) ~ nest'(x) = x * 0) A (~atom(x) nest'(x) = [ ~ + x * nest'(d(x))])) Rule 5 80. nest'(x) = [atom(x) x * 0; + x * nest"(d(x))] From 76, 77, and 79 by P.C. nest(x) = [atom(x) + x * 0; 0 + x * nest(d(x))] 81. a(z) z = nest(x) From 80 by Rule 17 with z =: nests u =: (x) v = atom (x) + x * 0; 0 + x * nest'(d(x))] So nest has been generated!

-232 maplist'(f,x) =: maplistcar(f,nest(x)) 82. maplist-(f,x) = [atom(x) + x; o + f(a(nest(x)))*mapZistcar(f,d(nest(x)))] From Axiom 6 and Rules 4, 5, 7, & 6. 83. (atom(x) = mapZist'(fx) = x) (~atom(x) = mapZist'(f,x) = [ 0 + f(a(nest(x)))* mapZistcar(f,d(nest(x))) ] ) Via Rule 9 as before 84. nest(x) = [atom(x) x * 0; + x * nest(d(x))] From Axiom 6 via rules 4,5,7, & 6. 85. (atom(x) = nest(x) = x * 0) A (-atom(x) = nest(x) = [ 0 - x * nest(d(x))]) Via Rule 9 as before 86. ~atom(x) m nest(x) = [ e + x * nest(d(x))] By P.C. 87. ~atom(x) = nest(x) = x * nest(d(x)) Via Rule 11 as before 88. -atom(x) = (a(nest(x)) = x A d(nest(x)) = nest(d(x))) Properties of a, d,and=. 89. (atom(x) z maplist'(f,x) = x) A (-atom(x) = mapZist'(f,x) = [ 0 - f(x) * mapZistcar(f,nest(d(x))]) From 83, 88, and Axiom 7 by P.C. 90. (atom(x) z mapZist'(f,x) = x) A (~atom(x): mapZist'(f,x) = [ 0 + f(x) *maplist(f,d(x))]) Rule 6 91. maplist'(f,x) = [atom(x) - x; -+ f(x) *maplist'(f,d(x))] Via Rule 9 as before maplist(f,x) =: [atom(x) - x; 0 f(x) * maptist(f,d(x))] 92. 3(z) z = mapZist(f,x) Via Rule 17 as before

If we wished we could go on to prove inductively such theorems as

maplist(f,x) = maplist'(f,x)
and
maplistcar(f,x) = maplist((λ(x) f(a(x))), x).

We could similarly define

andlist'(P,x) =: andlistcar(P, nest(x))
andlist(P,x) =: [atom(x) → P(x); else → P(x) ∧ andlist(P, d(x))]

and prove the theorems

andlist(P,x) ≡ andlist'(P,x)
and
andlistcar(P,x) ≡ andlist((λ(x) P(a(x))), x).
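In the CAR/CDR/CONS reading used in the earlier sketches, the two functions this example generates come out as follows. NEST lists its argument and the successive tails; MAPLIST* (our name, avoiding Common Lisp's built-in MAPLIST) then behaves like LISP 1.5's MAPLIST, applying f to each tail:

    (defun nest (x)
      (if (atom x) (cons x nil) (cons x (nest (cdr x)))))

    (defun maplist* (f x)
      (if (atom x) x (cons (funcall f x) (maplist* f (cdr x)))))

    ;; (nest '(1 2 3))            => ((1 2 3) (2 3) (3) NIL)
    ;; (maplist* #'length '(a b)) => (2 1)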

4.10.6 Example 6: The Predecessor Function

Suppose we have S defined as in Section 2.1.1.9, and suppose we have as theorems the Peano axioms for S, i.e.

A. ~(0 = S(x))
B. S(x) = S(y) ⊃ x = y
C. (P(0) ∧ ∀(x) (P(x) ⊃ P(S(x)))) ⊃ ∀(x) P(x)

(Note: This does not give us enough power to prove the true statements of form ~(α = β). To prove these we would have to restore the ordering of the atoms which we threw away in Section 2.1.1.10. One way to do this would be to add back into the system the rules in the footnote near the end of Section 2.1.1.9.)

We now consider the predecessor function whose ordinary name is

predecessor(x) =: ι(y) ((x = 0 ∧ y = x) ∨ S(y) = x).

We first want to generate an algorithmic name of this form:

(λ, (x), ((label, f, (λ(z, x) [S(z) = x → z; x = 0 → x; else → f(S(z), x)])), 0, x))

We use the dummy z to count up from zero to the predecessor. Our first job is to introduce the z. This is easy. Define dummypred(z, x) =: predecessor(x). Now dummypred is a complete function of z and x. We easily prove

∃!(y) ((x = 0 ∧ y = x) ∨ S(y) = x)

so by Axiom 14 we get

1. (x = 0 ∧ dummypred(z, x) = x) ∨ S(dummypred(z, x)) = x.

We then proceed as follows:

2. dummypred(z, x) = predecessor(x)
3. dummypred(z, x) = dummypred(S(z), x)
4. ~ x = 0 ⊃ dummypred(z, x) = [else → dummypred(S(z), x)]        by Rule 11
5. x = 0 ⊃ dummypred(z, x) = x                                    from Line 1 and A above
6. dummypred(z, x) = [x = 0 → x; else → dummypred(S(z), x)]       from 4 and 5 by Rule 9

7. ~ S(z) = x ⊃ dummypred(z, x) = [x = 0 → x; else → dummypred(S(z), x)]
8. S(z) = x ⊃ S(dummypred(z, x)) = x                              from Lines 1 and A above
9. S(z) = x ⊃ dummypred(z, x) = z                                 from Lines 8 and B
10. dummypred(z, x) = [S(z) = x → z; x = 0 → x; else → dummypred(S(z), x)]

Hence we can generate the algorithmic name

dummyalg(z, x) =: [S(z) = x → z; x = 0 → x; else → dummyalg(S(z), x)]

via Rule 17. Thus from the ordinary name dummypred we generate the algorithmic name dummyalg. Notice, however, that dummypred and dummyalg name different functions. dummypred names a complete function, and dummyalg does not, the function named by dummyalg being undefined for 0 < x ≤ z. This algorithmic name, then, contains less information than the ordinary name from which it is generated. This is not unusual. Of course an algorithmic name cannot contain more information than the name from which it was generated. An ordinary name for the function named by dummyalg might be given by

dummyalg(z, x) = ι(y) ((x = 0 ∧ y = x) ∨ (z < x ∧ S(y) = x))

if we had defined <, which we have not. Thus it is not immediately obvious how to generate the algorithmic name dummyalg from an ordinary name like the one indicated here. Might it be the case that there exists an algorithmic name for a partial function which cannot be generated from any ordinary name of that function, but which can be generated from an ordinary name of some more complete function? I do not know the answer to this question. Now define predalg(x) =: dummyalg(0, x). Note that although dummyalg is not a complete function, predalg is a complete function.
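A sketch of the algorithm that dummyalg names, with ordinary integers standing in for the system's numerals and 1+ for S (both substitutions are ours): count z up from zero until its successor reaches x.

    ;; predalg(x) = dummyalg(0, x).  With z > 0 this mirrors dummyalg,
    ;; which is undefined (diverges) for 0 < x <= z, as noted above.
    (defun predalg (x &optional (z 0))
      (cond ((= (1+ z) x) z)            ; S(z) = x : z is the predecessor
            ((= x 0) x)                 ; the predecessor of 0 is 0
            (t (predalg x (1+ z)))))    ; otherwise try S(z)
    ;; (predalg 5) => 4 ; (predalg 0) => 0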

Of course, if S is constructed as in Section 2.1.1.9, we have a much simpler way to generate predalg. In Section 2.1.1.9, S(x) = gn'(( ) * E(x)) holds, where gn' is like gn except that the substitutions of (qu, p) for p, (qu, 1) for 1, and (qu, 2) for 2 have been made. Hence E and gn' are already generated algorithmic names and we can define predalg(x) =: gn'(d(E(x))). We can also define x ≤ y =: E(x) ⊴ E(y).

4.10.7 Example 7: The μ Schema

Here we shall sketch the method we use to generate an algorithmic name for a function defined by the μ schema of recursive function theory. In our notation, such a definition would be of the form

φ(x) =: ι(y) (Π(y, x) ∧ ∀(v) (Π(v, x) ⊃ y ≤ v)).

Again we introduce a dummy z to get the algorithmic name

(λ, (x), ((label, f, (λ(z, x) [Π(z, x) → z; else → f(S(z), x)])), 0, x)).

We are tempted to define the partial function

dummyphi(z, x) =: ι(y) (z ≤ y ∧ Π(y, x) ∧ ∀(v) ((z ≤ v ∧ Π(v, x)) ⊃ y ≤ v)),

and generate an algorithm for it. But we first need

~Π(z, x) ⊃ dummyphi(z, x) = dummyphi(S(z), x),

which does not necessarily hold when the z and x range outside the domain of definition of the function. The closest we can come to generating our algorithmic name is

∃(w) (z ≤ w ∧ Π(w, x)) ⊃ dummyphi(z, x) = [Π(z, x) → z; else → dummyphi(S(z), x)].

If only we could prove

A. ~∃(w) (z ≤ w ∧ Π(w, x)) ⊃ dummyphi(z, x) = dummyphi(S(z), x)

we could prove dummyphi(z, x) = [Π(z, x) → z; else → dummyphi(S(z), x)]. Then we could generate

dummyphialg(z, x) =: [Π(z, x) → z; else → dummyphialg(S(z), x)],

and we could define phialg(x) =: dummyphialg(0, x) as we wished. But we cannot prove line A. It does not necessarily hold.
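For intuition, the search that the μ schema's algorithmic name performs, again sketched with integers for numerals (our simplification). The loop diverges when no witness exists, which is exactly the partiality that blocks a proof of line A:

    ;; Least z >= 0 with (funcall pred z x) true; diverges if there is none.
    (defun phialg (pred x &optional (z 0))
      (if (funcall pred z x) z (phialg pred x (1+ z))))
    ;; (phialg (lambda (z x) (> (* z z) x)) 10) => 4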

However, if we artificially complete the partial function dummyphi we can do it. Instead of dummyphi, use dummyphicomplete, defined by

dummyphicomplete(z, x) =: ι(y) ((~∃(w) (z ≤ w ∧ Π(w, x)) ∧ y = 0) ∨ (∃(w) (z ≤ w ∧ Π(w, x)) ∧ z ≤ y ∧ Π(y, x) ∧ ∀(v) ((z ≤ v ∧ Π(v, x)) ⊃ y ≤ v))),

in the preceding discussion, and then the generation works. Once again, although dummyphialg gives an algorithm for the function dummyphi, we found it most difficult to generate dummyphialg from dummyphi and found it more convenient to generate it from dummyphicomplete, which is quite a different function. Notice also: we cannot prove dummyphi(z, x) = dummyphialg(z, x), since we have no information on the value of dummyphi when ~∃(w) (z ≤ w ∧ Π(w, x)) holds. To prove that two function names name the same function, it is not sufficient to prove that they give the same values over their domains of definition. If the functions are partial, the names may be regarded as incomplete descriptions of complete functions (not necessarily recursive). In that case, the two descriptions must give enough information for us to say that any two complete functions that might be described by the two descriptions must give identical values everywhere, though we might not be able to say what those values are. For example, let us define

newdummyphialg(z, x) =: [∃(w) (z ≤ w ∧ Π(w, x)) → dummyphialg(z, x); else → dummyphi(z, x)].

Then we can prove dummyphi(z, x) = newdummyphialg(z, x) even though both functions are partial, i.e., both descriptions are incomplete.

4.10.8 Example 8: T(x) ⊃ F(x)

We shall give only a very sketchy outline of the proof of T(x) ⊃ F(x). We can easily prove

Proof(x) ⊃ (y ∈ x ⊃ T(y)).

We now induct on the length of proof as follows: Temporarily define Π(x) =: y ∈ x ⊃ F(y).

Then we can prove the following sequence.

1. (Π(d(x)) ∧ Pfstep(x)) ⊃ Π(x)                     by induction on the expression named by x
2. Axiom(x) ⊃ F(x)
3. (Π(d(x)) ∧ Axiom(a(x))) ⊃ Π(x)                   from 2
4. ((Proof(d(x)) ⊃ Π(d(x))) ∧ Proof(x)) ⊃ Π(x)      from 1 and 3
5. Proof(x) ⊃ Π(x)                                  from 4 by the induction theorem of Example 4
6. T(x) ⊃ F(x)                                      from 5

4.10.9 Example 9: Sketch of Proof of T(x) ⊃ T(apl(x))

We shall here give an outline of how one would prove T(x) ⊃ T(apl(x)). We can give no more than the barest outline because the proof is extremely long and tedious. We begin by developing alternative formulations of some rules of inference. The mechanism we have in Rule 3 for handling bound variables is very inefficient. We need a predicate which says that two expressions are identical except for alphabetic differences in bound variables. Such a predicate is ≈ (Table 9). We will need a whole corpus of theorems incorporating this which will allow us to automatically make alphabetic changes of bound variables as needed. We shall not develop these theorems here. By way of example we have the following theorem, which is provable from Rule 3 and equivalent to Rule 3.

Alternative Rule 3: (T(x) ∧ x ≈ y) ⊃ T(y)

The proof is rather tedious. Note also: ≈ is an equivalence relation. The form of Rules 4-7 is inconvenient for some purposes because, in order that freecheck hold where required, we must in general do some preliminary juggling of bound variables via Rule 3. We would like a recursive, legal way to do the juggling. To this end we have defined freefix and freefixlists in Table 9. Their most important properties are summarized by the theorems

-239 below. (The proofs are rather tedious.) z =freefix (x, y, z) (x = yASnf(x, y, z) = Snf(x, y, w)) D z w freecheck(x, y, freefix(x, y, z)) z = freefixZists(x, y, z) andZista((X(u) freecheck(u, y, freefixlists(x, y, z))), x) So that we can also prove andZista ((X (v) (Zistbinder ( v) v v= (la), y) D andZistZista((X(u, v) freecheck(u, v, freefixZists(x, y, z))), x, y) Write B for qaa(y) pada(y)Q freefixists (ada(y), d(y) adda(y)) ( * d(y) so y= and u = Snf(y, B, u) and v rSnf(y,8,v) hold. Then we can make the following substitutions into the theorem which is Rule 6. for y put 8 for u put Snf(y, 8, u) for v put Snf(y, 8, v) This gives, after some manipulation and replacement by equivalences, (T (Snf(y, S, u)) expression(Snf(y, V, v)) ^aa(y) = A andZistlista(( X(x, z) freecheck(x, z, freefixlists (ada(y), d(y), adda(y)))), ada(y), d(y)) A Snf(8, SsffixZ(ada(y)., d(y), adda(y)), Snf(y, 8, u)) = Snf(8, SsffixZ ( ada(y), d(y), adda(y)), Snf(y,8, v)) A (S ~ Snf(y, P, u) v (andlista ((X (z) z I adda( a)), ada (8)) A andZistlista (( X(x, z).. el z D x A adda(1)), ada(S), d(B))))) D T( Snf(y, S, v))

Using this, we can prove

alternative Rule 6: (T(u) ∧ expression(v) ∧ aa(y) = λ ∧ Snf(y, Ssffixl(ada(y), d(y), adda(y)), u) = Snf(y, Ssffixl(ada(y), d(y), adda(y)), v) ∧ newvarex(y, u) ⊴ Snf(y, newvarex(y, u), u)) ⊃ T(v)

Likewise we can prove

alternative Rule 7: (T(u) ∧ expression(v) ∧ a(y) = label ∧ ad(y) ⊴ add(y) ∧ Snf(y, Sffix(ad(y), y, add(y)), u) = Snf(y, Sffix(ad(y), y, add(y)), v)) ⊃ T(v)

alternative Rule 4: (T(y) ∧ variable(x) ∧ Simplexpr(z) ∧ type(x) = exprtype(z)) ⊃ T(Sffix(x, z, y))

alternative Rule 5: (T(y) ∧ T(u) ∧ z ⊴ u ∧ (Pfv(x) ∨ Ifv(x)) ∧ type(x) = exprtype(z)) ⊃ T(Sffix(x, z, y))

In these formulations, the bound variables take care of themselves automatically. Except for alternative Rule 6, the alternative rules are as powerful as the original rules. From the alternative formulations of Rules 6 and 7 used as theorems, together with other rules, we can prove

(F(x) ⊃ T( x ≡ oneapl(x) )) ∧ (Tm(x) ⊃ T( x = oneapl(x) ))

The proof is by tedious induction on the S-expression named by x. We shall not

reproduce this proof here. From the above theorem,

T(x) ⊃ T(oneapl(x))

follows as a special case. A glance at apl convinces us that it is a non-contradictory function expression. It is even the name of a complete function. It may not be obvious, however, just how this function is to be generated. The following function expression names the same function.

apl'(x) =: ι(y) (oneapl(y) = y ∧ ∃(z) (y = a(z) ∧ andlist((λ(v) (v = x*0 ∨ a(v) = oneapl(ad(v)))), z)))

(andlist is defined at the end of Section 4.10.5.) However, to prove the necessary recursion theorem on apl' to allow us to generate apl via Rule 17 is very complicated. It requires a set of list handling functions and a corpus of theorems about them to allow us to prove the required uniqueness of the z in the definition of apl'. Some of these functions (e.g. maplist) and theorems were developed in earlier examples. The methods referred to here have been illustrated on less complicated examples. These functions, theorems, and methods are necessary both to handle proofs and to handle sequences of formulae like the z sequence above. One use of these methods, induction on length of proof, was indicated in our sketch of the proof of T(x) ⊃ F(x) in Section 4.10.8. A similar, but more complicated, induction on the length of the z list of the apl' definition gives us

T(x) ⊃ T(apl(x))
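A sketch of the Table 9 relation between oneapl and apl: apl iterates one evaluation step to a fixed point. Here STEP stands in for oneapl, whose real definition is elaborate; any expression-to-expression function will do for illustration:

    (defun apl (x step)
      (let ((y (funcall step x)))
        (if (equal y x) x (apl y step))))

    ;; With a toy step that rewrites (CAR (CONS u v)) to u:
    ;; (apl '(car (cons 1 2))
    ;;      (lambda (e)
    ;;        (if (and (consp e) (eq (car e) 'car)
    ;;                 (consp (cadr e)) (eq (car (cadr e)) 'cons))
    ;;            (cadr (cadr e))
    ;;            e)))
    ;; => 1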

4.11 Implementation Routines: Effector Algorithms Used in Section 3

4.11.1 Basic Functions and Notation

These algorithms utilize the following basic functions described in Section 3.

1. Moving functions: derivations, rule, down, antecedents, derivativesusing, variable
2. Explicit net-changing functions: join, constructderivation, constructparameterderivation
3. Node predicates: T-tagged, H-tagged
4. Node modifiers: T-tag, H-tag
5. Search functions: bestlist, bestproductinlist, bestnetrule, bestnetparameterrule, bestnetparametervalue

6. Implicit net-changing functions: erasepunish, punish
7. Hybrid moving function: jumpback
8. Net-changing functional: operate
9. Timed LISP functions: result, requiredantecedents
10. LISP functions: pair, proj1, proj2, find, variablein

The notation for the following algorithm definitions is similar to the notation of the "definition" statements in Section 4.3, which, in turn, resembles LISP m-notation. In the following definitions we also employ the LISP PROG notation, together with the operate statements described in Section 3.4.3. In these definitions we also use set theory notation to abbreviate sequences the order of whose members is unimportant. E.g., α ∪ {β, γ} ∪ {y | y ∈ δ ∧ H-tagged(y)} names a sequence whose members are: (1) the members of the sequence α; (2) β and γ; (3) those members of the sequence δ on which H-tagged holds. The dummy

-244 variables in the following algorithm definitions will not be restricted to the variable symbols of Section 2, but may be any Greek or Latin letter. The reader must be careful to not confuse, for example, the bug value i with the dummy variable (or bug) i which ranges over bug values. It will be clear from context which is meant. 4.11.2 Routines Which Return a Bug Value refineproof(E, k) = prog(( A ); [ T-tagged(S) + return(i) ];:- derivations() return(refineproff(E, A, k)) ) refineproff(, A, k) = prog((6, rr, p, q, r ) 6 = bestlist(A, k ) [6 = 0 - return()] T:= ruZe(6) p = refineproof(i, + k) = operate(p) [ T-tagged(p) v p = )3 ~, ( I T (x return(refinederiv ((6), k)) ]; C = expandheuristic (6), p, k) [T-tagged(C) - return( )] = operate ( punish(C, down(6))) return( refineproff( ((), (A) - { 6, + k)) ) refinederiv ( 6,k) =: refinederivv (6, antecedents (), k)

-245 refinederivv(6, T, k) =? prog((c, ) ) [T = 0 -- return(T-tag(down(6))) ]:= refineproof(a(T), k) [ T- tagged () - return(down(6)) ]; 4 = operate(G) return(refinederiv ()(6), f)(d(T)), k)) ) expandheuristic(6, p, k) =, expandheuristicc(6, derivativesusing( p)- t6, p, k) expandheuristicc(6, r, p, k) =, prog((y, C, ) ) y = bestlist( r, k) [y = 0 + return(down ()) ] C:= refinebyexampZe( 6, y, p, k) [ T-tagged(c) -+ return() ];: = operate(punish (, Y )) return(expandheuristicc ( (6), )( r- y} ), k(p), k)) ) In the definition of refinebyexcmple below, t and 0 will stand for functions which are described by the following paragraphs. If 6, y, and e are bug values with w the same as t, with 6 and r derivation nodes, and with = down(9) holding, then t(9, ) names a bug value n with these two properties: first,n]jmay be formed from g A \ A by erasure of nodes 6 and y and removal of all H-tags from E; second, n is that node referred to above from which the H-tags were removed. (Remember, we made ad hoc provision for saving all of a net if an erasure splits it in two.)

-246 Suppose now that Mmay be formed from 1nby addition of certain nodes and change of certain flags. Then we can imagine a net E which can be formed from W by adding back those two nodes that were erased from [ to make E Can we unambiguously state where these two nodes should be added back, or are several such E's possible? In most cases there is only one i possible which gives the added nodes the same position relative to other nodes that they had in ]. When more than one is possible one can rely on the fact that all these nets are part of the same patched net to determine the unique "natural", the Rwhich has the largest number of its nodes "equivalent" to nodes in m (where we call a node in E "equivalent" to a node in 5 if the two nodes are the same node of the patched net). a and i are to be the same nodes of the patched net. 0 is defined for bug values described as above, so that 0 (t, n, a) names refinebyexample(5, y, p, k) =:prog( (E, x, (,, ); a= down(6) x = down(y):= operate( D( 6, y)) = {- xg Q,, (Qad paraoneter(y) ) ad(parameter(6)) ) U pair (antecedents (y), antecedents ()) G;= prove(~ (f), t(x), (), ((p), k ) [~H-tagged (a ()) - return (S) ] 0 = operate( e0(, n, C)) return(refineproof( 0( a(G)), + k) )

-247 4.11.3 Routines Which Return a Bug Value Paired With a Sequence of Pairs prove(E, x, Q, p, k) =: prog((m, 4, C, L ); [ H-tagged ( ) - return(E * k) ] [ x = 0 + go(A) ] m:= y y | y ^ ad(y) }; [ x ~ proj (m) +,:= operate(join(E, ad(find(x, m)))) ]; [ x e proji(m) A E = ad(find(x, m)) + return( prove ((,( ), (x), (9) P(p), k) ] c:= checkprove(S,x,Z,p,k) [ H-tagged(a(f)) - return(i) ] A L'= {y I y e ^ ad(y) = } %:= checklistprove(E, L, Q, p, k) [ H-tagged (a ()) + return () ] return(randomprove(,, p, k)) ) checkprove(l, x, Q, p, k) =, checkprovv(,derivations(x),., p, k) checkprovvu(, A,Q, p, k) =, prog(( 6,, ~ ) 6,= bestZist( A, k) [ 6 = - return (~ * Q) ]::= parametergenerate(g, 6, a, p, k) [ H-tagged(a(c)) + return (): = operate(punish( j, jumpback (, 6 ))) return(checkprovv?( (,), ~( d( A )), i(t), P(p), k)) )

-248 checkZistprove (, L, Q, p, k) =: prog(( y, ), S,C ) y = bestproductinlist(L, k) [y = 0 + return( *Q)], = operate (join (, ad (y)) ) C:= checkprove(C(F), 4(a(y)), (Q), (p), k) [ H-tagged(a(C)) - return(r) ] i= operate( punish(4, jumpback(C,.a(y)))) return( checklistprove( (4), 1 (d(L)), P(Z), P(Z) + k)) In the following definition it isimportant that = bestnetrule(,) never hold.bestnetrule can be modified so it never does. randomprove(,, a, k). = prog((P, C, $ ) p = bestnetrue (a, k) [p = 0+ return (C * Q) ]:"= overaZZcheck(p, E, Q, a, k) [ H-tagged(a(C)) + return () ]:i:= operate (punish (, jumpback (S,p))) return (randomprove ((0), i() ), 4(a), + k)) ) parametergenerate(E, 6, Z, a, k) = prog((p,v, x,,c ) p = rule (6) v = parameter(6) X = antecedents (C) [ ad(v) e projl(l) + go(A) ] w Q ( a(v), ad(find(ad(v), )); [ resuZt(p, w, k) T + go(A) ] C:= parametercheckgen(p, L, E, Q, k, 6, v, X, a) [ H-tagged( a(z)) + return (C) ] A return(parametertreegenerate (,p,v,derivations (v),,k, 6,v,xo)) )

-249 Note: In the above definition we must assume E is not. parametertreegenerate ( i, p, v, Tr,, k, 6, v, X, o) = prog ((~, R, y, n, r, ~);:= bestlist(r, k) [ c = 0 + return( overaZZcheck(p, E, Q, a, k)) ] R = ruZe(c) y.= variable() [ - ad(y) e proJl () -+ go(A) ] n *= (a(y) ad( find( ad(y), ))) [ -result (, result (R, n )) = -+ go(A) ]:= etacheck(p, R, n, Q,, k, E, y, 6, v, X, ), [ H-tagged( a(c) + return(i) ] A = operate (punish(v, jumpback(v, ))) return (parametertreegenerate ( (E), (p), P(v), (7r) -'(s)),i.(Q),+ k, i(6), P(v), 4(x), O(a))) ) parametercheckgen( p, w, E,, k, 6, v, X, a) = prog(( y, 4 ) Y = constructderivation(p,w, requiredantecedents(p, w,k), ); 4:= operate(y) return( completederiv(X, c(6), (t U pair (x,antecedents(Y)), 6 y, I ad(v) ad(w) t ),'(a)c, + k )) ) compZetederiv( Y, 6, Q, a, k) = completederivv(y,antecedents (y),antecedents (6), Q, a, k)

-250 completederivv( y, rr, T, Q, a, k) = prog((?, $ [T =0 + return(H-tag(down(Y))) ] = prove( a(r), a(T),k,a, k) [ H-tagged( a(5)) + return(down(Y)) ]: = operate( a(c)) return(compZetederivv(4(Y), 4( d(r)), 4(d(T)), overalZcheck( p, E, Q, a, k) =: prog((R, 0, 4, R,= bestnetparameterrule(p, k) [R = 0 + return(S * Z) ] 0 = operate(R) R = Rheck((p), R, 9e(), 0(Q), k, 0, 0, a) [ H-tagged( a(r)) - return(I) ] = operate( erasepunish(p, jumpback(p, R))) return(overallcheck(t(p), 4(0), 4(Z), 4(a), ); d(c), K(a), k)) ); ) +k)) ) Rcheck( p, R, E, Q, k, 6, v, X, a) =: prog((r, n, ); n:= variablein(R) bestnetparametervalue(proj2 (9) U o{a, k) j [ ad(n) = 0 + return(S *Q) ]:.= etacheck(p, R, n, n, Q, k, 0, 0, 6, v, x, a ) [ H-tagged(a()) -+ return () ]:= operate( erasepunish(~, jumpback(E, n))) return( Rcheck(((p), q(R), q(E), q(Q), + k, q(6), q(v), (X), b(oa))) )

-251 etacheck(p, R, n, $, a, k, c, y, 6, v, X, o) =: prog((m, y, P, C ) m'= ~ y = constructparameterderivation(R, n, Jad(n) 0, resuZt(R,, n, k)) [ E: 0 - m:= UQ U E Y t ] [C 0A y 0 - m: U ^t Y ], ad(y)? ad(n)) t ] ~ = operate () = parametercheckgen(4(p), a (v) down(Y) f, q((), c(m), k, (6), (v), (X, x (a) ) [H-tagged( a(C)) -+ return (C) return(E * Z) ] )

REFERENCES

Amarel, Saul, "On the Automatic Formation of a Computer Program which Represents a Theory," in Self-Organizing Systems, edited by Yovits, Marshall C., et al., Washington, D.C.: Spartan Books, 1962.

Church, Alonzo, Introduction to Mathematical Logic, Princeton, New Jersey: Princeton University Press, 1956.

Friedberg, R. M., "A Learning Machine: Part I," IBM Journal of Research and Development 2, 1, 2-13 (1958).

Holland, John H., "Adaptive Plans Optimal for Payoff-Only Environments," to appear in Proceedings of the Second Hawaii International Conference on System Sciences (1969).

Holland, John H., "A Logical Theory of Adaptive Systems Informally Described," University of Michigan Engineering Summer Conferences (1961), pp. 1-51.

McCarthy, John, Paul W. Abrahams, Daniel J. Edwards, Timothy P. Hart, and Michael I. Levin, LISP 1.5 Programmer's Manual, Cambridge, Massachusetts: Massachusetts Institute of Technology, Computation Center and Research Laboratory of Electronics, 1962.

Mendelson, Elliott, Introduction to Mathematical Logic, Princeton, New Jersey: Van Nostrand, 1964.

Newell, A., J. C. Shaw, and H. A. Simon, "Report on a General Problem Solving Program," University of Michigan Engineering Summer Conferences (1961), pp. 1-27.

Polya, Gyorgy, How to Solve It, Princeton, New Jersey: Princeton University Press, 1945.

UNCLASSIFIED
Security Classification

DOCUMENT CONTROL DATA - R&D

1. ORIGINATING ACTIVITY: Logic of Computers Group, 611 Church Street, The University of Michigan, Ann Arbor, Michigan
2a. REPORT SECURITY CLASSIFICATION: Unclassified
3. REPORT TITLE: A Self-Describing Axiomatic System as a Suggested Basis for a Class of Adaptive Theorem Proving Machines
4. DESCRIPTIVE NOTES: Technical Report
5. AUTHOR(S): Westerdale, Thomas H.
6. REPORT DATE: March 1969     7a. TOTAL NO. OF PAGES: 266     7b. NO. OF REFS: 9
8a. CONTRACT OR GRANT NO.: DA-31-124-ARO-D-483
9a. ORIGINATOR'S REPORT NUMBER(S): 08226-7-T
10. AVAILABILITY/LIMITATION NOTICES: Distribution of this document is unlimited.
12. SPONSORING MILITARY ACTIVITY: U. S. Army Research Office (Durham), Durham, North Carolina

13. ABSTRACT: An explicitly self-describing axiomatic system is presented whose set of rules of inference continually increases in size as new theorems are proved. A proof of consistency relative to formal arithmetic is outlined. Modified LISP programs are the function constants of the system. A class of possible adaptive theorem proving machines is outlined. Such machines construct proofs by successively refining proof "outlines" which employ heuristics. New heuristics are generated by the same mechanism used to generate rules of inference and theorems. In the notation of the axiomatic system, a heuristic or a rule of inference is itself a well formed formula.

DD FORM 1473     UNCLASSIFIED

