THE UNIVERSITY OF MICHIGAN
SYSTEMS ENGINEERING LABORATORY
Department of Electrical Engineering
College of Engineering

SEL Technical Report No. 54

A RELATIONAL MODEL OF DATA FOR THE DETERMINATION OF OPTIMUM COMPUTER STORAGE STRUCTURES

by L. Scott Randall
under the direction of Professor Keki B. Irani

September 1971

Under contract with Rome Air Development Center
Research and Technology Division
Griffiss Air Force Base, New York
Contract No. F30602-69-C-0214

ACKNOWLEDGEMENTS

The research which has culminated in this dissertation was made possible only through the efforts of many individuals. I am particularly indebted to the members of my doctoral committee, especially Professor Keki B. Irani, who served as chairman and who devoted many hours of his time to maintaining a constant involvement in the development of this thesis. I am most grateful to those who provided me with the financial assistance necessary to complete this work. Deserving of special mention are the Advanced Research Projects Agency, which provided early support under contract DA-49-083 OSA-3050, and the Rome Air Development Center, which provided some support in the latter stages of the work under contract F30602-69-C-0214. I also wish to thank Ms. Kass Barker for her efforts in typing the final manuscript and Miss Karen Hasse for her excellent proofreading. Finally, for her continued encouragement, understanding, and sacrifices, I dedicate the efforts represented by this dissertation to Susan, my wife.

L. Scott Randall
Ann Arbor, Michigan
August, 1971

TO SUSAN

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
LIST OF SYMBOLS

Chapter

I INTRODUCTION
  1.1 A Brief History of Computer Data Representations
  1.2 Definitions and Philosophy
  1.3 Research Objectives

II A MATHEMATICAL MODEL OF DATA STRUCTURE
  2.1 Relations
  2.2 A Relational Model of Data Structure
    2.2.1 Schematic Representation
    2.2.2 Implications of the Model
    2.2.3 Uniqueness Considerations
  2.3 Use of the Model

III A MODEL OF STORAGE STRUCTURE
  3.1 Initial Storage Structure Model
  3.2 Transformations
  3.3 Additional Model Characteristics
  3.4 Decision Variables
  3.5 Quantification of Model
  3.6 Flexibility of Model

IV ANALYSIS OF THE STORAGE STRUCTURE MODEL
  4.1 Measures of Performance
  4.2 Time Cost Function
    4.2.1 Primitive Operations
    4.2.2 Elementary Time Costs
    4.2.3 Sub-Procedures and their Time Costs
    4.2.4 Primitive Operation Time Costs
  4.3 Storage Cost Function

V A PROCEDURE FOR THE DETERMINATION OF A MINIMUM COST STORAGE STRUCTURE
  5.1 Reducing the Number of Feasible Solutions
  5.2 Optimization Procedure

VI APPLICATION
  6.1 The Problem: A System for Medical Diagnosis
    6.1.1 General Description
    6.1.2 A Particular Diagnostic System
    6.1.3 The Data and its Data Structure
    6.1.4 The Operations
    6.1.5 The Environment
    6.1.6 The Solution
    6.1.7 Justification of the Solution
    6.1.8 Comparison with an Existing Storage Structure
  6.2 Near Optimum Solutions
  6.3 Sensitivity Analysis
    6.3.1 Variations in k_i
    6.3.2 Variation in the Frequencies of Operations
    6.3.3 Solution for the Primary Operation
  6.4 Summary

VII CONCLUSION

Appendix A: METHODS OF IMPLEMENTATION FOR THE PRIMITIVE OPERATIONS
Appendix B: TABULATION OF e_i FOR i ∈ {1, 2, ..., 10}
Appendix C: SUMMARY OF z_i(t) AND z_i*(t) FOR i ∈ {1, 2, ..., 10}
Appendix D: SUMMARY OF TIME COSTS FOR PRIMITIVE OPERATIONS
Appendix E: SOLUTION LISTING FOR CASE 1 OF THE MEDICAL DIAGNOSIS PROBLEM
Appendix F: SOLUTION LISTING FOR CASE 2 OF THE MEDICAL DIAGNOSIS PROBLEM
Appendix G: SOLUTION LISTING FOR CASE 3 OF THE MEDICAL DIAGNOSIS PROBLEM
Appendix H: SOLUTION LISTING FOR PRIMITIVE OPERATION Q7 IN THE MEDICAL DIAGNOSIS PROBLEM

BIBLIOGRAPHY

LIST OF TABLES

3-1 Definition of Elements of Matrix K
3-2 Values of Elements of Matrix K
3-3 Numbers of Blocks, by Type, in the SSM
4-1 Primitive Operations
4-2 Summary of Methods of Implementation for Primitive Operations
4-3 Summary of Sub-Procedure Descriptions
4-4 Summary of Probabilities
6-1 Symptoms for Congenital Heart Disease
6-2 Heart Disease Types
6-3 Symptom-Disease Probability Matrix
6-4 Tests for Heart Disease Diagnosis
6-5 Statistics for Heart Disease Problem
6-6 Averages for Heart Disease Problem
6-7 Summary of Parameter Values for Heart Disease Problem
6-8 Transformed Parameters for Heart Disease Problem
6-9 Case 1 Optimal Solutions
6-10 Operation Time Costs for the SLIP Structure versus the Optimal Structure
6-11 Basic Operation Time Costs for the SLIP Structure versus the Optimal Structure

6-12 Specifications for the Basic SLIP Structure
6-13 Near Optimal Solutions for Case 1
6-14 Alternative Values for Parameters of Heart Disease Problem
6-15 Optimal and Near Optimal Solutions for Case 2
6-16 Characteristics of Relations r_1, ..., r_20
6-17 Integer Approximation of Statistics in Table 6-16
B-1 Elementary Time Cost e2
B-2 Elementary Time Cost e4
B-3 Elementary Time Cost e6
B-4 Elementary Time Cost e8
B-5 Elementary Time Cost e10
B-6 Elementary Time Cost e9
B-7 Elementary Time Cost e7
B-8 Elementary Time Cost e5
B-9 Elementary Time Cost e3
B-10 Elementary Time Cost e1

LIST OF FIGURES

1-1 Newell, Shaw, and Simon List Structure
1-2 A Threaded List
2-1 Data Structure Model
3-1 Initial Storage Structure Model
3-2 A Portion of the Initial Storage Structure Model
3-3 Stacking Transformation
3-4 Duplication Transformation
3-5 Secondary Effects of Duplication Transformation
3-6 Simultaneous Duplication
3-7 Elimination Transformation
3-8 Combination of Stacking and Duplication
3-9 Substitute Structure for b6-blocks
3-10 Storage Structure Model
3-11 Storage Structure Model for all k_i = 1
3-12 Transformed Storage Structure Model - 1
3-13 Transformed Storage Structure Model - 2
3-14 Transformed Storage Structure Model - 3
3-15 Software-Simulated Associative Memory Cell
3-16 Software-Simulated Associative Memory Structure
3-17 Set-Theoretic Data Structure (Childs)
3-18 Transformed Storage Structure Model - 4

4-1 Minimum Fractional Reduction of CPU Time Necessary to Offset an Increase of 100 Pages in Program Size for Cost Function C = k[T(1 + 0.01P)]
4-2 Possible Cost Function Behavior
4-3 Example of Stacking-Duplication-Stacking
6-1 Section of a Decision Tree
6-2 Schematic of Solution 1 for Case 1
6-3 Optimal Storage Structure for Case 1
6-4 Schematic of Solution 2 for Case 1
6-5 State List
6-6 Attribute List
6-7 Test List
6-8 Basic SLIP Structure
6-9 Time Costs by φ-family for Case 1
6-10 Best Solution for φ-families 19 and 55 for Case 1
6-11 Best Solution for φ-family 4 for Case 1
6-12 Best Solution for φ-family 37 for Case 1
6-13 Best Solution for φ-family 46 for Case 1
6-14 Time Costs by φ-family for Case 2
6-15 Best Solution for φ-families 10 and 28 for Case 2 (Optimal Solution)
6-16 Best Solution for φ-families 19 and 55 for Case 2

6-17 Best Solution for φ-family 4 for Case 2
6-18 Best Solution for φ-family 37 for Case 2
6-19 Best Solution for φ-family 1 for Case 2
6-20 Best Solution for φ-family 46 for Case 2
6-21 Time Costs by φ-family for Case 3
6-22 Time Costs by φ-family for Operation Q7
6-23 Optimal Solution for Primitive Operation Q7

LIST OF SYMBOLS

This list contains, in roughly chronological order, the symbols which are used with some frequency in this dissertation.

Symbol  Significance

Δ  The set of data items
P  The set of relations
k_r  Cardinality of P
A  The set of data items used as sources
k_a  Cardinality of A
Π  The set of data items used as targets
k_p  Cardinality of Π
d_i  An element of Δ
r_j  An element of P
p_k  A relation instance
P²  The set of all relation instances
P_i  The set of relations for which d_i ∈ A is a source
Π²  A target set
Σ  The set of source/relation symbol pairs for which Π² is the target set
Π²_m  A set in a partition of Π
Π³_m  The set of all Π² of which Π²_m is a subset

Γ  The set of all relation symbol/target set pairs
A²_m  A set of sources sharing common relation symbol/target set pairs; a set in a partition of A
Γ_m  The set of relation symbol/target set pairs shared by the elements of A²_m
A³_m  The set of all A² of which A²_m is a subset
κ_i  A decision variable which indicates whether or not the a_i-ring of the SSM is an explicit ring of forward pointers
λ_i  A decision variable which indicates whether stacking or duplication is applied to the a_i-ring of the SSM
θ_i  A decision variable which indicates whether or not the a_i-ring of the SSM contains head pointers
ψ_i  A decision variable which indicates whether or not the a_i-ring of the SSM contains reverse pointers
ε_i  A decision variable which indicates whether or not the b_i-blocks of the SSM are eliminated
τ_i  A decision variable which indicates whether or not the b_i-blocks of the SSM contain type code fields
a_1, a_2  Decision variables which indicate whether or not the b_1-blocks and the b_11-blocks of the SSM, respectively, contain description block indicators
ρ_1, ρ_2, ρ_3  Decision variables which indicate whether or not the b_5-blocks, b_6-blocks, and b_7-blocks of the SSM, respectively, contain relation symbol name fields

k̄⁰_i  The average cardinality of the a_i-rings of the SSM before transformation
k̄_i  The average cardinality of the a_i-rings of the SSM after transformation
K  A matrix of products of various k̄_i
K_ij  An element of K
m_a  The number of copies of each b_1-block generated by transformation of the SSM
m_p  The number of copies of each b_11-block generated by transformation of the SSM
m_r1  The number of copies of a given b_5-block associated with the (copies of the) b_1-block representing a given source
m_r2  The number of copies of a given b_7-block associated with the (copies of the) b_11-block representing a given target
m_i  The (average) total number of b_i-blocks in the SSM
m_rp  The expected number of times a given relation symbol is associated with a given target
m_r  The expected number of b_6-blocks which represent the same relation symbol
Q_i  A primitive operation
T  The time cost function
a_i  The relative frequency of primitive operation Q_i
t_ij  The time cost of primitive operation Q_i using method j
t_i  The minimum of t_ij over all j
f_i  The time required to follow a forward pointer in an a_i-ring

s_i  The time required to step from one block to another in a stack for an a_i-ring
h_i  The time required to follow a head pointer in an a_i-ring
s  The time required to step from one field to another in a block
e_i  The time cost required to move from one element block of an a_i-ring to another element block of that ring
δ_i  A binary-valued variable which indicates whether movement is toward or away from the head of an a_i-ring when the element blocks are stacked upon the head
S1(j, i1)  A general time cost form
S2(j, i1, i2)  A general time cost form
S3(j, i1, i2, i3)  A general time cost form
S4(j, i1, i2, i3, i4)  A general time cost form
e_i^f  The value of e_i when δ_i = 0
e_i^o  The time cost required to move to or from the head of an a_i-ring from the element block nearest the head
e_i*  The time cost required to move from an arbitrary element block of an a_i-ring to the head
k̂_i  The value of (k̄_i + 1)/2
σ_i, σ_i*  Sub-procedures
z_i(t), z_i*(t)  The time costs of σ_i and σ_i*, respectively
f_a, f_p  The times required to follow a description block indicator from a b_1-block and from a b_11-block, respectively

F_a, F_r, F_p  The times required to follow a pointer in source, relation, and target rings, respectively
c_d, c_r  The times required to compare with a given quantity a data item description and a relation symbol name, respectively
v_d, v_r  The times required to fetch a data item description and a relation symbol name, respectively
C_a, V_a  The times required to compare and to fetch, respectively, the data item description associated with a given b_1-block
C_p, V_p  The times required to compare and to fetch, respectively, the data item description associated with a given b_11-block
C_r, V_r  The times required to compare and to fetch, respectively, the relation symbol name associated with a given b_6-block
C_r1, V_r1  The times required to compare and to fetch, respectively, the relation symbol name associated with a given b_5-block
C_r2, V_r2  The times required to compare and to fetch, respectively, the relation symbol name associated with a given b_7-block
A_i  An inclusion variable
z_i⁰, z_i⁰*  Expressions indicating the number of multiples of s required by z_i(t) and z_i*(t), respectively
T_a, T_r, T_p  The times required to find the heads of source, relation, and target rings, respectively
x_i  A probability
δ  A binary-valued variable which indicates whether or not m_rp = 1
m_z  The product m_rp · m_p

u_i  The storage required for each b_i-block of the SSM
u_d  The (average) storage required for a description block
u_a, u_r, u_p  The storage required for the source, relation, and target ring heads, respectively
S  The storage cost function
s_fi, s_hi  The storage required for forward and head pointers in an a_i-ring, respectively
s_d  The storage required for a description block indicator
s_p  The storage required for source, relation, and target ring pointers
s_r  The storage required for a relation symbol name field
s_t  The storage required for a type code field
S̄  The upper bound for storage available to contain the SSM

ABSTRACT

A RELATIONAL MODEL OF DATA FOR THE DETERMINATION OF OPTIMUM COMPUTER STORAGE STRUCTURES

by L. Scott Randall

Chairman: Keki B. Irani

In solving any problem on a digital computer one must store within the computer memory certain pieces of information, or data, upon which the program implementing the solution process makes its decisions or performs its calculations. For virtually all problems, the manner in which the relationships of accessibility among the various items are portrayed by the organization of these items in the computer memory can have a marked effect upon the efficiency of the solution process.

The objective of the research reported here is the development of a rigorous quantitative method for the automatic design of optimal computer memory representations for data. To this end, we define a relational model of data structure within which we can specify the logical ordering, or structure, of the data involved in the solution of a given problem. We then develop a decision model for the specification of the storage structures (i.e., the computer memory representations) which can represent an arbitrary data structure.

We define two basic measures of performance - a time cost function and a storage cost function - for use in comparing the relative merits of a collection of storage structures. The time cost function reflects the number of time units required to perform certain of a set of primitive operations using a particular storage structure, and the storage cost function reflects the total number of storage units occupied by the storage structure. Finally, we present a procedure which determines, from among the storage structures which can feasibly represent a given data structure, the one for which the time and storage costs satisfy certain optimality conditions. A storage structure is considered to be optimal if it minimizes the time cost function, subject to an upper limit on its storage cost.

To demonstrate the feasibility and the effectiveness of the techniques we develop, we apply them to a problem for which a solution program already exists and for which fairly extensive data about system performance are available. Our results demonstrate conclusively that significant improvement in program efficiency can be obtained by applying these techniques rather than the historically intuitive and qualitative methods used for storage structure design.

Chapter I

INTRODUCTION

The major thrust of the research reported here is the development of a procedure for the automatic design of optimal schemes for the representation of data within a computer memory. In solving any problem on a digital computer one must store in the computer memory certain pieces of information upon which the program implementing the solution process makes its decisions or performs its calculations. For most problems involving more than a few items of information, the manner in which the relationships of accessibility among the various items are portrayed by the organization of these items in the computer memory can have a very marked effect upon the efficiency of the solution process. (As an extreme, imagine a hypothetical organization for which the items most frequently used are least accessible, versus an organization for which these items are most accessible.) Clearly, we should like to organize the items of information in the computer memory in such a way as to minimize the overall cost of accessing them. On the other hand, we may also wish to minimize or to limit the amount of memory devoted to the representation of these items. Unfortunately, these two goals are often in conflict. We may be able to minimize the cost of access at the expense of the memory required to represent the items of information and vice versa, but seldom can

we easily do both. This problem is further complicated by a lack of rigorous, objective techniques for the evaluation and comparison of alternative organizational schemes, let alone any such techniques for their design. For the most part, the design of a scheme for the computer representation of data (i.e., the items of information) associated with the solution of a given problem is an art fraught with the subjectivity and the personal prejudices of the practitioner. The designer calls upon his experiences with previous systems, combines this information with a good portion of intuition, and, voilà!, comes up with the perfect solution! Indeed, this technique may produce a good solution to the problem at hand. In fact, if the designer is alert, the solution is probably better than the other alternatives which he may have considered. He still has no assurance, however, that there is no better solution to the problem. If the system in which his organizational scheme is to be imbedded will see only limited use, the designer may not care. Otherwise, he may console himself with the fact that he has done the best he can. Perhaps we have overstated our case. The fact remains, however, that the designers of data representation systems have, at best, primitive tools with which to work.

Our objective in this research has been to develop a rigorous framework within which we can discuss the logical ordering or structure (or accessibility) of the items of information associated with a given problem, certain measures of performance with which we can compare objectively the relative merits of alternative computer representations for these items, and finally a procedure for choosing from the possible alternative representations that representation which best satisfies certain given optimality conditions. The various sections of this chapter will be devoted to presenting some pertinent historical background, defining terms, and describing in a qualitative manner our approach to the problem.
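As the abstract indicates, the optimality condition used in this work is minimization of the time cost subject to an upper limit on storage cost. The skeleton of such a selection procedure can be sketched in a few lines; the candidate names and cost figures below are invented purely for illustration and do not come from the dissertation.

```python
# Schematic sketch of the selection criterion: among the storage
# structures that can feasibly represent the data structure, choose
# the one of minimum time cost whose storage cost fits the bound.
# Candidate names and numbers are hypothetical.

def optimal_structure(candidates, storage_bound):
    """candidates: iterable of (name, time_cost, storage_cost) triples.
    Returns the minimum-time candidate within the storage bound,
    or None if no candidate fits."""
    feasible = [c for c in candidates if c[2] <= storage_bound]
    if not feasible:
        return None  # no structure fits in the allotted storage
    return min(feasible, key=lambda c: c[1])

candidates = [
    ("linked rings", 120.0, 900),    # fast but storage-hungry
    ("stacked blocks", 150.0, 600),  # slower, more compact
    ("fully duplicated", 95.0, 1400),
]
```

With a storage bound of 1000 the hypothetical "linked rings" structure wins despite "fully duplicated" being faster, because the latter exceeds the bound; tightening the bound to 650 shifts the choice to "stacked blocks". The bulk of the dissertation is concerned with computing the time and storage costs that such a comparison presupposes.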

1.1 A Brief History of Computer Data Representations

Because computer memories have been organized in a linear, sequential manner since the earliest days of the stored-program computer, it is not surprising to find that all manner of data has been shaped and bent, as it were, to fit within vectors and rectangular arrays kept in consecutive memory locations. On the other hand, neither is it surprising to find those who would rebel against the structure imposed upon their data by the inherently sequential nature of computer memories. The first really significant break with the vector of consecutive locations was made in 1956 when Newell, Shaw, and Simon [59] introduced the concept of linked allocation and the pointer in a memory organization scheme imbedded in a list processing language (IPL) designed for heuristic problem-solving. For their purposes Newell, Shaw, and Simon defined a list to consist of an ordered set of items of information, any item of which could be another list or an element, where an element was some basic unit of information constrained to fit within a single word (i.e., location) of memory. A list was implemented by using a set of location words, each of which was subdivided into three fields, two fields containing addresses and one field containing a type code. One address located the item

corresponding to the given location word of the list and the other address located the next location word of the list. The type code in each location word simply indicated whether the item associated with that location word was a list (0) or an element (1). Thus, if we were to define two lists A and B as

A = (a, B, e)
B = (b, c, d)

where a, b, c, d, and e are elements, then the Newell, Shaw, and Simon list structure representing these two lists would appear as in Figure 1-1, where an arrow - called a link or a pointer - originating from a given block (representing a location word) represents the address of the block, or word, to which it points. The real innovation of this data organization lies in the fact that the addresses of the various items in a list need bear no particular relation to one another; that is, the addresses need not be consecutive. This has several implications. First, an item can be deleted from a list simply by deleting its location word. Inserting an item into a list is also facilitated. Secondly, location words on different lists (or the same one, for that matter) may contain the address of the same item of information. Finally, two or more lists may share the same region

[Figure 1-1. Newell, Shaw, and Simon list structure for the lists A = (a, B, e) and B = (b, c, d); arrows represent pointers between location words.]

of memory with no list overflowing until all memory has been exhausted. The most obvious disadvantages of the scheme are that the location words require extra memory space, with the result that approximately half of the storage used is taken up by location words, and that the ability to compute the address of the next item on a list or to determine the address of the previous item on the list is lost. The work performed by Newell, Shaw, and Simon inspired many others to use the linked memory scheme (which at the time was often called NSS memory), and these techniques gradually evolved into basic programming tools. The first article dealing specifically with the application of linked data organizations to garden-variety problems was published by Carr [8] in 1959. In this article Carr indicated that linked lists can readily be manipulated in ordinary programming languages without the facade or restrictions of sophisticated list languages. Despite this observation, however, list processing languages continued to evolve, each with its own specific memory organization scheme. At first one-word nodes or elements were used for the linked

memory schemes, but about 1959 the usefulness of several consecutive words per node and of lists structured via multiple links was being discovered. The first article dealing specifically with this idea was published by Ross [65] in 1961, and a second article dealing with multiword list items was published by Comfort [14] in 1964. The growth in popularity of linked lists gradually produced such refinements as the circular list, or ring, in which the last item of the list contains a pointer to the first item, and the doubly linked list, in which each item of the list contains a pointer to the previous item as well as to the succeeding item. The origins of these concepts cannot be attributed to specific individuals, probably because these ideas occurred naturally to many people. On the other hand, one of the main factors which led to the widespread use of these techniques was the introduction of certain list-processing languages and systems which utilized them, in particular Weizenbaum's Symmetric List Processor (SLIP) [78, 79]. Another novel concept, first introduced in 1960 by Perlis and Thornton, was that of the threaded list. In this scheme each item of a list contained three fields, one of which was a code indicating how the other two fields were to be interpreted. In particular, one of these fields contained either an item of data or a pointer to a sublist (that is, a list which functioned as an element) and the other field

contained a pointer to the next item on the list or, if the item were the last on the list, a pointer to the item for which this list served as a sublist. To avoid treating as a special case the last item of any list which was not actually a sublist of any other list, a so-called head was introduced, which used the list in question as a sublist. Thus, the list

(a, (b, c), d, e)

where (b, c) acts as a sublist, would have the threaded representation shown in Figure 1-2. Although this scheme permits rapid sequencing through levels of sublists without having to "remember" when a sublist has been encountered, it suffers the disadvantage that sublists may not be shared. The fact is that circular lists, doubly linked lists, and threaded lists are all designed to facilitate searching or sequencing through a list and at the same time to maintain the ease of inserting and deleting items. Although additional pointers can increase somewhat the complexity of the procedures used for modification and do require additional storage above the corresponding simpler linked list, these disadvantages are often more than offset by increased ease of searching. As we have already indicated, one of the main factors resulting

[Figure 1-2. A threaded list representation of (a, (b, c), d, e).]

in the exposure of programmers to list-processing techniques was the development and availability by the mid-1960's of a number of list processing languages and systems. The first widely used system of this sort was IPL-V [60], a relatively low-level interpretive system descended from Newell, Shaw, and Simon's IPL. Also fashioned after IPL was a system called FLPL [23], a set of FORTRAN subroutines for list manipulation developed by Gelernter. A third system called LISP [54], developed by McCarthy, involved list concepts similar to the first two but was implemented in a somewhat different manner. The most recent system of that period was Weizenbaum's SLIP [79]. As important as these systems and all others of their type were (and still are!), they all suffer from one major flaw, the same flaw which characterized the vector of consecutive locations and which in fact led to their development: inflexibility. Each of these systems provides a fixed type of structure to which the data to be represented must be tailored. Ideally the memory representation should conform to the data and the application at hand. In 1966, however, Knowlton introduced a general, relatively low-level, list processing system called L6 [41, 42], the features of which were quickly incorporated into a number of other systems. L6 allows the programmer to choose blocks of sizes to his liking, to define fields

within these blocks as he sees fit, and in general to manipulate his structures as he desires. While not a panacea, L6 is certainly a step in the right direction. Dissatisfaction with certain aspects of linked memory schemes (in addition to dissatisfaction with the sequential schemes) when applied to certain types of problems caused some individuals to seek still further alternatives. In particular, Feldman [20] and Rovner [67] were concerned with problems in the area of artificial intelligence for which an associative memory, in which reference to information stored in the memory is made by specifying the contents of a part of a cell (word) instead of an address, would be decidedly advantageous. Unfortunately, the cost of a hardware memory of this type and of the size required for such problems was prohibitive. Therefore, in 1965 Feldman proposed his version of a software-simulated associative memory for a computer with conventional memory and introduced the concept of hash-coding as a form of address determination. The basic idea behind hash-code addressing is that, given the contents of some part of a memory cell (where a cell may consist of more than one word), the address of that cell can be determined by applying some transformation to the contents given. In Feldman's scheme each cell contains three principal fields which we will designate F1, F2, and F3, the contents of which are

presumably related to one another in some fashion. A triple consisting of a value for each of the fields F1, F2, and F3 is assigned an address in the memory by "hash-coding" the values for fields F1 and F2. In Feldman's case the hash-coding is implemented by shifting the value for field F1 left a number of places, performing a partial add between this result and the value for field F2, and insuring that the result is even. Retrieving the value of field F3 given the values of fields F1 and F2 then simply involves hashing the values of F1 and F2 to determine the address of the cell containing the value sought. Obviously, there are a number of problems which can arise in using such a scheme. In the first place, there may be more than one value for F3 associated with a given pair of values for F1 and F2. This situation is called multiplicity. Secondly, it is possible that two distinct pairs of values for F1 and F2 may hash to the same address. This situation is called overlap. Finally, the cell accessed by hashing a particular pair of values for F1 and F2 may have been used as a free-storage cell to handle the overlap or multiplicity of another cell. This situation is called conflict and requires moving the present contents of that cell, a very costly operation. Feldman handles the overlap and multiplicity problems - and provides for such operations as determining the values of F1 and F2 given the value of F3 - by including certain link fields in each cell.
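The mechanics just described can be imitated in a few lines. In the sketch below the table size, the particular shift-and-combine hash, and the use of per-address chains to absorb overlap and multiplicity are our own simplifications; Feldman's actual scheme worked at the level of machine words and handled collisions with link fields inside the cells rather than with lists.

```python
# Toy version of hash-code addressing for (F1, F2, F3) triples.
# Hash constants, table size, and collision handling are hypothetical.

TABLE_SIZE = 64
memory = [[] for _ in range(TABLE_SIZE)]  # each address chains its cells

def hash_address(f1, f2):
    """Mimic 'shift F1 left, partial-add F2, force the result even'."""
    h = ((hash(f1) << 3) ^ hash(f2)) % TABLE_SIZE
    return h & ~1  # clear the low bit so the address is even

def store(f1, f2, f3):
    memory[hash_address(f1, f2)].append((f1, f2, f3))

def fetch(f1, f2):
    """All F3 values for (f1, f2). More than one result is 'multiplicity';
    skipping other keys chained at the same address handles 'overlap'."""
    return [t[2] for t in memory[hash_address(f1, f2)] if t[:2] == (f1, f2)]

store("APPLE", "COLOR", "RED")
store("APPLE", "COLOR", "GREEN")  # multiplicity: two F3 values for one pair
store("APPLE", "WEIGHT", "150G")
```

A call such as fetch("APPLE", "COLOR") hashes the pair and scans only the cells chained at that one address, which is the essential economy of the scheme; the "conflict" case, in which an address has already been consumed as free storage for another chain, does not arise in this simplified rendering.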

About a year after Feldman introduced his version of an associative processor, Rovner proposed certain extensions to the scheme to make it feasible for such a system to be used in the paged environment of a virtual memory machine. The basic concepts, however, remained unchanged. Since that time the number of memory representations has grown very rapidly. The principles involved in each of these "new" representations are, however, basically no different from those we have presented so far. In a recently published text, Knuth [43] presents the basic principles of memory representations in a very readable form. In this reference the various fundamental memory representations are divorced from all languages and systems in which they have been previously imbedded and are considered on their own merits. For instance, rather than discussing how a given situation might be handled using SLIP, Knuth considers how doubly linked lists (basically the structures generated by SLIP) might be utilized. In this section we have attempted to present some of the highlights in the evolution of computer memory data representations. The reader should not expect this to be an exhaustive consideration of all existing organizational schemes, for it surely is not. Neither should the reader expect this to be a tutorial on the basic principles of computer memory

representations. For such a discussion the reader is referred to the text by Knuth. This section is intended primarily to give the reader a flavor for the problems and considerations involved in the design of data representations for the computer memory.

1.2 Definitions and Philosophy

Up to this point our discussions, out of necessity, have been of a rather intuitive and subjective nature. We have used such imprecise terms as "computer memory representation" and "data organization scheme" in an attempt to describe the problem with which we are concerned. In this section we intend to define a number of terms so that our future discussions may assume a less ambiguous posture. We should note at the outset that the individuals (and the corresponding literature) concerned with computer memory data organizations may generally be grouped into two rather ill-defined and overlapping classes. On the one hand are those individuals who are primarily concerned with the representation of data within the main store (e.g., core memory) of a computer, and on the other hand are those who are primarily concerned with the organization of data within the secondary store (e.g., drum and disc) of a computer or within a hierarchy of memory devices. The former might be called "data structure* enthusiasts" and the latter, "file management enthusiasts".

* The term "data structure" is used here in a very broad, generic sense which differs from our later definition of the term.

Fundamentally, the problems which face these two groups are identical. There are differences, however, in the respective environments in which their solutions to these problems must operate. The inherently sequential nature of accessing information on a disc may dictate a different organizational scheme than the random access nature of core memory. We do not wish to dwell upon these differences (for the similarities may more than outweigh them) but merely wish to indicate that there are frequently (but not always) differences in the terminology used by the two groups. For example, whereas the data structure people are generally concerned with the interrelationships among data items or data elements, the file management people are generally concerned with the interrelationships among files and records. For the purposes of this exposition we will reside in the camp of the data structure people. Let us now proceed with the definition of some terms. In solving a particular problem on a digital computer, the program which implements the solution process operates upon some collection of objects which are used as the basis for decision or calculation. Each of these objects represents an occurrence, or an instance, of some physical or conceptual quantity and is characterized by an ordered pair consisting of a data name and a data value. The data name indicates the

quantity itself, such as planet, river, city, etc., and the data value indicates the particular instance of this quantity. The data name/data value ordered pair (or, equivalently, the object characterized by the ordered pair) will be called a data item. A data item is denoted by its corresponding data name, and an instance of a data item is a specific data name/data value pair. For example, the ordered pair (CITY, ANN ARBOR) is an instance of the data item CITY. The definitions of data name, data value, and data item which we have just presented are essentially the same as the definitions used by McCuskey [55], who points out that in common high-level programming language usage the data value corresponds to the "data" which is stored in the computer memory while the data name corresponds to "data about data" which appears in the source program and enters a symbol table during compilation. The data name and the data value are represented in the computer memory by sequences of elementary objects called symbols which are members of some alphabet, or set of all symbols, and which are known to the program. One possible alphabet is the EBCDIC character set. The ordered pair of symbol sequences which represents a data name/data value pair is called a data item description, or simply a description.

It should be clear that a given data item may have any of a number of descriptions depending upon the alphabet chosen. Let us now examine the actual steps involved in solving a problem on a digital computer. First, of course, we must define the problem to be solved. This involves specifying such things as the information (i.e., the data items) which the solution process is to be given initially and upon which it is to operate, the algorithms (perhaps in the form of flow charts) which are pertinent to the solution process, and finally the results which the solution process is expected to determine. These specifications must be complete and concise and must reflect the requirements of any solution of the problem, manual or automatic. Second, we must specify the logical ordering or structure of the various data items (that is, the logical relationships of accessibility among the various data items) and the logical access processes which may be used to find any data item in the structure. The result of this step is a specification of the data structure of the data items involved in the solution of the problem. Third, given the specification of the data structure for the problem, we must determine a suitable physical organization for the data items within the computer memory. The resulting representation is called the storage structure of the data.

Finally, we must actually generate the code for the program required to implement the solution process and the specified storage structure. For the purpose of this exposition we will concern ourselves primarily with the second and third of these steps. We assume that we are given the problem definition as an initial starting point and that we produce the specifications used by the system implementor. The reader should note the distinction we have made between the terms "data structure" and "storage structure". This distinction was first made by D'Imperio [17] in 1964 and later by Mealy [57] in 1967. The reader is cautioned, however, that not all authors use these terms in the same way. In fact, it is frequently the case that the term data structure is used to encompass the spectrum of both data structure and storage structure as we have defined them. In the context of our definitions, data structures arise from the interpretations we give to certain aspects of a problem and its solution, and they are not necessarily invariant, inherent, or necessary characteristics of the data items themselves. To contrast the concepts of data structure and storage structure once again we might say that data structures are simply theories of the structure of the real world, and storage structures are computer representations of these theories.

It is patently clear that even for the simplest of data structures there are a multitude of storage structures which may be used to represent them. Consider for instance a collection of n data items d_i, where i ∈ {1, 2, ..., n}, for which the corresponding data structure is linear and sequential in nature, such that data item d_1 is the first in the sequence, data item d_n is the last in the sequence, and data item d_i is preceded by data item d_{i-1} and followed by data item d_{i+1}. Some of the more obvious choices of storage structures which may be used to represent this data structure include 1) a vector of consecutive memory locations, 2) a linked linear list, and 3) a doubly linked list. Herein, of course, lies our problem: determine the storage structure which best represents a given data structure. Before considering our approach to the problem, however, let us make some general observations concerning storage structures. All storage structures may be classified according to one or more of three basic types of organization: sequential, list, and random. These types, to which we alluded in the previous section, are described by Dodd* [19], so our consideration of them will be relatively brief. In sequential organization data items are stored in the computer memory in locations relative to other data items according to a

* Note that Dodd's treatment of the subject is from the viewpoint of file management and his definition of data structure corresponds to our definition of storage structure.

specified sequence. Most commonly, the data items are stored in consecutive locations of the memory. We could, however, use any other well defined sequence. For instance, the data items could be stored in locations whose addresses are given by increasing relatively prime numbers (although the utility of such a scheme might be questionable). The basic concept of list organization is that pointers are used to separate the logical ordering of the memory locations containing data items from the physical ordering of these locations. In general, a pointer may be anything which allows the accessing mechanism to locate the memory cell containing a given data item, but almost without exception a pointer is considered to be a value representing the address of the cell containing the data item. Finally, in random organization data items are stored and retrieved on the basis of some predictable relationship between the data item and the address of its assigned memory location. We may distinguish three basic types within random organization: direct address, dictionary look-up, and calculation. For the direct address method of random organization an arbitrary absolute address is assigned to a data item by the programmer and this address is used every time the data item is to be accessed. The dictionary look-up method involves maintaining a dictionary or symbol table containing pairs of keys and addresses for the data items of interest. To determine the location of a given data item, the

dictionary is searched for the key (perhaps the data name) associated with the data item; the address associated with this key indicates the location of the desired data item. Lastly, the calculation method converts the key of a given data item into an address (not necessarily unique) by performing a standard calculation or transformation upon it. This method includes the hash-coding scheme we discussed in the previous section. We wish to reiterate the fact that most storage structure schemes are not purely sequential, list, or random in nature, but utilize the properties of all of these schemes to lesser or greater extents. In later discourse we will desire to make frequent reference to a particular class of sequentially organized storage structures, namely, those storage structures for which the data items are stored in consecutive cells of memory. To facilitate our discussions, let us define such storage structures to be stacks*. To be more specific, let a stack be defined to consist of a vector, or block, of consecutive memory locations which contains a number of data items ordered in the same manner as the memory locations which contain them. Let us now conclude our discussion of the philosophy of data representation by summarizing its two most important points. First, the concepts of data structure and storage structure are distinct.

* Note that this definition of the term stack is not necessarily a standard definition and differs, in particular, from the definition used by Knuth [43].
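To make the three basic organizations concrete, the following Python sketch (illustrative only; the data values, cell count, and address transformation are invented, not taken from the text) mimics a "stack" of consecutive cells, a list organization whose pointers divorce logical from physical order, and a calculated (hash-coded) placement:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

items = [("CITY", "ANN ARBOR"), ("CITY", "DETROIT"), ("CITY", "FLINT")]

# Sequential organization: a "stack" in the sense defined above, i.e. a
# block of consecutive cells ordered like the data items they contain.
stack = list(items)

# List organization: each cell carries a pointer to its logical successor,
# so the logical ordering is independent of physical placement.
@dataclass
class Cell:
    item: Tuple[str, str]
    next: Optional["Cell"] = None

head = None
for item in reversed(items):
    head = Cell(item, head)

# Random organization, calculation method: a transformation of the key
# yields an address; distinct keys may collide, so a real scheme needs a
# collision rule as well.
memory = [None] * 8

def calculated_address(key: str) -> int:
    return sum(key.encode()) % len(memory)

memory[calculated_address("ANN ARBOR")] = ("CITY", "ANN ARBOR")

# All three hold the same data items; they differ only in how the
# accessing mechanism locates a given item.
logical = []
cell = head
while cell:
    logical.append(cell.item)
    cell = cell.next
assert logical == stack
```

A doubly linked list for the linear structure of the earlier example would simply add a second pointer field (a predecessor link) to Cell.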

Data structure refers to the logical ordering or structure of data as we interpret it for the solution of a given problem. Storage structure refers to the physical representation of data structure within a computer memory. Second, a given data structure can invariably be represented by a number of distinct storage structures.

1.3 Research Objectives

It is evident that determination of a storage structure to represent the data structure associated with the solution of a given problem is a process involving a large number of tradeoffs. By now the qualitative implications of these tradeoffs are relatively well understood. What is needed, however, are rigorous, objective techniques for evaluation of the quantitative aspects of the tradeoffs. The result of developing such techniques should be more intelligent systems design, leading in turn to programs operating with higher productivity at a lower cost. Our approach to development of these techniques is threefold. First, we will develop a rigorous framework in the form of a relational and set-theoretic model for the description of data structure. The relational view of data, which has been advocated by others (in particular, Childs [11], Codd [12], McCuskey [55], and Mealy [57]), appears to be superior in several respects to a graph or network model since it does not superimpose any structure upon the data which might fall within the realm of storage structure.

Second, we will develop a decision model for specifying the storage structures capable of representing any data structure as given by our data structure model. Finally, we will develop certain measures of performance which enable us to compare the time and storage characteristics of the storage structures described by our storage structure model. We will also present a procedure for examining the measures of performance for each of the set of storage structures which can feasibly represent a given data structure and for determining that storage structure for which these measures of performance best satisfy certain optimality conditions.

Chapter II

A MATHEMATICAL MODEL OF DATA STRUCTURE

In order to address ourselves to the problem of choosing an optimal storage structure for a particular collection of data, we need a rigorous framework within which we can discuss the structure of data. The subject of this chapter then is the specification of a mathematical model, or abstraction, of data structure.

2.1 Relations

Since, as we have indicated, our model of data structure will be a relational model, we devote this section to defining the concept of a relation. A propositional function defined on the Cartesian product A × B of two sets A and B is an expression denoted by P(x,y) which has the property that P(a,b), where a and b are substituted for the variables x and y in P(x,y), is true or false for any ordered pair (a,b) ∈ A × B. For example, if A is the set of all composers and B is the set of all musical compositions, then P(x,y) = "x composed y" is a propositional function on A × B. In particular, P(Berlioz, Symphonie Fantastique) = "Berlioz composed Symphonie Fantastique" and P(Bach, 1812 Overture) = "Bach composed 1812 Overture" are true and false, respectively. The expression P(x,y) by itself is called an open sentence in two variables or, simply, an open sentence.

A relation r may be defined to consist of the following:

(1) a set A
(2) a set B
(3) an open sentence P(x,y) in which P(a,b) is either true or false for any ordered pair (a,b) ∈ A × B.

Thus, r is called a relation from A to B, which we will denote by

r = (A, B, P(x,y))

Furthermore, if P(a,b) is true, we will denote this fact by

a r b

and if P(a,b) is not true, we will denote this fact by

a r̸ b

Let r = (A, B, P(x,y)) be a relation. We define the solution set R of the relation r to consist of the elements (a,b) in A × B for which P(a,b) is true. That is,

R = {(a,b) | a ∈ A, b ∈ B, P(a,b) is true}

Notice that R, the solution set of relation r from A to B, is a subset of A × B. Let R be any subset of A × B. Then we can define a relation r = (A, B, P(x,y)) where P(x,y) = "The ordered pair (x,y) belongs to R". The solution set of this relation r is the original set R.

Thus, to every relation r = (A, B, P(x,y)) there corresponds a unique solution set R which is a subset of A × B, and to every subset R of A × B there corresponds a relation r = (A, B, P(x,y)) for which R is the solution set. Since this one-to-one correspondence exists between relations r = (A, B, P(x,y)) and subsets R of A × B, we can redefine a relation as follows: A relation r from A to B is a subset of A × B. Although this definition may appear somewhat artificial, it has the advantage that the undefined concepts of "open sentence" and "variable" are not used.
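The correspondence between a relation and its solution set can be illustrated with a short Python sketch; the sets below extend the composer example, and the representation of a relation as a set of pairs is an illustration of the redefinition just given, not anything prescribed by the text:

```python
# A relation r from A to B reduced to its solution set R, a subset of A x B.
A = {"Berlioz", "Bach", "Tchaikovsky"}
B = {"Symphonie Fantastique", "1812 Overture"}

# Solution set of the open sentence "x composed y".
R = {("Berlioz", "Symphonie Fantastique"),
     ("Tchaikovsky", "1812 Overture")}

def related(a: str, b: str) -> bool:
    """a r b holds exactly when (a, b) belongs to the solution set R."""
    return (a, b) in R

assert related("Berlioz", "Symphonie Fantastique")   # a r b
assert not related("Bach", "1812 Overture")          # a r-slash b
assert R <= {(a, b) for a in A for b in B}           # R is a subset of A x B
```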

2.2 A Relational Model of Data Structure

In this section we will present a rigorous, mathematical model for data structure. In the course of doing so we will define a rather large number of sets. Since our motive for defining a particular set may not present itself until other sets have been defined, we caution the reader of this possibility ahead of time. We raise one further word of caution: several distinct sets may be designated by the same symbol (a capital Greek letter) with a superscript used to differentiate among them. In general, sets designated by a common symbol and distinguished in this manner will share some common (subjective) properties. Thus, no special significance other than its role as a device for differentiation should be attached to the value of a superscript. Let us now proceed with the development of our model for data structure. The intrinsic structure, or the data structure, of any collection of n data items may be described in the following manner. Let the set Λ⁰ consist of the n data items in question:

Λ⁰ = {d_i | i = 1, 2, ..., n}

where d_i is the i-th data item in the collection.

Let the set Ρ (capital rho) consist of all relations of interest in Λ⁰ (i.e., from Λ⁰ to Λ⁰):

Ρ = {r_j | j = 1, 2, ..., k⁰_r}

where r_j is the j-th relation and k⁰_r is the number of different relations of interest. When we refer to the "relations of interest", we mean that for the purpose of solving a particular problem involving the data items of Λ⁰ we may assume that the elements of Λ⁰ are related to one another only via the relations in Ρ (whereas these data items may actually be related to one another via other relations as well). The set Λ⁰ and the relations of Ρ then define the intrinsic structure of the given collection of n data items. This information in itself is of little more than academic interest, but we may expand upon it somewhat in order to gain the insight required of our model. Let R_j be the solution set of relation r_j ∈ Ρ. R_j then consists of a set of ordered pairs (d_i, d_k) in Λ⁰ × Λ⁰. Let us define Λ_j to be a subset of Λ⁰ such that every element of Λ_j is the first element of at least one ordered pair (d_i, d_k) in R_j:

Λ_j = {d_i | d_i ∈ Λ⁰, (d_i, d_k) ∈ R_j for some d_k ∈ Λ⁰}

Further, let us define Π_j to be a subset of Λ⁰ such that every element of Π_j is the second element of at least one ordered pair (d_i, d_k) in R_j:

Π_j = {d_k | d_k ∈ Λ⁰, (d_i, d_k) ∈ R_j for some d_i ∈ Λ⁰}

Let

Λ = ⋃_{j=1}^{k⁰_r} Λ_j and Π = ⋃_{j=1}^{k⁰_r} Π_j

Clearly, Λ and Π are subsets of Λ⁰. In fact either Λ or Π or both may be identically equal to Λ⁰, but this is not necessarily the case (which is, of course, our reason for defining Λ and Π). The set Ρ may now be defined to consist of relations from Λ to Π. The elements of Ρ are exactly the same as before. As a practical matter we will require that Λ⁰ = Λ ∪ Π. That is, we will require every element of Λ⁰ to be related by at least one relation in Ρ to at least one other element in Λ⁰ (not excluding the possibility of an element being related to itself). Since we are concerned with the interrelationships among various data items in our consideration of data structure, we find a data item which is related to no other to be

singularly uninteresting. (There are, of course, some interesting problems associated with the representation of elementary items of data within a computer, such as the representation of rational numbers, but these problems do not concern us here.) Let the cardinality of set Λ (which we denote by |Λ|) be k⁰_a and let the cardinality of set Π be k⁰_p:

|Λ| = k⁰_a
|Π| = k⁰_p

As a matter of notational convenience, let the k-th element of Π be denoted by p_k, where k ∈ {1, 2, ..., k⁰_p}, to distinguish it from the k-th element of Λ, which we will continue to denote by d_k, where k ∈ {1, 2, ..., k⁰_a}. We then indicate the fact that some data item d_i in Λ is related to some data item p_k in Π via some relation r_j in Ρ by the notation

d_i r_j p_k

For specific values of i, j, and k we call d_i r_j p_k a relation instance. In particular, d_i r_j p_k is a relation instance if (d_i, p_k) ∈ R_j. d_i is called the source of the relation instance, p_k is called the target of the relation instance, and r_j is called the relation symbol of the relation instance. Note that we may use r_j to denote both a relation and a relation symbol. Context should make clear, however, the sense in which r_j is being used.
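Under the definitions above, the source and target sets Λ_j, Π_j and their unions Λ and Π can be computed directly from the solution sets R_j; the following sketch uses invented solution sets purely for illustration:

```python
# Solution sets R_j over a collection of data items {d1, d2, d3} (illustrative).
solution_sets = {
    "r1": {("d1", "d2"), ("d1", "d3")},
    "r2": {("d2", "d3"), ("d3", "d3")},
}

def sources(R):
    # The set of first elements of the ordered pairs in R_j
    return {di for (di, dk) in R}

def targets(R):
    # The set of second elements of the ordered pairs in R_j
    return {dk for (di, dk) in R}

Lambda_ = set().union(*(sources(R) for R in solution_sets.values()))
Pi = set().union(*(targets(R) for R in solution_sets.values()))

assert Lambda_ == {"d1", "d2", "d3"}
assert Pi == {"d2", "d3"}
# The practical requirement that the unions cover the whole collection
# holds here, and ("d3", "d3") shows an element related to itself.
assert Lambda_ | Pi == {"d1", "d2", "d3"}
```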

We now define a set Ω to contain all the relation instances implied by the relations in Ρ:

Ω = {d_i r_j p_k | (d_i, p_k) ∈ R_j, j = 1, 2, ..., k⁰_r}

Corresponding to each data item d_i ∈ Λ, where i ∈ {1, 2, ..., k⁰_a}, there exists some (nonempty) set Ρ_i of relations such that (1) Ρ_i ⊆ Ρ, and (2) for each relation r_j ∈ Ρ_i there exists at least one data item p_k ∈ Π, where k ∈ {1, 2, ..., k⁰_p}, for which d_i r_j p_k:

Ρ_i = {r_j | r_j ∈ Ρ, d_i ∈ Λ, (d_i, p_k) ∈ R_j for some p_k ∈ Π}

In other words, Ρ_i consists of all relations in Ρ whose solution sets contain at least one ordered pair the first element of which is d_i. Corresponding to every source/relation symbol pair (d_i r_j), where d_i ∈ Λ and r_j ∈ Ρ_i ⊆ Ρ, there exists some set Π²_ℓ ⊆ Π of targets, where ℓ ∈ {1, 2, ..., n₂}, such that for every target p_k ∈ Π²_ℓ,

(d_i, p_k) ∈ R_j

That is, Π²_ℓ, which is called the ℓ-th target set, consists of all data items p_k ∈ Π which satisfy d_i r_j p_k. To present a more formal definition, we define the target set Π²_ℓ for ℓ ∈ {1, 2, ..., n₂} such that

(1) Π²_ℓ ⊆ Π
(2) Π²_ℓ ≠ ∅
(3) Π²_ℓ = {p_k | d_i r_j p_k ∈ Ω for fixed d_i and r_j}
(4) Π²_ℓ₁ ≠ Π²_ℓ₂ if ℓ₁ ≠ ℓ₂

Note that this definition does not preclude the possibility of two or more distinct source/relation symbol pairs having the same target set. Let the set Σ_ℓ, where ℓ ∈ {1, 2, ..., n₂}, consist of all source/relation symbol pairs (d_i r_j), where d_i ∈ Λ and r_j ∈ Ρ_i, such that (d_i, p_k) ∈ R_j for every p_k ∈ Π²_ℓ:

Σ_ℓ = {(d_i r_j) | d_i r_j p_k ∈ Ω for all p_k ∈ Π²_ℓ}

That is, Σ_ℓ consists of all source/relation symbol pairs which have Π²_ℓ as target set. We note that there is a one-to-one correspondence between the elements of the set {Σ_ℓ | ℓ = 1, 2, ..., n₂} and the elements of the set {Π²_ℓ | ℓ = 1, 2, ..., n₂}. For convenience we have assumed that Σ_ℓ corresponds to Π²_ℓ for all ℓ ∈ {1, 2, ..., n₂}.

Let the set Π be partitioned into a number n₁ ≤ k⁰_p of disjoint subsets Π¹_m, where m = 1, 2, ..., n₁, such that

(1) Π¹_m₁ ∩ Π¹_m₂ = ∅ for m₁ ≠ m₂
(2) ⋃_{m=1}^{n₁} Π¹_m = Π

(3) p_k₁, p_k₂ ∈ Π¹_m ⟺ (p_k₁ ∈ Π²_ℓ ⟺ p_k₂ ∈ Π²_ℓ) for all ℓ ∈ {1, 2, ..., n₂}

The main thrust of this definition, as contained in condition 3, is the following. If targets p_k₁ and p_k₂ are both elements of the set Π¹_m, then for every set Π²_ℓ of which p_k₁ is an element, p_k₂ is also an element, and for every set Π²_ℓ of which p_k₂ is an element, p_k₁ is also an element. Conversely, if for every set Π²_ℓ of which p_k₁ is an element p_k₂ is also an element, and if for every set Π²_ℓ of which p_k₂ is an element p_k₁ is also an element, then p_k₁ and p_k₂ are both elements of the same set Π¹_m. We note that for each set Π²_ℓ, where ℓ ∈ {1, 2, ..., n₂}, there exists a set M_ℓ of indices such that

M_ℓ ⊆ {1, 2, ..., n₁}
Π²_ℓ = ⋃_{m ∈ M_ℓ} Π¹_m

On an intuitive basis we can view the sets Π¹_m as the largest subsets of Π which permit the expression of each Π²_ℓ as a union of some of them. Obviously, we can partition Π into k⁰_p subsets, each of which contains exactly one data item.
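The partition of Π just defined can be computed by grouping targets according to the collection of target sets in which each appears; the following sketch uses invented target sets purely for illustration:

```python
# Illustrative target sets over the targets {p1, p2, p3, p4}.
target_sets = [frozenset({"p1", "p2"}),
               frozenset({"p1", "p2", "p3"}),
               frozenset({"p4"})]

Pi = set().union(*target_sets)

# Two targets belong to the same block of the partition exactly when
# they appear in the same collection of target sets (condition 3).
blocks = {}
for pk in Pi:
    membership = frozenset(ts for ts in target_sets if pk in ts)
    blocks.setdefault(membership, set()).add(pk)

partition = {frozenset(b) for b in blocks.values()}
assert partition == {frozenset({"p1", "p2"}),
                     frozenset({"p3"}),
                     frozenset({"p4"})}

# Each target set is then recoverable as a union of blocks (the
# index sets M_l of the text).
for ts in target_sets:
    assert ts == set().union(*(b for b in partition if b <= ts))
```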

Suppose, however, that for every target set Π²_ℓ in which data item p_k₁ ∈ Π appears, data item p_k₂ ∈ Π also appears, and conversely. Then p_k₁ and p_k₂ can be combined into a single set, thereby reducing the number of subsets Π¹_m to k⁰_p − 1. Continuing in this manner we eventually reach the point at which no two targets in Π which appear together in every target set Π²_ℓ in which one or the other appears fail to be elements of the same subset of the current partition. At this point we cannot further reduce the number of subsets in the partition, and these subsets define the various Π¹_m.

Let the set Π³_m, where m ∈ {1, 2, ..., n₁}, consist of all those target sets Π²_ℓ of which Π¹_m is a subset:

Π³_m = {Π²_ℓ | Π¹_m ⊆ Π²_ℓ}

Π³_m simply indicates which of the target sets Π²_ℓ "use" the subset Π¹_m.

Given some data item d_i ∈ Λ, for every r_j ∈ Ρ_i there exists some corresponding target set Π²_ℓ (such that (d_i, p_k) ∈ R_j for each p_k ∈ Π²_ℓ). Let the set Γ consist of all distinct relation symbol/target set pairs (r_j Π²_ℓ) which are associated with the sources in Λ, such that

(1) Γ = {(r_j Π²_ℓ) | d_i r_j Π²_ℓ ∈ Ω for some d_i ∈ Λ}
(2) for each (r_j₁ Π²_ℓ₁), (r_j₂ Π²_ℓ₂) ∈ Γ there exist d_i₁, d_i₂ ∈ Λ such that d_i₁ r_j₁ Π²_ℓ₁, d_i₂ r_j₂ Π²_ℓ₂ ∈ Ω

where d_i r_j Π²_ℓ represents all relation instances d_i r_j p_k such that p_k ∈ Π²_ℓ. Clearly, |Γ| ≤ k⁰_a k⁰_r. It may be the case, of course, that the same relation symbol/target set pair is associated with more than one source. Let Λ²_q be the set of all sources in Λ with which the q-th element (r_j Π²_ℓ) of Γ is associated, where q ∈ {1, 2, ..., |Γ|}. Let us define a one-to-one function α which assigns to each element of Γ a unique element of the set {1, 2, ..., |Γ|}:

α: Γ → {1, 2, ..., |Γ|}

(Clearly, α maps Γ onto the set {1, 2, ..., |Γ|}.) We may use the function α to assign an index value q to each element (r_j Π²_ℓ) of Γ:

q = α(r_j Π²_ℓ)

Similarly, we may use the inverse α⁻¹ of the function α to determine the element (r_j Π²_ℓ) of Γ corresponding to an index value q:

(r_j Π²_ℓ) = α⁻¹(q)

Then the set Λ²_q may be defined more rigorously as follows:

Λ²_q = {d_i | d_i ∈ Λ, d_i r_j Π²_ℓ ∈ Ω, (r_j Π²_ℓ) = α⁻¹(q)}

where q ∈ {1, 2, ..., |Γ|}.
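The one-to-one indexing function and the associated source sets can be sketched as follows; the relation instances are invented for illustration, and any fixed enumeration serves as the indexing:

```python
# Relation instances as (source, relation symbol, target) triples.
Omega = [("d1", "r1", "p1"), ("d1", "r1", "p2"),
         ("d2", "r1", "p1"), ("d2", "r1", "p2"),
         ("d3", "r2", "p1")]

# Target set of each source/relation-symbol pair.
targets_of = {}
for di, rj, pk in Omega:
    targets_of.setdefault((di, rj), set()).add(pk)

# The distinct relation symbol / target set pairs (the set Gamma).
Gamma = {(rj, frozenset(ts)) for (di, rj), ts in targets_of.items()}

# A one-to-one indexing of Gamma and its inverse.
alpha = {pair: q for q, pair in enumerate(sorted(Gamma, key=repr), start=1)}
alpha_inv = {q: pair for pair, q in alpha.items()}

def source_set(q):
    """All sources associated with the q-th element of Gamma."""
    rj, ts = alpha_inv[q]
    return frozenset(di for (di, r), t in targets_of.items()
                     if r == rj and frozenset(t) == ts)

# d1 and d2 share the pair (r1, {p1, p2}); d3 alone carries (r2, {p1}).
assert {source_set(q) for q in alpha_inv} == {frozenset({"d1", "d2"}),
                                              frozenset({"d3"})}
```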

Let

Λ̄² = {Λ²_q | q = 1, 2, ..., |Γ|}

There may exist sets Λ²_q₁ and Λ²_q₂ ∈ Λ̄² such that Λ²_q₁ = Λ²_q₂ for q₁ ≠ q₂. Therefore, let us partition the set Λ̄² into a number n₄ ≤ |Γ| of subsets, each of which consists of a collection of identically equal elements. Let us then choose one element from each of the subsets to form a new set Λ̄²* of distinct elements. In particular, let us define the set Λ̄²* as follows:

(1) Λ̄²* = {Λ²_m | Λ²_m ∈ Λ̄²}
(2) Λ²_m₁ ≠ Λ²_m₂ for Λ²_m₁, Λ²_m₂ ∈ Λ̄²* and m₁ ≠ m₂

We note that |Λ̄²*| = n₄. For each Λ²_m ∈ Λ̄²* we may then define a set Γ'_m ⊆ Γ which consists of all relation symbol/target set pairs associated with all Λ²_q ∈ Λ̄² for which Λ²_q = Λ²_m. Specifically, we define the set Γ'_m for m ∈ {1, 2, ..., n₄} such that

(1) Γ'_m ⊆ Γ
(2) Γ'_m = {(r_j Π²_ℓ) | (r_j Π²_ℓ) ∈ Γ, d_i r_j Π²_ℓ ∈ Ω for all d_i ∈ Λ²_m}, Λ²_m ∈ Λ̄²*
(3) Γ'_m₁ ∩ Γ'_m₂ = ∅ if m₁ ≠ m₂
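Merging equal source sets and collecting the relation symbol/target set pairs associated with each amounts to inverting the map from pairs to source sets; a sketch with invented values:

```python
# Each relation symbol / target set pair with its associated source set
# (values are illustrative only).
sources_of_pair = {("r1", frozenset({"p1", "p2"})): frozenset({"d1", "d2"}),
                   ("r2", frozenset({"p3"})):       frozenset({"d1", "d2"}),
                   ("r1", frozenset({"p3"})):       frozenset({"d3"})}

# Group the pairs by source set: the keys are the distinct source sets
# and each value set collects the pairs sharing that source set.
pairs_of_sources = {}
for pair, source_set in sources_of_pair.items():
    pairs_of_sources.setdefault(source_set, set()).add(pair)

distinct_source_sets = set(pairs_of_sources)
assert len(distinct_source_sets) == 2            # n4 = 2 in this example
assert pairs_of_sources[frozenset({"d1", "d2"})] == {
    ("r1", frozenset({"p1", "p2"})),
    ("r2", frozenset({"p3"}))}

# The groups are pairwise disjoint, as condition (3) requires.
groups = list(pairs_of_sources.values())
assert all(a.isdisjoint(b) for i, a in enumerate(groups) for b in groups[i + 1:])
```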

Now in the same manner as we did for Π, let us partition the set Λ into a number n₃ ≤ k⁰_a of disjoint subsets Λ¹_ℓ, where ℓ = 1, 2, ..., n₃, such that

(1) Λ¹_ℓ₁ ∩ Λ¹_ℓ₂ = ∅ for ℓ₁ ≠ ℓ₂
(2) ⋃_{ℓ=1}^{n₃} Λ¹_ℓ = Λ
(3) d_i₁, d_i₂ ∈ Λ¹_ℓ ⟺ (d_i₁ ∈ Λ²_m ⟺ d_i₂ ∈ Λ²_m) for all m ∈ {1, 2, ..., n₄} (i.e., for all Λ²_m ∈ Λ̄²*)

It follows that for each set Λ²_m ∈ Λ̄²*, where m ∈ {1, 2, ..., n₄}, there exists a set L_m of indices such that

L_m ⊆ {1, 2, ..., n₃}
Λ²_m = ⋃_{ℓ ∈ L_m} Λ¹_ℓ

Finally, let the set Λ³_ℓ, where ℓ ∈ {1, 2, ..., n₃}, consist of all those sets Λ²_m ∈ Λ̄²* of which Λ¹_ℓ is a subset.

Λ³_ℓ = {Λ²_m | Λ²_m ∈ Λ̄²*, Λ¹_ℓ ⊆ Λ²_m}

Λ³_ℓ simply indicates which of the sets Λ²_m "use" the subset Λ¹_ℓ.

2.2.1 Schematic Representation

Our model for the data structure of an arbitrary collection of elementary items of data now consists of the sets Λ, Π, Ρ_i, Λ¹_ℓ, Λ²_m, Λ³_ℓ, Γ'_m, Σ_ℓ, Π¹_m, Π²_ℓ, and Π³_m, as well as our original sets Λ⁰ and Ρ. In order to clarify the relationships among these sets, we introduce the schematic representation of Figure 2-1. We may view this structure as an undirected graph consisting of certain nodes, or vertices, connected by (undirected) branches, or edges. Each ring of the structure of Figure 2-1 contains some number k + 1 of nodes (where the value of k varies from ring to ring). Of these k + 1 nodes, k act in a "connective" capacity and the remaining node represents some set. In particular, for any ring in the structure, those nodes of the ring which lie on a level designated by an odd number are the connective nodes and that node which lies on a level designated by an even number is the set node. In a general sense, we may interpret the connective nodes of a ring as indicating the elements of the set represented by the set node of the ring. Before proceeding with the interpretation of the structure of

[Figure 2-1. Data Structure Model: a nine-level ring structure relating the sets of the data structure model.]

Figure 2-1, let us point out again the one-to-one correspondence which exists between sets of certain types. For each set Λ¹_ℓ, where ℓ ∈ {1, 2, ..., n₃}, there is a corresponding (unique) set Λ³_ℓ. Similarly, for each set Λ²_m, where m ∈ {1, 2, ..., n₄}, there is a corresponding set Γ'_m; for each set Π²_ℓ, where ℓ ∈ {1, 2, ..., n₂}, there is a corresponding set Σ_ℓ; and for each set Π¹_m, where m ∈ {1, 2, ..., n₁}, there is a corresponding set Π³_m. In the structure of Figure 2-1 each node at level 2 (which is a set node) represents a pair of sets (Λ¹_ℓ, Λ³_ℓ) for ℓ ∈ {1, 2, ..., n₃}. Similarly, each node at level 4 represents a pair of sets (Λ²_m, Γ'_m) for m ∈ {1, 2, ..., n₄}; each node at level 6 represents a pair of sets (Σ_ℓ, Π²_ℓ) for ℓ ∈ {1, 2, ..., n₂}; and each node at level 8 represents a pair of sets (Π³_m, Π¹_m) for m ∈ {1, 2, ..., n₁}. For each of these set pairs the ring above the corresponding node (i.e., the connective nodes in that ring) indicates the composition of the first set in the pair, and the ring below that node indicates the composition of the second set in the pair. Each node at level 1 represents some distinct data item d_i ∈ Λ and each node at level 9 represents some distinct data item p_k ∈ Π, with the result that the set Λ is represented by the entire collection of nodes at level 1, and the set Π is represented by the entire collection of nodes at level 9. For a given set Λ¹_ℓ, which is represented by some node at level 2,

the elements contained therein are represented by those nodes at level 1 which are contained within the ring associated with the given node at level 2. Similarly, for a given set Π¹_m, which is represented by some node at level 8, the elements contained therein are represented by those nodes at level 9 which are contained within the ring associated with the given node at level 8. Next consider some set Λ²_m, which is represented by a node at level 4. The elements of Λ²_m are represented by those nodes at level 1 which are associated with the nodes at level 2 which have nodes at level 3 in common with the ring associated with the node (at level 4) representing Λ²_m. Stated somewhat differently, the sets Λ¹_ℓ which are subsets of Λ²_m (and which collectively contain the elements of Λ²_m) are represented by those nodes at level 2 which share (in their rings that pass through level 3) nodes which are members of the ring that passes through level 3 and is associated with the node which represents Λ²_m. We note that any two rings which have at least one node in common have in fact exactly one node in common. Each set Π²_ℓ, which is represented by some node at level 6, is treated at levels 6, 7, 8, and 9 in a manner analogous to the sets Λ²_m at levels 4, 3, 2, and 1. Next consider some set Λ³_ℓ, which is represented by a node at level 2. The elements of Λ³_ℓ are represented by nodes at level 4

(which represent the sets Λ²_m) having rings passing through nodes at level 3 which are members of the ring associated with the node representing Λ³_ℓ. Again, the analogous situation applies for each set Π³_m at levels 6, 7, and 8. The only difference between the situations described above and that for the sets Γ'_m and Σ_ℓ is that the connective nodes at level 5 represent relation symbols in addition to performing their connectivity functions. For example, given some set Γ'_m, which is represented by a node at level 4, each element of this set is represented by a node at level 5 (representing a relation symbol) and a node at level 6 (representing a target set Π²_ℓ) which has a ring passing through the relation symbol node at level 5. It follows also that each relation r_j ∈ Ρ (or rather the ordered pairs (d_i, p_k) which form the corresponding solution set R_j) is represented by those pairs of nodes at levels 1 and 9 which are associated with the nodes at level 5 representing the relation symbol r_j. (Note that, in general, there may be more than one node at level 5 which represents the same relation symbol, in different relation instances, of course.)

2.2.2 Implications of the Model

Clearly, there is more information conveyed by the collection of sets in our data structure model than we have explicitly discussed in our initial description of the schematic of Figure 2-1.

For example, the set Λ³_ℓ, where ℓ ∈ {1, 2, ..., n₃}, indicates which sets Λ²_m contain the set Λ¹_ℓ as a subset. However, since for every set Λ²_m, where m ∈ {1, 2, ..., n₄}, there exists a corresponding set Γ'_m, the set Λ³_ℓ also indicates all sets Γ'_m which are associated with the set Λ¹_ℓ. Hence, the set Λ³_ℓ indicates (indirectly) all relation symbol/target set pairs which are associated with the sources in the set Λ¹_ℓ. In a similar manner the set Π³_m, where m ∈ {1, 2, ..., n₁}, indicates (indirectly) all source/relation symbol pairs which are associated with the targets of the set Π¹_m. We can, of course, continue with this line of reasoning to determine which targets are associated with a particular source/relation symbol pair, or which sources are associated with a particular relation symbol/target pair, and so forth. In short, given the sets defined for our data structure model, we can determine any fact concerning the interrelationships of the elements of a given collection of elementary data items.

2.2.3 Uniqueness Considerations

In defining the sets of our data structure model, we have implicitly made a very important decision. If we examine the union of all sets Γ'_m associated with a given source d_i (as given by the set Λ³_ℓ associated with the set Λ¹_ℓ of which the given source is an element), we find that in each of the relation symbol/target set pairs the relation symbol is

unique. In fact, there is a one-to-one correspondence between the elements of this union and the elements of P_i. To be more specific, if we let

    Γ_i = {(r_j, Π) | (r_j, Π) ∈ Γ_m, d_i ∈ A_m²}

or alternatively

    Γ_i = ∪_{m ∈ M_i} Γ_m,  where M_i = {m | d_i ∈ A_m²},

then the following conditions will hold:

    (1) (r_j, Π) ∈ Γ_i ⟺ r_j ∈ P_i, for i ∈ {1, 2, ..., k}

    (2) r_j1 ≠ r_j2 for (r_j1, Π_1), (r_j2, Π_2) ∈ Γ_i and (r_j1, Π_1) ≠ (r_j2, Π_2)

On the other hand, if we examine the union of all sets Ω_m associated with a given target p_k, we find that in each of the source set*/relation symbol pairs the relation symbol is not necessarily unique. That is, if we let

    Ω_k = {(A_m², r_j) | (A_m², r_j) ∈ Ω_m, p_k ∈ Π_m},

then it is not necessarily the case that r_j1 ≠ r_j2 for (A_m1², r_j1), (A_m2², r_j2) ∈ Ω_k and (A_m1², r_j1) ≠ (A_m2², r_j2).

* A "source set" is roughly analogous to our notion of a target set. Whereas the sets Π_m are target sets, the sets A_m² may be considered source sets. Strictly speaking, the elements of a set Ω_k are source/relation symbol pairs (d_i, r_j) and not source set/relation symbol pairs (A_m², r_j).
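These type-1 conditions amount to requiring that no relation symbol be repeated among the pairs associated with a single source, a property that can be checked mechanically. A minimal sketch in Python (the function name and the list-of-pairs encoding are illustrative assumptions, not part of the dissertation's model):

```python
def has_type1_uniqueness(gamma_i):
    """gamma_i: the relation symbol/target set pairs associated with a
    single source d_i, given as (relation_symbol, target_set) tuples.
    Type-1 uniqueness holds when no relation symbol appears twice."""
    symbols = [r for r, _ in gamma_i]
    return len(symbols) == len(set(symbols))

# Distinct symbols r1 and r2: type-1 uniqueness holds.
assert has_type1_uniqueness([("r1", frozenset({"p1"})),
                             ("r2", frozenset({"p1", "p2"}))])
# Repeated symbol r1: type-1 uniqueness is violated.
assert not has_type1_uniqueness([("r1", frozenset({"p1"})),
                                 ("r1", frozenset({"p2"}))])
```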

For ease of reference we will say that this set of conditions implies uniqueness of type 1. We may, of course, wish to consider the opposite set of conditions, where the relation symbols in the set of source set/relation symbol pairs associated with a given target p_k are unique, and where the relation symbols in the set of relation symbol/target set pairs associated with a given source d_i are not necessarily unique. We will say that this set of conditions implies uniqueness of type 2. Redefinition of the sets of our data structure model to effect uniqueness of type 2 is straightforward and, therefore, will not be done here. It should be clear that we cannot require both uniqueness of type 1 and uniqueness of type 2 simultaneously except in very special cases. It should also be clear that it is not desirable to require "non-uniqueness" of both types simultaneously, for in this case our sets and, hence, the model cease to be uniquely defined. We assume, therefore, that the "problem solver" must specify which type of uniqueness he assumes when he describes the data for his problem in terms of our data structure model. Since the principles involved are no different whether considering uniqueness of type 1 or of type 2, unless otherwise indicated we will assume uniqueness of type 1 to be in effect for the remainder of the

discourse*. In light of this discussion we can comment further upon the relation symbols represented by the nodes at level 5 of the structure of Figure 2-1. The nodes designated r_j1, r_j2, and r_j3 (and ostensibly representing the relation symbols r_j1, r_j2, and r_j3) must represent distinct relation symbols, since these relation symbols appear in relation symbol/target set pairs which are all elements of the same set Γ_i corresponding to data item d_i1. It is also true that the relation symbols represented by the nodes designated r_j2 and r_j4 must be distinct, but for a different reason. In particular, since both relation symbols are associated with the same target set, equality of these relation symbols would imply a single element of the set Γ and, hence, a single node in the structure. On the other hand, the relation symbol represented by the node designated r_j5 need not be distinct from either of the relation symbols represented by the nodes designated r_j2 and r_j4, in spite of the fact that all three relation symbols appear in source set/relation symbol pairs which are elements of the set Ω_k1 corresponding to data item p_k1.

* Note that the prototype program to which we will refer later in this dissertation has provision for considering both types of uniqueness.

In general, the relation symbol for node r_j5 need not be distinct from those for nodes r_j1 and r_j3 either. Finally, the relation symbol for node r_j3 need not be distinct from that for node r_j4. This completes the development of our model for data structure.

2.3 Use of the Model

Now that we have a model for data structure, we must specify the manner in which it is to be used. Specific details will be considered in succeeding chapters, but we will provide here certain basic aspects of the use of the model. The problem solver (i.e., the individual who is confronted with a problem and charged with obtaining its solution) must specify what (or rather, how many) data items and what (again, how many) relations are involved in the solution of his problem. The particular data items and, to a much greater extent, the particular relations which the problem solver chooses to characterize his problem are influenced very strongly by his choice of access processes. For example, suppose the problem to be solved involves (among others) three classes A, B, and C of data items. Furthermore, suppose there exists a one-to-one correspondence from the elements of each of these classes onto the elements of each of the other two. Finally, suppose that given an element of one of these classes, the corresponding

elements of the other two classes are to be determined (i.e., accessed). Assuming that we are given an element a ∈ A as the initial starting point, we have three choices of procedures for determining the corresponding elements b ∈ B and c ∈ C:

Procedure 1
(1) Given a ∈ A, determine b ∈ B.
(2) Given b ∈ B, determine c ∈ C.

Procedure 2
(1) Given a ∈ A, determine c ∈ C.
(2) Given c ∈ C, determine b ∈ B.

Procedure 3
(1) Given a ∈ A, determine b ∈ B.
(2) Given a ∈ A, determine c ∈ C.

It follows that we have (at least) three choices of relations which we may define. If procedure 1 is selected to access the various data items, we may define a relation r_ab from A to B and a relation r_bc from B to C; if procedure 2 is selected, we may define a relation r_ac from A to C and a relation r_cb from C to B; and finally, if procedure 3 is selected, we may define a relation r_ab from A to B and a relation r_ac from A to C. Although this is a very simple example, the concepts which it embodies carry over to much more complex situations.
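The three procedures can be sketched with the one-to-one correspondences encoded as dictionaries; all names below (`a_to_b`, `procedure_1`, and so on) are illustrative assumptions, and the point is only that each choice of access procedure induces a different pair of relations:

```python
# Hypothetical one-to-one correspondences between classes A, B, and C,
# encoded as dictionaries (one dictionary per relation that might be defined).
a_to_b = {"a1": "b1", "a2": "b2"}   # relation r_ab
b_to_c = {"b1": "c1", "b2": "c2"}   # relation r_bc
a_to_c = {"a1": "c1", "a2": "c2"}   # relation r_ac
c_to_b = {"c1": "b1", "c2": "b2"}   # relation r_cb

def procedure_1(a):
    """Uses relations r_ab and r_bc."""
    b = a_to_b[a]
    return b, b_to_c[b]

def procedure_2(a):
    """Uses relations r_ac and r_cb."""
    c = a_to_c[a]
    return c_to_b[c], c

def procedure_3(a):
    """Uses relations r_ab and r_ac."""
    return a_to_b[a], a_to_c[a]

# All three procedures recover the same corresponding elements (b, c),
# but each requires a different pair of relations to be stored.
assert procedure_1("a1") == procedure_2("a1") == procedure_3("a1") == ("b1", "c1")
```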

By examining his data, the problem solver must also determine the cardinalities of the various sets in our data structure model. Actually, the problem solver need not determine the cardinality of each instance of a set, but need only determine the average cardinality of each type of set. For example, rather than determine the actual cardinality of the set A_ℓ³ for each ℓ ∈ {1, 2, ..., n_3}, the problem solver need only determine the average cardinality of the set A_ℓ³ over all ℓ ∈ {1, 2, ..., n_3}. The reason only the average cardinalities are required should become clear in the next chapter. There is, of course, a very good reason why, from the problem solver's point of view, actual cardinalities are not desirable. Namely, for data structures of even moderate size these cardinalities may be extremely difficult, if not impossible, to determine. In order to do so, one would probably have to construct a structure like that of Figure 2-1 for the entire data structure. On the other hand, as we shall see in Chapter VI, determining average cardinalities for the data structure sets is a relatively painless task. We might point out, however, that there unfortunately exists no universal technique which may be applied to all problems to determine the average cardinalities of the data structure sets. The process which must be used is very much a function of the particular problem and the availability of information to the problem solver. Nevertheless, an

example we consider in Chapter VI should indicate some general guidelines. It may happen that the mathematical variance of the actual cardinalities for a particular type of set is very large, in which case one might justifiably question the use of an average value. This is particularly true if the cardinalities are characterized by a bimodal distribution for which the average value characterizes neither of the peaks. Such a case would normally arise where the set of data items may be partitioned into two or more classes among which there is no interaction. For example, suppose that the set of data items consists of the (disjoint) sets A, B, C, D, and E of data items. Suppose further that the elements of these sets are related by the following relations:

    r_ab from A to B
    r_da from D to A
    r_ce from C to E

Clearly, then, there is no "communication" between any element of the sets A, B, and D, and any element of the sets C and E (although elements within the cluster of sets A, B, and D do communicate from one set to another, as do elements within the cluster of sets C and E). We may then partition the set of data items into two noncommunicating

classes consisting of the elements of sets A, B, and D and the elements of sets C and E, respectively. We may treat this situation as consisting of two (or more, as the case may be) distinct problems for which separate data structures may be defined. In a general sense, we may do this even where the boundaries between the classes are not so well defined. Clearly, the decision to partition a problem into sub-problems lies with the problem solver. For our purposes, we may assume simply that the data structure given is the only one of concern. Once the average cardinalities for the various data structure sets have been specified by the problem solver, we apply the techniques developed in the next three chapters to determine the best storage structure for that data structure.
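Identifying such noncommunicating classes is essentially a connected-components computation over the relation graph. A sketch, assuming the classes of data items and the relations between them are given as labels and pairs of labels (the union-find encoding below is an illustrative choice, not part of the model):

```python
def noncommunicating_classes(items, relation_pairs):
    """Partition `items` into classes such that no relation links two
    different classes (the connected components of the relation graph)."""
    parent = {x: x for x in items}

    def find(x):
        # Follow parent links to the class representative, halving paths.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Each relation merges the classes of its two endpoints.
    for a, b in relation_pairs:
        parent[find(a)] = find(b)

    classes = {}
    for x in items:
        classes.setdefault(find(x), set()).add(x)
    return sorted(classes.values(), key=lambda c: sorted(c))

# The example from the text: relations r_ab, r_da, and r_ce.
parts = noncommunicating_classes(
    ["A", "B", "C", "D", "E"], [("A", "B"), ("D", "A"), ("C", "E")])
assert parts == [{"A", "B", "D"}, {"C", "E"}]
```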

Chapter III

A MODEL OF STORAGE STRUCTURE

In the previous chapter we developed a mathematical model for the intrinsic structure of an arbitrary collection of elementary data items. Our goal in this chapter is the development of a model for all storage structures capable of representing the data structure of that collection of data items. We desire a model which can assume the form of any of the three basic organizations - sequential, list, and random - or combinations thereof. We also desire a model which is capable of representing the data structure of a collection of data items in varying degrees of detail. Let us assume that all storage structures to be considered will be resident in a uniform storage medium, each unit of which (such as a byte or word) can be accessed in the same number of time units as any other unit. This type of storage medium is commonly called random access storage. The classical example of random access storage is, of course, magnetic-core memory. We will also make what might be called a storage management disclaimer. We assume that there exists some mechanism for the management of the storage in which our storage structure resides,

that is, some overseer which keeps track of those storage locations available for, but not actually occupied by, the structure. This storage supervisor is charged with providing upon request unoccupied storage locations for use by the storage structure and with reclaiming those storage locations no longer in use by the structure. Although the implementation of these functions is of considerable importance, such considerations do not fall within the scope of our efforts here. Therefore, beyond acknowledging the existence of a storage manager and its importance in the implementation of any storage structure, we shall ignore it.
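Although its implementation is outside the scope of this work, the role of the storage supervisor can be indicated with a short sketch; the free-list mechanism below is one illustrative possibility, not a design prescribed by the dissertation:

```python
class StorageManager:
    """Illustrative storage supervisor: hands out unoccupied storage
    locations on request and reclaims locations no longer in use."""

    def __init__(self, size):
        # Locations available for, but not occupied by, the structure.
        self.free = list(range(size))

    def allocate(self):
        """Provide an unoccupied storage location upon request."""
        if not self.free:
            raise MemoryError("no unoccupied storage locations")
        return self.free.pop()

    def reclaim(self, loc):
        """Reclaim a location no longer in use by the structure."""
        self.free.append(loc)

mgr = StorageManager(4)
a = mgr.allocate()
b = mgr.allocate()
mgr.reclaim(a)
# A reclaimed location becomes available for reuse.
assert mgr.allocate() == a
```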

3.1 Initial Storage Structure Model

Examine our data structure model as it appears in Figure 2-1. As the first step in the development of a storage structure model, let us develop a single storage structure capable of representing all the detail of the data structure of this model. Let each node of the DSM (the Data Structure Model of Figure 2-1) be represented by a block of contiguous storage units (bytes, words, etc.) and let each ring of the DSM be represented by a ring of (list) pointers where the unique block of the ring (i.e., the block which represents a node at level 2, 4, 6, or 8 of the DSM) acts as the head of the ring. For every pointer thus appearing in a ring let us include a pointer in the opposite direction, resulting in a ring linking all nodes in the direction opposite to the first ring. We will distinguish these two types of rings by saying the first consists of forward pointers and the second consists of reverse pointers (although the distinction as to which ring contains forward pointers and which reverse pointers is purely academic). At times we may also refer simply to "pointers" (other than in the generic sense), in which case we will mean forward pointers. Let us also include in every ring a pointer from each element block (as distinct from the head) to the head. This pointer we will call a head pointer.
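The ring conventions just described (forward pointers, reverse pointers, and head pointers) can be sketched directly; the class and field names below are illustrative assumptions:

```python
class Block:
    """A block of the storage structure, assumed to carry forward,
    reverse, and head pointer fields (an illustrative sketch)."""

    def __init__(self, name):
        self.name = name
        self.forward = None   # next block in the ring
        self.reverse = None   # previous block in the ring
        self.head = None      # head block of the ring (None for the head itself)

def make_ring(head, elements):
    """Link `head` and `elements` into a ring with forward, reverse,
    and head pointers, returning the head."""
    blocks = [head] + elements
    n = len(blocks)
    for i, b in enumerate(blocks):
        b.forward = blocks[(i + 1) % n]
        b.reverse = blocks[(i - 1) % n]
    for e in elements:
        e.head = head
    return head

ring = make_ring(Block("h"), [Block("e1"), Block("e2")])
# Following three forward pointers returns to the head.
assert ring.forward.forward.forward is ring
# Reverse pointers link the ring in the opposite direction.
assert ring.forward.reverse is ring
# Every element block points directly at its head.
assert ring.forward.head is ring
```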

At this point the blocks of our storage structure consist only of pointer fields. We will, of course, have to include information about the data items. Let this information be placed in a number of data item description blocks, one for every elementary data item. Then let each block which represents some data item (i.e., each block which corresponds to a node at level 1 or level 9 of the DSM) contain a pointer to the corresponding data item description block. Each description block (short for data item description block) will then be pointed to by at least one, but no more than two, description block indicators. In each block representing a node at level 5 of the DSM we will include a field to contain the name (some appropriate code) of the relation symbol which the node represents. Finally, through each block containing the name of a given relation symbol we will pass a ring joining all blocks containing the name of that particular relation symbol. Clearly, the number of such rings will be equal to the cardinality of P. In fact, there will be a one-to-one correspondence between these rings and the relations in P. Each of these rings, which we designate relation rings, will have a head which is distinct from the blocks representing nodes at level 5 of the DSM. The resultant storage structure, sans reverse and head pointers, is shown schematically in Figure 3-1. We have assumed for the purposes of illustration that the nodes designated r_j3 and r_j4 in the

Figure 3-1. Initial Storage Structure Model

DSM represent the same relation symbol r_j. We will designate the structure of Figure 3-1 as the Initial Storage Structure Model, or simply, the ISSM. Those pointers emanating from the b1-blocks and the b11-blocks and followed by terms of the form d_i and p_k, respectively, represent the description block indicators. (The description blocks are not shown.)

3.2 Transformations

Instead of the structure described above, we could, of course, have chosen a storage structure in which no head pointers are present, or one in which no reverse pointers are present, or one in which head pointers are present only for certain rings, or one in which some rings are replaced by stacks, and so on. Clearly, our storage structure model must be flexible enough to allow us to represent variations of this sort. We will achieve this flexibility by applying certain transformations to the ISSM to yield the storage structure of interest. These transformations, which may be applied individually and in combination to the ISSM, are called

(1) Stacking
(2) Duplication
(3) Elimination

Before considering the effect of each of these transformations

upon the ISSM, note the symmetry and the repetitive nature of that structure. The blocks designated (by type) b2, b4, b8, and b10 in Figure 3-1 each act as the head of two rings, one above and one below the block. On the other hand, the blocks designated b3, b6, and b9 each act as an element in each of two "back-to-back" rings. The blocks designated b1 and b11 also act as ring elements but appear in only one ring. Thus, the ISSM is composed essentially of repetitions of the structure of Figure 3-2. The blocks designated x1 and x3 (specifically, the blocks containing x11, x12, x31, and x32) function as the heads of two rings each (only one of which is shown in detail), and the blocks designated x2 (specifically, the blocks containing x21, x22, and x23) each function as elements of back-to-back rings. To simplify further discourse, instead of writing "the blocks designated (by type) x1" we will write "the x1-blocks," and instead of writing "the block containing x11" we will write simply "x11." Consider now the effect of each of the three transformations given above upon the structure of Figure 3-2. The "stacking" transformation causes a given ring to be turned into a stack. For instance, if we apply the stacking transformation to the z1-rings of Figure 3-2, then for each z1-ring we form a stack of x2-blocks (which act as elements of the given z1-ring) upon the corresponding x1-block (which acts as head of the given z1-ring).
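The stacking transformation, together with the duplication and elimination transformations discussed below, can be sketched as pure data transformations on a ring, here modeled simply as a head plus a list of element blocks (an illustrative encoding, not the storage-level layout):

```python
def stack(head, elements):
    """Stacking: the ring's pointers disappear and the element blocks
    are placed contiguously upon the head block."""
    return [head] + elements

def duplicate(head, elements):
    """Duplication: a copy of the head is concatenated with each
    element block, removing the ring from the structure."""
    return [(head, e) for e in elements]

def eliminate(pairs_above, pairs_below):
    """Elimination: once duplication on both sides has fused each purely
    connective middle block with its two neighbors, every middle block
    links exactly one upper block to one lower block and can be dropped."""
    below = dict(pairs_below)
    return [(x1, below[x2]) for x1, x2 in pairs_above]

# Stacking the z1-ring headed by x11:
assert stack("x11", ["x21", "x22", "x23"]) == ["x11", "x21", "x22", "x23"]
# Duplicating the same ring:
assert duplicate("x11", ["x21", "x22"]) == [("x11", "x21"), ("x11", "x22")]
# Eliminating connective x2-blocks after duplication on both sides:
assert eliminate([("x11", "x21"), ("x12", "x22")],
                 [("x21", "x31"), ("x22", "x32")]) == [("x11", "x31"),
                                                       ("x12", "x32")]
```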

Figure 3-2. A Portion of the Initial Storage Structure Model

Similarly, if we apply the stacking transformation to the z2-rings, then for each z2-ring we form a stack of x2-blocks upon the corresponding x3-block. Note that since the x2-blocks are shared by both the z1-rings and the z2-rings, the x2-blocks may be stacked upon either the x1-blocks or the x3-blocks but not both. Figure 3-3 illustrates this transformation as applied to the z1-rings of Figure 3-2. An alternative to stacking is the "duplication" transformation. Applying this transformation to a ring causes a copy of the head of the ring to be concatenated with each element block of the ring. In addition, for each copy of the head block which is created by this transformation, a copy of the second ring of which the block is head is created. The net result is the removal of a given ring from the structure by duplicating certain other rings. Figure 3-4 illustrates duplication as applied to the z1-rings of Figure 3-2. In order to more clearly illustrate the effect of duplication upon the second ring associated with a duplicated head block, assume that the x1-blocks of Figure 3-2 are duplicated upon the element blocks of the rings below them. The result of this duplication is shown in Figure 3-5. Note that all blocks in the copies of the ring associated with a duplicated block must maintain membership in any other rings of which blocks of the copied ring were members. Contrary to stacking, duplication may be applied to either the z1-rings or the z2-rings or both simultaneously. If duplication is applied to both the z1-rings and the z2-rings, the

Figure 3-3. Stacking Transformation

Figure 3-4. Duplication Transformation

Figure 3-5. Secondary Effects of Duplication Transformation

structure of Figure 3-6 will result. This leads us to our third transformation, "elimination." If the x2-blocks are being used purely for connective purposes (like the b3-blocks and b9-blocks of Figure 3-1), they can be removed, or eliminated, from the structure of Figure 3-6, since in this structure there is a one-to-one correspondence between the x1-blocks and the x3-blocks which they connect. The resultant structure would appear as in Figure 3-7. Finally, we can apply stacking to the z1-rings of Figure 3-2 and duplication to the z2-rings, or vice-versa, which for the first case will yield the structure of Figure 3-8. It should be apparent that stacking, duplication, and elimination can be applied in various combinations simultaneously to the several rings of the ISSM. In order to specify more rigorously just what combinations of these transformations are permissible, we will introduce a number of decision variables which pertain to the various transformations for each of the types of rings within the ISSM. Before doing this, however, we will consider a number of peripheral issues.

3.3 Additional Model Characteristics

At first glance it might appear desirable to treat each ring of the ISSM independently from all the others with regard to the transformations which we apply to it. For instance, we may wish to maintain one a1-ring of the ISSM as a ring, apply the stacking transformation to another a1-ring, and apply the duplication

Figure 3-6. Simultaneous Duplication

Figure 3-7. Elimination Transformation

Figure 3-8. Combination of Stacking and Duplication

transformation to still another a1-ring. It quickly becomes apparent, however, that such a posture has two pitfalls. First, it is doubtful that such a structure can be used in a very efficient manner. At each step in the performance of an operation upon the structure one must determine the convention applying to that particular portion of the structure and then choose the proper code to effect the desired step. Second, the number of variables required to specify such a structure via any given model would quickly exceed our ability to consider all possible variations thereof. Therefore, we will make the assumption that the application of a transformation to any ring of a given type implies the application of that transformation to all rings of the given type. Thus, all rings (and all blocks) of a given type are assumed to be treated uniformly, and our model is said to be homogeneous. As a second issue, notice that applying duplication to the a3-rings of the ISSM or to the a8-rings or to both will in general result in the generation of several copies of each b6-block. (See Figure 3-1.) Clearly, this results in an increase in the number of such blocks in each relation ring. In particular, this results in the generation of two or more b6-blocks to represent a single relation symbol as it appears in the DSM of Figure 2-1. As we shall see later, certain operations which we may want to perform upon the data represented

by a particular storage structure may use a given relation ring to search for the occurrence of a particular relation symbol (as used in the DSM and as characterized by its association with one or more relation instances). If we can guarantee that the relation ring contains no more than one b6-block representing the relation symbol sought, then once a b6-block representing the desired relation symbol has been found, the remainder of the b6-blocks in the relation ring may be ignored. On the other hand, if the relation ring may contain more than one b6-block representing the relation symbol sought, then even when a b6-block representing the desired relation symbol has been found, the remainder of the b6-blocks in the relation ring must be examined to check for additional blocks representing the relation symbol. It may, therefore, be advantageous for the relation rings to maintain their original compositions (and, hence, guarantee that there exists no more than one b6-block representing a given relation symbol) regardless of transformations applied to the rest of the ISSM. For this reason we will now replace each b6-block by the structure of Figure 3-9. We assume, of course, that the rings

Figure 3-9. Substitute Structure for b6-blocks

therein may (like the rings of the ISSM) contain reverse and head pointers. The b5-block and the b7-block of Figure 3-9 (each of which contains a field for the name of a relation symbol) replace the b6-block in the a4-ring and in the a7-ring of the ISSM, respectively. In the ISSM upon which no transformations have been made there will then be one b5-block and one b7-block for each (new) b6-block. Applying duplication to the a3-rings and to the a8-rings now results in the generation of copies of b5-blocks and b7-blocks, respectively. Each copy of a b5-block which corresponds to a particular b6-block is put into the a5-ring associated with that b6-block. Similarly, each copy of a b7-block is put into the associated a6-ring. We see now that the b6-block has assumed a new role in our storage structure model - it now functions as a head (of the a5-ring and the a6-ring) instead of as an element block. Our previous discussions of stacking, duplication, and elimination still apply, however, since the repetitive and symmetric nature of the structure remains unchanged. Substituting the structure of Figure 3-9 for each b6-block in the ISSM of Figure 3-1 yields the structure of Figure 3-10, which we will henceforth call our Storage Structure Model, or simply SSM. By making this substitution we have clearly made no changes which would affect the ability of the structure to represent the

Figure 3-10. Storage Structure Model

intrinsic structure of a collection of data items as modeled by the DSM. A third issue to consider also involves the effects of duplication. Whenever duplication is applied to the a2-rings or the a9-rings of the SSM, copies of b1-blocks and b11-blocks, respectively, are generated. Since certain operations which we may want to perform upon the data represented by a particular storage structure may require the ability to access all copies of a given b1-block or b11-block as they represent a given source or target, respectively, we will introduce for each b1-block and each b11-block of the SSM a ring to contain all copies of the block generated by duplication. These rings (which are similar to the relation rings for b6-blocks) will be called source rings for b1-blocks and target rings for b11-blocks. As a final issue we introduce the possibility of a block type field for each block of the SSM. In certain cases, especially when stacking and duplication have been applied to several pairs of adjacent rings, it may be advantageous, if not imperative, to have some means for distinguishing the different types of blocks from one another. To accomplish this, we may include within each block a type field which contains some code identifying the type of the block. Since the SSM contains only eleven distinct types of blocks (b1, b2, ..., b11), a type field need contain at most four bits. If fewer than eleven types are to be distinguished (due to the elimination

of certain blocks from the SSM, for instance), an even smaller type field may be used. If some other item in a given type of block does not use the entire field allocated to it (as is frequently true with pointer fields), it may be possible to put the type code in that unused space, thus eliminating the need for a separate type field. In any event we will provide the option of including a type field in the blocks of any given type.

3.4 Decision Variables

Return now to consideration of the decision variables which we will use to describe the SSM and transformations applied thereto. Let φ_i be a binary-valued decision variable, the value of which indicates whether (φ_i = 1) or not (φ_i = 0) forward pointers are present in the a_i-rings of the SSM, where i ∈ {1, 2, ..., 10}. φ_i = 0 implies that either the stacking or the duplication transformation has been applied to the a_i-rings and that they are no longer really rings at all. Let λ_i be a binary-valued decision variable, the value of which indicates whether the stacking transformation (λ_i = 0) or the duplication transformation (λ_i = 1) has been applied to the a_i-rings of the SSM when φ_i = 0, where i ∈ {1, 2, ..., 10}. As a matter of convenience, we will require λ_i = 0 when φ_i = 1. Let φ'_i be a binary-valued decision variable, the value of which indicates whether (φ'_i = 1) or not (φ'_i = 0) head pointers are present in the a_i-rings of the SSM, where i ∈ {1, 2, ..., 10}. Note that head

pointers may be present even though the a_i-rings are not actually rings. For instance, head pointers may be used to advantage when the element blocks of a ring are stacked upon the head. Let φ''_i be a binary-valued decision variable, the value of which indicates whether (φ''_i = 1) or not (φ''_i = 0) reverse pointers are present in the a_i-rings of the SSM, where i ∈ {1, 2, ..., 10}. We will assume that φ''_i can equal 1 only if φ_i = 1. That is, we will assume that reverse pointers are present in the a_i-rings only if forward pointers are. What this really means is that if the a_i-rings are linked via pointers in one direction only, then these pointers are forward pointers; reverse pointers are present only to provide two-way linking. Let β_i be a binary-valued decision variable, the value of which indicates whether (β_i = 1) or not (β_i = 0) the b_i-blocks of the SSM are to be eliminated from the structure, where i ∈ {2, 3, ..., 10}. (The b1-blocks and the b11-blocks are assumed always to be present.) From our discussion of elimination we know that the value of β_i is determined by the values of λ_{i-1} and λ_i, where i ∈ {2, 3, ..., 10}. In particular, if λ_{i-1} = λ_i = 1, then β_i = 1; otherwise β_i = 0. In other words, the b_i-blocks are eliminated from the SSM if and only if duplication has been applied to the a_{i-1}-rings and to the a_i-rings, where i ∈ {2, 3, ..., 10}. Finally, let τ_i be a binary-valued decision variable, the value of which indicates whether (τ_i = 1) or not (τ_i = 0) the b_i-blocks of the

SSM contain a type field, where i ∈ {1, 2, ..., 11}. There are a number of constraints (some of which we have already mentioned) which the above decision variables must satisfy in order to insure a physically realizable structure. Each of these will be considered in turn below.

Constraint 1

    φ_i + λ_i ≤ 1  for all i ∈ {1, 2, ..., 10}

This constraint implies that φ_i and λ_i may not both be equal to 1. That this should be true is obvious: φ_i = 1 implies explicit a_i-rings of (forward) pointers and λ_i = 1 implies the application of duplication to the rings, clearly an impossible situation.

Constraint 2

    φ''_i ≤ φ_i  for all i ∈ {1, 2, ..., 10}

This constraint implies that the a_i-rings may not contain reverse pointers unless they contain forward pointers.

Constraint 3

    φ'_i + λ_i ≤ 1  for all i ∈ {1, 2, ..., 10}

The purpose of this constraint is to prohibit head pointers from the a_i-rings if duplication has been applied to them. Clearly, a head pointer would be of no advantage in a ring to which duplication

has been applied, since the head is attached directly to each element block.

Constraint 4

    β_i = λ_{i-1} λ_i  for all i ∈ {2, 3, ..., 10}

This constraint implies that the b_i-blocks may be (in fact, must be) eliminated from the SSM if and only if duplication has been applied to the rings on both sides of the blocks.

Constraint 5

    φ_i + φ_{i+1} + λ_i + λ_{i+1} ≥ 1  for all i ∈ {2, 4, 6, 8}

The purpose of this constraint is to prohibit the application of stacking to both (types of) rings which share common element blocks. Upon close examination of the SSM it becomes clear that we must generalize this constraint somewhat to yield the following constraint.

    Σ_{i=j}^{k} φ_i + λ_j + λ_k ≥ 1  for all j ∈ {2, 4, 6, 8}, k ∈ {3, 5, 7, 9}, and k > j

The implication is that if each type of ring from the a_j-rings to the a_k-rings inclusive is subjected to either stacking or duplication, then duplication must be applied to either the a_j-rings or the a_k-rings (or both). If this constraint is not met, we have the

(impossible) situation analogous to applying stacking to rings which share common element blocks. (In fact, if k = j+1, we have precisely that situation.) Perhaps a brief example would serve to clarify this point. Assume that j = 2 and k = 5 and that duplication has been applied to both the a3-rings and the a4-rings. In this case each combination of related b3-blocks, b4-blocks, and b5-blocks is "fused" into a single block which functions as an element of both an a2-ring and an a5-ring. It follows that stacking may not be applied to both the a2-rings and the a5-rings simultaneously. Since we have assumed that either stacking or duplication has been applied to each, we see that duplication must be applied to either the a2-rings or the a5-rings (or both). We note that if stacking were applied to the a3-rings instead of duplication, then duplication must be applied to the a2-rings. Similarly, if stacking were applied to the a4-rings instead of duplication, then duplication must be applied to the a5-rings. There are a number of other decisions which we may want to make in characterizing the structure via the

SSM. In particular, instead of assuming that each b1-block and each b11-block contains a description block indicator, we may wish to assume that the description blocks are attached directly to the b1-blocks or to the b11-blocks or both. Let σ_1 be a binary-valued decision variable, the value of which indicates whether (σ_1 = 1) or not (σ_1 = 0) the b1-blocks of the SSM contain description block indicators; if they do not, we assume that the appropriate description block is attached directly to each b1-block. Let σ_2 be a binary-valued decision variable defined similarly for the b11-blocks of the SSM. When we defined the data item description block, we indicated that there is one description block for every data item. In examining transformations which may be applied to the SSM we have seen that it is possible for the transformed structure to contain more than one copy of a b1-block and/or more than one copy of a b11-block. We wish to reaffirm at this point the assumption that there is indeed exactly one description block for each data item. This means, of course, that if σ_1 = 0, we must insure that there is exactly one copy of

each b1-block in the transformed structure. Similarly, if σ_2 = 0, we must insure that there is exactly one copy of each b11-block in the transformed structure. We will discuss the enforcement of these two constraints somewhat later in the discourse. The assumption of one description block for each data item perhaps requires some justification. Possibly the most compelling reason for making the assumption is that the data item is the smallest, atomic unit of information to be represented and, hence, it should be represented in the structure by some unique, well-defined, closed-form device. Consider a simple example. Suppose that the value (i.e., the description) of a given data item varies in the course of the solution of a problem. For instance, the data item might correspond to the coordinates of a symbol being displayed upon a computer-driven graphical (CRT) display. If the symbol is being moved across the display, its coordinates will vary with time and must constantly be updated. Since all descriptions of such a data item must be altered each time its value changes, a single description is to be greatly preferred. A second reason for making the assumption is that when a data item is both a target and a source, the description block acts in a connective capacity, much as the element blocks which are shared

by back-to-back rings. Any two such rings share at most one element block between them. Thus, if the analogy is to carry over fully, there should be only one description block for a given data item. The connective role of the description block brings up another point. If σ_1 = 0 and σ_2 = 0, it is clear that we may not apply stacking to both the a1-rings and the a10-rings. It should also be clear that (except in special cases) whenever Δ and Π are not disjoint, there must be at least one i ∈ {1, 2, ..., 10} such that the a_i-rings of the SSM are explicit rings (i.e., θ_i = 1). This, of course, is to allow the structure to "wrap around". Thus, we have the following constraints.

Constraint 6

    σ_1 + σ_2 + θ_1 + θ_10 + λ_1 + λ_10 ≥ 1

This constraint disallows stacking of both the a1-rings and the a10-rings when σ_1 = 0 and σ_2 = 0.

Constraint 7

    If σ_1 = 0, σ_2 = 0, and |Δ| + |Π| > |Δ ∪ Π|, then Σ_{i=1}^{10} θ_i ≥ 1

This constraint implies the existence of at least one explicit

ring whenever Δ and Π are not disjoint. (We note that |Δ| + |Π| ≥ |Δ ∪ Π| is always true. Hence, |Δ| + |Π| = |Δ ∪ Π| implies that Δ and Π are disjoint.) We can relax this constraint somewhat to allow Σ_{i=1}^{10} θ_i = 0 in cases for which Δ and Π are not disjoint provided we can guarantee that d_{i1} r_{j1} d_{i2} and d_{i2} r_{j2} d_{i1} are not both true for any d_{i1} and d_{i2} ∈ Δ and any r_{j1} and r_{j2} ∈ P (d_{i1} and d_{i2} are not necessarily distinct, nor are r_{j1} and r_{j2}). The next set of decisions we may want to make involves the presence of the relation symbol name field in the b5-blocks, the b6-blocks, and the b7-blocks. We may desire to exclude this field from certain of these blocks. Therefore, we introduce the binary-valued decision variables ρ_1, ρ_2, and ρ_3, where ρ_1 indicates whether (ρ_1 = 1) or not (ρ_1 = 0) the relation symbol name field is present in the b5-blocks of the SSM, and ρ_2 and ρ_3 perform similar functions for the b6-blocks and the b7-blocks, respectively. Up to this point we have assumed that elimination of the b_i-blocks from the SSM always occurs if duplication has been applied to the a_{i-1}-rings and to the a_i-rings, where i ∈ {2, 3, ..., 10}. (See Constraint 4.) We wish now to alter this assumption somewhat so that if the b_i-blocks contain relation symbol fields, they will not be eliminated from the structure under any conditions. (Clearly, this change applies only for i ∈ {5, 6, 7}.) Therefore, we replace Constraint 4 by the following one.

Constraint 4'

    For i ∈ {2, 3, 4, 8, 9, 10}:    β_i = λ_{i-1} λ_i
    For i ∈ {5, 6, 7}:    if ρ_{i-4} = 1, then β_i = 0; otherwise, β_i = λ_{i-1} λ_i

This concludes our discussion of the decision variables required to describe the SSM.

3.5 Quantification of Model

Now that we have a formal manner (i.e., the decision variables) for describing the transformations which may be applied to the SSM, we would like to determine the quantitative effects upon the SSM of applying various transformations. Assume that for each of the sets of the DSM Δ^1, Δ^2_{m2}, Δ^3_{m3}, Γ, Π^1, Π^2_{n2}, and Π^3_{n3} (for all possible values of their respective indices) we are given the expected number of elements therein (i.e., the average cardinality of the set). The expected number of elements contained in some set Δ^3_{n3}, where n3 ∈ {1, 2, ...}, can, for example, be determined from

the average of the cardinalities |Δ^3_{n3}| taken over all values of the index n3. This means that we know the expected number of element blocks in each type of ring of the SSM before any transformations have been applied. In particular, let k_i^0 represent the expected number of element blocks in each of the a_i-rings of the SSM (before the application of transformations), where i ∈ {1, 2, ..., 10}. k_5^0 and k_6^0 will, of course, both be 1. Consider now the effect of transformation of the SSM upon the expected number of element blocks in each of the various ring types. Let k_i represent the expected number of element blocks in each of the a_i-rings of the SSM after the application of transformations, where i ∈ {1, 2, ..., 10}. Clearly, the application of stacking to the a_i-rings has no effect upon k_i; it remains equal to k_i^0. Furthermore, we will assume that elimination has no effect upon k_i. Even though the element blocks of the a_i-rings may be physically absent, the function which they perform is not altered by this fact. We assume that k_i represents in this case the expected number of "virtual" element blocks. It follows that k_i is affected only by duplication. (We know from our earlier discussions that k_i will indeed be affected by duplication.) For notational convenience, if x is a binary-valued decision

variable, we define x̄ as having a value which is the complement of the value of x. That is, if x = 0, then x̄ = 1, and if x = 1, then x̄ = 0. We may then write expressions governing the various k_i for i ∈ {1, 2, ..., 10} as follows:

    k_1 = λ_1 + λ̄_1 k_1^0
    k_2 = λ_2 + λ̄_2 k_2^0 (λ̄_4 + λ_4 k_4^0 (λ̄_6 + λ_6 k_6^0 (λ̄_8 + λ_8 k_8^0 (λ̄_10 + λ_10 k_10^0))))
    k_3 = λ_3 + λ̄_3 k_3^0 (λ̄_1 + λ_1 k_1^0)
    k_4 = λ_4 + λ̄_4 k_4^0 (λ̄_6 + λ_6 k_6^0 (λ̄_8 + λ_8 k_8^0 (λ̄_10 + λ_10 k_10^0)))
    k_5 = λ_5 + λ̄_5 k_5^0 (λ̄_3 + λ_3 k_3^0 (λ̄_1 + λ_1 k_1^0))
    k_6 = λ_6 + λ̄_6 k_6^0 (λ̄_8 + λ_8 k_8^0 (λ̄_10 + λ_10 k_10^0))
    k_7 = λ_7 + λ̄_7 k_7^0 (λ̄_5 + λ_5 k_5^0 (λ̄_3 + λ_3 k_3^0 (λ̄_1 + λ_1 k_1^0)))
    k_8 = λ_8 + λ̄_8 k_8^0 (λ̄_10 + λ_10 k_10^0)
    k_9 = λ_9 + λ̄_9 k_9^0 (λ̄_7 + λ_7 k_7^0 (λ̄_5 + λ_5 k_5^0 (λ̄_3 + λ_3 k_3^0 (λ̄_1 + λ_1 k_1^0))))
    k_10 = λ_10 + λ̄_10 k_10^0

We can easily verify the validity of these expressions. Consider first k_1. If λ_1 = 0, each a1-ring of the SSM will be either an explicit ring (θ_1 = 1) of k_1^0 b1-blocks or a stack (θ_1 = 0) of k_1^0 b1-blocks upon a b2-block. (k_1^0 is, of course, the expected, not actual, number of blocks in each of these rings or stacks.) If on the other hand

λ_1 = 1, each a1-ring of the transformed SSM will consist simply of a b1-block and a (copy of a) b2-block, concatenated. Clearly, applying the duplication transformation, or not applying it, to any other type of ring in the SSM can have no effect upon the number of elements in the a1-rings. (Applying duplication to the a2-rings, for instance, will affect the number of a1-rings but not the number of elements contained in each.) The expression for k_1 follows directly. Consider next k_3. Assume for the moment that λ_1 = 0. Then k_3 will behave exactly as k_1: if λ_3 = 0, each a3-ring of the SSM will be either an explicit ring (θ_3 = 1) of k_3^0 b3-blocks or a stack (θ_3 = 0) of k_3^0 b3-blocks upon a b4-block; and if λ_3 = 1, each a3-ring will consist of a b3-block concatenated with a b4-block. On the other hand, if λ_1 = 1, each b3-block will be replaced by k_1^0 b3-blocks, one for each time the b2-block of an a1-ring is duplicated upon a b1-block. In this case, if λ_3 = 0, each a3-ring or stack will contain k_1^0 k_3^0 b3-blocks. If λ_3 = 1, of course, each a3-ring will still consist of a single b3-block concatenated with a b4-block. Again, applying the duplication transformation to any rings other than the a1-rings or the a3-rings can have no effect upon the number of elements in the a3-rings. The expression for k_3 follows directly.
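As a mechanical cross-check on these expressions, the ten k_i can be computed by evaluating the nested products directly. The sketch below is ours, not the author's: `lam[i]` stands for the duplication variable λ_i, `k0[i]` for k_i^0, and the odd/even chaining rule is the one reconstructed from the equations above.

```python
def chain(lam, k0, indices):
    """Evaluate the nested factor (1 - lam_j) + lam_j * k0_j * (...) over
    the given ring indices, innermost index last."""
    value = 1
    for j in reversed(indices):
        value = (1 - lam[j]) + lam[j] * k0[j] * value
    return value

def k_transformed(lam, k0):
    """k_i after transformation: odd rings chain downward over i-2, ..., 1;
    even rings chain upward over i+2, ..., 10 (as in the text)."""
    k = {}
    for i in range(1, 11):
        if i % 2 == 1:
            inner = chain(lam, k0, list(range(i - 2, 0, -2)))
        else:
            inner = chain(lam, k0, list(range(i + 2, 11, 2)))
        k[i] = lam[i] + (1 - lam[i]) * k0[i] * inner
    return k
```

With no duplication (all λ_i = 0) every k_i reduces to k_i^0, and with duplication applied everywhere every k_i reduces to 1; with λ_1 = 1 alone, k_3 becomes k_1^0 k_3^0, matching the discussion above.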

Using arguments similar to those above, we can easily justify the expressions given for the remaining k_i. In addition to the number of element blocks in each a_i-ring (i.e., the number of element blocks associated directly with the head of each a_i-ring), we may desire to know the number of element blocks not actually in each a_i-ring but associated indirectly with the head of that ring. For instance, we may want to know the number of b5-blocks associated with a particular b2-block. Since there are k_2 a4-rings associated with each b2-block (from the fact that the a2-ring of which the b2-block is head contains k_2 b3-blocks and there is one a4-ring associated with each of these b3-blocks) and since there are k_4 b5-blocks in each a4-ring, we determine that there are k_2 k_4 b5-blocks associated (indirectly) with each b2-block. For the moment let the notation b_i/b_j designate the number of b_i-blocks associated (either directly or indirectly) with each b_j-block. Let us then define a 5 by 6 matrix K which has as its elements those quantities indicated by Table 3-1. Using arguments similar to that used above to determine b_5/b_2 = k_2 k_4, we may determine values for the remaining elements of K. These values are given by Table 3-2. We will denote the element in the j-th column of the i-th row of K by K_ij, where i ∈ {1, 2, ..., 5} and j ∈ {1, 2, ..., 6}. It might be well at this point to consider a number of other

          1          2          3          4          5          6
    1  b_1/b_2    b_3/b_2    b_5/b_2    b_7/b_2    b_9/b_2    b_11/b_2
    2  b_1/b_4    b_3/b_4    b_5/b_4    b_7/b_4    b_9/b_4    b_11/b_4
    3  b_1/b_6    b_3/b_6    b_5/b_6    b_7/b_6    b_9/b_6    b_11/b_6
    4  b_1/b_8    b_3/b_8    b_5/b_8    b_7/b_8    b_9/b_8    b_11/b_8
    5  b_1/b_10   b_3/b_10   b_5/b_10   b_7/b_10   b_9/b_10   b_11/b_10

Table 3-1. Definition of Elements of Matrix K.

          1                2              3            4          5            6
    1  k_1              k_2            k_2k_4       k_2k_4k_6  k_2k_4k_6k_8  k_2k_4k_6k_8k_10
    2  k_1k_3           k_3            k_4          k_4k_6     k_4k_6k_8     k_4k_6k_8k_10
    3  k_1k_3k_5        k_3k_5         k_5          k_6        k_6k_8        k_6k_8k_10
    4  k_1k_3k_5k_7     k_3k_5k_7      k_5k_7       k_7        k_8           k_8k_10
    5  k_1k_3k_5k_7k_9  k_3k_5k_7k_9   k_5k_7k_9    k_7k_9     k_9           k_10

Table 3-2. Values of Elements of Matrix K.
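The pattern of Table 3-2 can be stated compactly: for the element K_ij = b_{2j-1}/b_{2i}, multiply the odd-indexed k's lying between the block and its head when the block is below the head, and the even-indexed k's when it is above. A sketch of that rule (the function name and index encoding are ours, assuming the table pattern as reconstructed):

```python
from math import prod

def K_matrix(k):
    """K[i][j] = number of b_{2j-1}-blocks per b_{2i}-block.

    k maps ring index (1..10) to k_i.  Below the head, multiply the odd
    k's from 2j-1 up through 2i-1; above it, the even k's from 2i up
    through 2j-2 (Table 3-2 pattern)."""
    K = {}
    for i in range(1, 6):            # rows: head blocks b2, b4, b6, b8, b10
        K[i] = {}
        for j in range(1, 7):        # columns: blocks b1, b3, b5, b7, b9, b11
            if 2 * j - 1 < 2 * i:    # element block lies below the head
                K[i][j] = prod(k[x] for x in range(2 * j - 1, 2 * i, 2))
            else:                    # element block lies above the head
                K[i][j] = prod(k[x] for x in range(2 * i, 2 * j - 1, 2))
    return K
```

For example, `K_matrix(k)[1][3]` reproduces b_5/b_2 = k_2 k_4 from the text.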

quantities which are similar in many respects to those just considered and which will prove useful somewhat later. Let m_a represent the number of copies of each b1-block generated by transformation of the SSM. Before any transformations are applied to the SSM, m_a = 1, of course. Using an argument similar to that used in obtaining expressions for the various k_i, we obtain the following expression for m_a.

    m_a = λ̄_2 + λ_2 k_2^0 (λ̄_4 + λ_4 k_4^0 (λ̄_6 + λ_6 k_6^0 (λ̄_8 + λ_8 k_8^0 (λ̄_10 + λ_10 k_10^0))))

Similarly, let m_p represent the number of copies of each b11-block generated by transformation of the SSM.

    m_p = λ̄_9 + λ_9 k_9^0 (λ̄_7 + λ_7 k_7^0 (λ̄_5 + λ_5 k_5^0 (λ̄_3 + λ_3 k_3^0 (λ̄_1 + λ_1 k_1^0))))

Let m_r1 represent the number of copies of a given b5-block associated with the (copies of the) b1-block representing a given source.

    m_r1 = λ̄_6 + λ_6 k_6^0 (λ̄_8 + λ_8 k_8^0 (λ̄_10 + λ_10 k_10^0))

Finally, let m_r2 represent the number of copies of a given b7-block associated with the (copies of the) b11-block representing a given target.

    m_r2 = λ̄_5 + λ_5 k_5^0 (λ̄_3 + λ_3 k_3^0 (λ̄_1 + λ_1 k_1^0))

We indicated earlier that when σ_1 = 0, we must insure that there is no more than one copy of each b1-block in the SSM, and when σ_2 = 0, we must insure that there is no more than one copy of each b11-block. We now have the means to enforce these two constraints. If σ_1 = 0, we must restrict λ_2, λ_4, λ_6, λ_8, and λ_10 to values which guarantee m_a = 1. Similarly, if σ_2 = 0, we must restrict λ_1, λ_3, λ_5, λ_7, and λ_9 to values which guarantee m_p = 1. In particular,

    σ̄_1 m_a ≤ 1    and    σ̄_2 m_p ≤ 1

Our discussions so far have been concerned mainly with the number of blocks of one type which are associated with a given block of some other type. We will, however, also have need for the total number of blocks of a given type. Let m_i represent the (average) total number of b_i-blocks in the SSM, where i ∈ {1, 2, ..., 11}. There are k_a^0 sources and m_a b1-blocks representing each source in the SSM. Therefore, m_1 = m_a k_a^0. Since there are k_1 b1-blocks associated with each b2-block, we know that m_2 = m_1/k_1. Associated with each b2-block are k_2 b3-blocks. Hence, m_3 = k_2 m_2. Clearly, m_4 = m_3/k_3, and so on. Carrying out all indicated multiplications and divisions will result in the expressions given in Table 3-3.

    m_1 = m_a k_a^0
    m_2 = m_1/k_1 = m_a k_a^0 / k_1
    m_3 = k_2 m_2 = k_2 m_a k_a^0 / k_1
    m_4 = m_3/k_3 = k_2 m_a k_a^0 / (k_1 k_3)
    m_5 = k_4 m_4 = k_2 k_4 m_a k_a^0 / (k_1 k_3)
    m_6 = m_5/k_5 = k_2 k_4 m_a k_a^0 / (k_1 k_3 k_5)
    m_7 = k_6 m_6 = k_2 k_4 k_6 m_a k_a^0 / (k_1 k_3 k_5)
    m_8 = m_7/k_7 = k_2 k_4 k_6 m_a k_a^0 / (k_1 k_3 k_5 k_7)
    m_9 = k_8 m_8 = k_2 k_4 k_6 k_8 m_a k_a^0 / (k_1 k_3 k_5 k_7)
    m_10 = m_9/k_9 = k_2 k_4 k_6 k_8 m_a k_a^0 / (k_1 k_3 k_5 k_7 k_9)
    m_11 = k_10 m_10 = k_2 k_4 k_6 k_8 k_10 m_a k_a^0 / (k_1 k_3 k_5 k_7 k_9)

Table 3-3. Numbers of Blocks, by Type, in the SSM.
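The entries of Table 3-3 arise from a single alternation: starting from m_1 = m_a k_a^0, divide by the odd k's and multiply by the even k's. A sketch of that recurrence (names are ours; exact rationals avoid rounding of the intermediate quotients):

```python
from fractions import Fraction

def block_totals(k, m_a, ka0):
    """Total b_i-block counts per Table 3-3.  k maps ring index (1..10)
    to k_i, m_a is the b1-block multiplicity, ka0 the number of sources."""
    m = {1: Fraction(m_a * ka0)}
    for i in range(2, 12):
        if i % 2 == 0:               # m_{2n} = m_{2n-1} / k_{2n-1}
            m[i] = m[i - 1] / k[i - 1]
        else:                        # m_{2n+1} = k_{2n} * m_{2n}
            m[i] = m[i - 1] * k[i - 1]
    return m
```

Unwinding the loop reproduces the closed forms of the table, e.g. m_11 = k_2 k_4 k_6 k_8 k_10 m_a k_a^0 / (k_1 k_3 k_5 k_7 k_9).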

We also know, of course, that m_11 = m_p k_p^0. Hence, the following equation must always hold.

    k_1 k_3 k_5 k_7 k_9 m_p k_p^0 = k_2 k_4 k_6 k_8 k_10 m_a k_a^0

In particular, it must hold for the SSM before the application of any transformations:

    k_1^0 k_3^0 k_5^0 k_7^0 k_9^0 k_p^0 = k_2^0 k_4^0 k_6^0 k_8^0 k_10^0 k_a^0

Recalling that k_5^0 = 1 and k_6^0 = 1, we finally get the equation

    k_1^0 k_3^0 k_7^0 k_9^0 k_p^0 = k_2^0 k_4^0 k_8^0 k_10^0 k_a^0

We will later find this relationship to be of some use to us.
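The untransformed relationship ties the average cardinalities together: once the other quantities are chosen, k_p^0 is forced. A small numerical check, using hypothetical cardinalities of our own choosing (with k_5^0 = k_6^0 = 1, as noted above):

```python
from math import prod

# Hypothetical untransformed cardinalities; k5^0 = k6^0 = 1 as in the text.
k0 = {1: 2, 2: 4, 3: 3, 4: 3, 5: 1, 6: 1, 7: 2, 8: 2, 9: 2, 10: 2}
ka0 = 10                                    # number of sources, |Delta|

odd = prod(k0[i] for i in (1, 3, 5, 7, 9))      # product of odd k's
even = prod(k0[i] for i in (2, 4, 6, 8, 10))    # product of even k's

# kp0 is forced by  k1 k3 k5 k7 k9 kp0 = k2 k4 k6 k8 k10 ka0.
kp0 = even * ka0 // odd
assert odd * kp0 == even * ka0
```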

Let us derive another expression which is, at the least, of some academic interest. Assume that no transformations have been applied to the SSM. The number of b6-blocks associated with a given b1-block is easily determined to be k_2^0 k_4^0. This number, however, also corresponds to the number of relation symbols associated with a given source (i.e., |P_{d_i}|, where d_i ∈ Δ is the data item represented by the given b1-block). The number of b11-blocks associated with a given b6-block is k_8^0 k_10^0 (recalling that k_6^0 = 1), and this number corresponds to the number of targets associated with a given source/relation symbol pair. Therefore, the number of relation symbol/target pairs associated with a given source must be k_2^0 k_4^0 k_8^0 k_10^0. We know in addition that Δ contains k_a^0 sources. It follows then that the total number of relation instances (d_i r_j p_k) to be represented by the SSM (and described by the DSM) is k_2^0 k_4^0 k_8^0 k_10^0 k_a^0. Using the result obtained above, we may say that the total number of relation instances is also given by k_1^0 k_3^0 k_7^0 k_9^0 k_p^0. Continuing along these lines, we know that the number of targets associated with a given source/relation symbol pair cannot exceed the number of targets in Π. Therefore,

    k_8^0 k_10^0 ≤ k_p^0

Similarly, the number of sources associated with a given relation symbol/target set pair cannot exceed the number of sources in Δ. Therefore,

    k_1^0 k_3^0 ≤ k_a^0

Also, the number of relation symbols associated with a given source cannot exceed the number of relations in P. As a result,

    k_2^0 k_4^0 ≤ k_r^0

The same thing is true with regard to the number of (distinct) relation symbols associated with a given target. Recall, however, that a given relation symbol may be associated with the same target more than once. If we assume that the expected number of times the same relation symbol is associated with a given target is represented by m̄_r, then we can say

    k_7^0 k_9^0 ≤ m̄_r k_r^0

Since each relation in P must appear in the SSM at least once, the number of b6-blocks must be at least as great as |P|. Hence,

    k_2^0 k_4^0 k_a^0 / (k_1^0 k_3^0) ≥ k_r^0

This last result suggests another. Let m̃_r represent the expected number of b6-blocks which contain the same relation symbol. Since the SSM contains m_6 b6-blocks and since there are k_r^0 distinct relation symbols, it follows that

    m̃_r = m_6 / k_r^0 = k_2 k_4 m_a k_a^0 / (k_1 k_3 k_5 k_r^0)

which, recalling that k_5^0 = 1, becomes

    m̃_r = k_2^0 k_4^0 k_a^0 / (k_1^0 k_3^0 k_r^0)

if no transformations have been applied to the SSM.

3.6 Flexibility of Model

The reader has undoubtedly noticed that despite our stated desire to design a storage structure model capable of assuming the form of any of the three basic storage organizations, we have made no mention of the random organization. Be assured, we have not forgotten. Our reason for ignoring the random organization to this point is simply that the random organization is basically the same as the list organization, the principal difference being that in a list organization a given item contains an explicit pointer to another item, whereas in a random organization the given item contains a key which is used

to determine in some way a pointer to the other item. The result is that in getting from one item to another the random organization will generally be slower than (or at least different in time from) the list organization, since one must look in a dictionary, do some calculation, etc. in order to determine the pointer. Later we will see that this difference can easily be reflected by assuming different amounts of time for following a list pointer and for following a random "pointer". Since in general a key is required for each random "pointer", the number of fields (although possibly different in size) will be the same as for a comparable list organization. However, in comparing the storage requirements of the two organizations one must be careful to include the storage required for a dictionary, if that particular method of implementation for the random organization is used, since it will not be included explicitly in our model. Thus, our model does indeed encompass the random organization without extension or modification. Let us examine now the flexibility of our model. At the very least we would like some assurance that the SSM is capable of representing such basic structures as a simple stack and a simple linear list. Presupposing a structure such as a stack or linear list assumes,

of course, that the intrinsic structure of the data to be represented is linear in nature. This means that each relation to be represented is one-to-one. That is, a given data item is related to exactly one other data item by some given relation and vice versa. If this is true, it follows that k_i^0 = 1 for all i ∈ {1, 2, ..., 10}. The SSM will then appear as in Figure 3-11. Suppose that we assign the following values to our various decision variables:

    θ_i = θ'_i = 0    for all i ∈ {1, 2, ..., 10}
    λ_i = 1    for all i ∈ {1, 2, ..., 10}
    τ_i = 0    for all i ∈ {1, 2, ..., 11}
    σ_1 = 0, σ_2 = 1
    ρ_1 = ρ_3 = 0, ρ_2 = 1

It follows from Constraint 4' that β_6 = 0 and β_i = 1 for all i ∈ {2, 3, ..., 10}, i ≠ 6. The SSM will now appear as in Figure 3-12, where the double-walled field of each block corresponds to a description block. If there is no occasion to access a relation instance via the relation symbol, we may choose to eliminate the relation rings from the structure. (We assume that the relation rings are always present, but if our analysis should show that they are never used, we may obviously eliminate them from the structure.) We may make

[Figure 3-11. Storage Structure Model for all k_i^0 = 1]

[Figure 3-12. Transformed Storage Structure Model - 1]

similar decisions about the source and target rings (not shown in the figure). Furthermore, if P contains but a single relation or if we are not interested in the individual relation symbols per se, we may set ρ_2 = 0 and eliminate the relation symbol name field from each block. Assuming that we choose all of these options, the SSM will appear as in Figure 3-13 (where we have assumed for illustration that Δ and Π are not disjoint and that any two data items are related by at most one relation). Clearly, the structure shown in Figure 3-13 is a simple linear list. By setting σ_2 = 0 it can be transformed into a simple stack. Thus, we have satisfied our first requirement, being capable of representing the very basic storage structures. As another example to illustrate the flexibility of the SSM and its ability to represent in varying degrees of detail the structure of the DSM, suppose we are given some collection of data items for which the untransformed SSM appears exactly as shown in Figure 3-10 (i.e., with all dotted lines made solid). Furthermore, suppose that the only information which we wish to "share" is that concerning target sets. That is, we want to represent explicitly only the composition of each target set and the source/relation symbol pairs associated with that target set. Clearly, all other "shared" items will be implicit in the resultant structure but will require

[Figure 3-13. Transformed Storage Structure Model - 2]

interrogation of the structure to determine. Let us assign the following values to our various decision variables:

    θ_1 = θ_2 = θ_3 = θ_4 = θ_5 = θ_6 = 0, θ_7 = θ_8 = 1, θ_9 = θ_10 = 0
    θ'_i = 0    for all i ∈ {1, 2, ..., 10}
    λ_1 = λ_2 = λ_3 = λ_4 = λ_5 = λ_6 = 1, λ_7 = λ_8 = 0, λ_9 = λ_10 = 1
    τ_i = 0    for all i ∈ {1, 2, ..., 11}
    σ_1 = 0, σ_2 = 1
    ρ_1 = ρ_3 = 0, ρ_2 = 1

By Constraint 4',

    β_2 = β_3 = β_4 = β_5 = 1, β_6 = β_7 = β_8 = β_9 = 0, β_10 = 1

The SSM will then appear as in Figure 3-14. Each a8-ring in this structure represents a target set and each a7-ring represents the set of all source/relation symbol pairs which are associated with a given target set. It is evident that using the SSM we can represent as much or as little of the detail of the DSM as we wish. As a somewhat different example, let us consider how the software-simulated associative memory designed by Feldman (and mentioned in Chapter I) could be specified within the context of the SSM.

[Figure 3-14. Transformed Storage Structure Model - 3]

In Feldman's implementation each cell of the "memory" contains five fields as shown in Figure 3-15: three information fields, F1, F2, and F3, and two link fields, L1 and L2. Recall that the address of a given cell is determined by hashing the contents of fields F1 and F2. The L1 field of a cell is used to associate additional cells (of the same form as that shown in Figure 3-15) with the cell accessed by hashing F1 and F2 in order to handle the multiplicity problem (i.e., the situation in which more than one value for the field F3 is associated with a given combination of values for fields F1 and F2) and the overlap problem (i.e., the situation in which different F1/F2 value pairs hash to the same address). In particular, the L1 field contains a pointer to one of the cells in a ring of cells containing the value combinations of all triples (F1, F2, F3) for which the values of F1 and F2 hash to the given address. The L2 field, on the other hand, is used to link in rings all cells having the same values for the field F3. These rings make it possible to determine the values of fields F1 or F2 or both given a value for field F3. Suppose, for instance, we are given the following triples of values for the fields F1, F2, and F3 (in that order):

[Figure 3-15. Software-Simulated Associative Memory Cell]

    (a1, b1, c1)
    (a1, b1, c2)
    (a2, b2, c3)
    (a3, b3, c3)

where the combination of a1 and b1 hashes to the same address as the combination of a2 and b2. The resulting structure would appear as shown in Figure 3-16, where the dashed pointers represent addresses determined by hashing the indicated pairs of values. Let us define our data items and the corresponding data structure for this case as follows:

(1) Let the set S consist of all (ordered) triples (a_{i1}, b_{i2}, c_{i3}) of values for the fields F1, F2, and F3 to be represented by the storage structure.
(2) Δ = {(a_{i1}, b_{i2}) | (a_{i1}, b_{i2}, c_{i3}) ∈ S for some c_{i3}}
(3) Π = {c_{i3} | (a_{i1}, b_{i2}, c_{i3}) ∈ S for some a_{i1} and b_{i2}}
(4) Let P contain a single relation r from Δ to Π with the solution set R such that ((a_{i1}, b_{i2}), c_{i3}) ∈ R if and only if (a_{i1}, b_{i2}, c_{i3}) ∈ S

[Figure 3-16. Software-Simulated Associative Memory Structure]
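The cell-and-ring mechanism of Figures 3-15 and 3-16 can be sketched in a few lines. The two-bucket hash, the class name, and the use of plain Python lists in place of circular L1/L2 link rings are our illustrative assumptions, not Feldman's implementation; the hash is chosen so that (a1, b1) and (a2, b2) collide, as in the example.

```python
def toy_hash(f1, f2):
    # Contrived two-bucket hash forcing the overlap of the example.
    return 0 if (f1, f2) in {("a1", "b1"), ("a2", "b2")} else 1

class AssocMemory:
    def __init__(self):
        self.l1 = {}          # hash address -> "ring" of (F1, F2, F3) cells
        self.l2 = {}          # F3 value     -> "ring" of (F1, F2, F3) cells

    def insert(self, f1, f2, f3):
        cell = (f1, f2, f3)
        self.l1.setdefault(toy_hash(f1, f2), []).append(cell)
        self.l2.setdefault(f3, []).append(cell)

    def values_for(self, f1, f2):
        """Follow the L1 ring at hash(F1, F2); cells of overlapping F1/F2
        pairs sharing the address are skipped (the overlap problem)."""
        return {c[2] for c in self.l1.get(toy_hash(f1, f2), [])
                if c[0] == f1 and c[1] == f2}

    def pairs_for(self, f3):
        """Follow the L2 ring linking all cells with the same F3 value."""
        return {(c[0], c[1]) for c in self.l2.get(f3, [])}
```

Inserting the four triples above, a query on (a1, b1) follows one L1 ring and yields both c1 and c2 (the multiplicity problem), while a query on c3 follows the L2 ring and yields both (a2, b2) and (a3, b3).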

Since in his storage structure Feldman permits (in fact, if conditions warrant, requires) the duplication of data item descriptions, we shall relax (for this case) our earlier assumption that data item description blocks may not be duplicated. Let us assign the following values to our decision variables:

    θ_i = θ'_i = 0    for all i ∈ {1, 2, ..., 10}
    λ_i = 1    for all i ∈ {1, 2, ..., 10}
    τ_i = 0    for all i ∈ {1, 2, ..., 11}
    σ_1 = σ_2 = 0
    ρ_1 = ρ_2 = ρ_3 = 0

It follows from Constraint 4' that β_i = 1 for all i ∈ {2, 3, ..., 10}. These decision variable values result simply in the stack structure described earlier, except in this case the sets Δ and Π are disjoint and the relation considered is not one-to-one. Since we have allowed description blocks to be duplicated, the storage structure consists of isolated blocks of storage each of which contains the values of one of the triples in S. Passing through each of these blocks is a source ring, a relation ring, and a target ring. To be completely general, we could include in our model a decision variable which indicates whether or not multiple data item descriptions are to be allowed. Although this can very easily be done, we choose not to do so.

If we assume that (1) instead of a source ring for each distinct source in Δ, we have a source ring for each set of sources which hash to a common address, (2) instead of a separate head for each source ring, we use that b1-block which has the address to which all sources in the corresponding set hash, and (3) the relation ring (of which there is only one) is discarded, then the resultant structure is identically equal to Feldman's. The source rings correspond to Feldman's rings of L1 links and the target rings correspond to his rings of L2 links. We can easily justify the two assumptions concerning the source rings. First, combining into one source ring all sources which hash to a common address is reasonable (if not mandatory), since these sources are indistinguishable unless a different method for determining the location of the head of a source ring is used. Second, using one of the elements (although a special element) of a source ring as head instead of creating a special, distinct head does not alter the structure or utility of that ring. In fact, this may always be preferable, since it reduces the size of the ring. As a final, somewhat more complex example, let us consider how Childs' [11] storage representation of his so-called Set-Theoretic Data Structure (STDS) could be specified within the context of the SSM. Note that Childs' use of the term data structure corresponds roughly to our definition of the term storage structure.

Childs defines a Set-Theoretic Data Structure as a storage representation of sets and set operations such that, given any family N of sets and any collection S of set operations, an STDS is any storage representation which is isomorphic to N with S. In particular, an STDS, shown schematically in Figure 3-17, is composed of five structurally independent parts:

(1) a collection S of set operations
(2) a set B of data item names
(3) a collection of data item definitions, one for each data item name
(4) a collection N of set names
(5) a collection of set representations, one for each set name

Sets, the collection N of set names, and the collection B of data item names are represented by blocks of contiguous storage locations. The address of a location in the block representing N is a set name and the content of a location in this block is the address of a block representing the corresponding set. Similarly, the address of a location in the block representing B is a data item name and the content of a location in this block is the address of a stored description of the corresponding data item. The blocks which represent individual sets contain the names of the data items which constitute the sets. In an effort to minimize the storage occupied by the STDS, Childs defines two types of sets: generator sets and composite sets. Only the

[Figure 3-17. Set-Theoretic Data Structure (Childs). Panels: SET NAMES: N; SET REPRESENTATIONS (generator and composite sets); DATA ITEM NAMES: B; DATA DESCRIPTIONS]

generator sets have storage representations; the generator sets are disjoint; and the composite sets are unions of generator sets. With these general characteristics in mind we can examine the STDS in more detail. The block of locations representing the collection B of data item names is assumed to have location b_0 as the address of its head. The first location containing a pointer to a data item description has the address b_0 + 1, and the location containing a pointer to the i-th data item description has the address b_0 + i. If b represents the cardinality of the set B, then the last location containing a pointer to a data item description is b_0 + b. Since all pointers to data item descriptions are located between b_0 + 1 and b_0 + b, B may be represented by the set of integers {1, 2, ..., b}. Therefore, any integer i such that 1 ≤ i ≤ b is the data item name for the pointer to the i-th data item description. The pointer to the i-th data item locates a block of storage containing a description of the i-th data item and a list of all generator set names (elements of N) of which the i-th data item is a constituent. The block of locations representing the collection N of set names is similar to the block representing B, with n_0 and n as the address of the head and the cardinality, respectively. The contents of the block representing N are pointers, also. The pointers fall into two classes and are distinguished by an integer n* such that 1 ≤ n* ≤ n. For all 1 ≤ i ≤ n*, i is the name of a generator set, and for all n* < i ≤ n, i is the name of a composite set. A generator set has a set representation,

while a composite set does not, since it is the union of some generator sets. For i > n* the pointer in n_0 + i locates a block of storage containing the names of generator sets, the union of which forms the corresponding composite set. For i ≤ n* the pointer in n_0 + i locates a block of storage containing the names of all composite sets which use generator set i and a pointer to a block of locations containing the names of those data items which form the set. Let us define our data items and the corresponding data structure for the STDS as follows:

(1) Let N be the collection of set names.
(2) Let M be the collection of sets themselves.
(3) Let M* be the collection of generator set descriptions.
(4) Let B be the collection of data item names.
(5) Let D be the collection of data item descriptions.
(6) Let r1 be a one-to-one relation from N onto M which assigns to every set name in N a unique set in M.
(7) Let r2 be a one-to-one relation from M to M* which assigns to every generator set in M a unique description in M*.
(8) Let r3 be a relation from M to N which assigns to every generator set in M the names in N of those composite sets of which the given generator set is a subset.

(9) Let r4 be a relation from M to N which assigns to every composite set in M the names in N of those generator sets which are subsets of the given composite set.
(10) Let r5 be a relation from M* to B which assigns to every generator set description in M* the names in B of those data items which are elements of the given set.
(11) Let r6 be a one-to-one relation from B onto D which assigns to every data item name in B a unique description in D.
(12) Let r7 be a relation from D to N which assigns to every data item description in D the names in N of those generator sets of which the given data item is an element.

From these definitions we see that

    P = {r_j | j = 1, 2, ..., 7}
    Δ = Π = N ∪ M ∪ M* ∪ B ∪ D

Let us assign the following values to our decision variables:

    θ_i = θ'_i = 0    for all i ∈ {1, 2, ..., 10}
    λ_i = 1    for all i ∈ {1, 2, ..., 10}
    τ_i = 0    for all i ∈ {1, 2, ..., 11}
    σ_1 = 0, σ_2 = 1
    ρ_1 = ρ_2 = ρ_3 = 0

It follows from Constraint 4' that β_i = 1 for all i ∈ {2, 3, ..., 10}.
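The β values quoted in these examples follow mechanically from Constraint 4'. A sketch of that rule as we have reconstructed it (`lam[i]` for the duplication variable λ_i, `rho[j]` for ρ_j; the names are ours):

```python
def betas(lam, rho):
    """Constraint 4' as reconstructed: beta_i = lam_{i-1} * lam_i, except
    that b5-, b6-, and b7-blocks are never eliminated when their relation
    symbol name field is retained (rho_{i-4} = 1)."""
    beta = {}
    for i in range(2, 11):
        if i in (5, 6, 7) and rho[i - 4] == 1:
            beta[i] = 0
        else:
            beta[i] = lam[i - 1] * lam[i]
    return beta
```

For the assignment above (λ_i = 1 everywhere, ρ_1 = ρ_2 = ρ_3 = 0) this yields β_i = 1 for all i ∈ {2, 3, ..., 10}, as stated; for the earlier target-set example (λ_7 = λ_8 = 0, ρ_2 = 1) it yields β_6 through β_9 equal to 0 and the rest equal to 1.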

Assuming that source, relation, and target rings are not used, these decision variable values result in the storage structure of Figure 3-18. Ignoring for the moment the fact that in the STDS the blocks representing set names are adjacent to one another, as are the blocks representing data item names, it is clear that the STDS is simply the storage structure of Figure 3-18. To elaborate upon this somewhat, let us examine the various blocks of the STDS as shown in Figure 3-17, starting at the left with the blocks representing set names and working to the right through the blocks representing generator and composite sets, generator set descriptions, data item names, and data item descriptions. Each block representing a set name (i.e., each location in the block of locations representing the set N) is equivalent to the structure of Figure 3-18 for m = 1. In this case the description block of the SSM is null and the relation of interest is r1. Each block representing a generator or composite set is also equivalent to the structure of Figure 3-18. The description block of the SSM is again null, but m > 1. For composite sets the relation of interest is r4. On the other hand, for generator sets there are two relations of interest, namely, r2 and r3. If we assume that the description block indicator for the target of relation r2 (which has only a single target since it is one-to-one) appears first in the stack of description block

[Figure 3-18. Transformed Storage Structure]

120 indicators, followed by those for the targets of relation r3, then there is no need of a relation symbol field and we may continue to assume ρ1 = ρ2 = ρ3 = 0. The blocks representing generator set descriptions correspond to the structure of Figure 3-18 for relation r5 and m > 1. The situation for the blocks representing data item names is analogous to that for the blocks representing set names, and the relation of interest is r6. Finally, each block representing a data item description is equivalent to the structure of Figure 3-18 for m > 1 and r7 as the relation of interest. Let us consider now the adjacency of the blocks representing set names and that of the blocks representing data item names. Since the collection of set names and the collection of data item names act as the (only) entry points into the STDS, we will include (contrary to our earlier assumption) source rings for the set names and the data item names. Because there is no duplication of these names within the STDS, the source rings need not be closed, however. In particular, a source ring need contain only a pointer from the head of that "ring" to the b1-block representing the corresponding source, and the b1-block need not contain a pointer back to the head. Moreover, by stacking all blocks representing set names and by doing the same for all blocks

121 representing data item names and then by assuming (as Childs does) that set names are integers between 1 and n and that data item names are integers between 1 and b, we may implement the source ring "pointers" via a special case of the calculation method of the random data organization described in Chapter I. Given a set name (an integer) or a data item name (an integer), we can determine the (unique) address of the corresponding block simply by adding n0 or b0, respectively, to the name. The result is the STDS described by Childs. We note that for completeness we may also wish to define two ordering relations r8 and r9 for the collection N of set names and the collection B of data item names, respectively. These relations may be represented within the context of the SSM simply by setting ξ2 = 0 and using all other decision variable values as given for the structure of Figure 3-18. This concludes our discussion of the Storage Structure Model. We have described in this chapter a model for the storage structures capable of representing, via the three basic organizational methods, the intrinsic structure of a given collection of data items as described by the Data Structure Model of Chapter II. The model consists of a given relatively general storage structure (Figure 3-10) together with a number of transformations which may be applied to this storage structure, the combination of which is described by a number of binary-valued decision variables. In addition we have shown the model to be flexible and capable of representing the intrinsic structure of data in varying degrees of explicit detail.

Chapter IV ANALYSIS OF THE STORAGE STRUCTURE MODEL In the previous chapter we developed a model for the storage structures capable of representing the intrinsic structure of some collection of data items as given by the Data Structure Model developed in Chapter II. Our purpose for developing the Storage Structure Model was not simply to illustrate the fact that the intrinsic structure of some given collection of data items can be represented by a multitude of storage structures. Rather it was our purpose to develop a vehicle by means of which we can compare the relative merits of these storage structures. Our goal in this chapter will be the development of certain measures of performance for the SSM which we can use to make these comparisons. 122

123 4. 1 Measures of Performance It is generally conceded that the amount of time required to perform certain operations upon a given storage structure and the amount of storage occupied by that structure are the two most important factors to be considered in determining the "goodness" of the structure. One might argue that a factor such as the number of man-hours required for implementation of the software associated with a given storage structure is equally important, but if the structure is to see more than limited use, this factor is of transient interest and, hence, of little importance by comparison. It is also generally (but not always) true that one may reduce time at the expense of increasing storage and vice versa. Under many computer operating systems (the batch, monoprogrammed, real-memory systems, for instance) the cost of solving a given problem (i.e., running a given program) is based solely upon the CPU time required to achieve the solution, with the only constraint placed upon the storage being that the program and storage structure must fit into the available storage (usually, quite substantial). At the other extreme, with time-shared, multi-programmed, virtual-memory systems, the cost of solving a problem is often a function of the amount of storage occupied as well as the CPU time,

124 although for these systems the amount of storage available for a program and its storage structure is virtually unlimited. In the first case, that storage structure which causes CPU time to be minimized and occupies no more than the storage available is clearly the best. In other words, we are willing to use any amount of storage, up to the maximum amount available, as long as the use of that storage allows CPU time to be reduced. In the second case, however, no such clear-cut policy exists. Suppose that the cost C of running a program is given by

C = k[T(1 + 0.01P)]

where T represents CPU time, P represents the average number of pages of virtual memory required during the running of the program, and k represents the dollar cost per unit of CPU time. (This is the charging scheme currently in use for MTS, the Michigan Terminal System, at The University of Michigan. [ ]) Increasing by 100 pages the amount of storage used is equivalent to increasing CPU time by T units. The question we would like to ask is: How much must CPU time be reduced in order to justify an increase of 100 pages in the virtual memory used? Let T represent CPU time for P pages of memory and let T - t represent CPU time for P + 100 pages, where t represents

125 the net reduction in CPU time. Then

(T - t)[1 + 0.01(P + 100)] < T(1 + 0.01P)

must be true in order to justify the increase in storage. It follows that

t > T/(2 + 0.01P)

must be true to effect a net gain when storage is increased by 100 pages. Figure 4-1 contains a plot of t/T, the minimum fractional reduction of CPU time required to offset an increase of 100 pages of storage, versus P, the initial storage requirement. Clearly, a smaller percentage reduction in CPU time is necessary to justify an increase in the storage used if the storage requirement is large to begin with. We are faced, however, with the problem of not knowing how CPU time and storage will vary as the storage structure is perturbed. Moreover, we have considered but a single cost function. We must, therefore, develop a procedure for evaluating the time and storage requirements of each storage structure represented by the SSM, while at the same time remaining flexible enough to consider a variety of cost functions. Basically, the procedure which we will use here is to determine

126

[Figure 4-1. Minimum Fractional Reduction of CPU Time Necessary to Offset an Increase of 100 Pages in Program Size for Cost Function C = k[T(1 + 0.01P)]. Plot of t/T (0 to 0.6) versus P in hundreds (0 to 16).]

127 the storage structure that minimizes time, subject to a constraint on storage. Clearly, this guarantees an optimal storage structure for the batch, monoprogrammed, real-memory systems. For systems with more complex cost functions, if we ease the storage constraint to allow for virtually infinite storage, we will be guaranteed a storage structure which is optimal in the sense that it minimizes time, although it may not be optimal in the sense that it minimizes the given cost function. Then, using the storage structure determined in this manner as a basis for comparison, we may constrain the available storage to be less than that required for our comparison structure and determine the storage structure which minimizes time subject to this constraint. If the storage structure thus determined results in a lower value for the given cost function, we may use it as our comparison structure and repeat the procedure. If the value of the cost function exceeds the best so far, we may ease the storage constraint somewhat and then repeat the procedure. If the given cost function behaves as shown in Figure 4-2(a), this procedure will allow us to come as close to the optimal solution as we wish, but if the given cost function behaves as shown in Figure 4-2(b), we may in fact determine a solution which is only suboptimal. In each of these figures C* represents the value of the cost function for the storage structure obtained when storage is unconstrained.
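The cost function and the iterative comparison procedure above can be sketched as follows. This is an illustration only: best_structure stands for a hypothetical black-box routine returning (CPU time, storage used) under a storage limit, and the tightening factor 0.9 and easing factor 0.95 are arbitrary choices of ours, not the thesis's.

```python
import math

def cost(k, T, P):
    """MTS-style charge C = k*T*(1 + 0.01P)."""
    return k * T * (1 + 0.01 * P)

def min_fractional_reduction(P):
    """Smallest t/T offsetting 100 extra pages: t/T > 1/(2 + 0.01P)."""
    return 1.0 / (2 + 0.01 * P)

def search(best_structure, cost_fn, rounds=10):
    """Tighten the storage constraint below the comparison structure,
    keeping the lowest-cost structure seen so far (may be suboptimal)."""
    time, storage = best_structure(math.inf)   # time-optimal, unconstrained
    best = (cost_fn(time, storage), time, storage)
    limit = storage
    for _ in range(rounds):
        limit *= 0.9                           # constrain below comparison
        time, storage = best_structure(limit)
        c = cost_fn(time, storage)
        if c < best[0]:
            best = (c, time, storage)          # new comparison structure
        else:
            limit = limit / 0.9 * 0.95         # ease the constraint somewhat
    return best

# Breakeven check: reducing CPU time by exactly T/(2 + 0.01P) offsets
# the cost of 100 additional pages.
k, T, P = 1.0, 100.0, 400.0
t = T * min_fractional_reduction(P)
assert abs(cost(k, T, P) - cost(k, T - t, P + 100)) < 1e-9
```

As the text warns, the loop converges only to a comparison structure that no tried constraint improves upon; with a cost function like Figure 4-2(b) that structure may be suboptimal.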

128

[Figure 4-2. Possible Cost Function Behavior. Two plots of cost of solution versus limit on storage available: panels (a) and (b).]

129 Since we have no way of determining the behavior of the cost function we can guarantee only a suboptimal solution (provided the process converges). Of course, if we are willing to determine the best solution for every possible storage constraint between 0 and the storage required for our initial solution, we can always guarantee an optimal solution. Our discussions for the remainder of this chapter will assume simply that we are interested in determining that storage structure which minimizes time subject to a constraint (possibly infinite) on storage. 4. 2 Time Cost Function We have determined that in order to compare the relative merits of a collection of storage structures we must characterize each storage structure by two quantities: time and storage. The storage characteristic of a given storage structure is simply the number of storage units occupied by the structure and, hence, is relatively well defined. The time characteristic of a storage structure is not, however, so well defined. Intuitively, we feel that the time characteristic should be some measure of the amount of time required to perform certain operations upon the given storage structure. These operations may, of course, vary with the problem being solved. We will define a number of primitive operations which are

130 representative of the types of operations one might wish to perform upon a collection of data items and from which one can construct other more complex operations. Then in order to define a particular problem which we wish to solve, we will assign weights to each of these primitive operations to reflect the relative frequency with which the operation is used in the solution of the problem. Let Qi, where i ∈ {1, 2, ..., N}, represent some primitive operation and let ai represent the relative frequency of that operation. As a matter of convenience let us place the following constraints upon the various ai:

0 ≤ ai ≤ 1.0 for i ∈ {1, 2, ..., N}

Σ(i=1 to N) ai = 1.0

Then if we can determine the number of time units ti required to perform operation Qi, where i ∈ {1, 2, ..., N}, using a particular storage structure, we will define the time cost T for that storage structure as

T = Σ(i=1 to N) ai ti

Assuming no constraint on storage, that storage structure for which T is the smallest will be considered the optimal structure to

use in the solution of the given problem (as defined by the various ai). Thus, we are faced with two problems: (1) Definition of the primitive operations Qi for i ∈ {1, 2, ..., N}. (2) Determination of the time cost ti for each primitive operation Qi. In defining the primitive operations we may separate them into two classes: interrogative operations and manipulative operations. Interrogative operations query a storage structure to determine certain information about the data represented, such as the targets associated with a given source/relation symbol pair. On the other hand, manipulative operations result in alterations to the structure to effect changes in the data represented, such as the addition of a relation instance to the structure. Although the manipulative operations are no less important, we will confine ourselves to considering only the interrogative operations. The reason for this restriction is simply to narrow the scope of our problem. The techniques which we develop here will be equally applicable to both types of operations. Thus, given sufficient time we could extend our consideration to the manipulative operations as well.
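The weighted time cost T = Σ ai·ti defined above can be sketched directly (an illustration, not from the thesis):

```python
def time_cost(a, t):
    """T = sum_i a_i * t_i, where the a_i form a frequency distribution
    over the primitive operations and t_i is the time for operation Q_i."""
    assert len(a) == len(t)
    assert abs(sum(a) - 1.0) < 1e-9 and all(0.0 <= x <= 1.0 for x in a)
    return sum(ai * ti for ai, ti in zip(a, t))
```

For instance, with frequencies (0.5, 0.3, 0.2) and per-operation times (10, 20, 40), T = 19 time units.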

132 4. 2. 1 Primitive Operations Since our model for the intrinsic structure of data is based upon the relation instance, it would seem logical to base our primitive operations upon the relation instance, also. The form of the relation instance is d r p where d ∈ A, r ∈ P, and p ∈ Π. We may define our primitive operations by assigning to the three fields of the relation instance form fixed values, variables, and "don't cares" in various combinations. The operation defined by assigning some particular value to each field of the relation instance form simply determines whether or not the resultant relation instance is true. For instance, if we assign the values di ∈ A, rj ∈ P, and pk ∈ Π to the three fields of the form (in that order), the resultant triple di rj pk may or may not constitute a valid (or true) relation instance. This operation compares the triple di rj pk with each known relation instance. If a match is found, the value of the operation is true; otherwise the value is false. For ease of reference we will characterize this operation by the triple di rj pk, which we will call its prototype. Suppose we assign the values di ∈ A and rj ∈ P to the first two fields of the form and a variable to the last field, which results in the prototype di rj -, where the "-" represents the variable. The

133 operation defined in this manner determines all targets p which satisfy di rj p. We assume that the source di is associated with the relation symbol rj. Therefore, this operation will find at least one target satisfying di rj p. We can define two other operations with prototypes di - pk and - rj pk, respectively, which perform similar functions. Instead of assigning a variable to the third field of the form as above, we might assign a "don't care", which results in the prototype di rj *, where the "*" represents the "don't care". This operation determines whether or not the source di is associated with the relation symbol rj and has a value true or false, accordingly. Again we can define two similar operations with prototypes di * pk and * rj pk, respectively. If we assign a particular value to but a single field of the relation instance form, we have several other possibilities available to us. The prototype - rj - defines an operation which determines all source/target pairs (d, p) such that d rj p is true. The prototypes di - - and - - pk define similar operations. The prototype - rj * defines an operation which determines all sources d associated with the relation symbol rj. Similar to this operation are those with the following prototypes: * rj -, di - *, di * -, - * pk, and * - pk. All of these operations are summarized in Table 4-1.

134

Operation Code   Prototype   Description
Q1               di rj pk    Is di rj pk true?
Q2               di rj *     Is di associated with rj?
Q3               di rj -     Determine all p such that di rj p.
Q4               * rj pk     Is pk associated with rj?
Q5               - rj pk     Determine all d such that d rj pk.
Q6               di * pk     Is pk associated with di?
Q7               di - pk     Determine all r such that di r pk.
Q8               - rj *      Determine all d associated with rj.
Q9               * rj -      Determine all p associated with rj.
Q10              - rj -      Determine all (d, p) such that d rj p.
Q11              di - *      Determine all r associated with di.
Q12              di * -      Determine all p associated with di.
Q13              di - -      Determine all (r, p) such that di r p.
Q14              - * pk      Determine all d associated with pk.
Q15              * - pk      Determine all r associated with pk.
Q16              - - pk      Determine all (d, r) such that d r pk.

* indicates "don't care"
- indicates variable field

Table 4-1. Primitive Operations
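Viewed uniformly, each Qi is a pattern match over the stored relation instances. The sketch below is our own illustration (the thesis defines these operations abstractly, not as code): instances are triples, "-" is a variable field whose satisfying values are returned, and "*" is a don't-care.

```python
VAR, ANY = "-", "*"

def query(instances, d, r, p):
    """Match the prototype (d, r, p) against relation-instance triples.
    Fixed fields must agree, '*' always matches, and '-' fields are
    collected.  With no '-' fields the result is True/False (e.g. Q1, Q2);
    otherwise it is the set of satisfying values (e.g. Q3, Q10)."""
    proto = (d, r, p)
    matches = set()
    for inst in instances:
        if all(f in (VAR, ANY) or f == v for f, v in zip(proto, inst)):
            matches.add(tuple(v for f, v in zip(proto, inst) if f == VAR))
    if VAR not in proto:
        return bool(matches)
    return {m[0] if len(m) == 1 else m for m in matches}

R = {("d1", "r1", "p1"), ("d1", "r1", "p2"), ("d2", "r1", "p1")}
assert query(R, "d1", "r1", "p1") is True                  # Q1
assert query(R, "d1", "r1", VAR) == {"p1", "p2"}           # Q3
assert query(R, VAR, "r1", VAR) == {("d1", "p1"), ("d1", "p2"), ("d2", "p1")}  # Q10
```

The scan over all instances is only a specification of the result; the whole point of the chapter is that different storage structures realize such queries with very different costs.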

135 We could have defined an operation with the prototype - - -, that is, one with variables in all positions of the relation instance form, but such an operation would provide all the relation instances of the structure and is, therefore, not very enlightening. Similarly, we could have defined operations with the prototypes di * *, * rj *, and * * pk, respectively, but since we assume that di, rj, and pk must appear somewhere in the structure, these operations provide no information at all. Clearly, we still have not exhausted all the possibilities for operations which may be defined using the form d r p. We have certainly considered all possible operations for which the fixed quantities are constrained to be single-valued, but we could define similar operations for which the fixed quantities are allowed to be multi-valued. For instance, we could define an operation which determines all targets shared by some set of source/relation symbol pairs. We choose not to do this, however, for the same reason we are not considering manipulative operations. Nonetheless, the sixteen primitive operations which we have defined should be representative of most (interrogative) operations we might want to consider. We are now faced with the problem of determining the time cost ti for each primitive operation Qi where i ∈ {1, 2, ..., 16}.

136 But in order to determine ti we need some algorithm or procedure which describes the implementation of operation Qi. Clearly, there does not exist any such unique procedure. Consider for example operation Q1: di rj pk. For this operation we may define four basic procedures: (1) Find di in the storage structure. Examine all relation symbols associated with di in search of rj. If rj is found, examine all targets in the corresponding target set in search of pk. (2) Find rj (the relation, not just a relation symbol) in the storage structure. Examine all sources associated with rj in search of di. If di is found, examine all targets in the corresponding target set in search of pk. (3) Find rj in the storage structure. Examine all targets associated with rj in search of pk. If pk is found, examine all sources associated with the appropriate relation symbol/target set pair in search of di. (4) Find pk in the storage structure. Examine all relation symbols associated with pk in search of rj. If rj is found, examine all sources associated with the appropriate relation symbol/target set pair in search of di.
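As an illustration of basic procedure (1) — the code and the nested-dictionary layout are our own, not the thesis's — suppose the structure is available as a mapping from each source to its relation symbols and target sets:

```python
# Hypothetical rendering of a storage structure:
# source -> relation symbol -> set of targets.
structure = {
    "d1": {"r1": {"p1", "p2"}, "r2": {"p3"}},
    "d2": {"r1": {"p1"}},
}

def q1_method1(d, r, p):
    """Basic procedure (1) for Q1: find d, scan its relation symbols
    for r, then scan the corresponding target set for p."""
    if d not in structure:
        return False
    targets = structure[d].get(r)
    return targets is not None and p in targets
```

Procedures (2) through (4) would traverse the same information starting from rj or pk instead; which starting point is cheapest depends on the storage structure.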

137 Even each of these basic procedures is subject to some variation. For instance, the process of searching for rj given di can be implemented in a number of different ways (although the differences may be slight). However, for our purposes we will assume that each of the basic procedures has some unique implementation. We will choose one which we feel is representative of all the implementations of the given basic procedure and let that implementation be the "unique" one. Each of our primitive operations will then be characterized by a number of "unique" basic procedures which we will call methods. Operation Q1 (di rj pk) is characterized by four methods, for instance. Descriptions of the methods for each of the primitive operations are contained in Appendix A. Let us introduce a simple notational scheme for describing the methods of an operation. Let us denote the steps in a method by a sequence of symbols, one for each step, from left to right. Let the steps "Search for the source di", "Search for the relation (or relation symbol) rj", and "Search for the target pk" be denoted by the symbols d̂, r̂, and p̂, respectively. Let the steps "Determine all sources", "Determine all relations (or relation symbols)", and "Determine all targets" be denoted by the symbols d, r, and p, respectively. The sequence d̂ r p̂ would then be interpreted as: (1) Search for the

138 source di, (2) Determine all relation symbols (associated with the source di), and (3) Search for the target pk (among those targets associated with the source di and the relation symbol currently under scrutiny). Using this notation we have summarized the methods for each of the primitive operations in Table 4-2. In general, each of the methods of a given primitive operation will have a different time cost. Let the time cost for method j of operation Qi be represented by tij. We will then define the time cost ti of operation Qi by

ti = min tij over j ∈ {1, 2, ..., si}, for i ∈ {1, 2, ..., 16}

where si represents the number of methods for operation Qi. Thus, to determine the time cost ti for operation Qi, we must determine the time cost tij for each method of the operation and then choose that method which has the smallest time cost to represent the operation.

4. 2. 2 Elementary Time Costs

In performing a given primitive operation (via any of its methods) upon a storage structure we will find it necessary to trace our way from one point in the structure to another. In general, this will involve following pointers in rings and sequencing through stacks. In order to determine the time cost of the operation we may,

139

Operation      Method 1   Method 2   Method 3   Method 4
Q1:  di rj pk  d̂ r̂ p̂      r̂ d̂ p̂      r̂ p̂ d̂      p̂ r̂ d̂
Q2:  di rj *   d̂ r̂        r̂ d̂
Q3:  di rj -   d̂ r̂ p      r̂ d̂ p      p r̂ d̂
Q4:  * rj pk   r̂ p̂        p̂ r̂
Q5:  - rj pk   r̂ p̂ d      p̂ r̂ d      d r̂ p̂
Q6:  di * pk   d̂ p̂        p̂ d̂
Q7:  di - pk   d̂ r p̂      p̂ r d̂      r d̂ p̂      r p̂ d̂
Q8:  - rj *    r̂ d        d r̂
Q9:  * rj -    r̂ p        p r̂
Q11: di - *    d̂ r        r d̂
Q12: di * -    d̂ p        p d̂
Q13: di - -    d̂ r p      r d̂ p
Q14: - * pk    p̂ d        d p̂
Q15: * - pk    p̂ r        r p̂
Q16: - - pk    p̂ r d      r p̂ d

Table 4-2. Summary of Methods of Implementation for Primitive Operations

140 therefore, wish to know the amount of time required to access one block in the structure from another. As a result, we will introduce a number of elementary time costs, each of which represents the amount of time required to access one particular type of block in the storage structure from another. Before beginning consideration of these elementary time costs, let us clarify somewhat the concept of tracing through a storage structure. We assume that there exists a position indicator, or simply an indicator, which contains the address of that field (in some block) currently of interest or under consideration. The indicator is clearly just a pointer to the field of interest. As our interest shifts from point to point within the structure, the value of the indicator changes to reflect this. More correctly, the value of the indicator changes to reflect the effects of operations applied to the structure to elicit information from it, and our interest shifts accordingly. Tracing through a storage structure simply amounts to stepping the indicator along certain paths of access within the structure as required by operations applied to the structure. Let I represent the current value of the indicator. I is then equal to the address of the field of current interest. Let (I) represent the contents, or value, of the field whose address is I. We will now define a number of very basic quantities to be used in the formulation of the elementary time cost expressions. Let fi

represent the number of time units required to follow a forward pointer from one block to another in an ai-ring of the SSM, where i ∈ {1, 2, ..., 10}. fi is then the number of time units required to replace I by (I), where I is the address of a forward pointer field in some block of an ai-ring. fi is of interest, of course, only if θi = 1. Let si represent the number of time units required to step or sequence from one block to another through a stack of blocks in an ai-ring of the SSM, where i ∈ {1, 2, ..., 10}. si is then the number of time units required to add (or subtract) some displacement D to I, where I is the address of an arbitrary field in some block of an ai-ring. Clearly, si is of interest only if θi = 0 and Δi = 0. Let hi represent the number of time units required to follow a head pointer from an element block to the head of an ai-ring of the SSM, where i ∈ {1, 2, ..., 10}. hi is then the number of time units required to replace I by (I), where I is the address of a head pointer field in some element block of an ai-ring. Of course, hi is of interest only if θ̃i = 1. Finally, let s0 represent the number of time units required to step from one pointer field of a given block to another pointer field of that block. For example, in tracing through a structure we might enter a particular block via one ring and desire to continue our tracing with the second ring which passes through the block. In

142 order to do this, we must add (or subtract) some displacement D to I, where I is the address of the pointer field in the first ring and I ± D is the address of the pointer field in the second ring. s0 reflects the time required to carry out this addition (or subtraction). Note that the second pointer field may contain a head pointer instead of a ring (forward) pointer, but the process to be carried out is essentially the same. To be completely general we should also define some quantity to represent the number of time units required to follow a reverse pointer in each of the rings of the SSM. However, reverse pointers are generally useful only for the manipulative operations which we might define and since we are not considering manipulative operations (and since reverse pointers will not be useful for our interrogative operations), we shall assume that the structures we consider contain no reverse pointers (i.e., θ'i = 0 for all i ∈ {1, 2, ..., 10}). Hence, we have no need for the quantity mentioned. We might question the usefulness of defining a separate quantity fi (or si or hi) for each ring of the SSM. There are two reasons for doing so. The first is for the sake of generality, and the second is for convenience when it comes time to develop expressions for the elementary time costs (we can tell at a glance which ai-rings of the SSM are under consideration). Realistically, it would be quite unusual if s0, s1, s2, ..., s10 were not all equal. The same may in general be said of fi and hi.

143 In fact, fi and hi are probably equal to one another for all values of i, also. It is possible, however, for pointers to be stored in different size fields, or for them to be located in different relative portions of a storage unit, or for some to require shifting before use, and so forth. All of these factors can contribute to differences in the times arising from their use. Furthermore, instead of actual pointers we may want random pointers in some rings. As we indicated earlier, the uses of these two types of pointers will in general be characterized by different amounts of time. We will make two assumptions regarding the various quantities fi, si, and hi. First we will assume that the time required to follow a pointer is always at least as great as the time to step through a stack. Thus, si ≤ fi and si ≤ hi for i ∈ {1, 2, ..., 10}. Since following a pointer will always involve a storage reference whereas stepping through a stack need not, this is a reasonable assumption to make. Second, we will assume that the time required to follow a head pointer will never be less than the time to follow a forward pointer. Hence, fi ≤ hi for i ∈ {1, 2, ..., 10}. This implies that if either the forward pointer or the head pointer must be subject to additional processing (because of different size fields, etc.) the head pointer will be chosen for this additional processing. Since forward pointers will in general be used more frequently than head pointers (when both appear in the same ring), this is a reasonable assumption to

144 make. Combining these two assumptions, we can write si ≤ fi ≤ hi for i ∈ {1, 2, ..., 10}. (We note that this assumption is not crucial to the development of the time cost function, but reflects simply a condition which generally exists.) Let us return now to consideration of the elementary time costs in which we have expressed some interest but which we have not yet defined. These elementary time costs will be divided into three classes as follows: (1) those which reflect the number of time units required to move (i.e., trace our way) from one element block of a ring of the SSM to another element block of that ring, (2) those which reflect the number of time units required to move from an (arbitrary) element block of a ring to the head of that ring, and (3) those which reflect the number of time units required to move from the head of a ring to the first element block of that ring or vice versa. Let ei represent the number of time units required to move from one element block of an ai-ring of the SSM to another element block of that ring, where i ∈ {1, 2, ..., 10}. Consider for example e2, the number of time units required to move from one b3-block in an a2-ring of the SSM to another b3-block in that ring.

145 If θ2 = 1, the a2-rings are indeed explicit rings and we need only follow a forward pointer to move from one b3-block to another. Hence, in this case e2 = f2. On the other hand if θ2 = 0, there exist a number of possibilities. If Δ2 = 1, each a2-ring (which is not really a ring, of course) may be viewed as containing a single b3-block. We will assume, therefore, that e2 = 0. If Δ2 = 0, we are again faced with a number of alternatives. Suppose θ3 = 1. Then the b3-blocks of a given a2-ring will be stacked upon the corresponding b2-block, and e2 = s2. A more complex situation could arise if instead θ3 = 0, θ4 = 0, θ5 = 1, Δ3 = 1, and Δ4 = 0, which results in the situation depicted in Figure 4-3 (where type fields have been included in each block for clarity). In this case, the b5-blocks of each a4-ring are stacked upon the corresponding b4-block, which is duplicated upon each b3-block of the associated a3-ring. The b3-blocks of each a2-ring are in turn stacked upon the corresponding b2-block. Since it is possible for the b3-blocks or the b5-blocks or both to contain head pointers to b2-blocks and b4-blocks, respectively, and since these head pointers may be used to advantage if we wish to move up the structure of Figure 4-3, it now becomes important to know whether we are moving from some b3-block to the b3-block

146

[Figure 4-3. Example of Stacking-Duplication-Stacking: a b2-block with stacked b3-blocks; each b3-block carries a duplicated b4-block upon which a stack of b5-blocks rests.]

147 above it or to the b3-block below it. This will be determined by the general direction in which we are moving through the storage structure as a whole. For instance, if we are tracing through the structure from a b1-block representing some source in search of some b6-block representing a relation symbol, we are moving downward from a given b3-block to the one below it, and vice versa. Let δi be a binary-valued variable, the value of which indicates whether movement is toward (δi = 1) or away from (δi = 0) the head of an ai-ring when the element blocks of that ring are stacked upon the head, where i ∈ {1, 2, …, 10}. Returning to consideration of e2, if δ2 = 1, we can take advantage of the head pointers (if present) in the stack of b5-blocks between consecutive b3-blocks. Thus, if δ2 = 1 and θ'4 = 1, we can step from a b3-block to the first b5-block on the bottom of the stack above it and then follow the head pointer in that b5-block to the b4-block which acts as the head of the stack. Since there is a b3-block associated with every b4-block in the structure, we may treat the two as indistinguishable (i.e., as a single block). Following the head pointer then brings us to the b3-block desired. It follows that e2 is given by s2 + h4. If on the other hand δ2 = 0 or θ'4 = 0, we must step from the b3-block to the b5-block adjacent to it and then step through the stack of b5-blocks, of which there are k4. In this case e2 = s2 + k4s4.

148 Thus, if θ2 = 0, θ3 = 0, θ4 = 0, θ5 = 1, λ2 = 0, λ3 = 1, and λ4 = 0, then e2 will be given by the following expression:

e2 = s2 + θ'4δ2h4 + (θ̄'4 + δ̄2)k4s4

where the "+" between the two decision variables θ̄'4 and δ̄2 is treated essentially as disjunction. That is, if either θ̄'4 or δ̄2 or both are 1, then the value of θ̄'4 + δ̄2 is 1; otherwise its value is 0. We may also treat the product θ'4δ2 as conjunction, although both the arithmetic product and the logical product yield the same result. To be consistent, we will treat products and sums of decision variables (which are represented by Greek letters) as conjunction and disjunction, respectively. The result of these operations will be either a 1 or a 0, which will then be treated simply as an integer in any arithmetic operation (specifically, an arithmetic product). For example, in the above expression for e2 the term θ'4δ2h4 is effectively the arithmetic product of h4 and a 1 or a 0, depending upon the result of the logical product θ'4δ2. Returning once again to our consideration of e2, suppose θ2 = θ3 = θ4 = θ5 = θ6 = 0, θ7 = 1, λ2 = λ4 = λ6 = 0, and λ3 = λ5 = 1. The resultant storage structure will be similar to that of Figure 4-3 but will have stacks of b7-blocks inserted between the various b5-blocks. By applying reasoning like that used above, we generate the following expression for e2:
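The convention just described is easy to mechanize. The following sketch (the helper names are mine, not the report's) evaluates the expression for e2 derived above, treating sums and products of binary decision variables as disjunction and conjunction and using the resulting 0 or 1 as an ordinary integer factor:

```python
def disj(*vars):
    """'+' between decision variables: 1 if any is 1, else 0."""
    return 1 if any(vars) else 0

def conj(*vars):
    """Product of decision variables: 1 if all are 1, else 0."""
    return 1 if all(vars) else 0

def bar(v):
    """Complement of a binary decision variable."""
    return 1 - v

def e2(theta_p4, delta2, s2, h4, k4, s4):
    # e2 = s2 + theta'4 * delta2 * h4 + (theta'4-bar + delta2-bar) * k4 * s4
    return (s2
            + conj(theta_p4, delta2) * h4
            + disj(bar(theta_p4), bar(delta2)) * k4 * s4)

# Head pointers present and movement toward the head: cost s2 + h4.
print(e2(1, 1, s2=2, h4=10, k4=5, s4=3))   # 12
# Otherwise the whole stack of k4 blocks is stepped through: s2 + k4*s4.
print(e2(0, 1, s2=2, h4=10, k4=5, s4=3))   # 17
```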

149 e2 = s2 + θ'4θ'6δ2(h4 + h6) + θ'4θ̄'6δ2(h4 + k6s6) + θ̄'4θ'6δ2k4(s4 + h6) + (θ̄'4θ̄'6 + δ̄2)k4(s4 + k6s6)

Implicit in this expression is the assumption that if both the b5-blocks and the b7-blocks contain head pointers, then a head pointer in a b7-block points to the head pointer in the b5-block associated with the b6-block which is the (actual) head of the a6-ring of which the b7-block is a member. We can make this assumption since there is a one-to-one correspondence between the b5-blocks and the b6-blocks and we treat the two types of blocks essentially as one. If we did not make this assumption, we would have to step the indicator from the location pointed to by the b7-block head pointer to that location containing the b5-block head pointer, resulting in an additional time cost s6 to be added to h6. We can now continue in this manner until we have considered all combinations of the decision variables which affect e2. Since expressions of the sort encountered above occur very frequently in deriving expressions for the various ei, we will define a number of general forms as follows:

S1(j, i1) = sj + θ'i1δjhi1 + (θ̄'i1 + δ̄j)ki1si1

150 S2(j, i1, i2) = sj
 + θ'i1θ'i2δj(hi1 + hi2)
 + θ'i1θ̄'i2δj(hi1 + ki2si2)
 + θ̄'i1θ'i2δjki1(si1 + hi2)
 + (θ̄'i1θ̄'i2 + δ̄j)ki1(si1 + ki2si2)

S3(j, i1, i2, i3) = sj
 + θ'i1θ'i2θ'i3δj(hi1 + hi2 + hi3)
 + θ'i1θ'i2θ̄'i3δj(hi1 + hi2 + ki3si3)
 + θ'i1θ̄'i2θ'i3δj[hi1 + ki2(si2 + hi3)]
 + θ'i1θ̄'i2θ̄'i3δj[hi1 + ki2(si2 + ki3si3)]
 + θ̄'i1θ'i2θ'i3δjki1(si1 + hi2 + hi3)
 + θ̄'i1θ'i2θ̄'i3δjki1(si1 + hi2 + ki3si3)
 + θ̄'i1θ̄'i2θ'i3δjki1[si1 + ki2(si2 + hi3)]
 + (θ̄'i1θ̄'i2θ̄'i3 + δ̄j)ki1[si1 + ki2(si2 + ki3si3)]

151 S4(j, i1, i2, i3, i4) = sj
 + θ'i1θ'i2θ'i3θ'i4δj(hi1 + hi2 + hi3 + hi4)
 + θ'i1θ'i2θ'i3θ̄'i4δj(hi1 + hi2 + hi3 + ki4si4)
 + θ'i1θ'i2θ̄'i3θ'i4δj[hi1 + hi2 + ki3(si3 + hi4)]
 + θ'i1θ'i2θ̄'i3θ̄'i4δj[hi1 + hi2 + ki3(si3 + ki4si4)]
 + θ'i1θ̄'i2θ'i3θ'i4δj[hi1 + ki2(si2 + hi3 + hi4)]
 + θ'i1θ̄'i2θ'i3θ̄'i4δj[hi1 + ki2(si2 + hi3 + ki4si4)]
 + θ'i1θ̄'i2θ̄'i3θ'i4δj{hi1 + ki2[si2 + ki3(si3 + hi4)]}
 + θ'i1θ̄'i2θ̄'i3θ̄'i4δj{hi1 + ki2[si2 + ki3(si3 + ki4si4)]}
 + θ̄'i1θ'i2θ'i3θ'i4δjki1(si1 + hi2 + hi3 + hi4)
 + θ̄'i1θ'i2θ'i3θ̄'i4δjki1(si1 + hi2 + hi3 + ki4si4)
 + θ̄'i1θ'i2θ̄'i3θ'i4δjki1[si1 + hi2 + ki3(si3 + hi4)]
 + θ̄'i1θ'i2θ̄'i3θ̄'i4δjki1[si1 + hi2 + ki3(si3 + ki4si4)]
 + θ̄'i1θ̄'i2θ'i3θ'i4δjki1[si1 + ki2(si2 + hi3 + hi4)]
 + θ̄'i1θ̄'i2θ'i3θ̄'i4δjki1[si1 + ki2(si2 + hi3 + ki4si4)]
 + θ̄'i1θ̄'i2θ̄'i3θ'i4δjki1{si1 + ki2[si2 + ki3(si3 + hi4)]}
 + (θ̄'i1θ̄'i2θ̄'i3θ̄'i4 + δ̄j)ki1{si1 + ki2[si2 + ki3(si3 + ki4si4)]}
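For concrete 0-or-1 settings of the decision variables, each of the general forms S1 through S4 collapses to a single sum, and all four follow one recursion: at each ring index, a head pointer (if present, and if movement is toward the head) contributes h, while its absence forces a step through the whole stack. The sketch below is my own formulation of that recursion (the report tabulates only the expanded forms); names are assumptions:

```python
def S(j, idxs, theta_p, delta_j, h, k, s):
    """Evaluate Sn(j, i1, ..., in) for concrete 0/1 decision variables.

    theta_p maps a ring index i to theta'_i; h, k, s map indices to the
    elementary costs h_i, k_i, s_i.  delta_j is the direction variable
    for the a_j-ring.
    """
    def walk(m):
        if m == len(idxs):
            return 0
        i = idxs[m]
        if delta_j and theta_p[i]:
            # head pointer available and movement is toward the head
            return h[i] + walk(m + 1)
        # otherwise step through the stack of k_i blocks
        return k[i] * (s[i] + walk(m + 1))
    return s[j] + walk(0)

h = {4: 10, 6: 20}; k = {4: 3, 6: 2}; s = {2: 1, 4: 5, 6: 7}
# S1(2, 4) with theta'4 = delta2 = 1 reduces to s2 + h4:
assert S(2, [4], {4: 1}, 1, h, k, s) == 1 + 10
# S2(2, 4, 6) with theta'4 = 0, theta'6 = 1: s2 + k4(s4 + h6)
assert S(2, [4, 6], {4: 0, 6: 1}, 1, h, k, s) == 1 + 3 * (5 + 20)
```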

152 We now have a convenient shorthand notation. Instead of having to write

e2 = s2 + θ'4δ2h4 + (θ̄'4 + δ̄2)k4s4

we can write simply e2 = S1(2, 4). Similarly, instead of

e2 = s2 + θ'4θ'6δ2(h4 + h6) + θ'4θ̄'6δ2(h4 + k6s6) + θ̄'4θ'6δ2k4(s4 + h6) + (θ̄'4θ̄'6 + δ̄2)k4(s4 + k6s6)

we can write e2 = S2(2, 4, 6). This notation has been employed to compile a complete tabulation of ei for all i ∈ {1, 2, …, 10}, which appears in Appendix B. We have implicitly assumed in the derivation of these expressions for ei that either θ1 = 1 or θ10 = 1 (or both) when both σ1 = 0 and σ2 = 0. This guarantees, of course, that Constraints 6 and 7 will be satisfied. There are two reasons for making this assumption. First, this assumption allows us to use the same expressions for the various ei when σ1 = 0 and σ2 = 0 as when either σ1 = 1 or σ2 = 1 or both. (The b1-blocks and the b11-blocks then act as "stops" in the structure.)

153 Second, we need not concern ourselves with developing expressions for the ei for the case in which Λ and Π are disjoint or the special cases in which Λ and Π are not disjoint. Suppose for instance that θi = 0 for all i ∈ {1, 2, …, 10}, Λ and Π are not disjoint, and σ1 = σ2 = 0. If the intrinsic structure of our data is such that d1 r1 d2, d2 r2 d3, …, di ri di+1, di+1 ri+1 di+2, … (and di1 rj1 di2 and di2 rj2 di1 are not both true), then clearly our expressions for the various ei must be a function of n, which item of information is not at our disposal. Basically then, our reason for making this assumption is to avoid deriving expressions for the ei to cover each of several possible special cases. In order to specify the value of δi to be assumed for a particular use of ei, we will follow the convention that a prime is affixed to ei whenever δi = 0. Thus, ei implies δi = 1 for i ∈ {1, 2, …, 10} and e'i implies δi = 0 for i ∈ {1, 2, …, 10}. We will continue to use ei in the generic sense, also, but context should make the usage clear. As a rule of thumb, the value of δi is of significance only when ei appears in a time expression. Unless otherwise specified, all other uses of ei are in the generic sense. The quantity ei reflects the number of time units required to

154 move from one element block in an ai-ring to another element block in that ring, but in general it does not reflect the number of time units required to move from the element block nearest the head to the head or from the head to that element block. Let e°i represent the number of time units required to move from the element block nearest the head in an ai-ring of the SSM to the head of that ring or from the head to that element block, where i ∈ {1, 2, …, 10}. Note that if θi = 1 (i.e., the ring contains forward pointers), the element block nearest the head when moving toward the head will in general be different from the element block nearest the head when moving away. This will not affect the validity of the expression we will derive, however. Since, when λi = 1, we assume the duplicated head and its associated element block function as a single block, we define e°i to be 0 when λi = 1. This leaves us with two possibilities: either the ai-rings of the SSM are explicit rings with forward pointers (if θi = 1) or the element blocks are stacked upon the head (if θi = 0). In the first case e°i will clearly be equal to fi and in the second case to si. Recalling that si < fi < hi, we see that even if θ'i = 1, choosing to follow a forward pointer or to step through the stack, as the case may be, is still preferable to following the head pointer. Thus, e°i will be given by the following expression:

e°i = λ̄i(θifi + θ̄isi) for i ∈ {1, 2, …, 10}

155 Finally, let e*i represent the number of time units required to move from an arbitrary element block of an ai-ring of the SSM to the head of that ring, where i ∈ {1, 2, …, 10}. We should discuss for a moment what we mean by "an arbitrary element block". In tracing through a storage structure we may enter a ring via any of its element blocks and desire to move to its head. If there are k element blocks in the ring and if we are given no a priori information (which we assume we are not), then we are equally likely to enter the ring at any of the k element blocks. That is, the probability that we enter the ring via the j-th element block is 1/k for all j ∈ {1, 2, …, k}, where we assume that the element blocks of the ring are numbered consecutively from 1 to k in the order in which they appear in the ring. The number of the element block via which we expect to enter the ring is then given by the expectation of j, which we denote E[j]:

E[j] = Σ(j=1 to k) j(1/k) = (1/k) Σ(j=1 to k) j = (1/k) · k(k+1)/2 = (k+1)/2
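The expectation above can be checked directly. A small fragment (mine, not the report's) confirms E[j] = (k+1)/2 for several ring sizes using exact rational arithmetic:

```python
from fractions import Fraction

def expected_entry(k):
    # E[j] = sum over j of j * (1/k): the mean entry position in a
    # ring of k element blocks under the uniform-entry assumption
    return sum(Fraction(j, k) for j in range(1, k + 1))

for k in (1, 2, 5, 10):
    assert expected_entry(k) == Fraction(k + 1, 2)
```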

156 As a matter of notational convenience let us define k̂ = (k+1)/2. When we refer to "an arbitrary element block", it is this block, the k̂-th element block, to which we refer. Let us now consider the amount of time required to move from the k̂i-th element block of an ai-ring of the SSM to the head of that ring, where i ∈ {1, 2, …, 10}. As is the case with e°i, if λi = 1, we define e*i to be 0. If θ'i = 1, very clearly we need only follow the head pointer in the element block to reach the head. Then e*i = hi. On the other hand, if θ'i = 0, we must move from the k̂i-th element block through all intervening element blocks in order to reach the head. We know, however, that moving from the k̂i-th block to the (k̂i−1)-st block requires ei units of time, as does moving from the (k̂i−1)-st block to the (k̂i−2)-nd block, and so on through the 2-nd block to the 1-st block. Similarly, we know that moving from the 1-st block (the block nearest the head) to the head requires e°i units of time. Thus, a total of e°i + (k̂i − 1)ei units of time are required to move from the k̂i-th element block to the head.
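Putting the pieces together, a short sketch (function and parameter names are mine, not the report's) computes k̂i and the costs e°i and e*i from the decision variables, following the formulas of this subsection:

```python
def k_hat(k):
    # expected entry position in a ring of k element blocks: (k + 1) / 2
    return (k + 1) / 2

def e_head(lam, theta, f, s):
    # e°i = (1 - lam_i) * (theta_i * f_i + (1 - theta_i) * s_i)
    return (1 - lam) * (theta * f + (1 - theta) * s)

def e_star(lam, theta_p, h, e0, e, k):
    # e*i = (1 - lam_i) * (theta'_i * h_i
    #        + (1 - theta'_i) * (e°i + (k_hat_i - 1) * e_i))
    return (1 - lam) * (theta_p * h + (1 - theta_p) * (e0 + (k_hat(k) - 1) * e))

# Explicit ring with forward pointers but no head pointers, k = 5:
e0 = e_head(lam=0, theta=1, f=2, s=1)                 # e° = f = 2
assert e0 == 2
assert e_star(0, 0, h=9, e0=e0, e=3, k=5) == 2 + (3 - 1) * 3
# With head pointers, the single follow costs h:
assert e_star(0, 1, h=9, e0=e0, e=3, k=5) == 9
```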

157 Therefore, e*i may be characterized by the following expression:

e*i = λ̄i{θ'ihi + θ̄'i[e°i + (k̂i − 1)ei]} for i ∈ {1, 2, …, 10}.

4.2.3 Sub-procedures and Their Time Costs

Upon closer inspection of our definitions for the primitive operations and their methods of implementation, we see that each method consists of a number of sub-procedures, many of which are common to several different methods. For instance, one of these sub-procedures might be that sequence of steps required to search the SSM for all occurrences of a particular relation symbol associated with a given source. Another might be that sequence of steps required to access all targets associated with a given source/relation symbol pair. Since sub-procedures such as these appear several times in the definitions of the various methods and since our ultimate goal is the determination of the time cost for each method, it will be advantageous to develop time costs for the sub-procedures. If we examine in detail the procedures used to implement the methods for the various primitive operations (as given in Table 4-2 and Appendix A), we see that they involve twenty different sub-procedures in two classes. The first class contains ten sub-procedures of the form "Determine all x1 associated with x2", and the second class contains ten sub-procedures of the form "Search for a particular x1 associated with x2". Let ai denote the i-th sub-procedure

158 in the first class and let a*i denote the i-th sub-procedure in the second class, where i ∈ {1, 2, …, 10}. Using the terminology of the SSM (where we refer to particular blocks) as opposed to the terminology of the DSM (where we refer to sources, targets, and relation symbols), let us define first the sub-procedures ai for i ∈ {1, 2, …, 10}. Let a1 represent the sequence of steps required to access all b1-blocks associated with a given b7-block. Similarly, let a2 represent the sequence of steps required to access all b1-blocks associated with a given b6-block. We note that the effects of a1 and a2 are identical. That is, the same b1-blocks will be accessed by both a1 and a2 (assuming, of course, that the given b6-block and the given b7-block are associated with one another). The time costs of these two sub-procedures will in general be different, however. Let a3 represent the sequence of steps required to access all b11-blocks associated with a given b5-block, and let a4 represent the sequence of steps required to access all b11-blocks associated with a given b6-block. Clearly, a3 and a4 are the direct analogies of a1 and a2, respectively, for tracing a storage structure in the other direction. Let a5 and a6 represent those sequences of steps required to access all b1-blocks and all b11-blocks, respectively, associated with all b6-blocks which represent the same relation symbol. Let a7 and a8 represent those sequences of steps required to

159 access all b5-blocks associated with the b1-block(s) representing a given source and to access all b7-blocks associated with the b11-block(s) representing a given target, respectively. Finally, let a9 and a10 represent those sequences of steps required to access all b11-blocks associated with the b1-block(s) representing a given source and to access all b1-blocks associated with the b11-block(s) representing a given target, respectively. A summary of these sub-procedure descriptions appears in Table 4-3. Consider now the sub-procedures a*i for i ∈ {1, 2, …, 10}. Instead of accessing all of the blocks of a given type which are associated with some block or blocks, we may wish to access only enough of these blocks to find a particular one. If there are k blocks of a given type associated with some block of another type, we would expect that in order to access one particular block of the k blocks we must access on the average k̂ blocks, the last one accessed being the block sought. An example of this might be looking for the b5-block which represents a particular relation symbol and which is associated with a particular b4-block. Clearly, we can establish a one-to-one correspondence between the sub-procedures ai and the sub-procedures a*i. Thus, by substituting the phrase "a particular" for the word "all", we

160 Sub-procedure a represents the sequence of steps required to access all x1 associated with x2.

 a     x1           x2
 a1    b1-blocks    a given b7-block
 a2    b1-blocks    a given b6-block
 a3    b11-blocks   a given b5-block
 a4    b11-blocks   a given b6-block
 a5    b1-blocks    all b6-blocks which represent the same relation symbol
 a6    b11-blocks   all b6-blocks which represent the same relation symbol
 a7    b5-blocks    the b1-block(s) which represent a given source
 a8    b7-blocks    the b11-block(s) which represent a given target
 a9    b11-blocks   the b1-block(s) which represent a given source
 a10   b1-blocks    the b11-block(s) which represent a given target

Table 4-3. Summary of Sub-Procedure Descriptions

161 may define sub-procedure a*i with the same statement used to define sub-procedure ai for i ∈ {1, 2, …, 10}. Also, Table 4-3 may be considered a summary of the sub-procedures a*i if "all" is replaced by "a particular" in the statement "Sub-procedure a represents the sequence of steps required to access all x1 associated with x2." Let the functions zi(t) and z*i(t) represent the number of time units required to perform ai and a*i, respectively, upon the storage structure represented by the SSM, where i ∈ {1, 2, …, 10}. The variable t, which is used as an argument to the functions zi and z*i, represents the number of time units required to perform some operation upon each of the blocks accessed by ai or a*i, as the case may be. The operations to which t refers may be divided into two classes: the first class contains those operations which compare the contents of some field of a block with some given quantity, indicating whether or not a match is made, and the second class contains those operations which fetch (and record or display) the contents of some field of a block. Before considering the operations in each of these two classes, let us define a number of basic quantities much as we did for fi, si, and hi. Let fa represent the number of time units required to follow a description block indicator from a b1-block of the SSM to the corresponding description block. (fa is of importance only if σ1 = 1, of course.) Similarly, let fp represent the number of time units

162 required to follow a description block indicator from a b11-block of the SSM to the corresponding description block. Let Fa, Fr, and Fp represent the number of time units required to follow a pointer in a source ring, a relation ring, and a target ring, respectively. Let cd represent the number of time units required to compare a data item description (as it appears in a description block) with a known or given quantity. Similarly, let cr represent the number of time units required to compare a relation symbol name with a given quantity. Finally, let vd and vr represent the number of time units required to fetch a data item description and a relation symbol name, respectively. Return now to consideration of the two classes of operations. In the first class we will define five operations which will require the following respective numbers of time units to perform: Ca, Cr, Cr1, Cr2, and Cp. In the second class we will also define five operations which will require Va, Vr, Vr1, Vr2, and Vp units of time to perform, respectively. Ca and Va represent the number of time units required to compare against a given quantity and to fetch, respectively, the data item

163 description associated with a given b1-block of the SSM. If σ1 = 0, the description block associated with a given b1-block is attached directly to the b1-block. In this case Ca = cd and Va = vd. On the other hand, if σ1 = 1, we must follow a description block indicator in order to reach the description block before we can compare or record the description. Thus, in this case Ca = fa + cd and Va = fa + vd. We may then characterize Ca and Va by the following expressions:

Ca = σ1fa + cd
Va = σ1fa + vd

Cp and Vp represent quantities analogous to Ca and Va for the b11-blocks of the SSM. It follows that

Cp = σ2fp + cd
Vp = σ2fp + vd

Cr and Vr represent the number of time units required to compare against a given quantity and to fetch, respectively, the relation symbol name associated with a given b6-block of the SSM. If ρ2 = 1, the b6-block contains a relation symbol name field so that Cr = cr and Vr = vr. If on the other hand ρ2 = 0, we have two choices: (1) if ρ1 or ρ3 is 1, go to one of the b5-blocks or b7-blocks associated with the b6-block and containing a relation symbol name field, or (2) go to
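The four expressions just derived are simple affine costs in the decision variables σ1 and σ2; a minimal sketch (function and parameter names are mine, not the report's):

```python
def C_a(sigma1, f_a, c_d):
    # Ca = sigma1 * fa + cd: the indicator is followed only when the
    # description block is detached from the b1-block (sigma1 = 1)
    return sigma1 * f_a + c_d

def V_a(sigma1, f_a, v_d):
    # Va = sigma1 * fa + vd
    return sigma1 * f_a + v_d

def C_p(sigma2, f_p, c_d):
    # Cp = sigma2 * fp + cd (the b11-block analogue of Ca)
    return sigma2 * f_p + c_d

def V_p(sigma2, f_p, v_d):
    # Vp = sigma2 * fp + vd
    return sigma2 * f_p + v_d

assert C_a(0, f_a=4, c_d=1) == 1      # attached description block
assert C_a(1, f_a=4, c_d=1) == 5      # indicator must be followed first
```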

164 the head of the relation ring passing through the b6-block (which we assume contains a relation symbol name field). It so happens that Cr and Vr will be used only when ρ1 = 0 or ρ3 = 0, however. In particular, they are used only in conjunction with the sub-procedures a7, a*7, a8, and a*8, which are used to move toward the "center" of the structure in search of one or more relation symbols. These sub-procedures first encounter either the b5-blocks or the b7-blocks. If these first blocks encountered do not contain a relation symbol name field, then the corresponding b6-block must be accessed, which brings us to the point of this discussion. Let us assume that ρ1 and ρ3 must be equal when ρ2 = 0. This implies that if only one type of block (of the b5-blocks, b6-blocks, and b7-blocks) contains a relation symbol name field, then the b6-blocks must contain that field (i.e., then ρ1 = 0, ρ2 = 1, and ρ3 = 0). This is a rather arbitrary assumption which we motivate simply on the grounds of convenience in deriving the related expressions. It does, however, have a certain intuitive appeal in that it preserves some of the symmetry of the model. This assumption, coupled with the fact that Cr and Vr will be used only when ρ1 = 0 or ρ3 = 0, rules out the first alternative above.

165 To reach the head of a relation ring, as suggested by the second alternative, we must follow m̂r pointers. (Since there are on the average mr b6-blocks which represent the same relation symbol, it follows that there are mr b6-blocks in a given relation ring. We assume that we enter this ring via an arbitrary b6-block in the ring. Hence, we must follow m̂r pointers to reach the head.) It follows that Cr = m̂rFr + cr and Vr = m̂rFr + vr. We may then characterize Cr and Vr by the following expressions:

Cr = ρ̄2m̂rFr + cr
Vr = ρ̄2m̂rFr + vr

Cr1 and Vr1 represent the number of time units required to compare against a given quantity and to fetch, respectively, the relation symbol name associated with a given b5-block of the SSM. If ρ1 = 1, clearly Cr1 = cr and Vr1 = vr. If ρ1 = 0, however, we will move to the b6-block associated with the given b5-block and apply the operations which give rise to Cr and Vr. We know that the

166 amount of time required to move from a b5-block to the corresponding b6-block is given by e*5. Therefore, in this case Cr1 = e*5 + Cr and Vr1 = e*5 + Vr. Cr1 and Vr1 are then described by the following expressions:

Cr1 = ρ1cr + ρ̄1(e*5 + Cr)
Vr1 = ρ1vr + ρ̄1(e*5 + Vr)

Cr2 and Vr2 represent quantities analogous to Cr1 and Vr1 for the b7-blocks of the SSM. It follows that

Cr2 = ρ3cr + ρ̄3(e*6 + Cr)
Vr2 = ρ3vr + ρ̄3(e*6 + Vr)

It should be clear that only certain of the quantities Ca, Cr, Cr1, Cr2, Cp, Va, Vr, Vr1, Vr2, and Vp may be substituted for t in zi(t) and z*i(t) for a given value of i ∈ {1, 2, …, 10}. For example, only Ca and Va apply when i = 1. Let us continue now with our consideration of the functions zi(t) and z*i(t) by initially considering zi(t) for i ∈ {1, 2, …, 10}. Consider first z1(t), which represents the number of time units required to access all b1-blocks associated with a given b7-block. If we wish to compare with a given quantity the data item description associated with each b1-block accessed, then t = Ca. On the other
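The comparison costs for relation symbol names compose exactly as derived above; the sketch below (names are mine; ρ̄ is written as 1 - rho) builds Cr1 and Cr2 on top of Cr:

```python
def C_r(rho2, m_hat_r, F_r, c_r):
    # Cr = rho2-bar * m_hat_r * Fr + cr: when the b6-block lacks a name
    # field (rho2 = 0), m_hat_r relation-ring pointers are followed first
    return (1 - rho2) * m_hat_r * F_r + c_r

def C_r1(rho1, c_r, e_star5, Cr):
    # Cr1 = rho1 * cr + rho1-bar * (e*5 + Cr): move to the b6-block only
    # when the b5-block itself carries no name field
    return rho1 * c_r + (1 - rho1) * (e_star5 + Cr)

def C_r2(rho3, c_r, e_star6, Cr):
    # Cr2 = rho3 * cr + rho3-bar * (e*6 + Cr): the b7-block analogue
    return rho3 * c_r + (1 - rho3) * (e_star6 + Cr)

# b5-blocks carry no name field, b6-blocks do (rho1 = 0, rho2 = 1):
Cr = C_r(rho2=1, m_hat_r=3, F_r=2, c_r=1)            # just cr = 1
assert C_r1(rho1=0, c_r=1, e_star5=4, Cr=Cr) == 4 + 1
```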

167 hand, if we wish to fetch the data item description associated with each b1-block accessed, then t = Va. In either event the steps required to access the b1-blocks will be the same. We may outline these steps as follows: (1) Move to the b6-block associated with the given b7-block. (2) Access all b5-blocks associated with that b6-block. (3) Move to the b4-block associated with each b5-block. (4) Access all b3-blocks associated with each b4-block. (5) Move to the b2-block associated with each b3-block. (6) Access all b1-blocks associated with each b2-block. Of course, depending upon what transformations have been applied to the SSM, some of these steps may not have to be performed (i.e., the time required to perform the steps may be zero). Let us examine the number of time units required to perform each of these six steps. Step (1) will clearly require e*6 units of time to perform. To perform step (2) we must move from the b6-block to the first b5-block of the a5-ring and then move sequentially through the remaining K33 − 1 b5-blocks of the ring. This will require e°5 + K33e'5 units of time. We have assumed that the last element block of the a5-ring is treated exactly like the other element blocks in that we try to access the element block following it. Of course, there is no such

168 block, but we assume the time required to determine this fact is given by e'5. Hence, we multiply e'5 by K33 instead of by K33 − 1. The performance of step (3) normally requires e*4 units of time, and since it must be performed once for every b5-block associated with the given b7-block, the total time required is K33e*4 units. Step (4) is similar to step (2), and we determine that it requires e°3 + K22e'3 units of time. Since it also must be performed once for every b5-block, the total time required is K33(e°3 + K22e'3). Note, however, that if θ3 = θ4 = θ5 = 0 and λ5 = 0 (λ4 = 1 in this case), then steps (3) and (4) are included in step (2). If both λ3 = 1 and λ4 = 1, this is only of academic interest since e*4 = 0, e°3 = 0, and e'3 = 0 anyway, but if λ3 = 0, it is of some importance. Given that θ3 = 0, θ4 = 0, θ5 = 0, λ3 = 0, λ4 = 1, and λ5 = 0, we know that b3-blocks are stacked upon their respective b4-blocks, b4-blocks are duplicated upon their associated b5-blocks, and b5-blocks are in turn stacked upon their respective b6-blocks. Thus, intervening between each pair of b5-blocks in a stack is a stack of b3-blocks (which may, depending upon the values assigned to other decision variables, have stacks of blocks between pairs of them). It follows that in moving from one b5-block in a stack to the succeeding b5-block, we must step through each of the intervening b3-blocks. (Because we are moving away from the heads of the stacks, we may not take

169 advantage of any head pointers, of course.) Clearly then, step (2) encompasses steps (3) and (4) and the time costs of steps (3) and (4) may be disregarded. This also justifies our assumption that e'5 represents the time required to determine that the last b5-block in a given a5-ring is indeed the last. Step (5) requires e*2 units of time and must be performed once for every b3-block associated with the given b7-block, or K32 times. Thus, its total required time is K32e*2 units. Step (6) is similar to steps (2) and (4). Its total required time is K32(e°1 + K11e'1). Note that if θ1 = θ2 = θ3 = 0 and λ3 = 0, then steps (5) and (6) are included in step (4). This situation parallels that for the inclusion of steps (3) and (4) in step (2) exactly. By extending our reasoning, we can in fact show that if steps (5) and (6) are included in step (4) and if step (4) is in turn included in step (2), then steps (5) and (6) are included in step (2). Finally, for each b1-block accessed we perform that operation which is characterized by t. Before summarizing z1(t) to this point, let us define a binary-valued inclusion variable χi as follows:

χi = θi−1 + θi + θi+1 + λi+1 for i ∈ {2, 4, 6, 8}
χi = θi−1 + θi + θi+1 + λi−1 for i ∈ {3, 5, 7, 9}

170 (Recall our assumption that "+" indicates disjunction when applied to decision variables.) We may now write the following expression for z1(t):

z1(t) = e*6 + e°5 + K33e'5 + χ4[K33e*4 + K33(e°3 + K22e'3)] + χ2[K32e*2 + K32(e°1 + K11e'1)] + K31t

If we rearrange the terms of this expression somewhat, we can obtain the expression

z1(t) = e*6 + e°5 + K33[e'5 + χ4(e*4 + e°3)] + K32[χ4e'3 + χ2(e*2 + e°1)] + K31[χ2e'1 + t]

or alternatively

z1(t) = e*6 + e°5 + k5[e'5 + χ4(e*4 + e°3)] + k3k5[χ4e'3 + χ2(e*2 + e°1)] + k1k3k5[χ2e'1 + t]

This expression is not quite complete, however, for we have
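The inclusion variables and the compact form of z1(t) can be written down directly. The sketch below is mine, not the report's (and the z°1 correction term developed on the following pages is omitted here); decision-variable sums are read as disjunctions:

```python
def chi(i, theta, lam):
    # chi_i = theta_{i-1} + theta_i + theta_{i+1} + lam_{i+1}  (i even)
    # chi_i = theta_{i-1} + theta_i + theta_{i+1} + lam_{i-1}  (i odd)
    # with "+" read as disjunction over binary decision variables
    extra = lam[i + 1] if i % 2 == 0 else lam[i - 1]
    return 1 if (theta[i - 1] or theta[i] or theta[i + 1] or extra) else 0

def z1(t, es, eh, ep, chi2, chi4, k):
    # z1(t) = e*6 + e°5 + k5[e'5 + chi4(e*4 + e°3)]
    #        + k3 k5[chi4 e'3 + chi2(e*2 + e°1)]
    #        + k1 k3 k5[chi2 e'1 + t]
    return (es[6] + eh[5]
            + k[5] * (ep[5] + chi4 * (es[4] + eh[3]))
            + k[3] * k[5] * (chi4 * ep[3] + chi2 * (es[2] + eh[1]))
            + k[1] * k[3] * k[5] * (chi2 * ep[1] + t))

es = {6: 2, 4: 3, 2: 4}          # e*_i values (illustrative numbers)
eh = {5: 1, 3: 2, 1: 3}          # e°_i
ep = {5: 1, 3: 1, 1: 1}          # e'_i
k = {1: 2, 3: 2, 5: 2}
assert z1(5, es, eh, ep, chi2=1, chi4=1, k=k) == 95
```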

171 ignored so far quantities such as the number of time units required to move from a pointer field for one ring to a pointer field for another ring passing through the same block (when such a situation exists). We assume that when a1 is to be performed, the position indicator is pointing to the relation symbol name field of a given b7-block. (If the b7-blocks do not contain a relation symbol name field, a1 will not be used.) If θ6 = 1 or θ'6 = 1, we must step from the relation symbol name field to either the head pointer field or the forward pointer field of the given b7-block in order to allow access to the corresponding b6-block. As we discussed earlier, this will require s° units of time. Suppose instead that λ6 = 1 and θ5 = 1. In this case we must step from the relation symbol name field of the b7-block to the a5-ring forward pointer field of the corresponding b6-block. (Recall that e*6 is defined to be zero for λ6 = 1.) This will also require s° units of time. To carry this one more step, suppose λ6 = 1, λ5 = 1, and θ4 = 1 or θ'4 = 1. Here we must step from the relation symbol name field of the b7-block to either the a4-ring head pointer field or the a4-ring forward pointer field of the corresponding b5-block. There is, of course, only one b5-block associated with the given b7-block in this case. Also, e*6 = 0, e°5 = 0, and e'5 = 0. As before, this move requires s° time units. Continuing our example in this manner we reach the point where λ6 = 1, λ5 = 1, λ4 = 1, λ3 = 1, λ2 = 1,

172 and λ1 = 1. Clearly, there is exactly one b1-block associated with the given b7-block. Furthermore, e*6 = 0, e°5 = 0, e'5 = 0, e*4 = 0, e°3 = 0, e'3 = 0, e*2 = 0, e°1 = 0, and e'1 = 0. We must in this case step from the relation symbol name field of the b7-block to either the description block indicator or the description block itself, as the case may be. This also requires s° time units. Thus, whenever the following expression has a value of 1, we must step from the relation symbol name field of the given b7-block to some other field in the structure, a move which requires s° time units:

(θ6 ∨ θ'6) ∨ λ6θ5 ∨ λ6λ5(θ4 ∨ θ'4) ∨ λ6λ5λ4θ3 ∨ λ6λ5λ4λ3(θ2 ∨ θ'2) ∨ λ6λ5λ4λ3λ2θ1 ∨ λ6λ5λ4λ3λ2λ1

(We have used "∨" instead of "+" to denote disjunction here because we will later want to arithmetically sum the values of expressions such as this.) Suppose now that θ6 = 1 and θ5 = 1. If θ'6 = 0, we must follow the a6-ring forward pointers to reach the b6-block associated with the given b7-block. In addition we are to follow the a5-ring forward pointers to access all b5-blocks associated with that b6-block. Hence, at the b6-block we must step from the a6-ring forward pointer field to the a5-ring forward pointer field. This move, as we know, requires

173 s° time units. If θ'6 = 1, we can follow the a6-ring head pointer directly to the a5-ring forward pointer without incurring the additional s° units of time. Suppose next that θ6 = 1, θ'6 = 0, λ5 = 1, and θ4 = 1 or θ'4 = 1. In this case we must step from the a6-ring forward pointer to either the a4-ring head pointer or the a4-ring forward pointer. Again, s° units of time are required. If we continue with this line of reasoning, we can obtain another expression

θ6θ̄'6θ5 ∨ θ6θ̄'6λ5(θ4 ∨ θ'4) ∨ θ6θ̄'6λ5λ4θ3 ∨ θ6θ̄'6λ5λ4λ3(θ2 ∨ θ'2) ∨ θ6θ̄'6λ5λ4λ3λ2θ1 ∨ θ6θ̄'6λ5λ4λ3λ2λ1

which, when its value is 1, implies a step requiring s° units of time. Using the same type of reasoning for cases for which θ5 = 1, we can develop the expression

θ5(θ4 ∨ θ'4) ∨ θ5λ4θ3 ∨ θ5λ4λ3(θ2 ∨ θ'2) ∨ θ5λ4λ3λ2θ1 ∨ θ5λ4λ3λ2λ1

In this case, however, when the value of the expression is 1, k5 steps of s° time units are required. This follows, of course, from the fact

174 that there are k5 b5-blocks associated with the given b7-block. If we carry on in this manner, we can develop several more expressions of this type. Let us consolidate all of these expressions and their respective multipliers into a single expression which we will designate z°1 as follows:

z°1 = {[(θ6 ∨ θ'6) ∨ λ6θ5 ∨ λ6λ5(θ4 ∨ θ'4) ∨ ⋯ ∨ λ6λ5λ4λ3λ2λ1]
 + [θ6θ̄'6θ5 ∨ θ6θ̄'6λ5(θ4 ∨ θ'4) ∨ θ6θ̄'6λ5λ4θ3 ∨ ⋯ ∨ θ6θ̄'6λ5λ4λ3λ2λ1]
 + k5[(θ5(θ4 ∨ θ'4) ∨ θ5λ4θ3 ∨ θ5λ4λ3(θ2 ∨ θ'2) ∨ ⋯ ∨ θ5λ4λ3λ2λ1)
 + (θ4θ̄'4θ3 ∨ θ4θ̄'4λ3(θ2 ∨ θ'2) ∨ θ4θ̄'4λ3λ2θ1 ∨ θ4θ̄'4λ3λ2λ1)]
 + k3k5[(θ3(θ2 ∨ θ'2) ∨ θ3λ2θ1 ∨ θ3λ2λ1) + (θ2θ̄'2θ1 ∨ θ2θ̄'2λ1)]
 + k1k3k5[θ1]} s°

In this expression "+" indicates arithmetic summation. z°1 indicates the number of time units by which our current expression for z1(t) is deficient. The situation is easily remedied:

175 z1(t) = e*6 + e°5 + k5[e'5 + χ4(e*4 + e°3)] + k3k5[χ4e'3 + χ2(e*2 + e°1)] + k1k3k5[χ2e'1 + t] + z°1

Consider next z2(t), which represents the number of time units required to access all b1-blocks associated with a given b6-block. The derivation of an expression for z2(t) follows almost exactly the derivation of the expression for z1(t). The only difference is that for z2(t) there is no need (obviously) to move from a b7-block to the b6-block. Thus,

z°2 = {[θ5 ∨ λ5(θ4 ∨ θ'4) ∨ λ5λ4θ3 ∨ λ5λ4λ3(θ2 ∨ θ'2) ∨ λ5λ4λ3λ2θ1 ∨ λ5λ4λ3λ2λ1]
 + k5[(θ5(θ4 ∨ θ'4) ∨ θ5λ4θ3 ∨ θ5λ4λ3(θ2 ∨ θ'2) ∨ ⋯ ∨ θ5λ4λ3λ2λ1)
 + (θ4θ̄'4θ3 ∨ θ4θ̄'4λ3(θ2 ∨ θ'2) ∨ θ4θ̄'4λ3λ2θ1 ∨ θ4θ̄'4λ3λ2λ1)]
 + k3k5[(θ3(θ2 ∨ θ'2) ∨ θ3λ2θ1 ∨ θ3λ2λ1) + (θ2θ̄'2θ1 ∨ θ2θ̄'2λ1)]
 + k1k3k5[θ1]} s°

and

z2(t) = e5 + k5[e5′ + λ4(e4 + e3)] + k3k5[λ4e3′ + λ2(e2 + e1)] + k1k3k5[λ2e1′ + t] + z2°

In deriving z2°, we have assumed that the position indicator is initially pointing to the relation symbol name field of the given b6-block. If ρ2 = 0, however, we may assume that the indicator is pointing to the relation ring pointer field. (Recall that the relation ring is used to determine the relation symbol for a given b6-block when ρ2 = 0.) This does not affect the validity of our expression for z2°. Since a3 and a4 are direct analogies of a1 and a2, respectively, it follows that z3(t) and z4(t) parallel exactly z1(t) and z2(t), respectively. The actual expressions for z3(t) and z4(t) may be found in Appendix C, which contains a complete summary of zi(t) and zi° for all i ∈ {1, 2, …, 10}. Next consider the derivation of an expression for z5(t), which represents the number of time units required to access all b1-blocks associated with all b6-blocks which represent the same relation symbol. It should be clear that a5 can be performed by accessing

all the b6-blocks in a given relation ring and then applying a2 to each of those b6-blocks. Since there are mr b6-blocks in a given relation ring, the number of time units required to access these b6-blocks may be given by (mr + 1)Fr: mrFr time units are required to access the blocks, and Fr time units are required to follow the pointer from the last b6-block in the relation ring to the head, which indicates that all the b6-blocks have been considered. It follows that

z5(t) = (mr + 1)Fr + mr z2(t)

Although we have no explicit need for z5°, we know that z5° = mr z2°. z6(t) is analogous to z5(t) and, therefore, will not be considered in detail here. An expression for z6(t) appears in Appendix C. Next in line for consideration is z7(t), which represents the number of time units required to access all b5-blocks associated with the b1-block(s) which represent a given source. By applying arguments similar to those used above, we can obtain the following expression:

z7(t) = (ma + 1)Fa + ma{e1 + e2 + k2[e2′ + λ3(e3 + e4)] + k2k4[λ3e4′ + t]} + z7°
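The ring-scan composition used for z5(t) can be sketched in code. This is a minimal illustration, not the thesis's notation: the function name and the sample values are assumptions.

```python
# Sketch (not from the thesis): composing ring-traversal costs in the
# style of z5(t) = (m_r + 1) F_r + m_r z2(t).

def ring_traversal_cost(m, follow_cost, per_block_cost):
    """Cost of visiting every block of an m-block ring.

    m follows of cost follow_cost reach the m blocks, and one extra
    follow from the last block back to the head signals completion,
    giving (m + 1) * follow_cost; each block then incurs
    per_block_cost (the role z2(t) plays for z5(t)).
    """
    return (m + 1) * follow_cost + m * per_block_cost

# Illustrative numbers: 4 b6-blocks in a relation ring, F_r = 2 time
# units per pointer-follow, 10 time units of a2-work per block.
z5_example = ring_traversal_cost(4, 2, 10)   # (4+1)*2 + 4*10 = 50
```

The same shape recurs in z7(t) and z9(t), where the source ring of ma b1-blocks contributes the (ma + 1)Fa term.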

We have assumed here that a source ring is used to access the b1-block or b1-blocks representing a given source. If we should desire to assume that the locations of these b1-blocks are known a priori (perhaps as the result of some other operation), we may set Fa to zero and still use the given expression. Derivation of an expression for z7° proceeds along lines similar to the derivations for z1°, z2°, etc., but with some distinct differences. We assume that when we initially enter a b1-block the position indicator points to the source ring pointer field of that block. Our immediate goal is to trace through the storage structure to access all b5-blocks associated with the given b1-block. For each b5-block accessed, either we are to compare the relation symbol which it represents with some known quantity or we are to fetch that relation symbol. In either event we must determine what the relation symbol is. If ρ1 = 1, we may determine what the relation symbol is simply by accessing the b5-block. On the other hand if ρ1 = 0, we must access the corresponding b6-block. As a result, our expression for z7° will appear as follows:

z7° = ma {((1)VAl02V A~2V 3) V 12A A3 04 7 a 1 I 12 123' 1 23 3 + (~01~ 02v v1l'A2(03V 03)V i1'1A2A3~04 V 011A2 A3 A4(p1V 05V 05)V 1A2A3A4A5 pl)

+k2[ (02(03V 03)V 02A304V f02A3A4(PlV 05V 05)V f02A3A4A5pl) +(03' v34v 0h, 4(p1 A V 05 )V 030',4~5 A 1)] +k2k4 [ (24(plV 05V5)V04A5pl) + (5"050P1)] } s0

To clarify the intent of this expression let us briefly examine the terms

Δ1Δ2Δ3Δ4(ρ1∨θ5∨θ5′) ∨ Δ1Δ2Δ3Δ4Δ5~ρ1

which apply when Δ1 = 1, Δ2 = 1, Δ3 = 1, and Δ4 = 1. If ρ1 = 1, we need only step from the source ring pointer field of the given b1-block to the relation symbol name field of the associated b5-block in order to be able to determine what the relation symbol is. On the other hand if ρ1 = 0, either we must step to the a5-ring forward pointer field or head pointer field of the b5-block (if θ5 = 1 or θ5′ = 1) in order to access the corresponding b6-block, or we must step directly to the relation symbol name field or the relation ring pointer field of the b6-block (if Δ5 = 1). z8(t) is analogous to z7(t) and so will not be considered here but may be found in Appendix C. z9(t), which represents the number of time units required to access all b11-blocks associated with the b1-block(s) which represent

a given source, is similar in all respects to those zi(t) considered above. Without any difficulty we can derive the following expression to represent it:

z9(t) = (ma + 1)Fa + ma{e1 + e2 + k2[e2′ + λ3(e3 + e4)] + k2k4[λ3e4′ + λ5(e5 + e6)] + k2k4k6[λ5e6′ + λ7(e7 + e8)] + k2k4k6k8[λ7e8′ + λ9(e9 + e10)] + k2k4k6k8k10[λ9e10′ + t]} + z9°

where

z9° = ma {((01V 0)VA1 02V A 2( 0 )VOf V A2(f3V 0)V... VOlt'AAA A A2 A A A A 0 +( 2V 010i3 1 2)4V 5 * 2 V 6 71 7 8 9 10 +K12[ (02(03 0 )V02A304V 02A3A4(05v 05)V'' V02A3A4A6 4AZ91 ) +(0303 4V 03 3A4(%05V 05)V. ~ ~ V 33 4As5A6A7A8 A9A10))] +K13[ (04(05V5 )V04A556V 04A5S6( 7V07)V' V 4AsA6ZA78 9A10) +(505'06V 05~A6(v7V 0)V V 0505A6A7A889A10)]

+K14[ (06(07V 07)V 06A708v 06vA78(09 V0. ~'v v A06A7A8A9A10) + (070j08V 07I7 8(09V 0g)V''' V707A8f 9A10)] +K15[ (08( 09 )V 08A90 10V 08A9A10) +(090lo 10v 90 A10)] +K16[ 010] } s0

Finally, z10(t), which is analogous to z9(t), appears in Appendix C. Let us now consider the functions zi*(t) for i ∈ {1, 2, …, 10}. Consider the derivation of an expression for z1*(t). There are k5 b5-blocks, k3k5 b3-blocks, and k1k3k5 b1-blocks associated with a given b7-block. (To simplify notation, we will use K̂33, K̂32, and K̂31 for the expected access positions (k5+1)/2, (k3k5+1)/2, and (k1k3k5+1)/2, respectively.) Because of the uniform distribution of blocks in the SSM, we can say in general that when we have accessed K̂31 b1-blocks associated with a given b7-block, we will have accessed K̂32 b3-blocks and K̂33 b5-blocks associated with the b7-block. Since a1 and a1* are so closely related, one would expect a strong similarity between z1(t) and z1*(t), and indeed there is. Using very much the same arguments we used for z1(t), we obtain the following expression for z1*(t):

z1*(t) = e6 + e5 + (K̂33 − 1)e5′ + K̂33λ4(e4 + e3) + (K̂32 − 1)λ4e3′ + K̂32λ2(e2 + e1) + (K̂31 − 1)λ2e1′ + K̂31 t + z1°*

where z1°* performs the same function for z1*(t) as z1° does for z1(t). It may be helpful to make a comment or two about this expression. Note that the coefficients of e5′, e3′, and e1′ are of the form k−1, whereas in the expression for z1(t) these coefficients are of the form k (i.e., they are not decremented by 1). As we indicated in the derivation of the expression for z1(t), if we want to access all blocks of a given type, then even when we encounter the last block we must try to access one more in order to determine that all the blocks have been considered. If there are k such blocks, this means we must perform k access operations (not including the operation required to access the first block from the head), which accounts for coefficients of the form k in the expression for z1(t). If on the other hand we are looking for a particular block, then we may stop accessing blocks once we have encountered the one sought. This means that access operations need be performed only for those blocks which precede the block sought. Thus, if the block sought is the k̂-th block encountered (of a given type), we need perform only k̂−1 access operations, which accounts for coefficients of the form k−1 in the expression for z1*(t).

Even though in this case we are seeking a particular b1-block, this reasoning clearly applies to the b3-blocks and the b5-blocks at the intermediate levels, also. For instance, we can view this situation as searching for the particular b3-block (associated with the given b7-block, of course) with which the b1-block sought is associated and searching for the particular b5-block with which the b1-block sought is associated. At this point we bring up a note of caution: the expression for z1*(t) above is correct as far as it goes, but it is incomplete. Suppose for instance that λ4 = 0, that is, that e4, e3, and e3′ are covered by e5′. Of course, e5′ applies only to the first K̂33−1 b5-blocks encountered, not the last one. This means that e4 and e3 are required once to account for the b3-blocks associated with the last b5-block encountered. e3′ will also be required a number of times (to be determined) to account for these b3-blocks. Normally (when λ4 = 1) e3′ is required for K̂32−1 b3-blocks, but (K̂33−1)k3 occurrences of e3′ are covered by e5′ when λ4 = 0. Therefore, we will require (K̂32−1) − (K̂33−1)k3 occurrences of e3′ when λ4 = 0.
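The claim that (K̂32−1) − (K̂33−1)k3 collapses to K̂22−1 can be checked numerically. This sketch is illustrative only; the function name and the sample values are assumptions, with each K̂ taken as an expected (middle) position of the form (n+1)/2:

```python
from fractions import Fraction

# Sketch (not from the thesis): verifying that
# (K32 - 1) - (K33 - 1) * k3  ==  K22 - 1
# when K33 = (k5+1)/2, K32 = (k3*k5+1)/2, and K22 = (k3+1)/2.

def middle(n):
    # expected access position among n uniformly distributed blocks
    return Fraction(n + 1, 2)

for k3 in (1, 2, 3, 7):
    for k5 in (1, 2, 5, 9):
        K33, K32, K22 = middle(k5), middle(k3 * k5), middle(k3)
        assert (K32 - 1) - (K33 - 1) * k3 == K22 - 1
```

The identity holds for every pair of block counts, since both sides reduce to (k3 − 1)/2.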

(K̂32 − 1) − (K̂33 − 1)k3 = ((k3k5+1)/2 − 1) − ((k5+1)/2 − 1)k3
= (k3k5−1)/2 − (k3k5)/2 + k3/2
= (k3−1)/2
= K̂22 − 1

Similarly, e2 and e1 are required once and e1′ is required K̂21−1 times when λ2 = 0. Taking these factors into account, we obtain the following expression for z1*(t):

z1*(t) = e6 + e5 + (K̂33 − 1)e5′ + [λ4(K̂33 − 1) + 1](e4 + e3) + (λ4K̂32 + ~λ4K̂22 − 1)e3′ + [λ2(K̂32 − 1) + 1](e2 + e1) + (λ2K̂31 + ~λ2K̂21 − 1)e1′ + K̂31 t + z1°*

The expression for z1°* follows directly from the expression for z1°. All operations involved remain exactly the same; only the

numbers of blocks are different.

z1°* = {((v 06) VA605V A6A5(4V )v 6 V A5A4A3A2A1) + (O6 o0V -'f A ( 4V of ov A' A v... V 06 Of -A'h -AlA +(06065v066A5 (04V06)V 6A5 403v. 6 6p) +K̂33[ (05(04v 0 )V05A403v 05A4A3(02v 0)v~ * 05A4v3 v 221) +(04043V 04013(02V 0) V 040053A 201V 0404423A2 1)] +K̂32[ (03(02 V02)v0320V 03A2A1) +(020-1V 0202A1)] +K̂31[θ1] } s0

The expressions for all other zi*(t) and their corresponding zi°* can be obtained as above and are summarized in Appendix C.

4.2.4 Primitive Operation Time Costs

We now have at our disposal the information required to return to our consideration of the time costs tij associated with the methods for implementing the various primitive operations. Let us consider the derivation of an expression for t11, the

number of time units required to perform Q1 using its first method. Recall the basic steps required to implement this method: (1) Find di in the storage structure. (2) Examine all relation symbols associated with di in search of rj. (3) If rj is found, examine all targets in the corresponding target set in search of pk. Step (1) amounts to finding the head of the source ring containing those b1-blocks which represent di. We will assume that finding the head of a source ring requires Ta time units. (We can assume that this step is accomplished by using di as a key to a dictionary, for instance.) Since we will have need of these quantities later, also assume that finding the head of a relation ring or a target ring requires Tr and Tp time units, respectively. If we have assumed that Fa = 0 (i.e., that we do not use the source ring to access the b1-blocks representing a given source, but rather we know a priori where these b1-blocks are located), then we may assume that Ta = 0, also. Now consider steps (2) and (3). Suppose that we do not find rj among the relation symbols associated with di. This means that we will have to examine all b5-blocks associated with the b1-block(s) representing di in search of one which represents the relation symbol

rj, but without success. Of course, this is simply sub-procedure a7, which has a time cost z7(Cr1). Suppose next that we do find rj but not pk. If there is but a single copy of each b5-block associated with the b1-blocks representing di (i.e., if mr1 = 1), the b5-block representing rj may be found via a7*, which has a time cost z7*(Cr1). We will then examine via a3 all b11-blocks associated with the b5-block representing rj in search of pk, which will not be found. The time cost required for this is z3(Cp). If on the other hand mr1 > 1, we must examine all b5-blocks to insure that we have found all those representing rj. Then for each b5-block representing rj we must perform a3. The time cost required in this case will be z7(Cr1) + mr1 z3(Cp). Finally, suppose that we find both rj and pk. In this case we may stop accessing b5-blocks as soon as pk is found. Thus, the time cost required to find rj will be z7*(Cr1). Since there are mr1 b5-blocks which represent rj (and which are associated with the b1-block(s) representing di), we would expect pk to be associated with the m̂r1-th b5-block representing rj. This means that for this b5-block we need examine only part of the b11-blocks associated therewith at a time cost z3*(Cp), but for the first m̂r1 − 1 b5-blocks representing rj we must examine all associated b11-blocks at a time cost z3(Cp) each.
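The stop-early composition just described can be sketched as follows. This is an illustration under assumptions: the function names are invented, and the expected copy index is taken as the middle position (m+1)/2, in keeping with the expected-position convention used for the K̂ quantities.

```python
# Sketch (not from the thesis): the "both r_j and p_k found" cost shape
#   z7*(C) + (m_hat - 1) * full_scan + partial_scan
# where the match is expected at the m_hat-th of m duplicated copies.

def expected_copy_index(m):
    # expected position of the successful copy among m equally likely ones
    return (m + 1) / 2

def found_case_cost(find_cost, m_hat, full_scan, partial_scan):
    # m_hat - 1 exhaustive examinations of earlier copies, then one
    # partial examination of the copy holding the sought target
    return find_cost + (m_hat - 1) * full_scan + partial_scan

# Illustrative numbers: z7* = 10, 3 copies (expected index 2),
# full b11-scan = 5, partial b11-scan = 3.
cost = found_case_cost(10, expected_copy_index(3), 5, 3)   # 10 + 5 + 3 = 18
```

When m = 1 the expected index is 1 and the expression degenerates to find_cost + partial_scan, matching the single-copy case.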

Implicit in this discussion has been the assumption that ρ1 = 1. If in fact ρ1 = 0, we may use a4 and a4* in place of a3 and a3*, respectively. The reason for this, of course, is that the operation characterized by Cr1 must access the b6-block associated with a given b5-block when ρ1 = 0, and we may use this b6-block instead of the b5-block as our starting point in the search for pk. Clearly, z4(Cp) and z4*(Cp) will then be used in place of z3(Cp) and z3*(Cp), respectively. Let us now summarize the time cost for each of the three situations:

(1) rj is not found:  z7(Cr1)

(2) rj is found, but pk is not:  (~Δ6 ∨ ~Δ8)z7*(Cr1) + Δ6Δ8z7(Cr1) + mr1[ρ1z3(Cp) + ~ρ1z4(Cp)]

(3) both rj and pk are found:  z7*(Cr1) + (m̂r1 − 1)[ρ1z3(Cp) + ~ρ1z4(Cp)] + [ρ1z3*(Cp) + ~ρ1z4*(Cp)]

(Note that Δ6 = 0 or Δ8 = 0 implies mr1 = 1, and Δ6 = 1 and Δ8 = 1 implies in general that mr1 > 1.)

If we assume for the moment that the probability that the first of these situations occurs is given by y1, the probability that the second

occurs is given by y2, and the probability that the third occurs is given by y3, then the time cost t11 will be given by the following expression:

t11 = Ta + y1 z7(Cr1)
+ y2{(~Δ6 ∨ ~Δ8)z7*(Cr1) + Δ6Δ8z7(Cr1) + mr1[ρ1z3(Cp) + ~ρ1z4(Cp)]}
+ y3{z7*(Cr1) + (m̂r1 − 1)[ρ1z3(Cp) + ~ρ1z4(Cp)] + [ρ1z3*(Cp) + ~ρ1z4*(Cp)]}

Let us now investigate the determination of the probabilities y1, y2, and y3 (along with some other probabilities for which we will have use later). Let us define an event A as "the relation symbol r is associated with the source d", where r ∈ P and d ∈ A. We can determine the probability that event A is true, Pr{A}, as follows. Associated with each d ∈ A are k2°k4° elements of P. Clearly, for any d ∈ A there are then k2°k4° elements of the kr° elements of P for which A will be true. Thus,

Pr{A} = k2°k4°/kr°

Similarly, associated with each r ∈ P are mr1 k1°k3° elements of A. For any r ∈ P there are then mr1 k1°k3° elements of the ka° elements

of A for which A will be true. It follows that

Pr{A} = mr1 k1°k3°/ka°

Recall, however, that

mr1 = k2°k4°ka°/(k1°k3°kr°)

Making this substitution for mr1 yields once again

Pr{A} = k2°k4°/kr°

We can obtain this same result in a somewhat different way. Since |A| = ka° and |P| = kr°, there are ka°kr° possible (d,r)-pairs. However, since there are k2°k4° elements of P associated with each element of A, only k2°k4°ka° (d,r)-pairs actually cause event A to be true. Thus,

Pr{A} = k2°k4°ka°/(ka°kr°) = k2°k4°/kr°

Let us define an event B as "the relation symbol r is associated with the target p", where r ∈ P and p ∈ Π. In a manner similar to that for Pr{A} we can show that

Pr{B} = k7°k9°/(mrp kr°)

For instance, associated with each p ∈ Π are k7°k9°/mrp elements of P. Since |P| = kr°, the expression for Pr{B} follows directly. Let A′ be the event "the relation symbol r which a given b7-block (or b6-block) represents is associated with the source d", where r ∈ P and d ∈ A. Since there are k1°k3° sources associated with each b7-block (or b6-block) and a total of ka° sources, it follows that

Pr{A′} = k1°k3°/ka°

Similarly, let B′ be the event "the relation symbol r which a given b7-block (or b6-block) represents is associated with the target p", where r ∈ P and p ∈ Π. Since there are k8°k10° targets associated with each b7-block (or b6-block) and a total of kp° targets, we obtain

Pr{B′} = k8°k10°/kp°

Let A″ be the event "one of the mrp b6-blocks (of the untransformed SSM) representing the relation symbol r and associated with

the target p is associated with the source d", where d ∈ A, r ∈ P, and p ∈ Π. Associated with the b1-block representing the source d are k2°k4° b6-blocks representing distinct relation symbols. We wish to know the probability that one of these b6-blocks coincides with one of the mrp b6-blocks representing the relation symbol r and associated with the b11-block representing the target p. Since there are m6 b6-blocks, which may be grouped into sets of mrp, we have m6/mrp sets from which to choose. Furthermore, since we have k2°k4° opportunities for success,

Pr{A″} = k2°k4°/(m6/mrp) = mrp k1°k3°/ka°

Finally, let C be the event "the source d is associated with the target p", where d ∈ A and p ∈ Π. Recall the old ball-in-the-urn trick: for k = 0, 1, …, n, the probability of the event Ek that one will score exactly k successes

(where a success corresponds to drawing a red ball) when one draws without replacement a sample of size n from an urn containing M balls, of which m are red, is

Pr{Ek} = C(n,k) (m)k (M−m)n−k / (M)n

where

C(n,k) = n!/(k!(n−k)!)
(M)n = M(M−1)⋯(M−n+1)

and (M)0 = 1. For our purposes here, instead of containing balls the urn will contain the sets of mrp b6-blocks which represent a given relation symbol associated with the b11-block representing a particular target. The k7°k9°/mrp sets of b6-blocks which are associated with the b11-block representing the target p will correspond to the red balls, and our sample will consist of the k2°k4° b6-blocks representing distinct relation symbols and associated with the b1-block representing the source d. Then

M = k2°k4°ka°/(mrp k1°k3°)
m = k7°k9°/mrp
n = k2°k4°

Pr{C} is then the probability that our sample of k2°k4° b6-blocks will contain a b6-block from at least one of the k7°k9°/mrp sets of b6-blocks. Thus,

Pr{C} = 1 − Pr{E0}
= 1 − C(n,0) (m)0 (M−m)n / (M)n
= 1 − (M−m)(M−m−1)⋯(M−m−n+1) / [M(M−1)⋯(M−n+1)]

Clearly, the values assigned to m and n may be interchanged without affecting our result. To digress for a moment, we may also apply this ball-in-the-urn concept to the event A″. In this case M is still given by

M = k2°k4°ka°/(mrp k1°k3°)

and n is still given by n = k2°k4°, but m = 1 since we are interested

in only a single set of mrp b6-blocks. Furthermore, we wish to know the probability of exactly one success (which is the same as at least one success in this case, since there can be no more than one). Hence, Pr{A″} will be given by

Pr{A″} = Pr{E1}
= C(n,1) (m)1 (M−m)n−1 / (M)n
= n m(M−m)(M−m−1)⋯(M−m−n+2) / [M(M−1)⋯(M−n+1)]
= n(M−1)(M−2)⋯(M−n+1) / [M(M−1)⋯(M−n+1)]
= n/M
= k2°k4° ⋅ mrp k1°k3°/(k2°k4°ka°)
= mrp k1°k3°/ka°

which is, of course, the result obtained before. These results are summarized in Table 4-4, where for notational convenience we have introduced some new symbols xi for i ∈ {1, 2, …, 7} to represent the various probabilities. x6 has been reserved for use in considering uniqueness of type 2, for which there must be defined an event B″ analogous to event A″.
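The ball-in-the-urn probabilities reduce to ratios of binomial coefficients, since (M−m)n/(M)n = C(M−m,n)/C(M,n). A minimal sketch, with made-up urn sizes (the function names are illustrative, not the thesis's notation):

```python
from math import comb

# Sketch (not from the thesis): drawing n balls without replacement
# from an urn of M balls, m of them red.

def prob_no_red(M, m, n):
    # Pr{E0}; comb(M - m, n) is 0 automatically when n > M - m
    return comb(M - m, n) / comb(M, n)

def prob_at_least_one_red(M, m, n):
    # Pr{C} = 1 - Pr{E0}
    return 1 - prob_no_red(M, m, n)

def prob_exactly_one_red(M, m, n):
    # Pr{E1}
    return comb(m, 1) * comb(M - m, n - 1) / comb(M, n)
```

As noted in the text, prob_at_least_one_red is symmetric in m and n, and for m = 1 the exactly-one probability collapses to n/M.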

x1 = Pr{A}  = k2°k4°/kr°

x2 = Pr{B}  = k7°k9°/(mrp kr°)

x3 = Pr{A′} = k1°k3°/ka°

x4 = Pr{B′} = k8°k10°/kp°

x5 = Pr{A″} = mrp k1°k3°/ka°

x7 = Pr{C}  = 1 − (M−m)(M−m−1)⋯(M−m−n+1) / [M(M−1)⋯(M−n+1)]

where

M = k2°k4°ka°/(mrp k1°k3°)
m = k7°k9°/mrp
n = k2°k4°

Table 4-4. Summary of Probabilities
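The Table 4-4 quantities can be evaluated for a concrete case. The parameter values below are purely illustrative assumptions chosen so that every xi is a valid probability; they are not taken from the thesis.

```python
from fractions import Fraction as F
from math import comb

# Sketch (not from the thesis): evaluating the Table 4-4 probabilities
# for one made-up parameter set.
k1, k2, k3, k4 = 1, 3, 2, 4
k7, k8, k9, k10 = 3, 2, 2, 3
ka, kr, kp, mrp = 6, 24, 18, 2

x1 = F(k2 * k4, kr)                  # Pr{A}   = 1/2
x2 = F(k7 * k9, mrp * kr)            # Pr{B}   = 1/8
x3 = F(k1 * k3, ka)                  # Pr{A'}  = 1/3
x4 = F(k8 * k10, kp)                 # Pr{B'}  = 1/3
x5 = F(mrp * k1 * k3, ka)            # Pr{A''} = 2/3

M = k2 * k4 * ka // (mrp * k1 * k3)  # urn size, 18 here
m = k7 * k9 // mrp                   # red sets, 3 here
n = k2 * k4                          # sample size, 12 here
x7 = 1 - F(comb(M - m, n), comb(M, n))   # Pr{C}
```

Exact rational arithmetic (fractions.Fraction) keeps the probabilities free of rounding error.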

Returning now to consideration of t11, we see that the probabilities y1, y2, and y3 are given by

y1 = 1 − x1
y2 = x1(1 − x4)
y3 = x1x4

Let us rewrite the expression for t11, incorporating these values for y1, y2, and y3:

t11 = Ta + (1−x1)z7(Cr1)
+ x1(1−x4){(~Δ6 ∨ ~Δ8)z7*(Cr1) + Δ6Δ8z7(Cr1) + mr1[ρ1z3(Cp) + ~ρ1z4(Cp)]}
+ x1x4{z7*(Cr1) + (m̂r1 − 1)[ρ1z3(Cp) + ~ρ1z4(Cp)] + [ρ1z3*(Cp) + ~ρ1z4*(Cp)]}

Now let us consider the derivation of an expression for t12, the number of time units required to perform Q1 using its second method. This method is characterized by the following basic steps: (1) Access all occurrences of the relation symbol rj in the storage structure. (2) For each occurrence of the relation symbol rj examine all associated sources in search of di.

(3) If di is found, examine all targets in the corresponding target set in search of pk. To initiate access of all occurrences of the relation symbol rj, we must first access the head of the corresponding relation ring, which has a time cost of Tr time units. If di is not associated with rj, we will examine all sources associated with all occurrences of rj via a5 in search of di, which we will not find. Since the probability that di is not associated with rj is 1−x1, the effective time cost for this situation will be (1−x1)z5(Ca). If di is associated with rj but pk is not associated with this combination, we are faced with two possible situations. If there is a single copy of each b6-block associated with the b1-block(s) representing an arbitrary source (i.e., if mr1 = 1), the b6-block representing the relation symbol rj associated with the source di may be determined by a5*, which has a time cost z5*(Ca). If on the other hand mr1 > 1, all b6-blocks representing rj and all their associated b1-blocks will have to be considered in the futile search for pk. In other words a5 must be used to insure that no copy of a b6-block representing rj has been overlooked. In both cases (mr1 = 1 and mr1 > 1), all b11-blocks associated with each of the mr1 b6-blocks must be examined via a4 in the search for pk. Since the probability that di is associated with rj but pk is not associated with that combination is x1(1−x4),

the effective time cost will be

x1(1−x4)[(~Δ6 ∨ ~Δ8)z5*(Ca) + Δ6Δ8z5(Ca) + mr1z4(Cp)]

Finally, if di and pk are associated with the same b6-block which represents rj, we may use a5* to find the b1-block and b6-block representing the di/rj combination with which pk is associated. Since there are mr1 b6-blocks which represent rj and which are associated with the b1-block(s) representing di, we would expect pk to be associated with the m̂r1-th b6-block of these. Therefore, for m̂r1 − 1 of these blocks we will use a4 and for the m̂r1-th block we will use a4*. Our effective time cost for this situation will then be

x1x4[z5*(Ca) + (m̂r1 − 1)z4(Cp) + z4*(Cp)]

Combining all the terms obtained above yields the following expression for t12:

t12 = Tr + (1−x1)z5(Ca)
+ x1(1−x4)[(~Δ6 ∨ ~Δ8)z5*(Ca) + Δ6Δ8z5(Ca) + mr1z4(Cp)]
+ x1x4[z5*(Ca) + (m̂r1 − 1)z4(Cp) + z4*(Cp)]

Next we will consider t13, the number of time units required to

perform Q1 using its third method. Because of the symmetry of the SSM and the symmetry of methods 2 and 3 for Q1, we would expect a very strong similarity between the expressions for t13 and t12. In fact if mrp = 1, these expressions should be exact analogs. Let us then assume for the moment that mrp is 1. The resulting expression for t13 will be

t13 = Tr + (1−x2)z6(Cp)
+ x2(1−x5)[(~Δ3 ∨ ~Δ5)z6*(Cp) + Δ3Δ5z6(Cp) + mr2z2(Ca)]
+ x2x5[z6*(Cp) + (m̂r2 − 1)z2(Ca) + z2*(Ca)]

Let us now consider what effect mrp ≠ 1 will have upon this expression. Quite simply, if mrp > 1 there will be more than one b6-block which represents the same relation symbol associated with the b11-block representing a given target. This means that when pk is associated with rj but di is not associated with this combination, we must consider all b6-blocks representing rj and all their associated b11-blocks in the futile search for di in order to insure that no b6-block representing rj and associated with pk is overlooked. This is true regardless of whether or not mr2 = 1. Secondly, instead of mr2 b6-blocks to consider, we will have mrp mr2 b6-blocks representing

rj to consider. Let δp be a binary-valued variable, the value of which indicates whether (δp = 1) or not (δp = 0) mrp = 1. Also, let mz = mrp mr2. Then we can rewrite the expression for t13 to reflect the effects of mrp as follows:

t13 = Tr + (1−x2)z6(Cp)
+ x2(1−x5)[δp(~Δ3 ∨ ~Δ5)z6*(Cp) + (~δp ∨ Δ3Δ5)z6(Cp) + mz z2(Ca)]
+ x2x5[z6*(Cp) + (m̂z − 1)z2(Ca) + z2*(Ca)]

Finally, let us consider t14, the number of time units required to perform Q1 using its fourth method. Again, because of the symmetry of the SSM and the symmetry of methods 1 and 4 for Q1, we would expect a strong similarity between the expressions for t14 and t11. The comments made about mrp in our discussion of t13 also apply here. Thus, we can easily derive the following expression for t14:

t14 = Tp + (1−x2)z8(Cr2)
+ x2(1−x5){δp(~Δ3 ∨ ~Δ5)z8*(Cr2) + (~δp ∨ Δ3Δ5)z8(Cr2) + mz[ρ3z1(Ca) + ~ρ3z2(Ca)]}
+ x2x5{z8*(Cr2) + (m̂z − 1)[ρ3z1(Ca) + ~ρ3z2(Ca)] + [ρ3z1*(Ca) + ~ρ3z2*(Ca)]}

As a matter of academic interest we should like to be assured that the probability that operation Q1 is successful will be the same regardless of the method of implementation. Clearly, we are guaranteed that this is so if it is true that x1x4 equals x2x5. Let us check then whether the following equality is true:

x1x4 = x2x5
(k2°k4°/kr°)(k8°k10°/kp°) = [k7°k9°/(mrp kr°)][mrp k1°k3°/ka°]
k2°k4°k8°k10°ka° = k1°k3°k7°k9°kp°

which we know to be true. We can continue in the manner above to develop expressions for the methods for each of the remaining primitive operations. Rather than belabor these derivations to the point of monotony, we have chosen simply to summarize the expressions for the various tij in Appendix D.
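The equality can also be confirmed numerically. The parameter values below are illustrative assumptions only; kp is chosen so that the counting identity k2°k4°k8°k10°ka° = k1°k3°k7°k9°kp° holds.

```python
from fractions import Fraction as F

# Sketch (not from the thesis): confirming x1*x4 == x2*x5 for one
# made-up parameter set satisfying the counting identity.
k1, k2, k3, k4 = 1, 3, 2, 4
k7, k8, k9, k10 = 3, 2, 2, 3
ka, kr, mrp = 6, 24, 2

# choose kp so that k2 k4 k8 k10 ka = k1 k3 k7 k9 kp
kp = F(k2 * k4 * k8 * k10 * ka, k1 * k3 * k7 * k9)   # = 36

x1 = F(k2 * k4, kr)            # Pr{A}
x2 = F(k7 * k9, mrp * kr)      # Pr{B}
x4 = F(k8 * k10, 1) / kp       # Pr{B'}
x5 = F(mrp * k1 * k3, ka)      # Pr{A''}

assert x1 * x4 == x2 * x5      # both equal 1/12 here
```

The mrp factor cancels between x2 and x5, which is why the success probability is independent of duplication.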

4.3 Storage Cost Function

To complete our discussion of the measures of performance, let us develop an expression for the number of storage units required by a given storage structure. If we can determine the number of storage units required for each type of block in a given structure, then we may sum over all block types the product of the number of storage units required for each type of block and the number of blocks of that type in the structure to determine the basic storage cost of the structure. For instance, if ui represents the number of storage units for each bi-block, then the basic storage cost will be given by

Σ (i = 1 to 11) mi ui

To determine the entire storage cost of a structure we must add to this expression the number of storage units required by the various description blocks and the source, relation, and target ring heads. Let ud represent the (average) number of storage units required by a description block and let ua, ur, and up represent the corresponding quantities for the source, relation, and target ring heads, respectively. The quantity to be added to the basic storage cost will then be given by

n ud + ka° ua + kr° ur + kp° up

Thus, we may define the storage cost S for a given storage structure as

S = Σ (i = 1 to 11) mi ui + n ud + ka° ua + kr° ur + kp° up

Let us consider now the determination of ui for all appropriate values of i. In general, a bi-block consists of a number of fields containing pointers for the various rings which may pass through it, possibly a type field, and (if i ∈ {5, 6, 7}) possibly a relation symbol name field. If we know the fields of which a bi-block is composed, we can sum the numbers of storage units required by these fields to determine ui. Let us assume that a field containing a forward pointer for an ai-ring requires sfi units of storage and that a field containing a head pointer for an ai-ring requires shi units of storage, where i ∈ {1, 2, …, 10}. Further, let us assume that a description block indicator requires sd units of storage and that a pointer in a source, relation, or target ring requires sp units. Finally, assume that a relation symbol name field requires sr units of storage and a block type field requires st units. (If the type code does not require a separate field, we will assume that st = 0.)
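The storage cost S is a straightforward weighted sum, which can be sketched as follows. The function name and the sample values are illustrative assumptions, not the thesis's notation.

```python
# Sketch (not from the thesis): the storage cost
#   S = sum_{i=1..11} m_i u_i + n u_d + ka ua + kr ur + kp up
# with illustrative block counts and per-block sizes.

def storage_cost(m, u, n_desc, ud, ka, ua, kr, ur, kp, up):
    assert len(m) == len(u) == 11        # block types b1 .. b11
    basic = sum(mi * ui for mi, ui in zip(m, u))   # sum of m_i u_i
    return basic + n_desc * ud + ka * ua + kr * ur + kp * up

# Illustrative: one block of each type at 2 units each (basic cost 22),
# one description block of 3 units, and 2 heads of 1 unit per ring kind.
S = storage_cost([1] * 11, [2] * 11, 1, 3, 2, 1, 2, 1, 2, 1)   # 31
```

Minimizing T subject to a bound on this S is the optimization problem taken up in Chapter V.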

As an example let us consider the derivation of an expression for u1, the number of storage units required by a b1-block. A block of this type contains a pointer field for the appropriate source ring; if θ1 = 1, it contains a forward-pointer field for an a1-ring; if θ1′ = 1, it contains a head-pointer field; if α1 = 1, it contains a field for a description block indicator; and finally, if τ1 = 1, it contains a type field. As a result, u1 may be characterized by the following expression:

u1 = sp + θ1 sf1 + θ1′ sh1 + α1 sd + τ1 st

In a similar manner we may obtain expressions for all other block types.

u11 = sp + θ10 sf10 + θ10′ sh10 + α2 sd + τ11 st

ui = λi(θi−1 sfi−1 + θi sfi + τi st)   for i ∈ {2, 4, 8, 10}

ui = λi(θi−1 sfi−1 + θi−1′ shi−1 + θi sfi + θi′ shi + τi st)   for i ∈ {3, 9}

u5 = λ5(θ4 sf4 + θ4′ sh4 + θ5 sf5 + θ5′ sh5 + ρ1 sr + τ5 st)

u6 = λ6(θ5 sf5 + θ6 sf6 + sp + ρ2 sr + τ6 st)

u7 = λ7(θ6 …

We note that as a practical matter it is often the case that sfi, shi, sd, and sp are all equal (for all i ∈ {1, 2, …, 10}).
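The field-sum rule for u1 can be sketched directly. The function name and argument order are illustrative assumptions; the binary flags simply switch each optional field in or out.

```python
# Sketch (not from the thesis):
#   u1 = sp + theta1*sf1 + theta1'*sh1 + alpha1*sd + tau1*st

def u1(sp, sf1, sh1, sd, st, theta1, theta1_prime, alpha1, tau1):
    return (sp                       # source ring pointer field (always)
            + theta1 * sf1           # a1-ring forward pointer, if theta1 = 1
            + theta1_prime * sh1     # a1-ring head pointer, if theta1' = 1
            + alpha1 * sd            # description block indicator, if alpha1 = 1
            + tau1 * st)             # block type field, if tau1 = 1
```

With all field sizes equal to one storage unit (the practical case noted above), u1 simply counts the fields present: for example, a forward pointer and a description block indicator but no head pointer or type field give u1 = 3.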

Chapter V A PROCEDURE FOR THE DETERMINATION OF A MINIMUM COST STORAGE STRUCTURE In the previous chapter we developed two measures of performance - a time cost function T and a storage cost function S - for our Storage Structure Model. Our goal in this chapter will be to develop a procedure which utilizes these measures to determine a minimum cost storage structure for a given data structure. (Recall that we consider a storage structure to have minimum cost if it minimizes T subject to a given constraint on S.)

5.1 Reducing the Number of Feasible Solutions

T and S are functions of a large number of variables. In order to facilitate referring to these variables we will partition them into two classes: parametric variables (or simply, parameters) and decision variables. The parametric variables are those which characterize the environment in which our storage structure is to exist (e.g., si, fi, ki°, and sfi) and are those over which we assume no control can be exercised to minimize T and S. The decision variables, on the other hand, characterize the form of the storage structure itself. In particular, the class of decision variables consists of θ1, …, θ10, θ1′, …, θ10′, Δ1, …, Δ10, α1, α2, ρ1, ρ2, and ρ3. These are the variables over which the cost minimization is to be performed. We know from our discussions of Chapter III that the values of the various decision variables may not all be specified independently. For instance, the values of λ2, …, λ10 are determined uniquely once values are assigned to Δ1, …, Δ10, ρ1, ρ2, and ρ3. Let us assume for the moment, however, that the decision variables θ1, …, θ10, θ1′, …, θ10′, Δ1, …, Δ10, α1, α2, ρ1, ρ2, and ρ3 are all independent. Since each of these variables is binary-valued, this implies that it is possible to specify 2^35, or roughly 3 × 10^10, different storage structures via these variables.

Clearly, some caution must be exercised in choosing a method for determining that storage structure which satisfies our conditions of optimality, lest solution of our problem become computationally infeasible. Unfortunately, the situation is further complicated by the fact that the time cost function T is rather ill-behaved. First, T cannot feasibly be written as an explicit function of the decision variables. Second, T is a monotone function of none of the decision variables except ρ1, ρ2, and ρ3. As a result, T does not lend itself to minimization by any of the common optimization techniques for functions of zero-one variables. In order to reduce the number of "feasible solutions" which must be considered in determining the optimal storage structure, let us make a number of observations and decisions based upon these observations. Consider for a moment an ai-ring and ai+1-ring pair, where i ∈ {2, 4, 6, 8}. In general, this ring pair may assume any of the forms given in the following table:

ai-ring      ai+1-ring    θi  θi+1  Δi  Δi+1
ring         ring          1    1    0    0
ring         stack         1    0    0    0
stack        ring          0    1    0    0
ring         duplicated    1    0    0    1
duplicated   ring          0    1    1    0
stack        duplicated    0    0    0    1
duplicated   stack         0    0    1    0
duplicated   duplicated    0    0    1    1

We note that the time cost required to sequence through a stack is less than or equal to that required to sequence through a ring of the same composition. We also note that a ring and a stack have the same basic structure with regard to the ordering and accessibility of their elements. Furthermore, a stack requires less storage than a ring of the same composition. Since we are not concerned with manipulative operations (for which a ring can be preferable to a stack), it follows that the structures "ring-stack" and "stack-ring" for the ai-ring and ai+1-ring pair will always require less time for equivalent operations and less storage than the structure "ring-ring". Since the remainder of the SSM is the same regardless of which

of the three structures "ring-ring", "ring-stack", or "stack-ring" is implemented for the given ring pair, it is always advantageous to choose one of the latter two. We can therefore exclude θi = θi+1 = 1 for i ∈ {2, 4, 6, 8} from further consideration.

Suppose that α1 = 0 and α2 = 0. Then each a1-ring and a10-ring pair functions just as the ai-ring and ai+1-ring pair just discussed (assuming that Λ and Π are not disjoint). It follows that if α1 = 0 and α2 = 0, we can exclude θ1 = θ10 = 1 from consideration. (Recall that when α1 = 0 and α2 = 0 at least one of θ1 and θ10 must be 1, however.) If on the other hand α1 = 1 or α2 = 1 or both, then the a1-rings and the a10-rings will be independent of one another (as well as of the rest of the SSM). Applying the stack-versus-ring arguments to each in turn leads us to conclude that it is always advantageous to make the a1-rings and the a10-rings stacks instead of rings. Thus, when α1 = 1 or α2 = 1 or both, we should always set θ1 = 0 and θ10 = 0. These observations have allowed us to reduce the number of combinations of values for θ1,...,θ10 from 1024 to at most 81 when α1 = 1 or α2 = 1 or both, and to at most 162 when α1 = 0 and α2 = 0.

Let us now examine the role of the head pointer in an arbitrary ai-ring. If for a particular operation the direction of access is away from the head of the ring, the head pointer can make no contribution. We can say in general, then, that if there are no operations to be

performed upon the structure such that the direction of access is toward the head of the ai-ring, there is no benefit to be obtained from a head pointer in that ring and we can set θ'i = 0. This will, of course, also tend to minimize the storage requirement of the structure.

Suppose on the other hand that there is at least one operation to be performed upon the structure such that the direction of access is toward the head of the ai-ring. Since we expect to enter the ring at the ki-th element, either we can use the head pointer to go directly from this element to the head, or we can step through the intervening ki - 1 elements to reach the head, where the time required to step from one element to the next is either si or fi depending upon whether the ai-ring is a stack or an explicit ring, respectively. It follows that in order to accrue any benefit from a head pointer, the time cost required to follow it must be less than or equal to that required to step through the intervening elements. If the ai-ring is an explicit ring (θi = 1), this means that

    hi ≤ ki fi

must be true in order to gain advantage from θ'i = 1. (Recall our earlier assumption that si ≤ fi ≤ hi for i ∈ {1, 2,..., 10}.) If the ai-ring is a stack (θi = 0 and λi = 0), however, the situation

becomes slightly more complex. Setting θ'i = 1 when θi = 0 may mean that the time required to follow the head pointer is hi + s0 instead of hi. (See the expressions for the various zi.) Thus, if θ'i = 1 results in the inclusion of an s0 term, the condition

    hi + s0 ≤ ki si

must be satisfied for θ'i = 1 to result in a time reduction. On the other hand, if no such inclusion results, the condition

    hi ≤ ki si

must be met. The result of this is that the values of θ'1,...,θ'10 may be uniquely determined from the values of θ1,...,θ10 and λ1,...,λ10 (given actual values for s0 and for si, fi, and hi for i ∈ {1, 2,..., 10}), which reduces the number of solutions which must be considered by a factor of roughly 10^3.

To further reduce the number of solutions which must be considered, let us treat the variables α1, α2, ρ1, ρ2, and ρ3 as external decision variables. That is, let us assume that we assign values (externally to the solution process) to these variables and then solve for the optimal storage structure subject to the constraints of these values.
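The head-pointer rule above can be sketched as follows. This is an illustrative Python rendering, not the original program; the flag s0_included, which stands in for whether following the pointer incurs the extra s0 term in the zi expressions, is an assumption of this sketch.

```python
# Hypothetical sketch of the head-pointer decision for one ai-ring.
# access_toward_head: True if some operation accesses toward the ring's head.
# s0_included: stand-in flag for whether following the head pointer adds an
# s0 term (see the z_i expressions in the text).
def head_pointer_flag(theta_i, k_i, s_i, f_i, h_i, s0,
                      access_toward_head, s0_included):
    if not access_toward_head:
        return 0                                  # no benefit: theta'_i = 0
    if theta_i == 1:                              # explicit ring: steps cost f_i
        return 1 if h_i <= k_i * f_i else 0
    if s0_included:                               # stack: pointer costs h_i + s0
        return 1 if h_i + s0 <= k_i * s_i else 0
    return 1 if h_i <= k_i * s_i else 0           # stack: pointer costs h_i
```

Each branch simply compares the cost of following the pointer against the cost of stepping through the intervening elements, as in the conditions above.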

As a practical matter this assumption does not appear to be very restrictive, since the overall optimal solution will probably result in the values

    α1  α2  ρ1  ρ2  ρ3

     1   1   1   1   0
     1   1   0   0   0

most of the time anyway. In any event, all possible combinations can be considered if so desired by specifying them individually.

We are now faced with the problem of determining values for only θ1,...,θ10 and λ1,...,λ10 such that our optimality conditions are satisfied. It so happens that the constraints upon these variables (Chapter III) further reduce the number of solutions to be considered by a factor of 10. In general terms, we have reduced the number of solutions which must be considered by a factor of 10 via our decisions concerning θ1,...,θ10, a factor of 10^3 by our decisions involving θ'1,...,θ'10, a factor of 30 for α1, α2, ρ1, ρ2, and ρ3, and a factor of 10 from the constraints on θ1,...,θ10 and λ1,...,λ10. As a result we are now faced with considering only 10^4 possible solutions. (In fact, when α1 and α2 are not both 0, we need consider at most 8960 solutions.)
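The reductions on θ1,...,θ10 described earlier can be checked with a short enumeration. This is an illustrative Python sketch, not the original program:

```python
from itertools import product

# Enumerate the combinations of theta_1..theta_10 that survive the
# observations in the text: no "ring-ring" pair, plus the a1-/a10-ring
# rules driven by alpha_1 and alpha_2.
def theta_combinations(alpha1, alpha2):
    kept = []
    for theta in product((0, 1), repeat=10):
        # exclude theta_i = theta_{i+1} = 1 for i in {2, 4, 6, 8}
        if any(theta[i - 1] == 1 and theta[i] == 1 for i in (2, 4, 6, 8)):
            continue
        if alpha1 == 0 and alpha2 == 0:
            # not both 1, but at least one of theta_1, theta_10 must be 1
            if theta[0] + theta[9] != 1:
                continue
        elif theta[0] != 0 or theta[9] != 0:
            # independent a1-/a10-rings are always made stacks
            continue
        kept.append(theta)
    return kept
```

Counting the survivors reproduces the figures in the text: 81 combinations when α1 = 1 or α2 = 1 (or both), and 162 when α1 = 0 and α2 = 0.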

Although we might wish to reduce this number further (which may in fact be impossible to accomplish through reasonable effort), it falls well within the computational limits of the computer and, hence, should not concern us unduly at this point. We will, however, indicate a special case which allows us to reduce the number of solutions which must be considered by a factor of approximately 2. Because of the symmetry of the SSM, if we assign values to the various parameters and external decision variables in a symmetric manner (e.g., k1 = k10, α1 = α2, ρ1 = ρ3, etc.), each solution will have a "mirror image" which has the same time and storage costs. For instance, if θ1...θ10 = 0001000010 and λ1...λ10 = 0110111001 is one solution and if the various parameters and external decision variables have symmetric values, then the mirror image solution θ1...θ10 = 0100001000 and λ1...λ10 = 1001110110 will have the same time and storage costs as the first solution. This means that once we have considered a given solution we need not consider its mirror image. Note that certain solutions are their own mirror images. Hence, we must consider something over half of all the possible solutions.
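The mirror-image pruning can be sketched as follows (illustrative Python, not the original program); here a solution is represented as the pair of bit strings (θ1...θ10, λ1...λ10), and its mirror image is obtained by reversing each string:

```python
# Mirror image of a solution: reverse both bit strings.
def mirror(theta, lam):
    return theta[::-1], lam[::-1]

# Keep one representative of each mirror pair; a solution whose mirror
# image has already been considered is skipped.
def prune_mirrors(solutions):
    seen, kept = set(), []
    for sol in solutions:
        if mirror(*sol) in seen:
            continue
        seen.add(sol)
        kept.append(sol)
    return kept
```

For the example in the text, mirror("0001000010", "0110111001") yields ("0100001000", "1001110110"), so only one of the two solutions need be evaluated.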

5.2 Optimization Procedure

Let us now consider the basic procedure which we will use to determine the solution which satisfies our optimality conditions. First we assign values to the various parameters and external decision variables. We then generate a sequence of all the value combinations for θ1,...,θ10. For each of these value combinations we generate a sequence of all those value combinations for λ1,...,λ10 which satisfy the constraints of Chapter III. For each resultant assignment of values to θ1,...,θ10 and λ1,...,λ10 we may determine the corresponding values of θ'1,...,θ'10. A given combination of values for θ1,...,θ10, θ'1,...,θ'10, and λ1,...,λ10 describes a possible state of the SSM.

Our goal is to determine that state of the SSM which minimizes the time cost function T subject to a possible limit on the value of the storage cost function S. Let ηi represent the i-th state in the sequence of states, where i ∈ {1, 2,..., M} and M is the number of states. Assume that the state ηi-1 is considered before the state ηi, which in turn is considered before the state ηi+1, and so forth. Let Ti and Si represent the values of T and S, respectively, for the state ηi. Let S0 represent the upper bound to be placed upon the storage

cost function S. We may then define the following procedure to determine the optimal state η* of the SSM.

(1) Set T* = ∞ and i = 1.

(2) If S0 = ∞, go to step (3). Otherwise, if Si > S0, go to step (4).

(3) If Ti > T*, go to step (4). If Ti < T*, set T* = Ti, S* = Si, η* = ηi, and go to step (4). If Ti = T* and Si < S*, set S* = Si, η* = ηi, and go to step (4).

(4) Set i = i + 1. If i ≤ M, go to step (2). Otherwise, stop; η* is the optimal state of the SSM.

To illustrate the feasibility of this approach, the procedure has been implemented via a prototype program on the Michigan Terminal System (MTS) for the IBM 360/67 at The University of Michigan. In the next chapter we will consider some results obtained through the use of this program.
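The four-step procedure can be sketched as follows. This is an illustrative Python rendering, not the original MTS program; states is the sequence of (Ti, Si) cost pairs, and S0 = ∞ means no storage bound is imposed:

```python
import math

def optimal_state(states, S0=math.inf):
    """Return (index of eta*, T*, S*) for a sequence of (T_i, S_i) pairs."""
    T_star, S_star, best = math.inf, math.inf, None
    for i, (Ti, Si) in enumerate(states):   # the loop plays the role of step (4)
        if S0 != math.inf and Si > S0:      # step (2): storage bound violated
            continue
        if Ti < T_star:                     # step (3): strictly smaller T
            T_star, S_star, best = Ti, Si, i
        elif Ti == T_star and Si < S_star:  # step (3): tie on T, smaller S
            S_star, best = Si, i
    return best, T_star, S_star
```

As in step (3), ties in the time cost T are broken in favor of the smaller storage cost S.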

Chapter VI

APPLICATION

In the preceding chapters we have developed a model for data structure, a model for the storage structures which can represent any data structure specified by the data structure model, and finally a technique for determining a minimum cost storage structure to represent a particular data structure in the solution of a given problem. In this chapter we will apply these tools to a specific problem to determine a minimum cost storage structure for the data associated with the solution of that problem, we will compare the results we obtain to the storage structure used in an existing system for solving that problem, and we will examine the sensitivity of our results to variations in the parameters used to describe the problem and the structure of its data.

6.1 The Problem: A System for Medical Diagnosis

As the example to illustrate the use of the techniques we have developed, let us consider the representation of data associated with a computer system for aiding medical diagnosis. In the sections which follow we will concern ourselves with a general description of a diagnostic system, the specification of the data and the data structure for a particular diagnostic system, and the determination of a minimum cost storage structure for that system.

6.1.1 General Description

In the general sense, the diagnostic problem is to ascertain the current state of some given system. The diagnostician uses information acquired from past experience with such systems, coupled with specific observations or tests of the given system, in order to deduce the identity of its current state. In particular, the medical diagnostician is concerned with determining the "state" of his patient. The physician has learned through training and experience the sign and symptom patterns associated with the possible diseases from which the patient can suffer. In theory, a given set of signs and symptoms (hereafter referred to simply as symptoms) should characterize each disease uniquely.

In practice, however, different diseases may result in similar symptoms. Observation of many different symptoms may be required to identify a particular disease, and a given symptom may suggest many possible diseases. This fact, coupled with the large number of symptoms and diseases, requires the diagnostician to master considerable amounts of information. The diagnostic process is further complicated by a number of other factors. First, the relationships between symptoms and diseases are often known only in probabilistic terms. Second, the tests required to determine whether certain symptoms indeed exist may exact a high cost (in terms of risk to the patient, patient discomfort, money, etc.). This cost must be weighed against the potential usefulness of the test results. Finally, because of the vast amounts of information required in the diagnostic process, those symptom patterns and testing strategies associated with seldom encountered diseases may be effectively lost to (i.e., forgotten by) the diagnostician.

By using a computer program to provide general diagnostic assistance to its user, a number of these difficulties can be overcome, or at least minimized. One of the principal advantages to be gained from the use of a computer is the sheer bulk of information which it can maintain. Because a program can consider more possible

diagnoses than a human being, it can provide a strong safeguard that a particular disease is not overlooked in the diagnosis. For these reasons there has been increasing interest in computer-aided diagnosis. In particular, a number of programs have been written which are capable of performing diagnosis in particular medical areas. As a rule, these programs employ a Bayesian analysis of symptoms based on a disease-symptom probability matrix for the given set of diseases considered. That is, the programs compute the probability of disease D given the symptom profile S (i.e., the set of symptoms) as follows:

                     P(D)P(S|D)
    P(D|S) = ------------------------
              Σ_D P(D)P(S|D)

where P(D) is the a priori probability of the occurrence of disease D and P(S|D) is the conditional probability that S occurs given D. Since it is not our purpose to design a complete diagnostic system, we refer the reader to the literature for more complete descriptions of the mathematical techniques involved.

6.1.2 A Particular Diagnostic System

Gorry [25] has designed and implemented a general purpose diagnostic system (although the examples he has considered are

exclusively in the area of medical diagnosis) which is representative of these systems and is the one which we shall use as a reference. Let us examine the basic assumptions and characteristics of this system. The objective of the diagnostic process for the medical problem is to determine the malady or disease with which an individual is afflicted. This disease is assumed to be one of a finite, but perhaps quite large, number of possible diseases. Information concerning the given disease can be obtained by performing a variety of tests upon the patient. These tests may range from simple questions, as in history-taking, to complicated medical procedures such as exploratory surgery. The results of tests applied to the patient are then combined by the diagnostician with his experience with other diagnostic problems to deduce the particular disease. In the case of the diagnostic program this experience is reflected in probability distributions which characterize the results of certain tests given a particular disease. Specifically, this information relates symptoms (the results of the tests) to particular diseases. A symptom is assumed to be binary-valued (i.e., either present or absent) and a test is used to determine the presence or absence of some number (perhaps greater than one) of symptoms. Associated with each test is a cost of applying it to the patient (in terms of risk or discomfort to the patient, the services of skilled personnel, money, etc.) and therefore it is advantageous to make a

decision about the disease based upon a limited number of tests. On the other hand, associated with each disease is a cost reflecting the loss resulting from diagnosing that disease as another. (Diagnosing a malignant tumor as benign is very costly and should clearly be avoided.) The possibility of loss for an incorrect diagnosis tends to favor extensive testing prior to making a decision. One must therefore balance the testing cost against the cost of an incorrect decision in an effort to minimize the overall expected cost. To accomplish the goals above, Gorry's diagnostic system performs three logical functions:

(1) The interpretation of the symptoms for a particular problem via Bayes' rule, given conditional probabilities that particular symptoms are associated with certain diseases and given a priori probabilities that these diseases will occur in a given population. This is called the inference function.

(2) The selection of tests to be applied to the patient under consideration in order to obtain further symptoms. This is called the test selection function.

(3) The analysis of the symptoms observed to determine whether there are any irrelevant symptoms present (such as a sore elbow when all other symptoms indicate

typhoid fever) or, in the general case, to detect symptom patterns for more than one disease occurring simultaneously. This is called the pattern-sorting function.

The use of the inference function should be quite clear, but the test selection function and the pattern-sorting function may require brief descriptions. Consider first the test selection function. The purpose of this function is to select tests in such a way as to minimize the overall expected loss of the diagnosis. For finite numbers of symptoms and diseases and a finite number of potentially useful test sequences, the optimal choice of tests can be obtained by constructing a decision tree and folding back this tree in terms of expected loss. Such a tree - a portion of which appears in Figure 6-1 - consists of two types of nodes: decision nodes and what Gorry calls "nature's nodes". A decision node represents the current status of the diagnostic problem as given by the probability distribution over the states (i.e., diseases) of the system. Emanating from each decision node are a number of branches, one for each possible test which can be performed and one corresponding to a terminal decision. Each branch corresponding to one of the tests which can be performed leads to one of nature's nodes. Associated with each of nature's nodes are a number of branches

[Figure: a decision node d1 with branches for the tests T1,...,Tn, each leading to one of nature's nodes, and a branch for a terminal decision; outcome branches from each of nature's nodes lead to further decision nodes.]

Figure 6-1. Section of a Decision Tree

representing the possible outcomes (i.e., symptoms) which can result from performing the test corresponding to the branch which leads to this node. Each of these "outcome branches" leads in turn to another decision node. Thus, if somewhere in the course of performing a diagnosis we encounter decision node d1 of Figure 6-1, we may choose to perform any one of the tests T1,...,Tn in order to gain further information about the state of the system under consideration (i.e., about the disease afflicting our patient), or we may choose to make a final diagnosis. If we decide to make a final diagnosis, we follow the branch of the decision tree which leads from decision node d1 to the terminal decision D and the diagnostic process is concluded. Suppose on the other hand we decide to perform test T1. In this case we follow the branch from decision node d1 to nature's node n1. Depending upon whether performing the test T1 results in symptom s1 or s2, we will follow the branch from nature's node n1 to decision node d2 or decision node d3, respectively. This newly accessed decision node, with the probability distribution of the states of the system updated to reflect the information conveyed by the symptom leading to the node, now represents the current status of the diagnosis. At this node we again have a choice between making a final diagnosis and performing one of several tests to obtain additional information.

Since the number of decision nodes of the decision tree grows exponentially with the number of symptoms and tests, and since we may commonly expect to encounter problems involving large numbers of symptoms and tests, it is generally computationally infeasible to search the entire decision tree (or even a large portion of it) in order to determine the optimal sequence of tests to be performed to minimize the overall cost of the diagnosis. Therefore, Gorry has developed an heuristic which allows him to search a relatively small portion of the decision tree below each decision node encountered in the diagnostic process in order to determine which test should be performed at that node. While this heuristic results in a sub-optimal sequence of tests, the number of decision nodes which must be considered is greatly reduced, which more than offsets this disadvantage. Starting with the root of the decision tree (i.e., the decision node which represents the initial status of the diagnostic process), the test selection function applies Gorry's heuristic to a portion of the decision tree below this node and determines the best test to perform at this stage of the diagnosis. The result obtained by performing this test leads to another decision node of the tree, at which point a terminal decision may be made or the test selection function may again be invoked. This process continues until a terminal decision (i.e., a final diagnosis) is reached.

Now consider the pattern-sorting function. As we shall see later, only those symptoms significant to the diagnosis of a particular disease are associated with that disease by a probability greater than zero. For instance, the conditional probability of the symptom "sore elbow" given the disease "typhoid fever" is zero, since a sore elbow is not a symptom of typhoid fever. If the symptom "sore elbow" were observed in the course of a diagnosis in which typhoid fever was considered a possible cause of all the symptoms observed, the posterior probability for the disease typhoid fever (as computed by Bayes' rule) would be zero, and typhoid fever would be eliminated from further consideration even though all other symptoms strongly indicated the presence of this disease. The problem here is that while a sore elbow is not a symptom of typhoid fever, a patient certainly can have typhoid fever and a sore elbow. This is an example of the more general problem of irrelevant or "noise" symptoms. Unless special precautions are taken, such symptoms can eliminate the actual disease from consideration when processed by the inference function. A second problem arises when symptoms associated with two or more distinct diseases are observed, as when the patient has more than one disease. In this case the diagnostic process must detect two or more patterns of symptoms, rather than dismissing certain symptoms as irrelevant. Gorry's diagnostic system overcomes these problems by processing

a number of symptom patterns in parallel during a diagnosis. A pattern is defined to consist of a subset of the set of symptoms observed to the current stage of the diagnosis, such that 1) at least one disease exhibits all the symptoms in the pattern with a non-zero probability, and 2) the pattern is not a subset of any other pattern. Each pattern, along with the probability distribution for the diseases of the patient given the symptoms of the pattern, is included in what Gorry calls a pattern stack*, which is maintained by the pattern-sorting function. In addition to creating the pattern stack for the initial set of observed symptoms, the pattern-sorting function processes every new symptom against the pattern stack and updates the patterns accordingly. In particular, if a new symptom is relevant to at least one disease already indicated by a pattern, the symptom is added to the pattern and its probability distribution is updated. If no disease associated with a pattern exhibits the new symptom, no changes are made to either the pattern or its probability distribution. In addition, after the new symptom has been processed against all patterns in the pattern stack, the pattern-sorting function forms new patterns involving

* Gorry's use of the term "stack" in this case does not conform to our definition of the term. The pattern stack is actually implemented via a SLIP list.

the symptom if possible. Finally, if the inference function determines that the probability of a particular pattern is zero, the pattern-sorting function eliminates the pattern and its probability distribution from the pattern stack.

In general terms, Gorry's three functions interact in the following manner. The diagnostic system is provided with a list of symptoms which have been observed. The pattern-sorting function examines these, separates them into patterns, and sets aside for later consideration those patterns which do not appear to be relevant to the principal medical problem (i.e., the set of most likely diseases). Next, the inference function, using the a priori and conditional probabilities which constitute the "experience" of the diagnostic system, creates a probability distribution for the set of diseases which exhibit the symptoms of the pattern under consideration. Finally, the test selection function utilizes the current probabilities of the various diseases, the cost of each test, and the usefulness of the results of each test to select a good test to apply to the patient. (Alternatively, the test selection function may, of course, suggest a final diagnosis.) When the results of the test have been obtained, this process is repeated until a diagnosis has been obtained.

6.1.3 The Data and Its Data Structures

Let us now describe within the framework of our data structure

model the basic data involved in the diagnostic process. The set of data items, A0, may be defined to consist of symptoms, diseases, and tests. We could also include the costs of the various tests and the a priori probabilities of the various diseases in the set of data items, but since there is a one-to-one relationship between costs and tests and between a priori probabilities and diseases, we choose to include the cost and the probability in the descriptions (i.e., definitions) of the tests and the diseases, respectively. If we let S0 = {si | i = 1, 2,..., ns} represent the set of all symptoms, D0 = {di | i = 1, 2,..., nd} the set of diseases, and T0 = {ti | i = 1, 2,..., nt} the set of possible tests, then A0 = S0 ∪ D0 ∪ T0. The relations of interest may now be defined as follows:

    r1 = {(di, tj) | di ∈ D0, tj ∈ T0, and test tj is relevant to disease di}

    r2 = {(ti, sj) | ti ∈ T0, sj ∈ S0, and symptom sj is a possible result of test ti}

    rxj = {(si, dk) | si ∈ S0, dk ∈ D0, and the conditional probability that symptom si is present given disease dk is xj}

where xj is an element of the set of conditional probabilities constituting the experience of the diagnostic system.

For these relations it is clear that Λ = Π = A0. Note, however, that by reversing the order of the elements in the ordered pair (si, dk) for the relations rxj we obtain Λ = D0 ∪ T0 ⊂ A0 and Π = S0 ∪ T0 ⊂ A0. It may be advantageous to make this switch in order to reduce the cardinalities of Λ and Π, but for the purposes of our example we will assume the relations as originally defined.

We have not included the costs of misdiagnosis in this discussion. The reason for excluding these costs is primarily that we choose to consider the determination of a representation for these items as a separate problem. To be more specific, we feel that the symptoms, diseases, and tests are very strongly interrelated (via the relations r1, r2, and the various rxj) and, hence, should be considered together in determining a storage structure for their representation. On the other hand, we feel that the costs of misdiagnosis are not closely associated with any of these and, thus, should be considered alone in determining a storage structure for their representation. In other words, we feel that the costs of misdiagnosis have a data structure sufficiently distinct from the data structure for the symptoms, diseases, and tests to warrant separate consideration. This is then an example of a problem which we wish to partition into separate problems for the purposes of determining storage structures for the data involved. Note that rather than actually determining a storage structure for the representation of the costs of misdiagnosis (via our procedure), we will assume simply that these data are represented in matrix form.

Let us now consider the actual diagnostic problem which we wish to use as our example. Warner and his associates [76] conducted a prolonged study of a number of types of congenital heart disease, and as a result, they developed a disease-symptom probability matrix for

33 diseases (including "normal") and 50 symptoms*. The lists of symptoms and diseases appear in Tables 6-1 and 6-2, and the corresponding probability matrix appears in Table 6-3. Finally, the list of tests and their respective possible results appears in Table 6-4.

In an attempt to generalize this problem somewhat, we will map every probability of the disease-symptom probability matrix which lies within the range x - 0.025 to x + 0.025 to the value x, where x is some multiple of 0.05. That is, to every probability y which satisfies the condition x - 0.025 < y < x + 0.025 we will assign the new value x. For example, 0.06 and 0.07 will become 0.05, and 0.08 and 0.09 will become 0.10. Most of the probabilities in the matrix are already multiples of 0.05, so this has little effect upon the actual values therein (or their apparent validity). There are, however, a large number of entries with the values 0.01 and 0.02, and these will be mapped to 0. This should not adversely affect the diagnostic capability of the system and serves only to reduce the number of diseases which reflect a given symptom. It is hoped that the resultant interrelationships more accurately reflect those of a more general diagnostic problem, that is,

* In his consideration of this same problem Gorry apparently had access to Warner's updated matrix, for he considers 35 diseases and 57 attributes. The differences should be relatively slight, however.
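The coarsening rule just described amounts to rounding each entry to the nearest multiple of 0.05, and can be sketched as follows (an illustrative Python sketch; behavior exactly at the boundary values x ± 0.025 is left to the rounding of the host language and is an assumption here):

```python
# Map a probability to the nearest multiple of 0.05, e.g. 0.06 -> 0.05,
# 0.08 -> 0.10, and 0.01 or 0.02 -> 0.0 (dropping weak associations).
def coarsen(y):
    return round(round(y / 0.05) * 0.05, 2)
```

Applied entry by entry to Table 6-3, this leaves the multiples of 0.05 unchanged while zeroing the many 0.01 and 0.02 entries.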

Symptom  Interpretation

S01  Age, less than 1 year
S02  Age, 1 year to 20 years
S03  Age, 20 years or more
S04  Cyanosis, mild
S05  Cyanosis, severe (with clubbing)
S06  Cyanosis, intermittent
S07  Cyanosis, differential
S08  Squatting
S09  Dyspnea
S10  Easy fatigue
S11  Orthopnea
S12  Chest pain
S13  Repeated respiratory infections
S14  Syncope
S15  Systolic murmur, loudest at apex
S16  Diastolic murmur, loudest at apex
S17  Systolic murmur, loudest L 4th
S18  Diastolic murmur, loudest L 4th
S19  Continuous murmur, loudest L 4th
S20  Systolic murmur with thrill, loudest L 2nd
S21  Systolic murmur without thrill, loudest L 2nd
S22  Diastolic murmur, loudest L 2nd
S23  Continuous murmur, loudest L 2nd
S24  Systolic murmur, loudest R 2nd
S25  Diastolic murmur, loudest R 2nd

Table 6-1. Symptoms for Congenital Heart Disease

Symptom  Interpretation

{ S26  Systolic murmur heard best posterior chest
{ S27  Continuous murmur heard best posterior chest
{ S28  Accentuated 2nd heart sound, L 2nd
{ S29  Diminished 2nd heart sound, L 2nd
  S30  Right ventricular hyperactivity by palpation
  S31  Forceful apical thrust
  S32  Pulsatile liver
  S33  Absent or diminished femoral pulsation
{ S34  ECG axis more than 110°
{ S35  ECG axis less than 0°
{ S36  R wave greater than 1.2 mv in lead V1
{ S37  R' or qR pattern in lead V1
  S38  R wave greater than 2.0 mv in lead V6
  S39  T wave in lead V6 inverted (no digitalis)
{ S40  Early diastolic murmur, loudest at apex
{ S41  Late diastolic murmur, loudest at apex
  S42  Holo-systolic murmur, loudest L 4th
  S43  Mid-systolic murmur, loudest L 4th
  S44  Holo-diastolic murmur, loudest L 4th
  S45  Early-diastolic murmur, loudest L 4th
  S46  Mid-systolic murmur with thrill, loudest L 2nd
  S47  Holo-systolic murmur with thrill, loudest L 2nd
  S48  Mid-systolic murmur without thrill, loudest L 2nd
  S49  Holo-systolic murmur without thrill, loudest L 2nd
  S50  Murmur louder than gr 3/6

Brackets indicate mutually exclusive symptoms.

Table 6-1. Symptoms for Congenital Heart Disease (Cont.)

Disease  Interpretation

D01  Normal
D02  Atrial septal defect
D03  Atrial septal defect with pulmonary stenosis
D04  Atrial septal defect with pulmonary hypertension
D05  Complete endocardial cushion defect
D06  Partial anomalous pulmonary venous connections
D07  Total anomalous pulmonary venous connections
D08  Tricuspid atresia without transposition
D09  Ebstein's anomaly
D10  Ventricular septal defect with valvular pulmonary stenosis
D11  Ventricular septal defect with infundibular stenosis
D12  Pulmonary stenosis, valvular
D13  Pulmonary stenosis, infundibular
D14  Pulmonary atresia
D15  Pulmonary artery stenosis
D16  Pulmonary hypertension
D17  Aortic-pulmonary window
D18  Patent ductus arteriosus
D19  Pulmonary arteriovenous fistula
D20  Mitral stenosis
D21  Primary myocardial disease
D22  Anomalous origin of left coronary artery
D23  Aortic valvular stenosis
D24  Subaortic stenosis
D25  Coarctation of aorta

Table 6-2. Heart Disease Types

Disease  Interpretation

D26  Truncus arteriosus
D27  Transposed great vessels
D28  Corrected transposition
D29  Absent aortic arch
D30  Ventricular septal defect
D31  Ventricular septal defect with pulmonary hypertension
D32  Patent ductus arteriosus with pulmonary hypertension
D33  Tricuspid atresia with transposition

Table 6-2. Heart Disease Types (Cont.)

SYMPTOMS

Disease  Incidence  S01 S02 S03 S04 S05 S06 S07 S08

D01  0.100  01 49 50 01 00 01 00 01
D02  0.081  10 50 50 02 01 02 00 01
D03  0.005  30 60 10 20 10 20 00 01
D04  0.001  10 20 70 30 10 25 00 01
D05  0.027  20 50 30 15 05 10 00 01
D06  0.005  10 40 50 01 01 01 00 01
D07  0.001  20 70 10 65 10 05 00 01
D08  0.018  50 48 02 30 65 01 00 10
D09  0.001  10 45 45 22 44 01 00 22
D10  0.054  40 55 05 25 25 10 00 30
D11  0.063  40 55 05 30 30 10 00 40
D12  0.045  20 70 10 01 01 01 00 01
D13  0.013  20 70 10 01 01 00 00 01
D14  0.014  90 09 01 10 90 00 00 80
D15  0.001  05 45 50 01 01 01 00 01
D16  0.013  10 45 45 01 01 01 00 01
D17  0.001  30 60 10 05 01 01 00 01
D18  0.072  20 40 40 01 01 01 00 01
D19  0.002  20 30 50 45 45 01 00 01
D20  0.008  20 50 30 01 01 01 00 01
D21  0.013  70 29 01 01 01 01 00 01
D22  0.001  70 29 01 01 01 01 00 01
D23  0.036  10 80 10 01 01 01 00 01
D24  0.009  10 80 10 01 01 01 00 01
D25  0.054  10 70 20 01 01 01 00 01
D26  0.005  50 40 10 30 60 01 00 15
D27  0.063  90 10 00 20 60 05 10 05
D28  0.001  30 30 30 30 05 10 00 01
D29  0.001  60 39 01 01 01 01 80 30
D30  0.252  15 70 15 01 01 01 00 01
D31  0.081  30 60 10 30 50 10 00 05
D32  0.005  30 40 30 01 01 05 50 01
D33  0.009  40 55 05 50 20 10 00 01

Table 6-3. Symptom-Disease Probability Matrix

Disease  S09 S10 S11 S12 S13 S14 S15 S16
D01  01 10 03 05 05 03 05 01
D02  35 50 05 02 40 01 02 02
D03  60 70 05 02 10 10 02 02
D04  80 90 05 05 15 10 02 02
D05  40 50 05 05 30 05 60 15
D06  15 20 01 05 05 01 02 02
D07  70 80 05 05 20 05 02 02
D08  80 90 20 05 15 10 02 05
D09  80 80 10 30 15 22 05 25
D10  75 90 05 05 10 20 02 02
D11  75 90 05 05 10 25 02 02
D12  50 65 01 01 01 10 02 02
D13  50 65 01 01 01 10 02 02
D14  90 99 05 10 05 35 02 02
D15  01 01 01 01 01 01 04 01
D16  70 95 40 10 10 10 01 01
D17  10 10 05 01 10 01 05 10
D18  20 20 10 01 10 05 05 15
D19  10 20 05 01 01 10 05 02
D20  50 50 40 05 10 10 80 20
D21  40 50 20 01 05 05 15 02
D22  30 30 30 80 15 20 05 01
D23  20 30 20 15 01 35 20 02
D24  20 30 20 15 01 35 20 02
D25  20 30 20 01 01 05 05 01
D26  15 30 05 01 20 10 02 02
D27  60 70 20 01 05 10 05 02
D28  10 20 01 10 01 01 05 02
D29  10 50 05 20 01 20 05 02
D30  20 30 05 01 15 05 05 20
D31  60 70 20 10 20 10 05 01
D32  20 30 10 01 10 05 02 02
D33  80 90 20 01 30 05 05 10

Table 6-3. Symptom-Disease Probability Matrix (Cont.)

Disease  S17 S18 S19 S20 S21 S22
D01  70 02 07 00 80 01
D02  30 20 02 05 90 02
D03  05 05 02 57 40 01
D04  15 20 02 05 40 20
D05  90 40 02 10 20 10
D06  20 02 02 02 60 05
D07  10 15 10 05 75 05
D08  65 05 05 20 20 02
D09  95 25 05 05 15 02
D10  20 02 05 65 25 02
D11  20 02 05 65 25 02
D12  10 02 05 70 20 02
D13  10 02 02 70 20 02
D14  40 05 05 01 02 02
D15  02 01 01 02 25 02
D16  30 05 01 01 05 30
D17  20 05 60 01 10 05
D18  10 02 50 02 13 05
D19  10 02 20 02 10 02
D20  10 10 02 05 10 02
D21  05 02 02 02 05 02
D22  01 01 01 01 01 01
D23  20 10 02 05 05 01
D24  20 10 02 05 05 01
D25  20 10 02 02 10 01
D26  70 02 02 10 10 02
D27  50 02 02 03 10 02
D28  70 02 02 05 30 02
D29  50 02 02 10 30 02
D30  95 05 02 10 10 05
D31  50 10 02 05 05 25
D32  10 10 02 02 20 10
D33  70 05 02 10 30 10

Table 6-3. Symptom-Disease Probability Matrix (Cont.)

Disease  S23 S24 S25 S26 S27 S28 S29 S30
D01  05 01 00 01 01 15 05 10
D02  02 01 01 01 01 60 01 80
D03  03 01 01 01 02 30 15 40
D04  01 01 01 01 01 95 01 50
D05  01 01 01 01 01 70 02 40
D06  05 01 01 10 15 40 02 10
D07  20 01 01 10 15 85 02 80
D08  05 01 01 01 01 02 60 01
D09  05 01 01 01 01 02 35 10
D10  05 02 02 10 15 10 60 20
D11  05 02 02 10 15 10 60 20
D12  10 02 02 01 01 10 60 20
D13  02 02 02 01 01 10 60 20
D14  05 02 02 10 10 01 90 20
D15  01 20 02 50 05 10 02 10
D16  02 02 02 20 02 95 00 30
D17  20 02 02 02 02 70 01 20
D18  85 02 02 03 05 50 01 20
D19  05 01 01 05 70 05 05 20
D20  02 02 02 01 01 50 01 20
D21  02 10 02 01 01 20 02 10
D22  01 01 01 01 01 20 02 01
D23  01 95 05 01 01 20 10 01
D24  01 95 05 01 01 20 10 01
D25  05 15 10 80 15 10 10 01
D26  02 02 02 05 10 40 10 30
D27  02 05 02 01 01 20 10 20
D28  02 05 02 01 01 20 10 10
D29  02 05 02 01 01 90 02 40
D30  01 02 05 01 01 30 02 05
D31  01 02 05 01 01 90 02 30
D32  02 02 02 02 02 90 02 30
D33  02 02 01 01 01 30 10 01

Table 6-3. Symptom-Disease Probability Matrix (Cont.)

Disease  S31 S32 S33 S34 S35 S36 S37
D01  03 01 01 01 02 02 02
D02  01 01 01 70 05 05 85
D03  01 05 01 85 05 20 70
D04  01 05 01 85 05 20 70
D05  10 10 01 05 70 05 85
D06  01 01 01 15 02 02 15
D07  01 01 01 90 02 25 75
D08  20 30 01 02 90 02 02
D09  20 10 01 10 02 02 60
D10  01 02 01 95 02 85 10
D11  01 02 01 95 02 85 10
D12  01 05 01 95 02 85 10
D13  01 05 01 95 02 85 10
D14  01 02 01 95 02 85 10
D15  01 01 01 10 02 10 02
D16  01 10 01 95 02 90 05
D17  40 01 01 01 15 02 02
D18  40 02 01 02 10 02 02
D19  01 01 01 05 05 02 02
D20  05 02 01 50 02 10 40
D21  50 02 01 05 10 05 05
D22  05 01 01 05 10 05 05
D23  40 01 05 05 15 02 02
D24  40 01 05 05 15 02 02
D25  30 01 99 05 05 02 02
D26  05 01 30 10 40 10
D27  20 02 02 40 20 30 05
D28  10 01 01 20 10 10 10
D29  05 01 10 70 05 80 05
D30  30 01 01 30 10 05 05
D31  05 05 01 70 05 75 15
D32  05 05 01 70 05 75 15
D33  20 30 01 02 90 02 02

Table 6-3. Symptom-Disease Probability Matrix (Cont.)

Disease  S38 S39 S40 S41 S42 S43 S44 S45
D01  02 02 01 00 02 70 04 03
D02  02 02 01 02 01 30 02 20
D03  02 02 01 01 01 05 01 05
D04  02 02 01 02
D05  02 02 15 01 85 05 02 20
D06  02 02 02 02 02 20 02 02
D07  02 02 02 02 30 10 01 30
D08  90 10 05 02 50 15 05 02
D09  02 02 25 25 45 45 25 25
D10  02 02 02 02 20 05 02 02
D11  02 02 02 02 20 05 02 02
D12  02 02 01 10 01 10 02 02
D13  02 02 01 10 01 10 01 01
D14  02 02 02 01 30 40 02 05
D15  02 02 01 01 02 02 01 00
D16  02 02 01 01 01 30 15 05
D17  60 05 10 02 10 20 05 02
D18  50 05 10 02 05 10 02 02
D19  02 02 20 02 10 10 02 02
D20  02 02 20 20 10 10 10 10
D21  40 90 02 02 10 10 02 02
D22  20 90 01 01 01 01 01 01
D23  70 15 02 02 02 20 10 02
D24  70 15 02 02 02 20 10 02
D25  40 04 01 01 05 20 10 02
D26  20 05 02 02 40 40 02 02
D27  20 05 02 02 30 30 02 02
D28  10 10 02 02 30 30 02 02
D29  10 05 02 02 30 30 02 02
D30  15 05 20 02 92 05 05 01
D31  10 05 01 01 30 30 10 02
D32  10 05 02 02 10 10 02 02
D33  90 10 10 02 30 30 05 05

Table 6-3. Symptom-Disease Probability Matrix (Cont.)

Disease  S46 S47 S48 S49 S50
D01  00 00 80 05 10
D02  05 01 90 01 60
D03  60 01 38 01 70
D04  05 01 40 01 40
D05  02 20 20 20 80
D06  02 02 60 02 30
D07  05 01 80 02 70
D08  20 20 20 20 50
D09  15 15 05 05 50
D10  60 05 25 05 90
D11  60 05 25 05 90
D12  68 01 25 01 80
D13  68 01 25 01 80
D14  01 01 02 02 20
D15  02 01 25 02 60
D16  02 02 05 02 20
D17  02 02 10 05 75
D18  05 02 20 10 85
D19  02 02 10 10 30
D20  05 05 10 10 70
D21  02 02 05 05 10
D22  01 01 01 01 10
D23  05 01 05 01 90
D24  05 01 05 01 90
D25  02 02 10 05 65
D26  10 10 10 10 40
D27  03 03 10 10 50
D28  05 05 30 30 60
D29  10 10 30 30 20
D30  01 10 01 10 85
D31  01 05 01 05 50
D32  02 02 20 20 20
D33  10 10 30 30 50

Table 6-3. Symptom-Disease Probability Matrix (Cont.)

Test  Possible Results
T01  S01, S02, S03
T02  S04, S05, S06, S07, N
T03  S08, N
T04  S09, N
T05  S10, N
T06  S11, N
T07  S12, N
T08  S13, N
T09  S14, N
T10  S15, N
T11  S16, N
T12  S17, S18, S19, N
T13  S20, S21, S22, S23, N
T14  S24, N
T15  S25, N
T16  S26, S27, N
T17  S28, S29, N
T18  S30, N
T19  S31, N
T20  S32, N
T21  S33, N
T22  S34, S35, N
T23  S36, S37, N
T24  S38, N
T25  S39, N
T26  S40, S41, N
T27  S42, S43, N
T28  S44, S45, N
T29  S46, S47, S48, S49, N
T30  S50, N

N means "normal"

Table 6-4. Tests for Heart Disease Diagnosis
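The data of Tables 6-2 through 6-4 drive the Bayesian inference step of the diagnostic procedure described in this chapter. As a modern illustration (ours, not Gorry's program), the update that the inference function performs for one observed symptom can be sketched as follows; the sample priors and conditionals are read from Table 6-3, and the function and variable names are our own:

```python
# Minimal sketch of the Bayesian update used in sequential diagnosis.
# Priors (disease incidences) and conditionals P(symptom | disease) are
# sample values from Table 6-3; only three diseases and one symptom are
# shown.  Names are illustrative, not taken from Gorry's program.

prior = {"D01": 0.100, "D02": 0.081, "D30": 0.252}

# P(S01 present | disease), from the S01 column of Table 6-3 (percent / 100).
p_s01 = {"D01": 0.01, "D02": 0.10, "D30": 0.15}

def bayes_update(prior, likelihood):
    """Posterior P(D | observation) proportional to P(obs | D) * P(D)."""
    unnorm = {d: likelihood[d] * prior[d] for d in prior}
    total = sum(unnorm.values())
    return {d: v / total for d, v in unnorm.items()}

post = bayes_update(prior, p_s01)
# Observing S01 shifts weight toward D30, whose prior and conditional
# are both comparatively large.
```

In the sequential mode this update is repeated once per observed symptom, which is exactly why the frequency of the corresponding primitive operation dominates the cost analysis later in the chapter.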

one which encompasses a greater variety of diseases and symptoms. Using this new probability matrix along with the numbers of symptoms, diseases, and tests, we will determine values for the parameters which characterize our data structure model: namely, k0_a, k0_p, k0_r, m_rp, and k0_1 through k0_10.

Since |S0| = 50, |D0| = 33, and |T0| = 30, we know that k0_a = k0_p = 113. From the probability matrix we see that there are 20 relations r_1, r_2, ..., r_20 corresponding to the probability values; together with the two remaining relations this gives k0_r = 22.

For each "type" of relation (i.e., the symptom-disease, disease-test, and test-symptom relations) we have compiled certain statistics which appear in Table 6-5. If the totals for items 1 and 2 and those for items 3 and 4 are divided by the number of sources and the number of targets, respectively, and if the number of sources and the number of targets are divided by the totals for items 5 and 6, respectively, we obtain the averages contained in Table 6-6.

We can also determine from inspection of our given information that there are no identical (shared) target sets (Pi2's) and no two sources have a relation/target set pair in common. These two facts imply that k0_7 = 1 and k0_3 = 1, respectively. The fact that k0_3 = 1 implies that k0_2 = 1, also. Items 5 and 6 of Table 6-6 correspond to k0_1 and k0_10, respectively.

                                        Symptom-  Disease-  Test-    Total
                                        Disease   Test      Symptom
1) Number of relation/target pairs        937       636       50     1623
   associated with a given source
2) Number of distinct relations           332        33       30      395
   associated with a given source
3) Number of source/relation pairs        937       636       50     1623
   associated with a given target
4) Number of distinct relations           327        30       50      407
   associated with a given target
5) Number of source subsets (Pi1's)        50        33       30      113
6) Number of target subsets (Pi2's)        33        24       30       87

Table 6-5. Statistics for Heart Disease Problem

1) Average number of relation/target pairs associated with a given source: 14.36
2) Average number of distinct relations associated with a given source: 3.50
3) Average number of source/relation pairs associated with a given target: 14.36
4) Average number of distinct relations associated with a given target: 3.60
5) Average number of sources in a given source subset: 1.00
6) Average number of targets in a given target subset: 1.30

Table 6-6. Averages for Heart Disease Problem
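The arithmetic leading from Table 6-5 to Table 6-6 is easy to verify. The sketch below (ours; the column totals 407 and 87 for items 4 and 6 are those implied by the averages of Table 6-6) reproduces each average:

```python
# Reproduce the Table 6-6 averages from the Table 6-5 totals.
n_sources = 113               # item 5 total: 50 symptoms + 33 diseases + 30 tests
n_targets = 113               # the same 113 objects serve as targets

rel_target_pairs = 1623       # item 1 total
distinct_rel_per_source = 395 # item 2 total
src_rel_pairs = 1623          # item 3 total
distinct_rel_per_target = 407 # item 4 total (sum of the three columns)
source_subsets = 113          # item 5 total
target_subsets = 87           # item 6 total (sum of the three columns)

avg1 = rel_target_pairs / n_sources         # 14.36
avg2 = distinct_rel_per_source / n_sources  # 3.50
avg3 = src_rel_pairs / n_targets            # 14.36
avg4 = distinct_rel_per_target / n_targets  # 3.60
avg5 = n_sources / source_subsets           # 1.00
avg6 = n_targets / target_subsets           # 1.30
```

Note in passing that 937/50 = 18.7 diseases per symptom and 636/33 = 19.3 tests per disease, the two figures used later in estimating operation frequencies.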

Therefore, k0_1 = 1 and k0_10 = 1.30. Item 2 of Table 6-6 corresponds to the product k0_2 k0_4, and since k0_2 = 1, we find that k0_4 = 3.50. Item 1 represents the product k0_2 k0_4 k0_6 k0_8 k0_10, and since we know the values of all these factors except k0_8, we can determine that k0_8 = 3.13. Similarly, item 3 represents the product k0_1 k0_3 k0_5 k0_7 k0_9, from which we determine that k0_9 = 14.36.

This leaves us with only m_rp remaining to be determined. Recall that m_rp represents the expected number of times the same relation symbol is associated with a given target. Since k0_9 = 14.36, we know that associated with each target there are on the average 14.36 relations which are not necessarily distinct. Item 4 of Table 6-6 indicates, however, that there are an average of only 3.60 distinct relations associated with a given target. Hence, m_rp = 14.36/3.60 = 3.99.

For convenience all of these results are summarized in Table 6-7. Since our prototype program uses fixed-point arithmetic in most of the low-level routines, we require that the information of Table 6-7 be in integer form. This implies transformation of these results to integer values approximating the original values and satisfying the given constraints. (Clearly, we may not simply round off the values.) Table 6-8 contains one possible choice for the transformed values.

k0_a = 113
k0_p = 113
k0_r = 22
m_rp = 3.99
k0_1 = 1
k0_2 = 1
k0_3 = 1
k0_4 = 3.50
k0_5 = 1
k0_6 = 1
k0_7 = 1
k0_8 = 3.13
k0_9 = 14.36
k0_10 = 1.30

Table 6-7. Summary of Parameter Values for Heart Disease Problem

k0_a = 113
k0_p = 113
k0_r = 22
m_rp = 4
k0_1 = 1
k0_2 = 1
k0_3 = 1
k0_4 = 3
k0_5 = 1
k0_6 = 1
k0_7 = 1
k0_8 = 3
k0_9 = 18
k0_10 = 2

Table 6-8. Transformed Parameters for Heart Disease Problem
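The factorizations used above to obtain k0_4, k0_8, k0_9, and m_rp can be checked with a short sketch (the `k0_i` names are our transliteration of the model parameters; dividing 14.36 by the rounded factors gives roughly 3.16 for k0_8, which the text rounds to 3.13, presumably from unrounded inputs):

```python
# Back out k0_4, k0_8, k0_9, and m_rp from the Table 6-6 averages,
# following the derivation in the text.  All k0_i not listed are 1.
item1, item2, item3, item4 = 14.36, 3.50, 14.36, 3.60
k0_1 = k0_2 = k0_3 = k0_5 = k0_6 = k0_7 = 1.0
k0_10 = 1.30

k0_4 = item2 / k0_2                          # 3.50
k0_8 = item1 / (k0_2 * k0_4 * k0_6 * k0_10)  # ~3.16 (text: 3.13)
k0_9 = item3 / (k0_1 * k0_3 * k0_5 * k0_7)   # 14.36
m_rp = item3 / item4                         # ~3.99
```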

6.1.4 The Operations

We are now concerned with estimation of the relative frequencies of the operations to be used in the diagnostic process. In pattern sorting two basic operations appear: namely, (1) determine the diseases associated with a given symptom, and (2) determine the symptoms associated with a given disease. The first of these is used to determine those diseases which may be indicated by the set of symptoms in a given pattern, and the second is then used to determine the intersection of the symptoms associated with each of these diseases with the set of given symptoms. (Note that if each time the first operation is performed the symptom leading to a particular disease is recorded for that disease, then there is no need for the second operation.)

Gorry indicates that his program, operating in the sequential diagnosis mode, was given on the average 7 initial symptoms and performed an average of 5.8 tests (each of which yields one symptom as a result). We may, therefore, assume that the first operation above is performed on the average 12.8 times in the course of a solution. Since each symptom can indicate on the average 18.7 diseases (from the symptom-disease matrix), we can probably assume that most, if not all, diseases must be considered for the second operation. We will assume conservatively that 25 diseases are involved and, hence,

that the second operation is performed 25 times in the course of a solution. These two operations correspond to our primitive operations Q12(d_i * -) and Q14(- * p_k), respectively. Thus, Q12 is performed on the average 12.8 times and Q14, 25 times.

The inference process (i.e., the application of Bayes' rule) also involves two basic operations: (1) determine the a priori probability of a given disease, and (2) determine the conditional probability that a particular symptom is associated with a given disease. Since we have assumed that the a priori probability of a given disease is part of the disease description itself (and, hence, is contained within the disease description block), we need not concern ourselves with the first operation, as it cannot affect the structure.

The inference function processes each observed symptom against every disease in the pattern stack. If the observed symptom is relevant to at least one disease associated with a given pattern, then Bayes' rule is applied to each disease associated with the pattern. The second operation must therefore be applied once for every such disease and the given observed symptom.

Since we have no way of validly estimating the average number of patterns in the pattern stack (although Gorry does assure us that this number should ordinarily be quite limited) but we do know that the

process should eventually result in only one pattern (unless the patient has two or more unrelated diseases, which we assume he does not), we will assume that the pattern stack always contains only one pattern.

Our next problem is to determine the average number of diseases associated with the pattern. Since each symptom can indicate an average of 18.7 diseases, we will assume that the initial pattern, consisting of the first observed symptoms, has that many diseases associated with it. At the end of the diagnostic process, Gorry's statistics for this problem show an average of 4.1 diseases having probabilities greater than a given threshold (0.01). Let us assume then that the pattern remaining at the end of the diagnosis has 4.1 diseases associated with it. At the risk of offending the purist, let us assume that the average number of diseases associated with a pattern throughout the course of the diagnosis is the average of the average numbers for the initial and final patterns, which is 11.4. Since on the average 12.8 symptoms are considered in the course of the diagnosis, we can say that the second operation, which corresponds to primitive operation Q7(d_i - p_k), is performed (12.8)(11.4) = 145.9 times.

The test selection function determines the set of tests which are relevant to the diseases whose posterior probabilities at the current decision node each exceed a given threshold (i.e., the most likely

256 diseases so far). Excluded are those tests which may already have been considered. Then for each possible result of each test the inference function is invoked. The resulting probabilities along with the costs of tests and the loss function matrix are used to determine which test should actually be performed. We see then that test selection involves the following basic operations: (1) determine the tests relevant to the given diseases, (2) determine the possible outcomes of each test, (3) determine the cost of each test, (4) determine certain elements of the loss matrix, and the inference function basic operations. Since we have assumed that the cost of a given test is part of the test description, we need not concern ourselves with its determination. Since the loss function matrix is not part of the structure under consideration (it is a matrix), we may ignore determination of its elements also. Thus, we need concern ourselves only with determining the tests and their outcomes and the conditional probabilities that particular symptoms are associated with given disease. Gorry's results indicate that for this problem an average of 350 decision nodes are encountered in the solution process. Let us assume as we did in considering the inference process that there are an average of 11.4 diseases of interest at each decision node. Let us further assume that only 3 of these diseases have posterior probabilities

greater than the given threshold. (It would be unlikely that all the diseases are equally probable.) From Tables 6-3 and 6-4 we can determine that there are an average of 19.3 tests which are pertinent to each disease and that each test has an average of 1.7 possible outcomes (symptoms) other than "normal". Then the first of the test selection operations will be performed 3 times at each decision node, or a total of 1050 times.

Although each disease has 19.3 tests associated with it on the average, there may be considerable overlap between the sets of tests for two different diseases. Let us assume that the 3 diseases of interest at each decision node have 22 distinct tests associated with them out of the 30 possible tests. Since some of these may already have been considered (by the end of the solution process a total of 12.8 will have been considered), let us assume that 15 remain to be considered at any given node. Then the second of the test selection operations will be performed 15 times for each decision node, or a total of 5250 times. Note that these two operations are both of the form of primitive operation Q3(d_i r_j -). Therefore, Q3 will be performed a total of 6300 times during the solution process.

Since each test has an average of 1.7 outcomes, there will be on the average (1.7)(15) = 25 symptoms to be considered at each decision node, which implies that operation Q7 (of the inference function) must

be performed 75 times at each node, or a total of 26,250 times. Q7 will then be performed a grand total of 26,396 times.

To summarize, the following primitive operations are involved in the solution process with the indicated (rounded-off) absolute and relative frequencies:

Operation        Frequency  Relative Frequency
Q3(d_i r_j -)       6300        0.192
Q7(d_i - p_k)      26396        0.806
Q12(d_i * -)          13        0.000
Q14(- * p_k)          25        0.001
                   -----
                   32734

6.1.5 The Environment

As a final step before submitting our problem to the solution process and determining the optimal storage structure for it, we must assign values to the unspecified parameters and the external decision variables.

As for the external decision variables, let us assume sigma1 = 0, sigma2 = 1, rho1 = 0, rho2 = 1, and rho3 = 0. The choice of these values is largely an arbitrary one, but it does have an intuitive justification. Since the sets A and Pi are not disjoint, we must handle the "wrap-around" problem by either requiring at least one of sigma1 and sigma2 to be non-zero or requiring some theta_i to be non-zero. Rather than constrain the theta_i's

in this manner, we choose to set sigma2 = 1. Since for the problem under consideration there appears to be no advantage to be gained from setting both sigma1 and sigma2 to one, we choose to set sigma1 = 0. The values assigned to rho1, rho2, and rho3 are simply the "natural" choices. A relation symbol name appears only once for each use when rho1 = 0, rho2 = 1, and rho3 = 0, and it is equally accessible from both sources and targets.

Consider now the various unspecified parameters. First let us assume that our structure is uniform in nature. That is, let us assume that all pointers in the structure occupy the same size fields and that they all require the same number of time units to follow; let us assume that all stepping operations for stacks require the same number of time units for execution; and so forth. More specifically, let the following conditions be true:

f_1 = ... = f_10 = h_1 = ... = h_10 = F_a = F_r = F_p
s_1 = ... = s_10
c_d = c_r and v_d = v_r
T_a = T_r = T_p
u_a = u_p
s_p = s_d = s_f1 = ... = s_f10 = s_h1 = ... = s_h10

For further convenience let us also assume that T_a, T_r, and T_p represent simply the times required to follow pointers and, hence, are equal to F_a, F_r, F_p, etc. Not included in the conditions above, but also in need of values, are the parameters u_d, u_r, s_r, s_t, and S_0.

In order to rigorously specify the environment (i.e., the computer) in which the diagnostic system is to reside and operate, we will assume it to be an IBM 360/67, with which we assume the reader to be familiar. With this in mind let us briefly examine the characteristics of each of the quantities to which we must assign a value.

The act of following a pointer simply involves replacing the value of the position indicator (described in Chapter IV) by the contents of the field to which that value points, where this field is assumed, of course, to contain a pointer. By letting the position indicator occupy one of the general registers of the 360/67, this operation can be implemented via the load instruction (L R1,D2(X2,B2)), which has an average execution time of 1.20 microseconds [34].

Stepping from one field to another in a stack involves incrementing or decrementing the value of the position indicator by the size of the blocks in the stack. Assuming that one of the general registers contains this size, the operation can be implemented via an RR-type add or subtract instruction (AR R1,R2 or SR R1,R2, respectively),

each of which has an average execution time of 0.65 microseconds.

The comparison operations characterized by c_d and c_r involve comparing the contents of the field to which the position indicator points with some given quantity and selecting the next operation to be performed on the basis of the outcome of this comparison. If we assume the given quantity is contained in a general register, we can use a compare instruction (C R1,D2(X2,B2)) followed by a branch-on-condition instruction (BC M1,D2(X2,B2)) to implement the operation. The compare has an average execution time of 1.40 microseconds, and the branch on condition has an average execution time between 0.80 and 1.10 microseconds, depending upon the frequency with which the branch is successful.

The "fetch" operations characterized by v_d and v_r are used to move the contents of the field to which the position indicator points to some other memory location. If one of the general registers contains the address of the location to which the given quantity is to be transferred, we can implement the operation by using a load instruction (L R1,D2(X2,B2)) followed by a store instruction (ST R1,D2(X2,B2)), which have average execution times of 1.20 and 0.93 microseconds, respectively.

Clearly, there is a certain amount of overhead associated with each of these operations which we have not considered - such as the amount of time involved in program "looping" when an operation

or a sequence of operations is to be performed recursively. Our analysis should, however, give us a good estimate of the relative times required to perform each of these operations. If we normalize and round off the time quantities indicated above, we find that a pointer operation requires 4 time units, a stack operation requires 2 time units, a compare requires 8 time units, and a fetch requires 7 time units. Thus,

f_1 = ... = f_10 = h_1 = ... = h_10 = F_a = F_r = F_p = T_a = T_r = T_p = 4
s_1 = ... = s_10 = 2
c_d = c_r = 8
v_d = v_r = 7

Consider now those parameters which pertain to storage. Since an address on the IBM 360 is 24 bits in length, we could assume a pointer field to be 3 bytes in length. For program efficiency, however, we should require the right-hand end of this field to be aligned on a full-word boundary. This is best accomplished by using a 4-byte (or full-word) field. Therefore, we will assume pointer fields to be 4 bytes in length. The leftmost (unused) byte can be used for type information. Thus,

s_p = s_d = s_f1 = ... = s_f10 = s_h1 = ... = s_h10 = 4
s_t = 0

By assuming that the source and target ring heads consist simply of a pointer, we also have

u_a = u_p = 4

Since 20 of the 22 relations of interest in our diagnostic problem are probabilities, we will assume that the relation symbol name fields of our structure are large enough to contain the actual values of the probabilities to which these relations correspond. In particular, this must be 4 bytes. Therefore,

s_r = 4

We will assume that the relation ring heads consist of a relation symbol name field and a pointer. Therefore,

u_r = 8

The description block corresponding to a symptom need contain only the name or a code representing that symptom. For this, a description block consisting of 4 bytes is probably sufficient. On the other hand, since each test has a cost associated with it and each disease has a probability, the description blocks corresponding to them should be on the order of 8 bytes. Let us assume, therefore, that the

average description block consists of 6 bytes. Thus,

u_d = 6

Finally, since the IBM 360/67 is a virtual memory machine, we can assume that there is no limit (within reason, of course) to the amount of memory available for our structure. This means that

S_0 = infinity

We now have a complete description of the environment within which our optimal storage structure must reside.

6.1.6 The Solution

We are now ready to proceed with determination of the optimal storage structure. In our search for the optimal solution we partition (rather arbitrarily) the set of feasible solutions into a number of families of solutions, one for each distinct combination of values assigned to theta_1 ... theta_10. Each of these so-called theta-families contains all the feasible solutions which have a particular combination of values for theta_1 ... theta_10. In addition to determining the overall optimal solution (or solutions), we also determine the best solution (or solutions) within each of the theta-families, and this information is recorded along with the overall optimum. Appendix E contains a listing of these results for the problem currently under consideration, which we will designate Case 1.

Upon inspection of this information we find that there are six equally good solutions for our problem, four of them in theta-family number 10 and two in theta-family number 28. [The number assigned to a theta-family simply indicates its position in the sequence of theta-families as they are generated and is used to facilitate referring to individual ones.] Table 6-9 contains more complete descriptions of these solutions.

Let us examine each of the solutions in turn, starting with Solution 1 of theta-family 28. Given the values for the decision variables and the values for k_1 - k_10, we can generate the schematic representation of Figure 6-2. In this schematic each type of block which has not been eliminated from the structure is represented by a single field regardless of the actual number of fields which that block may contain. The double-walled field at the top of the structure represents a description block. The solid arrows emanating from the b3-blocks and the b4-blocks correspond to the pointers of the a3-rings, and the dashed arrows originating from the b-blocks represent head pointers. The arrows originating from the b7-blocks correspond to a6-ring head pointers and, of course, the arrows from the b11-blocks represent description block indicators. The source, relation, and target rings - although present - are not shown.

theta-family: 10

Solution 1
theta_1 ... theta_10: 0000100000
(remaining decision variables): 0000110000; 001000111
k_1 ... k_10: 1 3 1 1 1 6 1 1 1 1

Solution 2
theta_1 ... theta_10: 0000100000
(remaining decision variables): 0000100100; 1011011011; 001001001
k_1 ... k_10: 1 3 1 3 1 1 1 1 6 1

Solution 3
theta_1 ... theta_10: 0000100000
(remaining decision variables): 0000110000; 110000111

Solution 4
theta_1 ... theta_10: 0000100000
(remaining decision variables): 0000100100; 1110011011; 110001001
k_1 ... k_10: 1 1 6 1 1

theta-family: 28

Solution 1
theta_1 ... theta_10: 0010000000
(remaining decision variables): 0010010000; 1001101111; 000100111
k_1 ... k_10: 1 3 1 1 1 6 1 1 1 1

Solution 2
theta_1 ... theta_10: 0010000000
(remaining decision variables): 01111011; 001001; 3 11 11 1 6 1
k_1 ... k_10: 1 3 1 1 1 6 1 1 1 1

Items Common to All Six Solutions
m_a = 1,  m_p = 18,  m_r1 = 1,  m_r2 = 1

Operation         Method     Cost
Q3:  d_i r_j -    1 (d r p)  t3,1 = 116.00
Q12: d_i * -      1 (d p)    t12,1 = 272.00
Q14: - * p_k      1 (p d)    t14,1 = 530.00

T = 252.96
S = 33398

Table 6-9. Case 1 Optimal Solutions

[Figure: schematic showing the description block at the top, the b1-b11 blocks, and the ring and head pointers among them]

Figure 6-2. Schematic of Solution 1 for Case 1

Since k_3 = 1, the a3-ring head pointer might appear to be superfluous - the forward pointer from the b3-block to the b4-block appears to perform the same function - but in general this is not the case. We earlier assumed that a head pointer points to the next item of direct interest as we sequence through the structure. In this case the next item of interest when following the head pointer or moving in that direction is the b6-block (and its relation symbol name field). On the other hand, the a3-ring forward pointer simply points to the next forward pointer of the ring; in order to access the corresponding b6-block we must step from this pointer field to the b6-block at a cost of s_0 time units. It should be clear then that the a3-ring head pointer performs a function distinct from the forward pointer which parallels it. If we assume, however, that the environment in which the structure is implemented is a computer utilizing the base-displacement form of addressing (a la the IBM 360), then the forward pointer can perform the same function as the head pointer (in addition to its original function), in which case the head pointer may be omitted. (This would not be true, of course, if k_3 were greater than 1.) If desired, we could include in our procedure the facility for eliminating from those rings of the storage structure for which k = 1 any head pointers which might otherwise arise. We do not choose to do so here and, hence, will maintain the structure as given.

Replacing each block by the fields it contains yields the actual storage structure which appears in Figure 6-3, sans source, relation, and target ring pointers. Since the methods used to implement operations Q3, Q7, and Q12 use the source rings as starting points and the method used to implement operation Q14 uses the target rings (none of the operations uses the relation rings), and since the b6-blocks contain a relation symbol name field (hence, the relation rings need never be used to determine the relation represented by a given block), the relation rings are not really needed in the structure. We may assume, therefore, that they are not present. Furthermore, since m_a = 1, it is not necessary for the source rings to be closed loops. That is, a b-block need not contain a pointer back to the head of the corresponding source ring. Again, we could include in our procedure the steps necessary to automatically account for these contingencies. We did not feel that the effort required to do so for this and other such cases was warranted for a prototype program, however. Thus, the only addition required to make Figure 6-3 more accurately reflect the actual storage structure is the inclusion of a target ring pointer for every description block indicator. (We leave this to the imagination of the reader.)
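The ring organization just described can be rendered in modern terms: each source heads a chain of relation blocks, each carrying a relation symbol name and a stack (here, a list) of description block indicators for its targets. The sketch below is our own illustration, not Randall's implementation; consistent with the discussion above, the back pointer closing the source ring is omitted since m_a = 1:

```python
# Illustrative rendering of the source-ring organization of Figure 6-3.
# Class and method names are ours; this is a sketch, not the dissertation's code.

class RelationBlock:
    def __init__(self, relation, targets):
        self.relation = relation  # relation symbol name (here, a probability)
        self.targets = targets    # stack of description block indicators
        self.next = None          # a3-ring forward pointer

class Source:
    def __init__(self, name):
        self.name = name
        self.head = None          # ring head pointer

    def add(self, relation, targets):
        block = RelationBlock(relation, targets)
        block.next = self.head    # open chain: no pointer back (m_a = 1)
        self.head = block

    def q3(self):
        """Q3(d_i r_j -): enumerate the relations applying to this source."""
        block, rels = self.head, []
        while block is not None:
            rels.append(block.relation)
            block = block.next
        return rels

    def q7(self, target):
        """Q7(d_i - p_k): find the relation linking this source to target,
        stopping as soon as the target is found - the advantage of the
        rings over one long stack noted in the text."""
        block = self.head
        while block is not None:
            if target in block.targets:
                return block.relation
            block = block.next
        return None

s01 = Source("S01")
s01.add(0.10, ["D02"])
s01.add(0.15, ["D30"])
```

With this layout Q7 terminates at the first block whose target stack contains the sought description block indicator, rather than stepping through every indicator of every relation.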

[Figure: the storage structure, showing description blocks d_i, relation symbol name fields r_j, and stacks of description block indicators p_i]

Figure 6-3. Optimal Storage Structure for Case 1

Taking omission of the relation rings and the indicated source ring pointers into account results in the reduction of the storage cost of the storage structure to 31414 units.

Now let us examine Solution 2 of theta-family 28. This solution has the schematic representation of Figure 6-4. We note that this schematic is the same as that of Figure 6-2, with the exception that each b6-block of Figure 6-2 is replaced by a b6-block - b8-block pair and each b7-block is replaced by a b9-block. As a result, the physical structure of this solution is identical to that of the first solution considered. In fact, this is true for the four solutions of theta-family 10, also. Thus, although each of the six given solutions has a unique specification, they all result in the same physical storage structure.

6.1.7 Justification of the Solution

We will now present an intuitive justification for the solution we have obtained. Because of their relative weights, operations Q3 and Q7 (especially operation Q7) would be expected to influence the form of the structure considerably more than operations Q12 and Q14. Of the three alternative methods available for operation Q3 - namely, d r p, r d p, and p r d - we would expect either of the first two to be preferable to the third on the grounds that fewer possible

Figure 6-4. Schematic of Solution 2 for Case 1 [diagram not reproduced in this transcription]

solutions must be considered when using one of them than when using the third. By the same token, since the number of distinct relations which apply to a given source is less than the number of sources associated with a given relation, we would expect the first method to be preferable to the second. Similarly, of the four methods available for operation Q7 - d̂ r̂ p, p̂ r̂ d, r̂ d̂ p, and r̂ p̂ d - we would expect the first two to be preferable to the last two. If operation Q7 were considered by itself, either of the first two methods would probably serve equally well. However, since our choice of methods for operation Q3 favors starting with sources, we are encouraged to choose the first method for operation Q7. In light of our choices of methods for operations Q3 and Q7, we would certainly prefer method d̂ p to method p̂ d for operation Q12. In view of our choices so far we might be tempted to choose method d̂ p over method p̂ d for operation Q14, but the latter is probably preferable in all but the most extreme cases and so we choose it. In any event, since the relative weight of operation Q14 is so small, our decision here probably will not have a very great effect upon the form of the storage structure. (We note that it is responsible for inclusion of the a6-ring head pointers, however.) Solution of our problem does, of course, support our intuitive

choices of methods. Let us examine the structure of Figure 6-3 given our choice of methods for the various operations. In order for operation Q3 to be as efficient as possible, the relations which apply to a given source should be easily accessible from that source. At the same time, to promote efficiency for operation Q7, the targets (as well as the relations) associated with a given source should be easily accessible. Intuitively one might feel that duplication of the b4-blocks upon the b3-blocks should render operation Q7 somewhat more efficient than the rings of our given storage structure. In this case access to all targets could be made by stepping through a single stack. In the context of our current problem there are two pitfalls to this approach. First, the performance of operation Q3 suffers because we must step through a stack of description block indicators in order to access one relation symbol name field from another. Second, for operation Q7 itself we must step all the way through a stack of description block indicators (to reach the next relation symbol) even though the target of interest may already have been found. With the structure of Figure 6-3 this is not necessary. Thus, the rings of our structure would appear to be the logical choice. Once a given relation symbol has been found, a stack of description block indicators (as contained in the given structure) would appear to be

the best choice for both operations Q3 and Q7. Because of the similarity between operation Q12 and operations Q3 and Q7, and because of the small relative weight of operation Q12, the given structure should render it relatively efficient as well. In spite of the fact that there are on the average 18 b11-blocks for each target in our structure, operation Q14 is still relatively efficient. The head pointers from the b7-blocks to the b11-blocks and the explicit a3-rings both contribute to making access of the sources associated with a given target very fast. We note that duplication of the b4-blocks upon the b3-blocks would be a detriment to this operation, also. Thus, the storage structure of Figure 6-3 is appealing even on an intuitive basis.

6.1.8 Comparison with an Existing Storage Structure

In order to put the solution we have obtained into proper perspective, let us compare it with the scheme used by Gorry in his implementation of the diagnostic system. Gorry's storage structure consists of a number of SLIP lists which are maintained and interrogated through use of the SLIP language. Although he concedes that this structure may at times result in inefficient utilization of the computer memory, he maintains that this

disadvantage was more than offset by the convenience of programming in the high-level SLIP language. Since he was concerned with implementing a prototype system, we must agree wholeheartedly with this philosophy. In his structure Gorry defines a basic information block (our data item) to be either a state (i.e., a disease), an attribute (i.e., a symptom), or a test. Each of these basic blocks is then represented by a SLIP list. The result is a number of state lists, attribute lists, and test lists. A state list, as shown in Figure 6-5, consists of a ring of attribute list names (i.e., pointers to the heads of certain attribute lists), one for every attribute relevant to the state, and a description list (or DLIST) which contains the a priori probability of the given state and the print name of the state. An attribute list, as shown in Figure 6-6, consists of a ring of test list names corresponding to those tests which can result in the given attribute and a DLIST which contains the print name of the attribute and a pointer to a member list. The member list contains the list name of each state list on which the name of the attribute list appears and the corresponding probability of the attribute given the state. Finally, a test list, which appears in Figure 6-7, contains the cost

Figure 6-5. State List [diagram not reproduced in this transcription; legible labels include "To Attribute Lists" and DLIST fields PROB and PNAME]

Figure 6-6. Attribute List [diagram not reproduced in this transcription; legible labels include "To Test Lists", "To State Lists", a DLIST with PNAME and MEMBER fields, and a member list of probabilities]

Figure 6-7. Test List [diagram not reproduced in this transcription; legible labels include a cost field, a DLIST with PNAME and MEMBER fields, a member list, and "To Attribute Lists"]

of the test and a DLIST. The DLIST contains the print name of the test and a member list for the attribute lists which include this test. For the heart disease problem which we have been considering, each state list will contain an average of 28.4 attribute list names; each attribute list will contain 1 test list name, and its member list will contain an average of 18.7 pairs of state list names and probabilities; and each test list will have an average of 1.7 attribute list names in its member list. Using this information and the description of the environment used in determining our optimal storage structure, let us determine the cost of each of the basic operations involved in the diagnostic process. To reiterate briefly, the basic operations are: (1) determine the diseases associated with a given symptom, (2) determine the symptoms associated with a given disease, (3) determine the conditional probability that a particular symptom is associated with a given disease, (4) determine the tests relevant to a given disease, and (5) determine the possible outcomes of a given test. The first of these operations involves finding the attribute list corresponding to the given symptom and recording the list name of each state list on its member list. (The print name of each disease is irrelevant at this point since the results of this operation are simply used as the starting points for the second operation.) Finding the head

of the attribute list requires T_a + F_a units of time. Accessing the head of the corresponding member list involves moving to the DLIST at a cost of s_o + f time units (where f represents the time cost to follow a pointer), stepping through the DLIST at a cost of 4f time units (ignoring the cost of any comparisons which must be made), and finally moving to the head of the member list at a cost of s_o + f time units. Recording the list name of each of the state lists then involves stepping through the member list at a cost of 18.7(2f) and fetching each of the state list names at a cost of s_o plus the actual cost of a fetch (which we will represent by v and assume requires 7 time units). Summing these incremental costs yields a time cost of 353.9 for this operation. The second operation starts at the head of a state list and records the print name of each attribute on the list. The time cost for this operation is given by the expression 28.4(f + s_o + 4f + s_o + v), where f time units are required to access an element of the state list (from the previous one), s_o units are required to move to the corresponding attribute list name, 4f units are required to access the print name block on the DLIST of the given attribute list, and s_o + v units are required to fetch the print name. As a result this operation has a

time cost of 880.4 time units. The third operation - determining the conditional probability of a symptom given a disease - involves finding the attribute list corresponding to the given symptom, searching the associated member list for the given disease, and recording the corresponding probability. To the point of accessing the member list this operation is identical to the first operation and, hence, requires T_a + F_a + (s_o + f) + 4f + (s_o + f) time units to reach this point in the structure. On the average we would expect to examine (18.7 + 1)/2 state list entries of the member list before finding the given disease. For each one examined we must access the member list element block (from the previous one) at a cost of f time units, access the head of the corresponding state list at a cost of s_o + f, then access and compare the print name at a cost of 5f + s_o + c (where c represents the cost of a comparison and has the value 8), and finally move to the member list element block containing the value of the probability. For the last state list considered we must fetch the value of the probability, which costs s_o + v time units. The resultant time cost for this operation is 437.0 time units. To perform the fourth operation - determination of the tests relevant to a given disease - we must access the state list representing

the disease and access in turn the associated attribute lists, which contain the list names of the pertinent tests. This requires T_a + F_a time units to access the head of the state list, f + s_o + f units to access each of the 28.4 attribute lists, f + s_o + f units to access the corresponding test list, and 3f + s_o + v time units to fetch the print name of the test. The time cost for the operation is therefore 1172.4 time units. Finally, the fifth operation - determining the possible outcomes of a test - involves fetching the print names of the attributes associated with the given test list. To do this we must access the member list for the given test at a cost of T_a + F_a + 5f + s_o + f and access the print name of each of the 1.7 attribute lists thereon at a cost of 1.7(f + s_o + 4f + s_o + v). The resultant time cost is 86.7 time units. In order to make a fairly coarse comparison of the SLIP structure with the optimal structure, let us compare these time costs with those determined for the primitive operations. Operations 1, 2, and 3 correspond to primitive operations Q12, Q14, and Q7, respectively, and operations 4 and 5 correspond to primitive operation Q3. Table 6-10 contains a summary of the time costs for both sets of operations, where the costs of operations 4 and 5 have been weighted according to their relative frequencies (operation 5 is used five times as much as operation 4) and combined to facilitate comparison with the time cost for primitive operation Q3. The corresponding values of the weighted

    SLIP Structure                            Optimal Structure
    Operation    Cost      Relative           Operation    Cost
                           Frequency
    1            353.9     0.001              Q12          272.00
    2            880.4     0.001              Q14          530.00
    3            437.0     0.806              Q7           285.22
    4 & 5        267.6     0.192              Q3           116.00
                 T = 404.8                                 T = 252.96

Table 6-10. Operation Time Costs for the SLIP Structure versus the Optimal Structure

time cost function also appear in Table 6-10. We see that the time cost T for the SLIP structure is 60 percent greater than that for the optimal structure. In order to achieve a somewhat better comparison of the two structures, let us determine the cost of each of the five basic operations using the optimal storage structure. There are two reasons why computing the costs of the five basic operations for the optimal structure (instead of using the costs determined for the primitive operations in obtaining the optimal structure) should give us a better basis for comparison. First, in certain cases our primitive operations are only approximations to the corresponding basic operations which they ostensibly represent. For example, we note that primitive operation Q7 is really more general than operation 3, its counterpart. In particular, Q7 examines all relations associated with a given symptom (acting as a source) to determine whether or not the given disease is a possible target. In practice we know that there is only one such relation, and once it has been found all others may be disregarded. This mismatch of operations could be easily overcome, of course, by introducing another primitive operation which searches for exactly one relation (possibly the first encountered out of many, or else the only one) which associates a given target with a given source. Adopting such a

posture for all similar cases might result in a plethora of primitive operations, however. In addition, the added effort required to derive time cost expressions for these additional primitive operations might not be rewarded by proportionate increases in the accuracy of the optimal structure obtained. The second reason for computing the basic operation costs for the optimal structure is that each of these time costs may reflect the actual (average) numbers of blocks we may expect to encounter in the structure for the relations with which the corresponding operation is intimately concerned. The primitive operation time costs which we are given, on the other hand, reflect the average over all relations. Let us proceed, therefore, with calculations of the time costs for the five basic diagnosis operations, using the optimal storage structure. The first operation - determining the diseases associated with a given symptom - involves finding all the description blocks for diseases associated with a given symptom. To perform this operation we utilize that portion of the structure representing the relations r1, ..., r20, which have symptoms as sources and diseases as targets. We must find the description block representing the given symptom, step through a stack of on the average 6.6 a3-rings (using the notation of Solution 1 for φ-family 28), and for each of these step through a stack of 2.8 b11-blocks to access a total of 18.7 description block indicators.

Accessing the symptom description block requires T_a + F_a time units, stepping through the a3-rings and following their respective head pointers to the stacks of description block indicators requires 6.6(s_o + f) time units, and recording the various disease description block indicators requires 18.7(s_o + v) time units. The resultant time cost is 215.9 time units. The second operation - determining the symptoms associated with a given disease - also relies upon that portion of the storage structure which represents relations r1, ..., r20. For this operation, however, we must access (via the appropriate target ring) all b11-blocks representing a given disease and move upward through the structure to access the corresponding symptom description blocks. For this case m_p will have an average value of 28.4. Therefore, accessing all b11-blocks representing a given disease will require T_a + 29.4 F_a time units. Since the stack of b-blocks contains on the average 6.6 blocks, we would expect to step through (6.6 + 1)/2 of these on the way to the symptom description block. Thus, for each b11-block encountered the time cost required to access the corresponding symptom description block will be s_o + f + s_o + f + 3.8 s_o. The total time required for this operation will then be 678.2 time units. Note that if the value of k0 had actually been 6.6 instead of 3, the optimal storage structure would have contained head pointers to facilitate this operation.
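The arithmetic behind these two totals can be checked with a short sketch. The individual primitive costs are not restated in this section; the values below are assumptions back-solved from the reported totals (only v = 7 and c = 8 are stated explicitly in the text), so they should be checked against the definitions given earlier in the thesis.

```python
# Time costs of optimal-structure operations 1 and 2.
# ASSUMED values, back-solved from the reported totals:
s_o = 2.0          # assumed: cost of a step within a ring or stack
f   = 4.0          # assumed: cost to follow a pointer
v   = 7.0          # given in the text: cost of a fetch
T_a = 4.0          # assumed split of the access cost T_a + F_a = 8
F_a = 4.0

# Operation 1: access the symptom description block, step through 6.6
# a3-rings (following their head pointers), then record 18.7 disease
# description block indicators.
op1 = (T_a + F_a) + 6.6 * (s_o + f) + 18.7 * (s_o + v)

# Operation 2: access all 28.4 b11-blocks for the disease at a cost of
# T_a + 29.4 F_a, then climb from each block to its symptom description
# block at s_o + f + s_o + f + 3.8 s_o per block.
op2 = T_a + 29.4 * F_a + 28.4 * (s_o + f + s_o + f + 3.8 * s_o)

print(round(op1, 1), round(op2, 1))   # 215.9 678.2
```

With these assumed values both reported figures are reproduced exactly, which lends some confidence to the back-solved parameters.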

The third operation - determination of the conditional probability that a symptom is associated with a given disease - is very similar in implementation to the first operation. In this case, however, since we may halt our traversal of the structure once we have found the particular disease sought, we need consider an average of only (6.6 + 1)/2 a3-rings and (18.7 + 1)/2 disease description blocks. Thus, accessing the symptom description block requires T_a + F_a time units, stepping through the stack of a3-rings and following their respective head pointers requires 3.8(s_o + f) time units, examining the various disease description blocks requires 9.8(s_o + f + c) time units, and fetching the value of the probability relating the given symptom and the given disease requires v time units. The time cost of the third operation is therefore 175.0 time units. Operation 4 - determining the tests relevant to a given disease - uses the portion of the storage structure which represents the relation r21, which has diseases as sources and their relevant tests as targets. This operation involves accessing a disease description block at a cost of T_a + F_a time units, stepping through a stack containing a single a3-ring and accessing the corresponding stack of b11-blocks at a cost of s_o + f, and fetching the tests indicated by the test description block indicators in this stack (19.3 of them on the average) at a cost of 19.3(s_o + f + v). This results in a time cost of 264.9 time units.
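These two totals can likewise be checked. The primitive cost values are assumptions back-solved from the reported totals (v = 7 and c = 8 are given in the text; s_o = 2, f = 4, and T_a + F_a = 8 are not restated in this section). Note that the averages (6.6 + 1)/2 and (18.7 + 1)/2 enter as the truncated values 3.8 and 9.8 used in the text.

```python
# Time costs of optimal-structure operations 3 and 4.
# ASSUMED: s_o = 2, f = 4, T_a + F_a = 8 (back-solved); v = 7 and c = 8
# are stated in the text.
s_o, f, v, c = 2.0, 4.0, 7.0, 8.0
Ta_plus_Fa = 8.0

# Operation 3: traversal halts once the sought disease is found, so the
# averages (6.6+1)/2 = 3.8 and (18.7+1)/2 = 9.8 apply.
op3 = Ta_plus_Fa + 3.8 * (s_o + f) + 9.8 * (s_o + f + c) + v

# Operation 4: a single a3-ring, then 19.3 test description block
# indicators fetched from the stack of b11-blocks.
op4 = Ta_plus_Fa + (s_o + f) + 19.3 * (s_o + f + v)

print(round(op3, 1), round(op4, 1))   # 175.0 264.9
```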

Operation 5 - determining the possible outcomes of a given test - uses that part of the structure representing the relation r22, which has tests as sources and symptoms as targets. Its implementation is identical to that for the fourth operation, but since the stack of b11-blocks contains on the average only 1.7 blocks, its time cost is 36.1 time units. Table 6-11 contains a summary of these time costs and the overall time cost in comparison with the corresponding quantities for the SLIP structure. On the basis of this comparison we see that the time cost T for the SLIP structure is 2.6 times that for the optimal structure. Let us also compare the storage costs of the two structures under consideration. Assuming that each field of a SLIP list block requires 4 bytes of storage, each block then requires 12 bytes. Summing the number of blocks required to implement a list over 33 state lists, 50 attribute lists, and 30 test lists results in a total storage cost of 44355 units, which is 41 percent greater than the 31414 units required by the optimal storage structure. Finally, let us investigate how the SLIP structure might be described by our storage structure model, and then determine the time and storage costs for the diagnostic problem using this structure and the required primitive operations.

    Operation         Time Cost
                 SLIP Structure    Optimal Structure
    1                 353.9             215.9
    2                 880.4             678.2
    3                 437.0             175.0
    4                1172.4             264.9
    5                  86.7              36.1
                  T = 404.8         T = 156.2

Table 6-11. Basic Operation Time Costs for the SLIP Structure Versus the Optimal Structure
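The weighted totals in Tables 6-10 and 6-11 can be recomputed from the per-operation costs. A minimal sketch follows; the only assumption is that the 0.192 relative frequency shared by operations 4 and 5 splits as 0.032/0.160, which follows from operation 5 being used five times as often as operation 4.

```python
# Recompute the weighted time costs T of Tables 6-10 and 6-11.
freq    = [0.001, 0.001, 0.806, 0.032, 0.160]   # operations 1..5
slip    = [353.9, 880.4, 437.0, 1172.4, 86.7]   # Gorry's SLIP structure
optimal = [215.9, 678.2, 175.0, 264.9, 36.1]    # optimal structure

# Combined cost of operations 4 and 5 shown in Table 6-10 (weighted 1:5):
combined_45 = (1172.4 + 5 * 86.7) / 6            # 267.65, listed as 267.6

T_slip = sum(c * w for c, w in zip(slip, freq))     # Table 6-10: T = 404.8
T_opt  = sum(c * w for c, w in zip(optimal, freq))  # Table 6-11: T = 156.2

# The primitive-operation costs of Table 6-10 give the 252.96 figure:
T_prim = 0.001*272.00 + 0.001*530.00 + 0.806*285.22 + 0.192*116.00

print(round(T_slip, 1), round(T_prim, 2), round(T_opt, 1))  # 404.8 252.96 156.2
print(round(T_slip / T_prim, 2))   # 1.6  (60 percent greater)
print(round(T_slip / T_opt, 1))    # 2.6

# Storage: 44355 SLIP units versus 31414 optimal units.
storage_ratio = 44355 / 31414      # 41 percent greater
```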

If we examine Figures 6-5, 6-6, and 6-7 closely, we see that the state, attribute, and test lists all share essentially the structure in Figure 6-8. In particular, for the state list, if we assume that the head of the list and the DLIST may be combined into the description block (the double-walled block) of Figure 6-8, then except for the presence of relation symbol name fields the remainder of the structure is identical to that of Figure 6-8. If we ignore for the moment the associated member list, then the same can be said for the attribute list. If we now combine each pair of blocks in the member list into a single block like those in Figure 6-8, and if we assume that the head of the member list and the DLIST are combined, we have for the member list exactly the structure of Figure 6-8. Thus, the attribute list can be considered to consist of a single description block with two associated rings like that of Figure 6-8. If the test list itself, its DLIST, and the head of its associated member list are combined into a single block, again except for the relation symbol name fields we have the structure of Figure 6-8. Since each of the state lists, attribute lists, and test list member lists may be viewed as representing all the relation instances having a given source for a single particular relation, we may use the

Figure 6-8. Basic SLIP Structure [diagram not reproduced in this transcription]

structure of Figure 6-8 (including the relation symbol name fields) to represent each of these lists and assume that for each list all of the relation symbol name fields contain the same relation symbol name and occupy zero storage units. For the attribute list member list the relation symbol name fields may contain the various conditional probabilities, in a situation analogous to the representation of our relations r1, ..., r20. In speaking of a single homogeneous structure for the representation of all the lists, we may then use the structure of Figure 6-8 and specify the size of the relation symbol name field as being the average of those for the various lists (i.e., s_r = 2 bytes). It should be clear that within the context of the SSM the structure of Figure 6-8 may have the description contained in Table 6-12, where we have ignored the presence of reverse pointers. Before determining values for the unspecified parameters k_4^0, k_8^0, and k_9^0, let us determine the values of k_a^0, k_p^0, and k_r^0. We would normally say that the number of sources is equal to the number of states plus the number of attributes plus the number of tests (i.e., k_a^0 = 113). However, since we wish to associate two rings with each attribute, we see that the attributes have two distinct roles in the structure, and as a result we will count each one twice so that k_a^0 = 163. (We might have two source rings associated with each attribute, but only a single description block.) We will count each attribute as a target only once

    φ1 ... φ10:   0 0 0 1 0 0 0 0 0 0
    λ1 ... λ10:   1 1 1 0 1 1 1 1 1 1
    β2 ... β10:   1 1 0 0 0 1 1 1 1

    σ1 = 0    ρ1 = 0    ρ2 = 1    ρ3 = 0    [remaining specification entries illegible in this transcription]

    k_1^0 = 1    k_2^0 = 1    k_3^0 = 1    k_4^0 = ?    k_7^0 = 1
    k_8^0 = ?    k_9^0 = ?    k_10^0 = 1

    A question mark indicates an unspecified value greater than or equal to 1.

Table 6-12. Specifications for the Basic SLIP Structure

since there is no dichotomy of roles here, so k_p^0 = 113. Finally, k_r^0 = 23: twenty relations corresponding to the distinct probabilities and one each for the state list, attribute list, and test member list functions. Let us now consider k_4^0, k_8^0, and k_9^0. k_4^0 corresponds to the average number of relation symbols for each of the state lists, attribute lists, attribute member lists, and test member lists. We can easily determine, therefore, that k_4^0 = 2.7. k_8^0 represents the average number of targets for each source-relation symbol pair and, hence, will have the value k_8^0 = 4.4. Since the expression

    k_a^0 k_1^0 k_2^0 k_3^0 k_4^0 k_8^0 k_10^0 = k_9^0 k_p^0

must be satisfied, it follows that

    k_9^0 = k_4^0 k_8^0 k_a^0 / k_p^0,

which yields the result k_9^0 = 17.1. Because of the constraint imposed by our program requiring these parameters to be integer valued, let us assign the following values to them:

    k_a^0 = 150    k_p^0 = 100    k_r^0 = 23
    k_4^0 = 3      k_8^0 = 4      k_9^0 = 18
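The derivation of k_9^0 and the consistency of the integer-valued assignments can be checked directly. (The constraint as written here is reconstructed from the garbled original under the assumption that k_1^0 = k_2^0 = k_3^0 = k_10^0 = 1, so that it reduces to k_a^0 k_4^0 k_8^0 = k_9^0 k_p^0.)

```python
# Derivation of k_9^0 from the (reconstructed) constraint
# k_a^0 * k_4^0 * k_8^0 = k_9^0 * k_p^0.
k_a, k_p = 163.0, 113.0
k_4, k_8 = 2.7, 4.4

k_9 = k_4 * k_8 * k_a / k_p
print(round(k_9, 1))             # 17.1

# The integer-valued assignments chosen in the text preserve the
# constraint exactly: 150 * 3 * 4 = 18 * 100 = 1800.
assert 150 * 3 * 4 == 18 * 100
```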

Finally, let us assume m_p = 4. (This should be approximately the same as the value derived for m_p in specifying the values of parameters for our derivation of the optimal structure, and it must be integer valued, also, so we choose the value 4.) As a last step before computing the measures of performance for this storage structure, we must determine the primitive operations to be used and their relative frequencies. Basic operation 1 corresponds to applying primitive operation Q12 to the member list associated with a given attribute. Operation 2 may be accomplished by applying primitive operation Q3 to the proper state list. Operation 3 may be represented by primitive operation Q7 applied to an attribute member list. Operation 4 involves applying primitive operation Q3 recursively - first to a given state list and then to each of the attribute lists indicated by the first application. Lastly, operation 5 corresponds to primitive operation Q3 as applied to the proper test member list. It follows that the relative frequencies of primitive operations Q3, Q7, and Q12 are as given below:

    Operation         Frequency    Relative Frequency
    Q3 (d r -)          35095           0.570
    Q7 (d_i - p_k)      26396           0.429
    Q12 (d_i * -)          13           0.001
                        61504
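The relative frequencies follow directly from the raw operation counts, as the short check below shows. (Note that 13/61504 is roughly 0.0002; the table evidently lists the Q12 share as 0.001 as a display convention.)

```python
# Relative frequencies of the primitive operations for the basic SLIP
# structure, computed from the raw counts given in the text.
counts = {"Q3": 35095, "Q7": 26396, "Q12": 13}
total = sum(counts.values())
print(total)                      # 61504
for op, n in counts.items():
    print(op, n / total)
```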

If we now use our program to compute the time and storage costs for the basic SLIP structure under consideration (where the environment is assumed the same as that used in determining our optimal structure), we find that

    T = 248.00        S = 35684

At first glance it might appear as though this structure is superior to our so-called optimal structure! Indeed, it is true that the average time cost for the performance of a single operation (which is essentially what the time cost function T represents) for this structure is less than that for our optimal structure (comparing the two computer-determined time costs). It is also true, however, that more operations must be performed for this structure than for the optimal structure in order to achieve the same result. In particular, we see from our calculations of the relative frequencies of the primitive operations applied to each of the storage structures that whereas the optimal structure requires the performance of 32734 operations for the "solution" of the diagnostic problem, the basic SLIP structure requires the performance of 61504 operations. If we normalize the time cost T for the SLIP structure to 32734 operations (i.e., if we multiply this time cost by the ratio 61504/32734), we find that the equivalent time cost is 466.24. Thus,

the time cost for the basic SLIP structure of Figure 6-8 is really 1.8 times that of the optimal storage structure when compared with the program-determined time cost (252.96) for the optimal structure. To complete this comparison we might correct the storage cost given above to account for the additional attribute description blocks and the missing reverse pointers. To wit, S = 43184 storage units, which is 37 percent greater than that required by the optimal structure. We might also find it instructive to compare the time and storage costs for the basic SLIP structure with those for Gorry's SLIP structure. There should be a high correlation between the values obtained for the two structures. If we compare the normalized time cost for the basic SLIP structure as we have computed it here (466.24) with the time cost we have determined for Gorry's SLIP structure (404.8), we see that the time cost for the basic SLIP structure is 15 percent greater than that for Gorry's structure. This difference can probably be attributed to the fact that primitive operation Q7 is more general than its counterpart, operation 3. (We discussed this matter somewhat earlier.) If we now compare the storage cost for the basic SLIP structure (43184) with that for Gorry's SLIP structure (44355), we find that the basic structure has a storage cost within 3 percent of that for Gorry's structure.
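The normalization and the quoted ratios can be verified from the reported figures. A small sketch (note that multiplying the rounded T = 248.00 by 61504/32734 gives approximately 465.97; the text's 466.24 was presumably computed from the unrounded time cost):

```python
# Normalization of the basic SLIP structure's time cost to the optimal
# structure's operation count, recomputed from the reported figures.
T_slip_basic = 248.00      # program-computed T for the basic SLIP structure
ops_basic    = 61504       # operations performed by the basic SLIP structure
ops_optimal  = 32734       # operations performed by the optimal structure

T_normalized = T_slip_basic * ops_basic / ops_optimal
print(round(T_normalized, 2))   # 465.97 (the text reports 466.24)

# Ratios quoted in the text, from the reported values:
ratio_time    = 466.24 / 252.96          # 1.8 times the optimal time cost
ratio_gorry   = 466.24 / 404.8           # 15 percent above Gorry's structure
ratio_storage = 43184 / 31414            # 37 percent above optimal storage
gorry_gap     = (44355 - 43184) / 44355  # within 3 percent of Gorry's storage
```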

These results would tend to imply that even though our model uses average values to characterize a particular data structure and its associated storage structure, the model may be used to determine a good estimate of the time and storage costs for existing or proposed structures, in addition to determining the optimal storage structure for a given application. The conclusion which we wish to draw from the foregoing comparisons is quite simple: Gorry's SLIP storage structure, although presumably designed to be relatively efficient in operation, can be significantly surpassed in performance. Perhaps we do Gorry an injustice by making this statement, for he does admit to possible inefficient utilization of memory (clearly, for this application the reverse pointers of the SLIP structure are not required). In any event, his structure gives us a reference point for determining the relative improvement in performance we might expect by using a systematic procedure and a rigorous framework in the design of data representation schemes.

6.2 Near-Optimum Solutions

Our next goal is to examine the relative sharpness of the valley in which the optimal solution resides, in order to determine just how critical it is that we find the optimal solution. In other words, we want to determine whether or not there are other solutions which in

practice could serve as well as the optimum. Figure 6-9 contains a plot of the time costs of the best solutions for various φ-families. We see from this plot that there are a number of solutions which have time costs close to that of the optimal solution (φ-families 10 and 28). In particular, φ-families 4, 19, 37, 46, and 55 all contain solutions which have time costs within 3 percent of the time cost for the optimal solution. This would tend to confirm the hypothesis that there is no absolute optimal solution, but rather a spectrum of optimal or nearly optimal solutions. In order to obtain some feel for the "variance" we can expect from one solution to another in this spectrum of solutions, let us examine briefly the best solution for each of the φ-families indicated above. Descriptions of these solutions are contained in Table 6-13 in order of increasing time cost. Upon close inspection of these various solutions we find in each case that the two solutions in a φ-family result in the same physical storage structure. Moreover, the solutions for φ-family 19 and φ-family 55 are identical. Figures 6-10 through 6-13 contain schematics for the solutions of φ-families 19 (and 55), 4, 37, and 46, respectively. The best solution for each of these φ-families has a time cost within 3 percent of that for the optimal solution. It is possible, of course, that these φ-families might contain other solutions within this range as well.

Figure 6-9. Time Costs by φ-family for Case 1 [scatter plot not reproduced in this transcription; its horizontal axis spans φ-families 0 through 90]

φ-family: 19

Solution 1
    [specification vectors illegible in this transcription]
    k_1 ... k_10:  1 1 1 3 1 6 1 1 1 1

Solution 2
    [specification vectors illegible in this transcription]
    k_1 ... k_10:  1 1 1 3 1 1 1 6 1 1

Common Items
    m_a = 1    m_p = 18    m_r1 = 1    m_r2 = 1
    t_3,1 = 118.00    t_7,1 = 289.22    t_12,1 = 270.00    t_14,1 = 386.00
    T = 256.42    S = 32494

Table 6-13. Near Optimal Solutions for Case 1

φ-family: 55

Solution 1
    [specification vectors illegible in this transcription]
    k_1 ... k_10:  1 3 1 1 1 6 1 1 1 1

Solution 2
    [specification vectors illegible in this transcription]

Common Items
    m_a = 1    m_p = 18    m_r1 = 1    m_r2 = 1
    t_3,1 = 118.00    t_7,1 = 289.22    t_12,1 = 270.00    t_14,1 = 386.00
    T = 256.42    S = 32494

Table 6-13. Near Optimal Solutions for Case 1 (Cont.)

φ-family: 4

Solution 1
    [specification vectors illegible in this transcription]
    k_1 ... k_10:  1 3 1 1 1 1 1 6 1 1

Solution 2
    [specification vectors illegible in this transcription]

Common Items
    m_a = 1    m_p = 18    m_r1 = 1    m_r2 = 1
    t_3,1 = 114.00    t_7,1 = 291.22    t_12,1 = 272.00    t_14,1 = 530.00
    T = 257.41    S = 33398

Table 6-13. Near Optimal Solutions for Case 1 (Cont.)

φ-family: 37

Solution 1
    [specification vectors illegible in this transcription]
    k_1 ... k_10:  1 1 1 3 1 6 1 1 1 1

Solution 2
    [specification vectors illegible in this transcription]
    k_1 ... k_10:  1 1 1 3 1 1 1 6 1 1

Common Items
    m_a = 1    m_p = 18    m_r1 = 1    m_r2 = 1
    t_3,1 = 122.00    t_7,1 = 291.22    t_12,1 = 278.00    t_14,1 = 710.00
    T = 259.14    S = 34754

Table 6-13. Near Optimal Solutions for Case 1 (Cont.)

309

φ-family: 46

Solution 1
φ1...φ10: 0 0 1 1 0 0 0 0 0 0
k1...k10: 1 1 1 3 1 6 1 1 1 1

Solution 2
φ1...φ10: 0 0 1 1 0 0 0 0 0 0
k1...k10: 1 1 1 3 1 1 1 6 1 1

Common Items
ma = 1   mp = 18   mr1 = 1   mr2 = 1
t3,1 = 122.00   t7,1 = 293.22   t12,1 = 274.00   t14,1 = 566.00
T = 260.60   S = 33850

Table 6-13. Near Optimal Solutions for Case 1 (Cont.)

310

Figure 6-10. Best Solution for φ-families 19 and 55 for Case 1

311

Figure 6-11. Best Solution for φ-family 4 for Case 1

Figure 6-12. Best Solution for φ-family 37 for Case 1

313

Figure 6-13. Best Solution for φ-family 46 for Case 1

314 If we examine the structures of Figures 6-10 through 6-13 and compare them with the optimal structure of Figure 6-3, we can conclude that although there may be a number of optimal or nearly optimal solutions, all of these solutions tend to cluster about the optimal solution. That is, all of these structures are basically the same, differing only in relatively minor respects. In particular, the storage structures which we have considered here all contain a stack of target description block indicators for each relation symbol associated with a given source. Furthermore, these stacks of description block indicators and their respective relation symbols are associated with a given source via a ring or stack of rings such that the overall storage structure takes the form of a downward branching tree (i.e., from source to target).

6.3 Sensitivity Analysis

Next we wish to determine the sensitivity of our optimal solution to variations in the problem under consideration. In particular, we wish to determine how sensitive our solution is to variations in k°1...k°10 and to variations in the relative frequencies of the given primitive operations. Since the values of these parameters must generally be determined by estimation, they are subject to the errors inherent in any estimation process. We would hope, therefore, that the solution is relatively insensitive to minor variations in these values.

315

6.3.1 Variation in k°1...k°10

Let us first consider the parameters k°1...k°10. Recall that the values used thus far in the solution of the diagnostic problem (i.e., in Case 1) are integer approximations to the values we determined for the heart disease problem. (See Tables 6-7 and 6-8.) Table 6-14 contains another equally valid set of values for k°1...k°10 in approximation of the values given in Table 6-7. Let us now determine the optimal solution using the values given for k°1...k°10 in Table 6-14 and keeping the values of all other parameters (including the external decision variables) unchanged. Applying our program to these data, which we will designate Case 2, yields the solution listing contained in Appendix F. Figure 6-14 contains a plot of the time costs for the best solutions in each of the various φ-families and Table 6-15 contains descriptions of the optimal and near optimal (within 4 percent of the optimum) solutions in order of increasing time cost. As was the case before, method 1 is the best choice for each of the primitive operations for all solutions. As in the previous case, φ-families 28 and 10 contain the optimal solutions and φ-families 4, 19, 37, 46, and 55 contain the near optimal solutions with the same relative ordering. In addition, φ-family 1 also contains a near optimal solution. Once again we find that all of the (best) solutions within a given

316

k°a = 113
k°p = 113
k°r = 22
m°r = 4   m°p = 4

Table 6-14. Alternative Values for Parameters of Heart Disease Problem

Figure 6-14. Time Costs by φ-family for Case 2

318

φ-family: 28 (Optimal Solutions)

Solution 1
φ1...φ10: 0 0 1 0 0 0 0 0 0 0
k1...k10: 1 4 1 1 1 3 1 1 1 1

Solution 2
φ1...φ10: 0 0 1 0 0 0 0 0 0 0
k1...k10: 1 4 1 1 1 1 1 3 1 1

Common Items
ma = 1   mp = 12   mr1 = 1   mr2 = 1
t3,1 = 84.00   t7,1 = 213.04   t12,1 = 202.00   t14,1 = 344.00
T = 188.39   S = 22098

Table 6-15. Optimal and Near Optimal Solutions for Case 2

319

φ-family: 10 (Optimal Solutions)

Solution 1
φ1...φ10: 0 0 0 0 1 0 0 0 0 0
k1...k10: 1 4 1 1 1 3 1 1 1 1

Solution 2
φ1...φ10: 0 0 0 0 1 0 0 0 0 0
k1...k10: 1 4 1 1 1 1 1 3 1 1

Solution 3
φ1...φ10: 0 0 0 0 1 0 0 0 0 0
k1...k10: 1 1 1 4 1 3 1 1 1 1

Table 6-15. Optimal and Near Optimal Solutions for Case 2 (Cont.)

320

Solution 4
φ1...φ10: 0 0 0 0 1 0 0 0 0 0
k1...k10: 1 1 1 4 1 1 1 3 1 1

Common Items
ma = 1   mp = 12   mr1 = 1   mr2 = 1
t3,1 = 84.00   t7,1 = 213.04   t12,1 = 202.00   t14,1 = 344.00
T = 188.39   S = 22098

Table 6-15. Optimal and Near Optimal Solutions for Case 2 (Cont.)

321

φ-family: 19

Solution 1
φ1...φ10: 0 0 0 1 0 0 0 0 0 0
k1...k10: 1 1 1 4 1 3 1 1 1 1

Solution 2
φ1...φ10: 0 0 0 1 0 0 0 0 0 0
k1...k10: 1 1 1 4 1 1 1 3 1 1

Common Items
ma = 1   mp = 12   mr1 = 1   mr2 = 1
t3,1 = 86.00   t7,1 = 217.04   t12,1 = 198.00   t14,1 = 236.00
T = 191.88   S = 20742

Table 6-15. Optimal and Near Optimal Solutions for Case 2 (Cont.)

322

φ-family: 55

Solution 1
φ1...φ10: 0 1 0 0 0 0 0 0 0 0
k1...k10: 1 4 1 1 1 3 1 1 1 1

Solution 2
φ1...φ10: 0 1 0 0 0 0 0 0 0 0
k1...k10: 1 4 1 1 1 1 1 3 1 1

Common Items
ma = 1   mp = 12   mr1 = 1   mr2 = 1
t3,1 = 86.00   t7,1 = 217.04   t12,1 = 198.00   t14,1 = 236.00
T = 191.88   S = 20742

Table 6-15. Optimal and Near Optimal Solutions for Case 2 (Cont.)

323

φ-family: 4

Solution 1
φ1...φ10: 0 0 0 0 0 0 1 0 0 0
k1...k10: 1 4 1 1 1 1 1 3 1 1

Solution 2
φ1...φ10: 0 0 0 0 0 0 1 0 0 0
k1...k10: 1 1 1 4 1 1 1 3 1 1

Common Items
ma = 1   mp = 12   mr1 = 1   mr2 = 1
t3,1 = 80.00   t7,1 = 221.04   t12,1 = 202.00   t14,1 = 344.00
T = 194.07   S = 22098

Table 6-15. Optimal and Near Optimal Solutions for Case 2 (Cont.)

324

φ-family: 37

Solution 1
φ1...φ10: 0 0 1 0 1 0 0 0 0 0
k1...k10: 1 1 1 4 1 3 1 1 1 1

Solution 2
φ1...φ10: 0 0 1 0 1 0 0 0 0 0
k1...k10: 1 1 1 4 1 1 1 3 1 1

Common Items
ma = 1   mp = 12   mr1 = 1   mr2 = 1
t3,1 = 90.00   t7,1 = 219.04   t12,1 = 208.00   t14,1 = 464.00
T = 194.50   S = 23454

Table 6-15. Optimal and Near Optimal Solutions for Case 2 (Cont.)

325

φ-family: 1

φ1...φ10: 0 0 0 0 0 0 0 0 0 0
k1...k10: 1 1 1 4 1 3 1 1 1 1

ma = 1   mp = 12   mr1 = 1   mr2 = 1
t3,1 = 83.00   t7,1 = 221.04   t12,1 = 178.00   t14,1 = 236.00
T = 194.51   S = 18482

Table 6-15. Optimal and Near Optimal Solutions for Case 2 (Cont.)

326

φ-family: 46

Solution 1
φ1...φ10: 0 0 1 1 0 0 0 0 0 0
k1...k10: 1 1 1 4 1 3 1 1 1 1

Solution 2
φ1...φ10: 0 0 1 1 0 0 0 0 0 0
k1...k10: 1 1 1 4 1 1 1 3 1 1

Common Items
ma = 1   mp = 12   mr1 = 1   mr2 = 1
t3,1 = 90.00   t7,1 = 221.04   t12,1 = 202.00   t14,1 = 356.00
T = 196.00   S = 22098

Table 6-15. Optimal and Near Optimal Solutions for Case 2 (Cont.)

327 φ-family result in the same physical storage structure. Moreover, the structures for φ-families 10 and 28 are again identical, as are the structures for φ-families 19 and 55. Figures 6-15 through 6-20 contain schematic representations for the solutions of φ-families 28 (and 10), 19 (and 55), 4, 37, 1, and 46, respectively.

Upon close inspection of the storage structures for φ-families 28, 19, 4, 37, and 46 we find that they are the same as the structures obtained for those φ-families in the previous case with the exception that the stacks of description block indicators do not contain head pointers. The lack of these head pointers is due, of course, to the reduced number of description block indicators in each stack. We also see that the additional near optimal solution (in φ-family 1) has the same basic form as the others. Instead of having a ring (or stack of rings) of stacks of description block indicators, this structure uses a stack for the stacks of description block indicators.

For the purposes of comparison we have determined the time cost of the Case 1 optimal solution given the conditions of Case 2. The value of this time cost is 188.41, whereas the time cost of the Case 2 optimal solution (given the conditions of Case 2) is 188.39. Thus, we may conclude that our solution is relatively insensitive to minor variations in the values assigned to the parameters k°1...k°10.
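The size of this insensitivity can be read directly from the two time costs just quoted. A quick check of the arithmetic, in modern notation (Python used here purely as a calculator):

```python
# Relative penalty for using the Case 1 optimal structure under the
# Case 2 parameter values; both time costs are quoted in the text above.
t_case1_structure = 188.41   # Case 1 optimum evaluated under Case 2 conditions
t_case2_optimum = 188.39     # true Case 2 optimum under the same conditions

penalty = (t_case1_structure - t_case2_optimum) / t_case2_optimum
print(f"{100 * penalty:.3f}%")   # about 0.011 percent
```

A penalty of roughly one hundredth of one percent supports the conclusion that the design is insensitive to this alternative choice of k°1...k°10.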

328

Figure 6-15. Best Solution for φ-families 10 and 28 for Case 2 (Optimal Solution)

329

Figure 6-16. Best Solution for φ-families 19 and 55 for Case 2

330

Figure 6-17. Best Solution for φ-family 4 for Case 2

331

Figure 6-18. Best Solution for φ-family 37 for Case 2

332

Figure 6-19. Best Solution for φ-family 1 for Case 2

333

Figure 6-20. Best Solution for φ-family 46 for Case 2

334

6.3.2 Variation in the Frequencies of Operations

Let us now consider the effects of variation in the relative frequencies of the given primitive operations. Instead of the relative frequencies used for Case 1, suppose that we were to use the following values:

Operation        Relative Frequency
Q3 (di rj -)          0.20
Q7 (di - pk)          0.70
Q12 (di * -)          0.05
Q14 (- * pk)          0.05

These values tend to assign more weight than before to the less significant operations. Keeping the values of all other parameters fixed as they were for Case 1 and applying our program to these data, which we will designate Case 3, yields the solution listing contained in Appendix G. Figure 6-21 contains a plot of the time costs for the best solutions in each of the various φ-families. It appears for this case that the optimal solution lies in a somewhat sharper valley, since there are fewer solutions within 3 to 4 percent of the optimum (although this fact may not be overly significant). We see once again that the optimal and near optimal solutions are

Figure 6-21. Time Costs by φ-family for Case 3

336 concentrated in the same φ-families as before. In particular, the φ-families containing solutions within 3 percent of the optimum are 4, 10, 19, 28, and 55. We see also that these solutions are identical to those in the corresponding φ-families for Case 1. The only difference is in the ordering of these solutions (by time cost). For this case φ-families 19 and 55 contain the optimal solution, followed by the solutions in φ-families 10 and 28 and in φ-family 4, in that order. We see that given the conditions of Case 3 the Case 1 optimal solution has a time cost of 262.95 versus a time cost of 258.85 for the Case 3 optimal solution. We conclude, therefore, that our solution is also relatively insensitive to minor variations in the relative frequencies assigned to the given primitive operations, although it is perhaps more sensitive to variations in the values of these parameters than to variations in the values of k°1...k°10.

6.3.3 Solution for the Primary Operation

We may find it instructive to compare the solutions we have obtained using a weighted sum of the time costs for several operations and average statistics for relations with a fairly large variance with the solutions we might obtain for a single primitive operation and statistics for only the relations with which that operation is intimately concerned.
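As an arithmetic cross-check on Case 3 before proceeding: φ-families 19 and 55, which are optimal here, are the same solutions described in Table 6-13, where their elementary time costs under method 1 are t3,1 = 118.00, t7,1 = 289.22, t12,1 = 270.00, and t14,1 = 386.00. If the overall time cost is the frequency-weighted sum of the per-operation costs (an assumed reading of the model's time cost function, not a formula quoted from Chapter III), the Case 3 frequencies reproduce the optimum quoted above:

```python
# Frequency-weighted time cost T = sum of f_Q * t_Q (assumed form of the
# model's time cost function). Elementary costs are those reported for
# phi-family 19 in Table 6-13; frequencies are the Case 3 values of 6.3.2.
costs = {"Q3": 118.00, "Q7": 289.22, "Q12": 270.00, "Q14": 386.00}
freqs = {"Q3": 0.20, "Q7": 0.70, "Q12": 0.05, "Q14": 0.05}

T = sum(freqs[q] * costs[q] for q in costs)
print(round(T, 2))   # 258.85 -- the Case 3 optimal time cost quoted above
```

That the quoted 258.85 falls out exactly lends some confidence to the transcribed elementary costs and to the weighted-sum reading.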

337 In particular, let us consider the primitive operation Q7 (which has the greatest relative frequency in our other examples) and the relations r°1...r°20. We can easily determine that the relations r°1...r°20 will result in the statistics of Table 6-16. (See the derivation of the results of Table 6-7 for the procedure involved.) Table 6-17 contains a suitable integer approximation of these statistics. Appendix H contains the solution listing for this problem and Figure 6-22 contains a plot of the time costs for the best solutions in the various φ-families. The optimal storage structure, which is given by the solutions for φ-families 10 and 28, appears in Figure 6-23. The φ-families which contain solutions with time costs within 3 percent of the optimum are, in order of increasing time cost, 19, 55, 1, 37, and 46. From this information it is clear that in our earlier examples operation Q7 has, because of its weight, a very marked effect upon deciding what the optimal structure is.

6.4 Summary

Let us conclude this example by summarizing its main points. First, we have shown that application of our procedure to a particular problem can result in the specification of a storage structure significantly more efficient than a storage structure determined by other less rigorous, qualitative methods.

338

k°a = 50
k°p = 33
k°r = 20
m°r = m°p = 2.86
k°1 = 1   k°2 = 1   k°3 = 1   k°4 = 6.64
k°7 = 1   k°8 = 2.82   k°9 = 28.39   k°10 = 1

Table 6-16. Characteristics of Relations r°1...r°20

339

k°a = 50
k°p = 30
k°r = 20
m°r = m°p = 3
k°1 = 1   k°2 = 1   k°3 = 1   k°4 = 6
k°9 = 30   k°10 = 1

Table 6-17. Integer Approximation of Statistics in Table 6-16

Figure 6-22. Time Costs by φ-family for Operation Q7

341

Figure 6-23. Optimal Solution for Primitive Operation Q7

342 Secondly, we have illustrated that the solution obtained is relatively insensitive to minor variations in the values of the parameters k°1...k°10 and the relative frequencies of the given primitive operations. This is of particular significance since the estimation of these values may be somewhat difficult for problems involving large numbers of data items and relations.

Finally, at least within the context of the problem considered, we have demonstrated that the optimal storage structure is very much dependent upon the operation having the greatest weight. This information may be of importance if all the operations to be used in conjunction with the structure are not known with complete certainty - the primary operation (or operations) may then be used to obtain a solution which can be at least near optimal in terms of the overall problem.

Chapter VII

CONCLUSION

The objective of the research reported here has been the development of a rigorous quantitative method for the automatic design of optimal storage structures for the representation of data within a computer memory. To accomplish this objective we partitioned the problem into four basic parts.

First, in order to provide a framework within which to work, we defined a relational model of data structure. To apply this model to a particular problem, the user (i.e., the problem solver) must determine what data items are involved in the solution process he has chosen and what relations (of accessibility) relate the various data items to one another. He must then determine the (average) cardinalities of the various sets which we have defined for the model. These cardinalities in essence give us a measure of the redundancy or the "share-ability" of the relations among the data.

Second, we developed a model for the specification of the storage structures which can represent an arbitrary data structure as given by the data structure model. This storage structure model consists primarily of a set of decision variables, which are used to specify the structural form of a storage structure, and a set of parameters, which are used to characterize the environment (i.e., the computer) in which

343

344 the storage structure is to exist.

As a third step, we defined two basic measures of performance - a time cost function and a storage cost function - for use in comparing the relative merits of a collection of storage structures. The time cost function reflects the number of time units required to perform certain primitive operations using a particular storage structure, where the problem solver chooses the primitive operations of interest from a standard collection of these primitive operations and assigns weights to them to reflect their relative importance in the problem solution. The storage cost function simply reflects the total number of storage units occupied by the storage structure of interest.

Finally, we presented a procedure which (automatically) compares the time and storage costs for those storage structures which are members of the set of feasible storage structures (for a given data structure) and determines the storage structure for which these costs satisfy certain optimality conditions. For our purposes, a storage structure is considered to be optimal if it minimizes the time cost function, subject to the constraint that its storage cost is less than some bound.

In order to demonstrate the feasibility and the effectiveness of the techniques which we developed, we applied them to a problem for which a solution program had already been implemented and for which

345 fairly extensive data about system performance were available. The results we obtained were enlightening and demonstrate conclusively that the approach we have taken is not only feasible, but desirable.

First, we were able to show that the storage structure which we determined to be optimal reduces by a factor of approximately 2.5 the time required to solve a typical problem, as compared with the storage structure implemented for the existing system. The optimal storage structure also requires approximately two-thirds the storage required by the existing storage structure, even though we made no particular attempt to minimize storage and assumed in fact no limit on the storage available. Although such significant improvement in system performance may not always result, these figures do give some indication of the potential of a rigorous approach to storage structure design.
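The optimality condition used throughout - minimize the time cost function subject to a bound on storage cost - amounts to the following selection rule, sketched here over a hypothetical explicit candidate list (the actual procedure searches the space of decision variables rather than an enumerated list; the time and storage figures for structures A and B are borrowed from Table 6-13, and structure C is invented for illustration):

```python
# Selection rule: among feasible storage structures (storage cost S
# within the bound), choose the one minimizing the time cost T.
# The candidate list below is hypothetical.
candidates = [
    ("structure A", 256.42, 32494),   # (label, time cost T, storage cost S)
    ("structure B", 257.41, 33398),
    ("structure C", 240.00, 90000),   # faster, but exceeds the storage bound
]
s_max = 40000   # assumed storage bound

feasible = [c for c in candidates if c[2] <= s_max]
best = min(feasible, key=lambda c: c[1])
print(best[0])   # structure A
```

Note that structure C, though fastest, is excluded by the storage constraint, so the minimum-time feasible structure is chosen instead.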

346 which would serve as well as the others. To a certain extent, this is true. We found on the other hand, however, that all of these near-optimal storage structures were very nearly identical to the storage structure determined to be the optimum. This fact tends to favor the idea of a single storage structure.

We also considered the sensitivity of our solution to variations in the values of those parameters most subject to error - namely, the average values of the cardinalities of the data structure sets and the relative frequencies of the primitive operations. The values of these parameters must be estimated by the problem solver and, hence, are subject to the errors inherent in any estimation process. This will be particularly true for the cardinalities of the sets when the data structure involves very large numbers of data items and relations. Our results indicate, however, that the solution we obtain is relatively insensitive to minor variations in either of these two sets of parameters.

Finally, we examined the solution obtained by considering only the primitive operation with the greatest relative frequency (0.8 out of 1.0) and compared this result with the previously determined solution. On the basis of this comparison we concluded that when all the primitive operations for a particular problem are not known with a great degree of certainty, we can obtain a solution which may be very nearly optimal

347 by considering only the principal primitive operation or operations (which are presumably known).

To further investigate the sensitivity of the results of our storage structure design procedure, it might be desirable to consider a more extensive variety of problems, perhaps a broader range of variation in the values of the parameters to which the sensitivity calculations apply, and possibly variations in the values of additional parameters (such as the relative times to follow a pointer and to step through a stack). We cannot argue with the potential value of such investigations. We can only point out that each variation in the problem to be solved requires performing the complete solution process, which for our prototype program involves, while not excessive, a nontrivial amount of computer time. In particular, solution of our example problem involving four primitive operations required approximately 7.3 minutes of CPU time on the IBM 360/67. This suggests that one should at least choose his examples and the values of his parameters for these investigations in a prudent manner.

While on the subject of the amount of computer time required by the solution process, we might point out that the amount of time required is completely independent of the numbers of data items and relations (and the corresponding cardinalities of the data structure sets) and is a function primarily of the number of primitive operations

348 characterizing the problem to be solved. This follows directly, of course, from the nature of our time cost function.

Let us now consider some of the strengths and weaknesses of certain design considerations employed in developing our procedure. In order to apply the procedure developed here to his particular problem, the user must specify the data structure which characterizes his problem. This involves specifying the numbers of data items and relations involved in his problem, as well as the average cardinalities of the various data structure sets. Before he can even begin to determine these quantities, however, the user must rigorously define the solution process which he intends to use for his problem and in doing so must choose the relations which will characterize his data structure. By assuming that we must be given the data structure specification as the starting point of our storage structure design procedure, we have completely ignored this facet of the user's system design problem. Specification of a solution process is, of course, another of the areas of computer systems design for which there is a paucity of formal, objective techniques for evaluation and comparison of the effects of alternative decisions. Since the storage structures specified by our procedure are strongly influenced by the user's choice of relations (i.e., by his choice of solution process), this matter is clearly of importance to us in spite of our assumed

349 disregard for it.

Let us discuss for a moment the implications of using average values for the cardinalities of the various data structure sets. In the first place, using exact values for these cardinalities is completely out of the question in all but the simplest of situations. Determining exact values would be tantamount to constructing a schematic of the entire data structure for a given problem, and applying our procedure to a data structure model utilizing exact values would essentially be equivalent to completely solving the user's problem for every storage structure considered. Clearly, even for problems of moderate size, neither of these steps can be accomplished through a reasonable amount of effort. On the other hand, determining average values for the cardinalities can be done quite easily (as we have shown in our example). Furthermore, using average values tends to minimize the amount of information which must be supplied by the user. One might criticize the use of average values if the variances of the various cardinalities are large, but it may be possible in such cases to partition the problem (i.e., the data structure) into sub-problems for which the variances are not so great, and then to determine the optimal storage structure for each of these sub-problems.

Closely allied with the specification of the relations and the set

350 cardinalities for the data structure is the selection of primitive operations and their relative frequencies. Since we have provided only a limited number of primitive operations, the user may encounter some difficulty in trying to match exactly his operations with the primitive operations. The obvious solution to this problem is, of course, to define additional primitive operations. This may or may not be desirable, depending upon how great the mismatch is.

Perhaps the biggest weakness in our consideration of the storage structure problem is the neglect of manipulative operations. Most certainly the updating and alteration of the information contained in a storage structure can have a profound effect upon the form of that storage structure. Again, the obvious solution to this problem is simply to define additional primitive operations to cover the operations desired. This can certainly be done, but development of the corresponding time costs would definitely require a nontrivial amount of effort. We hasten to point out that there are a multitude of problems which involve only the interrogation of a static data base. Clearly, our procedure can be applied to these problems without the necessity of adding manipulative operations to our set of primitive operations.

One other matter with which one might take issue is our choice of optimality conditions. Although other choices may be made, we feel

351 that minimization of the time cost function subject to a limit on the storage available is the most universally applicable.

Finally, we have assumed that the storage structures considered by our design procedure are to reside in the primary store of the computer. By introducing appropriate constraints and by choosing proper values for the parameters representing time cost characteristics, it should also be possible to apply the model developed here to problems for which the storage structures are to reside in the secondary store or a hierarchy of storage devices.

Appendix A

METHODS OF IMPLEMENTATION FOR THE PRIMITIVE OPERATIONS

Operation Q1: di rj pk

Method 1
Search for source di.
Search for rj among the associated relation symbols.
If rj is found, search for pk among the associated targets.

Method 2
Search for relation rj.
Search for di among the associated sources.
If di is found, search for pk among the associated targets.

Method 3
Search for relation rj.
Search for pk among the associated targets.
If pk is found, search for di among the associated sources.

Method 4
Search for target pk.
Search for rj among the associated relation symbols.
If rj is found, search for di among the associated sources.

352

353

Operation Q2: di rj *

Method 1
Search for source di.
Search for rj among the associated relation symbols.

Method 2
Search for relation rj.
Search for di among the associated sources.

Operation Q3: di rj -

Method 1
Search for source di.
Search for rj among the associated relation symbols.
Determine all associated targets.

Method 2
Search for relation rj.
Search for di among the associated sources.
Determine all associated targets.

Method 3
Determine all targets.
For each target, search for rj among the associated relation symbols.
If rj is found, search for di among the associated sources.

354

Operation Q4: * rj pk

Method 1
Search for relation rj.
Search for pk among the associated targets.

Method 2
Search for target pk.
Search for rj among the associated relation symbols.

Operation Q5: - rj pk

Method 1
Search for relation rj.
Search for pk among the associated targets.
Determine all associated sources.

Method 2
Search for target pk.
Search for rj among the associated relation symbols.
Determine all associated sources.

Method 3
Determine all sources.
For each source, search for rj among the associated relation symbols.
If rj is found, search for pk among the associated targets.

355

Operation Q6: di * pk

Method 1
Search for source di.
Search for pk among the associated targets.

Method 2
Search for target pk.
Search for di among the associated sources.

Operation Q7: di - pk

Method 1
Search for source di.
Determine all associated relation symbols.
For each relation symbol, search for pk among the associated targets.

Method 2
Search for target pk.
Determine all associated relation symbols.
For each relation symbol, search for di among the associated sources.

Method 3
Determine all relations.
For each relation, search for source di.
If di is found, search for pk among the associated targets.

Method 4
Determine all relations.
For each relation, search for target pk.
If pk is found, search for di among the associated sources.

356

Operation Q8: - rj *

Method 1
Search for relation rj.
Determine all associated sources.

Method 2
Determine all sources.
For each source, search for rj among the associated relation symbols.

Operation Q9: * rj -

Method 1
Search for relation rj.
Determine all associated targets.

Method 2
Determine all targets.
For each target, search for rj among the associated relation symbols.

Operation Q10: - rj -

Method 1
Search for relation rj.
Determine all associated sources.
Determine all associated targets.

357

Method 2
Search for relation rj.
Determine all associated targets.
Determine all associated sources.

Method 3
Determine all sources.
For each source, search for rj among the associated relation symbols.
If rj is found, determine all associated targets.

Method 4
Determine all targets.
For each target, search for rj among the associated relation symbols.
If rj is found, determine all associated sources.

Operation Q11: di - *

Method 1
Search for source di.
Determine all associated relation symbols.

Method 2
Determine all relations.
For each relation, search for di among the associated sources.

358

Operation Q12: di * -

Method 1
Search for source di.
Determine all associated targets.

Method 2
Determine all targets.
For each target, search for di among the associated sources.

Operation Q13: di - -

Method 1
Search for source di.
Determine all associated relation symbols.
Determine all associated targets.

Method 2
Determine all relations.
For each relation, search for di among the associated sources.
If di is found, determine all associated targets.

Operation Q14: - * pk

Method 1
Search for target pk.
Determine all associated sources.

Method 2
Determine all sources. For each source, search for pk among the associated targets.

Operation Q15: * - pk

Method 1
Search for target pk. Determine all associated relation symbols.

Method 2
Determine all relations. For each relation, search for pk among the associated targets.

Operation Q16: - - pk

Method 1
Search for target pk. Determine all associated relation symbols. Determine all associated sources.

Method 2
Determine all relations. For each relation, search for pk among the associated targets. If pk is found, determine all associated sources.

Two additional methods could be defined for each of the operations Q13 and Q16, but they were judged always to be more costly than the methods given here (since each involves searching the entire storage structure) and are, therefore, not considered.
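The reason the omitted methods are dominated can be seen from a toy probe count: a method that begins with a direct search touches one item plus its associates, while a method that enumerates the entire storage structure touches every stored item. The counts below are illustrative assumptions, not the cost functions of Chapter III:

```python
# Toy probe counts contrasting a method that starts with a direct search
# against one that scans the whole structure (as the omitted extra
# methods for Q13 and Q16 would).  Illustrative only.

def cost_direct(n_assoc):
    # one probe to locate the given item, plus one per associated entry
    return 1 + n_assoc

def cost_scan(n_items, n_assoc_each):
    # one probe per stored item, plus one per associated entry of each
    return n_items * (1 + n_assoc_each)

print(cost_direct(5))      # 6
print(cost_scan(1000, 5))  # 6000
```

For any structure with more than one stored item, the scanning variant is strictly more expensive, which is why it is excluded from consideration.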

Appendix B

TABULATION OF ei FOR i ∈ {1, 2, ..., 10}

Each of the functions ei for i ∈ {1, 2, ..., 10} is presented here in tabular form as a function of φ1, φ2, ..., φ10 and λ1, λ2, ..., λ10. The variables for which values are unspecified in a given row of a table are free to assume either the value 1 or the value 0 (subject, of course, to the constraints of Chapter III). Note, however, that values specified for φ1, φ2, ..., φ10 pertain to all succeeding rows of a table (even though not explicitly specified) until new values are assigned to them. For all (legal) combinations of values not covered by its corresponding table, a function has the value 0.
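The two row conventions above (φ-values persisting across succeeding rows until reassigned, and unspecified variables left free) can be made concrete with a small sketch; the list-of-pairs encoding of a table used here is hypothetical:

```python
# Sketch of the row conventions of Tables B-1 through B-10:
#   * values assigned to the phi-variables persist across succeeding
#     rows until explicitly reassigned;
#   * variables absent from the expanded assignment remain free to take
#     either value 0 or 1.
# The compact table encoding is an assumption for illustration only.

def expand_rows(rows):
    """Yield (assignment, value) pairs, applying the sticky-value rule."""
    sticky = {}                 # persisted phi assignments
    for assigned, value in rows:
        sticky.update(assigned)  # newly specified values override;
                                 # earlier ones persist implicitly
        yield dict(sticky), value

# Toy table: the second row reassigns phi4 only; phi2 persists from the
# first row even though it is not printed again.
table = [({"phi2": 0, "phi4": 1}, "S1(2,4)"),
         ({"phi4": 0}, "S2(2,4,6)")]

for env, value in expand_rows(table):
    print(env, value)
```

Any variable not appearing in the expanded assignment is free, subject to the constraints of Chapter III.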

't1 02 ThM344 Vi 46 Q7'8 o 8hAO1 ~2 534 L4z3 8bA e2 1 f2 0 1 0 2 0 0 1 0 1 s 0001 011 2 0'1 0 S1(2,4) 00001 0111 2 0 1 0 1 S1(2,4) 0 0 0 0 0 1 0 1 1 1 1 s2 0 1 1 1 0 S1(2, 6) 0 1 0 1 1 S1(2,4) 0 1 0 1 0 S2(2,4,6) 0 0 0 0 0 0 1 0 1 1 1 1 1 2 0 1 1 1 0 1 S1(2,6) 0 1 0 1 11 S1(2,4) 0 1 0 1 0 1 S2(2,4,6) Table B-1. Elementary Time Cost e2

t~29b 44'A8 1 2 "3 A4 47 A I08 e2 0 0 0 0 0 0 0 1 0 11111 1 S2 0 1 1 1 1 1 0 S1(2,8) 0 1 1 1 0 1 1 S1(2,6) 0 1 1 1 0 1 0 S2(2,6,8) 0 1 0 1 1 1 1 S1(2,4) 0 1 0 1 1 1 0 S2(2,4, 8) 0 1 0 1 0 1 1 S2(2,4,6) 0 1 0 1 0 1 0 S3(2,4,6,8) 6o 0 0 0 0 0 0 0 0 1 0 1 1 1 1 111 s 2 0 1 1 1 1 1 0 1 S1(2,8) 0 1 1 1 0 1 1 1 S1(2,6) 0 1 1 1 0 1 0 1 S2(2,6,8) 0 1 0 1 1 1 1 1 S1(2,4) 0 1 0 1 1 1 0 1 S2(2,4,8) 0 1 0 1 0 1 1 1 S2(2,4, 6) 0 1 0 1 0 1 0 1 S3(2,4,6,8) Table B-l. Elementary Time Cost e2 (Cont.)

~ 02 h0 2 23 A4 A a647 "8 9 61 ~ e 2 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 S2 0 1 1 1 1 1 1 1 0 S1(2,10) 0 1 1 1 1 1 0 1 1 S1(2, 8) 0 1 1 1 1 1 0 1 0 S2(2,8,10) 0 1 1 1 0 1 1 1 1 S1(2, 6) 0 1 1 1 0 1 1 1 0 S2(2, 6,10) 0 1 1 1 0 1 0 1 1 S2(2, 6, 8) 0 1 1 101010 S3(2,6,8, 10) 0 1 0 1 11 1 1 1 S1(2,4) 0 1 0 1 1 1 1 1 0 S2(2,4,10) 0 1 0 1 1 1 0 1 1 S2(2,4,8) 0 1 0 1 1 0 1 0 S3(2,4,8,10) 0 1 0 1 0 1 1 1 1 S2(2,4,6) 0 1 0 1 0 1 1 1 0 S3(2,4,6,10) 0 1 0 1 0 1 0 1 1 S3(2,4,6,8) 0 1 0 1 0 1 0 1 0 S4(2,4,6,8,10) Table B-l. Elementary Time Cost e2 (Cont.)

q bP23 ~ ~ Q6 ~7Q Q9 j10 l Z~ A'94 54 AZ10 e4 1 f4 0 1 0 S4 001 0 1 0 0 0 0 1 0 11 1 4 0 1 0 S1(4,6) 0 00 0 1 0 1 1 1 S4 0 1 0 1 S1(4,6) 0 0 0 001 0 1 1 1 1 s4 0 1 1 1 0 S1(4,8) 0 1 0 1 1 S1(4,6) 0 1 0 1 0 S2(4, 6, 8) 0 0 0 0 0 0 1 0 1 1 1 1 0 1 1 1 0 1 S1(4,8) 0 1 0 1 1 1 S1(4,6) 0 1 0 1 0 1 S2(4,-6,8) 0000000 00 0 0 0111 0 1 1 1 1 1 1 4 0 1 1 1 1 1 0 S1(4,10) 0 1 1 0 1 1 S1(4,8) 0 1 1 101 0 S2(4,8,10) 0 1 0 1 1 1 1 S1(4,6) 0 1 0 1 110 S2(4,6,,10) 1 0 1 0 1 1 S2(4,6,8) 0 1 0 1 0 1 0 S3(4,6,8,10) Table B-2. Elementary Time Cost e4

φ1 φ2 φ3 φ4 φ5 φ6 φ7 φ8 φ9 φ10  λ1 λ2 λ3 λ4 λ5 λ6 λ7 λ8 λ9 λ10  e6
1 f6
0 1 0 S6
0 0 1 0 1 S6
0 0 0 1 0 1 1 S6
0 1 0 S1(6,8)
0 0 0 0 1 0 1 1 1 S6
0 1 0 1 S1(6,8)
0 0 0 0 0 0 1 1 S6
0 1 1 1 0 S1(6,10)
0 1 0 1 1 S1(6,8)
0 1 0 1 0 S2(6,8,10)
Table B-3. Elementary Time Cost e6

φ1 φ2 φ3 φ4 φ5 φ6 φ7 φ8 φ9 φ10  λ1 λ2 λ3 λ4 λ5 λ6 λ7 λ8 λ9 λ10  e8
1 f8
0 1 0 S8
0 0 1 0 S8
0 0 1 0 1 S8
0 0 0 0 1 1 S8
0 1 0 S1(8,10)
Table B-4. Elementary Time Cost e8

φ1 φ2 φ3 φ4 φ5 φ6 φ7 φ8 φ9 φ10  λ1 λ2 λ3 λ4 λ5 λ6 λ7 λ8 λ9 λ10  e10
1 f10
0 0 S10
Table B-5. Elementary Time Cost e10

φ1 φ2 φ3 φ4 φ5 φ6 φ7 φ8 φ9 φ10  λ1 λ2 λ3 λ4 λ5 λ6 λ7 λ8 λ9 λ10  e9
1 f9
1 0 0 S9
1 0 0 1 0 S9
1 0 0 0 1 1 0 S9
0 1 0 S1(9,7)
1 0 0 0 0 0 0 1 0 1 0 S1(9,7)
1 0 0 0 0 0 0 0 0 0 S9
0 1 1 1 0 S1(9,5)
1 1 0 1 0 S1(9,7)
0 1 0 1 0 S2(9,7,5)
1 0 0 0 0 0 0 1 1 1 1 1 0 S9
0 0 0 1 0 1 1 1 0 S1(9,5)
1 1 1 0 1 0 S1(9,7)
1 0 1 0 1 0 S2(9,7,5)
Table B-6. Elementary Time Cost e9

~1 ~2~ 05 9 9 010 AleA92N Nq ha 47 AB h9 AIO e 10000000 1111110 s9 0 1 1 1 1 1 0 S1(9,3) 1 1 0 1 1 1 0 S1(9,5) 0 1 0 1 1 1 0 S2(9, 5, 3) 1 1 1 1 0 1 0 S1(9,7) 0 1 1 1 0 1 0 S2(9,7,3) 1 1 0 1 0 1 0 S2(9,7,5) 0 1 0 1 0 1 0 S3(9,7,15, 3) 10 00 00 0 00 111 110 S9 1 0 1 1 1 1 1 0 S1(9,3) 1 1 1 0 1 1 1 0 S1(9, 5) 1 0 1 0 1 1 1 0 S2(9, 5,3) 1 1 1 1 1 0 1 0 S1(9,7) 1 0 1 1 1 0 1 0 S2(9,7,3) 1 1 1 0 1 0 1 0 S2(9,7,5) 1 0 1 0 1 0 1 0 S3(9,7,5,3) Table B-6. Elementary Time Cost e9 (Cont.)

'2 934 I g O 81263 4 A 6 7 "8 6910 e9 00 00 00 00 0 1 111 11 1 1 1 0 s 0 1 1 1 1 1 1 1 0 Sl(9,1) 1 1 0 1 1 1 1 1 0 S1(9,3) 0 1 0 1 1 1 1 1 0 S2(9,3,1) 1 1 1 1 0 1 1 1 0 S1(9,5) 0 1 1 1 0 1 1 1 0 1 1 1 2(9,5,1) 1 1 0- 1 0 1 1 1 0 S2(9,5,3) 0O 1 0 1 0 1 1 1 0 S$3(9,5,3,1) 1 1 1 1 1 1 0 1 0 S1(9,7) 0 1 1 1 1 1 0 1 0 S2(9,7,1) 1 1 0 1 1 1 0 1 0 S2(9,7,3) 0 1 0 1 1 1 0 1 0 S3(9,7,3,1) 1 1 1 1 0 1 0 1 0 1S2(9,7,5) 0 1 1 1 0 1 0 1 0 S3(9, 7, 5, 1) 1 1 0 1 0 1 0 1 0 S3(9,7, 5,3) 0 1 0 1 0 1 0 1 0 S4(9,7,5,3,1) Table B-6. Elementary Time Cost e9 (Cont.)

0 024c1 321 A14 8J951O 47 N 9'h e7 1 f7 1 0 0 s 7 10 0 10 s 7 1 0 00 110 S7 0 1 0 S (7,5) 10000 1110 87 1 0 1 0 S1(7,5) 10000 0 11110 S7 0 1 1 1 0 S1(7,3) 1 0 1 0 S1(7,5) 0 1 0 1 0 S2(7, 5, 3) Table B-7. Elementary Time Cost e7

261 05 ~4 5 b ~'9 ~W10 81 A2 49 A78 9r10 e7 100000 0 1 11110 7 1 0 1 1 1 0 S1(7,3) 1 1 1 0 1 0 S1(7,5) 1 0 1 0 1 0 S2(7,5,3) 000000 0 0 0 0 0 0 11 1 1 1 1 07 0 1 1 0 S1(7,1) 1 1 0 1 1 1 0 S1(7,3) 0 1 0 1 1 1 0 S2(7,3,1) 1 1 1 0 1 0 S1(7,5) 0 1 1 1 0 1 0 S2(7,5,1) 1 1 0 1 0 1 0 S2(7,5,3) 0 1 0 1 0 1 0 S3(7,5,3,1) Table B-7. Elementary Time Cost e7 (Cont.)

φ1 φ2 φ3 φ4 φ5 φ6 φ7 φ8 φ9 φ10  λ1 λ2 λ3 λ4 λ5 λ6 λ7 λ8 λ9 λ10  e5
1 f5
1 0 0 S5
1 0 0 1 0 S5
1 0 0 0 1 1 0 S5
0 1 0 S1(5,3)
1 0 0 0 0 1 1 1 0 S5
1 0 1 0 S1(5,3)
0 0 0 0 0 1 1 1 1 0 S5
0 1 1 1 0 S1(5,1)
1 1 0 1 0 S1(5,3)
0 1 0 1 0 S2(5,3,1)
Table B-8. Elementary Time Cost e5

φ1 φ2 φ3 φ4 φ5 φ6 φ7 φ8 φ9 φ10  λ1 λ2 λ3 λ4 λ5 λ6 λ7 λ8 λ9 λ10  e3
1 f3
1 0 0 S3
1 0 0 1 0 S3
1 0 0 0 1 1 0 S3
0 1 0 S1(3,1)
Table B-9. Elementary Time Cost e3

φ1 φ2 φ3 φ4 φ5 φ6 φ7 φ8 φ9 φ10  λ1 λ2 λ3 λ4 λ5 λ6 λ7 λ8 λ9 λ10  e1
1 f1
0 0 S1
Table B-10. Elementary Time Cost e1

Appendix C

SUMMARY OF zi(t) AND zi°(t) FOR i ∈ {1, 2, ..., 10}

z1(t) =6 e5 + K33 [ e+ X4(e*+e)] K32 [ X4e + X2 (e2 +e) ] K31 [2e1 +tl] +z where z = {((p6V(6) V a6'5VZf6~5(b4V ~ )V"' V a6A5A 4A3A 2A1) + (q6~P6 5v (6~ ~5 (4V4 )V66 45(4v3jV' * 6* V6 E4 +K33[ (q5(04V V A 4 3VsA 4A 3 (sb2Vq)V.''V' A4A3!2 1) +(~4<t>4 42Vp 43 3A21 V 4 A'3A23 1)] +K32 [ (03(02V02)V03A201v/03A2A1) + (020 01V v20A 1)] +K31 [ 01]} So

377 z2(t) = e +K33[ e5+ 4(e +e3)] + K32[ X4e3 + X2(e2 +e)] + K31[ 2e + t + z 2 where z2 {(%5VA5(04V40 )~A5A403vA5A4 3(02V02 )V'" V \/5A4Z3A21 ) + K33 [ (05(04V 0v) 05 A403\V05A4ZA3(02V\/0)\/ V054AZ3 A21) 4+(4'03\/ 04 A3(02V 0)V /0404 93201/ 04GA3 2Zl) ] + K32 [ (03(022)V0)V3A201/03 2A1) + (K310V 02] 1)] +K31 [ 01] }s0

378 z3(t) = e5 + e+K34 [ e6 7( +e)] + K35 [ X7e8 + Xg(e 9 + ejO)] + K36 [ Xge0+ t] + 3 where z3 = ((050)vA 506VA 5A6(07/ 0)\/... *VA 67 891o) + (055V5 06056(07V07) V0505 h67P 8TV V 7 8 90505~6A7 9N) K34 [ (06(07V07)V06A\708V06A7A8(09\V0/)v'.'' V/06A7A8A9A10) + (070 0 o8v 0707, 8( v091 070 o 8A90' 07 07Ao ) + K0 5 [ ( 8(09\/ 9)V 08Ao9010 v08 9A10) + (090010v 0909 o10)] + K36 [ 010] } so

379 z4(t) = e + K34 [ e'6 +7(e + e)] + K35 [X7el + Xg(e' + e0)] + K36 [ X9el0 + t] +zo where z4 = 6 (06 A6( 07V 07)VA6A708VA6A7A8(09V 06)V" VA6A7A 8A9A10) + K34 [ (06(07V'0")'V6A7 8\/06A7A8(%9V/0~ )V' *'06A7 8A9A10) + (07:708 v07 a8(09v %) v0707 89 A80A9 V 07s8A9 A 10) ] + K35 [ (08(09V'0)\ V08A9 010V/08NA910) + (09 01ov0s09Alo o)] + K36 [ 010] }so

z5(t) = (mr + 1) Fr + mr z2(t)

z5° = mr z2°

z6(t) = (mr + 1) Fr + mr z4(t)

z6° = mr z4°
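The pattern shared by z5 and z6 — an overhead F incurred m + 1 times plus m repetitions of an inner cost — can be sketched as follows; F, m, and the inner cost here are illustrative placeholders, not quantities from the model:

```python
# Sketch of the cost-composition pattern of z5(t) and z6(t):
# z(t) = (m + 1) * F + m * inner_cost(t).
# The numeric values below are placeholders, not model data.

def compose_cost(F, m, inner_cost, t):
    # overhead F charged m + 1 times, plus m evaluations of the inner cost
    return (m + 1) * F + m * inner_cost(t)

z2 = lambda t: 2.0 * t + 1.0  # hypothetical inner time cost
print(compose_cost(5.0, 3, z2, 10.0))  # (3+1)*5 + 3*21 = 83.0
```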

381 z (t) (ma+l) Fa + mae* o 2 + K12 [ el +X3(e3 + e4)] + K13 [X3e4 + t] } + z7 where z = ma T(( 01V/0)\/Al02VAlIA2(0 3 )l0\ 1 2A 3 4 VA1 2 3 4 (1/5 /5\ 1 2 3 4 5P' + (01 2 1 )01 A2A304 1 01"2 01A1 2(03\/ 03)V1 2A3A 4A o 1 1 A 2 3 4 5 A 5 ) + K12 [ (02(03v03) v02A3 04V0234(Pl\V05vA) v02 3 454 5 + (K304\/03 A4(P1\/05\/0)\030 A45P1)] + K13 [ (04(P1V05v05)V04A5P1)

382 z8(t) = (m + 1) F + mp { e* + e9 + K55 [ e9 +X8(e~ +e.7 + K54 [8e7 + t]} + z where ZO= mp {(( 0 8 p 16' 0)"A1009`(A' 0\ \/ -lo109'9(0 08')\h 0\9"807 L1 Z9A 8A7 (P3 %6v) /VA10iA9A8A7A 63) + ((1000 9 09f 00ioA9(08 020)/010o10A9 A807 \/ 1010 0 A9A8A7(P /066) V010'0{ 0 A9A8A7A6P3) + K55[ (09(8"0))V0~9A807\09A8A7(P3V~6~)V09A8A7A6P3 + (08807V 0808 A7 (P: 06v0'6V08 7 3)] + K54 [ (07(P3V0 )V0763) + (060'P3)] }So

383 zg(t) =(ma + 1) Fa + ma{e] + e2 +K12[ e + X3(e* + e4)] + K13 [ X3e4 + X5 (e*5+ e)] + K14 [ X5e6 + X7(e 7+ e8)] + K15 [ 7e + X9(e + e0)] + K16 [X9ge t + t] } + where z~ - ma {((01v0{)Vc 02~ h l2(0303V' ^)V''' 2 ~3L4a5h6~7A A89A ) 01 +(1011020101a2(03V03)V~ -~V 0l1A2A3A4A5A6A7A8A9A10) + K12 [ (02( 3)i02A 3 04V02A3A4 (05 5)/ V' 02Y 3 A4 5 i \10 (03- 04\ 03 qA4 (05V 0'5).' 030' 3A4A 5A6A7A8A9A 10)] + K13 [(04(0 505)V/04A506~04A5A66(07V07)V''' V04Ara6"T68 9A10) +(050'06V050'5A 6(07V0 )V~/ ~ ~ V 0505 6 7 8 9 10)] + K14 [ (06('070)V06'47 08" 06 7 8(%0909) \f ~ 06 7 8 91 0) + K1 075 [ (8077A9V890 * 07078V9AI 0)10 + K15 [ (08(09V 9) V08 9010V 08A9A10) + (09c 00ov 090A10 )] + K16 [ 010] } s0

384 z0(t) =(m + 1) Fp+ mp e O+ e + K55[ e9+ X(e +e7)] + K54 [ X8e7 + 6(e + e6 5)] + K53 [ K6e5 + X4(e4 + eO)] + K52[ X4e3 + x2 (e2 + el)] + K51[ X2el + t] } + where 0o mp {((010V\/0)W1 0%09YA10,9(ogSV )v... * 4 0~~A4 ) +(_lo 0'o l vo t oA9(08V08)'' v* A0-o 100 A- 4~-)4 + K55 [ (%(V0o0)\'9Z~7'09ZZ84 (0' 6)/~ g +((801 C17/08/i7(0g06 *** *t 0808efe~w1) ] + K54 [ ((7(06/ )jV)V.6~07\0(7 6 5( 04O 4 +(060605 0' 5( 4/) V... A6 542A21) + K53 [ (05(04V0) "V 5A403v05'4S3(02V\02)V''' V05A4LS3A21) + (04 03 v 00404A3t02\ 0, )V. V * 0)] + K52 [ (03(02V0 )V03A201V%3A2A1) + (0K [ 0]So + K51 [ 01] }s0

385 A A A z*(t) = e* + e5, (K33-i) e5, [ X4(K33- ]e 4 3 + (^2K31 + X21-1)e + K31 t + 1 2 ( + e5 + (K33-1) e + [ X4(K33-1) + 1] (e + e3) 4 (X4K32 X4 22 1) e + [2 (32 -1) + 1] (e2 + e + (X2K31 + I2Kll-l el + K31 t + Z1 (4 e3 5 + (K3- e + [ 1) + 1 ( + 7 A 4A A, A + (x7K35 + 7. 45-1) e8 +[ X9(K35-1) + 1] (e1 + e10) + ( 231 + X 2 K116 - 1) e 1 + K36 t + z2 * Q z(t) =e+ e + (34 -1) e6 + [ X774- (e + ) A _A A + (X7K35 + [. K45-1) e8 + X9(K35-1)+ 1] (e9 + e10) + (X936 + X K56 -1) e1 + K36 t + + (The7K35 + e7 K45-1) e8 + [ 9(K3 5-1) +1] (e + ez ) those for z0l z~2 z and z, respectively, with the exception that Kij is replaced by Kij wherever it occurs.

386 z (t) = m (F+e) + (m K F1) e X(m K33 )+l] (e +e3 5 mr rF+ 5 r 33 +LX4r 3 4 3 ~~~~q~0 +(Xm K32 X X" - 1) e + (mrK3 -1)+ 1] (e4* e+) 4 rK32 4 2 e 2( 32 +1 A 0 + )X2mrK3 + XK11-1) ej m K31 t + A r)+ z*(t)mr (F +e) + (mr 34)e + [X7(mrK34) (e+e A~~AI r+ (X r35+X7K45 ) e8 + [X9(mrK3.51 ) + 1] (e I +e0) A 0 + (Km rK36+X9K56-1) e10 + r 36 6 * A * 0 Z; (t) = m (F + + ec ) +(maKi2 1) e2 + [ X3(maKi2 l)+l1 (e3+e4 7 a 1 2 a 12 2 ~~~' r3 a 2 - 3 A A* + (X 3K1 + XK-1) el'+ mK 3t+ z* 3 a 13 3 23 4 a 13 7 A ^ (e r+eO)+ z (t) = m (F+e0e) + (m K35-1)e9 + [X8(m pK35-)1] (ee) s~2~ ~~~~~~~~~e %) +(X m K 5+X K1) e +( mm t + zo 8 p54 8 44 p 54 8 A9t e= + z ma(F a+e I+ e 2 aK 12- 1) e2 + X3(m a K12-1) +1] (e 3+ + (X3 ma 13+K3-7-1) e4 + [ X5(maK3-1)+1 (e +e0) ~~1+ e(X maK1+X K4-1 ) e2 + [X 3(maK1-)1)+l] (e + 5ma14 53 4 A7N +(X~~m~~tKIG+XSK 56~~-I)e0fmKGt + z%*g +8(X m K +e 0 +e) e + [8mK3 1)+] (em t+e 9 a 16 956 9) a 16 9A ~ t+ p o 44 p 54~~~~~~~~~~~~~~~~~~~~~~~~~~~~l

z10*(t) = mp(Fp + e10* + e9) + (mpK55 - 1) e9 + [λ8(mpK55 - 1) + 1] (e8* + e7)
+ (λ8mpK54 + λ8K44 - 1) e7 + [λ6(mpK54 - 1) + 1] (e6* + e5)
+ (λ6mpK53 + λ6K33 - 1) e5 + [λ4(mpK53 - 1) + 1] (e4* + e3)
+ (λ4mpK52 + λ4K22 - 1) e3 + [λ2(mpK52 - 1) + 1] (e2* + e1)
+ (λ2mpK51 + λ2K11 - 1) e1 + mpK51 t + z10°*

The expressions for z5°*, z6°*, z7°*, z8°*, z9°*, and z10°* are the same as those for z5°, z6°, z7°, z8°, z9°, and z10°, respectively, with the exception that ma, mr, mp, maKij, mrKij, and mpKij are replaced by m̂a, m̂r, m̂p, m̂aK̂ij, m̂rK̂ij, and m̂pK̂ij, respectively, wherever they occur.

Appendix D

SUMMARY OF TIME COSTS FOR PRIMITIVE OPERATIONS

t1,1 = Ta+ (l-Xl) z7(Crx) + x1(1-x4){(A6+A8)z7 (Cr )+A6A8z7(Cr ) +mr PlZ3(Cp)P1 Z4(CC)] } +X X4Z 7(Cr )+( -i)[pz3(Cp)+P(Cp) Cp)1 +[ P1Z3(Cp)+Plz4(Cp)]

t1,2 = Tr +(1-xl)z5(C a) +x1( -x4)[ (6+A8)Z5(Ca)+A6A8z 5(Ca)+m Z4(C p)] +xtx4[ z (C~+( l-l)z4(Cp)+Z4(Cp)]

t1,3 = Tr+(l-x2)z6(Cp) +2(1[ 56p (3+i5)z6(C+( p3 6 p z2(C) +Xe5E Z*(Cp)+(z -1)z (ca )+z2(Ca)

t1,4 = Tp +(l-x2)z8(Cr 2+X2(1X5){ 6p3+AZ8(Cr )t( p3^Z8(Cr )+mz po3Z1(Ca)43z 2(C)] } +x2x5{z*(C )+(mAz -1)[P3zl(Ca)tP3z2(Ca)] +[P3Zl(C a)+P3 z2(Ca)]

389,1 = Ta+(-Xl)Z7(C 1)+XlZ7 (Cr) t2,2 =Tr+(1 -x)z5(C a)+X 1z5(Ca) a 6 48)7r G 6 8Z7 r r(Vp)1) +mr,1'=T +(l 6+b )z7(C rl)++ 1 [pZ7( rl)+mrl[1Z3 (Vp)+, i 4(V)] t =T Z+() +AC* 3,2 r (X6 8) Z5(Ca)+6 (h8Z5(Ca)+m r z4(Vp ) 3:=k T +k {(1 -x2)z8(Cr +x2(1-x5){6 p(% +)z8(Cr +( 6+A3A8(Cr2 )+mZ [P3z l(Ca)+P3z2(Ca)] } 2 2 p +x~5x{z8(C )+(mZ -1)[P3z (Ca)+P3z2(Ca) ] +[ Z1(C a z2(C ) +m K36V l 436 p t1,4 +mr K V t4,1=Tr +(1-x2)z6(Cp)+X26(Cp) t42=T +(1-x2)z8(Cr)+x2z8 (Cr )

390;, r5 p( 3 )Z6 +(+z z 2(V ) t52 =T + 8 (A3+Pt5)Z8 (~C ++A )z (Cr2 3)+m zl +PP3z (V) (V 2 P,3 k a Ta+ k a(1 Xl)Z7(Cr +Xl(t-x4){(A6+a)z;(C rl)+%6A8z7(C rl)+mrl [plz3(Cp)+pz4(Cp)] } +X1X4{Z7(C r)+(rl -1)[P1Z3(Cp) 1z4(Cp)] +[PlZ3(C p)+P c )] 11 A 1-7 7LP 6 7rI p p +mZ.K3 K V +' m ~z z31 a ( t6 2 =Tp+(1-x7) ZlO(Ca)+X7zl0(C a) t7,1 =Ta+Z7(Vrl) +maK13{(1Xx4) lP1Z3(Cp) +P1Z4(Cp) -Vr] z1 a +x4[PZ,3(Cp) +P1Z4(C )]} t7,2 =Ta+Z 7(Vrl~' x4) 4 t72 =TP+Zs(Vr i +mpK54{(1xS) [P3z l(Ca) +P3Z2(a) -r +X5[P3Z l(Ca) +P3Z2(C a)}

391 t7,3 = kr T +k~{((l-xl)z5(Ca) +xl(1-x4)[ ( 6+A8)z5(Ca)+As z 5 (Ca)+ m z(Cp )] 6 8 5 a 6 g85 a r14 p +X1X4 [ (Ca)+(mA -1)z (C)+z (C )+-v } 4 5( ar 4 p 4 p r rk~ t, x kr t12+1x4kr Vr t7,4 rk Trk {(1-X2)6(C ) +x2(l-x5)[ 6 p( z3+5)z6 (CpY j3%5)z6(Cp)+m zp2(Ca)] +x2X5[ z6(Cp)+(mz -l)z2 (C)+Z(Ca)+Vr] } =k~ kOv r tl3 +X25 r r t81 = -T+z 5(Va) t =k0T + (1-X1)Z (C)+X1[Z7 (Cr)+Va] t8,2 a a k 7r{(l x)z(C 7 =ka t2,1+ xl ka Va t9,1 = Tr + z6(Vp)

392 t9,2=kOT +ko{(lx2)z8(c +x2[ 8z (C (C ] p p p 8 r2 8 r p o t 42 2 =kptA 42+X pVp t10,1= Tr+z5(Va)+m rK3Z4(Vp) t10,2 =Tra+Z6(Vp)+mrK3 6z 2(Va t =k a T +k0{( -1)Z(C)+xl[ ZC(cr )+Va+piz3(V )+p z4(V )] } t10,3 a a a17rl z7(Cr 3 10,4 p~p ( 2)zs(CZ)+x2[ Z8(Cr )+Vp+P3zI(Va)+p3 Z2(Va)] } til,1= Ta+z7(Vr ) a 7 t ll, = Ta + z9(V l)5(C [ 5(Ca r t1l= Ta + zI(Vp)

393 t1s2 =kpT +kp{(l -x7)z1o(Ca)+z7[ z10(C a)+V p t13,1 = Ta+z (Vp)+k2k4Vr t13,2 = k~T+k~r {(1-xl)z5(Ca)+xl[ Z5(ca)+r+Z4(V)] t141 = Tp+ZO(Va) a a p 7a 9(p'+a 1,2 ak~Ta+ka {(1-x7)z9 (C )+x7[ z9(Cp)+V 3 t = T+z Vr ) 15,1 p 8p r2 Q 2 r r z6p21 z6 (Cp r t5,2 =k~T +kr {(1-X)Z6(Cp)+X2[ 6 (-p)zr]x tl,1 = Tp+Z (Va)+(k7 k/m )V t16,2 r {(1-x )z9(C r' r p 2

Appendix E

SOLUTION LISTING FOR CASE 1 OF THE MEDICAL DIAGNOSIS PROBLEM

In order to interpret the notation used in listing the best solutions within each φ-family, the following comments may be helpful. The ten digits following a character string of the form **PF(I): represent the values of φ1, ..., φ10 for φ-family I, where the value of I simply indicates the position of the φ-family in the sequence of φ-families generated and is used to simplify distinguishing the various φ-families when referring to them. The number associated with COST is the value of the time cost function for the best solution (or solutions) within the given φ-family, and the number associated with CPU reflects the cumulative CPU time in seconds used to that point in the solution process. Following the line containing COST and CPU for a given φ-family are one or more pairs of lines representing the best solution (or solutions, if there is more than one pair of these lines) within the φ-family. The first line of the pair contains values for φ1, ..., φ10, λ1, ..., λ10, and the remaining decision variables, in that order, followed by the symbol S

and a number which represents the value of the storage cost function for the given solution. The values of φ1, ..., φ10 are the same as those indicated for the φ-family, of course. The second line of the pair contains sixteen digits, each of which represents the method chosen for one of the sixteen primitive operations, where a value of zero indicates that the corresponding operation is not under consideration. Following the last pair of solution lines is a line containing the entries N and R. The number associated with N indicates the number of feasible solutions contained in the φ-family, and the number associated with R indicates the number of those solutions rejected because they violate the given storage constraint. Finally, appearing at the very end of the solution listing are a number of summary items. The number of the φ-family containing the overall optimal solution is given here, along with the total number of feasible solutions considered in the solution process. (Note that if two or more φ-families contain solutions which are considered optimal, the number of the φ-family given as containing the optimal solution(s) is that of the last one encountered in the sequence of φ-families.) Also given in this summary are the average and the maximum values of the time cost function over all feasible solutions. The actual value of the storage cost function is 0.5 units less than the value shown here; 0.5 is added to the actual value to effect round-off when this value is printed as an integer.
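The listing format just described is regular enough to parse mechanically. A small sketch follows; the regular expression and field names are assumptions inferred from the printed layout:

```python
import re

# Sketch of parsing one **PF header line of the solution listing.
# The pattern and field names are assumptions inferred from the
# printed format, e.g. "**PF( 4): 0000001000 COST: 0.25741382E 03 CPU: 17.653".
line = "**PF( 4): 0000001000 COST: 0.25741382E 03 CPU: 17.653"

m = re.match(r"\*\*PF\(\s*(\d+)\):\s*([01]{10})\s+"
             r"COST:\s*(\S+ \d+)\s+CPU:\s*(\S+)", line)
family = int(m.group(1))                      # phi-family index I
phis = [int(c) for c in m.group(2)]           # values of phi_1 ... phi_10
cost = float(m.group(3).replace("E ", "E+"))  # Fortran-style exponent
cpu = float(m.group(4))                       # cumulative CPU seconds
print(family, phis, cost, cpu)
```

The `replace("E ", "E+")` step accounts for the blank the printer inserted between the exponent letter and its digits.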

********** SOLUTION BEGINS **********

**PF( 1): 0000000000 COST: 0.27291162E 03 CPU: 4.080
0000000000 0001010000 1110101111 110000111 S: 0.30686500E 05
0010001000010100
N: 64 R: 0

**PF( 2): 0000000010 COST: 0.31865479E 03 CPU: 7.994
0000000010 0001000010 1110101100 110000100 S: 0.15544500E 05
0010001000010100
N: 64 R: 0

**PF( 3): 0000000100 COST: 0.29781177E 03 CPU: 11.900
0000000100 0100000100 1011111010 001101000 S: 0.32042500E 05
0010001000010100
0000000100 0001000100 1110111010 110001000 S: 0.32042500E 05
0010001000010100
N: 64 R: 0

**PF( 4): 0000001000 COST: 0.25741382E 03 CPU: 17.653
0000001000 0000001100 1011110011 001100001 S: 0.33398500E 05
0010001000010100
0000001000 0000001100 1110110011 110000001 S: 0.33398500E 05
0010001000010100
N: 96 R: 0

**PF( 5): 0000001010 COST: 0.31881689E 03 CPU: 21.508
0000001010 0000001010 1011110000 001100000 S: 0.18256500E 05
0010001000010100
0000001010 0000001010 1110110000 110000000 S: 0.18256500E 05
0010001000010100
N: 64 R: 0

**PF( 6): 0000001100 COST: 0.30840747E 03 CPU: 25.470
0000001100 0000001100 1011110010 001100000 S: 0.34754500E 05
0010001000010100
0000001100 0000001100 1110110010 110000000 S: 0.34754500E 05
0010001000010100
N: 64 R: 0

397 **PF( 7): 0000010000 COST: 0.26728027E 03 CPU: 3 1.347 0000010000 0100010100 1011101011 001100001 S: 0.34754500E 05 0010001000010100 0000010000 0001010100 1110101011 110000001 S: 0.34754500E 05 001000 1000010100 N: 96 R: 0:**PF( 8): 0000010010 COST: 0.32868311E 03 CPU: 35.200 0000010010 0100010010 1011101000 001100000 S: 0.19612500E 05 0010001000010100 0000010010 0001010010 1110101000 110000000 S: 0.19612500E 05 0010001000010100 N: 64 R: 0 **PF( 9): 0000010100 COST: 0.32350000E 03 CPU: 39.066 0000010100 0100010100 1011101010 001100000 S: 0.36110500E 05 0010001000010100 0000010100 0001010100 1110101010 110000000 S: 0.36110500E 05 00 1000 1000010100 N: 64 P: 0 **PF(O0): 0000100000 COST: 0.25296194E 03 CPU: 46.690 0000100000 0000110000 1011001111 001000111 S: 0.33398500E 05 0010001000010100 0000100000 0000100100 1011011011 001001001 S: 0.33398500E 05 0010001000010100 0000100000 0000110000 1110001111 110000111 S 0.33398500E 05 0010001000010100 0000100000 0000100100 1110011011 110001001 S: 0.33398500E 05 0010001000010100 N: 128 R: 0 **PF(11): 0000100010 COST: 0.3 1436475E 03 CPU: 52.439 0000100010 0000100010 1011001100 001000100 S: 0.18256500E 05 00100010000 10100 0000100010 0000100010 1011011000 001001000 S: 0.18256500E 05 0010001000010100 0000100010 0000100010 1110001100 110000100 S: 0.18256500E 05 0010001000010100 0000100010 0000100010 1110011000 110001000 S: 0.18256500E 05 0010001000010100 N: 96 RP: 0

398 **PF(12): 0000100100 COS T O.30917554E 03 CPU: 58.208 0000100100 0000100100 1011011010 001001000 S: 0934754500E 05 00 1000100010100 0000100100 0000100100 1110011010 110001000 S: 034754500E 05 0010001000010100 N: 96 R: 0 **PF(13): 0000101000 COST: 0.26881396E 03 CPU: 63.894 0000101000 0000101100 1011010011 001000001 S: 0.37466500E 05 0010001000010100 0000101000 0000101100 1 oo001001 110000001 S: 0,37466500E 05 0010001000010100 N: 96 R?: 0 **PF(14): 0000101010 COST: 0.33021655E 03 CPU: 67*711 0000101010 0000101010 1011010000 001000000 S: 0,22324500E 05 0010001000010100 0000101010 0000101010 1110010000 110000000 S: 0.22324500E 05 00100010000 10 100 N: 64 R: 0 **PF(15): 000010i O00 COST: 0.31980786E 03 CPU: 71.573 0000101100 000 1100 1011010010 001000000 S: 0.38822500E 05 0010001000010100 0000101100 0000101100 1110010010 110000000 S: 0.38822500E 05 0010001000010100 N: 64 R: 0 **PF(16): 0000110000 COST: 0.27864404E 03 CPU: 77.432 0000110000 0000110100 1011001011 001000001 S: 0.37466500E 05 00 1001000010100 0000110000 0000110100 1110001011 110000001 S: 0.37466500E 05 00 1 000 1O0 000100 N: 96 R: 0 **PF(17): 0000110010 COST: 0.34004712E 03 CPU: 81.256 0000110010 0000110010 1011001000 001000000 S: 0.22324500E 05 0010001000010100 00001 10010 00001 10010 1110001000 110000000 S: 0.22324500E 05 0010001000010100 N: 64 R: O

399 **PF(18): 0000110100 COST: 0.33486401E 03 CPU: 85.122 0000110100 0000 110100 1011001010 001000000 S: 0,38822500E 05 0010001000010100 0000110100 0000110100 1110001010 110000000 St 0.38822500E 05 001000 10000 10100 N: 64 R: 0 **PF(19): 0001000000 COST: 0.25642383E 03 CPU: 96.713 0001000000 0001010000 1110101111 110000111 St 0.32494500E 05 00100010000 10100 0001OOOOCOOO 0001000100 1110111011 1100001001 S: 0.32494500E 05 0010001000010100 N: 192 R: 0 **PF(20): 0001000010 COST: 0.31782666E 03 CPU: 105*246 0001000010 0001000010 1110101100 110000100 S: 0.17352500E 05 00 1000 1000010100 0001000010 0001000010 11 10111000 110001000 S: 0.17352500E 05 0010001000010100 N: 144 R: 0 **PF(21): 0001000100 COST: 0.31264380E 03 CPU: 113.689 0001000100 0001000100 1110111010 110001000 S: 0.33850500E 05 0010001000010'100 N: 144 R: 0 **PF(22): 0001001000 COST: 0.27228174E 03 CPU: 122.251 0001001000 0001001100 1110110011 110000001 S: 0.36562500E 05 0010001000010100 N: 144 R: 0 **PF(23): 0001001010 COST: 0.33368457E 03 CPU: 127.802 0001001010 0001001010 1110110000 110000000 S: 0.21420500E 05 0010001000010100 N: 96 R: 0 **PF(24): 0001001100 COST: 0.32327588E 03 CPU: 133.406 0001001100 0001001100 1110110010 11000000 Si 0.37918500E 05 0010001000010100 N: 96 R: O

400 **PF(25): 0001010000 COST: 0,28211206E 03 CPU: 141,994 0001010000 0001010100 1110101011 11000000001 S 036562500E 05 0010001000010100 N: 144 R: 0 **PF(26): 0001010010 COST: 0.34351489E 03 CPU: 147.505 0001010010 0001010010 1110101000 110000000 S s: 0O21420500E 05 0010001000010100 N: 96 R: 0 **PF(27): 0001010100 COST: O 33833203E 03 CPU: 153 197 0001010100 0001010100 1110101010 110000000 S: 0937918500E 05 0010001000010100 N: 96 R: 0 **PF(28): 0010000000 COST: 0,25296194E 03 CPU: 159.030 0010000000 0010010000 1001101111 000100111 S* 0.33398500E 05 0010001000010100 0010000000 0010000100 1001111011 000101001 S: 0.33398500E 05 001000 1000010100 N: 96 R: 0 **PF(29): 0010000010 COST: 0.31436475E 03 CPU: 1635.814 0010000010 0010000010 1001101100 000100100 S: 0.18256500E 05 0010001000010100 0010000010 0010000010 1001111000 000101000 S: 0,18256500E 05 0010001000010100 N: 80 R: 0 **PF(30): 0010000100 COST: 0.30398584E 03 CPU: 168.564 0010000100 0011000100 1100111010 100001000 S: 0.33398500E 05 0010001000010 100 N: 80 R: 0 **PF(31): 0010001000 COST: 026358789E 03 CPU: 174.214 0010 00100 00 001001100 1100110011 100000001 St 0.34754500E 05 0! 010010 0 0 100 N: 96 R: 0

401 **PF(32)* 0010001010 COS T: 0,32499072E 03 CPU: 177.9 58 0010001010 0010001010 1100110000 100000000 S: 0.19612500E 05 0010001000010100 N: 64 R: 0 **PF(33 ): 00 10001 100 COST: 0.31458154E 03 CPUI 181.806 0010001100 0010001100 1100110010 100000000 St 0.36110500E 05 0010001000010100 N: 64 RS 0 **PF(34): 0010010000 COST: 0,27345410E 03 CPU: 187.486 0010010000 0011010100 1100101011 100000001 S: 0.36110500E 05 0010001000010100 N: 96 R: 0 **PF(35): 0010010010 COST: 0.33485693E 03 CPU: 191.280 0010010010 0011010010 1100101000 100000000 S: 0.20968500E 05 0010001000010100 N: 64 R: 0 **PF(36): 0010010100 COST: 0.32967383E 03 CPU: 195.043 0010010100 0011010100 1100101010 100000000 S: 0.37466500E 05 0010001000010100 N: 64 R: 0 **PF(37): 0010100000 COST: 0.25913574E 03 CPU: 200.745 0010100000 0010110000 1100001111 100000111 S: 0.34754500E 05 0010001000010100 0010100000 0010100100 1100011011 100001001 S 0.34754500E 05 0010001000010100 N:s 96 R: 0 **PF(38): 0010100010 COST:.32053857E 03 CPUIJ 204.995 0010100010 0010100010 1100001100 100000100 St 0.19612500E 05 0010001000010100 0010100010 0010100010 1100011000 100001000 S: 0.19612500E 05 0010001000010 100 N: 72 R: 0

402 ** PF(3 9): 0010100100 COST: 0,3 1534961E 03 CPU: 209.299 0010100100 00100 00100 1100011010 100001000 St 0.36110500E 05 0010OO1000010100 N: 72 R: 0 **PF(40): 0010101000 COST: 0,27498779E 03 CPU: 213,573 0010101000 001010l 10 1100010011 10OO00001 S: 0.38822500E 05 001000100 0010100 N: 72 R: 0 **PF(41):' 0010101010 COST: 0.33639062E 03 CPU: 216.410 0010101010 0010101010 1100010000 100000000 S: 0.23680500E 05 0010001000001100 N: 48 R: 0 **P F (42): 00 1 01 01100 COST: 0*32598169E 03 CPU: 219.277 00101011 00 00010 1100010010 100000000 S: 0.40178500E 05 0010001000001060 N: 48 R: 0 **PF(43): 001011O0000 COST: 0.28481787E 03 CPU: 223.520 0010110000 0010110100 1100001011 100000001 S: 0.38822500E 05 0010001000010100 N: 72 R: 0 **PF(44): 0010110010 COST: 0.34622095E 03 CPU: 226.373 0010110010 0010110010 1100001000 100000000 S: 0.23680500E 05 0010101000010100 N: 48 R: 0 **PF(45): 0010110100 COST: 0.3$4103809E 03 CPU: 229.201 0010110100 0010110100 1100001010 100000000 S: 0.40178500E 05 0010001000010100 N: 48 R O0

403 **PF(46): 00 11000000 COST: 0,.26059961E 03 CPU: 236,847 0011000000 0011010000 1100101111 100000111 St 0.33850500E 05 0010001000010100 0011000000 0011000100 1100111011 100001001 S: 0.33850500E 05 0010001000010100 N: 128 R: 0 **PF(47): 0011000010 COST: 0.32200244E 03 CPU: 242.58 6 0011000010 0011000010 1100101100 100000100 S: 0,18708500E 05 0010001000010100 0011000010 0011000010 1100111000 100001000 So 0.18708500E 05 0010001000010 100 N: 96 R: 0 **PF(48) 0011000100 COST: 0.31681982E 03 CPU: 248.252 0011000100 0011000100 1100111010 100001000 S: 0.35206500E 05 0010001000010100 N: 96 R: 0 **PF(49): 0011001000 COS T: 0.27645776E 03 CPU: 253,.958 0011001000 0011001100 1100110011 100000001 S: 0.37918500E 05 0010001000010 100 N: 96 R: 0 **PF(50): 0011001010 COST: 0.33786060E 03 CPU: 257.620 0011001010 0011001010 1100110000 100000000 S: 0.22776500E 05 0010001000010100 N: 64 R: 0 **PF(51): 0011001100 COST: 0.32745166E 03 CPUI: 261.398 0011001100 0011001100 1100110010 100000000 S: 0.39274500E 05 0010001000010 100 N: 64 R: 0 **PF(52): 0011010000 COST: 0.28628809E 03 CPU: 267.1 16 0011010000 0011010100 1100101011 100000001 S: 0.37918500E 05 0010001000010100 N: 96 R: O

404 **PF(53): 001 1010010 COST: 0.34769116E 03 CPU: 270.921 0011010010 0011010010 I100101000 100000000 S: 0o22776500E 05 001000100000010100 N: 64 R: 0 **PF(54): 0011010100 COST: 0.34250806E 03 CPU: 274.643 0011010100 0011010100 1100101010 100000000 S: 0.39274500E 05 0010001000010100 N: 64 R: 0 **PF(55): 0100000000 COST: 0.25642383E 03 CPU: 284.501 0100000000 0100010000 1011101111 001100111 S: 0.32494500E 05 0010001000010100 0100000000 0100000100 10 1 1111011 001101001 S: 0,32494500E 05 0010001000010100 N: 160 R: 0 **P F ( 5 6): 0100000010 COST: 0.31782666E 03 CPU: 292.119 0100000010 0100000010 1011101100 001100100 S: 0.17352500E 05 0010001000010100 0100000010 0100000010 1011111000 001101000 S: 0o17352500E 05 0010001000010100 N: 128 R: 0 ~**PF(.57): 0100000100 COST: 0.30710547E 03 CPU, 299,813 0100000100 0101000100 1010111010 000001000 S: 0.33398500E 05 0010001000010100 N: 128 Rt 0 **PF(58): 0100001000 COST: 0.26670776E 03 CPU: 308.275 0100001000 0100001 100 1010110011 000000001 S: 0.34754500E 05 0010001000010100 N: 144 R: 0 **PF(59): 0100001010 COST: 0.3281 1060E 03 CPU: 313.845 0100001010 0100001010 1010110000 000000000 S: 0.19612500E 05 0010001000010100 N: 96 R: O

405 **PF(60): 0100001100 COST: 0.3 1770166E 03 CPU: 319.521 0100001100 0100001100 1010110010 000000000 S: 0.36110500E 05 00100010000 10100 N: 96 R: 0 **PF(61): 0100010000 COST: 0.27657422E 03 CPU: 327.953 0100010000 0101010100 1010101011 000000001 S: 0.36110500E 05 0010001000010 100 N: 144 R O0 **PF(62): 0100010010 COS T: 0.33797705E 03 CPU: 333.529 0100010010 0101010010 1010101000 000000000 S: 0.20968500E 05 0010001000010100 N: 96 R; 0 **PF(63): 0100010100 COST: 0,33279395E 03 CPU: 339o.118 0100010100 0101010100 1010101010 000000000 S: 0.37466500E 05 00100010000 10 100 N: 96 R: 0 **PF(64): 0100100000 COS T 0.26225562E 03 CPU: 346.644 0100100000 0100110000 1010001111 000000111 S: 0.34754500E 05 0010001000010100 0100100000 0100100100 1010011011 000001001 S: 0.34754500E 05 0010001000010100 N: 12 RP: 0 **PF(65): 0100100010 COST: 0.32365845E 03 CPU: 352.244 0100l0001OOlO 0100100010 1010001100 000000100 St 0.19612500E 05 0010001000010100 0100100010 0100100010 1010011000 000001000 S: 0.19612500E 05 0010001000010100 N: 96 R: 0 **PF(66): 0100100100 COST: 0.3 1846973E 03 CPU: 357.780 0100100100 0100100100 1010011010 000001000 S: 0.36110500E 05 0010001000010100 N: 96 R: 0

406 **PF(67): 0100101000 COST: 0.27810791 E 03 CPU 363.331 0100101000 0100101100 1010010011 000000001 S: 0,38822500E 05 10000000000100 N: 96 R: 0 **PF(68): 0100101010 COST: 0.33951074E 03 CPU: 367.008 0100101010 0100101010 1010010000 000000000 S: 0o23680500E 05 0010000000'10100 N: 64 R: 0 **PF(69): 0100101 100 COST: 032910132E 03 CPU: 370.736 0100101100 0100101100 1010010010 000000000 S: 0.40178500E 05 0010001000010100 N: 64 R: 0 **PF(70): 01001100000 COST: 0.28793799E 03 CPU: 376.424 0100110000 0100110100 1010001011 000000001 S: 0.38822500E 05 0010001000010100 N: 96 R: 0 **PF(71): 0100110010 COST: 0.34934131E 03 CPU: 380 186 0100110010 0100110010 1010001000 000000000 S: 0.23680500E 05 0010001000010100 N: 64 R: 0 **PF(72 ): 01001 10100 COSTt 0.34415796E 03 CPU: 383.873 0100110100 0100110100 1010001010 000000000 5: 0.40178500E 05 0010001000010 100 N: 64 R: 0 **PF(73): 0101000000 COST: 0.26571729E 03 CPU: 391.424 0101000000 0101010000 1010101111 000000111 S: 0.33850500E 05 0010001000010100 0101000000 0101000100 1010111011 000001001 S: 0,33850500E 05 NO 1000 2 0000100 0 N: 128 R: O

407 **PF(74): 0101000010 COST: 0,32712061E 03 CPU: 397.076 0101000010 0101000010 1010101100 000000100 S: 0.18708500E 05 0010001000010100 0101000010 0101000010 1010111000 000001000 S: 0.18708500E 05 0010001000010100 N: 96 R: 0 **P F(75): 0101000100 COST: 0.32193750E 03 CPU: 402.683 0101000100 0101000100 1010111010 000001000 S: 0.35206500E 05 0010001000010100 N: 96 R: 0 **PF(76): 0101001000 COST: 0.28157593E 03 CPU: 408.477 0101001000 0101001100 1010110011 000000001 S: 0.37918500E 05 0010001000010100 N: 96 R: 0 **PF(77): 0101001010 COST: 0.34297852E 03 CPU: 412.182 0101001010 0101001010 1010110000 000000000 S: 0.22776500E 05 0010001000010100 N: 64 R: 0 **PF(78): 0101001100 COST: 0.33256958E 03 CPU: 415.920 0101001100 0101001100 1010110010 000000000 S 0.39274500E 05 0010001000010100 N: 64 R: 0 **PF(79): 0101010000 COST: 0,29140576E 03 CPU: 421.525 0101010000 0101010100 1010101011 000000001 S 0.37918500E 05 0010001000010100 N: 96 R: 0 **PF(80): 0101010010 COST: 0.35280908E 03 CPU: 425.174 0101010010 0101010010 1010101000 000000000 S: 0.22776500E 05 0010001000010 100 N: 64 R: O

**PF(81): 0101010100 COST: 0.34762622E 03 CPU: 428.804
0101010100 0101010100 1010101010 000000000 S: 0.39274500E 05
0010001000010100
N: 64 R: 0

PF(28) CONTAINS BEST SOLUTION(S). 7232 SOLUTIONS CONSIDERED.
MEAN COST OVER ALL SOLUTIONS: 0.35064868E 03
MAXIMUM COST ENCOUNTERED: 0.57534351E 03

********** SOLUTION TERMINATED **********

Appendix F
SOLUTION LISTING FOR CASE 2 OF THE MEDICAL DIAGNOSIS PROBLEM

********** SOLUTION BEGINS **********
**PF( 1): 0000000000 COST: 0.19451155E 03 CPU: 4.286
0000000000 0001000000 1110101111 110000111 S: 0.18482500E 05 0010001000010100
N: 64 R: 0
**PF( 2): 0000000010 COST: 0.23521910E 03 CPU: 8.485
0000000010 0001000010 1110101101 110000100 S: 0.19838500E 05 0010001000010100
N: 64 R: 0
**PF( 3): 0000000100 COST: 0.22721243E 03 CPU: 12.602
0000000100 0100000110 1011111001 001101000 S: 0.26618500E 05 0010002000010100
0000000100 0001000110 1110111001 110001000 S: 0.26618500E 05 0010002000010100
N: 64 R: 0
**PF( 4): 0000001000 COST: 0.19406752E 03 CPU: 18.755
0000001000 0000001000 1011110011 001100001 S: 0.22098500E 05 0010001000010100
0000001000 0000001000 1110110011 110000001 S: 0.22098500E 05 0010001000010100
N: 96 R: 0
409

410

**PF( 5): 0000001010 COST: 0.23477510E 03 CPU: 22.885
0000001010 0000001010 1011110001 001100000 S: 0.23454500E 05 0010001000010100
0000001010 0000001010 1110110001 110000000 S: 0.23454500E 05 0010001000010100
N: 64 R: 0
**PF( 6): 0000001100 COST: 0.24145790E 03 CPU: 27.060
0000001100 0000001100 1011110011 001100001 S: 0.34754500E 05 0010001000010100
0000001100 0000001100 1110110011 110000001 S: 0.34754500E 05 0010001000010100
N: 64 R: 0
**PF( 7): 0000010000 COST: 0.20734515E 03 CPU: 33.207
0000010000 0100010000 1011101011 001100001 S: 0.23906500E 05 0010001000010100
0000010000 0001010000 1110101011 110000001 S: 0.23906500E 05 0010001000010100
N: 96 R: 0
**PF( 8): 0000010010 COST: 0.24805269E 03 CPU: 37.311
0000010010 0100010010 1011101001 001100000 S: 0.25262500E 05 0010001000010100
0000010010 0001010010 1110101001 110000000 S: 0.25262500E 05 0010001000010100
N: 64 R: 0
**PF( 9): 0000010100 COST: 0.26157520E 03 CPU: 41.416
0000010100 0100010100 1011101011 001100001 S: 0.36562500E 05 0010001000010100
0000010100 0001010100 1110101011 110000001 S: 0.36562500E 05 0010001000010100
N: 64 R: 0

**PF(10): 0000100000 COST: 0.18838756E 03 CPU: 49.613
0000100000 0000100000 1011001111 001000111 S: 0.22098500E 05 0010001000010100
0000100000 0000100000 1011011011 001001001 S: 0.22098500E 05 0010001000010100
0000100000 0000100000 1110001111 110000111 S: 0.22098500E 05 0010001000010100
0000100000 0000100000 1110011011 110001001 S: 0.22098500E 05 0010001000010100
N: 128 R: 0
**PF(11): 0000100010 COST: 0.22909509E 03 CPU: 55.744
0000100010 0000100010 1011001101 001000100 S: 0.23454500E 05 0010001000010100
0000100010 0000100010 1011011001 001001000 S: 0.23454500E 05 0010001000010100
0000100010 0000100010 1110001101 110000100 S: 0.23454500E 05 0010001000010100
0000100010 0000100010 1110011001 110001000 S: 0.23454500E 05 0010001000010100
N: 96 R: 0
**PF(12): 0000100100 COST: 0.24260991E 03 CPU: 61.852
0000100100 0000100100 1011011011 001001001 S: 0.34754500E 05 0010001000010100
0000100100 0000100100 1110011011 110001001 S: 0.34754500E 05 0010001000010100
N: 96 R: 0
**PF(13): 0000101000 COST: 0.20901950E 03 CPU: 67.914
0000101000 0000101000 1011010011 001000001 S: 0.27522500E 05 0010001000010100
0000101000 0000101000 1110010011 110000001 S: 0.27522500E 05 0010001000010100
N: 96 R: 0
**PF(14): 0000101010 COST: 0.24972711E 03 CPU: 71.952
0000101010 0000101010 1011010001 001000000 S: 0.28878500E 05 0010001000010100
0000101010 0000101010 1110010001 110000000 S: 0.28878500E 05 0010001000010100
N: 64 R: 0

412

**PF(15): 0000101100 COST: 0.25640967E 03 CPU: 76.022
0000101100 0000101100 1011010011 001000001 S: 0.40178500E 05 0010001000010100
0000101100 0000101100 1110010011 110000001 S: 0.40178500E 05 0010001000010100
N: 64 R: 0
**PF(16): 0000110000 COST: 0.22228516E 03 CPU: 82.168
0000110000 0000110000 1011001011 001000001 S: 0.27522500E 05 0010001000010100
0000110000 0000110000 1110001011 110000001 S: 0.27522500E 05 0010001000010100
N: 96 R: 0
**PF(17): 0000110010 COST: 0.26299243E 03 CPU: 86.235
0000110010 0000110010 1011001001 001000000 S: 0.28878500E 05 0010001000010100
0000110010 0000110010 1110001001 110000000 S: 0.28878500E 05 0010001000010100
N: 64 R: 0
**PF(18): 0000110100 COST: 0.27651514E 03 CPU: 90.279
0000110100 0000110100 1011001011 001000001 S: 0.40178500E 05 0010001000010100
0000110100 0000110100 1110001011 110000001 S: 0.40178500E 05 0010001000010100
N: 64 R: 0
**PF(19): 0001000000 COST: 0.19188356E 03 CPU: 102.476
0001000000 0001000000 1110101111 110000111 S: 0.20742500E 05 0010001000010100
0001000000 0001000000 1110111011 110001001 S: 0.20742500E 05 0010001000010100
N: 192 R: 0
**PF(20): 0001000010 COST: 0.23259109E 03 CPU: 111.484
0001000010 0001000010 1110101101 110000100 S: 0.22098500E 05 0010001000010100
0001000010 0001000010 1110111001 110001000 S: 0.22098500E 05 0010001000010100
N: 144 R: 0

413

**PF(21): 0001000100 COST: 0.22953642E 03 CPU: 120.407
0001000100 0001000110 1110111001 110001000 S: 0.28878500E 05 0010002000010100
N: 144 R: 0
**PF(22): 0001001000 COST: 0.21252350E 03 CPU: 129.336
0001001000 0001001000 1110110011 110000001 S: 0.26166500E 05 0010001000010100
N: 144 R: 0
**PF(23): 0001001010 COST: 0.25323109E 03 CPU: 135.231
0001001010 0001001010 1110110001 110000000 S: 0.27522500E 05 0010001000010100
N: 96 R: 0
**PF(24): 0001001100 COST: 0.25991357E 03 CPU: 141.192
0001001100 0001001100 1110110011 110000001 S: 0.38822500E 05 0010001000010100
N: 96 R: 0
**PF(25): 0001010000 COST: 0.22578911E 03 CPU: 150.184
0001010000 0001010000 1110101011 110000001 S: 0.26166500E 05 0010001000010100
N: 144 R: 0
**PF(26): 0001010010 COST: 0.26649634E 03 CPU: 156.101
0001010010 0001010010 1110101001 110000000 S: 0.27522500E 05 0010001000010100
N: 96 R: 0
**PF(27): 0001010100 COST: 0.27023193E 03 CPU: 161.993
0001010100 0001010110 1110101001 110000000 S: 0.34302500E 05 0010002000010100
N: 96 R: 0

414

**PF(28): 0010000000 COST: 0.18838756E 03 CPU: 168.150
0010000000 0010000000 1001101111 000100111 S: 0.22098500E 05 0010001000010100
0010000000 0010000000 1001111011 000101001 S: 0.22098500E 05 0010001000010100
N: 96 R: 0
**PF(29): 0010000010 COST: 0.22909509E 03 CPU: 173.214
0010000010 0010000010 1001101101 000100100 S: 0.23454500E 05 0010001000010100
0010000010 0010000010 1001111001 000101000 S: 0.23454500E 05 0010001000010100
N: 80 R: 0
**PF(30): 0010000100 COST: 0.23378389E 03 CPU: 178.211
0010000100 0011000100 1100111011 100001001 S: 0.32494500E 05 0010001000010100
N: 80 R: 0
**PF(31): 0010001000 COST: 0.20018150E 03 CPU: 184.190
0010001000 0010001000 1100110011 100000001 S: 0.23454500E 05 0010001000010100
N: 96 R: 0
**PF(32): 0010001010 COST: 0.24088911E 03 CPU: 188.149
0010001010 0010001010 1100110001 100000000 S: 0.24810500E 05 0010001000010100
N: 64 R: 0
**PF(33): 0010001100 COST: 0.24757190E 03 CPU: 192.159
0010001100 0010001100 1100110011 100000001 S: 0.36110500E 05 0010001000010100
N: 64 R: 0
**PF(34): 0010010000 COST: 0.21345917E 03 CPU: 198.159
0010010000 0011010000 1100101011 100000001 S: 0.25262500E 05 0010001000010100
N: 96 R: 0

415 **PF(35): 0010010010 COST: 0.25416669E 03 CPU: 202*115 0010010010 0011010010 1100101001 100000000 S: 0.26618500E 05 0010001000010100 N: 64 R: 0 **PF(36): 0010010100 COST: 0.26768921E 03 CPU: 206.067 0010010100 0011010100 1100101011 100000001 S: 0.37918500E 05 0010001000010100 N: 64 R: 0 **PF(37): 0010100000 COST: 0,19450156E 03 CPU: 212.098 0010100000 0010100000 1100001111 100000111 S: 0.23454500E 05 0010001000010100 0010100000 0010100000 1100011011 100001001 S: 0.23454500E 05 0010001000010100 N: 96 R: 0 **PF(38): 0010100010 COST: 0.23520912E 03 CPU: 216.600 0010100010 0010100010 1100001101 100000100 S: 0.24810500E 05 0010001000010100 0010100010 0010100010 1100011001 100001000 S: 0.24810500E 05 0010001000010100 N: 72 R; 0 **PF(39): 0010100100 COST: 0.24872389E 03 CPU: 221.079 0010100100 0010100100 1100011011 100001001 S: 0.36110500E 05 0010001000010100 N: 72 R: 0 **PF(40): 0010101000 COST: 0,21513350E 03 CPU: 225.537 0010101000 0010101000 1100010011 100000001 S: 0.28878500E 05 0010001000010100 N: 72 R: 0 **PF(41): 0010101010 COST: 0.2.5584109E 03 CPU: 228.502 0010101010 0010101010 1100010001 100000000 S: 0,30234500E 05 0010001 000010100 N: 48 R: O

416

**PF(42): 0010101100 COST: 0.26252344E 03 CPU: 231.499
0010101100 0010101100 1100010011 100000001 S: 0.41534500E 05 0010001000010100
N: 48 R: 0
**PF(43): 0010110000 COST: 0.22839911E 03 CPU: 236.030
0010110000 0010110000 1100001011 100000001 S: 0.28878500E 05 0010001000010100
N: 72 R: 0
**PF(44): 0010110010 COST: 0.26910645E 03 CPU: 239.021
0010110010 0010110010 1100001001 100000000 S: 0.30234500E 05 0010001000010100
N: 48 R: 0
**PF(45): 0010110100 COST: 0.28262915E 03 CPU: 242.006
0010110100 0010110100 1100001011 100000001 S: 0.41534500E 05 0010001000010100
N: 48 R: 0
**PF(46): 0011000000 COST: 0.19599956E 03 CPU: 250.154
0011000000 0011000000 1100101111 100000111 S: 0.22098500E 05 0010001000010100
0011000000 0011000000 1100111011 100001001 S: 0.22098500E 05 0010001000010100
N: 128 R: 0
**PF(47): 0011000010 COST: 0.23670711E 03 CPU: 256.192
0011000010 0011000010 1100101101 100000100 S: 0.23454500E 05 0010001000010100
0011000010 0011000010 1100111001 100001000 S: 0.23454500E 05 0010001000010100
N: 96 R: 0
**PF(48): 0011000100 COST: 0.25022989E 03 CPU: 262.159
0011000100 0011000100 1100111011 100001001 S: 0.34754500E 05 0010001000010100
N: 96 R: 0

417 **PF(49): 0011001000 COST: 0.21663951E 03 CPU: 268.124 0011001000 0011001000 1100110011 100000001 S: 0.27522500E 05 0010001000010100 N: 96 R: 0 **PF(50): 0011001010 COST: 0.2573466FE 03 CPU: 272.073 0011001010 0011001010 1100110001 100000000 S: 0.28878500E 05 0010001000010100 N: 64 R: 0 **PF(51): 0011001100 COST: 0.26402954E 03 CPU: 276.090 0011001100 0011001100 1100110011 100000001 S: 0.40178500E 05 0010001000010100 N: 64 R: 0 **PF(52): 0011010000 COST: 0.22990511E 03 CPU: 282.135 0011010000 0011010000 1100101011 100000001 S: 0.27522500E 05 00100010000 10100 N: 96 R: 0 **PF(53): 0011010010 COST: 0.27061230E 03 CPU: 286.106 0011010010 0011010010 1100101001 100000000 S: 0.28878500E 05 0010001000010100 N,: 64 P: 0 **PF(54): 0011010100 COST: 0.28413525E 03 CPU: 290.089 0011010100 0011010100 1100101011 100000001 S: 0.40178500E 05 0010001 0000010100 N: 64 R: 0 **PF(C55): 0100000000 COST: 0.19188356E 03 CPU: 300.349 0100000000 0100000000 1011101111 001100111 S: 0.20742500E 05 001 0001000010 100 0100000000 0100000000 1011111011 001101001 S: 0.20742500E 05 0010001000010100 N: 160 R: 0

418 **PF(56): 0100000010 COST: 0.23259109E 03 CPU: 308.444 0100000010 0100000010 1011101101 001100100 S: 0O22098500E 05 0010001000010100 0100000010 0100000010 1011111001 001101000 S: 0220985500E 05 0100010000100 100 N: 128 R: 0 **PF(57): 0100000100 COST: 0.22953642E 03 CPU: 316.396 0100000100 0100000110 1011111001 001101000 S: 0.28878500E 05 00 100020000 10 100 N: 128 R: 0 **PF(58): 0100001000 COST: 0.20333751E 03 CPU: 325.395 0100001000 0100001000 1010110011 000000001 S: 0.23454500E 05 0010001000010100 N 1i44 R: 0 **PF(59): 0100001010 COST: 0.24404510E 03 CPU: 331,347 0100001010 0100001010 1010110001 000000000 S: 0.24810500E 05 00100010000 10 100 N: 96 R: O **PF(60): 0100001100 COST: 0.25072789E 03 CPU: 337.344 0100001100 0100001100 1010110011 000000001 S: 0.36110500E 05 0010001000010100 N: 96 R: 0 **PF(61): 0100010000 COST: 0.2 1661516E 03 CPU: 3461 52 0100010000 0101010000 1010101011 000000001 S: 0.25262500E 05 0010001000010100 N: 144 R: 0 **PF(62): 0100010010 COST: 0.25732227E 03 CPU: 351.952 0100010010 0101010010 1010101001 000000000 S: 0.26618500E 05 0010001000010100 N~ 96 R: O

419 **PF(63): 0100010100 COST: 0.27023193E 03 CPU: 357.899 0100010100 0100010110 1011101001 001100000 S: 0.34302500E 05 001 0002000010100 N: 96 FR 0 **PF(64): 0100100000 COST: 0.19765756E 03 CPU: 366.142 0100100000 0100100000 1010001111 000000111 S: 0.23454500E 05 0010001000010 100 C100100000 0100100000 1010011011 000001001 S: 0.23454500E 05 00 1000 10000101 00 N: 128 R: 0 **PF(65): 0100100010 COST: 0,25836511E 03 CPU: 372.274 0100100010 0100100010 1010001101 000000100 S: 0.24810500E 05 0010001000010100 0100100010 0100100010 1010011001 000001000 S: 0.24810500E 05 00100010000 10 100 N: 96 R: 0 **PF(66): 0100100100 COST: 0.2 5187988E 03 CPU: 378.349 0100100100 0100100100 1010011011 000001001 S: 0.36110500E 05 0010001000010100 N: 96 R: 0 **>kPF(67): 0 100101000 COST: 0.21828951E 03 CPU: 384.432 0100101000 0100101000 1010010011 000000001 S: 0.28878500E 05 0010001000010100 N: 96 R: 0 **PF(68): 0100101010 COST: 0.25899683E 03 CPU: 388.453 0100101010 0100101010 1010010001 000000000 S: 0.30234500E 05 00 100010000 10 100 N: 64 R: 0 **PF(69): 0100101100 COST: 0.26567969E 03 CPU: 392.515 0100101100 010001100 1010010011 000000001 S: 0.41534500E 05 O0100000010100 N: 64 R: O

420 **PF(70): 01001 10000 COST: 0o23155510E 03 CPU: 398.665 0100110000 0100110000 1010001011 000000001 S: 0,28878500E 05 0010001000010100 N: 96 R: 0 **PF( 71): 0100110010 COST: 0.27226245F 03 CPU: 402.717 01l001 10010 01001 1010 1010001001 000000000 S: 0.30234500E 05 0010001000010100 N: 64 R: 0 **PF(72): 0100110100 COST: 0.28578516E 03 CPU: 406.750 0100110100 0100110100 10l00010101 000000001 S: 0.41534500E 05 0010001000010100 N: 64 R: 0 **PF(73): 0101000000 COST: 0.20115356E 03 CPU: 414.999 01010000000 0110000000 1010101111 000000111 S: 0.22098500E 05 00100010000 10100 0101000000 0101000000 1010111011 000001001 S: 0.22098500E 05 0010001000010100 N: 128 R: 0 **PF(74): 0101000010 COST: 0.24186110E 03 CPU: 421.120 0101000010 0101000010 1010101101 000000100 S: 0.23454500E 05 0010001000010100 0101000010 0101000010 1010111001 000001000 S: 0,23454500E 05 0010001000010100 N: 96 R: 0 **PF(75): 0101000100 COST: 0.25538390E 03 CPU: 427.200 0101000100 0101000100 1010111011 000001001 S:. 0.34754500E 05 0010001000010100 N: 96 R: 0 **PF(76): 0101001000 COST: 0.22 1 79350E 03 CPU: 433.278 0101001000 0101001000 1010110011 000000001 S: 0.27522500E 05 0010001000010100 N: 96 P: O

421

**PF(77): 0101001010 COST: 0.26250049E 03 CPU: 437.294
0101001010 0101001010 1010110001 000000000 S: 0.28878500E 05 0010001000010100
N: 64 R: 0
**PF(78): 0101001100 COST: 0.26918359E 03 CPU: 441.349
0101001100 0101001100 1010110011 000000001 S: 0.40178500E 05 0010001000010100
N: 64 R: 0
**PF(79): 0101010000 COST: 0.23505910E 03 CPU: 447.451
0101010000 0101010000 1010101011 000000001 S: 0.27522500E 05 0010001000010100
N: 96 R: 0
**PF(80): 0101010010 COST: 0.27576636E 03 CPU: 451.474
0101010010 0101010010 1010101001 000000000 S: 0.28878500E 05 0010001000010100
N: 64 R: 0
**PF(81): 0101010100 COST: 0.28928882E 03 CPU: 455.490
0101010100 0101010100 1010101011 000000001 S: 0.40178500E 05 0010001000010100
N: 64 R: 0

PF(28) CONTAINS BEST SOLUTION(S).
7232 SOLUTIONS CONSIDERED.
MEAN COST OVER ALL SOLUTIONS: 0.28151123E 03
MAXIMUM COST ENCOUNTERED: 0.42425903E 03
********** SOLUTION TERMINATED **********

Appendix G
SOLUTION LISTING FOR CASE 3 OF THE MEDICAL DIAGNOSIS PROBLEM

********** SOLUTION BEGINS **********
**PF( 1): 0000000000 COST: 0.27245435E 03 CPU: 4.065
0000000000 0001010000 1110101111 110000111 S: 0.30686500E 05 0010001000010100
N: 64 R: 0
**PF( 2): 0000000010 COST: 0.31508911E 03 CPU: 8.025
0000000010 0001000010 1110101100 110000100 S: 0.15544500E 05 0010001000010100
N: 64 R: 0
**PF( 3): 0000000100 COST: 0.29853931E 03 CPU: 11.904
0000000100 0100000100 1011111010 001101000 S: 0.32042500E 05 0010001000010100
0000000100 0001000100 1110111010 110001000 S: 0.32042500E 05 0010001000010100
N: 64 R: 0
**PF( 4): 0000001000 COST: 0.26675464E 03 CPU: 17.696
0000001000 0000001100 1011110011 001100001 S: 0.33398500E 05 0010001000010100
0000001000 0000001100 1110110011 110000001 S: 0.33398500E 05 0010001000010100
N: 96 R: 0
422

423 **PF( 5): 0000001010 COST: 0.32318921E 03 CPU: 21.550 0000001010 0000001010 1011110000 001100000 S: 0. 18256500E 05 0010001 000010100 0000001010 0000001010 1110110000 110000000 S: 0.18256500E 05 0010001000010100 N: 64 R: 0 **PF( 6): 0000001100 COST: 0.31553931E 03 CPU: 25.466 0000001100 0000001100 1011110010 001100000 S: 0.34754500E 05 00 1000100001 0100 0000001100 0000001100 1110110010 110000000 S: 0.34754500E 05 0010001000010100 N: 64 R: 0 **PF( 7): 0000010000 COST: 0.27250854E 03 CPLI: 31.468 0000010000 0100010100 1011101011 001100001 S: 0.34754500E 05 001COO1000010100 0000010000 0001010100 1110101011 110000001 S: 0.34754500E 05 0010001000010100 N: 96 R: 0 **PF( 8): 0000010010 COST: 0.32894312E 03 CPU: 35.421 0000010010 0100010010 1011101000 001100000 S: 0.19612500E 05 0010001000010100 0000010010 0001010010 1110101000 110000000 S: 0.19612500E 05 0010001000010100 N: 64 R: 0 **PF( 9): 0000010100 COST: 0.32619287E 03 CPU: 39.374 0000010100 0100010100 1011101010 001100000 S: 0.36110500E 05 0010001000010100 0000010100 0001010100 1110101010 110000000 S: 0.36110500E 05 0010001000010100 N: 64 R: 0

424

**PF(10): 0000100000 COST: 0.26295435E 03 CPU: 47.330
0000100000 0000110000 1011001111 001000111 S: 0.33398500E 05 0010001000010100
0000100000 0000100100 1011011011 001001001 S: 0.33398500E 05 0010001000010100
0000100000 0000110000 1110001111 110000111 S: 0.33398500E 05 0010001000010100
0000100000 0000100100 1110011011 110001001 S: 0.33398500E 05 0010001000010100
N: 128 R: 0
**PF(11): 0000100010 COST: 0.31938916E 03 CPU: 53.276
0000100010 0000100010 1011001100 001000100 S: 0.18256500E 05 0010001000010100
0000100010 0000100010 1011011000 001001000 S: 0.18256500E 05 0010001000010100
0000100010 0000100010 1110001100 110000100 S: 0.18256500E 05 0010001000010100
0000100010 0000100010 1110011000 110001000 S: 0.18256500E 05 0010001000010100
N: 96 R: 0
**PF(12): 0000100100 COST: 0.31633936E 03 CPU: 59.187
0000100100 0000100100 1011011010 001001000 S: 0.34754500E 05 0010001000010100
0000100100 0000100100 1110011010 110001000 S: 0.34754500E 05 0010001000010100
N: 96 R: 0
**PF(13): 0000101000 COST: 0.28635449E 03 CPU: 65.039
0000101000 0000101100 1011010011 001000001 S: 0.37466500E 05 0010001000010100
0000101000 0000101100 1110010011 110000001 S: 0.37466500E 05 0010001000010100
N: 96 R: 0
**PF(14): 0000101010 COST: 0.34278906E 03 CPU: 68.942
0000101010 0000101010 1011010000 001000000 S: 0.22324500E 05 0010001000010100
0000101010 0000101010 1110010000 110000000 S: 0.22324500E 05 0010001000010100
N: 64 R: 0

425

**PF(15): 0000101100 COST: 0.33513940E 03 CPU: 72.880
0000101100 0000101100 1011010010 001000000 S: 0.38822500E 05 0010001000010100
0000101100 0000101100 1110010010 110000000 S: 0.38822500E 05 0010001000010100
N: 64 R: 0
**PF(16): 0000110000 COST: 0.29030835E 03 CPU: 78.796
0000110000 0000110100 1011001011 001000001 S: 0.37466500E 05 0010001000010100
0000110000 0000110100 1110001011 110000001 S: 0.37466500E 05 0010001000010100
N: 96 R: 0
**PF(17): 0000110010 COST: 0.34674292E 03 CPU: 82.705
0000110010 0000110010 1011001000 001000000 S: 0.22324500E 05 0010001000010100
0000110010 0000110010 1110001000 110000000 S: 0.22324500E 05 0010001000010100
N: 64 R: 0
**PF(18): 0000110100 COST: 0.34399292E 03 CPU: 86.640
0000110100 0000110100 1011001010 001000000 S: 0.38822500E 05 0010001000010100
0000110100 0000110100 1110001010 110000000 S: 0.38822500E 05 0010001000010100
N: 64 R: 0
**PF(19): 0001000000 COST: 0.25885449E 03 CPU: 98.333
0001000000 0001010000 1110101111 110000111 S: 0.32494500E 05 0010001000010100
0001000000 0001000100 1110111011 110001001 S: 0.32494500E 05 0010001000010100
N: 192 R: 0
**PF(20): 0001000010 COST: 0.31528906E 03 CPU: 106.930
0001000010 0001000010 1110101100 110000100 S: 0.17352500E 05 0010001000010100
0001000010 0001000010 1110111000 110001000 S: 0.17352500E 05 0010001000010100
N: 144 R: 0

426 **PF(21): 0001000100 COST: 0.3 1253931E 03 CPU: 115.413 OOOOO0001000100 0001000100 1110111010 110001000 S: 0*33850500E 05 0010001000010100 N: 144 R O0 **PF(22): 0001001000 COST: 0.28255469E 03 CPU: 123.930 0001001000 0001001100 1110110011 110000001 S: 0,36562500E 05 0010001000010100 N: 144 R: 0 **PF(23 ) 00010001010 COST: 0,33898901 E 03 CPU: 129.528 0001001010 0001001010 1110110000 110000000 S: 0,21420500E 05 0010001000010 100 N: 96 R: 0 **PF(24)* 0001001 100 COST: 0.33133936E 03 CPU: 135.240 0001001100 0001001100 1110110010 110000000 S: 0,37918500E 05 0010001000010 100 N: 96 R: 0 **PF(25): 0001010000 COST: 0,28650854E 03 CPU: 143.812 0001010000 0001010100 1110101011 110000001 S: 0*36562500E 05 0010001000010100 Nt 144 R: 0 **PF(26): 0001010010 COST: 0.342943 12E 03 CPU: 149.388 0001010010 0001010010 1110101000 110000000 S: 0.21420500E 05 0010001000010 100 N: 96 P: 0 **PF(27): 0001010100 COST: 0.34019287E 03 CPU: 155.100 0001010100 0001010100 1110101010 110000000 S: 0.37918500E 05 00100010000 10100 N: 96 R: 0

427 **PF(28): 0010000000 COST: 0.26295435E 03 CPU: 160.957 0010000000 0010010000 1001101111 000100111 S: 0.33398500E 05 0010001000010100 0010000000 0010000100 1001111011 000101001 S: 0.33398500E 05 00 10001000010100 N: 96 R: 0 **PF(29): 0010000010 COST: 0.3 1938916E 03 CPU: 165.814 0010000010 0010000010 1001101100 000100100 S: 0.18256500E 05 00 1000 10000101 00 0010000010 0010000010 1001111000 000101000 S: 0.18256500E 05 00100010000 10100 N. 80 R: 0 **PF(30): 0010000100 COST: O.3 1323926E 03 CPU: 170.606 0010000100 0011000100 110011101 0 100001000 S: 0.33398500E 05 0010001000010100 N: 80 P: 0 **kPF(3 1): 0010001000 COST: 0.28145459E 03 CPU: 176.327 0010001000 0010001100 1100110011 100000001 S: 0.34754500E 05 0010001000010100 N: 96 R: 0 **PF(32): 0010001010 COST: 0.33788892E 03 CPU: 180.070 0010001010 0010001010 1100110000 100000000 S: 0.19612500E 05 0010001000010100 N: 64 R: 0 **PF(33): 0010001100 COST: 0.33023926E 03 CPU: 183.S 949 0010001100 0010001100 1100110010 100000000 S: 0.36110500E 05 0010001 000010100 N: 64 R: 0 **PF(34): 0010010000 COST: 0.28720850E 03 CPU: 189.739 0010010000 0011010100 1100101011 100000001 S: 0.36110500E 05 0010001000010 100 N: 96 R. O

428

**PF(35): 0010010010 COST: 0.34364307E 03 CPU: 193.514
0010010010 0011010010 1100101000 100000000 S: 0.20968500E 05 0010001000010100
N: 64 R: 0
**PF(36): 0010010100 COST: 0.34089307E 03 CPU: 197.332
0010010100 0011010100 1100101010 100000000 S: 0.37466500E 05 0010001000010100
N: 64 R: 0
**PF(37): 0010100000 COST: 0.27765430E 03 CPU: 203.134
0010100000 0010110000 1100001111 100000111 S: 0.34754500E 05 0010001000010100
0010100000 0010100100 1100011011 100001001 S: 0.34754500E 05 0010001000010100
N: 96 R: 0
**PF(38): 0010100010 COST: 0.33408911E 03 CPU: 207.387
0010100010 0010100010 1100001100 100000100 S: 0.19612500E 05 0010001000010100
0010100010 0010100010 1100011000 100001000 S: 0.19612500E 05 0010001000010100
N: 72 R: 0
**PF(39): 0010100100 COST: 0.33103931E 03 CPU: 211.680
0010100100 0010100100 1100011010 100001000 S: 0.36110500E 05 0010001000010100
N: 72 R: 0
**PF(40): 0010101000 COST: 0.30105444E 03 CPU: 215.992
0010101000 0010101100 1100010011 100000001 S: 0.38822500E 05 0010001000010100
N: 72 R: 0
**PF(41): 0010101010 COST: 0.35748901E 03 CPU: 218.807
0010101010 0010101010 1100010000 100000000 S: 0.23680500E 05 0010001000010100
N: 48 R: 0

429 **PF(42): 0010101100 COST: 0.34983911E 03 CPU: 221.700 0010101100 0010101100000000 S: 0o40178500E 05 0010001000010 100 N: 48 R: 0 **PF(43): 00101 10000 COS T: 0.30500830E 03 CPU: 226.028 0010110000 0010110100 1100001011 100000001 S: 0.38822500E 05 0010001000010100 N: 72 R: 0 **PF(44): 00101 10010 COST: 0.3 61443 12E 03 CPU: 228.928 0010110010 0010110010 1100001000 100000000 S: 0.23680500E 05 001 0001000010100 N: 48 R: 0 **PF(45)9: 0010110100 COST: 0.35869287E 03 CPU: 23 1.757 0010110100 0010110100 1100001010 100000000 S: 0.40178500E 05 0010001000010100 N: 48 R: 0 **PF(46): 0011000000 COST: 0.27165430E 03 CPU: 239.698 0011000000 0011010000 1100101111 100000111 S: 0.33850500E 05 0010001000010100 0011000000 0011000100 1100111011 100001001 S: 0.33850500E 05 0010001000010100 N: 128 R: 0 **PF(47): 0011000010 COST: 0.3280891 1E 03 CPU: 245.550 0011000010 0011000010 1100101100 100000100 S: 0.18708500E 05 0010001000010100 0011000010 0011000010 1100111000 100001000 S: 0.18708500E 05 O0100010000 10100 N: 96 P: 0 **P F (48): 00 1 1000 100 COST: 0.32533911E 03 CPU: 251.314 0011000100 0011000100 1100111010 100001000 S: 0.35206500E 05 0010001000010100 N: 96 R: O

430 **PF(49): 0011001000 COST: 0.29535449E 03 CPU: 257.043 0011001000 0011001100 1100110011 100000001 S: 0,37918500E 05 0010001000010100 N: 96 R: 0 **PF(50): 001100 I 1010 COST: 0.3;5178906E 03 CPU: 260.826 0011001010 0011001010 1100110000 100000000 S: 0.22776500E 05 00100001000 10100 N: 64 R: 0 **PF(51): 0011001100 COST: 0.34413916E 03 CPU: 264.714 0011001100 0011001100 1100110010 100000000 S: 0.39274500E 05 0010001000010100 N: 64 R: 0 **PF(52): 0011010000 COST: 0.29930835E 03 CPlJ: 270.519 0011010000 0011010100 1100101011 100000001 S: 0,37918500E 05 0010001000010100 N: 96 R: 0 **PF(53): 0011010010 COST: 0.355743 16E 03 CPU: 274.326 0011010010 0011010010 1100101000 100000000 S: 0.22776500E 05 0010001000010100 N: 64 R: 0 **PF(54): 001 1010100 COST: 0.35299292E 03 CPU: 278.133 0011010100 0011010100 1100101010 100000000 S: 0.39274500E 05 0010001000010 100 N: 64 R: 0 **PF(55): 0100000000 COST: 0.25885449E 03 CPU: 287.964 0100000000 0100010000 1011101111 001100111 S: 0.32494500E 05 0010001000010100 0100000000 0100000100 1011111011 001101001 S: 0.32494500E 05 0010001000010100 N: 160 R: 0

431 **PF(56): 0100000010 COST: 03 1528906E 03 CPU: 295*706 0100000010 0100000010 1011101100 001100100 S: 0.17352500E 05 00 10001000010100 0100000010 0100000010 1011111000 001101000 S: 0.17352500E 05 0010001000010100 N: 128 P: 0 **PF( 5 7) 0100000 100 COST: 0.3 1083911E 03 CPU: 303.393 0100000100 0101000100 1010111010 000001000 S: 0.33398500E 05 0010001000010100 N' 128 R: 0 **PF(58): 0100001000 COST: 0.27905444E 03 CPU: 3 1 1.970 0100001000 0100001100 1010110011 000000001 S: 0.34754500E 05 0010001000010 100 N: 144 R: 0 **PF(59): 0100001010 COST: 0.33548901E 03 CPU: 3 17.647 0100001010 1010110000 000000000 S: 0.19612500E 05 0010001000010100 N: 96 R: 0 **PF(60): 0100001100 COST: 0.32783911E 03 CPU: 323.387 0100001100 0100001100 1010110010 000000000 S: 0.36110500E 05 0010001000010100 N: 96 P: 0 **PF(61): 0100010000 COST: 0.28480835E 03 CPU: 332.016 01O001O000 0101iOl100 101010i 11 000000001" S: 0.36110500E 05 0010001000010100 N: 144 R: 0 **PF(62): 0100010010 COST: 0.34124292E 03 CPU: 337*703 0100010010 0101010010 1010101000 000000000 S: 0.20968500E 05 0010001000010100 N: 96 R: O

432 **PF(63): 0100010100 COST: 0.33849316E 03 CPU: 343.408 0100010100 0101010100 1010101010 000000000 S: 0.37466500E 05 00100010000 10 100 N: 96 R: 0 **PF(64): 0100100000 COST: Oo27525439E 03 CPU: 351.120 010010000 0100110000 101000111 0000111 S: 0.34754500F 05 0010001 000010100 0100100000 0100100100 1010011011 000001001 S: 0.34754500E 05 0010001000010100 N: 128 R: 0 **PF(65): 0100100010 COST: 0.33 168921E 03 CPU: 356.826 0100100010 0100100010 1010001100 000000100 S: 0.19612500E 05 0010001000010100 0100100010 0100100010 1010011000 000001000 S: 0.19612500E 05 0010001000010100 N: 96 RP 0 **PF(66): 0100100100 COST: 0.3286391 6E 03 CPU: 362.515 0100100100 0100100100 1010011010 000001000 S: 0*36110500E 05 0010001000010 100 N: 96 R: 0 **PF(67): 0100101000 COST: 0.29865454E 03 CPU: 368.200 0100101000 0100101100 1010010011 000000001 S: 0.38822500E 05 0010001000010100 N: 96 R: 0 **PF(68): 0100101010 COST: 0.35508911E 03 CPU: 371.955 0100101010 0100101010 1010010000 000000000 S: 0.23680500E 05 0010001000010100 N: 64 R: 0 **PF(69): 0100101100 COST: 0.34743896E 03 CPU: 375.765 0100101100 0100101100 1010010010 000000000 S: 0.40178500E 05 0010001000010100 N: 64 P: O

433 **PF(70)): 0100110000 COST: 0.30260840E 03 CPU: 381.511 0100110000 0100110100 1010001011 000000001 S: 0.38822500E.05 0010001000010100 N: 96 R: 0 **PF(71): 0100110010 COST: 0.35904321 E 03 CPU: 385.298 0100l1 000 0100110010 1010001000 000000000 S: 0.23680500E 05 0010001000010100 N: 64 R: 0 **-PF(72): 0100110100 COST: 0.35629297E 03 CPU: 389.080 0100110100 0100110100 1010001010 000000000 S: 0.40178500E 05 0010001000010 100 N: 64 R: 0 **PF(73): 0101000000 COST: 0.27115430E 03 CPU: 396.785 0101000000 0101010000 1010101111 00000111 S: 0.33850500E 05 0010001000010100 0101000000 0101000100 1010111011 000001001 S: 0.33850500E 05 001000 1000010100 No: 128 R: 0 **PF(74): 0101000010 COST: 0.327589 1 1IE 03 CPU: 402.623 0101000010 0101000010 1010101100 000000100 S* 0.18708500E 05 0010001000010100 0101000010 0101000010 1010111000 000001000 S: 0.18708500E 05 001 0001 000010100 N: 96 R: 0 **PF(75): 0101000100 COST: 0.32483911E 03 CPU: 408.3 15 0101000100 0101000100 1010111010 000001000 S: 0.35206500E 05 0010001000010100 N: 96 R: 0 **PF(76): 0101001000 COST: 0.29485449E 03 CPU: 413,986 0101001000 0101001100 1010110011 000000001 S: 0.$7918500E 05 0010001000010100 N: 96 R: O

434

**PF(77): 0101001010 COST: 0.35128906E 03 CPU: 417.735
0101001010 0101001010 1010110000 000000000 S: 0.22776500E 05 0010001000010100
N: 64 R: 0
**PF(78): 0101001100 COST: 0.34363892E 03 CPU: 421.520
0101001100 0101001100 1010110010 000000000 S: 0.39274500E 05 0010001000010100
N: 64 R: 0
**PF(79): 0101010000 COST: 0.29880811E 03 CPU: 427.219
0101010000 0101010100 1010101011 000000001 S: 0.37918500E 05 0010001000010100
N: 96 R: 0
**PF(80): 0101010010 COST: 0.35524292E 03 CPU: 430.977
0101010010 0101010010 1010101000 000000000 S: 0.22776500E 05 0010001000010100
N: 64 R: 0
**PF(81): 0101010100 COST: 0.35249316E 03 CPU: 434.735
0101010100 0101010100 1010101010 000000000 S: 0.39274500E 05 0010001000010100
N: 64 R: 0

PF(55) CONTAINS BEST SOLUTION(S).
7232 SOLUTIONS CONSIDERED.
MEAN COST OVER ALL SOLUTIONS: 0.35939697E 03
MAXIMUM COST ENCOUNTERED: 0.58598242E 03
********** SOLUTION TERMINATED **********

Appendix H
SOLUTION LISTING FOR PRIMITIVE OPERATION Q7 IN THE MEDICAL DIAGNOSIS PROBLEM

********** SOLUTION BEGINS **********
**PF( 1): 0000000000 COST: 0.31459985E 03 CPU: 2.052
0000000000 0100000000 1011111111 001101111 S: 0.19160500E 05 0000001000000000
0000000000 0001000000 1110111111 110001111 S: 0.19160500E 05 0000001000000000
N: 64 R: 0
**PF( 2): 0000000010 COST: 0.38659985E 03 CPU: 4.135
0000000010 0100000010 1011111101 001101100 S: 0.19520500E 05 0000001000000000
0000000010 0001000010 1110111101 110001100 S: 0.19520500E 05 0000001000000000
N: 64 R: 0
**PF( 3): 0000000100 COST: 0.37699976E 03 CPU: 6.150
0000000100 0100000100 1011111011 001101001 S: 0.20360500E 05 0000001000000000
0000000100 0001000100 1110111011 110001001 S: 0.20360500E 05 0000001000000000
N: 64 R: 0
**PF( 4): 0000001000 COST: 0.32059985E 03 CPU: 9.037
0000001000 0100001000 1011110011 001100001 S: 0.15560500E 05 0000001000000000
0000001000 0001001000 1110110011 110000001 S: 0.15560500E 05 0000001000000000
N: 96 R: 0
435

436

**PF( 5): 0000001010 COST: 0.39019995E 03 CPU: 11.008
0000001010 0100001010 1011110001 001100000 S: 0.15920500E 05 0000001000000000
0000001010 0001001010 1110110001 110000000 S: 0.15920500E 05 0000001000000000
N: 64 R: 0
**PF( 6): 0000001100 COST: 0.40099976E 03 CPU: 13.001
0000001100 0100001100 1011110011 001100001 S: 0.23960500E 05 0000001000000000
0000001100 0001001100 1110110011 110000001 S: 0.23960500E 05 0000001000000000
N: 64 R: 0
**PF( 7): 0000010000 COST: 0.34219971E 03 CPU: 15.927
0000010000 0100010000 1011101011 001100001 S: 0.15560500E 05 0000001000000000
0000010000 0001010000 1110101011 110000001 S: 0.15560500E 05 0000001000000000
N: 96 R: 0
**PF( 8): 0000010010 COST: 0.41179980E 03 CPU: 17.921
0000010010 0100010010 1011101001 001100000 S: 0.15920500E 05 0000001000000000
0000010010 0001010010 1110101001 110000000 S: 0.15920500E 05 0000001000000000
N: 64 R: 0
**PF( 9): 0000010100 COST: 0.43459985E 03 CPU: 19.859
0000010100 0100010100 1011100011 001100001 S: 0.23960500E 05 0000001000000000
0000010100 0100010100 1011101011 001100001 S: 0.23960500E 05 0000001000000000
0000010100 0001010100 1110100011 110000001 S: 0.23960500E 05 0000001000000000
0000010100 0001010100 1110101011 110000001 S: 0.23960500E 05 0000001000000000
N: 64 R: 0

437

**PF(10): 0000100000 COST: 0.30859985E 03 CPU: 23.736
0000100000 0100100000 1011001111 001000111 S: 0.15560500E 05 0000001000000000
0000100000 0100100000 1011011011 001001001 S: 0.15560500E 05 0000001000000000
0000100000 0001100000 1110001111 110000111 S: 0.15560500E 05 0000001000000000
0000100000 0001100000 1110011011 110001001 S: 0.15560500E 05 0000001000000000
N: 128 R: 0
**PF(11): 0000100010 COST: 0.37819971E 03 CPU: 26.644
0000100010 0100100010 1011001101 001000100 S: 0.15920500E 05 0000001000000000
0000100010 0100100010 1011011001 001001000 S: 0.15920500E 05 0000001000000000
0000100010 0001100010 1110001101 110000100 S: 0.15920500E 05 0000001000000000
0000100010 0001100010 1110011001 110001000 S: 0.15920500E 05 0000001000000000
N: 96 R: 0
**PF(12): 0000100100 COST: 0.40099976E 03 CPU: 29.525
0000100100 0100100100 1011010011 001000001 S: 0.23960500E 05 0000001000000000
0000100100 0100100100 1011011011 001001001 S: 0.23960500E 05 0000001000000000
0000100100 0001100100 1110010011 110000001 S: 0.23960500E 05 0000001000000000
0000100100 0001100100 1110011011 110001001 S: 0.23960500E 05 0000001000000000
N: 96 R: 0
**PF(13): 0000101000 COST: 0.34459985E 03 CPU: 32.450
0000101000 0100101000 1011010011 001000001 S: 0.19160500E 05 0000001000000000
0000101000 0001101000 1110010011 110000001 S: 0.19160500E 05 0000001000000000
N: 96 R: 0

438 **PF(14): 0000101010 COST: 0.41419995E 03 CPU: 34.404 0000101010 0100101010 1011010001 001000000 S: 0.19520500E 05 0000000000000000 000 101010 0001101010 1110010001 110000000 S: 0o19520500E 05 00000000000000 N: 64 R: 0 **PF(15): 0000101100 COST: 0.42499976E 03 CPU: 36.357 000010100 0100101100 0100101100 1011010011 001000001 S: 0,27560500E 05 000000 1000000000 0000101100 0001101100 1110010011 110000001 S: 0.27560500E 05 0000001000000000 N: 64 R: 0 **PF( 16): 00001 10000 COST: 0.36619971E 03 CPU: 39.190 0000110000 0100110000 1011001011 001000001 S: 0*19160500F 05 0000001000000000 0000110000 0001110000 1110001011 110000001 S: 0.19160500E 05 000000 1000000000 N: 96 R: 0 **PF(17): 0000110010 COST: 0.43579980E 03 CPU: 41.140 0000110010 0100110010 1011001001 001000000 S: 0.19520500E 05 0000001000000000 0000110010 0001110010 1110001001 110000000 S: 0.19520500E 05 0000001000000000 N: 64 R: 0 **PF(18): 0000110100 COST: 0.45859985E 03 CPU: 43.082 0000110100 0100110100 1011000011 001000001 S: 0.27560500E 05 0000001000000000 0000110100 0100110100 1011001011 001000001 S: 0.27560500E 05 0000001000000000 0000110100 0001110100 1110000011 110000001 S: 0.27560500E 05 0000001000000000 0000110100 0001110100 1110001011 110000001 S: 0.27560500E 05 0000001000000000 N: 64 R: 0

439 **PF(19): 0001000000 COST: 0.31259985E 03 CPU: 48.855 0001000000 0001000000 0110001111 010000111 S: 0.13360500E 05 0000001000000000 0001000000 0001000000 0110011011 010001001 S: 0O13360500E 05 0000001000000000 0001000000 0001000000 01.10101111 010000111 S: 0.13360500E 05 0000001000000000 0001000000 0001000000 0110111011 010001001 S: 0.13360500E 05 0000001000000000 0001000000 0001000000 1100001111 100000111 S: 0.13360500E 05 0000001000000000 0001000000 0001000000 1100011011 100001001 S: 0.13360500E 05 0000001000000000 0001000000 0001000000 1100101111 100000111 S: 0.13360500E 05 0000001000000000 0001000000 0001000000 1100111011 100001001 S: 0.13360500E 05 0000001000000000 0001000000 0001000000 1110001111 110000111 S: 0.13360500E 05 0000001000000000 0001000000 0001000000 1110011011 110001001 S: 0.13360500E 05 0000001000000000 0001000000 0001000000 1110101111 110000111 S: 0.13360500E 05 000000 1000000000 0001000000 0001000000 1110111011 110001001 S: 0.13360500E 05 0000001000000000 N: 192 R: 0

440 **PF(20): 0001000010 COST: 0.38219971E 03 CPU; 535.397 0001000010 0001000010 0110001101 010000100 S: 0. 13720500E 05 0000001000000000 0001000010 0001000010 011001101 010001000 S:o 0o 13720500E 05 0000010000000000 0001000010 0001000010 0110101101 010000100 S: 0o13720500E 05 0000001000000000 0001000010 0001000010 0110111001 010001000 S: 0.13720500E 05 00000100 0000000 0001000010 0001000010 1100001101 100000100 S: 0.13720500E 05 0000001000000000 0001000010 0001000010 1100011001 100001000 S: 0.13720500E 05 0000001000000000 0001000010 0001000010 1100101101 100000100 S: 0*13720500E 05 0000001000000000 0001000010 0001000010 1100111001 100001000 S: 0.13720500E 05 0000001000000000 0001000010 0001000010 1110001101 110000100 S: 0.1372050CE 05 0000001000000000 0001000010 0001000010 1110011001 110001000 S: 0.13720500E 05 0000001000000000 0001000010 0001000010 1110101101 11 0000100 S: 0.13720500E 05 0000001000000000 0001000010 0001000010 1110111001 110001000 S: 0.13720500E 05 0000001000000000 N: 144 R: 0

441 **PF(2 1): 0001000100 COST: 0.40499976E 03 CPU: 57.8 51 0001000100 0001000100 0110010011 010000001 S: 0.21760500E 05 0000001000000000 0001000100 0001000100 0110011011 010001001 S: 0.21760500E 05 0000001000000000 0001000100 0001000100 0110110011 010000001 S 0.21760500E 05 0000001000000000 0001000100 0001000100 0110111011 010001001 S: 0.21760500E 05 0000001000000000 0001000100 0001000100 1100010011 100000001 S: 0.21760500E 05 000000 1000000000 0001000100 0001000100 1100011011 100001001 S: 0.21760500E 05 0000001000000000 0001000100 0001000100 1100110011 100000001 S: 0.21760500E 05 0000001000000000 0001000100 0001000100 1100111011 100001001 S: 0.21760500E 05 0000001000000000 0001000100 0001000100 1110010011 110000001 S: 0.21760500E 05 0000001000000000 0001000100 0001000100 1110011011 110001001 S: 0.21760500E 05 0000001 000000000 0001000100 0001000100 1110110011 110000001 S: 0.21760500E 05 0000001000000000 0001000100 0001 00100 1110111011 110001001 S: 0.21760500E 05 0000001000000000 N: 144 R: 0 **PF(22): 0001001000 COST: 0.34859985E 03 CPU: 62.324 0001001000 0001001000 0110010011 0100000011001001 10000001 S 0.16960500E 05 0000001000000000 0001001000 0001001000 0110110011 010000001 S: 0.1 69 60500E 05 0000001000000000 000 1 00 1000 0001001000 1100010011 100000001 S: 0.1 6960500E 05 0000001 000000000 0001001000 0001001000 110011000011 100000001 S: 0.16960500E 05 0000001000000000 0001001000 0001001000 1110010011 110000001 S: 0.16960500E 05 0000001 000000000 0001001000 0001001000 1110110011 110000001 S: 0.16960500E 05 000000 1000000000 N: 144 R: 0

442 **PF(23): 0001001010 COST: 0.41819995E 03 CPU: 65.301 0001001010 0001001010 001000ICOO 010000000 S: 0o173205CCE 05 000000100000000 0001001010 0001001010 0110110001 010000000 St 0,173205 0C 05 0000001000000000 0001001010 0001010 10 1100010001 100000000 St: 0.17320500E 05 0000001000000000 000100110 0010 0 01010 1100110001 100000000 S: 0.17320500(E 05 0000001000000000 0001001010 000100 101O0 1 10010001 I 110000000 S: 0. 17320500E 05 0000001 000000000 0001001010 0001001010 1110110001 110000000 S: 0,17320500E 05 0000001 000000000 N: 96 R: 0 **PF(24): 0001001 100 COST: 0.42899976E 03 CPU: 68.261 0001001100 0001001100 0110010011 010000001 S: 0.25360500E 05 0000001000000000 0001001100 0001001100 0110110011 0100001 S: 0.2-5360500E 05 0000001000000000 0001001100 0001001100 1100010011 100000001 S: 0.25360500E 05 0000001000000000 0001001100 OOOiOOll10 1100110011 100000001 S: 0.25360500E 05 0000001000000000 0001001100 0001001100 1110010011 110000001 S: 0.2536050(0F 05 0000001000000000 0001001100 0001001100 1110110011 110000001 S: 0.25360500E C05 0000001000000000 N: 96 R: 0 **PF(25): 0001010000 COST: 03 7019971E 03 CPU: 72.560 0001010000 000101 0000 0110001011 010000001 S: 0.16960500E C5 0000001000000000 0001010000 000l101000l 0110101011 010000001 S: 0.16960500CE CS5 000000 1000000000 0001010000 0001010000 1100001011 100000001 St 0.16960500E 05 0000001000000000 0001010000 0001010000 1100101011 100000001 St 0.16960500E 05 0001010000 0001010000 1110001011 1100000101 I 5: 0.16960500QE 05 0000001000000000 0001010000 0001010000 1110101011 110000001 S: 0.169605CC00E 05 0000001000000000 N: 144 R: 0

443 **PF(26): 0001010010 COST: 0.43979980E 03 CPU: 75.517 0001010010 0001010010 0110001001 010000000 S: 0.17320500E 05 0000001000000000 0001010010 0001010010 0110101001 010000000 S: 0.17320500E 05 0000001000000000 0001010010 0001010010 1100001001 100000000 S: 0.17320500E 05 0000001000000000 0001010010 0OO1010010 1100101001 100000000 S: 0.17320500E 05 000000 1000OOO000 0OC0010110 0001010010 1110001001 110000000 S: 0.1732C500E 05 0000001000000000CO 00O010101 0001010010 1110101001 110000000 S: 0.17320500E 05 00o00010000 0000 N: 96 P: 0 **PF(27): 0001010100 COST: 0.46-259985E 03 CPU: 78.462 001010100 0001010100 0110000011 010000001 S: 0.2536050CE 05 0000001000000000 0001010100 0001010100 0110001011 010000001 S: 0.25360500F 05 0000001 000000000 0001010100 0001010100 0110100011 010000001 S: 0.25360500E 05 0000001000000000 0001010100 00C1010100 0110101011 010000001 S: 0.25360500E 05 000 0001000000000 0001010100 0001010100 1100000011 100000001 S: 0.25360500E 05 o000001000000000 0001010100 0001010100 1100001011 100000001 5: 0.25360500E 05 0000001000000000 0001C10100 0001010100 1100100011 100000001 S: 0.25360500E 05 0000001000000000 00C1010100 0001010100 1100101011 100000001 S: 0.25360500E 05 OOOCCO 100C000000 0001010100 0001010100 1110000011 110000001 S: 0.25360500E C5 0000001000000000 0001010100 0001010100 1110001011 110000001 S: 0.25360500E 05 C00000100000COO0 0001010100 0001010100 1110100011 110000001 S: 0.25360500E 05 0000001000000000 000101010( 0001010100 1110101011 110000001 S: 0.25360500E 05 0000001000000000 N: 96 R 0 **P F (2 ): 0010000000 COST: 0.308599q5E 03 CPU: 81.600 0010000000.0110000000 1001101111 000100111 S: 0.15560500E 05 0000001000000000 O010000000 0110000000 1001111011 000101001 S: 0.15560500E 05 0000001 000000000 N: 96 P: C

444 **PF(29): 0010000010 COST: 0.37819971E 03 CPU: 84.049 0000O00010 0110000010 1001101101 000100100 S: 0.15920500F;05 0000001000000000 0010000010 0110000010 1001111001 000101000 S: 0.15920500F 05 0000001000000000 N: 80 F:.0 **PF(30): 0010000100 COST: 0.38299976E 03 CPU: 86.394 0010000100 0011000100 0100111011 000001001 S: 0.20960500E 05 0000001 000000000 0010000100 0011000100 1100111011 100001001 S: 0,2096C500E 05 00000 1000000000 N: 80 R: 0 **PF(3 1): 001000 1000 COST: 0.32659985E 03 CPU: 89.261 0010001000 0011001000 0100110011 0000000001 S: 0.16160500E 05 0000001000000 000000000 0010001000 0011001000 1100110011 100000001 S: 0.16160500E 05 0000001000000000 N: 96 R: 0 **~PF(32): 0010001010 COST: 0.39619995E 03 CPU: 91.211 0010001010 0011001010 0100110001 000OOOOOC S: 0.16520500E 05 0000001000000000 0010001010 0011001010 1 100 10001 100000000 S: 0.16520500E C5 000000 1000000000 N: 64 R: 0 **PF(33): 0010001100 COST: 0.40699976E 03 CPU: 93.119 0010001100 0011001100 0100110011 000000001 S: 0.24560500E 05 0000001000000000 0010001100 0011001100 1100110011 100000001 S: 0.24560500E 05 0000001000000000 N: 64 R: 0 **PF(34): 0010010000 COST: 0.34819971E 03 CPU: 95.969 0010010000 0011010000 0100101011 000000001 S: 0.16160500E 05 0000001000000000 0010010000 0011010000 1100010101 100000001 S: 0.16160500E 05 000000100000000 N: 96 RF: O

445 **PF(35): 0010010010 COST: 0.41779980E 03 CPU: 97.9 1 5 0010010010 OC11010010 0100101001 000000000 S: 0.16520500E 05 0000001000000000 0010010010 0011010010 1100101001 100000000 S: 0.16520500E 05 0000001000000000 N: 64 P: 0 **PF(36): 0010010100 COST: 0.44059985E 03 CPU: 99.839 0010010100 0011010100 0100100011 000000001 S: 0.24560500E 05 0000001 000000000 0010010100 011010100 0100o101011 000000001 S: 0.24560500E 05 0000001000000000 0010010100 0011010100 1100100011 100000001 S: 0.24560500E 05 0000001000000000 0010010100 0011010100 1100101011 100000001 S: 0.24560500E 05 0000001000000000 N: 64 R: 0 **PF(37): 0010100000 COST: 0.3 1459985E 03 CPU: 102.694 0010100000 0011100000 0100001111 000000111 S: 0.16160500E 05 0000001000000000 0010100000 0011100000 0100011011 000001001 S: 0.16160500E 05 0000001000000000 0010100000 0011100000 1100001111 100000111 S: 0.16160500E 05 0000001000000000 0010100000 0011100000 1100011011 100001001 S: 0.16160500E 05 0000001000000000 N: 96 P: 0 **PF(38): 0010100010 COST: 0.38419971E 03 CPU: 104.879 0010100010 0011100010 0100001101 000000100 S: 0.16520500E 05 0000001000000000 0010100010 0011100010 0100011001 000001000 S: 0.16520500E 05 0000001000000000 0010100010 0011100010 1100001101 100000100 S: 0.16520500E 05 0000001000000000 0010100010 0011100010 1100011001 100001000 S: 0.16520500E 05 0000001000000000 N' 72 R: 0

446 k**PF(39): 0010100100 COST: 0.40699976E 03 CPU: 107.052 001010000 0011100100 0100010011 000000001 S: 0.24560500E 05 0000001000000000 0010100100 0011100100 0100011011 000001001 S: 0,24560500E 05 0000001000000000 0010100100 0011100100 1100010011 100000001 S: 0.24560500E 05 000000 1000000000 0010100100 0011100100 1100011011 100001001 S: 0.24560500E 05 0000001000000000 N: 72 R: 0 **PF(40): 0010101000 COST: 0.35059985E 03 CPU: 109.207 0010101000 0011101000 0100010011 000000001 S: 0.19760500E 05 0000001000000000 0010101000 0011101000 1100010011 100000001 S: 0.19760500E 05 0000001000000000 N: 72 R: 0 **PF(41): 00 10101010 COST: 0.42019995E 03 CPU: 110.650 0010101010 0011101010 0100010001 000000000 S: 0.20120500E 05 0000001000000000 0010101010 0011101010 1100010001 100000000 S: 0.20120500E 05 0000001000000000 1N: 48 R: 0 **PF(42): 0010101 100 COST: 0.43099976E 03 CPU: 11211 5 0010101100 0011101100 0100010011 000000001 S: 0.28160500E 05 0000001000000000 0010101100 0011101100 1100010011 100000001 S: 0.28160500E 05 0000001000000000 N: 48 R: 0 **PF(43): 0010110000 COST: 0*37219971E 03 CPU: 114.256 0010110000 00111100000 0100001011 000000001 S: 0.19760500E 05 0000001000000000 0010110000 0011110000 1100001011 100000001 S: 0.19760500E 05 0000001000000000 N: 72 F: 0

447 **PF(44): 0010110010 COST: 0.441 79980E 03 CPU: 1-15.698 0010110010 0011110010 0100001001 000000000 S: 0.20120500E 05 0000001000000000 0010110010 0011110010 1100001001 100000000 S: 0.20120500E 05 0000001000000000 N: 48 R: 0 **PF(45): 0010110100 COST: 0.46459985E 03 CPU: 117.153 0010110100 0011110100 0100000011 000000001 S: 0.28160500E 05 000000 1000000000 0010110100 0011110100 0100001011 000000001 S: 0.28160500E 05 0000001000000000 0010110100 0011110100 1100000011 100000001 S: 0.28160500E 05 0000001000000000 0010110100 0011110100 1100001011 100000001 S: 0.28160500E 05 0000001000000000 N: 48 R: 0 **PF(46): 0011000000 COST: 0.3 1659985E 03 CPU: 121.024 0011000000 0011000000 0100001111 000000111 S: 0. 13960500E 05 0000001000000000 0011000000 0011000000 0100011011 000001001 S: 0.13960500E 05 0000001000000000 0011000000 0011000000 0100101111 000000111 S: 0.13960500E 05 0000001000000000 0011000000 0011000000 0100111011 000001001 S: 0.13960500E 05 0000001000000000 0011000000 0011000000 1100001111 100000111 S: 0.13960500E 05 0000001000000000 0011000000 0011000000 1100011011 100001001 S: 0.13960500E 05 0000001000000000 0011000000 0011000000 1100101111 100000111 S: 0.13960500E 05 0000001000000000 0011000000 0011000000 1100111011 100001001 S: 0*13960500E 05 0000001000000000 N: 128 R: 0

448 **PF(47): 0011000010 COST: 0.38619971E 03 CPU: 124.056 0011000010 0011000010 0100001101 000000100 S: 0o14320500E 05 0000001 00000000 0011000010 0011000010 010011001: 0000000 S,142050CE 05 0000001000000000 0011000010 0011000010 0100101101 000000100 S: 0.14320500E 05 0000001000000000 0011000010 0001 0 0100111001 000001000 S: 0.14320500E 05 0000001000000000 0011000010 0011000010 1100001101 100000100 S: 0.1432050CE 05 000000 1000000000 0011000010 0011000010 110001 1001 100001000 S: 0.14320500E 05 0000001000000000 0011000010 0011000010 1100101101 100000100 S: 0.14320500E 05 0000001000000000 0011000010 0011000010 100111001 100001000 S: 0.143205OF 05 0000001000000000 N: 96 R: 0 **PF(48): 0011000100 COST: 0.40899976E 03 CPLI: 127*033 0011000100 00 1000100 01000 1 0011 000000001 S: 0.22360500E 05 0000001000000000 0011000100 00t1000100 0100011011 000001001 S: 0.22360500E 05 0000001000000000 0011000100 0011000100 0100110011 000000001 S: 0,.22360500E 05 0000001000000000 0011000100 0011000100 0100111011 000001001 S: 0o2236050F C05 0O000001000000000 0011000100 0011000100 1100010011 100000001 S: 0.22360500E 05 0000001000000000 0011000100 0011000100 1100011011 100001001 S: 0.22360500E 05 0000001000000000 0011000100 0011000100 1100110011 100000001 S: 0,22360500E 05 0000001000000000 0011000100 0011000100 1100111011 100001001 S: 0,22360500E 05 0000001000000000 N: 96 R: 0 **PF(49): 0011001000 COST: 0.35259985E 03 CPU: 129.972 0011001000 0011001000 0100010011 000000001 S: 0.1756050CE 05 000001000000000 0011001000 0011001000 0100110011 000000001 S: 0.17560500E 05 0000001000000000 0011001000 0011001000 110010011 100000001 S: 0O.17560500E 05 0000010000000 Ol l0 00 11 0 001101000 1100110011 100000001 S: 0.17560500E 05 0000001000000000 N: 96 P: 0

449 **PF(50): 001 1001010 COST: 0.42219995E 03 CPU: 13 1.965 0011001010 0011001010 0100010001 000000000 S: 0.17920500E 05 0000001000000000 0011001010 0011001010 0100110001 000000000 S: 0.17920500E 05 000000 1000000000 0011001010 0011001010 1100010001 100000000 S: 0.17920500E 05 0000001000000000 C011001010 0011001010 1100110001 100000000 S: 0.17920500E 05 000000 1000000000 N: 64 R: 0 **PF(51): 0011001100 COST: 0.43299976E 03 CPU: 133.921 0011001100 0011001100 0100010011 000000001 S: 0.25960500E 05 0000001000000000 0011001100 0011001100 0100110011 000000001 S: 0.25960500E 05 0000001 000000000 0011001100 0011001100 1100010011 100000001 S: 0.25960500E 05 0000001000000000 0011001100 0.011001100 1100110011 100000001 S: 0.25960500E 05 0000001000000000 N: 64 R: 0 **PF(52): 0011010000 COST: 0.37419971E 03 CPU: 136.840 001 1010000 0011010000 0100001011 000000001 S: 0.17560500E 05 0000001000000000 0011010000 0011010000 0100101011 000000001 S: 0.17560500E 05 0000001000000000 0011010000 0011010000 1100001011 100000001 S: 0.17560500E 05 0000001000000000 0011010000 0011010000 1100101011 100000001 S: 0.17560500E 05 0000001000000000 N: 96 R: 0 **PF(53): 0011010010 COST: 0.44379980E 03 CPU: 138.822 001 1010010 0011010010 0100001001 000000000 S: 0.17920500E 05 000000 1000000000 0011010010 0011010010 0100101001 000000000 S: 0.17920500E 05 0000001000000000 0011010010 0011010010 1100001001 100000000 S: 0.17920500E 05 0000001000000000 0011010010 0011010010 1100101001 100000000 S: 0.17920500E 05 0000001000000000 Nl: 64 R: O

450 **PF(54): 0011010100 COST: 0.46659985E 03 CPU: 1400.818 0011010100 0011010100 0100000011 000000001 S: 0.25960500F 05 000000 000000000 0011010100 0011010100 0100001011 0000001 S: 0.25960500E 05 0000001000000000 0011010100 0011010100 010000011OO 000000001 S: 0.25z960500E 05 00000010000000000 0011010100 0011010100 0100101011 000000001 S: 0.25960500E 05 0000001000000000 0011010100 0011010100 1100I 000011 I00000001 S: 0,S2596050F 05 0000001000000000 0011010100 0011010100 1100001011 100000001 S: 0.25960500E 05 0000001000000000 0011010100 0011010100 1100100011 100000001 S: 0.25960500E 05 0000001000000000 0011010100 0011010100 1100101011 100000Q001 S: 0.25960500E 05 0000001000000000 N: 64 R: 0 **PF(55): 0100000000 COST: 0,3 1259985E 03 CPU: 145.736 0100000000 0100000000 0001101111 000100111 S: 0.13360500E 05 0000001000000000 0100000000 0100000000 0001111011 000101001 S: 0.13360500E 05 0000001000000000 0100000000 0100000000 0011001111 001000111 S: 0.13360500E 05 000000 1000000000 010000000000 0100000000 0011011011 001001001 S: 0.13360500E 05 0000001 000000000 0100000000 0100000000 0011101111 00110011 1 S: 0o13360500E 05 0000001000000000 0100000000 0100000000 0011111011 001101001 S: 0.13360500E 05 0000001000000000 0100000000 0100000000 1001101111 00010011 I S 0.13360500E 05 0000001000000000 0100000000 0100000000 100111 1011 000101001 S: 0.13360500E 05 0000001000000000 0100000000 0100000000 1011001111 001000111 S: 0.13360500E 05 0000001000000000 0100000000 0100000000 1011011011 001001001 S: 0.13360500E 05 0000001000000000 0100000000 0100000000 1011101111 001100111 S: 0.13360500E 05 0000001000000000 0100000000 0100000000 1011111011 001101001 S: 0.13360500E 05 0000001000000000 N: 160 R: 0

451 **PF(56): 01000000010 COST: 0.38219971E 03 CPU: 149,804 0100000010 0100000010 0001101101 000100100 S: 0.13720500E 05 0000001000000000 0100000010 0100000010 0001111001 000101000 S: 0.13720500E 05 0000001000000000 0100000010 0100000010 0011001101 001000100 S: 0.13720500E 05 0000001000000000 0100000010 0100000010 0011011001 001001000 S: 0.13720500E 05 000000 1000000000 0100000010 0100000010 0011101101 001100100 S: 0.13720500E 05 0000001000000000 0100000010 0100000010 0011111001 001101000 S: 0.13720500E 05 0000001000000000 0100000010 0100000010 1001101101 OOO100100 S: 0.13720500E 05 0000001000000000 0100000010 0100000010 1001111001 000101000 S: 0.13720500E 05 0000001000000000 0100000010 0100000010 1011001101 001000100 S: 0.13720500E 05 0000001 000000000 0100000010 0100000010 1011011001 001001000 S: 0.13720500E 05 0000001000000000 0100000010 0100000010 1011101101 001100100 S: 0.13720500E 05 000000 1000000000 0100000010 0100000010 1011111001 001101000 S: 0.13720500E 05 000000 1000000000 N: 128 R: 0 **PF(57): 0100000100 COST: 0.38699976E 03 CPU: 153.735 0100000100 0101000100 0010111011 000001001 S: 0.20960500E 05 0000001000000000 0100000100 0101000100 1010111011 000001001 S: 0.20960500E 05 0000001000000000 N: 128 F: 0 **PF(58): 0100001000 COST: 0.3305S985E 03 CPU: 157.969 0100001000 0101001000 0010110011 000000001 S: O.16160500E 05 0000001000000000 0100001000 0101001000 1010110011 000000001 S: 0.16160500F 05 000000 1000000000 N: 1 4 FP: 0

452 ** PF(59): 0100001010 COST: 0,400199957 0.5 CPU- 160.807 0100001010 0101001010 0010110001 000000000 S: 0.165205COE C5 000000 0000000 0100001010 0101001010 1010110001 000000000 S: 0.16520 500F 05 000000 1000000000 N: 96 R: 0 **PF(60): 01000C1100 COST: 0.41099976E 03 CPU: 163.644 0100001100 0101001100 0010110011 000000001 S: 0 24560500F 05 0000001 000000000 0100001100 0101001100 1010110011 000000001 S: 0,24560500CE 05 000000 1000000000 N: 96 R: 0 **PF(61): 0100010000 COST: 0.35219971E 053 CPU: 167.862 0100010000 0101010000 0010101011 0000000001 S: 016160500E 05 0000001000000000 0100010000 0101010000 1010101011 000000001 S: 0.16160500E 05 0000001000000000 N: 144 R: 0 **PF(62): 0100010010 COST: 0.42179980E 03 CPU: 170.689 0100010010 01010100O0 0010101001 000000000 S: 0.16520500E 05 0000001000000000 0100010010 0101010010 1010101001 000000000 S: 0.16520500E 05 0000001000000000 N: 96 R: 0 **PF(63): 0100010100 COST: 0.44459985E 03 CPU: 173.470 0100010100 0101010100 0010100011 000000001 S: 0.24560500E 05 0000001000000000 0100010100 01010101(0 0010101011 000000001 S: 0.24560500E 05 000o001000000000 010001C100 01G10101010 1010100011 0000000001 S: 0.24560500F 05 0000001000000000 0100010100 0101010100 1010101011 000000001 s: 0.24560500E 05 0000001000000000 N: 96 R: 0

453 **PF(64): 0100100000 COST: 0.3 18599,55E 03 CPUl: 177.251 01C0100000 0101100000 0010001111 000000111 S: 0.1616C500E 05 n0000001000000000 0100100000 0101100000 0010011011 000001001 S: 0.1616050CE 05 00000010000000OOO 01001O000 0101100000 1010001111 000000111 S: 0.16160500E C5 000 0001 000000000 0100100000 0101100000 1010011011 000001001 S: 0.16160500E 05 000000 1 (0000000 N:. 12F PF 0 **PF(65): C100100010 COST: 0.3881S971E 03 CPU: 180.124 O100100010 0101100010 0010001101 000000100 S: 0.16520500E 05 0000001000000000 0100100010 0101100010 0010011001 000001000 S: 0.16520500E 05 000000 000000000 0100100010 0101100010 1010001101 000000100 S: 0.16520500E 05 0000001000000000 0100100010 0101100010 1010011001 000001000 S: 0.16520500E 05 0000001000000000 rN: 96 P: 0 **PF(66): 0100100100 CO.T: 0.4109Q 976E 03 CPLU: 182.9g2 01001001(0 0101100100 0010010011 000000001 S: 0.24560500E 05 000000 1000000000 010ll00100 0101100100 0010011011 000001001 S: 0.24560500E 05 0000001000000000 0100100100 0101100100 1010010011 000000001 S: 0.24560500E 05 0000001000000000 0100100100 0101100100 1010011011 000001001 S: 0.24560500E 05 0000001000000000 N: 96 R: 0 **PF(67): 0100101000 COST: 0.35459985E 03 CPU: 185.793 0100101000 0101101000 0010010011 000000001 S: 0.S19760500E 05 000000 1000000000 0100101000 0101101000 1010010011 000000001 S: 0.19760500E 05 0000001000000000 i: 96 P: 0

454 **FF(69): 0100101010 COST: 0.42419995E 03 CPU: 187.687 0100101010 010110010C 0010010001 000000000 S: 0.20120500E 05 0000001000000000 0100101010 01011101010 1010010001 000000000 S: 0.20120500E 05 000000 1000000000 N: 64 P: 0 **FF(C69): 0100101100 COST: 0.43499976E 03 CPU: 189.593 0100101100 0101101100 0010010011 00OOG001 S: 0o28160500E 05 0000001000000000 0100101100 0101101100 1010010011 000000001 S: 0.28160500E 05 0000001000000000 N: 64 R: 0 **PF(70): 0100110000 COST: 0.37619971E 03 CPU: 192.402 0100110000 0101110000 0010001011 000000001 S: 0.19760500E 05 0000001000000000' 0100110000 0101110000 1010001011 000000001 S: 0.19760500E 05 000000100000000 N: 96 R: 0 **PF( 71 ): 010000 10 COST: 0.44579980 E 03 CPU: 194.287 0100110010 0101110010 0010001001 000000000 S: 0.20120500E 05 0000001000000000 0100110010 0101110010 1010001001 000000000 S: 0*2012050CE 05 0000001000000000 N: 64.: 0 **PF(72): 01001 10100 COST: 0.46859985E 03 CPU: 196.172 0100110100 0101110100 0010000011 000000001 S: 0.28160500E 05 0000001000000000 0100110100 0101110100 0010001011 000000001 S: 0.2816050CE 05 0000001000000000 0100110100 0101110100 1010000011 000000001 S: 0.28160500E 05 0000001000000000 0100110100 0101110100 1010001011 000000001 5: 0.28160500E 05 0000001000000000 N: 64 R: 0

455 **PF(73): 0101000000 COST: 0.32259955E 03 CPU: 199.978 0101000000 0101000000 0000001111 000000111 $: 0.13960500E 05 0000001000000000 0101000000 0101000000 0000011011 000001001 S: 0.13960500E 05 0000001000000000 01C1000000 0101000000 0000101111 00000111 S: 0.13960500E 05 0000001000000000 010 1000000 0101000000 00000111011 000001001 S: 0.13960500E 05 0000001 000000000 0101000000 0101000000 0010001111 000000111 S: 0.13960500E 05 0000001000000000 0101000000 0101000000 0010011011 000001001 S: 0.13960500E 05 0000001000000000 0101000000 0101000000 0010101111 000000111 5: 0.13960500E 05 000000 1000000000 0101000000 0101000000 0010111011 000001001 S: 0.13960500E 05 0000001000000000 0101000000 0101000000 1000001111 000000111 St 0.13960500E 05 0000001000000000 0101000000 0101000000 1000011011 000001001 S: 0.13960500E 05 0000001 000000000 0101000000 0101000000 1000101111 000000111 S: 0.13960500E 05 0000001000000000 0101000000 0101000000 1000111011 000001001 S: 0.13960500E 05 00C0001000000000 0101000000 0101000000 1010001111 000000111 S: 0.13960500E 05 0000001000000000000 0101000000 0101 000000 1010011011 000001001 S: 0.13960500E 05 0000001000000000 0101000000 0101'000000 1010101111 000000111 S: 0.13960500E 05 000000 1000000000 0101000000 0101000000 1010111011 000001001 S: 0.13960500E 05 0000001000000000 N: 128 R: 0

456 **PF(74): 0101000010 COS T 0.3 9219971E 03 CPU: 2031 40 0101000010 0101000010 0000001101 000000100 S: 0.14320500E 05 0000001000000000 010100000 0101000010 0000011001 000001000 S: 0.14320500E 05 0000001000000000 0101000010 0101000010 0000101101 000000100 S: 0.14320500E 05 0000001000000000 0101000010 0101000010 0000111001 000001000 S: 0.14320500E 05 0000001000000000 0101000010 0101000010 0010001101 000000100 S: 0.14320500E 05 0000001000000000 0101000010 0101000010 0010011001 000001000 S: 0.14320500E 05 0000001000000000 0101000010 0101000010 0010101101 000000100 S: 0.14320500E 05 0000001000000000 0101000010 0100000l 0 0010111001 000001000 S: 014320500OE 05 000000 1000000000 0101000010 0101000010 1000001101 000000100 S: 0.1432050CE 05 0000001000000000 0101000010 OlO1OOOOO0 1000011001 000001000 S: 0.1432050C0E 05 00000 1000000000 0101000010 0101000010 1000101101 000000100 S: 0.14320500E 05 0000001000000000 0101000010 0101000010 1000111001 000001000 S: 0O1432050CE 05 0000001000000000 01010000 10 0101000010 1010001101 000000100 S: O. 1 4320500E 05 0000001000000000 0101000010 0101000010 1010011001 000001000 S: 0.14320500E 05 0000001000000000 0101000010 0101000010 1010101101 000001OOOOO 0 S: 0.1432C500E 05 0000001000000000 0101000010 0101000010 1010111001 000001000 S: 0.1432050C0E 05 0000001000000000 N: 96 R: 0

457 **PF(75): 0101000100 COST: 0.41499976E 03 CPU: 206.257 0101000100 0101000100 0000010011 000000001 S: 0.22360500E 05 0000001000000000 0101000100 0101000100 0000011011 000001001 S: 0.22360500E 05 0000001000000000 0101000100 0101000100 0000110011 000000001 S: 0.2236050CE 05 0000001000000000 0101000100 0101000100 0000111011 000001001 S: 0.22360500E 05 0000001000000000 0101000100 0101000100 0010010011 000000001 S: 0.22360500E 05 000000 1000000000 0i01000100 0101000100 0010011011 000001001 St 0.22360500E 05 0000001 000000000 0101000100 0101000100 0010110011 000000001 S: 0.22360500E 05 000000 1000000000 0101000100 0101000100 0010111011 000001001 S: 0.22360500E 05 0000001 000000000 0101000100 0101000100 1000010011 000000001 S: 0.22360500E 05 0000001000000000 0101000100 0101000100 1000011011 000001001 St 0.22360500E 05 OOCOOCI 000000000 0101000100 0101000100 1000110011 000000001 S: 0.22360500E 05 0000001 000000000 0101000100 0101000100 1000111011 000001001 S: 0.22360500E 05 0000001 000000000 0101000100 0101000100 1010010011 000000001 S: 0.22360500E 05 000000 1 000000000 0101000100 0101000100 1010011011 000001001 S: 0.22360500E 05 0000001 000000000 0101000100 0101000100 1010110011 000000001 S: 0.22360500E 05 0000001000000000 0101000100 0101000100 1010111011 0000001001 S: 0.22360500E 05 0000001000000000 N: 96 R: 0

458 **PF(76): 0101001000 COST: 0.35859985E 03 CPU: 209.414 0101001000 0101001000 0000010011 000000001 S: 0.17560500E 05 00000010000000000000 0101001000 0101001000 0000110011 000000001 S: 0.17560500E 05 0000001000000000 0101001000 0101001000 0010010011 000000001 St 0.17560500E 05 0000001000000000 0101001000 0101001000 0010110011 000000001 S: 0,17560500E 05 0000001000000000 0101001000 0101001000 1000010011 000000001 S: 0.o17560500E 05 0000001000000000 0101001000 0101001000 1000110011 000000001 S: 0.17560500E 05 000000 1000000000 0101001000 0101001000 1010010011 000000001 S: 0,17560500E 05 0000001000000000 0101001000 0101001000 1010110011 00000001 5: 0.17560500F 05 0000001000000000 N: 96 R: 0 **PF(77): 0101001010 COST: 0o42819995E 03 CPU: 211.485 0101001010 0101001010 0000010001 0000000O S: 0.17920500E 05 0000001000000000 0101001010 0101001010 0000110001 000000000 S: 0.1792050CE 05 0000001000000000 0101001010 0101001010 0010010001 000000000 s: 0*179205COE 05 0000001000000000 0101001010 0101001010 0010110001 000000000 S: 0.17920500E 05 0000001000000000 0101001010 0101001010 1000010001 000000000 S: 0.17920500F 05 0000001000000000 0101001010 0101001010 1000110001 000000000 S: 0,17920500E 05 0000001000000000 0101001010 0101001010 1010010001 000000000C S: 0.17920500E 05 0000001000000000 010101001010 0101001010 1010110001 000000000 S: 0.17920500E 05 0000001000000000 N: 64 RF 0

459 **PF(7Z): 0101001100 COST: 0.43 99976E 03 CPU: 2 1 3.524 0101001100 0101001100 0000010011 000000001 S: 0.25960500E 05 0000001000000000 0101001 100 0101001100 0000110011 000000001 S: 0.25960500E 05 0000001000000000 0101001100 010100 10100 0010010011 000000001 S: 0.25960500E 05 0000001 000000000 0101001100 C101001100 0010110011 000000001 S: 0.25960500E 05 000000 1000000000 0101001100 0101001100 1000010011 000000001 S: 0.25960500E 05 0000001000000000 0101001101 0 0101001100 1000110011 000000001 S: 0.259605.00E 05 0000001000000000 0101001100 0101001100 1010010011 000000001 S: 0.25960500E 05 0000001000000000 0101001100 0101001100 1010110011 000000001 S: 0.25960500E 05 0000001000000000 N: 64 R: C **PF(79): 0101010000 COST: 0.38019971E 03 CPU: 216.473 0101010000 0101010000 0000001011 000000001 S: 0.17560500E 05 0000001000000000 0101010000 0101010000 0000110011 000000001 S: 0.17560500E 05 0000001 000000000 0101010000 0101010000 0010001011 000000001 S: 0.17560500E 05 000000 1000000000 0101010000 0101010000 0010101011 000000001 S: 0.17560500E 05 0000001000000000 0101010000 0101010000 1000001011 000000001 S: 0.17560500E 05 0000001000000000 0101010000 0101010000 1000101011 000000001 S: 0.17560500E 05 0000001000000000 0101010000 0101010000 1010001011 000000001 5: 0.1756050CE 05 0000001000000000 0101010000 0101010000 1010101011 000000001 S: 0.17560500E 05 0000001 000000000 N: 96 R: 0

460 **PF(80): 0101010010 COST: 0,.44979980E 03 CPU: 218.53 1 0101010010 0101010010 0000001001 000000000 S: 0.17920500E 05 0000001000000000 0101010010 0101010010 0000101001 000000000 S: 0,1792C500E 05 000000 1000000000 01o1010010 0101010010 0010001001 000000000 S: 0.17920500E 05 0000001000000000 0101010010 0101010010 0010101001 000000000 S: 0,17920500E 05 0000001000000000 0101010010 0101010010 1000001001 000000000 S: 0.17920500E 05 0000001000000000 0101010010 0101010010 1000101001 000000000 S: 0.17920500E 05 0000001000000000 0101010010 0101010010 1010001001 000000000 S: 0.17920500E 05 0000001000000000 0101010010 0101010010 1010101001 000000000 S: 0.17920500F 05 00000010000000000 N: 64 R: 0

461 *k*PF(1 ): 0101010100 COST: 0.47259985E 03 CPU: 220.589 0101010100 0101010100 0000000011 000000001 S: 0.25960500E 05 0000001000000000 0101010100 0101010100 0000001011 000000001 S: 0.25960500E 05 0000001000000000 0101010100 0101010100 0000100011 000000001 S: 0.25960500E 05 0000001000000000 O1C1OIOOO0 0101010100 0000101011 000000001 S: 0.25960500E 05 0000001000000000 0101010100 0101010100 0010000011 000000001 S: 0.25960500E 05 0000001000000000 0101010100 0101010100 0010001011 000000001 S: 0.25960500E 05 0000001000000000 0101010100 0101010100 0010100011 000000001 S: 0.25960500E 05 0000000000001000000 0101010100 0101010100 0010101011 000000001 S: 0.25960500E 05 0000001000000000 0101010100 0101010100 1000000011 000000001 S: 0.25960500E 05 0000001000000000 0101010100 0101010100 1000001011 000000001 S: 0.25960500E 05 0000001 000000000 0101010100 0101010100 1000100011 000000001 S: 0.25960500E 05 0000001000000000 0101010100 0101010100 1000101011 000000001 S: 0.25960500E 05 0000001000000000 0101010100 0101010100 1010000011 000000001 S: 0.25960500E 05 0000001000000000 0101010100 0101010100 1010001011 000000001 S: 0.25960500E 05 0000001000000000 0101010100 0101010100 1010100011 000000001 S: 0.25960500F 05 0000001000000000 0101010100 0101010100 1010101011 000000001 S: 0.25960500E 05 000000100000000000 N: 64 P: 0 PF(28) CONTAINS P EST SOLUTION(S). 7232 SOLUTIONS CONSIDERED. MEP N COST OVER ALL SOLUTIONS: 0.46321021E 03 MAXIMUM COST ENCOUNTERED: 0.67499951E 03 ********** $SOLUTI ON TERMINATED **********

BIBLIOGRAPHY

1. Ash, W. L. and E. H. Sibley, "TRAMP: An Interpretive Associative Processor with Deductive Capabilities", Proc. ACM Nat. Conf. (1968), pp. 143-156.
2. Balas, Egon, "An Additive Algorithm for Solving Linear Programs with Zero-One Variables", Operations Research, 13: 4, July-August 1965, pp. 517-546.
3. Balinski, M. L., "Integer Programming: Methods, Uses, Computation", Management Science, 12: 3, November 1965, pp. 253-313.
4. Berkeley, Edmund C., "The Programming Language LISP: An Introduction and Appraisal", Computers and Automation, September 1964, pp. 16-23.
5. Bobrow, Daniel G. and Bertram Raphael, "A Comparison of List-Processing Computer Languages", CACM, 7: 4, April 1964, pp. 231-240.
6. Brewer, S., "Data Base or Data Maze", Proc. ACM Nat. Conf. (1968), pp. 623-630.
7. Busacker, Robert G. and Thomas L. Saaty, Finite Graphs and Networks: An Introduction with Applications (New York: McGraw-Hill, 1965).
8. Carr, J. W. III, "Recursive Subscripting Compilers and List-Type Memories", CACM, 2: 2, February 1959, pp. 4-6.
9. Chapin, N., "A Deeper Look at Data", Proc. ACM Nat. Conf. (1968), pp. 631-638.
10. Chapin, N., "Common File Organization Techniques Compared", Proc. FJCC (1969), pp. 413-422.
11. Childs, David L., "Description of a Set-Theoretic Data Structure", Proc. FJCC (1968), pp. 557-564.
12. Codd, E. F., "A Relational Model of Data for Large Shared Data Banks", CACM, 13: 6, June 1970, pp. 377-387.

13. Coffman, E. G. and J. Eve, "File Structures Using Hashing Functions", CACM, 13: 7, July 1970, pp. 427-432.
14. Comfort, W. T., "Multiword List Items", CACM, 7: 6, June 1964, pp. 357-362.
15. Conway, R. W. et al, "CLP - The Cornell List Processor", CACM, 8: 4, April 1965, pp. 215-216.
16. Cooper, D. C. and H. Whitfield, "ALP: An Autocode List-Processing Language", Computer Journal, April 1962, pp. 28-32.
17. D'Imperio, Mary, "Data Structures and their Representations in Storage: Parts I and II", Reprinted from NSA Technical Journal, 9: 3 and 4, 1964, pp. 59-81, pp. 7-54.
18. D'Imperio, Mary, "Data Structures and their Representation in Storage", Annual Review in Automatic Programming, 5, 1969, pp. 1-75.
19. Dodd, G. G., "Elements of Data Management Systems", Computing Surveys, 1: 2, June 1969, pp. 117-133.
20. Feldman, J. A., "Aspects of Associative Processing", Technical Note 1965-13, MIT Lincoln Laboratory, Lexington, Massachusetts, 1965.
21. Feldman, Jerome A. and Paul D. Rovner, "An Algol-Based Associative Language", CACM, 12: 8, August 1969, pp. 439-449.
22. Gass, Saul I., Linear Programming: Methods and Applications (New York: McGraw-Hill, 1964).
23. Gelernter, H. et al, "A Fortran-Compiled List-Processing Language", JACM, 7: 2, April 1960, pp. 87-101.
24. Golomb, S. W. and L. D. Baumert, "Backtrack Programming", JACM, 12: 4, October 1965, pp. 516-524.
25. Gorry, G. A., "A System for Computer-Aided Diagnosis", Doctoral Thesis, MAC-TR-44, Massachusetts Institute of Technology, September 1967.

26. Gorry, G. A. and G. O. Barnett, "Sequential Diagnosis by Computer", JAMA, 205, September 1968, pp. 849-854.
27. Gustafson, D. H. et al, "Subjective Probabilities in Medical Diagnosis", IEEE Trans. Man-Machine Systems, MMS-10: 3, September 1969, pp. 61-65.
28. Halmos, Paul R., Naive Set Theory (New York: Van Nostrand, 1960).
29. Hammer (Ivanescu), P. L. and S. Rudeanu, Boolean Methods in Operations Research and Related Areas (New York: Springer-Verlag, 1968).
30. Hansen, Wilfred J., "Compact List Representation: Definition, Garbage Collection, and System Implementation", CACM, 12: 9, September 1969, pp. 499-507.
31. Harrison, M., "BALM - An Extendable List-Processing Language", Proc. SJCC (1970), pp. 507-511.
32. Hellerman, H., "Addressing Multidimensional Arrays", CACM, 5: 4, April 1962, pp. 205-207.
33. Hsiao, David and Frank Harary, "A Formal System for Information Retrieval from Files", CACM, 13: 2, February 1970, pp. 67-73.
34. IBM Corporation, "IBM System/360 Model 67 Functional Characteristics", Form A27-2719-0, 1967.
35. Ivanescu, P. L., "Pseudo-Boolean Programming and Applications", Lecture Notes in Mathematics, 9 (Berlin: Springer-Verlag, 1965).
36. Ivanescu, P. L. and S. Rudeanu, "Pseudo Boolean Methods for Bivalent Programming", Lecture Notes in Mathematics, 23 (Berlin: Springer-Verlag, 1966).
37. Iverson, K. E., "A Programming Notation for Trees", IBM Corp. Research Report, RC-390, January 1961.
38. Iverson, K. E., A Programming Language (New York: Wiley, 1962).

39. Jacquez, J. A. (Ed), The Diagnostic Process, Proc. Conf. held at The University of Michigan, May 1963 (Ann Arbor: Malloy Lithographing, 1964).
40. Kleinmutz, B., Clinical Information Processing by Computer (New York: Holt, Rinehart and Winston, 1969).
41. Knowlton, Kenneth C., A Programmer's Description of L6, Bell Telephone Laboratories' Low-Level Linked List Language, February 1966, Murray Hill, N. J.
42. Knowlton, Kenneth C., "A Programmer's Description of L6", CACM, 9: 8, August 1966, pp. 616-625.
43. Knuth, Donald E., The Art of Computer Programming, Volume 1, Fundamental Algorithms (Reading, Massachusetts: Addison-Wesley, 1968).
44. Lang, C. A. and J. C. Gray, "ASP - A Ring Implemented Associative Structure Package", CACM, 11: 8, August 1968, pp. 550-555.
45. Lawler, E. L. and M. D. Bell, "A Method for Solving Discrete Optimization Problems", Operations Research, 14: 6, November-December 1966, pp. 1098-1112.
46. Lawler, E. L. and D. E. Wood, "Branch-and-Bound Methods: A Survey", Operations Research, 14: 4, July-August 1966, pp. 699-719.
47. Lawson, Harold W., Jr., "PL/I List Processing", CACM, 10: 6, June 1967, pp. 358-367.
48. Ledley, R. S., Use of Computers in Biology and Medicine (New York: McGraw-Hill, 1965).
49. Lipovski, G., "The Architecture of a Large Associative Processor", Proc. SJCC (1970), pp. 385-396.
50. Lipschutz, Seymour, Theory and Problems of Set Theory and Related Topics, Schaum's Outline Series (New York: Schaum, 1964).

51. Lowe, Thomas C., "The Influence of Data-Base Characteristics and Usage on Direct-Access File Organization", JACM, 15: 4, October 1968, pp. 535-548.
52. Lusted, L. B., Introduction to Medical Decision Making (Springfield: Thomas, 1969).
53. Madnick, Stuart E., "String Processing Techniques", CACM, 10: 7, July 1967, pp. 420-424.
54. McCarthy, John, "Recursive Functions of Symbolic Expressions and their Computation by Machine, Part I", CACM, 3: 4, April 1960, pp. 184-195.
55. McCuskey, William A., "On Automatic Design of Data Organization", Proc. FJCC (1970), pp. 187-199.
56. McGee, William C., "File Structures for Generalized Data Management", Proc. IFIP Congress (1968), Applications 1, Booklet F, pp. 68-73.
57. Mealy, George H., "Another Look at Data", Proc. FJCC (1967), pp. 525-534.
58. Morris, Robert, "Scatter Storage Techniques", CACM, 11: 1, January 1968, pp. 38-44.
59. Newell, A. and J. C. Shaw, "Programming the Logic Theory Machine", Proc. Western Joint Computer Conference, 11, February 1957, pp. 230-240.
60. Newell, A. and F. M. Tonge, "An Introduction to Information Processing Language V", CACM, 3: 4, April 1960, pp. 205-211.
61. Newman, William M., A System for Interactive Graphical Programming, Computer Tech. Group Report 67/7, Centre for Computing and Automation, Imperial College, October 1967.
62. Newman, William M., The ASP-7 Ring-Structure Processor, Computer Tech. Group Report 67/8, Centre for Computing and Automation, Imperial College, October 1967.

63. Patt, Yale N., "Variable Length Tree Structures Having Minimum Average Search Time", CACM, 12: 2, February 1969, pp. 72-76.
64. Perlis, A. J. and Charles Thornton, "Symbol Manipulation by Threaded Lists", CACM, 3: 4, April 1960, pp. 195-204.
65. Ross, Douglas T., "A Generalized Technique for Symbol Manipulation and Numerical Calculation", CACM, 4: 3, March 1961, pp. 147-150.
66. Roth, Richard H., "An Approach to Solving Linear Discrete Optimization Problems", JACM, 17: 2, April 1970, pp. 303-313.
67. Rovner, Paul D., An Investigation into Paging a Software-Simulated Associative Memory System, Document No. 40.10.90, Univ. of Calif., Berkeley, 1966.
68. Rovner, P. D. and J. A. Feldman, "The LEAP Language and Data Structure", Proc. IFIP Cong. (1968), pp. 579-585.
69. Salton, Gerard, "Manipulation of Trees in Information Retrieval", CACM, 5: 2, February 1962, pp. 103-114.
70. Sammon, J. W., Jr., "A Nonlinear Mapping for Data Structure Analysis", IEEE Trans. Computers, C-18: 5, May 1969, pp. 401-409.
71. Sibley, Edgar H., et al, "Graphical Systems Communication: An Associative Memory Approach", Proc. FJCC (1968), pp. 545-555.
72. Sutherland, William R., "The CORAL Language and Data Structure", Excerpt from Doctoral Thesis, MIT Lincoln Laboratory, Lexington, Massachusetts, 1966.
73. Symes, L., "Manipulation of Data Structures in a Numerical Analysis Problem Solving System - NAPSS", Proc. SJCC (1970), pp. 157-164.
74. Toronto, A. F. et al, "Evaluation of a Computer Program for Diagnosis of Congenital Heart Disease", Prog. Cardiovascular Diseases, 5: 4, January 1963, pp. 362-377.
75. University of Michigan, "MTS: Michigan Terminal System", Volumes I, II, and III, University of Michigan Publication Distribution Services, February 1971.

76. Warner, H. R. et al, "A Mathematical Approach to Medical Diagnosis", JAMA, 177: 3, July 1961, pp. 177-183.
77. Warner, H. R. et al, "Experience with Bayes Theorem for Computer Diagnosis of Congenital Heart Disease", Annals NY Acad. Sciences, 115, 1964, pp. 558-567.
78. Weizenbaum, J., "Knotted List Structures", CACM, 5: 3, March 1962, pp. 161-165.
79. Weizenbaum, J., "Symmetric List Processor", CACM, 6: 9, September 1963, pp. 524-536.
80. Williams, Robin, "A Survey of Data Structures for Computer Graphics Systems", Computing Surveys, 3: 1, March 1971, pp. 1-21.
81. Woodward, P. M. and D. P. Jenkins, "Atoms and Lists", Computer Journal, April 1961, pp. 47-53.