THE UNIVERSITY OF MICHIGAN
SYSTEMS ENGINEERING LABORATORY
Department of Electrical Engineering
College of Engineering

SEL Technical Report No. 56

FINDING OPTIMAL GROUPS - A QUADRATIC PROGRAMMING APPROACH IN ZERO/ONE VARIABLES WITH APPLICATIONS

by Don Michael Coleman
under the direction of Professor Keki B. Irani

May 1971

under contract with:
ROME AIR DEVELOPMENT CENTER
Research and Technology Division
Griffiss Air Force Base, New York
Contract No. F30602-69-C-0214


Abstract

On Finding Optimal Groups - A Quadratic Programming Approach in Zero/One Variables with Applications

by Don Michael Coleman

Co-Chairmen: K. B. Irani, E. L. Lawler

In this research we consider a mathematical programming problem of the following form:

Maximize f(x) = Σ_{i,j} c_ij x_i x_j − Σ_j b_j x_j

subject to Σ_j a_kj x_j ≤ M_k, k = 1, 2, ..., m,

where b_j is an arbitrary real, c_ij ≥ 0 for all i and j, a_kj and M_k are positive reals for all j and k, and x_j ∈ {0, 1} for all j. For values of m ≤ 2 we present an algorithm for the solution of this problem based upon generalized Lagrange multipliers. This algorithm is suitable for application to problems with a rather large number of binary variables. The formulation is shown to be applicable to a number of optimal grouping problems; a particular application to computer systems paging is examined in detail.

Acknowledgments

I wish to express my gratitude to Professors L. K. Flanigan, B. A. Galler, K. B. Irani, E. L. Lawler, and E. L. McMahon for serving as my doctoral committee, and for their many constructive suggestions and criticisms. I am particularly indebted to Professor K. B. Irani for his professional guidance and direction. For the task of supervising the typing of this manuscript, I express my thanks to Miss Joyce Doneth. A special note of thanks is due to Mr. Dayrl Knoblock of The University of Michigan Computing Center staff, who provided a special version of the SNOBOL translator and gave invaluable assistance in the collecting of statistics on paging behavior. This research was supported by the Rome Air Development Center of the United States Air Force under contract no. AF30602-69C-0214; for this aid I am truly grateful. Finally, I thank my wife for her patience, understanding, and encouragement which made this goal possible.

Table of Contents

List of Tables
List of Figures
Abstract

Chapter 1  Introduction
  1.1  General Problem Area
  1.2  Areas of Applicability
    1.2.1  Quadratic Knapsack Problem
    1.2.2  Optimal Selection of R and D Projects
    1.2.3  Pattern Recognition
    1.2.4  A Problem in Computer Systems Storage Allocation
      1.2.4.1  Paging
      1.2.4.2  Class of Computer Programs for Partitioning
      1.2.4.3  Review of Program Partitioning
    1.2.5  Other Relevant Results
  1.3  Scope of Dissertation

Chapter 2  Unconstrained Optimization of a Discrete Quadratic Function
  2.1  Phase I Branching Algorithm
  2.2  Computational Results Using Phase I
  2.3  Phase II
    2.3.1  Optimization by Implicit Enumeration
    2.3.2  Partial Ordering of Binary Vectors
    2.3.3  Partial Enumeration with Quadratic Function
  2.4  Phase I Computational Efficiency
  2.5  Comparison with Other Methods

Chapter 3  Techniques for Constrained Optimization
  3.1  Generalized Lagrange Multipliers
  3.2  Using Generalized Multipliers for the Quadratic Knapsack Problem
  3.3  The Two Constraint Case
  3.4  Constrained Optimization Using "Families of Solutions"
  3.5  Example Problem Using Lagrange Multipliers
  3.6  Comparison of Lagrange Multipliers and Family of Solutions Techniques
  3.7  Existence of Gaps Using Multipliers

Chapter 4  Computer Programming and Graphical Partitioning
  4.1  Model for Program Graph
    4.1.1  Program Instruction Units and Data Units
  4.2  Figure of Merit for Program Packing
  4.3  Program Partitioning as a Mathematical Programming Problem
  4.4  Graphical Partitioning
  4.5  Sequential Selection of Partition Blocks
    4.5.1  Selection of Maximum Weight Partition Block
    4.5.2  Selection of Partition Block with Minimum Interblock Weight
    4.5.3  The Number of Vertices on Selected Block as a Parameter
  4.6  Algorithm for Graphical Partitioning
    4.6.1  Example of Partitioning
    4.6.2  Results for Some Partitioning Problems

Chapter 5  Comparison of Program Partitioning Strategies
  5.1  Alternative Partitioning Strategies
    5.1.1  Unit Merge Algorithm
    5.1.2  Segmentation Algorithms
  5.2  Comparative Results
  5.3  Extension of Sequential Selection Procedure
    5.3.1  Order of Selected Blocks
    5.3.2  Multiple Solutions
  5.4  Branch and Bound Algorithm for Partitioning
  5.5  Summary

Chapter 6  Applications to Systems Programming
  6.1  Paging of a SNOBOL Translator in MTS
  6.2  Reorganization of a SNOBOL Translator

Chapter 7  Summary and Future Research
  7.1  Extension of Quadratic Algorithm
  7.2  Program Partitioning with Paging

Appendix I    Solution of a Linear Pseudo-Boolean Inequality
Appendix II   Two Partitioning Algorithms
Appendix III  Graph Theoretic Formalism

Bibliography

List of Tables

Table 2.1  Unconstrained Results, Phase I
Table 2.2  Phase I Computational Complexity
Table 4.1  Quadratic Knapsack Problem Results
Table 4.2  Results of Parametric Partitioning
Table 4.3  Results of Graphical Partitioning
Table 4.4  Test Problem Results
Table 5.1  Interblock Weights
Table 6.1  Paging Results, Heavy Load
Table 6.2  Paging Results, Normal Load

List of Figures

Figure 2.1  Phase I Algorithm
Figure 2.2  Time Versus Cost Function Coefficients
Figure 2.3  Phase II Algorithm
Figure 3.0  Schematic of Everett's Method
Figure 3.1  Algorithm for Two Constraint Case
Figure 4.1  Example Graph
Figure 4.2  Example Graph
Figure 4.3  Algorithm for Partitioning
Figure 5.1  Schematic for Branch and Bound Procedure

Chapter 1

Introduction

In the design of many large scale systems it is frequently useful to formulate a number of the engineering problems as optimization problems in zero/one variables. Much effort is directed towards developing algorithms for the solution of these problems which are readily programmed and do not require unreasonable amounts of computation time and storage. Linear programming has been one of the more successful techniques developed to deal with problems of this nature. For example, special algorithms have been developed for efficient solution of the classical transportation and assignment problems. In this dissertation we provide a method for solving a quadratic binary programming problem. We show this problem to have a wide range of applicability in both operations research and computer systems design. In the subsequent sections of this chapter we present first a description of the general quadratic problem with which we will be concerned, and then a review of the areas of applicability and known results. In the concluding portion of this chapter we give the scope of this dissertation in detail.

1.1 General Problem Area

The quadratic programming problem will be defined as follows:

(1)  Maximize f(x) = Σ_{i,j} c_ij x_i x_j − Σ_j b_j x_j

subject to

(2)  Σ_j a_kj x_j ≤ M_k,   k = 1, 2, ..., m,

where x = (x_1, x_2, ..., x_n), x_i ∈ {0, 1} for all i, c_ij ≥ 0 for all i and j, a_kj and M_k are positive real numbers for all j and k, and b_j is an arbitrary real number.

For small values of n (less than 40) a general algorithm such as the one given by Lawler and Bell [31] can be used to solve this problem. However, when n becomes relatively large, these algorithms are not efficient when directly applied to this problem. In this dissertation we present an algorithm, although not as general as [31], which can be used to solve (1) and (2) when m ≤ 2. Our algorithm may be used to solve problems for relatively large values of n without requiring prohibitive amounts of computation time and storage. In the following section we explore some areas of applicability of problem (1) and (2) for the case of m ≤ 2.

1.2 Areas of Applicability

There are numerous areas of application for the problem defined in section 1.1. In the following three sections we give some operations research problems which may be modelled using the formulation presented in section 1.1.
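Before turning to these applications, the following minimal sketch restates the formulation of section 1.1 as executable code: it evaluates the objective (1) and checks the constraints (2) by brute force over all 2^n binary vectors. The coefficient values are hypothetical, and exhaustive enumeration is practical only for very small n; this is not the algorithm developed in this dissertation, merely an executable restatement of the problem.

```python
# Brute-force illustration of problem (1)-(2); hypothetical data, small n only.
from itertools import product

c = {(0, 1): 3.0, (1, 2): 2.0, (0, 2): 1.5}   # c_ij >= 0
b = [1.0, 2.0, -0.5]                           # b_j arbitrary reals
a = [[2.0, 1.0, 3.0],                          # a_kj > 0, one row per constraint k
     [1.0, 2.0, 1.0]]
M = [4.0, 3.0]                                 # right-hand sides M_k > 0

def objective(x):
    quad = sum(cij * x[i] * x[j] for (i, j), cij in c.items())
    lin = sum(bj * xj for bj, xj in zip(b, x))
    return quad - lin

def feasible(x):
    return all(sum(ak[j] * x[j] for j in range(len(x))) <= Mk
               for ak, Mk in zip(a, M))

best = max((x for x in product((0, 1), repeat=len(b)) if feasible(x)),
           key=objective)
print(best, objective(best))
```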

We begin with a problem we define as the quadratic knapsack problem, which is seen to be a special case of the problem of section 1.1.

1.2.1 Quadratic Knapsack Problem

We define the quadratic knapsack problem as follows:

Maximize f(x) = Σ_{i,j} c_ij x_i x_j

subject to Σ_i a_i x_i ≤ M,

where x = (x_1, x_2, ..., x_n), x_i ∈ {0, 1} for all i, M > 0, c_ij ≥ 0 for all i and j, and a_i > 0 for all i. Here, as opposed to the classical linear knapsack problem [8], the value associated with selection of a particular item is related to the set of items which are selected along with it. Of course x_i = 1 implies that the i-th item has been selected, while x_i = 0 implies rejection. The constant c_ij is a measure of the value of selecting items i and j, while a_i is the weight of the i-th item and M is a constraint on the total weight of the items selected. A particular instance where such a model is applicable is in the program planning and budgeting field [2], [5], [45].

1.2.2 Optimal Selection of R and D Projects

It can be seen from the literature that the problem of selecting a firm's research projects has been given considerable attention (see [5], for example).

In most of the work previously published, considerable attention has been focused on the research and development allocation problems which can be formulated as linear programming models. An interesting variation of the optimal project selection problem is handled by the quadratic knapsack problem. The problem, briefly stated, is as follows. A large research and development concern is faced with the problem of selecting from among n projects those which should be researched. It is further assumed that for the n projects one has quantified a matrix {c'_ij} which represents the value of success,

c'_ij = the value of the successful outcome of the i-th and j-th research proposals.

One might see [2] for a discussion of how one estimates the value of successful projects. There might also be associated with each pair of projects a number p_ij, the probability of a successful outcome of the i-th and j-th projects. Therefore we can define c_ij = p_ij c'_ij, which represents a weighted interrelated value coefficient, and we select the set of projects which maximize

(1)  Σ_{i,j} c_ij x_i x_j.

Of course x_i = 1 implies that the i-th project has been selected, while x_i = 0 implies otherwise. Now usually management estimates, from past experience, the cost per man-hour associated with a given type of research project. This is represented by a_i for the i-th project. The total available man-hours per year, M, for all projects is a cost constraint known by management, and it is this constraint which must be satisfied when the set of research projects is selected, i.e.,

(2)  Σ_i a_i x_i ≤ M.

The maximization of (1) subject to (2) gives the best selection of research projects and is a direct application of the quadratic knapsack problem. There are numerous algorithms in the literature for the classical knapsack problem [8], [9], [17]. However there are few results concerning the quadratic case. In chapter four we give some computational results for the problem of selecting an optimal set with interrelated value coefficients. The next section illustrates an application in the area of pattern recognition.

1.2.3 Pattern Recognition

A common problem in pattern recognition [43] is that of classifying signals into sets of common patterns, i.e., signals or points which are somehow similar are expected to be classified into the same class. Essentially there are three basic elements to the classification problem in pattern recognition. The first is that of metricizing, i.e., a method of measuring distance between the points to be classified.

Generally the distance measure is a way of quantifying the similarity or dissimilarity of the points to be classified. Next is the concept of a clustering criterion, which usually includes the notion that "distances" between members of the same class are on the average small. Finally, there must be a computational procedure for selecting the clusters or partitioning the set of points. If the first two elements of the pattern classification problem are given, our quadratic programming formulation can be used as a method for classification. For example, let us have a set of points {1, 2, ..., n} for which we have a metric defined on all pairs of points. If c'_ij = the distance between the i-th and j-th points, define c_ij = 1/c'_ij. It is desired to partition the set of points into two sets such that the sum of the distances between points in the same set is minimized. One could select this partition by solving the following:

maximize Σ_{i,j} c_ij x_i x_j + Σ_{i,j} c_ij (1 − x_i)(1 − x_j)

subject to Σ_i x_i ≥ ℓ_1,  Σ_i x_i ≤ ℓ_2,  x_i ∈ {0, 1} for all i,  ℓ_1 ≤ ℓ_2.

This is a special case of the problem defined in section 1.1. The numbers ℓ_1 and ℓ_2 could be used to control the size of the partition, i.e., how many elements appear in each set of the partition. The first term in the objective function is a measure of the similarity of the points which are selected for the first set of the partition, while the second term measures the similarity of those points which are not selected and which form the second set of the partition. In the next section we discuss a problem in computer storage allocation. The study of this problem was a motivation for looking at problems of optimal groupings and hence our quadratic problem.

1.2.4 A Problem in Computer Systems Storage Allocation

In the relatively short time that the electronic digital computer has been available, it has undergone extensive changes in its basic makeup and organization. This transformation has seen the computer move from a relatively slow, unreliable machine with limited memory capacity to a highly complex system of processors with a hierarchy of memories and complicated input-output devices. Along with the emergence of new hardware has been the emergence of the art of multiprogramming. This concept has been aided by the development of the virtual storage machine [1], i.e., each user in the system may assume that he has available a very large address space, which may be many times larger than the size of main memory. The bookkeeping and memory management for efficient operation of such a system is automatic and transparent to the user. Multiprogramming, as the name implies, means having several programs occupy high speed memory simultaneously. The problems of effectively using storage in a multiprogrammed mode are sometimes appropriately called problems of storage allocation.

When we consider this problem in the context of the early days of computing, it was relatively straightforward. In early batch systems, where programs were run one at a time, each program had the entire high speed memory (core) available. Problems arose when a program was larger than the available core. In these cases the programmer had to improvise by "segmenting" his program (instructions and data) and controlling the overlaying of "segments". Segmenting, of course, referred to dividing the program up into parts (segments); during execution the unused segments of a program were kept on a backing store and brought into core, i.e., overlaid, as they were needed. If the problem of segmenting a program was given to the operating system instead of the programmer, we had what was called an automatic segmenting system; and the problem of segmenting the program in an optimum manner became known as the classical overlay problem [36] or the problem of program segmentation [26]. Segmentation has come to mean something slightly different in modern terminology [11]. In multiprogrammed systems overlaying or segmentation is intended to increase the size of effective memory. Segmentation can also reduce the delay in loading a large task into the space currently occupied by many smaller ones, since in the case of segmented systems much of the overhead incurred by having to move tasks around to make the larger task fit can be avoided. Likewise the delay is reduced in loading small tasks among large tasks. Therefore, with effective segmentation, throughput may be increased. A summary of techniques for overlaying is found in [36].

1.2.4.1 Paging

In batch systems there was no particular problem associated with overlays of different sizes (the only constraint was that they were smaller than the core). However, in multiprogramming systems, if the size of the unit of core allocation is varied to suit the needs of the information being transferred, then the problem of storage fragmentation is incurred; that is, the high speed memory available for storage becomes fragmented into numerous little sets of contiguous locations. There is a high system overhead incurred by physically moving tasks around to utilize a fragmented core, due to the added burden of keeping track of things once they have been committed to high speed memory. In the first dynamic storage allocation system, the Ferranti ATLAS, this problem was "solved" as follows. High speed memory was divided into fixed-size blocks called page frames. The programs and data which were to be brought into high speed storage were divided into similarly fixed-size blocks called pages. Now when information was to be brought into core, the set of available frames was matched with the set of blocks or pages which were to be brought in.

Systems which use this approach usually have hardware mapping devices which make the address of items in a page independent of the particular page frame in which a given item resides. Such systems are called paging systems, and these are the systems with which we will be directly concerned. Actually the physical dividing up of core does not "solve" the core fragmentation problem. However, in such systems fragmentation is now within individual pages rather than in high speed memory. This is because it is often the case that a page (or pages) allocated to a task will only be partially used. The great virtue of such systems is that they are simpler to implement, since a page can be placed into any available page frame. Some modern systems do not have a uniform unit of core allocation; for example the MULTICS system has two frame sizes - 1024 words and 64 words - and is still referred to as a paging system. A good summary of the allocation strategies and hardware facilities of several modern day systems is given in [41].

A basic problem in paging systems is that of memory management, that is, the decisions by which the operating system determines what pages are to be brought into high speed memory, called fetch strategies; the decisions which determine where the incoming pages are to be placed, called placement strategies; and the decisions to determine which pages must be removed when the conditions of high speed core so dictate, called replacement strategies. A set of such strategies is usually called a page turning rule or algorithm. These strategies must be formulated in the total system environment, with consideration given to such system policies as scheduling (which is most often concerned with the selection of the task to use the CPU), sharing of programs, and other system features. However, the main objective of such page-turning policies is to decrease the page traffic between high speed memory and the backing store. The reasons for such an objective are well known. First, there is a system overhead associated with making a decision about what to move and where to move it. Second, if the volume of transfer between the two levels of storage grows too fast, there can be congestion in the channel between the two memories, which could in some cases seriously affect processor efficiency due to tasks being queued for the use of the channel. The operation of the CPU is further degraded by the "stealing" of memory cycles by other auxiliary memory devices if the page traffic is high. In most of the literature the memory management problem is divided into two stages: (1) Paging in - locate the required page in the backing store, bring it into the high speed memory, and turn on the appropriate bit in the appropriate page table. (2) Paging out - remove some page from main memory; turn the in-core bit off in the appropriate page table. Denning [10] notes that the problem of paging in is solved a priori since most systems use the so-called demand paging strategy: a page is brought in only when it is first referenced by an executing task.

So most work is concerned with the page replacement part of the management strategy. A summary of some of the current replacement rules is given in [10]. One current view is that by careful analysis of the structural and behavioral characteristics of programs which run on paged systems, one can decrease the page traffic. Structural and behavioral characteristics of a program are referred to as microscopic properties, as opposed to its macroscopic properties, e.g., execution time, size, etc. The structural characteristics are determined by the threads of control between blocks of program code, while behavioral characteristics refer to frequencies of references between program blocks. In Chapter 4 we present a detailed discussion of models for program reorganization. In that chapter we show how the quadratic programming problem may be used as a tool for selecting a block of a computer program.

1.2.4.2 Class of Computer Programs for Partitioning

Given the prevailing conditions for collecting statistics on program behavior, we initially consider program reorganizational techniques to apply to a single class of programs. These are large systems programs which are run frequently, such as compilers, assemblers, and certain library routines. These programs tend to be composed of a large number of subroutines and tend to be combined at execution according to a rather constant pattern. Such programs are heavily used and are available for study. The time spent gathering statistics for model building is justified if only a small increase in operating efficiency is obtained. In our subsequent discussions we draw freely on the descriptive terminology of graph theory; the reader unfamiliar with such terminology is referred to Appendix III.

1.2.4.3 A Review of Program Partitioning

We now review what has been set forth in the past regarding the microscopic approach to the problem of memory management. Ramamoorthy [39] has previously proposed an operating system which would detect, during the compilation phase, those structural properties of a program which would be incorporated into a page-turning algorithm. He advocates an advance paging policy as opposed to demand paging. There is some merit to advance rather than demand paging. Although demand paging tends to minimize the amount of high speed memory allocated to the tasks in the system, since only pages which are referenced are loaded, a more significant measure of a strategy's effectiveness is the space-time product. A program awaiting pages will continue to occupy high speed memory; in this case the space-time product will be affected by the time taken to fetch pages. Pinkerton [38] has results which show conditions under which an advance paging policy is more effective, in terms of space-time product, than demand paging.

More important to our immediate goals are the suggestions Ramamoorthy has about how pages should be produced, what we call the "page packing problem". He advocates that the blocks of code placed on the pages (the packing of pages) after the program has been compiled should be chosen on the basis of connectivity and frequency of access. He uses the directed graph model of a program, as previously introduced by Karp [22] and Marimont [34], as the basic tool for study of the page packing problem. Each node of the graph represents a small block of data or instructions. The directed arcs represent the transfer of control between instructions, and the undirected edges are used to indicate an instruction referencing data; since such an edge is undirected, control remains in the instruction block. Ramamoorthy uses the connection matrix of the graph to determine a number of useful program properties. For example, the program can be examined for logical consistency, i.e., no infinite loops or redundant instructions, by a series of algebraic operations on the connection matrix. His main result concerning the packing of programs has to do with the property of strong connectedness among the nodes of the graph. A set of nodes is strongly connected if, given any pair of nodes v_i, v_j, there is a path from v_i to v_j and also a path from v_j to v_i. Ramamoorthy states that the page packing which minimizes traffic between high speed memory and the backing store will be one which places all nodes of a strongly connected component of the graph on the same page. However, since the strongly connected components of a graph can be of any size, ranging from only one instruction to perhaps the complete program, the fact that there are constraints on the page sizes indicates that this result will in general not be very useful for practical programs.

Kral [26] has formulated the problem of program segmentation as a problem in pseudo-boolean programming. His paper consists of showing that the packing of a program can be reduced to the problem of solving a linear integer programming problem in zero-one variables. He concludes by observing that all the algorithms of pseudo-boolean programming can be applied to this problem. However, most of the known algorithms are not sufficiently effective to solve problems of the size involved in program segmentation. The essential part of this paper is the formulation of the linear integer programming problem; Kral gives no algorithm for the problem solution.

Pinkerton [38] uses a Markov model for study of the problem of selecting the segments of a program. An optimization problem is formulated, but no algorithm is given for its solution. He concludes that the cost of an effective solution to the problem is not justified in terms of all the assumptions necessary in the building of the model.

Kernighan [24] studies the problem of assigning subroutines to pages and graphical partitioning. He develops an algorithm for assignment of subroutines to pages such that interpage transfers are minimized under the following conditions.

1. The program graph is a tree.
2. The edge weights of the graph are monotone non-decreasing on any path away from the root (initial vertex) of the tree. That is, if v_i, v_j, and v_k are a sequence of vertices on a path away from the root, then c(i,j) ≤ c(j,k), where c(i,j) is the weight of edge (v_i, v_j).
3. The cost on all edges leaving a vertex is the same. That is, if vertex v_i is the initial vertex for arcs (v_i, v_j), (v_i, v_k), ..., (v_i, v_ℓ), then c(i,j) = c(i,k) = ... = c(i,ℓ).

From this basic procedure an algorithm for the more general case, acyclic graphs, is developed. Under similar restrictions on the edge weights of the graph and with duplication of vertices allowed, Kernighan gives a procedure for optimum assignment of subroutines to pages. The restriction of his procedures to acyclic graphs is limiting if the procedure is to be applied to real programs. However, if suitable methods were available for representing general directed graphs by acyclic graphs, then his algorithms would be a good initial step for program partitioning. Estrin and Martin [13] have given procedures for transformation of certain cyclic structures to acyclic graphs.

Kernighan and Lin [25] have given a heuristic procedure for graphical partitioning. In this case there is no restriction on the type of graph or on the edge weights. Essentially a "hill climbing" algorithm is used to find an optimum partition of a graph into two sets of vertices of equal size. The procedure is reapplied to the two subgraphs defined by the two sets of vertices, and is applied as often as necessary to the resulting subgraphs until the required partition is found. Essentially, the idea of "hill climbing" is to start with an arbitrary two-block partition and move progressively to better two-block partitions until an optimum two-block partition is found. Better two-block partitions are found by interchanging vertices between the blocks of a current partition; a gain g_i is computed for each interchange. The strategy is to maximize the sum Σ_{i=1}^{k} g_i over k interchanges. When this is done, the resulting partition is treated as the initial partition and the procedure is repeated. When the maximum of the sum Σ g_i is zero, a local maximum has been reached. Kernighan and Lin give ways of trying to improve this two-way partition. The procedure requires all vertices to be the same size, and ways of dealing with unequal-size vertices are given. When the graph is small, an algorithm developed at Informatics [20] can be used to find an optimum partition for arbitrary graphs. This algorithm, backtrack programming, is essentially a method of implicit enumeration. A very simple heuristic algorithm is reported in [20]; in Chapter 5 we compare computer results using this algorithm. In the next section we note a final area of application for our quadratic programming problem.
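To make the interchange idea concrete, the following sketch computes the gain of a single vertex exchange between two blocks as the resulting decrease in interblock edge weight. The graph, block contents, and function names are hypothetical, and this is only an illustration of the gain computation, not Kernighan and Lin's published procedure.

```python
# Gain of exchanging one vertex from each block, measured as the decrease
# in total interblock (cut) weight.  Purely illustrative data and names.
def cut_weight(edges, block_a):
    """Total weight of edges with exactly one endpoint in block_a."""
    return sum(w for (u, v), w in edges.items() if (u in block_a) != (v in block_a))

def swap_gain(edges, block_a, a, b):
    """Decrease in cut weight when vertex a (in block_a) is exchanged with b."""
    return cut_weight(edges, block_a) - cut_weight(edges, (block_a - {a}) | {b})

edges = {(1, 3): 4.0, (2, 4): 3.0, (1, 2): 0.5, (3, 4): 0.5}   # hypothetical weights
A, B = {1, 2}, {3, 4}                                          # initial two-block partition
best = max(((a, b, swap_gain(edges, A, a, b)) for a in A for b in B),
           key=lambda t: t[2])
print("best single exchange (a, b, gain):", best)              # the swap reducing the cut most
```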

1.2.5 Other Relevant Results

A great deal of research literature has been devoted to problems of finding optimum groupings. The quadratic programming problem is closely related to many of these, such as the assignment problem, graphical partitioning, and cluster analysis. In the particular areas of numerical taxonomy [46] and cluster analysis [21], [42], [44] we find a number of algorithms designed to give groupings of items according to some specified criteria. With cluster analysis the problem is to organize a set of data into homogeneous groups, where such homogeneity is based upon some measure of associativity between items of data. We note that this problem is significant only when there is a rather large number of elements to be organized, this being on the order of a few hundred. With cluster analysis there is a measure (objective function) to judge the efficacy of a given partition or grouping of the data. Much effort has been devoted to the problem of extremizing some function of the distance metric [21] which is defined on the set of objects to be partitioned. When the distance metric is given, the criterion for optimality will depend upon the given application. Typically one is concerned with extremizing the within-group distances of the data objects to be grouped or clustered.

For example, let c_ij be the metric distance between data items i and j of a set of items to be clustered into M groups. Suppose that the criterion is to maximize the within-group distances. One common way of proceeding (for example, see [12]) is to divide the data objects into the two most compact groups and repeat the procedure sequentially until M groups are obtained. Finding the two most compact groups can be accomplished by solving the following:

Maximize Σ_{i,j} c_ij x_i x_j + Σ_{i,j} c_ij (1 − x_i)(1 − x_j)

subject to Σ_i x_i ≥ 1,  x_i ∈ {0, 1} for all i,

where x_i = 1 if data item i is in group 1 and x_i = 0 if data item i is in group 2. When M = 2 the solution to this problem gives the optimal clustering. However, when this procedure is used sequentially, i.e., M > 2, the solution may very well be suboptimal. The problem of dividing the set into two groups, where we have only the single constraint Σ_i x_i ≥ 1, is solved efficiently by the method of generalized multipliers and our algorithm for unconstrained optimization, which we present in chapters two and three. In reviewing the literature for problems of nontrivial dimension we find no algorithms which guarantee optimality other than total enumeration. Therefore most clustering algorithms are concerned with a search for the best alternative among a small subset of alternatives. Rubin [42] reported an algorithm which gave "good" results for a clustering problem consisting of 150 items.

The algorithm can be classified essentially as a hill climbing procedure, as with Kernighan and Lin, which ran moderately fast for these types of problems; the reported times for this algorithm ranged from 3 to 12 minutes on the IBM 7094. To obtain a feel for the combinatorial complexity of problems of this type, consider the following. A set of N objects can be partitioned into M disjoint subsets in S(N,M) ways, where S(N,M) is the Stirling number of the second kind,

S(N, M) = (1/M!) Σ_{K=0}^{M} (−1)^{M−K} (M choose K) K^N.

For the following values of M and N, Jensen [21] gives

S(25, 10) = 1,203,163,392,175,387,500
S(24, 6) = 6,090,236,036,084,530.

So even with program partitioning, where some partitions are infeasible because of size constraints, we are confronted by an enormous combinatorial problem when we consider partitions for sets where N = 125 and M = 20, as is the case for a typical compiler. Lawler et al. [32] report an algorithm for partitioning logical networks into modules such that intermodular delay is minimized. The procedures are developed first for networks represented by trees and then generalized for the case of acyclic networks; optimal clusterings are obtained provided duplication of network elements is allowed. The procedure consists of an efficient labelling method which is similar to that found in [24].
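The size of this search space is easy to reproduce. A short sketch using the standard recurrence S(n,k) = k·S(n−1,k) + S(n−1,k−1) computes Stirling numbers of the second kind exactly, so the figures quoted from Jensen [21] can be checked directly; the function name is mine.

```python
# Stirling numbers of the second kind from the recurrence
# S(n, k) = k*S(n-1, k) + S(n-1, k-1), with S(0, 0) = 1.
from functools import lru_cache

@lru_cache(maxsize=None)
def stirling2(n, k):
    if n == 0 and k == 0:
        return 1
    if n == 0 or k == 0:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

# Number of ways to partition N objects into M non-empty groups.
print(stirling2(25, 10))
print(stirling2(24, 6))
```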

1.3 Scope of Dissertation

In chapter two we develop a two-phase algorithm for maximization of an unconstrained quadratic function. The third chapter is devoted to techniques used to convert constrained optimization problems to unconstrained problems. Chapter four is a chapter of applications in which we give a model for program partitioning along with an algorithm for partitioning based upon solutions of the problem defined in section 1.1. In chapter five we give some alternative strategies for program partitioning and some comparative results. Chapter six is an actual application of program partitioning to a large systems program. Finally, in chapter seven we give conclusions and future research problems.

Chapter 2

Unconstrained Optimization of a Discrete Quadratic Function

In this chapter we develop an algorithm for the maximization of an unconstrained quadratic function in zero/one variables. The algorithm will be the basic tool used in the solution of the constrained problems defined in the first chapter. Basically the unconstrained problem is defined as

(2.1)  Maximize Y(x) = Σ_{i,j} c_ij x_i x_j − Σ_j b_j x_j,

where c_ij ≥ 0 for all i and j, b_j is real, and x_j ∈ {0, 1} for all j. In all our subsequent discussions we assume that c_ii = 0; if a term c_ii x_i x_i appears in the objective function it can be replaced by c_ii x_i, since x_i² = x_i. We simply set such a nonzero coefficient c_ii to zero and change the linear coefficient of x_i to b_i − c_ii.

One of the main difficulties in the solution of (2.1), i.e., obtaining a free maximum, is the dimension of the problems with which we are concerned. For problems with more than about 40 variables, conventional mathematical programming methods do not exist or are inefficient when directly applied. In this chapter we devise an algorithm for obtaining a free maximum which has two phases. The first phase of the algorithm is a direct approach; sufficient conditions are developed to determine whether the variables of a given set are either zero or one in a maximizing solution. These tests are not exhaustive, i.e., not all values for all x_i are found, but they are sufficiently powerful to find a large number of the boolean values in a maximizing solution. Some computational results are given using these tests, where the test problems ranged from 6 to 150 variables and there were up to 600 nonzero coefficients c_ij. The results of these problems, where c_ij and b_j were randomly generated, suggested that the first phase of this algorithm was sufficiently powerful to determine enough of the variables so that the remaining variables in a maximizing solution could be determined by more conventional techniques. In the second phase of the algorithm we deal with a problem of exactly the same form as (2.1) but of considerably smaller dimension. In this phase we use an implicit enumeration scheme for determination of the remaining variables.

2.1 Phase I Branching Algorithm

In this section we give the proofs of a number of sufficiency tests which allow easy determination of a subset of the variables which maximize (2.1). The first phase of our algorithm is really a structured search of the space of feasible solutions (all boolean n-vectors) in which we can determine easily which paths are not to be taken if we are to determine an optimum solution vector x*. We start with a test to determine which variables are to be zero in a solution vector x* which maximizes Y in equation (2.1). Note that zero is a lower bound on the free maximum of Y. This observation leads to the following theorem.

Theorem 1a. A boolean vector x* = (x_1*, ..., x_n*) maximizes

Y = Σ_{i,j} c_ij x_i x_j − Σ_j b_j x_j,  c_ij ≥ 0, b_j ≥ 0 for all i, j,

only if x_i* = 0 whenever Σ_j (c_ij + c_ji) < b_i.

Proof. Assume that x* = (x_1*, ..., x_{i-1}*, 1, x_{i+1}*, ..., x_n*) maximizes Y and suppose that Σ_j (c_ij + c_ji) < b_i. Consider Y(x*) = z + x_i* (Σ_j (c_ij + c_ji) x_j* − b_i), where z contains all terms of Y(x*) not involving x_i*. Since Σ_j (c_ij + c_ji) x_j* ≤ Σ_j (c_ij + c_ji) < b_i, we have

Y(x*) = z + 1 · (Σ_j (c_ij + c_ji) x_j* − b_i) < z + 0;

hence x* cannot maximize Y.  Q.E.D.

Theorem 1b. A boolean vector x* = (x_1*, ..., x_n*) maximizes

Y = Σ_{i,j} c_ij x_i x_j − Σ_j b_j x_j,  c_ij ≥ 0 for all i, j,

only if x_i* = 1 whenever b_i < 0.

Proof. Assume x* = (x_1*, ..., x_{i-1}*, 0, x_{i+1}*, ..., x_n*) maximizes Y and suppose that b_i < 0. Consider Y(x*) = z + x_i* [Σ_j (c_ij + c_ji) x_j* − b_i], where z contains all terms of Y(x*) not involving x_i*. Since b_i < 0 we have

Y(x*) = z + 0 · [Σ_j (c_ij + c_ji) x_j* − b_i] < z + 1 · [Σ_j (c_ij + c_ji) x_j* − b_i].  Q.E.D.
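Taken together, Theorems 1a and 1b already fix some variables directly from the data. The following sketch applies just these two tests once, on hypothetical coefficients; the full Phase I algorithm of Figure 2.1 combines them with the tests developed below.

```python
# Sketch of the simplest Phase I tests (Theorems 1a and 1b), applied once.
# x_i is fixed to 0 if sum_j (c_ij + c_ji) < b_i, and to 1 if b_i < 0.
# c is a dict of nonnegative coefficients c[i, j]; b is the list of b_j.
def fix_by_theorems_1a_1b(c, b):
    n = len(b)
    fixed = {}
    for i in range(n):
        row = sum(c.get((i, j), 0.0) + c.get((j, i), 0.0) for j in range(n))
        if row < b[i]:
            fixed[i] = 0          # Theorem 1a: x_i* = 0
        elif b[i] < 0:
            fixed[i] = 1          # Theorem 1b: x_i* = 1
    return fixed

# Hypothetical data: variable 0 is dominated by b_0, variable 2 has b_2 < 0.
c = {(0, 1): 0.5, (1, 2): 3.0}
b = [2.0, 1.0, -1.0]
print(fix_by_theorems_1a_1b(c, b))   # {0: 0, 2: 1}
```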

We use these initial theorems as a first step (see the flow chart of Figure 2.1) in our algorithm for finding a maximizing vector of (2.1). We next consider additional sufficient conditions for setting a boolean variable x_i of equation (2.1) equal to one.

Sufficient Conditions for x_i = 1

Given an x_i we want to decide if it is desirable to set x_i equal to one. The main idea is that when we set an x_i equal to one, we should be certain that the contribution to the objective function is nonnegative, since we can always make a nonnegative (zero) contribution, with respect to a given variable, by setting that variable equal to zero in equation (2.1). For example, suppose for some i and j, c_ij ≥ b_i + b_j in (2.1). Then we know that we can set x_i = 1 and x_j = 1, since in this case the contribution to (2.1) can never be negative. We express this sufficient condition more generally in Theorem 2.

Definitions

1) We define the following "coefficient sets" S_i and S_i^+, which will be used in the construction of the algorithm for maximization of Y, as

S_i = {j : c_ij ≠ 0 or c_ji ≠ 0},
S_i^+ = {j : c_ij − b_j ≥ 0 or c_ji − b_j ≥ 0} ⊆ S_i.

We note that if c_ij, c_ji, and b_j are all zero then j ∉ S_i^+; and for every i, i ∉ S_i^+, since c_ii = 0.

2) We define the assignment sets A, A'_i, and Ā_i as

A = {j : the optimum value of x_j is not known},
A'_i = {j : c_ij ≠ 0 or c_ji ≠ 0, x_j* = 1},
Ā_i = {j : c_ij ≠ 0 or c_ji ≠ 0, x_j* = 0}.

We note at this point that after application of Theorem 1a the condition Σ_j (c_ij + c_ji) ≥ b_i is satisfied for each x_i not set to zero, i ∈ A, and Theorem 1b guarantees that b_i ≥ 0 if i ∈ A.

Theorem 2. If

Σ_{j∈S_i^+} (c_ij + c_ji) ≥ b_i + Σ_{j∈S_i^+} b_j,

then x_t* = 1 for all t ∈ S_i^+ and x_i* = 1.

Proof. Let S_i^+ = {t_1, t_2, ..., t_m}; notice |S_i^+| = m. The objective function (2.1) can be written by grouping, first, the terms which couple the indices of S_i^+ ∪ {i} with indices outside this set; second, the terms involving only indices of S_i^+ ∪ {i}; and third, the terms not involving S_i^+ ∪ {i} at all. Let I ⊆ S_i^+ ∪ {i} and let Y^0 = Y(x') where

x_j' = 0 for j ∈ I;  x_j' = 1 for j ∈ [S_i^+ ∪ {i}] − I;  x_j' arbitrary for j ∉ S_i^+ ∪ {i}.

Let Y^1 = Y(x'') where

x_j'' = 1 for j ∈ S_i^+ ∪ {i};  x_j'' = x_j' for j ∉ S_i^+ ∪ {i}.

We claim Y^1 − Y^0 ≥ 0. First let I = S_i^+ ∪ {i}. Then

Y^1 − Y^0 = [Σ_{j∉S_i^+∪{i}} (c_{t_1 j} + c_{j t_1}) x_j + ⋯ + Σ_{j∉S_i^+∪{i}} (c_{t_m j} + c_{j t_m}) x_j + Σ_{j∉S_i^+∪{i}} (c_ij + c_ji) x_j] + Σ_{h,ℓ∈S_i^+} c_{hℓ} + Σ_{j∈S_i^+} (c_ij + c_ji) − b_i − Σ_{j∈S_i^+} b_j.

The bracketed terms always sum to a nonnegative value, regardless of the values of the x_j's, and the terms Σ_{h,ℓ∈S_i^+} c_{hℓ} are nonnegative. By our hypothesis the sum of the remaining terms is nonnegative; hence any maximizing vector x* cannot have x_j* = 0 for all j ∈ S_i^+ ∪ {i}.

We now show that no proper subset of the variables with indices in S_i^+ ∪ {i} can be zero in a maximizing vector x*. Suppose I is a proper subset of S_i^+ ∪ {i}, and first assume that I = {t_1, t_2, ..., t_r}, r ≤ m, so that i ∉ I. Then

Y^1 − Y^0 = [Σ_{j∉S_i^+∪{i}} (c_{t_1 j} + c_{j t_1}) x_j + ⋯ + Σ_{j∉S_i^+∪{i}} (c_{t_r j} + c_{j t_r}) x_j] + Σ_{ℓ,h∈I} c_{ℓh} + Σ_{ℓ∈I, h∈[S_i^+∪{i}]−I} (c_{ℓh} + c_{hℓ}) − Σ_{t_j∈I} b_{t_j}.

The bracketed terms and the sum Σ_{ℓ,h∈I} c_{ℓh} are nonnegative. Moreover, the last coefficient sum contains, for each t_j ∈ I, the terms c_{i t_j} + c_{t_j i} (take h = i), and since t_j ∈ I ⊆ S_i^+ we have c_{i t_j} − b_{t_j} ≥ 0 or c_{t_j i} − b_{t_j} ≥ 0. Hence Y^1 − Y^0 ≥ 0.

Suppose now that the set I contains i, that is, I = {i, t_1, t_2, ..., t_r} with r < m. Then

Y^1 − Y^0 = [Σ_{j∉S_i^+∪{i}} (c_ij + c_ji) x_j + Σ_{j∉S_i^+∪{i}} (c_{t_1 j} + c_{j t_1}) x_j + ⋯ + Σ_{j∉S_i^+∪{i}} (c_{t_r j} + c_{j t_r}) x_j] + Σ_{ℓ,h∈I} c_{ℓh} + Σ_{ℓ∈I, h∈[S_i^+∪{i}]−I} (c_{ℓh} + c_{hℓ}) − b_i − Σ_{t_j∈I} b_{t_j}.

The bracketed terms have a nonnegative sum, and the last two coefficient sums together contain all of the terms c_ij + c_ji with j ∈ S_i^+. Therefore

Y^1 − Y^0 ≥ Σ_{j∈S_i^+} (c_ij + c_ji) − b_i − Σ_{t_j∈I} b_{t_j} ≥ Σ_{j∈S_i^+} (c_ij + c_ji) − b_i − Σ_{j∈S_i^+} b_j ≥ 0,

recalling that b_j ≥ 0. Hence no subset of the variables with indices in S_i^+ ∪ {i} can be zero in any x which maximizes Y. This completes the proof of Theorem 2.

The motivation for our next theorem comes from examining the set of assigned variables A'_j for an unassigned variable x_j, j ∈ A. Suppose we have an unassigned variable x_j and there exists a q such that c_qj ≥ b_j and x_q* = 1, i.e., q ∈ A'_j.

Then we see that x_j can be assigned the value one, since we are assured of a nonnegative contribution to Y, our objective function. Writing this condition in a more general form gives Theorem 3.

Theorem 3. If Σ_{j∈A'_i} (c_ij + c_ji) ≥ b_i, then x_i* = 1.

Proof: Obvious.

The next theorem is a generalization of Theorem 2, in that we consider several sets S_i^+ simultaneously.

Theorem 4. Let L ⊆ A, let P = ∪_{i∈L} S_i^+, and suppose L ⊆ P. If for each i ∈ L there exist j and k in L, j ≠ k ≠ i, such that i ∈ S_j^+ and j ∈ S_k^+, then x_j* = 1 for all j ∈ P.

Proof. Let I ⊆ P and let I_1 = I ∩ L, I_2 = I − I_1. Let Y^1 = Y(x') where x_i' = 1 for i ∈ P and x_i' is arbitrary for i ∉ P. Let Y^0 = Y(x'') where x_i'' = 0 for i ∈ I, x_i'' = 1 for i ∈ P − I, and x_i'' = x_i' for i ∉ P. We claim Y^1 − Y^0 ≥ 0 for all I ⊆ P. We have

Y^1 − Y^0 = Σ_{i∈I, j∉P} (c_ij + c_ji) x_j + [Σ_{i,j∈I} c_ij + Σ_{i∈I, j∈P−I} (c_ij + c_ji) − Σ_{i∈I} b_i].

The terms outside the brackets are nonnegative. We can write the terms within the brackets by separating the indices of I_1 from those of I_2: the coefficient terms associated with I_1 less Σ_{i∈I_1} b_i, plus the coefficient terms associated with I_2 less Σ_{i∈I_2} b_i. Now for each i ∈ I_1, since I_1 ⊆ L ⊆ P, there exists a j ∈ L such that i ∈ S_j^+; therefore we have either c_ij ≥ b_i or c_ji ≥ b_i. By hypothesis there also exists a k ∈ L, k ≠ i, such that j ∈ S_k^+, so we have a distinct c_jk ≥ b_j or c_kj ≥ b_j. Therefore, for any i and j in I_1 we have b_i and b_j "covered" by distinct c's. This means the first of the bracketed expressions is nonnegative. Suppose i ∈ I_2; then there exists a j ∈ L such that i ∈ S_j^+. Now j ∉ I_2 (by construction); therefore, for each i ∈ I_2 there is a distinct c_ij or c_ji such that either c_ij ≥ b_i or c_ji ≥ b_i. Thus the second of the bracketed expressions is nonnegative, and Y^1 − Y^0 ≥ 0. This completes the proof of Theorem 4.

Definition. An index i is said to meet the adjoint test with respect to a set of indices P if

Σ_{j∈P} (c_ij + c_ji) ≥ b_i.

After we find a set of variables for which the optimum value is known, we readjust the constants b_j to the new values

b_j' = b_j − Σ_{i∈A'_j} (c_ij + c_ji).

If the modified value b_j' of b_j is negative, i.e., the index j meets the adjoint condition with respect to the set A'_j, the test of Theorem 3 is successful; in that case the corresponding x_j* is set to one. If, however, the test of Theorem 3 fails, the corresponding b_j is replaced by the modified b_j' and we continue our search for the optimum values x_j*, j ∈ A.

Definition. S_i^{++} = {j : there exists a sequence i, j_1, j_2, ..., j_m, j such that j_1 ∈ S_i^+, j_2 ∈ S_{j_1}^+, ..., j ∈ S_{j_m}^+}.

Example (S_1^{++} and S_3^{++}). Let S_1^+ = {3, 4, 5}, S_2^+ = {6}, S_3^+ = {2, 7}, S_4^+ = {9}, and S_5^+ = S_6^+ = S_7^+ = S_9^+ = ∅. Then

S_1^{++} = {2, 3, 4, 5, 6, 7, 9},  S_3^{++} = {2, 6, 7}.
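The reachability definition of S_i^{++} lends itself to a simple closure computation. The sketch below builds S_i^+ directly from its definition and then S_i^{++} by following chains of S^+ memberships; the data reproduce the example above, and the function names are mine.

```python
# S_i^+ from its definition, and S_i^{++} as the set of indices reachable from i
# through chains j1 in S_i^+, j2 in S_j1^+, ...  (illustration only).
def s_plus(c, b, i, indices):
    return {j for j in indices if j != i
            and (c.get((i, j), 0.0) != 0 or c.get((j, i), 0.0) != 0)
            and (c.get((i, j), 0.0) >= b[j] or c.get((j, i), 0.0) >= b[j])}

def s_plus_plus(splus, i):
    reached, frontier = set(), list(splus[i])
    while frontier:
        j = frontier.pop()
        if j not in reached:
            reached.add(j)
            frontier.extend(splus[j])
    return reached

# The S^+ sets of the example above, entered directly.
splus = {1: {3, 4, 5}, 2: {6}, 3: {2, 7}, 4: {9},
         5: set(), 6: set(), 7: set(), 9: set()}
print(sorted(s_plus_plus(splus, 1)))   # [2, 3, 4, 5, 6, 7, 9]
print(sorted(s_plus_plus(splus, 3)))   # [2, 6, 7]
```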

Definition. Let P ⊆ A be a set of indices. A subset M_j ⊆ P is said to be a mutually covering set (MCS) with respect to P if

(i) M_j = s_j ∪ {j}, where s_j ⊆ S_j^+;
(ii) if i ∈ s_j and k ∈ P, then i ∈ S_k^+ if and only if k = j;
(iii) for k ∈ P, j ∈ S_k^+ if and only if k ∈ s_j.

Definition. Let P ⊆ A be a set of indices. An index j is said to be "covered" in P if there exists an i ∈ P such that j ∈ S_i^+. If, in addition, there exists a k ∈ P such that k ≠ j and i ∈ S_k^+, then j is "distinctly covered" by i. Finally, if there exists a set K ⊆ P such that |K| > 1 and for each k ∈ K we have j ∈ S_k^+, then the index j is "multiply covered" by the indices of K.

We first prove a few lemmas concerning MCS which will help us establish additional sufficiency tests for x_j* = 1.

Lemma 1a. Let i ∈ S_i^{++}. If S_i^{++} does not contain a MCS with cardinality greater than one, then each element j ∈ S_i^{++} is distinctly covered by some element k ∈ S_i^{++}.

Proof. Let j ∈ S_i^{++} and assume that j is not distinctly covered. Let T ⊆ S_i^{++} be the set of indices which cover j. If |T| > 1, then j is multiply covered. Since j is not distinctly covered we must have, for each t ∈ T, that t ∈ S_j^+. Now for at least one t ∈ T there must be a k ∈ S_i^{++}, k ≠ j, such that t ∈ S_k^+; otherwise the set M_j = {j} ∪ T is a MCS in S_i^{++} with cardinality greater than one, which is a contradiction. But if for some t ∈ T we have a k ≠ j, k ∈ S_i^{++}, such that t ∈ S_k^+, then by definition t distinctly covers j.  Q.E.D.

Lemma 1b. Let M_i and M_j be two MCS with respect to P. If i ∈ s_j, then M_i = {i} if |M_j| > 2, and M_i = M_j if |M_j| = 2.

Proof. In addition to i, let k, k ≠ i, be an element of M_i, i.e., k ∈ s_i; then i ∈ S_k^+. k has to be equal to j; otherwise i ∈ s_j and i ∈ S_k^+ with k ≠ j, which contradicts the fact that M_j is a MCS. In the case that k = j, k cannot be in M_i if |M_j| > 2; for then there is an ℓ ∈ M_j, ℓ ≠ i, j, such that j ∈ S_ℓ^+, which contradicts the fact that M_i is a MCS. Hence M_i = {i} if |M_j| > 2, and M_i = {i, j} = M_j if |M_j| = 2.  Q.E.D.

Lemma 2. Let M_i and M_j be two MCS with respect to P. If j ≠ i, then s_i ∩ s_j = ∅.

Proof. Case 1: |M_i| = |M_j| = 1; obvious. Case 2: |M_j| > 1 and i ∈ s_j; the result follows from Lemma 1b. Case 3: |M_j| > 1, i ∉ s_j, j ∉ s_i. Suppose that s_i ∩ s_j ≠ ∅. Then there is an ℓ ∈ s_i ∩ s_j, ℓ ≠ i, j. It follows that ℓ ∈ S_i^+ and ℓ ∈ S_j^+, which contradicts condition (ii) of the definition of a MCS.  Q.E.D.

Lemma 3a. S_i^{++} = (∪_{j∈S_i^+} S_j^{++}) ∪ S_i^+.

Proof. If j ∈ S_i^{++} then by definition there exists a sequence i, j_1, j_2, ..., j_t, j such that j_1 ∈ S_i^+, j_2 ∈ S_{j_1}^+, ..., j ∈ S_{j_t}^+. There are two cases to consider. Case 1: t = 0; in this case j ∈ S_i^+. Case 2: t ≠ 0; in this case j ∈ S_{j_1}^{++} and j_1 ∈ S_i^+. Hence S_i^{++} ⊆ (∪_{j∈S_i^+} S_j^{++}) ∪ S_i^+.

Conversely, if j ∈ (∪_{t∈S_i^+} S_t^{++}) ∪ S_i^+, then either j ∈ ∪_{t∈S_i^+} S_t^{++} or j ∈ S_i^+. If j ∈ S_i^+, then obviously j ∈ S_i^{++}. If j ∈ ∪_{t∈S_i^+} S_t^{++}, then j ∈ S_t^{++} for some t ∈ S_i^+; therefore there exists a sequence t, j_1, j_2, ..., j_m, j where t ∈ S_i^+, j_1 ∈ S_t^+, j_2 ∈ S_{j_1}^+, ..., j ∈ S_{j_m}^+, and hence j ∈ S_i^{++}. Therefore (∪_{j∈S_i^+} S_j^{++}) ∪ S_i^+ ⊆ S_i^{++}.  Q.E.D.

Lemma 3b. S_i^{++} = (∪_{j∈S_i^{++}} S_j^{++}) ∪ S_i^+.

Proof. Let P = (∪_{j∈S_i^{++}} S_j^{++}) ∪ S_i^+. By Lemma 3a we have S_i^{++} = (∪_{j∈S_i^+} S_j^{++}) ∪ S_i^+, and since S_i^+ ⊆ S_i^{++}, S_i^{++} ⊆ P. The proof of the lemma is complete if we show that P ⊆ S_i^{++}. Let k ∈ P; there are two cases. Case 1: k ∈ S_i^+; then obviously k ∈ S_i^{++}. Case 2: k ∉ S_i^+ and k ∈ S_j^{++} for some j ∈ S_i^{++}; in this case there is a sequence i, j_1, j_2, ..., j_m, j, k_1, k_2, ..., k_n, k such that j_1 ∈ S_i^+, ..., j ∈ S_{j_m}^+, k_1 ∈ S_j^+, ..., k ∈ S_{k_n}^+. Therefore k ∈ S_i^{++}, and so P ⊆ S_i^{++}.  Q.E.D.

Lemma 4. If i ∈ S_i^{++}, then there exists an L ⊆ S_i^{++} such that S_i^{++} = ∪_{j∈L} S_j^+.

Proof. We can prove the existence of such a set L by showing that S_i^{++} is itself one such set. Let P = ∪_{j∈S_i^{++}} S_j^+. By definition S_j^+ ⊆ S_j^{++}, so

P = ∪_{j∈S_i^{++}} S_j^+ ⊆ ∪_{j∈S_i^{++}} S_j^{++} ⊆ (∪_{j∈S_i^{++}} S_j^{++}) ∪ S_i^+ = S_i^{++}

by Lemma 3b. The proof is completed if we show that S_i^{++} ⊆ P. Let k ∈ S_i^{++}, i.e., there exists a sequence i, k_1, k_2, ..., k_n, k such that k_1 ∈ S_i^+, ..., k ∈ S_{k_n}^+. There are two cases to consider. Case 1: n = 0; in this case k ∈ S_i^+ and, since i ∈ S_i^{++}, k ∈ P. Case 2: n ≠ 0; in this case k ∈ S_{k_n}^+ and k_n ∈ S_i^{++}, so k ∈ P. Hence S_i^{++} ⊆ P and the proof is complete.

Lemma 5. Let P = S_i^{++} ∪ {i} or P = S_i^{++}. A MCS M_j with |M_j| > 1 exists with respect to P only if i ∈ S_i^{++}. If such an M_j exists in P, then i ∈ M_j.

Proof. First let us show that if i ∉ S_i^{++} then no M_j can exist with respect to P such that |M_j| > 1. We first note that j cannot be equal to i; otherwise |M_i| > 1 implies that there exists a k ∈ s_i such that i ∈ S_k^+, and hence i ∈ S_i^{++}. So M_j ⊆ P and j ≠ i. Notice also that i ∉ M_j; otherwise i ∈ s_j, which implies that i ∈ S_j^+, and since j ∈ P − {i} ⊆ S_i^{++} this would imply that i ∈ S_i^{++}.

Let k ≠ j, i be an element of M_j. Since k ∈ S_i^{++} there exists a sequence i, j_1, j_2, ..., j_m, k such that j_1 ∈ S_i^+, ..., k ∈ S_{j_m}^+. There are two things to be noted. First, m ≠ 0; otherwise k ∈ S_i^+, which cannot be the case because, by the definition of a MCS, if k ∈ M_j, k ≠ j, then k ∈ S_i^+ if and only if i = j. Secondly, j_m = j, because k ∈ S_{j_m}^+ and k ∈ M_j, k ≠ j, imply by the definition of a MCS that j_m = j. If j_m = j then, by the definition of M_j, j_{m-1} ∈ s_j, and consequently j_{m-2} = j. We continue in this way until we reach j_1. At this point, depending on whether m is odd or even, either j_1 = j or j_1 ∈ s_j. If j_1 = j then we have j ∈ S_i^+, which implies that i ∈ M_j, which we have already shown not to be the case. If j_1 ∈ s_j then, since j_1 ∈ S_i^+, according to the definition of a MCS i has to be equal to j, which we have shown not to be the case. Therefore we have shown that if i ∉ S_i^{++} there is no M_j ⊆ P such that |M_j| > 1.

Now let i ∈ S_i^{++}, i.e., P = S_i^{++} (since then S_i^{++} ∪ {i} = S_i^{++}). Let M_j ⊆ P be such that |M_j| > 1. If j = i, then i ∈ M_j by the definition of a MCS. If j ≠ i, there is a k ∈ M_j, k ≠ j; if k = i the second part of the lemma is proved. If k ≠ i there exists a sequence i, j_1, j_2, ..., j_m, k such that j_1 ∈ S_i^+, ..., k ∈ S_{j_m}^+. If m = 0 then, by the same arguments as previously used, i = j. If m ≠ 0, then j_m = j as previously shown, and again by the above arguments either j_1 = j or j_1 ∈ s_j. In the former case i ∈ M_j; in the latter i = j, so that in every case i ∈ M_j. This completes the proof of Lemma 5.

Corollary. A set M_j such that |M_j| > 1 is a MCS with respect to P = S_i^{++} ∪ {i} if and only if it is a MCS with respect to S_i^{++}.

Proof. Let M_j be a MCS with respect to P = S_i^{++} ∪ {i}, where |M_j| > 1. We claim that M_j is also a MCS with respect to S_i^{++}. By Lemma 5, if such an M_j exists with respect to P, then i ∈ S_i^{++}; in this case S_i^{++} = P, and hence M_j is a MCS with respect to S_i^{++}. Conversely, if M_j is a MCS with respect to S_i^{++}, then since S_i^{++} ⊆ P it follows directly that M_j is a MCS with respect to P.  Q.E.D.

In defining MCS with respect to a set S_i^{++} it is possible for sets M_j, M_k of cardinality two to be identical. This can be seen from the following example. Let S_1^+ = {2, 3, 4}, S_2^+ = {1, 5}, and S_3^+ = S_4^+ = S_5^+ = ∅. Then S_1^{++} = {1, 2, 3, 4, 5}, and M_1 and M_2 with respect to S_1^{++} are both {1, 2}. Mutually covering sets M_j with respect to a set S_i^{++} are shown to be unique under certain circumstances by Lemma 6.

Lemma 6. Let M_j be a MCS with respect to S_i^{++} such that |M_j| > 1.
(i) If |M_j| > 2, then M_j is the only MCS of cardinality greater than one in S_i^{++}.
(ii) If |M_j| = 2, then there are exactly two MCS of cardinality greater than one, namely M_j and M_i; furthermore M_i = M_j.

Proof. Let M_j and M_k, j ≠ k ≠ i, be MCS in S_i^{++}, both of cardinality greater than one. According to Lemma 5, i ∈ M_j and i ∈ M_k, i.e., i ∈ s_j ∩ s_k. But this is impossible by Lemma 2. If k = i, then the lemma follows from the fact that i ∈ M_j together with Lemma 1b.  Q.E.D.

We note that if i ∈ S_i^{++} it is not necessary for a MCS of cardinality greater than one to exist. This is shown by the following example. Let S_1^+ = {2, 3, 4}, S_2^+ = {1}, S_3^+ = {2}, and S_4^+ = ∅. Then S_1^{++} = {1, 2, 3, 4}, and no MCS with cardinality greater than one exists in S_1^{++}.
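Because the MCS conditions are easy to misapply by hand, a direct mechanical check can help. The sketch below tests conditions (i)-(iii) for a candidate set M_j; the function and data names are mine, the loop is a literal reading of the definition rather than part of the Phase I algorithm, and the calls reproduce the two examples above (the pair of identical MCS of cardinality two, and the example just given in which no such set exists).

```python
# Literal check of the MCS conditions (i)-(iii) for a candidate M_j inside P.
# splus[k] holds the set S_k^+.
def is_mcs(splus, P, j, M):
    s_j = M - {j}
    if j not in M or not M <= P or not s_j <= splus[j]:      # condition (i)
        return False
    for i in s_j:                                            # condition (ii)
        if any((i in splus[k]) != (k == j) for k in P):
            return False
    return all((j in splus[k]) == (k in s_j) for k in P)     # condition (iii)

# First example: S_1+={2,3,4}, S_2+={1,5}; M_1 and M_2 are both the MCS {1,2}.
splus_a = {1: {2, 3, 4}, 2: {1, 5}, 3: set(), 4: set(), 5: set()}
print(is_mcs(splus_a, {1, 2, 3, 4, 5}, 1, {1, 2}),
      is_mcs(splus_a, {1, 2, 3, 4, 5}, 2, {1, 2}))           # True True

# Second example: S_1+={2,3,4}, S_2+={1}, S_3+={2}; no MCS of cardinality > 1.
splus_b = {1: {2, 3, 4}, 2: {1}, 3: {2}, 4: set()}
print(is_mcs(splus_b, {1, 2, 3, 4}, 2, {1, 2}))              # False
```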

The following lemma is useful in making certain that there exists no MCS M_j, |M_j| > 1, in S_i^{++}.

Lemma 7. If for each j ∈ S_i^+ we have i ∉ S_j^+, then there can be no M_j ⊆ S_i^{++} such that |M_j| > 1.

Proof. Obviously M_i cannot be in S_i^{++} with |M_i| > 1 if for each j ∈ S_i^+, i ∉ S_j^+. Suppose for j ≠ i a MCS M_j exists in S_i^{++} where |M_j| > 1. By Lemma 5, i ∈ M_j. Now by (ii) and (iii) of the definition of a MCS we have j ∈ S_i^+ and i ∈ S_j^+; but by hypothesis, if j ∈ S_i^+ then i ∉ S_j^+, hence a contradiction. Therefore no MCS M_j exists such that |M_j| > 1.  Q.E.D.

Theorem 5. If i ∈ S_i^{++} and no MCS M_j ⊆ S_i^{++} exists such that |M_j| > 1, then x_j* = 1 for all j ∈ S_i^{++}.

Proof. This follows from Lemma 1a, Lemma 4, and Theorem 4.  Q.E.D.

The next set of theorems deals with the cases involving S_i^{++}, both when i ∈ S_i^{++} and when i ∉ S_i^{++}.

Theorem 6. Suppose there exists a MCS, M_ℓ, with respect to S_i^{++} such that |M_ℓ| > 1. Then if there exists an index t ∈ M_ℓ such that

Σ_{k ∈ S_i^{++} − S_t^+} (c_tk + c_kt) ≥ b_t,

then x_k* = 1 for all k ∈ S_i^{++}.

Proof. We have i ∈ S_i^{++}, and suppose that there exists a MCS, M_ℓ, in S_i^{++} such that |M_ℓ| > 1. Let t ∈ M_ℓ be an index for which the condition of the theorem holds. Let Y^1 = Y(x') where x_k' = 1 for k ∈ S_i^{++} and x_k' is arbitrary for k ∉ S_i^{++}. Let Y^0 = Y(x'') where x_k'' = 0 for k ∈ I ⊆ S_i^{++}, x_k'' = 1 for k ∈ S_i^{++} − I, and x_k'' = x_k' otherwise. The theorem will be proved if we can show that Y^1 − Y^0 ≥ 0 for all I ⊆ S_i^{++}. Consider

Y^1 − Y^0 = Σ_{j∈I, k∉S_i^{++}} (c_jk + c_kj) x_k + [the coefficient terms c_jk, c_kj with j ∈ I and k ∈ S_i^{++}, less Σ_{k∈I} b_k].

The terms outside the brackets are nonnegative. We have two cases to consider.

Case 1: t ∉ I. We can write the terms within the brackets as

(a)  [the coefficient terms associated with the indices k ∈ I ∩ M_ℓ] + [the coefficient terms associated with the indices k ∈ I − M_ℓ] − Σ_{k∈I∩M_ℓ} b_k − Σ_{k∈I−M_ℓ} b_k.

Consider any k ∈ I ∩ M_ℓ: if k ≠ ℓ then this index k is "covered" by ℓ, i.e., c_ℓk ≥ b_k or c_kℓ ≥ b_k. If k = ℓ then, since t ∈ M_ℓ and t ∉ I, we can take t as a unique coverer of ℓ, i.e., c_tℓ ≥ b_ℓ or c_ℓt ≥ b_ℓ. Now consider any k ∈ I − M_ℓ; since k ∈ S_i^{++} it is clear that there exists at least one j ∈ S_i^{++} "covering" k, i.e., c_jk ≥ b_k or c_kj ≥ b_k. We note that since there is a single MCS in S_i^{++} with cardinality greater than one, there is no chance that {j, k} forms a mutually covering set with respect to S_i^{++}. It therefore follows that expression (a) is nonnegative.

Case 2: t ∈ I. Now we write the terms within the brackets as

(b)  [the coefficient terms associated with the indices k ∈ I ∩ M_ℓ, k ≠ t] + [the coefficient terms associated with the indices k ∈ I − M_ℓ] − Σ_{k∈I−{t}} b_k + {Σ_{j∈S_i^{++}−S_t^+} (c_tj + c_jt) − b_t} + (c_tℓ + c_ℓt).

Each k ∈ I ∩ M_ℓ, k ≠ t, is covered by the index ℓ, by the same argument given for Case 1. Similarly, all indices in I − M_ℓ are covered, again by the argument given for Case 1. Now t is the only index which is not covered, but the terms in the braces are nonnegative by hypothesis. It follows that expression (b) is nonnegative, since t always covers the index ℓ.  Q.E.D.

Theorem 7. Let i ∉ S_i^{++} and let

Σ_{j ∈ S_i^{++} − S_i^+} (c_ij + c_ji) ≥ b_i;

then x_j* = 1 for all j ∈ S_i^{++} ∪ {i}.

Proof. Let P = S_i^{++} ∪ {i}, and let Y^1 = Y(x') where x_j' = 1 for j ∈ P and x_j' is arbitrary for j ∉ P. Let I be an arbitrary subset of P, and let Y^0 = Y(x'') where x_j'' = 0 for j ∈ I, x_j'' = 1 for j ∈ P − I, and x_j'' = x_j' otherwise. We claim that Y^1 − Y^0 ≥ 0 for all I ⊆ P. We have

Y^1 − Y^0 = Σ_{j∈I, k∉P} (c_jk + c_kj) x_k + [the coefficient terms c_jk, c_kj with j ∈ I and k ∈ P, less Σ_{j∈I} b_j].

The terms outside the brackets are nonnegative; it remains to show that the terms within the brackets are nonnegative. We consider two cases, the first of which takes the index i to be in the set I. In this case the terms within the brackets may be written as

(a)  Σ_{j∈S_i^{++}} (c_ij + c_ji) − b_i − Σ_{j∈I−{i}} b_j,

together with additional nonnegative coefficient terms. Now by hypothesis we have Σ_{j∈S_i^{++}−S_i^+} (c_ij + c_ji) ≥ b_i; it therefore remains to show that the rest of expression (a) is nonnegative. We can write

Σ_{j∈I−{i}} b_j = Σ_{j∈(I−{i})∩S_i^+} b_j + Σ_{j∈I−{i}−S_i^+} b_j.

Now if j ∈ (I − {i}) ∩ S_i^+ then there exists a k ∈ P, namely k = i, such that c_kj ≥ b_j or c_jk ≥ b_j; this j is then "covered" by a term of the first summation of expression (a). Suppose j ∈ I − {i} − S_i^+; then, since by Lemma 5 there does not exist a MCS M_j with respect to P such that |M_j| > 1, we have j "covered" by a c_jk or c_kj with k ∈ S_i^{++}. It then follows that expression (a) is nonnegative, and we have completed the proof for the case i ∈ I.

The next case considers i ∉ I. In this case

Y^1 − Y^0 = Σ_{j∈I, k∉P} (c_jk + c_kj) x_k + [the coefficient terms c_jk, c_kj with j ∈ I and k ∈ P, less Σ_{j∈I} b_j].

As before, the terms outside the brackets are nonnegative; the terms within the brackets can be written as

(b)  [the coefficient terms associated with j ∈ I ∩ S_i^+] − Σ_{j∈I∩S_i^+} b_j + [the coefficient terms associated with j ∈ I − S_i^+] − Σ_{j∈I−S_i^+} b_j.

If j ∈ I ∩ S_i^+, then j is "covered" by i. If j ∈ I − S_i^+, then there exists a k ∈ P such that c_kj ≥ b_j or c_jk ≥ b_j, and since there is no MCS in P, j does not cover k uniquely. It then follows that (b) is nonnegative.  Q.E.D.

Theorem 7 is the last test in the first phase of the algorithm. A flow chart of the algorithm is presented in Figure 2.1. In Figure 2.1 the conditions of Theorems 1a and 1b, Theorem 2, and Theorem 3 are tested in the boxes numbered 1, 2, and 3, respectively. In box number 5 we use Lemma 7 to facilitate the testing of the conditions of Theorem 5. The tests of Theorems 6 and 7 are represented by boxes 6 and 4, respectively.

2.2 Computational Results Using Phase I

Table 2.1 summarizes the results of the computations for a series of test problems. The coefficients c_ij and b_j were randomly selected using a standard random number generator. To avoid trivial solutions, all b_j selected were positive. The number of variables for each test problem is designated by N; the number of distinct coefficients c_ij in the objective function is represented by M. The number of variables not found by the first phase of the algorithm is |A|; in Table 2.1 we list the maximum number not found in any of the five test problems. The times shown represent the running time of the algorithm, written in Fortran, on the IBM 360 model 67. Table 2.1 shows that the average running times ranged from .02 seconds for the 10-variable problems to 9.27 seconds for the 150-variable problems. The growth of the running time with N does not appear prohibitive. Due to the nature of the directed search of the first phase of the algorithm, the more significant growth parameter is the number of coefficients M in the cost function.

Figure 2.1  Flow chart of Phase I of the algorithm (reduction of the set A by the tests of theorems la, lb, 2, and 3, generation of the sets S_i+ and S_i++, and the tests of theorems 5, 6, and 7).

48

    N      M    Problems   Average Time   Maximum Time   Max |A|
                           (Seconds)      (Seconds)
    10     15      5           .02            .05           3
    25     50      5           .11           1.18           0
    50    100      5           .58            .82           2
    75    200      5          1.40           1.98           2
   100    300      5          2.36           3.84           0
   125    400      5          4.43           6.37           2
   150    600      5          9.27          12.16           0

Table 2.1*

* After this table was constructed we did generate one example consisting of 150 variables and 500 coefficients where |A| was 17 at the end of phase I.

49

Figure 2.2  Phase I running time for N = 150 versus the number of cost function coefficients M (M = 100 to 600).

50

of coefficients M in the cost function. For N fixed at 150, figure 2.2 shows the variation of running time with M. The solution times ranged from 1.78 seconds at M = 100 to 5.17 seconds at M = 600. The dashed curve was fitted to the data points. The time is seen to vary linearly with M. The slope is 5.0 milliseconds per coefficient, which says that for N = 150 about 500 milliseconds of additional phase I solution time is needed for every 100 additional nonzero coefficients in the objective function.

In summary it appears that the algorithm is quite effective in reducing the dimension of a quadratic function as defined by equation (2.1). It seems that the tests are sufficiently powerful to allow use of this algorithm for problems with a large number of (0,1) variables, especially in those cases where the coefficient matrix [c_ij] is rather sparse.

2.3 Phase II

The purpose of the first phase of the algorithm is to reduce the dimension of the problem. Once this has been accomplished we begin the second phase of our algorithm. In the second phase we obtain the optimum values for those indices which remain in the set A; therefore the problem with which we deal is

    Σ_{i,j∈A} c_ij x_i x_j - Σ_{j∈A} b_j x_j .

Note that some b_j's have been modified as suggested on page 31,

51

since many of the variables will be determined in the first phase of the algorithm. Since the dimension of the problem was small after application of phase I of the algorithm, we could use an enumerative procedure for finding the remaining variables. In section 2.5 we discuss an alternative approach when the number of variables is large.

2.3.1 Optimization by Implicit Enumeration

In phase II of our procedure we turn to implicit enumeration to determine the values of the remainder of the variables. We now review a rapid technique for implicit enumeration. We use a technique for searching the solution space based upon an algorithm of Lawler and Bell [31], developed for purposes of solving discrete variable optimization problems with a modest number of variables. The idea is based on the observation that if the solution vectors can be partially ordered, and the objective function and constraints are monotone nondecreasing functions of the variables, a large number of potential solution vectors can be skipped in a systematic search of the solution space.

2.3.2 Partial Ordering of Binary Vectors

We define a vector partial ordering as follows. Suppose ℓ = (ℓ_1, ℓ_2, ..., ℓ_n) and m = (m_1, m_2, ..., m_n) are binary vectors; we say

52

ℓ ≤ m if and only if ℓ_i ≤ m_i for i = 1, 2, ..., n. For example, if ℓ = (1 0 0 1 0) and m = (1 0 1 1 0) we have ℓ ≤ m. We can associate a numerical ordering with a set of binary vectors by associating with each vector ℓ the corresponding integer

    N(ℓ) = ℓ_1 2^{n-1} + ℓ_2 2^{n-2} + ... + ℓ_n 2^0 .

Observe that ℓ ≤ m implies N(ℓ) ≤ N(m). However, N(ℓ) ≤ N(m) does not imply ℓ ≤ m. If we list all binary n-vectors in numerical order, starting with (0, 0, ..., 0, 0), we observe that following an arbitrary vector ℓ there may (or may not) be several vectors ℓ' with the property ℓ ≤ ℓ'. These are vectors which differ from ℓ only in that they have 1's in one or more of the rightmost 0's of ℓ. For example, the vectors following ℓ = (0, 1, 0, 0) in the numerical ordering are (0,1,0,1), (0,1,1,0), and (0,1,1,1), and each is greater than ℓ in the vector partial ordering. A vector ℓ* is denoted as the first vector following ℓ in the numerical ordering with the property that ℓ ≰ ℓ*. For arbitrary ℓ the vector ℓ* is calculated in the following manner:

53

Consider ℓ to be a binary number:

(1) Subtract 1 from ℓ.
(2) Logically "OR" ℓ and ℓ-1 to obtain ℓ* - 1.
(3) Add 1 to obtain ℓ*.

An example:

    ℓ      = 10010
    ℓ - 1  = 10001
    ℓ* - 1 = 10011
    ℓ*     = 10100

Note that the vectors ℓ' in the interval [ℓ, ℓ*-1] all satisfy ℓ ≤ ℓ' ≤ ℓ*-1 in the vector partial ordering. It is this partial ordering which allows skipping in the enumeration procedure.

2.3.3 Partial Enumeration with Quadratic Function

Consider the objective function

    Y = Σ_{i,j} c_ij x_i x_j - Σ_j b_j x_j .

We can write Y as the difference between two nondecreasing functions of x = (x_1, ..., x_n),

    Y = g_1(x) - g_2(x) ,
    g_1(x) = Σ_{i,j} c_ij x_i x_j ,  c_ij ≥ 0 ,
    g_2(x) = Σ_j b_j x_j ,  b_j > 0 ,

since those cases where b_j ≤ 0 will be eliminated in phase I.
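As a hedged illustration, the ℓ* computation just described and the resulting skipping enumeration over Y = g_1 - g_2 can be sketched in a few lines of Python; the skip test used below is the one justified on the following page, and the helper names and the small test instance are ours, not from the report.

    def next_skip(ell):
        """ell* = (ell OR (ell - 1)) + 1: the first vector after ell (in numerical
        order) that is not >= ell in the vector partial ordering; ell must be > 0."""
        return (ell | (ell - 1)) + 1

    def phase2_enumerate(c, b):
        """Maximize Y = sum c_ij x_i x_j - sum b_j x_j over x in {0,1}^n by
        numerical-order enumeration, skipping the interval [ell, ell*-1] whenever
        g1(ell*-1) - g2(ell) <= the best value found so far."""
        n = len(b)

        def bits(v):                          # integer -> 0/1 vector, leading bit first
            return [(v >> (n - 1 - k)) & 1 for k in range(n)]

        def g1(v):                            # nondecreasing: sum of c_ij x_i x_j
            x = bits(v)
            return sum(c[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

        def g2(v):                            # nondecreasing: sum of b_j x_j
            x = bits(v)
            return sum(bj * xj for bj, xj in zip(b, x))

        best_val, best_x = 0, bits(0)         # Y(0, ..., 0) = 0
        ell = 1
        while ell < 2 ** n:
            ell_star = next_skip(ell)
            if g1(ell_star - 1) - g2(ell) <= best_val:
                ell = ell_star                # skip the whole interval [ell, ell*-1]
                continue
            val = g1(ell) - g2(ell)
            if val > best_val:
                best_val, best_x = val, bits(ell)
            ell += 1
        return best_val, best_x

    # Worked ell* example from the text, plus a made-up three-variable instance:
    print(next_skip(0b10010) == 0b10100)      # True
    c = [[0, 4, 0], [0, 0, 3], [0, 0, 0]]
    b = [1, 1, 5]
    print(phase2_enumerate(c, b))             # (2, [1, 1, 0]) for this made-up instance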

54

Suppose we examine all 2^n possible solution vectors in numerical order starting with (0, 0, ..., 0) and ending with (1, 1, ..., 1). The process can be considerably shortened by using the following observation. Suppose we test each vector in the numerical ordering keeping track of the best solution so far found. Let z be the solution which has so far maximized Y(x). Let ℓ be the solution currently being considered. We use the following rule to determine if we need test any solution vector in the interval [ℓ, ℓ*-1]:

    if g_1(ℓ*-1) - g_2(ℓ) ≤ Y(z) then skip to ℓ*.

This is because g_1(ℓ*-1) maximizes g_1 in the interval [ℓ, ℓ*-1], and ℓ maximizes -g_2 in that interval. Therefore it is impossible for some ℓ' ∈ [ℓ, ℓ*-1] to exist such that g_1(ℓ') - g_2(ℓ') > Y(z). Figure 2.3 gives a flow chart of this phase of the algorithm.

2.4 Phase I Computational Efficiency

In this section we analyze the first phase of the algorithm for unconstrained maximization. We attempt to assess the amount of computation involved for a problem consisting of N variables x_j with M distinct nonzero coefficients c_ij.

The algorithm begins (see figure 2.1) with tests for the conditions of theorems la and lb. For theorem la, for any variable x_j it will be necessary to examine at most M coefficients for execution of this test. If all variables meet the conditions of this test, then the operations

55

Figure 2.3  Flow chart of the phase II implicit enumeration procedure.

56

are performed once for each variable x_j. In the case that a variable is found which it is necessary to set to zero by the conditions of theorem la, we are then required to remove all coefficients involving this variable and then recheck the conditions of the theorem for all indices i which comprise the set A. The maximum number of times this must be performed, or an upper bound on the passes for this test, is N. The total growth then is at most MN for theorem la. Table 2.2 summarizes these results.

The check for theorem lb involves examination of the coefficient b_j for each variable x_j. For any variable this involves a single operation, and it must be performed for at most N variables.

The check for theorem 2 for any variable x_j requires generation of the set S_j+. Generation of a set S_j+ requires at most M operations. In box 2 of figure 2.1 we check the conditions for a given x_j after generation of S_j+. If the conditions are not met we might require this test for at most N variables, or we have a maximum of N possible executions of the operations for theorem 2 for a given pass. Whenever we find values for variables via the later tests, theorems 5, 6, 7, we eventually return to the entry labelled α in figure 2.1. We are therefore bounded by N on the number of times we return to perform theorem 2. This gives a total growth rate of MN for this test.

57

To perform the test for theorem 3 for a given variable it is necessary to perform at most M operations. This test is performed when other tests have found values for variables with indices in the set A. If the test for theorem 3 for some variable x_j is successful, then the test is repeated for all remaining variables in set A. This means that during a pass this test might be performed at most K times, where K is the number of indices in set A. In table 2.2 we see that a maximum growth for this test is MN^3.

The tests for theorems 5, 6, 7 involve generation of the sets S_i++. Essentially a set S_i++ for any i, where S_i+ = {j_1, j_2, ..., j_n}, is obtained as follows. We first construct Γ = ∪_{t=1}^{n} S_{j_t}+. Now for any ℓ ∈ Γ such that S_ℓ+ ⊄ Γ, we augment by replacing Γ by Γ ∪ S_ℓ+. We continue until it is no longer possible to augment, and this Γ = S_i++. Construction of an S_j+ involves at most M operations; this must be done for at most N indices, so the maximum number of operations for construction of S_i++ is at most MN. The checking for a MCS for a given set S_i++ involves examination of at most (N-1)M terms. Therefore the number of operations for the tests for theorems 5, 6 and 7 involves the MN operations for construction of the S_i++ sets plus at most (N-1)M operations for a check for MCS's, a maximum of M(N-1) operations for theorem 6, and a maximum of N operations for theorem 7. Since this test could be required for N variables

    Test               Maximum Number    Upper Bound      Upper Bound    Maximum Rate
                       of Operations     on Executions    on Passes      of Growth
                       Per Pass
    Theorem la              M                 N                N             MN
    Theorem lb              N                 1                N             N
    Theorem 2               M                 N                N             MN
    Theorem 3               M                 N^2              N-2           MN^3
    Theorems 5, 6, 7        NM                N                N-3           MN^3

Phase I Computational Complexity

Table 2.2

59

in the set A, and repeated at most N-3 times when some test is successful, an upper bound on the growth for these various tests is N^2 M(N-3), or operations on the order of MN^3.

2.5 Comparison with Other Methods

There are few other known methods which deal with the unconstrained problem as defined by equation (2.1). Of course the algorithm of Hammer and Rudeanu [19] for general boolean functions could be used for solution of this problem. But this algorithm is highly algebraic and would not be efficient for application to problems containing a large number of variables.

Recently Lawler [30] and then Balinski [4] have discussed a model developed for making optimal selections which could be used as a vehicle for the maximization of (2.1). Their formulation is as follows. Let there be m items i = 1, 2, ..., m, and suppose there are n subsets of the m items, S_1, S_2, ..., S_n. Associate a value d_j with a subset S_j and a cost b_i with an item i. There are no constraints on the collection of subsets {S_j, j=1,...,n} of the m items. It is desired to choose a collection of subsets, together with all items which belong to this collection, of maximum value. This is done by solving:

    Maximize  Σ_j d_j v_j - Σ_i b_i u_i
    subject to  v_j - u_i ≤ 0  whenever i ∈ S_j ,
    u_i, v_j ∈ {0,1} .

60

    u_i = 1 if item i is chosen,
        = 0 otherwise.
    v_j = 1 if all items in subset S_j are chosen,
        = 0 otherwise.

If we define a set S_j for each coefficient c_ℓk > 0 of (2.1) and assign the value c_ℓk to d_j, and likewise each coefficient b_i in (2.1) is associated with the cost of item i, then the analogy is clear.

They then show that this problem can be solved by graphical network techniques. A directed network is constructed whose nodes consist of a source node s, a sink node t, and a node for each item i and for each subset of items S_j. The arcs of the network consist of arcs (s, S_j) for each set S_j and (i, t) for each item i. Each arc (s, S_j) is assigned a capacity d_j and arcs (i, t) are assigned a capacity b_i. If i ∈ S_j then an arc of infinite capacity (S_j, i) is constructed in the network.

[Diagram: the flow network, with the source s connected to the n subset nodes S_j by arcs of capacity d_j, infinite-capacity arcs from each S_j to its items, and arcs of capacity b_i from the m item nodes to the sink t.]

61

A minimum cut of a flow network is defined by a partition of the nodes into two sets A and B, where the source node s ∈ A and the sink t ∈ B. The subset of arcs incident to a node of A and a node of B is called a cutset. If these arcs are deleted from the graph all paths between s and t are removed. The value of the cutset is the sum of the capacities on the arcs (i, j) with i ∈ A and j ∈ B. It can be shown (see [4]) that the minimum cut of the above flow network corresponds to the selection of the sets from {S_j, j=1,...,n} and associated items of maximum value. For such a minimum cut the optimum sets from {S_j, j=1,...,n} are identified as those sets S_j which are still connected to s; the set of nodes still connected to the S_j identifies the items.

We now show how an unconstrained maximization can be obtained using this network flow formulation. We solve the following problem:

    Maximize  f = 5x_1x_2 + 3x_1x_3 + x_2x_3 - x_1 - x_2 - 10x_3 .

If we first solve this problem by our algorithm we get x_3 = 0 by theorem la; the problem then is reduced to:

    Maximize  5x_1x_2 - x_1 - x_2 ;

from which we have x_1 = x_2 = 1 by theorem 2 and f* = 3.

62

Formulation of this problem in terms of a flow network gives

[Diagram: the flow network for this example, with subset nodes S_1 = {1, 2}, S_2 = {1, 3}, S_3 = {2, 3} of capacities 5, 3, and 1 from the source s, and item nodes 1, 2, 3 of capacities 1, 1, and 10 to the sink t.]

A minimum cut consists of arcs (s, S_2), (s, S_3), (1, t), and (2, t). Therefore an optimum set of subsets is identified with S_1, since it is still connected to node s, and with items 1 and 2, which are connected to node S_1. This set has value 5 - 1 - 1 = 3, which is the same answer we obtained with our algorithm.

The algorithm for finding a minimum cut of a network with P nodes has growth on the order of P^5 for arbitrary values of arc capacities (see [30]). In solving (2.1) with N variables and M coefficients the corresponding network will have N+M+2 nodes. This means the algorithm will have growth rate proportional to (N+M+2)^5 for solution of the unconstrained problem. We found that our algorithm has maximum growth of MN^3 at worst. However, this algorithm does not find all the variables of a

63

maximizing solution. So in this regard a network solution would be superior. However, in all problems solved the branching algorithm found at least 85% of the variables; therefore the problems which remained after application of the algorithm were of trivial dimension. The network flow solution technique could be used for the second phase of our procedure for unconstrained optimization, since it is clearly superior to implicit enumeration. If the coefficients in (2.1) are limited to integers then the corresponding flow graph problem can be solved with a growth rate proportional to k(N+M+2)^2. Clearly this would be a more efficient algorithm for this case.

In conclusion, this chapter has presented an algorithm for the optimum solution of a quadratic binary function. The quadratic function arises from problems in operations research. The major part of the algorithm consists in reducing the dimension of the quadratic function which is to be extremized to a size which can be handled by conventional enumerative procedures. The first phase of the algorithm has been found to be very effective in this regard.
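The small example of section 2.5 can be checked numerically with any max-flow/min-cut routine; the following sketch uses the networkx library, and the graph-building code and node labels are ours rather than anything from the report.

    import networkx as nx

    # Unconstrained example: maximize 5*x1*x2 + 3*x1*x3 + x2*x3 - x1 - x2 - 10*x3.
    # One subset node per positive quadratic coefficient, one item node per variable.
    subsets = {"S1": ({1, 2}, 5), "S2": ({1, 3}, 3), "S3": ({2, 3}, 1)}
    item_costs = {1: 1, 2: 1, 3: 10}

    G = nx.DiGraph()
    for name, (items, value) in subsets.items():
        G.add_edge("s", name, capacity=value)     # arc (s, S_j) with capacity d_j
        for i in items:
            G.add_edge(name, i)                   # no capacity attribute = infinite capacity
    for i, cost in item_costs.items():
        G.add_edge(i, "t", capacity=cost)         # arc (i, t) with capacity b_i

    cut_value, (source_side, sink_side) = nx.minimum_cut(G, "s", "t")
    chosen_subsets = [n for n in source_side if isinstance(n, str) and n.startswith("S")]
    chosen_items = [n for n in source_side if isinstance(n, int)]
    print(chosen_subsets, chosen_items)           # expect ['S1'] and items [1, 2]
    print(sum(v for _, v in (subsets[n] for n in chosen_subsets))
          - sum(item_costs[i] for i in chosen_items))   # expect f* = 3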

Chapter 3

3. Techniques for Constrained Optimization

In this chapter we discuss techniques for solution of constrained optimization problems of the following form:

            Maximize  f(x) = Σ_{i,j} c_ij x_i x_j - Σ_j b_j x_j
    (A)     subject to  g_i(x) ≤ μ_i ,  i = 1, 2, ..., m ,
            x = (x_1, x_2, ..., x_n) ,  x_i ∈ {0,1} ,  i = 1, 2, ..., n ,
            c_ij ≥ 0 ,  i, j = 1, 2, ..., n .

The method used for solution of a given problem defined by (A) will in most cases depend on the nature of the objective function f(x) and the constraints g_i(x). This is particularly true when either the objective function or the constraints are non-linear in x_i and/or there are a large number of variables. In this chapter we consider the case where f(x) is quadratic in x_i and the constraints g_i(x) are linear. For a review of techniques involved in solution of (A) when both the objective function and constraints are linear see [3] and [18]; for a discussion of techniques with no particular restrictions on f(x) and g_i(x) see [33].

For the problems with which we were concerned, i.e., m = 1 or 2 in problem (A), we found the generalized Lagrange multiplier method of Everett [14] and the family of solutions technique of Hammer and Rudeanu [19] to be effective when dealing with problems with a moderately large number of variables. With both these methods it is assumed

64

65

that one has available techniques for dealing with the free or unconstrained optimization of a function of the same form as f(x). We use the algorithm developed in chapter 2 for the unconstrained optimization. The generalized Lagrange multiplier method is in a sense more general than the family of solutions technique, since with the former there are no restrictions on either f(x), g_i(x), or the domain of optimization, while with the latter the constraints g_i(x) are required to be linear in x_i before the family technique can be applied. We begin with a discussion of solution of constrained optimization by the method of generalized Lagrange multipliers.

3.1 Generalized Lagrange Multipliers

One effective method for constrained optimization problems is the technique of Lagrange multipliers. A rather recent generalization of this technique to problems containing non-differentiable functions has been made by Everett [14]. We begin by presenting his theorem. For the statement of Everett's theorem we first need to state problem B:

    (B)     maximize  f(x)
            subject to  g_i(x) ≤ μ_i ,  i = 1, 2, ..., m ,
            x = (x_1, x_2, ..., x_n) ∈ X .

66

Everett's Theorem: For the Lagrangian

    L(x, λ) = f(x) - Σ_{i=1}^{m} λ_i g_i(x) ,

where f(x) and the g_i(x) are the same as in problem B and λ = (λ_1, ..., λ_m) is a set of non-negative Lagrange multipliers, if x* ∈ X is a global maximum for the Lagrangian L(x, λ), then x* is the optimum solution of problem B over all x ∈ X if μ_i is replaced by g_i(x*) for i = 1, 2, ..., m.

Note that there are no restrictions on the form of the functions f(x) and g_i(x) or on the nature of the set X over which the maximization is to be performed.

Proof: Assume that

    f(x*) - Σ_{i=1}^{m} λ_i g_i(x*)  ≥  f(x) - Σ_{i=1}^{m} λ_i g_i(x)

for all x ∈ X, with λ_i ≥ 0, i = 1, 2, ..., m. We can write

    f(x*)  ≥  f(x) + Σ_{i=1}^{m} λ_i ( g_i(x*) - g_i(x) ) .

Since for each i we have λ_i ≥ 0 and, for every x satisfying the replaced constraints, g_i(x*) ≥ g_i(x), it follows that f(x*) ≥ f(x), which proves the theorem.

If g_i(x*) = μ_i for each i, then the problem (B) has been solved. However, for arbitrary choices of the λ_i this in general will not be the case. The procedure, then, for the solution of problem (B) is to solve first Everett's problem with arbitrary λ_i's and, if all g_i(x*) ≠ μ_i, choose

67

another set of multipliers, optimize the new Lagrangian, and recheck the constraints. The procedure is repeated until for each i we have g_i(x*) sufficiently close to μ_i.

The value of the Lagrange multiplier method depends upon how practical it is to optimize the Lagrangian, and on the efficiency of selecting a set of multipliers which give values of the constraints which are reasonably close to the given μ_i's. A second result Everett obtained was that, for a given i, if the λ_j are fixed for j ≠ i, then g_i(x*) is a monotone nonincreasing function of λ_i. This allows interpolation to be used to select the λ_i which gives the desired value of μ_i. We must add however that there may be values of μ_i such that there does not exist a λ_i which will give an x* such that g_i(x*) = μ_i. Everett refers to such a situation as a "gap" and gives some modifications to his basic theorem to determine solutions in such cases.

Figure 3.0 is a schematic of the procedure for solving a constrained optimization problem by the method of generalized Lagrange multipliers.

[Figure 3.0: schematic of the multiplier iteration — a choice of λ determines the Lagrangian L(x, λ), whose maximization yields x*; the constraint values g_i(x*) are fed to the selection function h, which produces the next λ.]

The procedure consists of three steps. First a set of

68

nonnegative multipliers is selected. This gives rise to a particular Lagrangian L(x, λ). The next step consists of finding x*, a vector which maximizes L(x, λ). Now each constraint g_i(x*) is evaluated. If each g_i(x*) has the appropriate value, then by Everett's theorem the problem is solved. If not, a new set of multipliers must be selected. This is indicated by the function h, which takes the current multipliers λ and the current values of the constraints g_i(x*) and uses this information to select a new set of multipliers. The particular mapping h used for the case of the quadratic knapsack problem is defined in section 3.2. This procedure is continued until either a set of constraint values is found which is appropriate or until it is discovered that no λ exists which will give values of g_i(x*) within the required range of μ_i.

In general it is not possible to say a priori if a set of multipliers exists which will give a solution to a given constrained optimization problem. However, when f(x) and each g_i(x) are linear, a necessary and sufficient condition [35] for existence of multipliers which give a solution is that the continuous analog of problem (A), i.e., the problem in which the constraint x_i ∈ {0,1} is replaced by 0 ≤ x_i ≤ 1, has an extreme point solution.

3.2 Using Generalized Multipliers for the Quadratic Knapsack Problem

In this section we show how multipliers can be used to solve the problem defined in section 1.2.1 as the quadratic knapsack problem. Recall this problem was defined as:

69

            Maximize  f(x) = Σ_{i,j} c_ij x_i x_j
    (A')    subject to  g(x) = Σ_i a_i x_i ≤ μ ,
            x_i ∈ {0,1} ,  a_i > 0  for all i ,  c_ij ≥ 0  for all i, j .

In this case we deal only with one multiplier since there is only one constraint. The Lagrangian for problem (A') is:

    L(x, λ) = Σ_{i,j} c_ij x_i x_j - λ Σ_i a_i x_i .

The crux of the generalized multiplier technique is to find the appropriate value of λ if it exists. The appropriate λ is designated as λ̂, where we define λ̂ as follows. Given some positive ε, we have

    g(x*(λ̂)) = μ̂ ,    0 ≤ μ - μ̂ ≤ ε .

We use a binary search to find the appropriate value of λ. The strategy is to select first an arbitrary λ^0 such that x*, the maximizing vector of L(x, λ^0), gives g(x*) = 0. This initial value λ^0 is an initial upper bound for λ̂; the initial lower bound is zero. The fact that g(x*(λ)) is a monotone decreasing function of λ makes the search for λ̂ straightforward. The algorithm used for finding λ̂ follows:

Step 0.  Select ε > 0, δ > 0, and λ^0 such that g(x*) = 0; set λ_L = 0, λ_H = λ^0.

Step 1.  λ = (λ_H + λ_L)/2.
Step 2.  Find x* which maximizes L(x, λ).
Step 3.  If g(x*) > μ, then go to step 6.
Step 4.  If λ_H - λ_L ≤ δ or μ - g(x*) ≤ ε, then set λ̂ = λ, μ̂ = g(x*), save x*, and go to step 7.
Step 5.  Set λ_H = λ, save x*, and go to step 1.
Step 6.  If λ_H - λ_L > δ, then set λ_L = λ and go to step 1.
Step 7.  If an x* has been found, then print x* and stop; otherwise print "no multiplier solution" and stop.

The above procedure finds λ̂ if one exists. In those cases where no λ exists such that we can find a feasible solution, i.e., a "gap" as characterized by Everett, we switch to the family of solutions technique, which is discussed in section 3.4. The above procedure was used effectively to solve a number of quadratic knapsack problems. We now discuss the more difficult problem when there are two constraints.

3.3 The Two Constraint Case

For some applications, to be considered later, it is necessary to have a pair of constraints as opposed to the single constraint of the problem formulated in section 3.2. In this case we are concerned with selecting a pair λ_1 and λ_2 of multipliers, and the procedure is more complicated. Suppose the constraints for problem (A') are written

    g_1(x) = Σ_i a_i x_i ≤ μ_1 ,
    g_2(x) = Σ_i b_i x_i ≤ μ_2 .

The Lagrangian is now

    L(x, λ_1, λ_2) = Σ_{i,j} c_ij x_i x_j - λ_1 Σ_i a_i x_i - λ_2 Σ_i b_i x_i .

In this case appropriate values for λ_1 and λ_2 are defined as λ̂_1 and λ̂_2, where for some positive ε_1 and ε_2 we have

    g_1(x*(λ̂_1, λ̂_2)) = μ̂_1 ,    g_2(x*(λ̂_1, λ̂_2)) = μ̂_2 ,

and

    0 ≤ μ_1 - μ̂_1 ≤ ε_1 ,    0 ≤ μ_2 - μ̂_2 ≤ ε_2 .

We know from Everett's results that if λ_2 is fixed we still have g_1(x*(λ_1)) as a monotone function of λ_1. We therefore select an initial λ_2^0 as some arbitrary large real number. With this value of λ_2 we use the binary search procedure to find λ_1. Now if we find that for this value of λ_2 we have g_2(x*) = μ̂_2, then we stop; otherwise we proceed to select a new value of λ_2 and repeat the binary search procedure for λ_1. The new value of λ_2 is selected as follows. λ̂_2 is assumed to be in the interval (0, λ_2^0); we initially set λ_2H = λ_2^0 and λ_2L = 0, and λ_2 is computed using:

72

    (a)    λ_2 = (λ_2L + λ_2H)/2 .

As with λ_1, we select subsequent values of λ_2 by using a binary search of the interval (0, λ_2^0); if g_2(x*) > μ_2, then the new value of λ_2L is λ_2 and λ_2H remains the same. If g_2(x*) < μ_2, then the new value of λ_2H is λ_2 and λ_2L remains the same. In each case λ_2 is computed using (a).

The effectiveness with which we select the initial arbitrary value for λ_2 will determine the number of iterations necessary for solution of a given problem. We have found that a reasonable initial value for λ_2 is Σ_{i,j} c_ij, the sum of all coefficients in the cost function. In some of our experimental problems we have been able to decrease the number of iterations by selecting λ_2^0 as some fraction of Σ_{i,j} c_ij.

Figure 3.1 is a flow chart of the algorithm when there are two constraints. In the flow diagram of figure 3.1 we have one program exit from a box which says "NO MULTIPLIER SOLUTIONS"; this is necessary since we cannot predict a priori that for all values of the givens (c_ij, a_i, b_i, μ_1, μ_2) we can find Lagrange multiplier solutions which are feasible. When we encounter this situation we abandon the generalized multiplier approach for constrained optimization. The next section presents another approach to the problem of constrained optimization.
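A minimal sketch of the single-multiplier binary search of section 3.2 (steps 0 through 7) is given below, assuming one already has an unconstrained maximizer for the Lagrangian; the brute-force maximize_lagrangian routine merely stands in for the chapter 2 algorithm, and the instance data are made up for illustration.

    from itertools import product

    def maximize_lagrangian(c, a, lam):
        """Brute-force maximizer of L(x, lam) = sum c_ij x_i x_j - lam * sum a_i x_i.
        Stands in for the chapter 2 algorithm; usable only for small n."""
        n = len(a)
        best = None
        for x in product((0, 1), repeat=n):
            val = (sum(c[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
                   - lam * sum(a[i] * x[i] for i in range(n)))
            if best is None or val > best[0]:
                best = (val, x)
        return best[1]

    def knapsack_multiplier_search(c, a, mu, eps=1e-3, delta=1e-6):
        """Binary search for a multiplier giving g(x*) within eps of mu (steps 0-7)."""
        # Step 0: find lam0 with g(x*) = 0 by doubling; lam_L = 0, lam_H = lam0.
        lam_H = 1.0
        while sum(ai * xi for ai, xi in zip(a, maximize_lagrangian(c, a, lam_H))) > 0:
            lam_H *= 2.0
        lam_L, best = 0.0, None
        while True:
            lam = (lam_H + lam_L) / 2.0                     # step 1
            x = maximize_lagrangian(c, a, lam)              # step 2
            g = sum(ai * xi for ai, xi in zip(a, x))
            if g > mu:                                      # step 3 -> step 6
                if lam_H - lam_L > delta:
                    lam_L = lam
                    continue
                break                                       # interval exhausted: a gap
            best = x                                        # feasible point, remember it
            if lam_H - lam_L <= delta or mu - g <= eps:     # step 4
                break
            lam_H = lam                                     # step 5
        return best                                         # None means "no multiplier solution"

    # Tiny illustrative instance (numbers are made up):
    c = [[0, 4, 0], [0, 0, 3], [0, 0, 0]]
    a = [2, 3, 4]
    print(knapsack_multiplier_search(c, a, mu=5))           # a feasible selection, here (1, 1, 0)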

Figure 3.1  Flow chart of the generalized multiplier algorithm for the two constraint case (binary searches for λ_1 and λ_2, with an exit labelled "NO MULTIPLIER SOLUTIONS").

74

3.4 Constrained Optimization Using "Families of Solutions"

Given a constrained optimization problem of the following form:

    (1)  Maximize  f(x)
         subject to
    (2)  g(x) ≤ μ ,
         x = (x_1, x_2, ..., x_n) ,  x_i ∈ {0,1} ,  i = 1, 2, ..., n .

A possible procedure for the solution of this problem is to find the set of vectors S = {x : g(x) ≤ μ} and from S select x* such that f(x*) = max_{x∈S} {f(x)}. Of course the number of feasible solutions |S| might be quite large; therefore it would be helpful to characterize the solutions of (2) in such a way as to avoid evaluation of f(x) for every x ∈ S.

Hammer and Rudeanu [19] have given a procedure for classifying the solutions of a linear inequality into disjoint "families of solutions" {F_1, F_2, ..., F_m}, such that given any x ∈ S, x belongs to exactly one family F_i, i ≤ m. The number of families m is much smaller than the number of feasible solutions |S|.

A family F_i is characterized as follows. Let x̂ = (x̂_1, x̂_2, ..., x̂_n) be a member of S and let I be a set of indices, I ⊆ {1, 2, ..., n}; let Γ(x̂, I) be the set of all vectors (x_1, x_2, ..., x_n) ∈ {0,1}^n = {0,1} × {0,1} × ... × {0,1} where x_i = x̂_i if i ∈ I, the other variables x_i (i ∉ I) being arbitrary. If all vectors x = (x_1, x_2, ..., x_n) ∈ Γ(x̂, I) satisfy (2), then Γ(x̂, I) is called a family F_j of solutions to (2).

75

When x̂ is what is called a basic solution (see appendix I), then an I can be found such that all x ∈ Γ(x̂, I) satisfy (2). The pair (x̂, I) is said to generate the family Γ(x̂, I); the variables x_i for which i ∈ I are called the fixed variables of the family. When the set I consists of p elements, the family consists of 2^{n-p} solutions, where p ≤ n. Hammer and Rudeanu developed an algorithm for finding basic solutions and the sets I associated with each basic solution.

Once the set of disjoint families {F_1, F_2, ..., F_m} of solutions to (2) is known, a procedure for solving (1) and (2) is defined. For each family F_i we find the maximum with our algorithm, described in chapter 2, for unconstrained maximization. The maximum of the problem defined by (1) and (2) is therefore obtained as follows:

(1) For each family F_i let f(x^i) be the maximum obtained by application of the unconstrained optimization algorithm.
(2) f(x*) = max_{i=1,2,...,m} {f(x^i)} .

Therefore for a problem with m families we will be required to solve m unconstrained optimization problems. Each unconstrained problem has fewer variables than the original problem since all x_i (i ∈ I) have fixed values. We present in appendix I the algorithm of Hammer and Rudeanu for determining the families of solutions of a linear inequality.
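A small sketch of the maximization over families, assuming the families have already been generated by the appendix I algorithm: each family is given as a pair (x̂, I), every completion of the fixed variables is feasible, and brute force over the free variables stands in here for the chapter 2 algorithm. All names and data below are illustrative.

    from itertools import product

    def maximize_over_families(f, n, families):
        """families: list of (x_hat, I) pairs generating disjoint families of feasible
        solutions; each family only needs an unconstrained maximization of f."""
        best_val, best_x = None, None
        for x_hat, I in families:
            free = [i for i in range(n) if i not in I]
            for bits in product((0, 1), repeat=len(free)):   # brute force on the free variables
                x = list(x_hat)
                for i, v in zip(free, bits):
                    x[i] = v
                val = f(x)
                if best_val is None or val > best_val:
                    best_val, best_x = val, x
        return best_val, best_x

    # Illustrative instance: f(x) = 3*x0*x1 - x0 - x1 - 2*x2, constraint x0 + x1 + x2 <= 2.
    f = lambda x: 3*x[0]*x[1] - x[0] - x[1] - 2*x[2]
    families = [([0, 0, 0], {0}),           # x0 = 0; x1, x2 free
                ([1, 0, 0], {0, 1}),        # x0 = 1, x1 = 0; x2 free
                ([1, 1, 0], {0, 1, 2})]     # x0 = 1, x1 = 1, x2 = 0; nothing free
    print(maximize_over_families(f, 3, families))   # expect value 1 at x = [1, 1, 0]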

76

    n      Average Time (ms)
    4           0.5
    7           1.0
    8           1.1
    14          1.5
    16          2.6

Table 3.1

A Fortran program running on the IBM 360 model 67 was used to test the speed of the family method. Table 3.1 shows the average time in milliseconds to generate one family for a number of test problems; n is the number of binary variables.

3.5 Example Problem Using Lagrange Multipliers

In this section we give an example of a problem solved using the algorithms thus far described. We want to solve the following problem.

    Maximize f(x) =1. 40x1x3 +.42x3x6 + 8x +.42 +. 42x6x+.9x8 - 7x -.6x- 91x3-. 09x4- 21x5-. 42x- 49x -x

subject to

    g_1(x) = 5x_1 + 9x_2 + 3x_3 + 4x_4 + 7x_5 + x_6 + 9x_7 + 8x_8 ≤ 10 ,
    g_2(x) = x_1 + x_2 + x_3 + x_4 + x_5 + x_6 + x_7 + x_8 ≥ 2 ,
    x_i ∈ {0,1} ,  i = 1, ..., 8 .

We select ε_1 = 2, ε_2 = 1, δ = .001, λ_2 = 1.0. The first step in our program is to find λ_1. In this case λ_1 = 0.273, g_1(x*) = 9, f(x*) = -.79,

77

    x_1* = x_6* = x_8* = 1 ,    x_2* = x_4* = x_5* = x_7* = 0 .

There were 11 solutions of the Lagrangian for the values of λ_1, which took .17 seconds, using the program for unconstrained optimization developed in chapter 2. Since g_2(x*) ≠ μ_2, the second multiplier is adjusted to λ_2 = .5. This gives the same x* for λ_1 = .12. The binary search technique is used until the proper λ_1, λ_2 are found, which in this case are:

    λ_1 = .089 ,    λ_2 = .318 ,

which give

    x_1* = x_3* = x_6* = 1 ,    x_2* = x_4* = x_5* = x_7* = x_8* = 0 ,
    f(x*) = -.21 ,    g_1(x*) = 8 ,    g_2(x*) = 3 .

The values obtained for λ_1 and λ_2 are not unique. The procedure is aimed at finding multipliers which lie in the required intervals. The total number of iterations required for solution of this example was 35, with a total time required of .51 seconds. In the sections on applications we will give more computational results obtained with our procedure for constrained optimization problems.

78

3.6 Comparison of Lagrange Multiplier and Family of Solutions Techniques

The generalized multiplier technique and the family of solutions technique were applied to a number of diverse problems. We found advantages and disadvantages with each particular technique.

The main disadvantage of the family of solutions technique is the rapid growth of the number of families. For example, suppose we are finding the number of families for the constraint

    Σ_{i=1}^{n} a_i x_i ≤ t ,

where a_i = 1 for all i. We see that the number of basic solutions is the binomial coefficient (n choose t). Now in addition to the time required for generation of each of the basic solutions, we must solve each unconstrained problem associated with a basic solution. The solution time of the unconstrained problem was reduced when using the family technique. This is because, with this method, the number of binary variables in a problem is reduced when we solve for the free maximum. However, for large n we can quickly run up the number of families such that the time to generate the families is prohibitive.

We define an iteration to be the solution of the unconstrained problem for a given set of multipliers. When using the generalized multiplier method, the number of iterations required for solutions was small compared to the number of families generated. Recall that we

79

must find a free maximum for each family generated. In solving problems with two constraints using the algorithm discussed in 3.3, for a fixed λ_2 the average number of iterations for the search for λ_1 was 14. This was for a set of problems which ranged from 4 to 50 variables.

The main difficulty with multipliers occurs when one encounters "gaps". This phenomenon is most likely to occur in problems in which all the given values are integers. The following example illustrates this problem.

3.7 Existence of Gaps Using Multipliers

Suppose we want to solve the following problem with generalized multipliers:

    Maximize  f(x) = 2x_1x_2 + 2x_1x_3 + 2x_2x_3 - 2x_1 - 2x_2 - 2x_3
    subject to  x_1 + x_2 + x_3 ≤ 2 ,
                x_1 + x_2 + x_3 ≥ 1 .

Feasible solutions will have x_1 + x_2 + x_3 = 2 or x_1 + x_2 + x_3 = 1. Now the Lagrangian for this problem is

    L(x, λ) = 2x_1x_2 + 2x_1x_3 + 2x_2x_3 - (2 + λ_1 - λ_2)x_1 - (2 + λ_1 - λ_2)x_2 - (2 + λ_1 - λ_2)x_3 .

If we maximize L(x, λ) as a function of λ_1 - λ_2, where λ_1 and λ_2 are constrained to be nonnegative, we see that if λ_1 - λ_2 > 0, x* = (0, 0, 0).

80

On the other hand, when λ_1 - λ_2 < 0, x* = (1, 1, 1). These are the only maximizing solutions for all values of the multipliers. Since both solutions are infeasible we have a gap, and Lagrange multipliers cannot be used for solution of this problem.

The occurrence of gaps in the problems we solved was indeed infrequent. In those cases in which gaps were found the family of solutions technique was used. In the chapter on applications more detailed results are presented on the relative frequency of occurrence of gaps.

In conclusion, we see that we have rather straightforward ways of converting the constrained problems of the first chapter to an unconstrained form for immediate application of our algorithm for free maximization. In the subsequent chapters we present applications of these techniques to problems of practical interest.
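The gap in this example is easy to verify by direct enumeration; a small check, using the objective and Lagrangian in the form reconstructed above (so the exact coefficients should be taken as illustrative):

    from itertools import product

    def lagrangian_maximizers(d):
        """Maximizers of L(x) = 2(x1x2 + x1x3 + x2x3) - (2 + d)(x1 + x2 + x3), d = lam1 - lam2."""
        vals = {x: 2 * (x[0]*x[1] + x[0]*x[2] + x[1]*x[2]) - (2 + d) * sum(x)
                for x in product((0, 1), repeat=3)}
        best = max(vals.values())
        return [x for x, v in vals.items() if v == best]

    for d in (-3.0, -0.5, 0.5, 3.0):          # representative values of lam1 - lam2
        print(d, lagrangian_maximizers(d))
    # Only (1,1,1) (for d < 0) and (0,0,0) (for d > 0) ever appear, and neither
    # satisfies 1 <= x1 + x2 + x3 <= 2, so no multiplier pair closes the gap.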

Chapter 4

4. Computer Programming and Graphical Partitioning

In this chapter we consider in detail a particular application of the problem of finding optimal groupings. We consider an example from the area of computer systems storage allocation. This area of application was selected primarily for two reasons. First, as we shall see in chapter six, we had available facilities to collect the necessary statistics for model building; and second, the availability of other (chapter five) algorithms or strategies for this particular application allowed assessment of different approaches to the problem.

We first present a model for computer program partitioning and a discussion of the mathematical programming problem derived from this model. A brief discussion is given of the general problem of graphical partitioning, of which computer program partitioning is a special case. We present some sequential partitioning procedures which utilize the solution of the quadratic programming problem, and we close with some computational results which are based on the sequential partitioning procedures.

4.1 Model for Program Graph

In studying the program packing or pagination problem of computer systems, attention is focused on the structural and behavioral

81

82

properties of programs. In this regard the Markov model [40] is a natural vehicle for description of program structure, i.e., transfers between the various parts of a given program. The frequencies of these transfers, or behavioral characteristics, can be used in a Markov model for developing a figure of merit for the packing of programs. Perhaps the most fundamental measure of the effectiveness of a page packing strategy is the resultant number of pages moved between memories per program execution; and since for most systems the decisions relating to traffic between core and secondary devices are highly dependent upon the overall system environment, e.g., scheduling algorithms, paging strategies, etc., the effects of packing a program on page traffic are very difficult to ascertain directly. Given these prevailing conditions it is nevertheless true that a program structured to minimize interpage transfers reduces the number of systems level decisions relating to that particular program with regard to paging. Therefore it is in this context that the Markov model is used to develop a figure of merit relevant to the average number of interpage transfers per execution of a given program.

4.1.1 Program Instruction Units and Data Units

The model used for a program graph is the one proposed in [20]. The salient details of the model are as follows. A program consists of two

83

collections of units, instruction units α = {α_1, ..., α_m} and data units β = {β_1, ..., β_n}. The first collection of units is defined as follows. Each instruction unit α_i ∈ α is an ordered collection of instruction elements. An instruction element is the smallest functional part of a program, generally a byte or word. Each element, with the exception of exit elements, has associated with it a family of successor elements, which are the instruction elements which may be executed immediately after it. An instruction unit is then an ordered collection of elements e_1, e_2, ..., e_f such that:

1) Each element is an instruction element and appears in exactly one unit;
2) For 1 ≤ i < f, the successor of e_i consists of the single element e_{i+1};
3) For 1 < i ≤ f, e_i is the successor of exactly one element, and that element is e_{i-1};
4) The total volume of the instruction unit (which equals f bytes if the elements are bytes and size is measured in bytes) is not greater than some fixed maximum.

84

So we have an instruction unit consisting of a sequence of instructions which may have a single branch point (at the end), and a single entry point (at the beginning), and occupying some number of memory units.

Each data unit β_i ∈ β is composed of a set of data elements. Those data elements which can be read or written on by instruction element e_i are said to be referenced by the instruction e_i, or referenced by the instruction unit to which e_i belongs. Data units are an unordered collection of elements b_1, ..., b_h such that:

1. Each element is a data element and appears in exactly one unit;
2. The total volume of the unit (which equals h bytes if elements are bytes and size is measured in bytes) is not greater than some fixed maximum.

Also an entry point and an exit point are specified for the program. If there are multiple entry points and exits one can include dummy units to make the entry and exit unique.

For the problem with which we will be concerned we use the so-called Markov description of a program. Here the matrix P = {p_ij} describes the transfers of control between instruction units,

    p_ij = probability that α_j is executed after α_i ,  1 ≤ i, j ≤ m ,

and Σ_j p_ij = 1 for all i.

85

For the data units we have the matrix Z = {z_iu},

    z_iu = probability that α_i references data unit β_u ,  1 ≤ i ≤ m ,  1 ≤ u ≤ n .

The following is an example of a program graph along with its associated matrices.

[Diagram: an example program graph on instruction units with an exit unit and referenced data units, together with its transition matrix P and reference matrix Z; dashed lines indicate data references.]

The dashed lines indicate that control is not transferred when data units are referenced. The complete packing problem requires specifying the allocation to pages of both the instruction units and the data units. There are various packing strategies one can employ, e.g., mixing of instruction and data units on the same pages, or perhaps having pages consisting of instruction units or data units or both. The point of view we take in this paper is that the pages consist entirely of instruction units or entirely of data units and are not mixed; in this regard we consider only the packing of

86

the instruction units. The techniques discussed could be applied to pages of data given the appropriate data matrices.

4.2 Figure of Merit for Program Packing

In this section we consider the packing of instruction units only. In order to develop a figure of merit for the packing of a program, we use the "frequency of execution" concept developed independently by Kral [27] and at Informatics [20]. Implicit in the objective function, to be developed, will be the number of times each instruction unit is executed. Programs will always have an exit unit; an exit will correspond to an absorbent state of a Markov chain. The remaining units will correspond to the transient states of a Markov chain. It can be shown [23] that if Q represents the transitions between the transient states of a Markov chain, then (I-Q)^{-1} = {n_ij} gives the mean number of visits to the transient states of the chain before absorption. That is, n_ij is the mean number of times state j is visited given that the chain started in state i. Therefore if the program is known to start in state one (unit α_1), then we can compute the first row of (I-Q)^{-1} to determine the frequency of execution of each instruction unit. It should be noted that Q is derived from the P matrix (defined in the previous section) by deleting from P all rows and columns which correspond to non-transient states. We let y_i be the number of times unit α_i is executed in one execution of the program. In the case that unit one is the unique initial state then y_i = n_1i. Let q_ij be the probability of a transfer from unit α_i to unit α_j; then y_i q_ij is the mean number of transfers from α_i to α_j. Therefore it is the

87

sum of the quantities y_i q_ij over all program units which we desire to minimize by reorganization of the program. In the next section the mathematical programming formulation is presented.

4.3 Program Partitioning as a Mathematical Programming Problem

It was noted in the last section that y_i q_ij and y_j q_ji are measures of transfer activity between instruction units α_i and α_j. The transfer activity of all instruction units is measured by

    Σ_{i,j} ( y_i q_ij + y_j q_ji ) .

Now if a pair of instruction units are on the same page, then transfers between these units are not interpage transfers; the interpage transfers can be tabulated as

    Σ_{i,j} c_ij x_ij ,
    x_ij = 0, α_i and α_j on the same page;
         = 1, otherwise,

where c_ij = y_i q_ij + y_j q_ji = c_ji. It is the quantity Σ_{i,j} c_ij x_ij which is used as a measure of effectiveness for program packing. Since c_ij and c_ji both appear in the summation we must take one half this quantity, i.e., ½ Σ_{i,j} c_ij x_ij, to get the true number of interpage transfers.
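The frequency-of-execution quantities of section 4.2 are straightforward to compute numerically; a brief sketch follows, where the transition matrix is a made-up three-unit example rather than the one from the report.

    import numpy as np

    # Q: transitions among the transient (non-exit) instruction units of a made-up
    # three-unit program; row sums are < 1 because the missing mass goes to the exit.
    Q = np.array([[0.0, 0.6, 0.3],
                  [0.2, 0.0, 0.5],
                  [0.1, 0.4, 0.0]])

    N = np.linalg.inv(np.eye(3) - Q)   # n_ij = mean visits to unit j starting from unit i
    y = N[0]                           # execution starts in unit 1, so y_i = n_1i

    # c_ij = y_i q_ij + y_j q_ji, the mean transfer activity between units i and j
    c = y[:, None] * Q + (y[:, None] * Q).T
    print(y)
    print(c)                           # symmetric; c_ij is the cost of separating units i and j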

88

Each instruction unit α_i has a size a_i associated with the number of memory units occupied by its instruction elements. A page of instruction elements is said to be feasible if the instruction units on that page have a total size not exceeding a fixed constant μ which is the size of a physical page. The size constraint can be incorporated as

    Σ_{i=1}^{n} (1 - x_ij) a_i ≤ μ ,   j = 1, 2, ..., n .

This constraint simply adds the sizes of all instruction units which are on the same page, noting of course the reflexive constraint that x_ii = 0 for all i. It is necessary to incorporate a transitivity constraint, i.e., if α_i and α_j are on the same page and α_j and α_k are on the same page, then α_i and α_k are on the same page. This is written as

    x_ij + x_jk + x_ik ≠ 1 ,   for all i, j, k .

Finally we have the obvious symmetry constraint x_ij = x_ji. It therefore follows that a mathematical programming problem representing program pagination is:

    (1)  Minimize  f(x) = Σ_{i,j} c_ij x_ij ,
         subject to
    (I)
    (2)  Σ_{i=1}^{n} (1 - x_ij) a_i ≤ μ ,   j = 1, 2, ..., n ;
    (3)  x_ij + x_jk + x_ik ≠ 1 ,   for all i, j, k ;
    (4)  x_ij = x_ji ,   for all i, j ;

89

    (5)  x_ii = 0 ,   i = 1, 2, ..., n ;
    (6)  x_ij ∈ {0,1} ,   x = (x_11, x_12, ..., x_{n,n-1}, x_nn) .

The problem defined by (1) - (6) is a linear integer programming problem in zero/one variables. The optimum solution of this problem gives the assignment of program units to pages which will result in the minimum mean number of interpage transfers. Consider for a moment the size of this integer linear programming problem. For any program containing n units we will have a problem in (n²-n)/2 variables; this means we take advantage of the symmetry constraint (4) and of constraint (5) that all x_ii = 0. We also have a total of n + (n choose 3) constraints from inequalities (2) and (3). So we see that when n is any meaningful size, e.g., greater than 20, the current techniques for solution of linear programming problems in zero/one variables, with the number of constraints indicated, are inefficient. For example, one of the fastest algorithms among existing techniques is the algorithm of Geoffrion [15]. But even here the number of constraints present in the program partitioning problems, as presently formulated, would make application of this algorithm impractical. We now consider the packing problem in the context of the graphical partitioning problem.
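The counts just mentioned grow quickly with n; a quick check (the function name is ours):

    from math import comb

    def pagination_ilp_size(n):
        """Number of variables and constraints in the linear 0/1 formulation (1)-(6)."""
        variables = (n * n - n) // 2         # one x_ij per unordered pair, using (4) and (5)
        constraints = n + comb(n, 3)         # n size constraints (2) plus one per triple (3)
        return variables, constraints

    for n in (10, 20, 50):
        print(n, pagination_ilp_size(n))     # (45, 130), (190, 1160), (1225, 19650)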

90

4.4 Graphical Partitioning Problem

It is possible to consider a special case of the program packing problem in the context of the general problem of graphical partitioning, which we can state as follows. Let G = (V, F) be a graph, where V = {v_1, v_2, ..., v_n} is the set of vertices and F = {f_1, f_2, ..., f_m} is the set of edges. Let 𝓕 = {F_j : F_j ⊆ F, j = 1, 2, ..., n} be a collection of subsets of F, where F_j = {f_i : f_i is incident to v_j}. Let 2^𝓕 be the set of all subsets of 𝓕 and let w be a weighing function, w : 2^𝓕 → R. A partition of the graph G is a collection of blocks 𝒞 = {ℰ_1, ℰ_2, ..., ℰ_t} such that ℰ_i ∩ ℰ_j = ∅ if i ≠ j and ∪_i ℰ_i = 𝓕. It is possible to define many different partitioning problems depending of course on the nature of the function w. The graphical partitioning problem is to find a maximal weight partition of the graph G for an arbitrary w(·), where the feasibility of a partition is pre-specified; e.g., a feasible partition might be defined as one in which all blocks ℰ_j of the partition have |ℰ_j| ≤ 3.

In the case of optimal packing of program pages, the function w can be defined as

    (1)  w(ℰ_k) = |A(ℰ_k)| - |A(ℰ_k) ∩ A(𝓕 - ℰ_k)| ,   where  A(ℰ_k) = ∪_{F_j ∈ ℰ_k} F_j .

The function w(ℰ_k) counts the number of edges with both ends incident to vertices of ℰ_k. In the context of pagination the vertices of the graph correspond to program units and the edges represent the

91

transitions between units. We note also that the edge weights are integral valued for this formulation. An additional condition on the formulation of partitions is that of feasibility, i.e., a block ℰ_k is said to be feasible if and only if its size g(ℰ_k) ≤ μ, where the size g(ℰ_k) is the sum of the sizes of the vertices which make up block ℰ_k. The weight of a partition 𝒞 is defined as the sum of the weights w(ℰ_k) of the individual blocks of the partition. We can minimize the number of interpage transfers by finding the maximum weight feasible partition.

For arbitrary weighing functions w(ℰ_k) no efficient procedure is known for solution of this graphical partitioning problem. However, for a special class of weighing functions w(ℰ_k) Lawler [29] has given an efficient procedure, with a growth rate proportional to n^5, for the solution of the graphical partitioning problem. Briefly, his procedure is to formulate the problem as a problem in dynamic programming; and for the restricted class of weighing functions, the dynamic programming calculations are required only over a very restricted class of subsets of 𝓕. One such cost function for which Lawler's results hold is

    (2)  w(ℰ_k) = |F| - |A(ℰ_k) ∩ A(𝓕 - ℰ_k)| .

This weighing function bears a superficial resemblance to the weighing function of program packing, but unfortunately it is not

92

possible to restrict the class of subsets for (1) as was done with weighing function (2).

4.5 Sequential Selection of Partition Blocks

It appears, after one examines the alternative approaches to solving the graphical partitioning problem, especially in the context of program packing, that if graphs of nontrivial order are to be dealt with, one must use some procedure which is suboptimal. In this regard we introduce in this section procedures for partitioning in which the "optimal" blocks of a partition are selected sequentially. In doing so we obtain a resultant partition which will in general be suboptimal. However, for a number of test problems we found this procedure to give optimal results.

4.5.1 Selection of Maximum Weight Partition Blocks

As previously stated, concomitant with the minimization of the inter-block weight is the maximization of the sum of the weights of the individual blocks which comprise the partition. One might be inclined to use as a heuristic strategy a partitioning which selects the blocks with the largest weight. That is, we select from our graph a maximum weight feasible block V_1; a block such that the sum of the weights of the edges with both ends incident to vertices in V_1 is maximum over all feasible blocks in G. This will be the first block of our partition. Now from the graph G - V_1 we repeat this procedure for V_2, the second

93

block of our partition. We continue in this manner until we have a set of blocks {V_1, ..., V_m} which comprise a complete partition of the graph G. The inter-block weight of this partition will in most cases be close to the inter-block weight of the optimum partition.

The selection of the block V_k with maximum weight can be obtained by solution of the following optimization problem:

            Maximize  f(x) = Σ_{i,j} c_ij x_i x_j
            subject to  Σ_i a_i x_i ≤ μ ,
    (II)    x_i ∈ {0,1} ,  i = 1, 2, ..., n ,
            x_i = 1 if v_i ∈ V_k, = 0 otherwise,
            c_ij = weight of edge (v_i, v_j) ,
            a_i = size of vertex v_i ,
            μ = maximum allowable block size.

This problem, even though nonlinear, can be solved with the methods described in chapters two and three. There are some obvious advantages to this formulation. First we deal with n variables as opposed to (n²-n)/2, and we have only one constraint as opposed to constraints on the order of n³ as with the linear formulation.
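A toy illustration of selecting a maximum weight feasible block and then repeating on the remaining graph: brute-force subset enumeration stands in here for the quadratic programming solution of problem (II), so this is usable only for very small graphs, and all of the data below are made up.

    from itertools import combinations

    def max_weight_block(vertices, size, weight, mu):
        """Solve problem (II) by brute force: the feasible vertex set maximizing
        the total weight of edges with both ends inside the set."""
        best = (0.0, ())
        for r in range(1, len(vertices) + 1):
            for block in combinations(vertices, r):
                if sum(size[v] for v in block) > mu:
                    continue
                w = sum(weight.get((i, j), 0.0) for i in block for j in block if i < j)
                if w > best[0]:
                    best = (w, block)
        return best[1]

    def sequential_partition(vertices, size, weight, mu):
        """Repeatedly remove a maximum weight feasible block until no vertices remain."""
        remaining, blocks = list(vertices), []
        while remaining:
            block = max_weight_block(remaining, size, weight, mu) or (remaining[0],)
            blocks.append(block)
            remaining = [v for v in remaining if v not in block]
        return blocks

    # Made-up 5-vertex example: unit sizes, page size 2.
    size = {v: 1 for v in range(1, 6)}
    weight = {(1, 2): 4.0, (2, 3): 1.0, (3, 4): 5.0, (4, 5): 2.0}
    print(sequential_partition(range(1, 6), size, weight, mu=2))
    # [(3, 4), (1, 2), (5,)] -- inter-block weight 1 + 2 = 3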

94

Successive applications of problem (II) were found to give in many cases an optimal partitioning when applied to small test problems for which the optimal results were known. However, we could construct examples for which the process of successive selection of the heaviest feasible block could lead to a very poor partitioning.

To assess the computational effectiveness of block selection by solution of (II) a number of test problems were solved. Table 4.1 summarizes the results. Column N represents the number of variables and M the number of distinct coefficients c_ij in the objective function. The coefficients c_ij and a_i were selected using a random number generator from (0, 10). The value of μ was taken as .4 Σ_{i=1}^{N} a_i. The column labeled T represents the time taken to select an optimal block from a graph of N vertices and M edges.

    N     M      T        D
    10    15     .097     .34
    20    40     .71      .25
    30    60     5.2      .14
    40    90     9.6      .12
    50   107    13.3      .09

Table 4.1

95

In the program written to solve the problem we use an upper triangular matrix. That is, if there are terms c_ij x_i x_j and c_ji x_j x_i in the objective function where i < j, we store the single value c_ij + c_ji in position (i, j). The value of M represents the number of terms in the upper triangular matrix. Since we have c_ii = 0 for all i, the density of the matrix [c_ij] is 2M/(N²-N). Column D represents the density for each value of N. For the values of density given, the solution time grows polynomially with N (roughly as N³ over this range of problems).

4.5.2 Selection of Partition Blocks with Minimum Interblock Weight

An alternative procedure for selection of the blocks of a partition which maximize the sum of the weights of the individual blocks is now examined. In using problem (II) for block selection, poor partitions could result if the blocks which were selected tended to leave the remaining graph highly disconnected. For example, consider the following graph, where all vertices are unit size and μ = 2. If we use problem (II) as a vehicle for block selection, we could obtain vertices {2, 3} as the first block of a partition, since this set is a maximum weight feasible set of vertices. It is clear however that either {1, 2} or {3, 4} is a superior set for the first block of the partition, since in these cases we leave the remaining graph connected.

96

To anticipate this type of graphical structure we can reformulate the strategy for sequential block selection as follows. Rather than having the weights of the edges with both ends incident to vertices in V_k (the selected block) maximized, we can select V_k such that the weight of the edges between V_k and V - V_k is minimized. This of course is equivalent to maximizing the sum of the weights of the edges with both ends in V_k and the edges with both ends in V - V_k. In this case our objective function would be

    f(x) = Σ_{i,j} c_ij x_i x_j + Σ_{i,j} c_ij (1 - x_i)(1 - x_j) .

The second term weighs the edges on the graph after the block of vertices V_k is removed. We can rewrite f(x) as

    f(x) = 2 Σ_{i,j} c_ij x_i x_j - Σ_j b_j x_j + K ,
    b_j = Σ_i (c_ij + c_ji) ,    K = Σ_{i,j} c_ij .

Now the quadratic programming problem for sequential block selection becomes:

            Maximize  f(x) = 2 Σ_{i,j} c_ij x_i x_j - Σ_j b_j x_j
            subject to
    (III)   g_1(x) = Σ_i a_i x_i ≤ μ_1 ,
            g_2(x) = Σ_i x_i ≥ μ_2 ,

            x_i ∈ {0,1} ;   x_i = 1 if v_i ∈ V_k, = 0 otherwise;   μ_2 ≥ 1 .

Notice that we have added the constraint Σ_i x_i ≥ μ_2; this constraint is necessary to avoid the trivial solution x_i = 0 for all i, which would indeed maximize f(x) and be feasible without the second constraint. Problem (III) was found, for a number of small graphs, to give better results than problem (II) when used as a tool for selection of the optimal blocks of a partition. Note that the constant K is dropped from the objective function in problem (III) since it plays no role in the maximization of f(x). Next we consider some of the problems encountered in using a sequential selection strategy when problem (III) is the vehicle for block selection.

4.5.3 The Number of Vertices on a Selected Block as a Parameter

For any graph of N vertices, the minimum number of partition blocks is

    MINPAG = [ Σ_{i=1}^{N} a_i / μ_1 ]* .

The maximum number of vertices on a block is

    MAXVER = N - MINPAG + 1 .

[a]* = smallest integer greater than or equal to a
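These bounds are simple to compute; a short sketch (vertex sizes and page size below are illustrative):

    import math

    a = [2, 3, 1, 4, 2, 5]                  # illustrative vertex sizes
    mu1 = 6                                 # illustrative maximum block size
    MINPAG = math.ceil(sum(a) / mu1)        # minimum number of partition blocks
    MAXVER = len(a) - MINPAG + 1            # maximum number of vertices on one block
    print(MINPAG, MAXVER)                   # 3 and 4 for these numbers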

98

If x* is an optimal solution to problem (III) then for any feasible two block partitioning we will have g_2(x*) bounded,

    1 ≤ g_2(x*) ≤ MAXVER .

In our procedure we select a value, MINVER, as the minimum number of vertices on any block to be removed. We choose MINVER as follows:

    MINVER = MAXVER - CON ,   if MAXVER > CON + 1 ;
           =              ,   otherwise.

The value of CON is preset in the algorithm. Of course by controlling the number of vertices per block we control the number of blocks in a partition.

Recall that in using the Lagrange multiplier method of solution we selected the values of the multipliers arbitrarily and used a binary search to obtain intervals for λ_1 and λ_2 such that the constraints were satisfied. Now with problem (III) we know that g_1 represents the physical limitation on the sum of the sizes of the selected vertices; but g_2 is only specified to be greater than one. In other words we do not know in advance what the number of vertices should be which will give an optimum two block partition when we solve (III). In fact when we select MINVER as the smallest number of vertices we will remove, Everett's theorem only tells us that if our block has size g_1(x*) and it contains g_2(x*) = MINVER vertices, then no block of size less than g_1(x*) containing at least g_2(x*) vertices exists which will give a higher f(x*). Therefore before we select a block for our partition we solve for several values of g_2(x*). That is, we solve our problem for some

99

initial set of multipliers such that g_2(x*) is as large as possible; we call this first value g_2(x*)_max. Then we solve the problem for values of g_2(x*) in the range

    MINVER ≤ g_2(x*) ≤ g_2(x*)_max .

The value selected for λ_2 for the first solution of the Lagrangian is arbitrary. We want to select the initial λ_2 such that the initial number of vertices removed is as large as possible. For any value of λ_2 selected, g_2(x*)_max is bounded above by MAXVER. We found experimentally that after λ_2 is larger than some constant, λ_2max, the same resulting partition was obtained for all values of λ_2. A value we found suitable for λ_2max was max_j {b_j}, where b_j is the coefficient of the linear term in our objective function f(x).

After a partition is obtained with this initial value of λ_2 we have associated with this partition g_2(x*)_max vertices. Now we look for a partition with g_2(x*) = g_2(x*)_max - 1 vertices. This is done by using a binary search of the interval (0, λ_2max) for a value of λ_2 that gives g_2(x*)_max - 1 vertices. When the proper λ_2 is found, i.e., it gives the appropriate g_2(x*), we record the value of f(x*). Now the interval (0, λ_2) is used to find the next value of the second multiplier; this is the value which gives g_2(x*) - 1 vertices. This procedure is continued until we have g_2(x*) = MINVER. The partition selected is the one with a number of vertices in the range [MINVER, g_2(x*)_max] with maximum f(x*).

100

So rather than treating the block size μ_1 and the number of vertices on a block as fixed known quantities, they are treated as parameters, inasmuch as we look at the value of the objective function for different combinations of block size g_1(x*) and number of vertices g_2(x*). Therefore in this case, where μ_2 is an unknown, the algorithm is as follows:

Step 0.  Select λ_2max, MAXVER, MINVER; λ_2L = 0, λ_2 = (λ_2L + λ_2max)/2.
Step 1.  For this λ_2, find λ_1 such that a block of maximum feasible size is selected. Record g_2(x*). Set g_2(x*)_max = g_2(x*). If no feasible x* is found go to step 8.
Step 2.  μ_2 = g_2(x*) - 1.
Step 3.  λ_2 = (λ_2L + λ_2max)/2. Find λ_1 such that we obtain a maximum size g_1(x*) feasible block; if no feasible x* is found go to step 8.
Step 4.  If g_2(x*) = μ_2, go to step 7.
Step 5.  If g_2(x*) > μ_2, set λ_2max = λ_2 and go to step 3.
Step 6.  If g_2(x*) < μ_2, set λ_2L = λ_2 and go to step 3.
Step 7.  If μ_2 = MINVER, stop; otherwise μ_2 = g_2(x*) - 1, λ_2max = λ_2, go to step 3.

101

Step 8.  Stop; the multiplier search has encountered a gap, go to the family method.

After an optimum set of vertices V has been removed, the procedure is repeated on the graph G - V. For the graph of figure 4.1, table 4.2 gives the result of the parametric partitioning.

    λ_1       λ_2       g_1(x*)   g_2(x*)   f(x*)
    0.2727    1.0          9         4      -0.79
    0.120     0.545        9         4      -0.79
    0.0898    0.3175       8         3      -0.21

Table 4.2

When λ_2 is set to 1.0 our binary search procedure selects values for λ_1 until the value λ_1 = 0.2727 is found. This corresponds to a feasible block of size 9 containing 4 vertices. The value f(x*) is the negative of the weight of the edges between the two blocks. When we select λ_2 to be .3175 we get a λ_1 of .0898, which corresponds to a feasible block of size 8 containing 3 vertices. Smaller values for λ_2 give no improvement when MINVER is two, as in this case. In this particular case it turns out that the optimum two block partition corresponds to λ_1 = .0898 and λ_2 = .3175, which gives a between-block edge weight f(x*) of -0.21. To get the total weight of edges with both ends incident to vertices on the same block, we add the constant K, representing the total arc weight of the original graph, to the negative number f(x*). The block of vertices removed from figure 4.1 is {1, 3, 6}.
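A compact sketch of the parametric loop of steps 0-8 is given below, assuming a routine select_block(lam2) that performs the inner binary search for λ_1 and returns a dictionary with keys 'g1', 'g2', 'f', 'x' (or None when no feasible block exists). This is only a schematic rendering of the steps above, with an iteration guard added; none of the names come from the report.

    def parametric_block_search(select_block, lam2_max, minver, max_iter=200):
        """Steps 0-8: sweep the target vertex count mu2 downward, re-finding lam2
        by binary search for each target, and keep the best block seen."""
        lam2_lo, lam2_hi = 0.0, lam2_max
        result = select_block((lam2_lo + lam2_hi) / 2.0)      # steps 0-1
        if result is None:
            return None                                       # step 8: gap, use family method
        best = result
        mu2 = result['g2'] - 1                                # step 2
        for _ in range(max_iter):                             # guard against an unattainable mu2
            if mu2 < minver:
                break
            lam2 = (lam2_lo + lam2_hi) / 2.0                  # step 3
            result = select_block(lam2)
            if result is None:
                return None                                   # step 8
            if result['g2'] == mu2:                           # step 4 -> step 7
                if result['f'] > best['f']:
                    best = result
                mu2, lam2_lo, lam2_hi = result['g2'] - 1, 0.0, lam2
            elif result['g2'] > mu2:                          # step 5
                lam2_hi = lam2
            else:                                             # step 6
                lam2_lo = lam2
        return best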

102

Figure 4.1. (Graph not reproduced.) Block size = 10; the numbers beside the vertices represent the vertex sizes.

103

4.6 Algorithm for Graphical Partitioning

The approach taken is to use the generalized multiplier technique for partitioning the graph whenever possible. If during the process of partitioning a graph a gap in the values of the multipliers is discovered, we switch to the family of solutions technique. We can state that during the actual running of the algorithm on graphs where all the vertices were not the same size, gaps occurred only when there were very few vertices on the infeasible graph (the graph which remained to be partitioned after removal of some vertices). For cases where not all vertices were the same size, we found when it was necessary to use the family method that the largest number of families generated for a single partitioning was 87; this occurred when the total number of families generated was 115 (see table 4.4, n = 46). Figure 4.3 is a flow chart of the partitioning algorithm.

4.6.1 Example of Partitioning

We illustrate the steps in the application of this algorithm by finding the optimum partition for the graph of figure 4.1. First the bound on μ1, the maximum block size, is set; this is shown in box 1 of figure 4.3. In this box we also read CON, which determines MINVER. In box two all infeasible connections are removed, i.e., if vertices vi and vj have edge cij between them and ai + aj > μ1, edge cij is deleted from the graph. The edge cij is called infeasible because it is impossible,

104

Figure 4.2. (Graphs not reproduced.) Parts (a) through (d) show the successive graphs obtained as blocks are removed during the partitioning of the graph of figure 4.1.

105

Figure 4.3. Flow chart of the partitioning algorithm. (Chart not reproduced.) Recoverable box labels: Start; read μ1 and CON (box 1); remove all infeasible connections and set MINVER (box 2); select λ2, select λ1, solve L(x, λ) (box 3); select new multipliers; use family partition method (box 5); test whether the remaining graph is feasible; select the optimal partition, max f(x*, λ) over λ1, λ2; End.

106

due to size constraints, for the vertices connected by this edge to be on the same block. The sum of all infeasible edges is a lower bound on the inter-block weight of any partitioning of the graph. In box 2 the lower bound MINVER is calculated from the formula MINVER = MAXVER - CON; MAXVER in this case is 5 and CON is 2. Figure 4.2(a) represents the graph with which we deal after the checks of box 2 of our algorithm have been applied. Notice that vertex 2 is not on the graph, since it is isolated after infeasible edges are removed and no longer plays a role in the optimum partitioning. The first value of λ2 is selected. Then the interpolation scheme, i.e., binary search, is used to find the appropriate value for λ1. This sequence is indicated by box 3. After a maximum size g1(x*) feasible block has been found with the given λ2 using a binary search of values for λ1, we consider a new value for λ2. This is box 6. If g2(x*) > MINVER, the search procedure previously indicated is used to select a new value for λ2, i.e., the first step in box 3 of figure 4.3. Table 4.3 indicates the optimum values found after application of the procedure to the graph of figure 4.2(a). Figure 4.2(b) is the subsequent graph after removal of the optimum block. The second row of table 4.3 gives the values found after application of the procedure to the graph of figure 4.2(b). The resulting graph is shown in figure 4.2(c). When we apply the procedure to this graph we note that the multiplier method fails, i.e., a "gap" is encountered.

107

In this case we use the family method, box 5 in the flow chart, to get the optimum block. The resulting graph, figure 4.2(d), is feasible and the partitioning is complete.

λ1      λ2      f(x*)    Vertices Removed
0.089   0.317   -0.21    1, 3, 6
0.083   0.44    -0.09    4
Gap     Gap     -0.21    7, 8

Table 4.3

4.6.2 Results for Some Partitioning Problems

Table 4.4 summarizes statistics for a set of partitioning problems solved using the sequential optimization procedure, when problem (III) is used for block selection. The problems for n = 4, 8, and 14 were problems for which the optimal results were known and obtained. The two larger problems were generated by using a random number generator for the values of cij in the interval (0, 10). The values of the vertex sizes were randomly selected integers in the range [1, μ]. The table shows for the largest problem solved a total solution time of 4.80 seconds. This time is the sum of T1, the total time to generate all families when a gap is encountered, and T2, the total time to solve all unconstrained problems. The 4 and 14 vertex problems were problems in which all vertices had identical sizes. Note that I, the number of unconstrained problems solved, is quite

108

n    T1     T2     T3     F      I      μ
4    .002   .01    .012   -      -      6
8    .004   .47    .474   3      58     10
14   1.54   1.88   3.42   1326   1326   4
25   .02    2.42   2.44   5      163    10
46   .28    4.62   4.80   115    182    20

Table 4.4

n  = number of vertices
T1 = total family solution time (sec.)
T2 = total unconstrained solution time (sec.)
T3 = total solution time (sec.)
F  = number of families generated
I  = total number of unconstrained solutions
μ  = block size

109 large for the 14 vertex problems, since an unconstrained problem must be solved for each family. The number of iterations, solutions to unconstrained problems, was fairly constant when no gaps were present. The average was 14 iterations per solution for a given X2. With the 46 vertex problem a gap was found after 31 vertices had been removed from the graph by application of the generalized multiplier technique. This resulted in 87 families being generated for the next optimum partition of the graph which at that point consisted of 15 vertices. This was the largest number of families generated for a'partition when the sizes of the vertices were not all identicaL Generally when gaps did occur, for problems where vertex sizes are mixed, they occurred after a number of vertices had been removed. This meant that the number of families generated was not prohibitive.

Chapter 5

5. Comparison of Program Partitioning Strategies

This section deals with the question of accuracy; that is, how close to the optimum is a program partition which is obtained by a sequential selection procedure. We attempt to assess sequential selection as a partitioning strategy by solving a set of problems by two alternative partitioning procedures and comparing the results.

5.1 Alternative Partitioning Strategies

In appendix II a detailed presentation of the unit merge algorithm developed at Informatics [20] is found, along with the dynamic programming segmentation algorithm of Kernighan [24]. These algorithms represent different approaches to the problem of graphical partitioning or program reorganization. Both procedures were much faster than the parametric sequential selection procedure; therefore the basic comparison of the algorithms is with respect to accuracy.

5.1.1 Unit Merge Algorithm

Basically, the unit merge algorithm is a procedure which fits into the class of "greedy" algorithms. That is, blocks of vertices are found by selection of the set of vertices which have the highest weight edges connecting them. Suppose the edges are ordered as follows,

{c^1_ij, c^2_nk, ..., c^m_pq},  where  c^1_ij >= c^2_nk >= ... >= c^m_pq > 0.

110

Now if vi and vj, the vertices which are connected by c^1_ij, have a feasible total size, i.e., ai + aj <= μ, where ai and aj are the sizes of vi and vj, they are said to form a feasible merger and hence they are combined to make a new vertex vi', where vi' has a size equal to the sum of the sizes of vi and vj. All edges previously connected to either of these two vertices are now connected to vertex vi'. However, if it is the case that ai + aj > μ, then the vertices connected by the next edge in the ordering are considered for a merger. The algorithm continues to test for possible mergers in the order of decreasing edge weight. The procedure is terminated when it is no longer possible to make any feasible merger. The sum of the edge weights between vertices at the time of termination of the algorithm represents the interblock weight of the partition.

5.1.2 Segmentation Algorithms

Kral [28] and Kernighan [24] have independently developed program partitioning algorithms which operate as follows. A program is considered to be a set of n vertices, where the vertices are program units indexed by {1, 2,..., n}. The sizes of the vertices are known and the ordering of the vertices is fixed. Minimization of transitions between pages is accomplished by identifying a set of "page breaks" or "break points" between vertices. The vertices between successive break points form the blocks of the partition. This formulation is somewhat simpler than

112

that of the general program partitioning problem, and in appendix II we give the dynamic programming algorithm of Kernighan for its solution.

5.2 Comparative Results

Table 5 summarizes the results after the three algorithms were applied to a collection of graphs. The columns labeled A1, A2, and A3 are the resulting interblock weights obtained respectively from the sequential selection procedure, the unit merge algorithm, and the segmentation algorithm. The number of vertices and edges on each graph is represented by N and M respectively. All algorithms were programmed on the IBM 360 model 67. For all graphs the maximum total time required by either the merge algorithm or the segmentation algorithm was 1.6 seconds. The typical times for the algorithm (A1) were given in section 4.6.

N    M     A1       A2       A3
4    4     110      180      110
8    10    1.6      1.6      2.51
14   17    43.5     44.0     43.5
25   50    214.17   204.35   265.30
25   63    218.84   218.97   285.5
46   75    311.2    287.2    386.61
46   95    319.43   339.12   403.2
50   106   141.2    159.1    225.3

Interblock Weights
Table 5

113 Table 5 shows that the sequential selection algorithm (Al) and (A2) clearly give the best results (smaller interblock weights) in most cases. For the small graphs, n = 4, 8, 14, the values obtained by the sequential algorithm (Al) were optimum. The segmentation algorithm (A3) gave results which were in general not as good as the first two algorithms. In some cases a better (smaller) interblock weight was obtained with (A3) by renumbering the vertices. For the graphs with n > 14, which were generated randomly, the optimum results are not known; here we find for the same value of N that if the graph has a larger number of edges M, we get slightly better results using sequential selection. On the other hand when the graph is more sparse, fewer edges, (A2) gives better results. The time required by (Al) is considerably more than that required by (A2): For example, the interblock weight obtained by (Al) for N = 25 and M = 46 is approximately the same as that obtained by (A2). But for this case the total time required by (Al) is 6. 4 seconds as compared to. 22 seconds for (A2), For'the largest graphs N = 50 we obtained the best results when problem II (quadratic knapsack section 4. 3) was used as the vehicle for sequential block selection. This result is shown in table 5; the total time required was 13. 5 seconds as compared to 1. 6 seconds for (A2).
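The speed advantage of the merge procedure is easy to see from its structure. The following is a minimal sketch of the unit merge loop described in section 5.1.1 and appendix II.1; the dictionary representation of the weight matrix W and of the unit sizes is an assumption made only for illustration, not part of the Informatics description.

    def unit_merge(edges, size, mu):
        # edges: {(i, j): weight} with i < j; size: {unit: size}; mu: page capacity.
        edges, size = dict(edges), dict(size)
        while True:
            merged = False
            # consider remaining edges in order of decreasing weight
            for (i, j), w in sorted(edges.items(), key=lambda kv: kv[1], reverse=True):
                if size[i] + size[j] > mu:
                    continue                       # infeasible merger, try next edge
                size[i] += size.pop(j)             # merge unit j into unit i
                updated = {}
                for (a, b), wt in edges.items():
                    if (a, b) == (i, j):
                        continue                   # this weight is now intra-block
                    a, b = (i if a == j else a), (i if b == j else b)
                    key = (min(a, b), max(a, b))
                    updated[key] = updated.get(key, 0) + wt
                edges = updated
                merged = True
                break
            if not merged:                         # no feasible merger remains
                return size, sum(edges.values())   # block sizes, interblock weight

Each pass does no more work than one sort of the remaining edges, which is the source of the large speed advantage over the parametric procedure.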

114 For incorporation of a program packing procedure in a compiler, it is obvious that a fast algorithm such as the unit merge algorithm would be required, since in this case the time added to compilation must be less than the time saved during paging. A procedure such as unit merge meets this requirement. 5. 3 Extensions of Sequential Solution Procedures In this section we present an expanded branch and bound algorithm for graphical partitioning. Using this expanded algorithm we attempt to deal with some questions such as multiple solutions which may arise when using the sequential selection procedure previously desc ribed and the fact that the order in which the partition blocks are selected may affect the overall result. 5. 3. 1 Order of Selected Blocks Using the sequential optimization procedure to select the block which minimizes the weight of the edges between the remaining graph and the chosen block (or equivalently maximizes the sum total weight of the edges which have both ends adjacent to vertices on the selected block or both ends adjacent to vertices on the graph which remains after the vertices of the selected block are removed) we can in some cases select suboptimal partitions. For example consider the following graph where all vertices have size 1 and the block size is 3.

115

(Graph not reproduced; the edge weights are as shown in the original figure.)

Clearly the feasible block of vertices to remove for minimum interblock weight is {1, 2}. In this case the sequential selection procedure has MINVER > 1, hence the single block {2} is not feasible. Now this gives an incremental cost of 3. The graph which remains can be optimally partitioned for a cost of 4; the total interblock weight is then 7. Now it is clear that {1, 3, 5} is a block with an interblock weight of 5; this incremental interblock weight is the total interblock weight, since no further partitioning is required. This example shows how the sequential selection procedure, which selects the optimum feasible block at each stage, can be led to suboptimal results, since this procedure has no "look ahead." The branch and bound algorithm which we shall presently describe is one method of dealing with this problem. Next we show how multiple optimum solutions can lead to suboptimal results.

5.3.2 Multiple Solutions

A problem which can be encountered when solving the sequential optimization problem is that of multiple solutions. That is, there may be

116

more than one optimum block which can be selected by the sequential optimization algorithm. Consider the following example (graph not reproduced). Here the maximum allowable block size is 3, and all vertices have size 1, with the exception of vertices 4 and 5, which have a size of 2. In this case we have two blocks which may be selected as an initial block in the sequential optimization procedure. We may select either block {1, 2, 3} or block {1, 2} for an interblock weight of 5. It is clear that the additional partitioning cost will differ depending upon which of the initial optimal blocks is selected; e.g., if {1, 2, 3} is selected we can get an optimal partition with interblock weight of 8. These examples illustrate the fact that it is not possible to characterize a given block as belonging to an optimal partitioning until the complete structure of the partitioning is known. Hence we see that in general any sequential procedure for selection of partition blocks may lead to suboptimal results when the number of blocks required is more than two.

117

In this next section we describe a procedure for exploring optimal solutions which, at the expense of more computation time, allows one to "look ahead" in the sequential selection procedure.

5.4 Branch and Bound Algorithm for Partitioning

One method for dealing with the problems of "order of selection" and "multiple solutions" is to add "look ahead" to the sequential selection procedures. This was done as follows. Suppose we solve the partitioning problem and we have selected the m blocks of a partition, V1, V2,..., Vm. And suppose block V1 has size s1, where s1 <= μ, where μ is the bound on the block size used in the sequential selection procedure. We have discussed two particular ways of selecting blocks (problems II and III in section 4.5). It was found that in general the second strategy (problem III) gave the best results with the algorithm discussed in chapter 4. However, with the branch and bound strategy we used the quadratic knapsack (problem II) as the tool for block selection. Associated with the partition V1, V2,..., Vm is an interblock weight TMIN; we now want to discover whether, if we had selected another block smaller than V1 as our initial partition block, we could have obtained an overall smaller interblock weight TMIN. Since we know that the block with maximum weight and largest size obtained from the original graph has size s1, we set the bound on the block size at μ' = s1 - q, where q > 0, and start the problem over again.

118

With this particular μ' the original initial block will not be feasible; hence the algorithm will select a new first block V1'. Now the graph which remains to be partitioned is not the same graph as when we originally selected the first block using μ as a size bound. In figure 5.1 we see a representation of the branch and bound scheme. Each vertex represents the state of the graph to be partitioned. G1 is the original graph. The arc labeled s1 leading to the node labeled G2 means that after selecting a first block of size s1 we obtained the graph G2. At each vertex Gi there is an associated cost Ti for obtaining the set of blocks leading to the graph Gi. We can compute the incremental cost IC of going from Gi to Gi+1 using sk, k = 1, 2, 3 for the graph shown, and if Ti + IC > TMIN we stop the search along this path. Otherwise we continued until we got a complete partition. If at the end we obtained a smaller TMIN, we retained this new partition and its cost became our new TMIN. We then backtracked to the last vertex at which we had not considered an alternative block size. The second phase of the algorithm for unconstrained optimization was programmed so as to record multiple solutions. It was then possible to utilize the same branch and bound routine to trace alternative partitions which arise due to the multiple maximizing vectors found in the second phase. However, in this case s1, s2, etc. represent different blocks selected due to different maximizing vectors, not different block sizes.
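A compact way to view this search is as a recursion over the remaining graph. The sketch below is only illustrative: the callbacks select_block and graph_fits are assumed interfaces standing in for the sequential block selection step (problem II) and for the test "remaining graph feasible"; they are not part of the original program.

    def branch_and_bound(graph, mu, select_block, graph_fits, q=1, best=float("inf")):
        # graph_fits(g): True when g fits on a single block (no further cost).
        # select_block(g, bound): (block_size, incremental_cost, remaining_graph)
        # for the best feasible block of size <= bound, or None if none exists.
        if graph_fits(graph):
            return 0.0
        bound = mu
        while bound > 0:
            step = select_block(graph, bound)
            if step is None:
                break
            block_size, inc_cost, remaining = step
            if inc_cost < best:                    # prune when T_i + IC >= TMIN
                total = inc_cost + branch_and_bound(
                    remaining, mu, select_block, graph_fits, q, best - inc_cost)
                best = min(best, total)
            bound = block_size - q                 # force a strictly smaller first block
        return best

A stronger bound at each node, such as the max-flow/min-cut value discussed later in this chapter, would simply replace the test on inc_cost.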

119

Figure 5.1. Tree representation of the branch and bound scheme. (Figure not reproduced.) Each node Gi is a state of the graph still to be partitioned; arcs are labeled with the sizes s1, s2, s3 of the block selected, and a path ending in a feasible graph yields a complete partition with cost TMIN.

120 In using the expanded branch and bound procedure it was found that the computational time grew quite rapidly if there were many branches to trace in the tree of figure 5. 1. The increase in computation time did not seen to justify this approach to graphical partitioning. The use of T. + IC < TMIN as a test was not sufficiently powerful to avoid lengthy tracing through the tree of figures 5. 1. If one were to pursue this particular application, a way of decreasing the tree searching is to use a stronger test to decide if a particular path is to be followed. That is, calculate a lower bound on the additional cost before a particular path in the tree is taken. For example suppose we have reached mode G. which represents the graph after i - 1 block have been removed. We could use the max - flow/ min cut algorithm of Gomory and Hu [16] to calculate the minimum additional cost to partition Gi, since this algorithm gives the minimum cost to partition G. into two blocks independent of the block size. This of course, could add more time to the algorithm so that the trade -off between additional computation at each vertex G. and the total number of paths searched would have to be investigated. 5. 5 Summary In using the different algorithms for partitioning, it became clear that the partitions obtained by the sequential selection algorithm are dependent upon the "uniqueness" of the blocks selected in the early

121 stages of the procedure. This means, of course, that the order in which blocks of an optimum partition are selected is very important. This is why for partitions where the number of blocks is limited, and hence order of selection is not as significant, the sequential algorithm gave optimal results. In considering other algorithms which select blocks sequentially, we find this same problem. For example the Kernighan - Lin L25] algorithm is a sequential selection procedure which divides the graph into two subgraphs of equal vertices at each step, it attemps to minimize the sum of the weights of edges between the two subgraphs. But even here, where no attempt is made to get feasible size blocks, there is no way of choosing betweenalternative optimum "two-way" partitions in the early stages of their procedure. This means that after the first optimum two way partition is found, half of the vertices of the original graph can never be considered to be on the same block of a partition, On the other hand, when we consider the unit merge algorithm, we find that the "merges" made do not tend to disconnect or rule out particular sets of vertices for blocks as rapidly as sequential selection. The major problem with the unit merge procedure is that it is highly dependent on the "order" in which the maximum weight edges are selected. Consider the example which follows in which all vertices have unit size and the block size is 3.

122

(The example graph and its edge weights are not reproduced.) This example was given in [20], and the optimal partition was said to have an interblock weight of 1.3. Vertices 1, 3 and 7 are first selected by the merge algorithm for the same block. This leaves a smaller graph (also not reproduced). Now it is possible, using the merge algorithm, to select vertices 2, 5 and 8 as the next partition block. This will result in an interpage weight of 1.39, which is not optimal. This problem is always encountered when the merge algorithm is used with graphs where some of the edges have identical weights. Segmentation procedures [24], [28] can be done efficiently with dynamic programming techniques. The computation time for these algorithms is competitive with the merge algorithm. With the examples we considered, the partitions found by segmenting or finding break points

123 were in general the worst of the three partitionings. In some cases we could find a better partition by renumbering the vertices. It appears that for graphs in which several vertices have a high degree, i. e., a large number of incident edges, this procedure will give partitions which have a relatively high interblock weight. However this approach to partitioning was designed to be used on the instruction level. That is the program unit which the vertices represent are individual instructions rather than sets ot instructions. A graph representing programs at this level would tend to have in general a low average vertex degree. For program re - presentation at this level segmentation algorithms are applicable. However, one is immediately confronted with the problem ot gathering statistics tor programs at this level ot detail. Summarizing we saw that sequential selection tended to give slightly better partitions than the greedy algorithm for graphs with a high edge to vertex ratio. There is however a trade-off in the increased computational time needed tor the sequential selection procedure and in most cases the additional accuracy obtained by this procedure would not be justified in terms of additional time. For graphs which are sparsely connected, the merge algorithm gave results which make it preferable. Segmentation procedures should be used only in graphs for which the average degree of the vertices is relatively small, i.e., at instruction level. In the next chapter we give an example of the changes in paging performance for a system translator which has been reorganized after application ot a program partitioning algorithm.

Chapter 6 6. Application to Systems Programming In this chapter we assess the impact of program partitioning on a systems program's performance in a real paging environment. A program was written to take the results of an optimal grouping of a set of subroutines of a systems program and reorganize the set of routines such that the interpage references among the routines were reduced. The program graph for the systems program (SNOBOL 4) was sparsely connected; therefore the Merge algorithm was used for partitioning. Due to its modular structure we found for purposes of this test that the SNOBOL 4 translator on the Michigan MTS system was well suited as a testing vehicle. Performance results for both the normal and the reorganized version of the translator are given. 6. 1 Paging of a SNOBOL Translator in MTS General computing services at The University of Michigan are provided on the IBM 360 model 67. The software system is called MTS, i. e., the "Michigan Terminal System". The Michigan Terminal System is a reentrant job program which runs under the University of Michigan Multi-Programming System (UMMPS). In addition to batch processing, MTS has the capability of controlling and executing programs from remote terminals. Programs run under MTS are paged in a multi-programmed environment. 124

125 For purposes of assessing the impact of programmed reorganization on paging performaance, a SNOBOL 4 translator was analyzed and optimized using the procedures previously discussed. A special version of the translator was assembled which allowed determination of the following, 1. the inter -page branching between procedures, 2. the number of times each procedure is called, 3. the time spent in each procedure, 4. the average time per call for each procedure. The MTS data collection facility [37] was used to collect the statistical data from the special version of the translator. The data collection facility produced a tape with all the relevant information from which the appropriate data could be derived. This data was used to construct a program graph of the SNOBOL 4 translator. The SNOBOL 4 translator used for this test consisted of 106 independent procedures which call on each other during execution of the program. The procedures ranged in size from 30 bytes to 3076 bytes, with the total translator having 18 pages of procedures. A page consists of 4096 bytes on the IBM 360. To maintain a constant environment this paging experiment was run in the early (4:00AM, 19 Dec. 1970)hours of the morning with few other users on the system. Two special statistical versions of SNOBOL were assembled, one reorganized and one left in its normal state. A set of five SNOBOL programs were run using each translator. To

126 create a constant system paging environment each translator was run with a special background job call PAGE-IT. The PAGE-IT job obtains a large number of virtual storage pages and references them cyclically in rapid succession. PAGE -IT was designed such that when run with a moderate load of normal MTS tasks, the drum channel programs are kept running continuously. The effect of this is to force pages of other programs out of storage at a faster than normal rate. 6. 2 Reorganization of a SNOBOL Translator The SNOBOL translator or program functions as an interpreter; that is its operation is divided into two phases, translation and execution. The translation phase consists of an examination of the source language program (problem program) and a determination of the proper subroutine to execute the tasks specified by the source statement. Execution is that phase of the SNOBOL program in which control is given to certain SNOBOL subroutines for execution of tasks determined during translation. As cited earlier the SNOBOL translator consisted of 18 pages of code. There are an additional 30 pages of data. The reorganization performed consisted only of reorganization of the pages of code since it was not possible to affect the organization of the data pages. During the translation phase of the SNOBOL translator only two data pages are referenced. Therefore it was assumed that the behavior during this phase of the translator could best reflect the paging results, due to reorganization, of the routines of the translator.

127

Program   S1      S2      D
   1      7560    7324    236
   2      7124    7029     95        P = 4082 during S1
   3     11332   11242     90        P = 4397 during S2
   4      4348    4365    -17
   5      3209    3063    146

Table 6.1

Program   S1      S2      D
   1      3771    3739     32
   2      3433    3612   -179        P = 182 during S1
   3      5496    5495      1        P = 831 during S2
   4      2186    2180      6
   5      1545    1566    -21

Table 6.2

128 Table 6. 1 and Table 6.2 represent the translation times of the five test problem programs used in our paging experiment. The column labeled S1 represents the SNOBOL translation time of the problem programs which used the translator which was not reorganized; S2 represents the translation times of the test programs using the reorganized version of SNOBOL; and D is the difference S1-S2; all times are in milliseconds. Table 6. 1 summarizes the results taken during operation of the special program PAGE-IT. The value P beside the tables represents the number of drum reads recorded during operation of these programs; their number is indicative of the paging activity occurring during the running of the test programs, The translation times include time taken for paging. First we note that the times in Table 6. 1 are about twice as large as those in Table 6. 2; this is due of course to the additional time for paging. For example when the five problem programs were run with the translator designated Si, there were 4082 drum reads during the execution of these programs with PAGE-IT creating a heavy load. When these same programs were run on SI without the additional paging load there were only 182 drum reads. For the first problem program we see that when there is relatively small paging load (Table 6. 2) the difference in translation time is 32 milliseconds with the reorganized translator S2 being faster. However, with this same program under heavy paging conditions (Table 6. 1) we get a translation time difference of 236 milliseconds,

129

with S2 being faster. For this problem program we get a translation time difference of approximately 1%, i.e., the time for S1 is approximately 1.01 times the time for S2, under a small paging load. The translation time difference is about 3% when there is much paging activity, with the reorganized version faster. With problem programs 3 and 4 there is little difference in translation times under a light paging load. However, when paging is increased, the increase in translation time for problem 3 is slightly less than 1%, with S2 the faster version. With problem 4 the reorganized version S2 was the slower under a heavy paging load. The explanation for this is not clear, except that the reorganization of the SNOBOL translator was based on its "average" behavior; therefore it could be expected to run slower for some types of problem programs. Problem programs 2 and 5 had relatively high time differences under the small paging load. In these cases it was the reorganized version which was slower with little paging (Table 6.2). But when the paging load was increased, for both these problem programs the reorganized version was faster by at least 1.5%. In conclusion we note that for 4 of the 5 problem programs superior performance, under heavy paging, was observed for the reorganized SNOBOL translator. In the one case, problem program 4, where the reorganized program was slower, the difference was less than 1/2%.
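The figures discussed above can be recomputed directly from tables 6.1 and 6.2; the following few lines print the ratio S1/S2 and the percentage difference for each problem program (times in milliseconds as tabulated).

    # S1 is the original translator, S2 the reorganized one; positive percentages
    # mean the reorganized version was faster.
    heavy = {1: (7560, 7324), 2: (7124, 7029), 3: (11332, 11242),
             4: (4348, 4365), 5: (3209, 3063)}          # table 6.1
    light = {1: (3771, 3739), 2: (3433, 3612), 3: (5496, 5495),
             4: (2186, 2180), 5: (1545, 1566)}          # table 6.2

    for label, table in (("heavy paging", heavy), ("light paging", light)):
        for prog, (s1, s2) in sorted(table.items()):
            pct = 100.0 * (s1 - s2) / s2
            print(f"{label}, program {prog}: S1/S2 = {s1 / s2:.4f} ({pct:+.2f}%)")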

130 The data for which the program graph for SNOBAL was constructed was obtained by running a program which used a wide mix of the translator's facilities. Therefore when the problem programs, not the same program from which the program graph was constructed, were run a variation could be expected since all problem programs use different parts of the translator with different frequencies. The portion of the SNOBAL translator used for data is larger than that used for code (30 pages versus 18 pages). Therefore the paging due to data referencing is significant. We could not affect the organization of the data pages. The original version S1 was better organized with respect to data referencing. When the translator was reorganized the paging due to data referencing actually increased, i.e., the value of P is always higher for the reorganized translator. Although the statistics were taken at a time in which system use was minimal, we could not completely control the system environment. Users could sign on and thereby affect the overall system paging strategy and thereby the statistics we collected. A conclusion which may be drawn from our results is that the paging efficiency can be increased by a reorganization of the code of a program. Even though the efficiency increase may be relatively small, the fact that the systems routines are heavily used can result in significant savings in paging overhead. If little data referencing is done this is all that is necessary. But if many data pages are used

code reorganization should not take place without consideration of the references to data pages.

Chapter 7

7. Summary and Future Research

This section summarizes the results of this study and suggests areas where the research may be extended. The central purpose of this research has been the development of an algorithm for finding specific optimal groupings. Towards this end we have done the following:

(a) developed an algorithm for solution of a quadratic binary program of large dimension;

(b) investigated one specific application of optimal groupings to computer programming;

(c) made computer comparisons of algorithms for computer program partitioning;

(d) restructured a systems program and monitored its paging behavior.

We have developed an algorithm which has application to a variety of problems. For the specific application of computer program partitioning we found the unit merge algorithm to give the best performance in terms of running time and accuracy, although for some graphs the unit merge algorithm could give results which were inferior to those given by sequential selection. The speed and simplicity of the unit merge algorithm make it attractive as a procedure to incorporate in a compiler. Segmentation procedures

132

133

were found to be effective only for graphs of low average vertex degree. A systems program was reorganized and the paging behavior due to this reorganization was measured. It was found that paging overhead could be decreased considerably by program reorganization. For programs with large amounts of data it was found that data pages must be considered in any scheme of overall program reorganization. In the next section we consider some extensions to the problem we have discussed.

7.1 Extensions to the Quadratic Algorithm

We have discussed how generalized multipliers may be used to solve the quadratic problem for m <= 2. There should be more research in extending the generalized multiplier approach to larger values of m.

7.2 Program Partitioning with Paging

In the previous work, we have not made use of the fact that a program may have more than a single page in main memory at a given instant. When this aspect of memory management is considered, one can alter the nature of a given program partitioning. Consider the following example (the program graph is not reproduced).

134

For this graph all vertices except 1 and 6 have size 1; vertices 1 and 6 have size 2; the page size is two. The unique optimum partition for the above program graph is

Page 1: 1
Page 2: 2, 4
Page 3: 3, 5
Page 4: 6

This partition corresponds to an interpage transition weight of 2 1/8, which is minimum. However, if we assume that there is a nonzero probability p(k) that k pages of this program are in main memory simultaneously, where k = 1, 2, 3, 4 and Σ p(k) = 1, then we see that an optimum partition will depend upon the distribution p(k). For example, suppose p(k) = 0 for k = 1, 3, 4, and p(2) = 1. This means that we can have pages 1 and 2 or pages 3 and 4 in core simultaneously. When we are allowed this configuration our "interpage" cost is now only 1 3/4, since there is no cost for transfers between vertices 1, 2, and 4 or between vertices 3, 5, and 6. But we notice that we can reorganize this program partition such that our interpage cost is 1 if we define the new packing as

Page 1: 1
Page 2: 2, 3
Page 3: 4, 5
Page 4: 6

135

With the new formulation pages 1 and 2, and pages 3 and 4, are simultaneously core resident. The point is that the optimum partition found under a "static" consideration is not necessarily optimum when one allows multiple pages to be core resident. When one considers multiple pages in core, a natural question is how the partitioning of programs interacts with "page turning," or page placement/replacement, algorithms. One can formulate the partitioning problem in such a manner that one has included an implicit page turning algorithm. Suppose we have the distribution p(k), the probability that a given program has k pages resident. We then would like to select a partitioning which somehow takes advantage of the information given by the distribution p(k). Let us define

x_ij^1 = 0 if vertices i and j are on the same page,
       = 1 otherwise.

For k > 1 define

x_ij^k = 1 if, when control is at vertex i, vertex j is not on any of the k pages which are core resident,
       = 0 otherwise.

Now a cost function which takes into consideration the fact that there is not necessarily a cost associated with interpage transfers is

136

(1)  F = Σ_i Σ_j Σ_k c_ij x_ij^k p(k).

In this case we get a contribution to the cost function only if vertices are on different pages which do not occupy core simultaneously. The distribution p(k) "weights" the cost according to the statistics on core usage. The constraints for such a formulation could be as follows. To assure that the sizes of the vertices placed on the resident pages do not exceed their capacity we have

(2)  Σ_j (1 - x_ij^k) a_j <= k μ,  for all i, k,

where a_j is the size of vertex j and μ is the page size. A solution is considered feasible if and only if, whenever x_ij^k = 0, then x_ij^p = 0 for all p > k. This constraint simply says that if vertices i and j are on pages A and B respectively, with k pages in core and control at A, then if more pages are allowed in core with control at i, we will consider only those solutions which still consider B to be among the resident pages. This can be stated as

x_ij^t - x_ij^k >= 0  if t < k,  for t = 1, 2,..., m.

Now x_ij^k says whether j is on any page among the k pages which are resident when control is at i; this should be the same for all other vertices which are on the same page as i, i.e., if x_ij^1 = 0 and x_ip^k = 0 then x_jp^k = 0. We state this as

(3)  (1 - x_ij^1) Σ_{k=1}^m Σ_p (x_ip^k - x_jp^k)^2 = 0,  for all i, j.

137

And of course x_ij^k ∈ {0, 1} for all i, j, k, and x_ii^k = 0 for all i, k. Equations (1)-(3) define an optimization problem in Boolean variables which is more general than the problem defined in section 2. The information {x_ij^k, k = 1, 2,..., m} obtained in the solution can be used in a page turning policy in order to minimize the number of page interrupts. The optimization problem formulated for multiple pages core resident is indeed very difficult to solve for cases of practical interest. A possible area of future research is the formulation of more tractable optimization models which retain the features of page multiplicity. Implicit in a model which considers multiple pages in core is the consideration of page fetch and replacement strategies. More research into this area is appropriate. In the models developed no consideration has been given to multiprocessing or program parallelism. Models which incorporate paging could include these features.
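Although solving the formulation is difficult, the cost function (1) itself is straightforward to evaluate once a candidate set of variables is given. The sketch below assumes, purely for illustration, that the transition weights, the residency distribution, and the x variables are supplied as nested dictionaries; it does not attempt to solve the optimization problem.

    def multipage_cost(c, x, p):
        # c[i][j]: transition weight between vertices i and j
        # x[k][i][j]: 1 if vertex j is absent from the k resident pages when
        #             control is at vertex i, 0 otherwise (as in equation (1))
        # p[k]: probability that k pages of the program are core resident
        return sum(p[k] * c[i][j] * x[k][i][j]
                   for k in p for i in c for j in c[i])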

Appendix I

Solution of Linear Pseudo-Boolean Inequalities

In this section we present a branch and bound algorithm due to Hammer and Rudeanu [19] for the solution of a linear pseudo-Boolean inequality of the following form:

c1 x1 + c2 x2 + ... + cn xn >= d,     (1)

where c1, c2,..., cn, d (c1 >= c2 >= ... >= cn > 0) are known real constants and xi (i = 1,..., n) are Boolean unknowns. Any Boolean inequality can be put into standard form (1) by application of the following transformation. Let

a1 z1 + b1 z̄1 + ... + an zn + bn z̄n >= k     (2)

be the general form a linear pseudo-Boolean inequality takes, where ai, bi (i = 1,..., n) and k are known constants, and zi (i = 1,..., n) and z̄i = 1 - zi are the Boolean unknowns. We can assume, with no loss of generality, that ai ≠ bi for all i; also the proper sense of the inequality is always obtained by appropriate multiplication by -1. For each i, let us set

xi = zi if ai > bi,  xi = z̄i if ai < bi.     (3)

Now the term ai zi + bi z̄i becomes

138

139

ai zi + bi z̄i = (ai - bi) xi + bi  if ai > bi,
             = (bi - ai) xi + ai  if ai < bi.     (3a)

Thus equation (2) becomes

c1 x1 + c2 x2 + ... + cn xn >= d,

where c1, c2,..., cn, d are all known constants. By reindexing we have c1 >= c2 >= ... >= cn > 0. Throughout we will confine our attention to the solution of inequalities of standard form. Following Hammer and Rudeanu we need the following definitions.

Definition 1. Let S = (x*1,..., x*n) be a solution of (1) and let I be a set of indices, I ⊆ {1, 2,..., n}. Let Γ(S, I) be the set of all Boolean vectors (x1,..., xn) satisfying xi = x*i for all i ∈ I; the other variables xi, i ∉ I, are arbitrary. If all Boolean vectors (x1,..., xn) ∈ Γ(S, I) satisfy (1), then Γ(S, I) is said to be a family of solutions to (1). The pair (S, I) is said to generate the family Γ(S, I); the variables xi for which i ∈ I are called the fixed variables of the family. If I = {1, 2,..., n} the family consists of the single vector (x*1,..., x*n) and is said to be degenerate.
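The reduction of (2) to the standard form (1) through (3) and (3a) is mechanical, and a short sketch may make the bookkeeping explicit. The function below is an illustration only; the data layout (coefficient lists a, b and right-hand side k) is an assumption, and the returned key simply records which variables were complemented.

    def to_standard_form(a, b, k):
        # term a_i z_i + b_i zbar_i -> c_i x_i plus a constant folded into d,
        # following (3) and (3a); then reindex so that c_1 >= c_2 >= ... >= c_n.
        terms, d = [], k
        for i, (ai, bi) in enumerate(zip(a, b)):
            if ai >= bi:
                terms.append((ai - bi, (i, False)))   # x_i = z_i
                d -= bi
            else:
                terms.append((bi - ai, (i, True)))    # x_i = zbar_i
                d -= ai
        terms.sort(key=lambda t: t[0], reverse=True)
        c = [t[0] for t in terms]
        key = [t[1] for t in terms]
        return c, d, key

    # The example of this appendix, 2z1 + 6z2 - 4z3 + z4 + 3z5 <= 6, is first
    # rewritten as -2z1 - 6z2 + 4z3 - z4 - 3z5 >= -6 and then transformed:
    print(to_standard_form([-2, -6, 4, -1, -3], [0, 0, 0, 0, 0], -6))
    # c = [6, 4, 3, 2, 1], d = 6, i.e. 6x1 + 4x2 + 3x3 + 2x4 + x5 >= 6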

140

Definition 2. A vector (x*1,..., x*n) satisfying inequality (1) is called a basic solution of (1) if, for each index i such that x*i = 1, the vector (x*1,..., x*i-1, 0, x*i+1,..., x*n) is not a solution of (1).

For the solution of (1) the following three lemmas are needed.

Lemma 1. Let (x*1,..., x*p, x*p+1,..., x*n) be a basic solution of (1); then (x*p+1,..., x*n) is a basic solution of the inequality

Σ_{j=p+1}^n cj xj >= d - Σ_{k=1}^p ck x*k.

Lemma 2. If (x*p+1,..., x*n) is a basic solution of the inequality

Σ_{j=p+1}^n cj xj >= d,

then (0, 0,..., 0, x*p+1,..., x*n) is a basic solution of (1).

Lemma 3. If d > 0 and (x*2,..., x*n) is a basic solution of

Σ_{j=2}^n cj xj >= d - c1,     (4)

then (1, x*2,..., x*n) is a basic solution of (1).

141

Proof: If x*2 = ... = x*n = 0, the lemma follows immediately; therefore assume that Σ_{j=2}^n x*j > 0. It is obvious that (1, x*2,..., x*n) is a solution of (1); hence it remains only to show that it is a basic one. By hypothesis (x*2,..., x*n) is a basic solution of (4); therefore it follows that for every k (2 <= k <= n) such that x*k = 1,

Σ_{j=2, j≠k}^n cj x*j < d - c1.

Hence we have that

c1 + Σ_{j=2, j≠k}^n cj x*j < d,

i.e., (1, x*2,..., x*k-1, 0, x*k+1,..., x*n) does not satisfy (1). Further,

Σ_{j=2}^n cj x*j = Σ_{j=2, j≠k}^n cj x*j + ck < ck + d - c1 <= d,

therefore (0, x*2,..., x*n) is not a solution of (1). Hence (1, x*2,..., x*n) is a basic solution of (1). This completes the proof of lemma 3.

All basic solutions are found by repeated application of the above three lemmas. Table A1 summarizes those results.

142

Table A1

Case 1. Condition: d <= 0. Conclusion: unique basic solution x1 = x2 = ... = xn = 0. Justification: obvious.

Case 2. Condition: d > 0 and cp >= d > cp+1 for some p >= 1. Conclusion: (a) for each k = 1, 2,..., p, the vector with xk = 1 and all other xi = 0 is a basic solution; (b) if any other basic solutions exist, they are characterized by the property x1 = x2 = ... = xp = 0, where (xp+1,..., xn) is a basic solution of Σ_{j=p+1}^n cj xj >= d. Justification: (a) obvious; (b) by lemmas 1, 2.

Case 3. Condition: d > 0 and Σ_{i=1}^n ci < d. Conclusion: no solution. Justification: obvious.

Case 4. Condition: d > 0 and Σ_{i=1}^n ci = d. Conclusion: unique basic solution x1 = x2 = ... = xn = 1. Justification: obvious.

143

Table A1 (continued)

Case 5. Condition: d > 0, c1 < d, and Σ_{i=2}^n ci < d. Conclusion: the basic solutions (if any) are characterized by the property x1 = 1, where (x2,..., xn) is a basic solution of Σ_{j=2}^n cj xj >= d - c1. Justification: by lemmas 1, 3.

Case 6. Condition: d > 0, c1 < d, and Σ_{i=2}^n ci >= d. Conclusion: the basic solutions (if any) are characterized by the property: either x1 = 1 and (x2,..., xn) is a basic solution of Σ_{j=2}^n cj xj >= d - c1, or x1 = 0 and (x2,..., xn) is a basic solution of Σ_{j=2}^n cj xj >= d. Justification: by lemmas 1, 3, 2.
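The case analysis of table A1, as reconstructed here, translates directly into a recursive enumeration of basic solutions. The sketch below is an illustration of that reconstruction rather than a transcription of the original program; it assumes the coefficients are already in standard form (sorted in decreasing order, all positive).

    def basic_solutions(c, d):
        # c: coefficients with c[0] >= c[1] >= ... > 0; returns basic solutions of (1)
        n = len(c)
        if d <= 0:
            return [(0,) * n]                       # case 1
        if sum(c) < d:
            return []                               # case 3
        if sum(c) == d:
            return [(1,) * n]                       # case 4
        if c[0] >= d:                               # case 2: c_p >= d > c_{p+1}
            p = max(k for k in range(n) if c[k] >= d) + 1
            sols = []
            for k in range(p):                      # case 2(a)
                v = [0] * n
                v[k] = 1
                sols.append(tuple(v))
            for tail in basic_solutions(c[p:], d):  # case 2(b): x1 = ... = xp = 0
                sols.append((0,) * p + tail)
            return sols
        # cases 5 and 6: c[0] < d
        sols = [(1,) + tail for tail in basic_solutions(c[1:], d - c[0])]
        if sum(c[1:]) >= d:                         # case 6 only: x1 = 0 branch
            sols += [(0,) + tail for tail in basic_solutions(c[1:], d)]
        return sols

    print(basic_solutions([6, 4, 3, 2, 1], 6))
    # [(1, 0, 0, 0, 0), (0, 1, 1, 0, 0), (0, 1, 0, 1, 0), (0, 0, 1, 1, 1)]

On the example worked out below this reproduces the four basic solutions a through d of table A2.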

144

Each basic solution S can be associated with a family of solutions Γ(S, Js) as follows. Let i0 be the last index for which x*i = 1 (i.e., x*i0 = 1 and x*i = 0 for all i > i0), and let Js be the set of all indices i <= i0. Then Γ(S, Js) is the set of all vectors (x1,..., xn) satisfying

xi = x*i,  i <= i0;
xi arbitrary,  i > i0.

It can be shown (see [18]) that if S1,..., Sm are the basic solutions of (1), and Γ(Sk, Jk), k = 1,..., m, are the associated families, then every solution (x1,..., xn) of (1) belongs to exactly one family. We now present an example.

Example. Consider

2z1 + 6z2 - 4z3 + z4 + 3z5 <= 6.

Rewriting, this becomes

-2z1 - 6z2 + 4z3 - z4 - 3z5 >= -6.

Set y1 = z̄1, y2 = z̄2, y3 = z3, y4 = z̄4, y5 = z̄5. By applying the transformations (see equations (3) and (3a)) we get

(2y1 - 2) + (6y2 - 6) + 4y3 + (y4 - 1) + (3y5 - 3) >= -6,

that is,

2y1 + 6y2 + 4y3 + y4 + 3y5 >= 6.

By reindexing, x1 = y2, x2 = y3, x3 = y5, x4 = y1, x5 = y4, we get the standard form

145

6x1 + 4x2 + 3x3 + 2x4 + x5 >= 6.

By case 2 (see table A1), x1 = 1, x2 = x3 = x4 = x5 = 0 is a basic solution:

(a)  (1, 0, 0, 0, 0).

By part (b) of case 2, x1 = 0 and the new inequality is

4x2 + 3x3 + 2x4 + x5 >= 6.

An examination of the coefficients shows that we have case 6; hence x2 = 1 and we have to find the basic solutions of

3x3 + 2x4 + x5 >= 2,

which, by case 2, gives x3 = 1, x4 = 0, x5 = 0, or

(b)  (0, 1, 1, 0, 0),

and x3 = 0, x4 = 1, x5 = 0, or

(c)  (0, 1, 0, 1, 0).

Proceeding to part (b) of case 2 we have x5 >= 2, which leads to case 3. We have followed this branch as far as possible; we therefore return to the second part of case 6, x2 = 0:

3x3 + 2x4 + x5 >= 6.

This is case 4: x3 = x4 = x5 = 1, or

(d)  (0, 0, 1, 1, 1).

This terminates the search for basic solutions. Tables A2 and A3 summarize the results.

146

Table A2

Solution   x1  x2  x3  x4  x5
   a        1   0   0   0   0
   b        0   1   1   0   0
   c        0   1   0   1   0
   d        0   0   1   1   1

Table A3

Family       x1  x2  x3  x4  x5
Γ(S1, I1)     1   -   -   -   -
Γ(S2, I2)     0   1   1   -   -
Γ(S3, I3)     0   1   0   1   -
Γ(S4, I4)     0   0   1   1   1

The dashes in table A3 indicate the variables which are arbitrary in a given family; here I1 = {1}, I2 = {1, 2, 3}, I3 = {1, 2, 3, 4}, I4 = {1, 2, 3, 4, 5}, so that the family Γ(S4, I4) is degenerate. After applying the inverse transformation we get table A4, which represents the families of solutions in terms of our original variables.
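Since the example has only five variables, the basic solutions of table A2 can be checked by brute force. The few lines below enumerate all 2^5 vectors and keep those that satisfy the inequality and become infeasible when any single 1 is changed to 0 (Definition 2):

    from itertools import product

    c, d = [6, 4, 3, 2, 1], 6

    def satisfies(x):
        return sum(ci * xi for ci, xi in zip(c, x)) >= d

    basic = [x for x in product((0, 1), repeat=5)
             if satisfies(x)
             and all(not satisfies(x[:i] + (0,) + x[i + 1:])
                     for i in range(5) if x[i] == 1)]
    print(basic)
    # [(0, 0, 1, 1, 1), (0, 1, 0, 1, 0), (0, 1, 1, 0, 0), (1, 0, 0, 0, 0)]
    # i.e. solutions d, c, b and a of table A2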

147

Table A4

Family
number    z1  z2  z3  z4  z5
  1        -   0   -   -   -
  2        -   1   1   -   0
  3        0   1   1   -   1
  4        0   1   0   0   0

Appendix II

Two Partitioning Algorithms

In this appendix we give two algorithms for graphical partitioning. The first algorithm is the unit merge procedure developed at Informatics [20]. The second algorithm is the segmentation procedure of Kernighan [24].

II.1 Informatics Unit Merge Algorithm

The unit merge algorithm is suitable for application to any graph G where the edge weights cij are not all the same. The procedure assumes that the graph G is represented by an upper triangular matrix W,

W = [wij],  wij = cij + cji for i < j,  wij = 0 for i >= j,

where cij is the weight of the edge between vi and vj in the graph G. The algorithm proceeds as follows.

1) The matrix W is searched, and its largest element wij is located.

2) If the size ai + aj (of vertices vi and vj) does not exceed M, the block capacity, vi and vj are merged. The matrix W is accordingly updated.

3) The process is repeated until all possible mergers have been made. That is, the process terminates when, for each wij > 0, ai + aj > M.

148

149

4) In order to reduce the number of pages required, any remaining possible mergers are made with no regard to W. The mergers made during this phase of the algorithm do not change the interblock weight; they simply fill out any remaining empty space in partially filled pages.

In order to update the graph when vj is merged with vi (i < j), the newly formed unit (vertex) is named vi. Its new size ai is the sum of the old ai and aj. For all k, wkj and wjk become zero; the old values of wkj and/or wjk are added to wki or wik.

II.2 Dynamic Programming Algorithm for Segmentation

Kernighan has developed an algorithm for the segmentation of a program which requires computations on the order of n^2, where n is the number of vertices of the graph. We present the model for his optimization procedure.

Let G be a directed graph. A node (vertex) j is a successor of a node i iff (i, j) is an edge in G. In this case, i is a predecessor of j. A program graph is a directed graph G of n + 2 nodes {0, 1, 2,..., n, n+1} such that

1. There is no edge (m, 0), for m in G; node 0 has no predecessors.

2. There is no edge (n+1, m), for m in G; node n+1 has no successors.

3. There is no edge (m, m), for m in G.

150

4. G is connected in the sense that for any node m in G, there exists a path from 0 to n+1 which includes node m.

Nodes 0 and n+1 are called the initial node and terminal node of G respectively. The vertices of G have integral weights {w(i), i = 0, 1,..., n+1} such that w(0) = w(n+1) = 0 and 0 < w(i) <= p for all i in (1, n); p is a positive integer representing the page size. A segmentation of the program graph G is designated as a partition of the nodes of G into k disjoint subsets G(1),..., G(k) such that

1. ∪_{j=1}^k G(j) = G;

2. the nodes in any G(j) are contiguous, that is, G(j) contains nodes i, i+1,..., m-1, m;

3. Σ_{i∈G(j)} w(i) <= p, that is, the sum of the weights of the nodes in each subset of the partition is less than or equal to p.

The subset G(j) is called a page of the segmentation. Let b(j) be the minimum node number in G(j), that is, the name of the first node of G(j). The number b(j) is called the break point or page break. The set {b(j), j = 1,..., k} uniquely identifies the partition {G(j)}; b(1) = 1. Each edge (i, j) of G is assigned a nonnegative cost c(i, j), and the cost of a segmentation is defined to be the summation of c(i, j) over all i and j such that i and j are not in the same page. The algorithm below obtains an optimal segmentation, i.e., one with minimum cost. The

151

distance d(i, j) from node i to node j (i <= j) in a program graph G is defined as

d(i, j) = w(i) + w(i+1) + ... + w(j);

thus the distance is the sum of the weights of all nodes from i to j inclusive.

Optimal Segmentation Algorithm

Let C(x, y) be the incremental cost for a page break at x, given that the previous page break is at y:

C(x, y) = Σ_{y <= i < x, j >= x} [c(i, j) + c(j, i)],

and let T(x), for any x in {1,..., n+1}, be the minimum partial cost (as far as node x) of any segmentation of a section of a program {1, 2,..., n} with a break at x. Thus T(1) = 0; T(x) is evaluated iteratively. The algorithm goes as follows.

1. Set T(1) = 0.

2. For x from 2 to n+1 in steps of 1, set

T(x) = min over y of [T(y) + C(x, y)],

where the minimization is done over all y such that d(y, x-1) <= p. If more than one y attains the minimum, choose the smallest. Set L(x) = y.

3. Set total cost = T(n+1); set z(0) = n+1; set k = 1.

4. While z(k-1) > 1, set z(k) = L[z(k-1)] and k = k+1.

152 5. Break points are z(1), z(2),...,z(k), in descending order. A proof of the optimality of this algorithm can be found in [ 24 ].
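A direct rendering of this algorithm in code may be useful. The sketch below follows the steps above, with the program graph supplied as a weight list w (indices 0 through n+1, with w[0] = w[n+1] = 0) and a cost dictionary keyed by edge; that representation is an assumption made for illustration. Note that the O(n^2) bound quoted above requires incremental computation of C(x, y), which this simple sketch does not attempt.

    def segment(w, cost, p):
        # w[0..n+1]: node weights with w[0] = w[n+1] = 0; cost[(i, j)] = c(i, j);
        # p: page size.  Returns (total cost, sorted list of break points).
        n = len(w) - 2

        def C(x, y):                     # incremental cost of a break at x after y
            return sum(cost.get((i, j), 0) + cost.get((j, i), 0)
                       for i in range(y, x) for j in range(x, n + 2))

        def d(y, x):                     # distance: sum of weights of nodes y..x
            return sum(w[y:x + 1])

        T, L = {1: 0}, {}
        for x in range(2, n + 2):
            best_y, best_t = None, None
            for y in range(1, x):
                if d(y, x - 1) > p:      # nodes y..x-1 would overflow a page
                    continue
                t = T[y] + C(x, y)
                if best_t is None or t < best_t:   # ties resolved to the smallest y
                    best_y, best_t = y, t
            T[x], L[x] = best_t, best_y

        breaks, z = [], n + 1            # trace the break points back from n+1
        while z > 1:
            z = L[z]
            breaks.append(z)
        return T[n + 1], sorted(breaks)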

Appendix III

Graph Theoretic Formalism

In discussions of graphical models of computer programs it is very helpful to present concepts in a graph theoretic context, and it is in this regard that we present the following definitions, following Busacker and Saaty [7].

A graph is defined as follows: a graph consists of a nonempty set V, a (possibly empty) set E disjoint from V, and a mapping φ of E into V × V. The elements of V and E are called the vertices and edges of the graph, respectively, and φ is called the incidence mapping associated with the graph. An edge e ∈ E is said to be incident with vertices v, w ∈ V if φ(e) = (v, w). The vertices incident with an edge are called its end points, and are also said to be joined by the edge. Generally we do not refer to φ explicitly, and the fact that v and w are end points of e is denoted as e = (v, w). A graph will be denoted as G = (V, E), where the incidence mapping remains implicit. If V and E are both finite, G is called finite. For our descriptive purposes we are concerned only with finite graphs. A graph is said to be simple if at most one edge joins any pair of vertices and each edge is incident to two distinct vertices.

If the vertices of a graph represent components of individual

153

154 activities, and the edges are assigned an orientation, then in this case the incidences relations within the graph reflect the way in which execution of certain activities is contingent upon completion of other tasks. When an orientation is assigned to edges we have a directed graph. A finite sequence e,e2,...,en of (not necessarily distinct) edges of a graph is an edge progression of length n if there exists an appropriate sequence of n+1 (not necessarily distinct) vertices vO,v1,v2,..., vn such that e = (vi i) for i = 1,2,..,n. The edge progression is said to be closed if v0 = vn and open if vO ~ vn. When the elements of a progression represent distinct edges, the edge progression is called a chain'progression if it is open and a circuit progression if it is closed. The set of edges itself, without regard to sequencing is said to constitute a chain in the former case and a circuit in the latter. With simple graphs, if all n+1 vertices v0,v1,v2,...,vn are distinct, the set of edges is called a simple chain. If v0= vn but the vertices otherwise distinct the unordered set of edges is: said to constitute a simple circuit'. A graph is said to be connected'if every pair of distinct vertices are joined by at least one chain. A graph is said to be a tree if it is connected and has no circuits. When dealing with directed graphs we replace the set E with a

155 set of "oriented edges" A called arcs, and B= (V,A). Incidence for any a e A is again designated as a = (v, w), where a is said to have v as its initial vertex and w as its terminal vertex. a An arc progression of length n is a sequence of (not necessarily distinct) arcs al,a2,...,an such that for an appropriate sequence of n+l vertices v,vl,...,vn we have ai = (vi_,Vi) for i=1,2,...n. The arc progression is closed if vO= vn and open if v0 Vn. An arc progression in which no arc is repeated is called a path progression or cycle progression, depending on whether it is open or closed. The corresponding set of arcs, is called a path or cycle respectively. A directed graph is said to be cyclic if it contains at least one cycle, and acyclic otherwise. When the term "tree" is applied without qualification to a directed graph it is understood that arc directions are ignored and the associated undirected graph is being described. Some writers tend to use the terminology node and arc when discussing directed graphs and vertex and edge when dealing with undirected graphs. We tend to use vertex and edge in both contexts and will specify when necessary if the graph is oriented. We will frequently refer to the connection matrix C of a graph.

156

It is defined as a matrix C = {cij} where

cij = 1 if (vi, vj) ∈ E,
    = 0 otherwise.

An analogous matrix can be defined for directed graphs; it is the incidence matrix [6], M = {mij}, which is defined as

mij = 1 if (vi, vj) ∈ A,
    = -1 if (vj, vi) ∈ A,
    = 0 otherwise.
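As a small illustration, both matrices can be built directly from edge and arc lists; the list-of-lists layout below is just one possible representation.

    def connection_matrix(n, edges):
        # edges: unordered pairs (v_i, v_j) of an undirected graph on vertices 0..n-1
        C = [[0] * n for _ in range(n)]
        for i, j in edges:
            C[i][j] = C[j][i] = 1
        return C

    def incidence_matrix(n, arcs):
        # arcs: ordered pairs (v_i, v_j) of a directed graph; m_ij = 1 if (v_i, v_j)
        # is an arc, -1 if (v_j, v_i) is an arc, and 0 otherwise
        M = [[0] * n for _ in range(n)]
        for i, j in arcs:
            M[i][j], M[j][i] = 1, -1
        return M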

BIBLIOGRAPHY 1. Arden, B. W., Galler, B. A., O'Brien, T. C. and Westervelt, F. H., "Program and Addressing Structure in a Time-Sharing Environment", Journal of the ACM, 13 (January, 1966), 1-16. 2. Asher, D. T., "A Linear Programming Model for the Allocation of R and D Efforts", IRE Transactions on Engineering Management, EM-9 (December, 1962), 154-157. 3. Balas, E., "An Additive Algorithm for Solving Linear Programs with Zero-One Variables", Operations Research, 13 (July-August, 1965), 517-546. 4. Balinski, M., "On a Selection Problem", Management Science 17, 3 (November, 1970), 230-231. 5. Beged Dov, A. G., "OptimalAssignment of Research and Development Projects in a Large Company Using an Integer Programming Model", IEEE Transactions in Engineering Management, EM-12, 4 (December, 1965), 138-142. 6. Berge, C., The Theory of Graphs and Its Applications, John Wiley and Sons, 1962. 7. Busacker, R., and Saaty, T., Finite Graphs and Networks An Introduction with Applications, McGraw-Hill, 1965. 8. Cabot, A. V.,'An Enumeration Algorithm for Knopsack Problems", Operations Research, v8, 2 (March-April, 1970), 306-311. 9. Dantzig, G., Linear Programming and Extensions, Princeton University Press, 1963. 10. Denning, P. J., "The Working Set Model for Program Behavior", Communications of the ACM, 11 (may, 1968), 323-333. 11. Dennis, J. B., "Segmentation and the Design of Multiprogrammed Computer Systems", Journal of the ACM, 12(October, 1965), 589-602. 12. Edwards, A.W. F., and Cavalli-Sforza, L.L., "A Method for Cluster Analysis," Biometrics, (June 1965) 362-375. 13. Estrin, G. and Martin, D. "Models of Computations and SystemsEvaluation of Vertex Probabilities in Graph Models of Computations", Journal of the ACM, 14 (April, 1967), 281-299. 157

158 14. Everett, H., "Generalized Largrange Multipliers Methods for Solving Problems of Optimum Allocation of Resources", Operations Research, 11, (May-June, 1963), 399-417. 15. Geoffrion, A., "An Improved Implicit Enumeration Approach for Integer Programming", Memo RM-5644-PR, Rand Corp., Santa Monica, California, (June, 196.8). 16. Gomory, R. and Hu, T., "Multi-Terminal Network Flows", Journal of the SIAM 9, 4 (December, 1961), 551-570. 17. Gomory, R. E., and Gilmore, P. C., "A Linear Programming Approach to the Cutting Stock Problem-Part Ir', Operations Research, 11, (1963&), 863 —888. 18. Gue, R., and Liggett, J., "Analysis of Algorithms for the ZeroOne Programming Problem", Communications of the ACM, 11, 12 (December, 1968), 837-849. 19. Hammer, P. and Rudeanu, S., Boolean Methods in Operations Research and Related Areas, Springer-Verlag, 1968. 20. Informatics Incorporated, "Program Paging and Operating Algorithms", Technical Documentary Report, TR 68-793-1 (September, 1968). 21. Jensen, R., "A Dynamic Programming Algorithm for Cluster Analysis", Operations Research, 17 (November-December, 1969), 1033-1057. 22. Karp, R. M., "A Note on the Application of Graph Theory to Digital Computer Programming", Information and Control, 3 (September, 1960), 179-190. 23. Kemeney, J.G. and Snell, J. L., Finite Markov Chains, Princeton, New Jersey, D. Van Nostrand Co., 1960. 24. Kernighan, B., "Some Graph Partitioning Problems Related to Program Segmentation", Ph. D. Thesis, Princeton, (January, 1969). 25. Kernighan, B. and Lin, S., "An Efficient Heuristic Procedure for Partitioning Graphs", The Bell Systems Technical Journal, 49, 2 (February, 1970), 291-307...... 26. Kral, J., "The Formulation of the Problem of Program Segmentation in Terms of Pseudoboolean Programming", Kybernetika Cislo 1, Rocnik (January, 1968), 6-11.

159 27. Kral, J., "One Way of Estimating Frequencies of Jumps in a Program", Communications of the ACM, 11 (July, 1968), 475-480. 28. Kral, J., "To the Problem of Segmentation of Programs", Proceedings Information Processing Machine Symposium, Prague, Czechoslovakia, (September, 1964). 29. Lawler, E., "On the Optimal Partitioning of a Family of Subsets of a Finite Set", to be published. 30. Lawler, E., Notes on Combinatorial Optimization. 31. Lawler, E. L. and Bell, M. D., "A Method for Solving Discrete Optimization Problems", Operations Research, 14 (November - December, 1966), 1098-1122. 32. Lawler, E., Levitt, K., and Turner, J., "Module Clustering to Minimize Delay in Digital Networks", IEEE Transactions on Computers, C-18 (January, 1969) 47-57. 33. Lawler, E. and Wood, D., "Branch and Bound Methods: A Survey", Operations Research, 14, 4 (July - August, 1966), 699-719. 34. Marimont, R. B., "Applications of Graphs and Boolean Matrices to Computer Programming", SIAM Rev., 2 (October, 1960), 259-268. 35. Nemhauser, G. and Ullmann, Z., "A Note on the Generalized Lagrange Multiplier Solution to an Integer Programming Problem", Operations Research, 16 (March - April, 1968), 450-452. 36. Pankhurst, R. J., "Program Overlay Techniques", Communications of the ACM, 11 (February, 1968), 119-125. 37. Pinkerton, T., "The MTS Data Collection Facility", Memorandum 18, Concomp Project, University of Michigan, (June, 1968). 38. Pinkerton, T. B., "Program Behavior and Control in Virtual Storage Computer Systems", Technical Report No. 4, Concomp Project, University of Michigan, 1968. 39. Ramamoorthy, C. V., "The Analytic Design of a Dynamic LookAhead and Program Segmenting System for Multiprogrammed Computers", Proc. ACM 21st Natl. Conf., (August, 1966), 229-239.

160

40. Ramamoorthy, C. V., "Discrete Markov Analysis of Computer Programs", Proc. ACM 20th Natl. Conf., (August, 1965), 386-392.

41. Randell, B. and Kuehner, C. J., "Dynamic Storage Allocation Systems", Communications of the ACM, 11 (May, 1968), 297-306.

42. Rubin, J., "Optimal Classification into Groups: An Approach for Solving the Taxonomy Problem", Journal of Theoretical Biology, 15 (1967), 103-144.

43. Sebestyen, G., Decision-Making Processes in Pattern Recognition, Macmillan Company, New York, 1962.

44. Shepherd, M. J., and Willmott, A. J., "Cluster Analysis on the Atlas Computer", Computer Journal, 11 (1968), 57-62.

45. Smithies, A., "A Conceptual Framework for the Program Budget", Systems, Organizations, Analysis, Management, McGraw-Hill, 1969, 163-182.

46. Sokal, R., and Sneath, P., Principles of Numerical Taxonomy, Freeman, 1963.
