Division of Research Graduate School of Business Administration The Univeristy of Michigan Written Fall 1976 Revised Summer 1977 A NON-PROCEDURAL LANGUAGE AND A SYSTEM FOR AUTOMATIC GENERATION OF DATA PROCESSING PROGRAMS Working Paper No. 156 by N. Adam Rin The University of Michigan O 1977 by The University of Michigan FOR DISCUSSION PURPOSES ONLY None of this material is to be quoted or reproduced without the express permission of the Division of Research

A Ncn-Procedural Language and a System for Automatic Generation of Data Processing Programs N. Adam Bin * Abstract The rising ccst of software development has motivated research on the automation of parts of the software development process. Towards that objective, a nonprocedural language called MODEL has been developed to describe desired programs cf an information processing system. This paper provides an overview of MODEL and describes a software system that automatically generates conventional business data processing programs from specifications in that language. 1. Eackqgrcund ard CvyrviEi This paper is ccrcerned with research done on the automatic generation of conventional data processing programs. It provides an overview of a non-procedural specification language called MODEL and of an automatic methodology that produces application programs from specifications expressed in the non-procedural language. The research reported in this paper is directed toward reducing the costs of software development by automatic generation of application programs from non-procedural specifications of * The author, currently Assistant Professor at the Graduate School of Business Administration, University of Michigan, performed the research reported in this paper in 1974-1975 while a graduate student at the University of Pennsylvania Computer Science Department. This project was supervised by Prof- N.S. Prywes and supported by the Office of Naval Research under contract N-00014-67-A-0216-0014. See [1] for greater detail cn this prcject. 1

user requirements. Another gcal cf this work is to make the computer usable for a broader range of people. By developing a language and methodclcgy capable of automatically analyzing specifications and generating programs, it is hoped that it would be possible for systems analysts to build an informaticn processing system without the intervention cf apFlicaticn prcgrammers as middlemen. In order to pinpoint the area of automation covered by this research, it is useful first to review briefly research by others in automating the stages in the development of an information processing system. Various authors [2,3] have enumerated these stages differently but they are generally as follows: 1. Perception of automation need and determination of overall recuirements; 2. Production of functional specifications (user requirements specificaticn or logical design); 3. Physical system design; 4. Programming (Ercgram design, coding, debugging, and testing); 5. Installaticn and cFeraticn; 6- Maintenance and modification. The systematic automation of each of these phases has been proposed several years ago [33 in the ISDOS project, which has primarily made progress in the areas of the recording, documentation, and analysis of user requirements. The 2

potential berefits of autcmatinc each of the above stages have been subsecuently evaluated [2,4]. Production of user requirements specification above requires knowledge of the application area and problemsolving capabilities. It has sc far been automated only very minimally by limiting narrowly the scope of the problem domain in application-specific packages (e.g., [5]). Balzer f6] has suggested that the computer could acquire a problem domain through an interactive sessicn in a natural language. Yet success with such an approach probably will not be realized until major advances are made in artificial intelligence. Therefore in MCDEL the user is assumed to have knowledge cf the application area while the automatic programming system has design and programming knowledge. Some progress has been made in the partial automation of expressing Frcblen reguirements formally, automatically analyzing systei functional specifications, and performing some physical design. These have been reviewed elsewhere [2,3], but a few are mentioned here. There are a number of so-called problem statement languages for expressing system functional specifications formally [7]. The importance of such languages was reccgnized at least theoretically quite early in works such as those of Young and Kent [8]. A notable example is the Problem Statement Language (PSL) to express system functional specifications [7,9]. The impact of such problem statement languages has been shown to be of 3

great benefit [4] in aiding the software development effort by giving the system designer a formal way to record user requirements that can be documented and analyzed automatically- Once functional specifications have been expressed formally, a further degree of automation has resulted from automatically checking the specifications and producing scEe cf the physical design- Many projects (see for example [10,11,12,13,14,15,16]) have dealt with various aspects of automating the physical design process through formal models and structured systems development techniques. More limited aids to the systems analyst have actually been available previously in forms-oriented and tabular languages (see for example [17,18]). Yet not much effort has gone into the automation of the program generation process (stage 4) based on analysis of user requirements; i.e., the above systems do not generate programs based on previous non-procedural specification, analysis, and design. It is with this area that this research has dealt. Cther work in automatic program generation is mentioned ir the end of Section 3 of this paper and compared to CErEL. The programming phase has been recognized as a very time-consuming and tedicus task. Even with recent generalized data base management systems (which have to some extent relieved their users of physical knowledge of the data base and details of access and have provided for some 4

data independence and central control, among other services), prcgramming is extremely tedious. They generally require Frcgraming and Erccedural knowledge to write application prcgrams. The research reported here helps to bridge the automation gap by including the automation of phase 4 above -- the program design, ccding, and debugging phase. Section 2 of this paper provides a general overview of the MODEL language and Processor. Section 3 describes the main features of the MCDEL specification language and provides an example of its use. Section 4 outlines the automatic prcgram generation methodology used. Section 5 makes some concluding remarks and comments on further research. 2., verview cf the MCDEI LanqLuaqe and Processor The approach taken here is that the user has all the kncwledge pertaining tc the application area, while the automatic program generator will possess only and all the design and programming knowledge. The focus of the research reported here is on the automation of the programming phase whose input is a functional specification in a very high level, non-procedural language and whose output is an application program in a programming language. It deals primarily with the automatic generation of transaction 5

processing systems. Figure 1 depicts the roles of the user and the software system ihich accomplishes this. The letter references beloh are to the figure. In order to express such specifications, a fcrual ncn-procedural language called a "Module Descripticn language" (MODEL) has been developed in which the user describes desired prcgrams fa]. The language is used to describe ncn-prccedurally objects in various categories: 1. The input data to be processed 2. The cutput and updates to be produced 3. The computation and decision rules to be used. Unlike programming languages, the user submits to the Processor descriptive statements in this language in any order that he may desire, and can do sc in increments over a period of time. There are no control structures or other control flcw statements in the user's language, because the language is concerned iith data description, data flow, and information relationships. Each statement is used to describe a unit of data, a computation or decision rule, and is independent cf other statements. The user is able to concentrate cr cne unit of information or one computational requirement at a time and cccmpse statements as they are conceived withcut regard to their Iccation in the rest of the specification. The Processor puts the statements into the proper sequence later. Ihe MODEL language is described in more detail in section 3 of this paper with an 6

I jI 1 I (d) Input Data a PL/1 Program (f) (b) PL/1 Compiler (a) L-User's MODEL | Stmts. II Target Application Program _~IIC~IIR oft J 0 MODEL Processor (e). _M__ WNRNNWRV._ Program (c Flowchart & I ~c Spec. St; Listing & Complete s Repo 66O Output Data rF _ s ---.. Figure 1 Overview of the MODEL System

illustrative example The MOCEI Processor [b] (hereafter referred to as the Processor) analyzes the set of statements obtained from the user after stcring them in an irternal associative form. It analyzes the user statements for their consistency, completeness, and checks for ambiguities, informing the user of these in a report [c]. Thus, a complete and consistent specificaticn of a program would be achieved through several interactions with the Erocessor. If analysis of the MODEL statements results in a complete and consistent program specificaticn, the Erocessor designs the program and generates a ccmplete application program in the PL/1 language [d]. It also generates various reports to the user about the specification and the generated program [e]. The generated program can then be submitted to a compiler [f], and subsequent executicn of the target program proceeds [g]. A more detailed description cf the MODEL Processor and its methcdclcgy is outlined in section 4 of this paper. A preliminary prottype processor embodying the system concepts has been developed.* It was designed and implemented in PI/1 on an IEM 370/168. * The versicn of MCDEI reported in this paper is the original one developed by the author in 1974-1975. Revisions and improvements to the language and system are under way in various directicns both at the Univ. of Michigan and Univ- of Pennsylvania. These are mentioned at the end of the paper under "Further Research". 8

3. The Prqoram Specificaticn lanquae — MODEL This section discusses the MODEL language in greater detail. After a general description of the language characteristics, an illustrative example is presented. That is followed by a discussion of the class of problems currently describable in EODEL and the limitations of the current language and system. This section then ends with a brief comparison with other related languages and systems. MODEL langua features The "Module Description Language" (MODEL) is intended to he used by a system designer to state the requirements of a desired application program ncn-prccedurally- As stated above, it inherertly differs frcm programming languages in several respects. It is a ncn-procedural language, which has been loosely defined as a language that expresses "what" rather than "hot" [e.g., 19]. hCDEL is characterized as nonprocedural for the following additional, explicit reasons. First, the statements are all descriptive or declarative, not imperative. Each statement given by the user stands independently and is used to describe a unit of data, a relationship, a desired subset of records, a computation, or a decision rule. Mere important is the lack of necessity on the part of the user tc ccEpcse statements in any particular order. The crder or sequence in which statements are given by the user is immaterial, since all sequencing of tasks or 9

events to be performed are deduced by the Processor. Therefore, there are DC control structures in the language. This ability by the user to submit statements in any sequence reduces the expertise required on the part of a user. It enables him or her to concentrate on and submit one unit of infcrmation or ccmputational requirement at a time. Unlike conventicnal language compilers, the MODEL system allows the system designer to add statements to the specification ithout regard to their location in the specification. It allows statements to be added in incremental stages and provides feedback on the completeness and consistency of the specification between iterations. (Examples of such feedback are given later.) In short, there is no computer programming knowledge necessary in order to compose MODiI statements in several respects: there is no procedural or seguential thinking reguired, there is no reference tc computer prcgramming terminology, and there is an absence cf concern with processing details such as data access, program "housekeeping' tasks, control flow, etc. MODEI languaqe statements and an example A descripticn and examples of the statements in MODEL follow. One limitation to the use of MODEL in its present form is the fact that it is a formal language and has a somewhat rigid syntax. Although it is a specification language and not a programming language, it nevertheless 10

would require some training on the part of a systems analyst. A specification in the MCDEI language consists of several parts: a header which has a list of source and target data files of the module, a data description section, record selection rules, and assertions that state computational or decision reguirements. In order to describe and exemplify the nature of the language, Figure 2 presents a system flowchart of a sales problem. There are three source files: the transaction from the point-of-sale terminal, an inventory stock status file, and a customer file cf charge account information. The output files are the sales slip to be produced at the terminal, and the updated inventory and customer records. The program to be specified is to process the transactions producing the sales slip, and to update the inventory and customer records to reflect the sale. For tutorial purposes, Figure 3 shows the specification of a small subset of this problem in the MODEI language. In the example given, only the transaction, inventory, and sales slip files are shown and the files have cnly a small subset of the fields that would actually be carried in such files. A sLbset was chosen here for the purposes of a sufficient example, and it will be used to describe the Prccessor methodology in Section 4. Furthermore, the specification depicted shows only two tasks to be carried cut: to compute the charge and to update the 11

~1 1I Figure a '., Illustration of Department Store Sale - ------ I..,.. i

/* */ /* "EINSALE" ERCGRAM SPECIFICATION /* */ /*********** ************* *** ***************************/ 1 MODULE: ZINSALE; 2 SOURCE FILES: SAIETBEA, IEVEN; 3 TARGET FILES: SALESIIP, IIVEN; /1************1 **** **********************************S(i(*/ /* */ /* EIIE DESCBIFPICNS * /* */ /** ** **** ****** ******** ** ******** ********************/ 4 SALETRAN IS FILE(RECCRD IS SALEfEC); 5 SALEREC IS RECORD (CUST#,STOCK#,QUANTITY); 6 CUS1# IS FIEID (CHAE(5)); 7 STOCK# IS FIELD(CHAIi(7)); 8 QUANTITI IS EIEIEDEUMEEIC(3)); 9 INVEN IS fIIE (BECORE IS INVREC, STCRAGE IS INVDISK, KEY IS STOCK#); 10 INVEEC IS RECORE(STCCKI,SA1EPEICET,CH); 11 STOCK# IS FIELD(CHA (7)); 12 SALEPRICE IS FIEID (lUMERIC(5)); 13 COH IS FIELD (NUMESIC(5)); 14 INVDISK IS DISK (ORGANIZATION=ISAM,UNIT=3330); 15 SALESLIP IS REPCRT(BEPORI_ENTFY IS SLIPREC); 16 SLIEREC IS REEORT ENTFY (CUSI#,STOCK#,CHARGE); 17 CUSI# IS FIEID (CHAR (5)); 18 STOCK# IS FIELD (CfAE(7)); 19 CHARGE IS FIETE(KUCEBIC(8)); /*************************************3********************* /* */ /* ItTER-FECCEE FELUIIONSHIPS */ /* */ /* ******************************************************/ 20 SALETRAN.S1OCK# IDETIIEIES INVEEC; /y*********** x ******* ***^* ********************** *******/ /* */ /~* ASSEBIICNS * /* */,****8**+**** 4***********i*4****************************** 21 CALCCHEG: CHARGE=QUANT'ITY*I YVEK. SALEEBICE; 22 UPDQUANI: NEW.INVEN-QOH=OLD.INVEN. CH - QUANTITY; FiqurE 3: SamlE Seet cf MCDEL Statements 13

guantity on hand in the invertcry record. In this sample, only one item purchased is assumed, but the specification could be expanded to handle multiple items. Looking nc~ at the sample specification, the header is a formalizaticn cf the system flowchart, showing the name of the desired program and the input and output files. In the data descriticn porticn of the specification, a networklike model is used. Each record type is described with its component groups and fields. Such data descriptions are not unlike a COOEL or PL/1 hierarchical structure. The individual fields are given their specific attributes, such as field type and length. Furthermore, various statements describing inter-record relationships are provided to handle a network structure- In this example, the STOCK# field in the transaction file corresponds to an inventory record in the inventory file. All the above such data descriptions could be stored in a library for use by other program specifications using those files. In order to describe the specific tasks of the program, selection rules can be provided. These are logical expressions for subsets of data to he processed. In this example, none are stated and therefore all records are processed as a default. Examples of statements for record selection rules are given after discussion of this sample problem. 14

The ccfutational requirement rules, called "asserticns" in MCDEL, are specified tc provide the information relationships, data computations, and decision rules. It is important to note that these could appear in any order and dc not in themselves denote anything about seguence of events. An assertion of the form "CHARGE=QUANTITY*PRICE" is to be regarded as an algebraic eguation, and not as an assignment statement as in a programming language- In fact, the entire set of statements in the specification could Le shuffled with no effect on the meaning.* Statements in conventional programming languages such as 'total=tctal+amount" would of course not be in the MODEL language for they are procedural and could have no non-procedural interpretaticn. Such computations as totals would be perfcrmed in this language by high-level functions. Therefore, in order to augment the basic arithmetic and logical operators, a library cf common high-level functions, such as for totaling or counting, has been built. For example, a statement such as the following could be specified: * Ihe one exception tc the principle that statements can appear in any order is the following: Since the same field name may appear in more than one file, an assumption is made by the Processor that a field described in a FIELD statement is associated with the file mcst recently described. 15

ICTCHRG=ICTAL(CHAGE); where TOTCHBG cculd be ir an output report or a field that could be used in other statements. In the context cf an expanded version of this example, the analyst cculd specify rules fcr decisions, accounting, and other rules that would indicate dispositions in certain eventualities, such as wien a stock item is not available for inventory or when a purchaser exceeds the allowed credit limit. The acccunting rules could specify the determination of charges for purchases and the method of determination of the customer's balanceIn the sample of Figure 3, two computational statements are given. The first is a straightforward computation for the charge of an item. In the second statement, the words OLD and NEW before a name are used to distinguish between corresponding field nanes in a file that is both input and output. Here the guantity-cr-hand field in the inventory record is updated. The cther fields in the record are assumed to be urchanged. After submitting a set of statements such as those in Figure 3 to the Processcr, the user of MCDEL gets feedback as to the correctness, consistency, and completeness of a set of statements beyond the feedback of a conventional compiler. The user is infcrmed in the form of listings and reports as to inadeguate descriptions, contradictory 16

statements, and assumptions made by the Processor. Examples of logical inconsistencies and incompleteness detectable by the Processor are circular computational definitions, undefined output fields, two contradictory computations for the same field, etc. Section 4 of this paper outlines in more detail hc~ the system processes a set of MODEL statements, including the feedback the user gets. A complete and consistent MODEL specification would be achieved through several interactions vith the Prccessor There exemplified to define conditional operations examples of are other features cf the MODEL language not in the subset of Figure 3 such as the capability subsets of files, the capability to define assertions, and the capability to define on repeating groups and fields. Some brief such statements follow, In order to provide a way of specifying logical expressions tc determine which subset of a file is to be considered fcr a desired application program, a subset selection statement is Frovided. For example, to restrict the set of transactions to be processed to only those records that have a guantity-crdered of more than 100, the user could have the following statement: 17

SUBSIE.SALETRAN: CUANII1Y > 100; In ether wcrds, any record in the SALETRAN file not meeting this criterion is excluded. MODEL lets the user state computational rules that hold cnly under certair ccnditicns, in two parts. In the first part, the user can designate a name for the condition, and give the correscpnding Bcclean expression. For example, to designate a condition called SALE as occuring when the customer's cld balance (in the CUSTomer file) + total charge is not greater than his credit limit, the user could write: CONDITION: SALE: OID.CUST.EALANCE + TOTCHRG <= CLC.COST.CREDLIM; Then anywhere else in the specification the user can define one or more computations that are contingent upon a condition defined elsewhere- Fcr example, to update a customer balance only when the SALE condition has been met, the user can write IF SALE THEN NEW.CUSI.BALANCE = OLD.CUST.BALANCE + TOTCERG; Keep in mind that these could be submitted in any order. The reason for splitting up conditional relationships in this way is that ether computations could reference the same conditicn and could be added later in any location. Thus, the user can insert ccmrutations that are to take place under certain eventualities as they occur to him or her. It 18

will be the Erocesscr's task tc group the actions for a condition tcgether at the a[Ercrriate place. On the other hanud, if a user finds the traditional IF...THEN...ELSE statement (such as in PL/1) more convenient, the Processor would accept it as wellA more detailed discussion of these language features is actually beycnd the intended scope of this paper. The reader is referred tc [1] for a more complete description and examples cf the language. Class of Drckleus and some current limitations Frcm the previous sections, it can be inferred that the MODEI language can be used tc describe desired application programs in many areas of data processing. MODEL is particularly suited tc the automatic generation of traditional transaction processing systems where transactions are processed, master records are updated, added, or deleted, and reports are produced. The features of ECDEI as outlined above enable it to describe aluost any aspect cf application programs. However, the current version of MCDEI has several limitations. For one thing, as Mentioned, the somewhat rigid syntax of MODEL in its current form limits the range of users enabled to use it, but a more user-friendly front-end interface to it could alleviate this Ercblem. 19

The currert versicn does not provide for report formatting facilities. Ercgrams generated by the Processor treat output reorcts as cutput files. This is not considered a conceptual drawback, since report generation facilities are known technology and could be incorporated in future versions. Although nct a prcblem in practice, another limitation is in the use of computational statements. The lack of capability to specify sequence provides a great degree of ease of use and does not in itself pose any limitations. However, computational statements are limited to single equations kith single unknowns on the left-hand side; that is, statements of the fcrm Y=arithmetic-expression or Y=F(X1,...,Xn). Thus it is rot possible to generate new algorithms with the system, ncr is that its intent. It is expected that the vast majority of data processing programs could be handled with a dozen or so high-level functions (such as totalling, counting, maximum, minimum, etc.) in a library. Another assumption made by the current version is that the physical organization cf the files used by the target application program is pre-desigred by a system designer before MCDEI is used as a tool for automatic program generation. The current version uses the operating system sequential ard indexed sequential access methods in its implementaticn cf input and output statements for the 20

generated application prcgram. However, one of the extensions cf this language under way is directed towards generation of applicaticn programs that use generalized data base management systems. Finally, the Processor currently generates one program per specification. It is envisioned that in the future a further optimization phase would modularize the generated program based on efficiency and other considerations. Comrariscn with cther larguaqes and systers Comparing MCDEL tc ESI 9 ], PSL does not have sufficient facilities in it in order to generate an executable program, nor has that been its focus. By the same token, MODEL is a much smaller and simpler language because it has only those facilities necessary in order to specify and generate a program, and is not concerned with the target environment as a khole as is PSL. Other "autcmatic prcgrarming" projects have had similar goals as MODEL, but preliminary indications (e.g., [21]) are that they require more procedural knowledge and are not as user oriented. 21

There are many other classes of non-procedural languages hut they are for use in areas other than generation of application programs (such as data translation). Some query languages show some similarity to MODEL, but they are generally concerned with one-time ad-hoc retrieval rather than with generation of transaction processing application systems for repetitive use. Comparisons could also te made between MODEL and other program geereatcrs, but they deal with other problems. For example, [22] deals with automatic generation of DBTG information retrieval programs from specifications of "relations" and not kith generation cf transaction processing systens. 4. 'he Autocatic Proram Generation Methcdolos The Processor has the task of accepting such specifications and generating a PL/1 program complete with all the necessary declarations, control structures, sequenced statements, input/output statements, housekeeping tasks, etc. 22

The system goes through varicus phases, labelled PHASES I-V in Figure 4, in order to generate the program from the specification. Although the detailed algorithms are beyond the scope of this paper, the general methodology and the five phases are outlined below, and the reader interested in greater detail ray refer tc [1]. Phase LI): SFytax analysis cf the KCDEL module specification Syntax analysis technology is well known and follows that of a corpiler. Ir this phase, the provided MODEL specificaticn is analy2ed to find syntactic and some semantic errcrs. This phase of the Processor is itself generated automatically by a zeta-processor called a Syntax Analysis Program Generator (SAPG), whose input consists of syntax rules provided through a formal description of the MODEL language [1]. Thus, changes to the syntax of MODEL during development and in the future can be made easily. A further task of tkis phase is to store the statements in a simulated associative memory for ease in later search, analysis, and processing [1]. Sore needed corrections and warnings of possible errors are also produced in a report for the user. 23

I. II. III. IV. V. sourc files I Figure 4: Phases of the MODEL Processor

Phase JIIfl A.alyss of ECDEL specification In this phase, precedence relationships are determined froc analysis of the MCDEL data descriptions and assertions, and the specification is analyzed to determine the consistency and completeness of the statements. Each MODEL statement may be considered to be an independent, standalone statement. The order of the user's statements is of no consequence. Hcwever, in analysis of the statements, precedence relationships are determined on the basis of description cczponenrts, Precedence rules, based on programming kncvledge, have been built into the Processor. Two frequently used precedence rules are hierarchical precedence rules (e.g., having tc read a record before its component fields can be used) and value dependency precedence rules (e.g., having to compute A=B+C before D=A*C). These relationships are used to form a precedence graph on which the completeness, consistency, and ambiguity of the specification can be checked. Reports are produced for the user indicating the data, assertions, or decisions that have been inadeguately described, assumptions that have been made by the Processor, or contradictions that have been found, and reports are provided to cross-check the descriptions. Fcr example, if an output field has not been giver a value By a user rule, the Processor has a series of rules it uses in order tc determine its value, such as using the value of same-named field in an input file. If a value 25

cannct be assumed by the Erocessor, an incompleteness message is printed tc the user and he is prompted to complete the missirg infcrmaticn. Examples of inconsistencies that are detected by the system are circular definitions (e.g., A is a function cf B, B is a function of C, and C is a function of A) or contradictory computational rules for the same field under conditions which are not mutually exclusive. Thus, it is envisioned that a complete and consistent specification of a program would be achieved through several iterations and through interaction with the Processor. An example cf a directed graph that represents the prcblem expressed in zCDEI in the previous section is given in Figure 5. aote that the nodes represent the records, fields, computations, etc., as described by the user. They could have been entered ircrementally and in any sequence. The system "draws the arcs" based on its internal precedence rules. Many representations of such a directed graph appear in the literature. Figure 6 shows an adjacency matrix which MODEL uses for the rrctlem described above. The various entries in the matrix show the relationships existing between the various objects. This matrix representation enables easy checking cf ccnsistency and completeness criteria described above. For example, a cycle enumeration algorithm on the matrix can be used to detect circular computational definitions. 26

i I t I 1 6 if r i |: i~ ~ * ft * * * f * * t 16 (2oo t ) —t C = —o l \r2 OIN:E tOLD DISo ). SALEFEC ) OLD INVRC ) P '2 1 93 / 11 \11 I 1 t \7 > / OLD / OLD \ T OLD \, j: sTocKs qTy, sJToC) T SAL S T,C.S | 'f' */ ^^'' I ' *', CALCCUIRG UPDQUAN * i2 STOC- -\ NS. "G\;W-., i / ) [ ) STOCKX ) ( SALs Q OH ) RIC' /..I 25 /. l SLIPPEC E* — J — Hierarchical Input I INVREC j 1 - Herarchical Output j 17 E- xplicit Dependence......... Implicit Dependence SALE — Storage NEW SLIP __ Pointing INVEN i Figure -" Digraph for MODEL Specification of Figure. i

0 AOJACEIJCY:MATRIX OF JANE RELATIONSHIPS...-. t-1. 5 tO 15 20 I., 2 3 4 o 5 6 7 0 8 9 10 0 1 1 12 13 @ 14 15 16 0 17 16 19 ~ 20 21 22 a 23 24 25 0 26 27 CALCCIIRC INV()ISK M4ItlcALC Nrt. w. ItNlEJ n1cw. IIIJN.00II NlCW. IIlVLfJ.SALPRICE tJl 4. IIlVLI. STucK. Ntw. IIIvl EC OLU. IN jvENl. 0 P1-. I IVl.o 00l OLL. IlliEll.SALPRICE OLU. IjNvEN.STuCK4 OLO.) I It/itCC PolrlTJ'l. OL. INVREC SALEr)nccK SALCt-c SALESL I' SA.LE~SL IT' ~,CHAGE SALSL IP. CU ST SAiL. SL IP.STOCK# SALE rRAN SALE TRAN.CUST SALLTAI(NIJUANI ITY SALE TRAN-oSTOCKt SL IPRC TRIPNV UPDOUAN. O n On O n o n n n o n n n 3 0 0 O o o )( 'On 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 000 ) (" u n 0 0 n u 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 a o0 0 0 o 0 0 0 o 0 n 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0oo00ono02000oo00 o oo0 ooo000oo oo000 02ooo0000000 0000oooo 0 0 000 00 020 0 1 00000 0 0000000n00o 002000000000000000000000000000 000000000000000000000000003 n a 0oo oo 0 0 0 0 5 0 a a 0 0 0 a 0 0 o 0 00 00 00 0 0 0 0 0 0 0 0 0 00 0 0 00 0 0 000000 000 0000000o000000 o00 0 0 0 0 0 0 0 000 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 n Q 3 000000000 0 00000 000 00 f 0 3 000000000000000000000000000 0 0 0 0 0 0 u u 0 0 0 0 0 0 0 0 0 0 0 4 00 0 0 0 3 0 000000000000000000000000000q 000000000000000 0000000000 000 0000 0000 n 0000 000 000 0 o0 n n 0000 000000 0000 n 0 n 000000 3 00000 00000000 0000 0.,. o A NON-ZERO ENTRY IN ROW I g COLUMN J II:.DCATES TIAT ITEM I PfECEDES ITCH J. TIIE CODE REPRESENTS THE FOLU 1 = HIERARCHICAL(SOURCE)I 2 = HIERARCHICAL(TARGET)I 3 = EXPLICIT DEPENDENCYt 4 - IMPLICIT DEPENOENCY: Q 5: POINJTINS RELATIONSIIIPI 6 = STORAGE RELATIONSHIPP 7 = CONDITIONAL OEPENDENCY Figure 6 Weighted jacency Matrix for Sa MODEL Sp. Figure *, Weighted Ajmcency Hatrix for Sampte HODSL Specification ~ ~. I I I I — I......~-... k-,, -~,!, ",'k 1 ~ "l ',, 3.4-, W 9 -d~ -, - - ~rr,~ ~ '~'..-4... ~

Phase JII-: Automatic program desiqn and generation of sequence and control lcqiic This phase cf the Processor determines the sequence of execution of all events implied by the specification, using precedence graph theory algcrithms, and thereby determines the sequence and control logic of the desired module. The result of a topological sorting of the graph presented above is a table such as the one shown in Figure 7. This table forms the skeleton of a flowchart for the target program, as it shows the order of the nodes in their executable sequence. It also shcws the ranks of these events, which as a by-product denotes some of the possible parallelism. Design of the object prcgram proceeds with scope and iteration analysis and flow optimization. Iterations are based on the existence of repeating groups or fields in the data description, or are designed for processing a set of records sequentially. the result of this phase is a set of data structures representing the desired sequence of processes and flow of events, sequenced, ranked, and optimized in their order of execution. 29

/ SEQUENCE OF PROCESSING ORDER VECT. ORDER NAME RANK INDEX VECTOR ' 1 3 MINSALE 0 2 9 OLD. INEN 0 3 21 SALETRAN 0! 4 15 SALEDECK 1 5 16 SALEREC 1 I 6 22 SALETRAN CUST 2 i 7 23 SALETRAN.QUANTITY 2 8 24 SALETRAN.STCCKi 2 9 19 SALESLIP.CUST# 3 10 20 SALESLIP.STOCK# 3 11 26 TRINV 3 12 14 POINTER.OLD. INVREC 4 13 13 OLD.INVREC 5 e ~ 14!0 OLD.INVEN. OOH 6 15 11 OLD.INVEN.. SALPRICE 6; 16 12 OLD. IVEN.STOCK, 6 17 1 CALCChRG 7 >: ~ 18 6 NEW.INVEN.SALPRICE 7 19 7 NEW.INVEN.STOCCX 7 " 20 27 UPDQUAN 7 ' 21 5 NEW.INVEN.QOI 8 - 22 18 SALESLIP.CIbARGE 8 23 8 NEW.INVREC 9 24 25 SLIPREC 9 25 4 NEUl.INVEN 10 26 17 SALESLIP 10 27 2 INVDISK 11 i,.- 11, f _ i Figure Digraph Nodes Sequenced in Precedence Order

Plase _IJV) Code qeneraticn At this point in the process it is necessary to generate, tailcr, and insert the ccde into the entries of the flowchart tc prcduce the program. Code is produced in two steps for purposes of modularity and independence of the target language. The first step produces a languageindependent version cf the flowchart-entries, as noted above, while this second step produces code in the PL/1 programming language. In particular, input/output statements are generated whenever the flowchart indicates the need for records. Calls tc procedures embodying the assertions are generated in the appropriate places in the flowchart. Wherever prcgrai iterations and cther ccntrol structures are necessary, prcgram code for them is generated, such as for repeating groups or fields. Declarations for object program data structures and variables are generated. The product of this phase is a complete program in a high level language, PL/1, ready for compilation and execution. A listing of the generated program as sell as the flowchart-like report is produced. This documentation is not expected to be of significance to the casual user, but it would be available for a ccputer programmer in the event that it may be needed for deeper understanding cf the prccedure.

Phase VL.i Prcqjra compilaticn and execution This phase starts with the PL/1 program module that has been autcoatically produced. with the generated PL/1 program being submitted tc the El/1 cptimizing compiler, program compilation and optimization of code on the machine-language level is effected. The automatically generated program is then available for use in execution. 5. Ccncludinq Remarks and further Research This paper has outlined and exemplified the basic features of a non-procedural specification language for describing modules of an information system and a processor for generating programs automatically from specifications expressed in that language. The system described here has demonstrated the feasibility cf such an approach. It is expected that such a language and system could be an important step towards the automation of the software develcpment process if given strong industrial level support, reliability, and funding. Some conclusions are already evident. Such a system reduces the amount of expertise and time needed to generate today's typical data processing program. Since MODEL is non-procedural, it enables the user to concentrate on describing data and their inter-relationships by providing a set of statements that can appear in arbitrary order. The Processor, unlike conventional compilers, is able to deduce the sequence of 32

events and checks much of the user's logic and provides effective feedback. In addition, it produces the desired program complete with sequence and control logic, declarations, input/output, relieving the user of such procedural thinkirg. From preliminary indications, the execution time efficiency cf a program generated by the Processor is good. Unlike "generalized packages," the Processor generates an ad hoc program fcr a particular problem. Therefore, the code generated is peculiar tc a given use and contains no unnecessary instructions. Unlike some "pre-ccmpilers," the system reguires no prccedural knowledge to use nor is it limited to particular application areasFurther research is taking several directions at various institutions including the University of Michigan and the University of Eennsylvania. Refinements are being made in the language and in the degree of logical analysis that the system can perform cn a specification. Some work is being conducted on automatic data aggregation and modularizaticn of generated programs based on efficiency and other ccnsiderations. At the University of Michigan, there is also an effort to apply some of this technology to the autcoratic design, generation, and restructuring of programs that use network data base management systems. The data manipulation languace UDMI) cf many current network model 33

data base management systems is highly procedural and technical, and requires "data navigation" and other procedural knowledge by an experienced programmer. Another language based on the prirciples outlined in this paper is now being developed for use in defining non-prodcedurally application programs fcr retwcrk data base management systems. An interface between PSL [3,9] and MODEL is also under investigation, sc that PSL users could use the code generation capabilities cf ECDEL. 34

Eiblicgraphy L1] Rin, N. Adam "Autciatic Generation of Business Data Processing Programs from a Non-Procedural Language," Ph.D. Dissertation, Computer and Information Science Department, University of Pennsylvania, 1976. [2] Prywes, N. S. "Autcmatic Generation of Software Systems -- a Survey," Data Base, Vol. 6, No. 2, Fall 1Sc4. [3] Teichrcew, D. and Sayani, H. "Automation of Systems Building," Eatamationl Aug. 1971. [4] Teichrcew, D. And Merten, A. "The Impact of Problem Statement Languages on Evaluating and Improving Software Development Performance," Fall Joint Comuter Conference, 1972. [5] Hax, A. [6] Balzer, C. and Martin, W. A. "Automatic Generation of Customized Mcdel Based Information Systems for Cperaticns Management," Proceedings on the Whartcn Ccnference on Research on Computers in Crgani2aticnsg Philadelphia, Oct. 1975, edited by H. I. Mcrgan, p-. 117-121. R. M., "A Global View of Automatic Programming," also Mecmrandum cn "Autcmatic Programming," USC Informaticn Sciences Institute, Marina del Bey, California, Sept. 1972. [7] leichrcew, D. "A Survey of Languages for Stating Requirements fcr Computer- Based Information Systems," Fall Jcint Computer Conference. 1972. [8] Young, J. W and Pent, H. K "Abstract Formulation of Data Processing Problems," National Cash Register Co., also given at 13th ACM National Meeting, June 1S9E. 35

[9] Hershey, Rataj, leichrcew, and Berg PSL Manual, ISDOS Working Paper #68, University of Michigan, Oct. 1973 -[10] Severance, D. G. "Some Generalized Modeling Structures for Use in Design of File Organizations," Ph.D. Dissertation, University of Michigan, 1972. [11] Cardenas, A..-, "Evaluation and Selection of File Organization -- a Model and System," Communicaticns cf the ACM, Sept. 1973, pp. 540 -548. [12] Nunamaker, J. F- Jr. "On the Design and Optimization of Information Processing Systems," Ph.D Dissertation, Case Western Reserve University, June 196SE [13 --- ----- "A Methodology for the Design and Optimization of Information Processing Systems," AFIES Proceedingqs 1971, SJCC, pp, 283-294. [14] ---- ---- "Processing Systems Optimization Through Autcmatic Design and Reorganization of Program Modules," CSD TE77, Purdue University, Dec. 1972. [15] ---------- SODA, ISDOS Working Paper #36, University of iichigan, Feb. 1971. [16] Graham, Clang, and DeVeney "A Software Design and Evaluation System," Communications of the ACM, Feb. 193. [17] ADS, National Cash Register Cc., 1968 -[18] TAG Sales and Systems Guide, IBM, GY20-0358-1, 1968. 36

[19] Leavenworth, Eurt M. and Sammet, Jean E. "An Overview cf Kon-prccedural Languages," SIGPLAN Notices, Proceedings of a Sympcsium on Very High Level Languages, Santa Monica, California, March 28, 1974. [20] Beck, leland "An Approach to the Creation of Structured Data Processing Systems," SIGMOD Proceedings, 1976. [21 ] Ruth, G. B. "Status of Erotosystem I," Automatic Programring Grcup, Massachusetts Institute of Technolcgy, 1976. [22] Gerritsen, R. "Understanding Data Structures," PhD. Dissertaticn, Computer Science Department, Carnegie-Mellcn University, 1975. 37