File:        readme
Author:      Edward A. Green
Description: readme file of Compiler Design (66.648): Project 2

FILES

    readme       this file
    block.c      C code for handling program blocks
    code.c       C code for handling quads
    labels.c     C code for handling program labels, jmps, calls, etc.
    leads.c      C code for calculating leading lines of basic blocks
    next_use.c   C code for calculating next-uses
    optimize.c   C code for doing constant propagation and removing
                 useless variables
    symbols.c    C code for storing the symbol table
    util.c       general C routines
    proj2.c      main block for project 2
    proj2.h      include file for project
    readme1      the readme file for project 1
    proj1.yac    revised project 1 syntactic analyzer
    proj1.lex    revised project 1 lexical analyzer
    makefile     make file for generating two executable files,
                 phase1 & phase2
    example.p    example file for testing the programs

OPERATION

The system can be built and run successfully on a Sun system with Lex,
Yacc, cc, and gcc (Turing, for example). Running the make file generates
two executable files, phase1 and phase2. Phase1 reads a mini-Pascal
program (Aho, et al., pp. 746-748) from standard input and generates a
quad program with a symbol table. Phase2 generates the optimization
output required by project 2. To run these programs on an input file
"myfile.p", enter:

    % phase1 < myfile.p > myfile.q
    % phase2 myfile.q              // the output goes to stdout

CHANGES TO PHASE 1

The output of phase 1 has been modified: the symbol table is now encoded
as a prefix to the quads to which it applies.
The following is a sample of a symbol table (with comments):

    *B fill                      // Start of a new block named "fill"
    *V fill j integer 52         // an integer defined in "fill" named "j"
    *V fill a real [1..5] 28     // a real array defined in "fill" named "a"
    *R fill fill 2               // a procedure named "fill" with 2 arguments
    *P1 real [1..5] 28           // the first parameter of fill is a real array
    *P2 integer 48               // the second parameter of fill is an integer
    *V example a real [1..5] 8   // (note that program offsets follow variable
    *V example j integer 4       //  definitions. offset of 1st "j" is 52)
     .
     .
     .
    *F sum sum integer 1         // A function named "sum" returns an integer
    *P1 integer 56               // the only parameter of "sum" is an integer,
                                 //  offset 56
     .
     .
     .

The quads generated here are identical to those generated by the first
project (see readme1).

INTRODUCTION TO PHASE 2

This project takes the output file from phase1 and performs two
optimizations on the quads. The two optimizations are CONSTANT
PROPAGATION and USELESS VARIABLE REMOVAL. The output does not follow the
conventions adopted for the quad code above, but has the form required
by the project. The output has the following sequence:

    1) The basic block connection table.

    2) A dump of the input code showing a more readable form of the
       symbol table, and each statement of the program with a comment
       showing the statement number and the block it belongs to.

    3) The calculated next use table. This shows the variables
       (including temps) that are modified and used by each statement.
       For the variables modified, the number of the statement in which
       the variable is next used appears in parentheses. If this number
       is -1, the variable is not used again inside the basic block.

    4) A dump of the output code after the optimizations are done. This
       dump has the same format as the first dump. Some statements will
       have been changed, and some will have been removed.
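As an illustration of item 3, the next-use number of a modified variable
can be found by scanning forward from the statement to the end of its
basic block. The sketch below is not the code in next_use.c; the Quad
layout and the names uses() and compute_next_use() are hypothetical, and
statement numbers are taken to be indices within the block.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical quad layout; the project's real structures are in proj2.h. */
typedef struct {
    const char *op;      /* operation, e.g. "ADD" or "MOV"        */
    const char *arg1;    /* first operand, or NULL if unused      */
    const char *arg2;    /* second operand, or NULL if unused     */
    const char *result;  /* variable modified, or NULL if none    */
} Quad;

/* Nonzero if statement q reads the variable called name. */
static int uses(const Quad *q, const char *name)
{
    return (q->arg1 && strcmp(q->arg1, name) == 0) ||
           (q->arg2 && strcmp(q->arg2, name) == 0);
}

/*
 * For each statement i of block[0..n-1], set next_use[i] to the index of
 * the next statement in the block that reads the variable modified by i.
 * The value is -1 if statement i modifies nothing, or if the variable is
 * redefined or the block ends before the variable is read again.
 */
void compute_next_use(const Quad *block, int n, int *next_use)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        next_use[i] = -1;
        if (block[i].result == NULL)
            continue;
        for (int j = i + 1; j < n; j++) {
            /* Check reads first: "ADD T1 1 T1" reads T1 before redefining it. */
            if (uses(&block[j], block[i].result)) {
                next_use[i] = j;
                break;
            }
            if (block[j].result &&
                strcmp(block[j].result, block[i].result) == 0)
                break;   /* redefined before any use */
        }
    }
}
```

This forward scan is quadratic in the block length; a single backward
pass over the block gives the same table in linear time and may be
preferable for long blocks.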
OPTIMIZATIONS

The two optimizations performed are constant propagation and useless
variable removal, in that order. It was found that propagating constants
sometimes made some variables useless (see the last example, below).

The optimizations were done within the range of a basic block. The text
indicated that "if no live-variable analysis has been done, we can
assume all non-temporary variables are live on exit" of a basic block.
Therefore, the only useless variables to be eliminated were those
temporaries isolated to one basic block. Since some temps in this quad
scheme span more than one block, an initial search for temps isolated to
one block is done. Only these variables are considered for elimination.

There are two ways a variable can be eliminated. First, the following
transformation is made:

    ADD x y T1
    MOV T1 z

which is code for z=x+y. This is changed to:

    ADD x y z

The second elimination is done when a statement modifies a local
temporary that has no next use in the block. The statement is
eliminated. This type of elimination crops up when the constant
propagation precedes the useless variable removal. For example, the
input statements might be:

    ADD 2 3 T1
    MOV T1 z

which is code for z=2+3. When the constant propagation is done, we have:

    ADD 2 3 T1
    MOV 5 z

Now the variable T1 is useless, but we don't have the same situation as
before, where T1 was moved to z and we could delete the variable based
on the match of T1 in both statements. However, since T1 has no next
use, we can remove the ADD statement:

    MOV 5 z

Another situation might be:

    ADD 2 3 T1
    JGZ T1 L1

If useless temporary removal preceded constant propagation, the T1's
could not be matched (the second statement is not a MOV). Then constant
propagation would give:

    ADD 2 3 T1
    JGZ 5 L1

and the variable T1 would not be removed even though it has no next use.
Thus, we do the constant propagation before we do the useless variable
removal, which yields:

    JGZ 5 L1

Note that this substitution would make dead code removal easy (this
would also have been done in this project had there been enough time).

An additional transformation which was done was to remove NOPs. If a
labeled NOP is followed by an unlabeled statement, then the label is
moved to the next statement, and the NOP is removed.
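
The ordering argument above can be tried out with a small sketch of the
two passes on one basic block. This is not the code in optimize.c. The
Quad layout and all function names are hypothetical, only ADD of two
integer constants is folded, every name of the form T<digit> is assumed
to be a temporary already known to be local to the block, and every
statement that writes a temporary is assumed to have no side effects.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NAME_LEN 16

/* Hypothetical quad: "" marks an unused field. */
typedef struct {
    char op[NAME_LEN];
    char a[NAME_LEN];   /* first operand                 */
    char b[NAME_LEN];   /* second operand or jump label  */
    char r[NAME_LEN];   /* variable modified             */
} Quad;

static int is_const(const char *s)
{
    if (*s == '-') s++;
    if (*s == '\0') return 0;
    for (; *s; s++)
        if (!isdigit((unsigned char)*s)) return 0;
    return 1;
}

/* Assumption: temporaries are named T followed by a digit. */
static int is_temp(const char *s)
{
    return s[0] == 'T' && isdigit((unsigned char)s[1]);
}

/* Index of the next statement after i that reads name, or -1. */
static int next_use(const Quad *q, int n, int i, const char *name)
{
    for (int j = i + 1; j < n; j++) {
        if (strcmp(q[j].a, name) == 0 || strcmp(q[j].b, name) == 0) return j;
        if (strcmp(q[j].r, name) == 0) break;   /* redefined first */
    }
    return -1;
}

/* Remove statement i and return the new statement count. */
static int delete_quad(Quad *q, int n, int i)
{
    assert(i >= 0 && i < n);
    memmove(&q[i], &q[i + 1], (size_t)(n - i - 1) * sizeof(Quad));
    return n - 1;
}

/* Replace later reads of a folded "ADD c1 c2 X" by the constant c1+c2. */
void propagate_constants(Quad *q, int n)
{
    for (int i = 0; i < n; i++) {
        if (strcmp(q[i].op, "ADD") != 0 || q[i].r[0] == '\0' ||
            !is_const(q[i].a) || !is_const(q[i].b))
            continue;
        char value[NAME_LEN];
        snprintf(value, sizeof value, "%d", atoi(q[i].a) + atoi(q[i].b));
        for (int j = i + 1; j < n; j++) {
            if (strcmp(q[j].a, q[i].r) == 0) strcpy(q[j].a, value);
            if (strcmp(q[j].b, q[i].r) == 0) strcpy(q[j].b, value);
            if (strcmp(q[j].r, q[i].r) == 0) break;   /* X redefined */
        }
    }
}

/* Apply both eliminations described above; returns the new count. */
int remove_useless_temps(Quad *q, int n)
{
    int i = 0;
    while (i < n) {
        if (!is_temp(q[i].r)) { i++; continue; }
        int use = next_use(q, n, i, q[i].r);
        if (use == -1) {
            n = delete_quad(q, n, i);   /* temp never read: drop statement */
            i = 0;                      /* earlier temps may now be useless */
        } else if (use == i + 1 && strcmp(q[use].op, "MOV") == 0 &&
                   next_use(q, n, use, q[i].r) == -1) {
            strcpy(q[i].r, q[use].r);   /* OP x y T; MOV T z  ->  OP x y z */
            n = delete_quad(q, n, use);
            i++;
        } else {
            i++;
        }
    }
    return n;
}
```

Calling propagate_constants() and then remove_useless_temps() on the
block "ADD 2 3 T1 / JGZ T1 L1" leaves the single statement "JGZ 5 L1".
Calling only remove_useless_temps() on the same block leaves both
statements in place, which is the situation the ordering is meant to
avoid.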