File: readme
Author: Edward A. Green
Description:  readme file of Compiler Design (66.648): Project 2

FILES

readme		this file
block.c		C code for handling program blocks
code.c		C code for handling quads
labels.c	C code for handling program labels, jmps, calls, etc.
leads.c		C code for calculating leading lines of basic blocks
next_use.c	C code for calculating next-uses
optimize.c	C code for doing constant propagation and removing useless variables
symbols.c	C code for storing the symbol table
util.c		general C routines.
proj2.c		main block for project 2.
proj2.h		include file for project 2.
readme1		the readme file for project1
proj1.yac	revised project 1 syntactic analyzer.
proj1.lex	revised project 1 lexical analyzer.
makefile	make file for generating two executable files, phase1 & phase2
example.p	example file for testing the programs.


OPERATION

The system can be built and run on a SUN system with Lex, Yacc, cc, and gcc
(Turing, for example).  Running the makefile generates two executable files,
phase1 and phase2.  Phase1 reads a mini-Pascal program (Aho et al.,
pp. 746-748) from standard input and generates a quad program with a symbol
table.  Phase2 generates the optimization output required by project 2.  To
run these programs on an input file "myfile.p", enter:

% phase1 < myfile.p > myfile.q
% phase2 myfile.q			// the output is stdout


CHANGES TO PHASE 1

The output of phase 1 has been modified: the symbol table is now encoded as
a prefix to the quads to which it applies.  The following is a sample symbol
table (with comments):
 
*B fill				// Start of a new block named "fill"
*V fill j integer 52		// an integer defined in "fill" named "j"
*V fill a real [1..5] 28	// a real array defined in "fill" named "a"
*R fill fill 2			// a procedure named "fill" with 2 arguments
*P1 real [1..5] 28		// the first parameter of fill is a real array
*P2 integer 48			// the second parameter of fill is an integer
*V example a real [1..5] 8	// (note that program offsets follow variable
*V example j integer 4		//     definitions. offset of 1st "j" is 52)
.
.
.
*F sum sum integer 1		// A function named "sum" returns an integer
*P1 integer 56			// "sum"'s only parameter is an integer at offset 56
.
.
.
The quads generated here are identical to those generated by the first
project (see readme1).
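As an illustration of how phase 2 might read these symbol-table lines back, the following is a minimal C sketch that parses a "*V" line with sscanf.  VarEntry and parse_var_line() are hypothetical names, not the routines in symbols.c, and the sketch assumes that scalars carry no bounds field and that names fit in 31 characters.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one "*V" line of the phase 1 symbol table. */
typedef struct {
    char block[32];   /* block (program or procedure) defining the variable */
    char name[32];    /* variable name */
    char type[16];    /* "integer" or "real" */
    int  lo, hi;      /* array bounds; both 0 for a scalar (assumption) */
    int  offset;      /* storage offset */
} VarEntry;

/* Parses "*V block name type offset" or "*V block name type [lo..hi] offset".
   Returns 1 on success and 0 if the line is not a well-formed "*V" line. */
int parse_var_line(const char *line, VarEntry *v)
{
    memset(v, 0, sizeof *v);
    if (sscanf(line, "*V %31s %31s %15s [%d..%d] %d",
               v->block, v->name, v->type, &v->lo, &v->hi, &v->offset) == 6)
        return 1;                           /* array entry */
    v->lo = v->hi = 0;
    if (sscanf(line, "*V %31s %31s %15s %d",
               v->block, v->name, v->type, &v->offset) == 4)
        return 1;                           /* scalar entry */
    return 0;
}
```

For example, "*V fill a real [1..5] 28" yields block "fill", name "a", type "real", bounds 1..5 and offset 28, while "*V fill j integer 52" yields a scalar at offset 52.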


INTRODUCTION TO PHASE 2

This project takes the output file from phase1 and performs two optimizations
on the quads: CONSTANT PROPAGATION and USELESS VARIABLE REMOVAL.  The output
does not follow the conventions adopted for the quad code above; instead, it
is the output required by the project.

The output has the following sequence:
1) basic block connection table
2) a dump of the input code showing a more readable form of the symbol table,
   and each statement of the program with a comment showing the statement
   number and the block it belongs to.
3) The calculated next use table.  This shows the variables (including temps)
   that are modified and used by each statement.  For the variables modified,
   the number of the statement in which the variable is next used appears in 
   parentheses.  If this number is -1, the variable is not used again inside 
   the basic block.
4) A dump of the output code after the optimizations are done.  This dump
   has the same format as the first dump.  Note that some statements have
   been changed and some have been removed.
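The next-use table in item 3 can be computed with a single backward scan over each basic block, in the manner of the textbook algorithm.  A minimal C sketch follows; the Quad layout, compute_next_use() and the fixed MAXNAMES limit are assumptions, not the code in next_use.c.

```c
#include <string.h>

#define MAXNAMES 64   /* assumed upper bound on distinct names per block */

/* Hypothetical quad: opcode, two operands, result ("" when unused). */
typedef struct { const char *op, *arg1, *arg2, *result; } Quad;

/* Next-use information for every name seen so far in the backward scan. */
typedef struct { const char *name[MAXNAMES]; int next[MAXNAMES]; int n; } UseTable;

/* Returns the next-use slot for name, creating it (as -1) on first sight.
   Precondition: the block has fewer than MAXNAMES distinct names. */
static int *slot(UseTable *t, const char *name)
{
    for (int k = 0; k < t->n; k++)
        if (strcmp(t->name[k], name) == 0)
            return &t->next[k];
    t->name[t->n] = name;
    t->next[t->n] = -1;
    return &t->next[t->n++];
}

/* For each q[i] in the basic block q[0..n-1], sets next_use[i] to the index
   of the next statement that reads q[i].result, or -1 if that value is not
   read again inside the block.  (For a jump, whose third field is a label,
   the entry is meaningless.) */
void compute_next_use(const Quad *q, int n, int *next_use)
{
    UseTable t = { .n = 0 };
    for (int i = n - 1; i >= 0; i--) {
        next_use[i] = -1;
        if (q[i].result[0] != '\0') {
            int *r = slot(&t, q[i].result);
            next_use[i] = *r;
            *r = -1;              /* earlier statements cannot see past this write */
        }
        /* Operands are read here, so statement i is their next use. */
        if (q[i].arg1[0] != '\0')
            *slot(&t, q[i].arg1) = i;
        if (q[i].arg2[0] != '\0')
            *slot(&t, q[i].arg2) = i;
    }
}
```

For the block ADD x y T1 / MOV T1 z / ADD z 1 T2 / MOV T2 z (statements 0 to 3), the sketch gives next uses 1, 2, 3 and -1.  The write is processed before the operands so that a statement such as ADD x 1 x records itself as the next use of the old x.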


OPTIMIZATIONS

The two optimizations performed are constant propagation and useless variable
removal, in that order.  It was found that propagating constants sometimes
made some variables useless (see the last example, below).

The optimizations are applied within individual basic blocks.  The text states
that "if no live-variable analysis has been done, we can assume all
non-temporary variables are live on exit" from a basic block.  Therefore, the
only useless variables eliminated are temporaries confined to a single basic
block.  Since some temps in this quad scheme span more than one block, an
initial search identifies the temps confined to one block, and only these
variables are considered for elimination.  A variable can be eliminated in two
ways.  First, the following transformation is made:

ADD	x	y	T1
MOV	T1		z

which is code for z=x+y.  This is changed to:

ADD	x	y	z
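A minimal C sketch of this first transformation on one basic block follows.  It assumes temporaries are named T<digits> and that the initial search has already confined the temp to this block; as an extra safeguard it folds only when the temp is not read again later in the block.  Quad, fold_temp_moves() and the helpers are hypothetical names, and quads are assumed to carry no labels (a labeled MOV would need its label preserved).

```c
#include <string.h>

/* Hypothetical quad: opcode, two operands, result ("" when unused). */
typedef struct { char op[8], arg1[16], arg2[16], result[16]; } Quad;

/* Assumption: temporaries are named T<digits>, as in the examples. */
static int is_temp(const char *s)
{
    return s[0] == 'T' && s[1] >= '0' && s[1] <= '9';
}

/* 1 if name is read as an operand anywhere in q[from..n-1] (conservative). */
static int read_later(const Quad *q, int from, int n, const char *name)
{
    for (int k = from; k < n; k++)
        if (strcmp(q[k].arg1, name) == 0 || strcmp(q[k].arg2, name) == 0)
            return 1;
    return 0;
}

/* Rewrites "OP x y T ; MOV T z" as "OP x y z" inside one basic block.
   Compacts q in place and returns the new statement count. */
int fold_temp_moves(Quad *q, int n)
{
    int out = 0;
    for (int i = 0; i < n; i++) {
        if (i + 1 < n && is_temp(q[i].result)
            && strcmp(q[i + 1].op, "MOV") == 0
            && strcmp(q[i + 1].arg1, q[i].result) == 0
            && !read_later(q, i + 2, n, q[i].result)) {
            Quad merged = q[i];
            strcpy(merged.result, q[i + 1].result);  /* write straight into z */
            q[out++] = merged;
            i++;                                      /* skip the MOV */
        } else {
            q[out++] = q[i];
        }
    }
    return out;
}
```

Compacting in place is safe here because the write index never runs ahead of the read index, so statements not yet examined are never overwritten.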

The second elimination applies when a statement modifies a local temporary
that has no next use in the block: the statement is eliminated.  This type of
elimination arises when constant propagation precedes useless variable
removal.  For example, the input statements might be:

ADD	2	3	T1
MOV	T1		z

which is code for z=2+3.  When the constant propagation is done, we have:

ADD	2	3	T1
MOV	5		z

Now the variable T1 is useless, but the situation differs from the previous
one, where T1 was moved to z and the variable could be deleted by matching T1
in both statements.  However, since T1 has no next use, we can remove the ADD
statement:

MOV	5		z
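The second elimination can be sketched in C as below.  It repeats the pass until nothing changes, because deleting one statement can leave an earlier temp without a use.  The list of side-effect-free opcodes (only ADD and MOV appear in this readme; SUB, MUL and DIV are assumed names) and remove_dead_temps() are hypothetical; an opcode not on the list is never deleted, so, for example, a call that stores into a temp keeps its side effects.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical quad: opcode, two operands, result ("" when unused). */
typedef struct { char op[8], arg1[16], arg2[16], result[16]; } Quad;

/* Assumption: temporaries are named T<digits>, as in the examples. */
static int is_temp(const char *s)
{
    return s[0] == 'T' && s[1] >= '0' && s[1] <= '9';
}

/* Opcodes assumed free of side effects; anything else is never deleted. */
static int is_pure(const char *op)
{
    static const char *pure[] = { "ADD", "SUB", "MUL", "DIV", "MOV" };
    for (size_t k = 0; k < sizeof pure / sizeof pure[0]; k++)
        if (strcmp(op, pure[k]) == 0)
            return 1;
    return 0;
}

/* 1 if name is read as an operand anywhere in q[from..n-1] (conservative). */
static int read_later(const Quad *q, int from, int n, const char *name)
{
    for (int k = from; k < n; k++)
        if (strcmp(q[k].arg1, name) == 0 || strcmp(q[k].arg2, name) == 0)
            return 1;
    return 0;
}

/* Deletes statements that write a temp never read later in the block.
   Compacts q in place and returns the new statement count. */
int remove_dead_temps(Quad *q, int n)
{
    int before;
    do {
        before = n;
        int out = 0;
        for (int i = 0; i < n; i++) {
            int dead = is_temp(q[i].result) && is_pure(q[i].op)
                       && !read_later(q, i + 1, n, q[i].result);
            if (!dead)
                q[out++] = q[i];
        }
        n = out;
    } while (n < before);   /* a deletion may expose another dead temp */
    return n;
}
```

On ADD 2 3 T1 / MOV 5 z the sketch leaves only MOV 5 z, as in the example above.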

Another situation might be:

ADD	2	3	T1
JGZ	T1		L1

If useless temporary removal preceded constant propagation, the T1's could not
be matched (the second statement is not a MOV).  Constant propagation would
then give:

ADD	2	3	T1
JGZ	5		L1

and the variable T1 would not be removed even though it has no next use.  Thus,
we do the constant propagation before we do the useless variable removal, which
yields:

JGZ	5		L1

Note that this substitution would make dead code removal easy, since the jump
condition is now a constant (this would also have been done in this project
if there had been enough time).
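The block-local constant propagation described above might look like the following C sketch.  It folds only integer ADD and MOV (real constants and the other opcodes are left alone, an assumption made for brevity), substitutes known values into the operand fields, and leaves the defining ADD in place for the later removal pass, matching the ADD 2 3 T1 / JGZ 5 L1 intermediate shown above.  It also assumes that only a statement's own result field is written, so it ignores variables changed inside called procedures; a fuller pass would clear the map at each call.  ConstMap and propagate_constants() are hypothetical names.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXVARS 64

/* Hypothetical quad: opcode, two operands, result ("" when unused). */
typedef struct { char op[8], arg1[16], arg2[16], result[16]; } Quad;

/* Names known to hold an integer constant at the current point in the block. */
typedef struct { char name[MAXVARS][16]; long val[MAXVARS]; int n; } ConstMap;

/* 1 if s is an integer literal; its value is stored in *v. */
static int is_int(const char *s, long *v)
{
    char *end;
    if (s[0] == '\0')
        return 0;
    *v = strtol(s, &end, 10);
    return *end == '\0';
}

static int lookup(const ConstMap *m, const char *name, long *v)
{
    for (int k = 0; k < m->n; k++)
        if (strcmp(m->name[k], name) == 0) {
            *v = m->val[k];
            return 1;
        }
    return 0;
}

/* Drops name from the map: it no longer holds a known constant. */
static void forget(ConstMap *m, const char *name)
{
    for (int k = 0; k < m->n; k++)
        if (strcmp(m->name[k], name) == 0) {
            m->n--;
            if (k != m->n) {                 /* move the last entry into the hole */
                strcpy(m->name[k], m->name[m->n]);
                m->val[k] = m->val[m->n];
            }
            return;
        }
}

static void set(ConstMap *m, const char *name, long v)
{
    forget(m, name);
    if (m->n < MAXVARS) {                    /* if full, simply stop tracking */
        strcpy(m->name[m->n], name);
        m->val[m->n++] = v;
    }
}

/* Propagates and folds integer constants through one basic block in place. */
void propagate_constants(Quad *q, int n)
{
    ConstMap m = { .n = 0 };
    for (int i = 0; i < n; i++) {
        long a, b;
        /* Replace operands whose value is already known. */
        if (lookup(&m, q[i].arg1, &a))
            snprintf(q[i].arg1, sizeof q[i].arg1, "%ld", a);
        if (lookup(&m, q[i].arg2, &b))
            snprintf(q[i].arg2, sizeof q[i].arg2, "%ld", b);
        if (q[i].result[0] == '\0')
            continue;
        /* Record the result if it is now a known constant, else forget it. */
        if (strcmp(q[i].op, "ADD") == 0 && is_int(q[i].arg1, &a) && is_int(q[i].arg2, &b))
            set(&m, q[i].result, a + b);
        else if (strcmp(q[i].op, "MOV") == 0 && is_int(q[i].arg1, &a))
            set(&m, q[i].result, a);
        else
            forget(&m, q[i].result);
    }
}
```

Applied to ADD 2 3 T1 / JGZ T1 L1, the sketch rewrites the jump to JGZ 5 L1 and keeps the ADD, which useless variable removal can then delete as described above.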

An additional transformation removes NOPs.  If a labeled NOP is followed by
an unlabeled statement, the label is moved to the next statement and the NOP
is removed.
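A minimal C sketch of this NOP removal follows, assuming each quad carries an optional label string ("" when unlabeled).  LQuad and remove_labeled_nops() are hypothetical names.  A NOP whose successor already has a label is kept, since the rule above only covers an unlabeled successor.

```c
#include <string.h>

/* Hypothetical quad with an optional label ("" when unlabeled). */
typedef struct { char label[16], op[8], arg1[16], arg2[16], result[16]; } LQuad;

/* Moves the label of a labeled NOP onto an unlabeled successor and drops
   the NOP.  Compacts q in place and returns the new statement count. */
int remove_labeled_nops(LQuad *q, int n)
{
    int out = 0;
    for (int i = 0; i < n; i++) {
        if (strcmp(q[i].op, "NOP") == 0 && q[i].label[0] != '\0'
            && i + 1 < n && q[i + 1].label[0] == '\0') {
            strcpy(q[i + 1].label, q[i].label);  /* hand the label on */
            continue;                            /* drop the NOP */
        }
        q[out++] = q[i];
    }
    return out;
}
```

Because the label is copied forward before the successor is examined, a run of NOPs ending in an unlabeled statement collapses onto that statement in one pass.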