File:        readme
Author:      Edward A. Green
Description: readme file of Compiler Design (66.648) Project 3:
             generation of executable assembler code.

FILES

README FILES
    readme       this file
    readme1      the readme file for project1
    readme2      the readme file for project2

PROGRAM CODE
    block.c      c code for handling program blocks
    code.c       c code for handling quads
    connect.c    c code for drawing the basic block connection table
    constant.c   c code for storing temporary variable types, etc.
    gen_code.c   main program file for generating program code
    labels.c     c code for handling program labels, jmps, calls, etc.
    leads.c      c code for calculating leading lines of basic blocks
    memory.c     c code for calculating memory requirements, symbolic
                 addresses
    mod.c        c code for handling the MOD structure (modified/used
                 data)
    next_use.c   c code for calculating next-uses
    optimize.c   c code for doing constant propagation and removing
                 useless var.
    register.c   c code for handling register allocation
    symbols.c    c code for storing the symbol table
    util.c       general c routines.
    proj3.c      main program file.
    proj3.h      include file for project.
    proj1.yac    revised project 1 syntactic analyzer.
    proj1.lex    revised project 1 lexical analyzer.

COMMAND FILES
    makefile     make file for generating two executable files,
                 phase1 & phase2
    comp         a Unix shell routine for compiling a minipascal program

MINIPASCAL TEST PROGRAMS
    test1.p      the GCD program from the text (recursive)
    test2.p      a program showing parameter passing is done in correct
                 sequence
    test3.p      testing passing array elements and real->integer
                 coercion
    test4.p      a program testing a[a[i]]
    test5.p      a program testing passing whole arrays
    test6.p      a program that does an insert sort of array data
                 (<=20 items)
    test7.p      a program that recursively merge sorts array data
                 (<=20 items)
    test8.p      a program for testing all arithmetic operations on
                 integers.
    test9.p      a program for testing mixed-mode math.

OPERATION

The system can be made and run successfully on a SUN system with Lex,
Yacc, cc, and gcc (Turing, for example).
Running the make file generates two executable files, phase1 and phase2.
Phase1 takes a mini-Pascal program (Aho et al., pp. 746-748) on standard
input and generates a quad program with a symbol table. Phase2 generates
assembler code that will run on the Sequent B8 computer.

Included in the distribution is a shell script which can run both phase1
and phase2 with one command:

    % comp myfile        // assumes myfile.p is the source code

This command generates two files: myfile.q (the quad file from phase 1)
and myfile.s (the assembler code for the b8). After phase 1 runs, the
quad file is dumped to stdout; any syntax errors will show up in the
quad file, which can be used to help pinpoint their location.

CHANGES TO PHASES 1 AND 2 (FROM PROJECT 2)

A few bugs that were revealed while developing project 3 were fixed.
These are the only changes to the first two phases (up through code
optimization). The optimizations from phase 2 are performed before the
code is generated.

CODE GENERATION

The data structure developed for the optimization was used for
generating the code. The development revolved around 3 new program
files:

    gen_code.c   the central code generation routines.
    memory.c     the routines for calculating the memory usage required
                 by units.
    register.c   the routines for managing the usage of registers and
                 temporaries.

MEMORY ALLOCATION

The gen_code routine is called from main() to generate the code for the
entire program. At the start of each program block, the memory
requirements for program variables and temporaries are calculated. The
general memory management strategy is to use registers as much as
possible for temps, with the caller saving registers when subroutine
calls are encountered. The registers r0, r1, f0, and f1 are used for
"intra-instruction" working storage; they are not reserved between quad
instructions. With these quad instructions, as soon as a temp is used,
it is dead, and thus its register may be freed.

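The temp-to-register policy described above can be illustrated with a
small sketch. This is not the code in register.c: the function names,
the table layout, and the pool range r2..r7 are assumptions made for the
example. It shows only the two rules stated in the text: r0 and r1 stay
out of the pool, and a temp's register is released at its first use.

```c
#include <assert.h>

/* ASSUMPTION: r2..r7 form the pool for temps. r0 and r1 are left out
   because they are scratch registers within a single quad. */
#define FIRST_POOL_REG 2
#define LAST_POOL_REG  7
#define NO_REG        (-1)

/* Temp id held by each register; 0 means the register is free. */
static int reg_owner[LAST_POOL_REG + 1];

/* Give temp `temp_id` (> 0) a free register, or NO_REG if none is left
   and the temp has to live in memory. (Hypothetical helper.) */
int alloc_temp_reg(int temp_id)
{
    assert(temp_id > 0);
    for (int r = FIRST_POOL_REG; r <= LAST_POOL_REG; r++) {
        if (reg_owner[r] == 0) {
            reg_owner[r] = temp_id;
            return r;
        }
    }
    return NO_REG;
}

/* Look up the register holding `temp_id` and free it at once: a temp
   is dead after its single use. Returns NO_REG if the temp holds no
   register. (Hypothetical helper.) */
int use_temp_reg(int temp_id)
{
    for (int r = FIRST_POOL_REG; r <= LAST_POOL_REG; r++) {
        if (reg_owner[r] == temp_id) {
            reg_owner[r] = 0;
            return r;
        }
    }
    return NO_REG;
}
```

A code generator built this way would call alloc_temp_reg() when a quad
produces a temp and use_temp_reg() when a later quad consumes it, so the
register is available again for the next temp.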
In all candor, it is obvious from the generated assembler code that a
lot of register-to-register moves could be eliminated by more
intelligent code generation.

The b8 has a very easy-to-use assembler language that facilitates
activation record creation and destruction for subroutine calls. The
memory calculation for each subroutine is used when generating the ENTER
command (which allocates the subroutine's memory), and for the ret
command, which deallocates the memory from the stack at the end of the
subroutine.

In the subroutine's symbol table, the location assignment for each
variable is made during the memory calculation. These addresses are
relative to the b8's frame pointer. The locations of passed parameters
are also assigned in the memory calculation routine.

Since the language is static and subroutines are not nested, the main
program's variables are stored as global variables, with ".comm"
assembler directives. This makes the addresses of global variables a
little easier to resolve.

TRANSLATING CODE

Once the start of a subroutine is written, the quads of code are
analyzed. As noted above, the registers r0, r1, f0, and f1 are used as
each quad's workspace. All code is generated by routines in gen_code.c
(with the exception of spilling registers, which is done by routines in
register.c). There are several general groups of instructions:

    calls and jumps
    returns
    pushes and pulls
    arithmetic instructions
    moves
    array instructions

The code generation is relatively straightforward, with a few
exceptions:

1) The statement label for the main program is changed from "start" in
   the quad code to "_main:", which conforms to the gcc main program
   unit.

2) When calling "read" or "write", the type of the last push is tested,
   and an additional push of a "0" (integer) or "1" (real) is made so
   that the read or write routine can use the correct format.
   When the "call exit" statement is encountered, it is ignored (the
   return from the main program is generated at the end of the code).

3) Upon entry to a subroutine, it is assumed that all registers are
   free. Therefore, all registers must be spilled to the caller's local
   memory prior to the call. Moreover, if a temp is to be passed, the
   temp must also be in memory and its address pushed. Therefore,
   registers are spilled before the first push prior to a subroutine
   call. This code is generated by a subroutine in the "register.c"
   file.

4) Return values from functions are "standard types", that is, either
   an integer or a real value. These values are returned in r0 or f0,
   respectively.

5) Arrays present an interesting problem: a quad may need either the
   address of an element or its data. For example, in the statement

       read (a[i]);

   the address of a[i] is needed for passing data to the element. But
   in

       j:=a[i];

   the data in a[i] should be sent to j. Both of these statements use
   the same quad opcode:

       [] a i t1
       PUSH t1
       CALL read

   and

       [] a i j

   This was resolved by labeling every temp created by the [] opcode
   with an 'a' and every other temp with a 'd', and by placing the
   address of the array element in an 'a' temp's location. When a temp
   is used, this flag is checked; if the flag is 'a', the temp is loaded
   into either r0 or r1, and the symbolic address used is then '0(r0)'
   or '0(r1)' respectively, that is, an indirect address. If the result
   of the [] quad is not a temp, the data itself is transferred.

6) Array elements are referenced by loading the index into r0 and
   adjusting it by subtracting the start index of the array. The above
   policies allow array expressions like a[a[i]] to work correctly.

7) Real <-> integer coercion: If a real value is assigned to an
   integer, the fraction is truncated. Mixed-mode calculations are
   allowed, and intermediate results are kept (as much as possible) as
   real values. The instructions generated depend on the data types.
   C programs were run to see how the b8 worked with mixed-mode
   expressions.

8) The read and write routines were coded by hand and are included in
   the generated code.
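
As an illustration of items 5 and 6 above, the following sketch shows
one way the 'a'/'d' flag could select between a direct and an indirect
operand, and how an index is adjusted by the array's start index. The
struct layout, the function names, and the location strings are
assumptions made for the example; they are not taken from gen_code.c.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical temp record. kind is 'a' when the temp holds the address
   of an array element, 'd' when it holds data. */
struct temp {
    char kind;
    const char *location;   /* where the temp lives, e.g. "r3" */
};

/* Write the symbolic operand for temp `t` into `operand`.
   Returns 1 when the caller must first load the temp's stored address
   into `scratch` ("r0" or "r1"), so that the operand is the indirect
   address 0(scratch). Returns 0 when the temp's own location is the
   operand. */
int temp_operand(const struct temp *t, const char *scratch,
                 char *operand, size_t size)
{
    assert(t->kind == 'a' || t->kind == 'd');
    assert(strlen(scratch) > 0);
    if (t->kind == 'a') {
        snprintf(operand, size, "0(%s)", scratch);
        return 1;
    }
    snprintf(operand, size, "%s", t->location);
    return 0;
}

/* Item 6: an index is made relative to the array's start index before
   the element is addressed. */
int adjusted_index(int index, int start_index)
{
    return index - start_index;
}
```

For a[a[i]], the inner [] quad would yield an 'a' temp; reading it
through 0(r0) gives the inner element's value, which then serves as the
index for the outer [] quad.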