File:        readme
Author:      Edward A. Green
Description: Readme file for Compiler Design (66.648): Project 1

Files:
    readme      This file
    proj1.lex   The LEX file for the project (lexical analysis definition)
    proj1.yac   The YACC file for the project (syntactic analysis definition)
    example.p   An example Pascal program which can be parsed by this parser
    Makefile    The make file for building a.out, the executable parser

The files for this project construct a parser for a subset of the Pascal
language defined in the Aho, Sethi, and Ullman text, pp. 746-748. The
program generates two types of output: a set of quads and the symbol
table for each program unit.

This file principally details the operation of the syntax analyser. The
lexical analyser is rather straightforward, and won't be discussed in
detail. Alphabetic identifiers are assumed to be case insensitive; all
alphabetics are converted to lower case by the lexical analyser.

The quads which are generated involve only the following instructions:

    ADD   - add arg1 and arg2 giving arg3
    CALL  - call a subroutine
    DI    - integer divide arg1 by arg2 giving arg3
    DIV   - real divide arg1 by arg2 giving arg3
    JGZ   - jump to the statement labeled with arg3 if arg1 is positive
    JMP   - jump unconditionally to the statement labeled with arg3
    JZ    - jump to the statement labeled with arg3 if arg1 is zero
    MOD   - place the remainder of arg1 divided by arg2 in arg3
    MOV   - move arg1 to arg3
    MUL   - multiply arg1 times arg2 giving arg3
    NOP   - no operation
    PUSH  - arg2 will be the next parameter to the next called subroutine
    POP   - arg2 is a value returned by a function subroutine
    RTN   - return from a subroutine
    SUB   - subtract arg2 from arg1 giving arg3
    []    - resolve an array reference: assign arg1[arg2] to arg3
    []=   - assign to an array: assign arg3 to arg1[arg2]

All other operators, including all boolean and relational operators, are
synthesized from the above operators. Boolean values are integers, such
that zero corresponds to false and non-zero corresponds to true.
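
As an illustration of how a relational operator might be synthesized
from the instructions listed above, the following C sketch emits quads
for "result := (a > b)". It is not taken from proj1.yac. The Quad
layout (including the separate label field), the names emit,
emit_greater_than and print_quads, the temporary "t1", and the labels
"L1" and "L2" are all assumptions made for this example. It also
assumes that SUB computes arg1 minus arg2 into arg3, in line with the
other arithmetic instructions.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical quad layout; the project's real structure may differ. */
typedef struct {
    const char *label;  /* statement label, "" if none (assumed field) */
    const char *op;
    const char *arg1;
    const char *arg2;
    const char *arg3;
} Quad;

#define MAX_QUADS 32

typedef struct {
    Quad q[MAX_QUADS];
    int  n;             /* number of quads emitted so far */
} QuadList;

/* Append one quad; returns its index, or -1 if the list is full. */
int emit(QuadList *ql, const char *label, const char *op,
         const char *arg1, const char *arg2, const char *arg3)
{
    Quad *q;

    if (ql->n >= MAX_QUADS)
        return -1;
    q = &ql->q[ql->n];
    q->label = label;
    q->op    = op;
    q->arg1  = arg1;
    q->arg2  = arg2;
    q->arg3  = arg3;
    return ql->n++;
}

/* Emit quads for result := (a > b) using SUB, JGZ, MOV, JMP and NOP. */
void emit_greater_than(QuadList *ql, const char *a, const char *b,
                       const char *result)
{
    emit(ql, "",   "SUB", a,    b,  "t1");    /* t1 := a - b (assumed order) */
    emit(ql, "",   "JGZ", "t1", "", "L1");    /* positive means a > b        */
    emit(ql, "",   "MOV", "0",  "", result);  /* false is zero               */
    emit(ql, "",   "JMP", "",   "", "L2");    /* skip the true branch        */
    emit(ql, "L1", "MOV", "1",  "", result);  /* true is non-zero            */
    emit(ql, "L2", "NOP", "",   "", "");      /* join point                  */
}

/* Print the list, one quad per line. */
void print_quads(const QuadList *ql)
{
    int i;

    for (i = 0; i < ql->n; i++)
        printf("%-4s %-5s %s, %s, %s\n", ql->q[i].label, ql->q[i].op,
               ql->q[i].arg1, ql->q[i].arg2, ql->q[i].arg3);
}
```

A caller would set the n field of a QuadList to zero, call
emit_greater_than, and then call print_quads to list the six quads.
The other relational operators could be built the same way by swapping
the operands or using JZ in place of JGZ.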

There are three kinds of stacks used in the YACC program.

The element type of the parser stack, YYSTYPE, is a pointer to void.
This was done so that the values returned by the actions could be of two
different types. The first type is a pointer to char, so that
identifiers can be passed easily. The second type is a list of character
pointers (strings), so that a set of identifiers can be collected in
identifier lists (for declaring variable types, for example). By casting
pointers as either pointers to char or pointers to STR_LST, an action
can return either type of value. All actions that produce a given
non-terminal return only one of these types, so higher level actions
know how to retrieve the items from the stack (whether a character
string or a list of character strings).

Another stack is the symbol table. It is actually a doubly linked list.
The following pair of diagrams describes the construction of the table
for a program with one subroutine:

Symbol table during parsing of the subroutine:

    [program]-[var1]-[var2]-[subroutine]-[s_parm1]-[s_parm2]-[s_var1]
                                  |__________|
                                  parmlist pointer

Symbol table after parsing subroutine, during parsing of main program:

    [program]-[var1]-[var2]-[subroutine]-[s_var1]
                                  |
                                  |--[s_parm1]-[s_parm2]

Note that the subroutine parameters are retained in the parmlist so
that parameters can be checked for type when the subroutine is called by
the main program.

The third stack is for saving statement labels for control flow
statements. It uses the same structure type as that used by the parser
stack, the pointer to a list of character strings.

An enhancement to the grammar given in the text is the inclusion of the
"if..then" as well as the "if..then..else" construct, although this
causes YACC to report a shift/reduce conflict warning.

Enhancements that would be nice to make (but there wasn't enough time):

    1) Allow the last statement before the "end" of a compound statement
       to end in a semicolon; at present this leads to a syntax error.
    2) Syntax errors are sent to stdout, along with the quads and the
       symbol table. They should probably be sent to stderr.
    3) A syntax error now ends syntax analysis. More work should be done
       with error handling.

To generate the program file (a.out), just type "make". To test the
program, type "a.out < example.p". You need all of the above files in
your current directory to make the executable and test it. This make was
tested on both turing and on RCS.
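
Returning to the void-pointer parser stack described earlier in this
file, the following C sketch shows one way actions could return either
a character string or a string list through the same YYSTYPE. It is an
illustration only, not the code in proj1.yac. STR_LST is named in this
readme, but its fields are assumed here, and the functions copy_str,
act_ident, act_append and lst_length are hypothetical names invented
for the example.

```c
#include <stdlib.h>
#include <string.h>

typedef void *YYSTYPE;   /* the value stack holds untyped pointers */

/* Assumed layout of the string list; only the type name is from the readme. */
typedef struct str_lst {
    char           *str;
    struct str_lst *next;
} STR_LST;

/* Copy a string to the heap; returns NULL if allocation fails. */
static char *copy_str(const char *s)
{
    char *p = malloc(strlen(s) + 1);

    if (p != NULL)
        strcpy(p, s);
    return p;
}

/* Action for an identifier: the result is really a pointer to char. */
YYSTYPE act_ident(const char *text)
{
    return (YYSTYPE)copy_str(text);
}

/* Action for an identifier list: the result is really a pointer to
   STR_LST. Pass NULL as the list to start a new one. The new
   identifier is added at the tail so that source order is kept. */
YYSTYPE act_append(YYSTYPE list, YYSTYPE ident)
{
    STR_LST *head = (STR_LST *)list;
    STR_LST *node = malloc(sizeof *node);
    STR_LST *p;

    if (node == NULL)
        return list;            /* out of memory: leave the list unchanged */
    node->str  = (char *)ident;
    node->next = NULL;
    if (head == NULL)
        return (YYSTYPE)node;
    for (p = head; p->next != NULL; p = p->next)
        ;                       /* walk to the last node */
    p->next = node;
    return (YYSTYPE)head;
}

/* Count the entries in a list returned by act_append. */
int lst_length(YYSTYPE list)
{
    int n = 0;
    STR_LST *p;

    for (p = (STR_LST *)list; p != NULL; p = p->next)
        n++;
    return n;
}
```

Because the caller decides which cast to apply, each non-terminal must
always yield the same kind of pointer, as the description of the parser
stack notes. The sketch does not free any memory.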