Deadline: 6:45pm, Thursday, November 19, 2009
Review 1: 6:45pm, Thursday, October 15, 2009 (AST + Type Checking)
Review 2: 6:45pm, Thursday, October 29, 2009 (ILOC + CFG)
In this phase of the project you will complete a non-optimizing compiler for the core TL09 language. The resulting compiler should translate TL09 source code, producing output at several intermediate stages, into MIPS assembly code executable in the SPIM simulator.
You should modify your parser to produce an Abstract Syntax Tree (AST) as a tree data structure in memory. This is a requirement.
An Abstract Syntax Tree can be thought of as a simplified parse tree. Compared to a parse tree, an AST retains only the nodes that reflect the structure and semantics of the program and drops those that are merely technical details of the concrete syntax. For example, the use of semicolons to separate or terminate statements is not meaningful to later stages of the compiler, so they can be dropped; what is meaningful is that the different statements are represented as different subtrees, so that should be retained. Similarly, it is not important that an expression is made up of <simpleExpression>s, <term>s, <factor>s, etc.; what is important is which operations are applied to which operands. For example, adding 7 to the product of 3 and 4 could be represented as a tree whose root node is "+", whose right child is 7, and whose left child is a subtree with root "*", left child 3, and right child 4. In this way, the tree describes the order in which the operations can be done (from the leaves to the root) and how the result of one operation is used as an input to another. Part of the assignment is to figure out what should be kept in the AST and what should be discarded as unnecessary syntactic detail.
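As an illustration of the example above, here is a minimal sketch in Python. The class names `Num` and `BinOp` and the `evaluate` function are hypothetical and not part of the assignment; your own node classes will need to cover the full TL09 grammar.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Num:
    """Leaf node: an integer literal."""
    value: int


@dataclass
class BinOp:
    """Interior node: an operator applied to two subtrees."""
    op: str
    left: "Node"
    right: "Node"


Node = Union[Num, BinOp]

# "+" at the root, the "*" subtree on the left, 7 on the right.
tree = BinOp("+", BinOp("*", Num(3), Num(4)), Num(7))


def evaluate(node: Node) -> int:
    """Evaluate from the leaves up: operands first, then the operator."""
    if isinstance(node, Num):
        return node.value
    left, right = evaluate(node.left), evaluate(node.right)
    return left + right if node.op == "+" else left * right
```

Note that nothing in the tree records parentheses, precedence levels, or the grammar nonterminals; the shape of the tree alone says that the multiplication happens before the addition.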
Hint: You may find it useful to provide a Visitor interface for processing the different subclasses of AST nodes in the in-memory data structure. (In this case, one Visitor might be for printing out graphviz nodes for AST nodes. Another might be for decorating the AST with types and doing type checking.)
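One possible shape for such a Visitor is sketched below in Python. All names (`accept`, `visit_num`, `visit_binop`, `DotPrinter`) are assumptions made for illustration; only two node kinds are shown.

```python
class Num:
    def __init__(self, value):
        self.value = value

    def accept(self, visitor):
        return visitor.visit_num(self)


class BinOp:
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right

    def accept(self, visitor):
        return visitor.visit_binop(self)


class DotPrinter:
    """Visitor that emits one graphviz node per AST node."""

    def __init__(self):
        self.lines = []
        self.counter = 0

    def _new_node(self, label):
        name = f"n{self.counter}"
        self.counter += 1
        self.lines.append(f'  {name} [label="{label}"];')
        return name

    def visit_num(self, node):
        return self._new_node(str(node.value))

    def visit_binop(self, node):
        me = self._new_node(node.op)
        for child in (node.left, node.right):
            # Visiting the child emits its nodes and returns its dot name.
            self.lines.append(f"  {me} -> {child.accept(self)};")
        return me

    def render(self, root):
        root.accept(self)
        return "digraph AST {\n" + "\n".join(self.lines) + "\n}\n"
```

A type-checking pass could then be written as a second class with the same `visit_*` methods, without touching the node classes.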
Once you have generated the AST, the compiler should traverse it to find the type of each subexpression and determine whether there are any violations of the TL09 type rules.
Hints: In order to type check the program, you will need to create a symbol table to associate each appearance of a variable with that variable's declared type. The best way to do this is probably to create an entry in a hash table indexed by name and containing information about each declared entity. (For TL09, it is sufficient to just track the name and type of each declared variable.) Then, when the AST generation code comes across a variable in a procedure body, it can look that name up in the symbol table and store a pointer/reference to that symbol table entry in the AST node.
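The following Python sketch shows one way a hash-table symbol table and a recursive type check might fit together. It is illustrative only: expressions are encoded as tuples rather than AST node objects, the operator sets and their typing rules are assumptions (consult the TL09 type rules for the real ones), and the names `declare`, `type_of`, and `TypeCheckError` are hypothetical.

```python
ARITHMETIC = {"+", "-", "*"}  # assumed rule: INT x INT -> INT
COMPARISON = {"<", "="}       # assumed rule: INT x INT -> BOOL


class TypeCheckError(Exception):
    pass


def declare(symbols, name, type_name):
    """Record a declared variable in the hash table, indexed by name."""
    symbols[name] = {"name": name, "type": type_name}


def type_of(expr, symbols):
    """Return "INT" or "BOOL" for a tuple-encoded expression."""
    kind = expr[0]
    if kind == "num":
        return "INT"
    if kind == "var":
        entry = symbols.get(expr[1])
        if entry is None:
            raise TypeCheckError(f"TYPE ERROR: {expr[1]} is not declared")
        return entry["type"]
    _, op, left, right = expr
    if type_of(left, symbols) != "INT" or type_of(right, symbols) != "INT":
        raise TypeCheckError(f"TYPE ERROR: operands of {op} must be INT")
    return "INT" if op in ARITHMETIC else "BOOL"
```

In a real implementation the variable node would keep the reference to its symbol table entry, as suggested above, instead of repeating the lookup by name.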
The compiler should traverse the abstract syntax tree and translate the program into basic blocks of three-address ILOC instructions [Section 5.4.2 and Appendix A of Cooper] according to the TL09 semantics. You may want to add instructions to represent the READINT and WRITEINT operations. As in the examples found in Cooper, each basic block should end in a conditional branch or unconditional branch (a.k.a., jump). The ILOC code can use an infinite number of virtual registers.
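For expressions, the translation can be a post-order walk that hands out a fresh virtual register for every result. The Python sketch below assumes a tuple encoding of the AST and a hypothetical `gen_expr` function; it covers only integer literals and three arithmetic operators, and it does not split code into basic blocks.

```python
from itertools import count

ILOC_OPS = {"+": "add", "-": "sub", "*": "mult"}


def gen_expr(expr, code, regs):
    """Append ILOC for expr to code; return the register holding its value."""
    if expr[0] == "num":
        dst = f"r{next(regs)}"
        code.append(f"loadI {expr[1]} => {dst}")
        return dst
    _, op, left, right = expr
    lhs = gen_expr(left, code, regs)
    rhs = gen_expr(right, code, regs)
    dst = f"r{next(regs)}"  # virtual registers are never reused
    code.append(f"{ILOC_OPS[op]} {lhs}, {rhs} => {dst}")
    return dst


code = []
tree = ("binop", "+", ("binop", "*", ("num", 3), ("num", 4)), ("num", 7))
result = gen_expr(tree, code, count(1))
```

Because the supply of virtual registers is unlimited, the generator never has to decide which values to keep in registers; that problem is deferred to the back end.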
Hints:
As the last step, the "back end" of the compiler should translate the program into executable MIPS assembly code. This involves selecting the appropriate MIPS assembly language instructions, modifying the code so that it executes with a finite number of registers, and outputting an ASCII file containing MIPS assembly code that is executable with SPIM.
The ILOC and MIPS instruction sets are quite similar, and depending on the subset of ILOC that the IR contains, instruction selection may be done through a simple 1:1 substitution of MIPS instructions for ILOC instructions. (Please refer to Appendix A of Computer Organization and Design: The Hardware/Software Interface for a description of the MIPS Instruction Set Architecture (ISA), the MIPS assembly language, and the SPIM simulator.) It may be necessary, however, to handle some instructions in a slightly more complicated manner. (For example, the MIPS load-from-memory instruction takes a "register + 16-bit constant offset" as its address, so it is not possible to map ILOC's loadAO instruction directly to it.) You may wish to use an instruction selection scheme based on peephole optimization to handle these cases. (Cf. Section 11.4 of Engineering a Compiler, but you can probably simplify the scheme described there.) Note: SPIM provides simulated system calls for console input and output; you may find the read_int, print_int, and print_char system calls useful for implementing TL09's READINT and WRITEINT operations.
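A table-driven selector along these lines might look like the following Python sketch. It assumes the operands have already been mapped to MIPS register names, that `$t9` is free to use as a scratch register, and that the IR has been extended with hypothetical `readint` and `writeint` instructions; the function name `select` is also an assumption. The system call codes used are SPIM's print_int (1) and read_int (5).

```python
SIMPLE = {"add": "add", "sub": "sub", "mult": "mul"}  # 1:1 substitutions


def select(op, srcs, dst=None, scratch="$t9"):
    """Return the MIPS lines for one ILOC-style instruction."""
    if op in SIMPLE:
        return [f"{SIMPLE[op]} {dst}, {srcs[0]}, {srcs[1]}"]
    if op == "loadI":
        return [f"li {dst}, {srcs[0]}"]
    if op == "loadAO":
        # lw only accepts register + constant offset, so the two address
        # registers are summed into a scratch register first.
        return [f"addu {scratch}, {srcs[0]}, {srcs[1]}", f"lw {dst}, 0({scratch})"]
    if op == "writeint":  # assumed extra IR instruction for WRITEINT
        return ["li $v0, 1", f"move $a0, {srcs[0]}", "syscall"]
    if op == "readint":  # assumed extra IR instruction for READINT
        return ["li $v0, 5", "syscall", f"move {dst}, $v0"]
    raise ValueError(f"no MIPS pattern for {op}")
```

For example, `select("loadAO", ["$t0", "$t1"], "$t2")` yields two MIPS instructions where ILOC had one, which is the kind of case a plain 1:1 substitution cannot handle.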
In order to generate executable MIPS code, however, you will have to generate code that uses only those registers available in the MIPS ISA. Normally this would be done using a graph-coloring register allocator with spilling; implementing this will be an extension. In order to get executable code without a register allocator, you can add a "dummy register allocator pass" that simply sets aside dedicated registers for the input and output of each MIPS instruction, and creates one global variable (in the ".data" segment or as an offset from the frame pointer) for each IR temporary. You would then place a store instruction to write to that global variable each time the temporary is assigned to, and place load instructions to transfer the data back from the global variables into the dedicated registers for each MIPS instruction. This implementation would be quite inefficient, but it will produce correct code. See the revised simple.s with comments, as an example of the output this process would produce. (An alternative, which you may also pursue, would be to use only a limited set of registers in the ILOC code and to instead assign a fixed memory location to each variable/temporary and perform loads/stores as appropriate in the ILOC code.)
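The "dummy register allocator pass" could be sketched as follows in Python. This is only one reading of the scheme described above: it handles register-to-register instructions only, uses `$t0` and `$t1` as the dedicated input registers and `$t2` as the dedicated output register, and names each memory slot `v_<temporary>` in the ".data" segment. All of these choices, and the function name `spill_everywhere`, are assumptions; the output is a fragment without a `main` label.

```python
def spill_everywhere(iloc):
    """iloc: list of (mips_op, src_temps, dst_temp) over IR temporaries."""
    slots, text = [], []

    def slot(temp):
        # One global variable per IR temporary, created on first use.
        if temp not in slots:
            slots.append(temp)
        return f"v_{temp}"

    for op, srcs, dst in iloc:
        # Load each operand from its memory slot into a dedicated register.
        for phys, temp in zip(("$t0", "$t1"), srcs):
            text.append(f"lw {phys}, {slot(temp)}")
        operands = ", ".join(("$t0", "$t1")[: len(srcs)])
        text.append(f"{op} $t2, {operands}")
        # Store the result as soon as the temporary is assigned.
        text.append(f"sw $t2, {slot(dst)}")
    data = [".data"] + [f"v_{t}: .word 0" for t in slots]
    return "\n".join(data + [".text"] + text) + "\n"
```

Every instruction costs up to two loads and one store here, which is why the result is correct but slow.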
The compiler program should expect a TL09 input file,
<basename>.tl09, to be specified as a
command-line argument when it is invoked. (See the language-specific
details in the submission
instructions.)
If the input file is not a syntactically-valid program (i.e., if the program contains a lexical element that is not a legal token or does not match the BNF grammar), the compiler should output an error message to standard error (System.err in Java) containing the text "SCANNER ERROR", the text "PARSER ERROR", or the text "SYNTAX ERROR". (There is no constraint on which you use. The intent is that you might want to distinguish between scanner and parser errors or you might want to just call everything a syntax error.)
The compiler may optionally output a parse tree in graphviz dot
format to the file <basename>.pt.dot.
If there is no syntax error, the compiler should output an abstract
syntax tree in graphviz dot format to the AST output file,
<basename>.ast.dot. The nodes of the abstract
syntax tree that serve as the roots of subexpressions whose type is INT
should have a fillcolor of "/pastel13/3" (a light green)
and those whose type is BOOL should have a fillcolor of
"/pastel13/2" (a light blue). If there is a type error, the
compiler should output an error message to standard error that contains
the text "TYPE ERROR" and should also color the AST
node associated with that type error (e.g., the node of an operation whose
operands have incorrect types) "/pastel13/1"
(a pink) in the AST output file.
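The coloring rule can be captured in a small lookup table, as in this Python sketch. The helper name `dot_node` and the "ERROR" key are assumptions. Note that graphviz only applies `fillcolor` to a node when the node's `style` includes `filled`.

```python
FILL = {"INT": "/pastel13/3", "BOOL": "/pastel13/2", "ERROR": "/pastel13/1"}


def dot_node(name, label, type_name=None):
    """Format one AST node; untyped nodes (e.g. statements) get no fill."""
    attrs = [f'label="{label}"']
    if type_name is not None:
        attrs += ["style=filled", f'fillcolor="{FILL[type_name]}"']
    return f"  {name} [{', '.join(attrs)}];"
```

For instance, `dot_node("n0", "+", "INT")` produces a line describing a light green node labeled "+".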
If there is no syntax or type error, the compiler should also output
a control flow graph in graphviz dot format to the file,
<basename>.cfg.dot. The control flow graph
should be based on a translation of the original program into an ILOC
three-address code. The nodes in the graph should be the basic blocks
of the ILOC program representation. In the graphviz output, each of
the nodes of the graph should be labeled with the instructions found
in that basic block.
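As a rough sketch of the CFG output, the Python code below takes basic blocks that have already been formed (a mapping from a label to its list of ILOC instruction strings), derives the edges from the final `cbr` or `jumpI` of each block, and labels each node with its instructions. The function names and the `exit` pseudo-instruction in the sample are assumptions; `\l` is graphviz's left-justified line break.

```python
def successors(block):
    """Targets named by the block's final branch or jump."""
    last = block[-1]
    if last.startswith(("cbr ", "jumpI ")):
        return [t.strip() for t in last.split("->")[1].split(",")]
    return []  # e.g. the final block of the program


def cfg_to_dot(blocks):
    """blocks maps a label to its list of ILOC instruction strings."""
    lines = ["digraph CFG {", "  node [shape=box];"]
    for label, instrs in blocks.items():
        text = "\\l".join([label + ":"] + instrs) + "\\l"
        lines.append(f'  {label} [label="{text}"];')
    for label, instrs in blocks.items():
        for target in successors(instrs):
            lines.append(f"  {label} -> {target};")
    lines.append("}")
    return "\n".join(lines) + "\n"


sample = {
    "B1": ["loadI 0 => r1", "cmp_LT r1, r2 => r3", "cbr r3 -> B2, B3"],
    "B2": ["addI r1, 1 => r1", "jumpI -> B3"],
    "B3": ["exit"],  # assumed pseudo-instruction marking the end
}
dot = cfg_to_dot(sample)
```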
The final output of the compiler,
SPIM-compatible MIPS
assembly code that faithfully implements the semantics of
the original source program, should be written to the file,
<basename>.s.
So for a correct TL09 source program,
<basename>.tl09, the compiler should produce
a Type-annotated Abstract Syntax Tree
(<basename>.ast.dot), a control flow graph
(<basename>.cfg.dot), and a MIPS assembly
code file (<basename>.s). Optionally,
it may also produce a parse tree,
<basename>.pt.dot.
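The output file names can all be derived from the single command-line argument. A minimal Python sketch, assuming the outputs are written next to the input file and using a hypothetical helper `output_paths`:

```python
from pathlib import Path


def output_paths(source):
    """Map <basename>.tl09 to the output files named in the specification."""
    src = Path(source)
    base = src.stem  # file name without the final ".tl09"
    return {
        "ast": src.with_name(base + ".ast.dot"),
        "cfg": src.with_name(base + ".cfg.dot"),
        "asm": src.with_name(base + ".s"),
        "parse_tree": src.with_name(base + ".pt.dot"),  # optional output
    }
```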
The entire compiler should be complete and operating correctly by the final deadline. "Rough drafts" of the compiler are due at two intermediate deadlines, "Review 1" and "Review 2". These deadlines exist to encourage students to make timely progress and to provide an opportunity for the TA and/or instructor to review the code and provide feedback. Students should aim to have at least parsing, AST generation, type checking, and type-annotated AST output completed by "Review 1". They should aim to have at least parsing, AST generation, type checking, AST output, ILOC translation, and CFG output completed by "Review 2".
At the review dates, a grade will be assigned as described in
the grading rubric. Specifically students
will receive points for correctly completing the items described in
the paragraph above (progress), and for submitting readable
source code (source code readability), and properly tested, documented
and packaged code (build/packaging). The progress scores will be assessed
by examining the outputs using the General Correctness and Completeness
rubric. If the compiler outputs a parse tree dot file
(<basename>.pt.dot), this (along with syntax
errors when appropriate) will be used to assess parser progress.
Otherwise, correctness of parsing will be assessed based on the AST
generation.
Please refer to the Grading Breakdown and Rubric for details on what criteria will be applied in determining grades.
Please refer to the
submission instructions for information on how to prepare the
subversion repository containing your source code for grading.
Please note that README.R1 and README.R2
files should be prepared for the Review 1 and Review 2
deadlines.
You are also required to adequately test your compiler and submit your test cases along with your source code and document the current state of your compiler based on your own testing.
The following are some sample input TL09 programs and (for some) their corresponding output files. Nota bene: it is not necessary to match the text of the dot files exactly. It is only necessary to produce dot files that visually comply with the specification above when rendered with graphviz. For MIPS, it is only necessary to produce programs that behave in accordance with the TL09 semantics when executed.
The input file will be specified as a command-line argument. Output file names should be based on the input command-line argument. Details about how the program will be invoked are contained in the submission instructions.
There may need to be corrections, clarifications, or other modifications to these instructions. You are responsible for monitoring the class web site and listening during lecture for announcements related to this assignment.