CS 5363: TL07 Compiler Project, Part #2

Target Deadline: 1pm, Wednesday, October 3, 2007

Second Assignment

Your second assignment is to modify your parser to create an Abstract Syntax Tree (AST) in memory, output the AST as a Graphviz DOT file, add type checking, and annotate the AST nodes that represent (sub-)expressions with their types.

An Abstract Syntax Tree can be thought of as a simplified parse tree (see pages 27, 28, and 69 of Scott). Compared to a parse tree, an AST retains only the nodes that reflect the structure and semantics of the program and drops those that are merely technical details of the concrete syntax. For example, the use of semicolons to separate or terminate statements is not meaningful to later stages of the compiler, so the semicolons can be dropped; what is meaningful is that the different statements are represented as different subtrees, so that structure should be retained. Similarly, it is not important that an expression is made up of <simpleExpression>s, <term>s, <factor>s, etc.; what is important is which operations are applied to which operands. For example, adding 7 to the product of 3 and 4 could be represented as a tree whose root node is "+", whose right child is 7, and whose left child is a subtree with root "*", left child 3, and right child 4. In this way, the tree describes the order in which the operations can be done (from the leaves to the root) and how the result of one operation is used as an input to another. Part of the assignment is to figure out what should be kept in the AST and what should be discarded as unnecessary syntactic detail.
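
As a rough illustration of the example above, the following Python sketch builds the tree for adding 7 to the product of 3 and 4 and evaluates it from the leaves to the root. The class names Num and BinOp and the evaluate function are invented for this sketch; they are not part of TL07 or of any required design, and your compiler may be written in a different language entirely.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Num:
    """Leaf node: an integer literal (hypothetical node class)."""
    value: int


@dataclass
class BinOp:
    """Interior node: an operator applied to two operand subtrees."""
    op: str
    left: "Node"
    right: "Node"


Node = Union[Num, BinOp]


def evaluate(node):
    """Evaluate bottom-up: children first, then the operator at this node."""
    if isinstance(node, Num):
        return node.value
    left = evaluate(node.left)
    right = evaluate(node.right)
    if node.op == "+":
        return left + right
    if node.op == "*":
        return left * right
    raise ValueError(f"unknown operator: {node.op}")


# Root "+", left child is the "*" subtree (3 and 4), right child is 7.
tree = BinOp("+", BinOp("*", Num(3), Num(4)), Num(7))
result = evaluate(tree)
```

Note that nothing about <term>s, <factor>s, or punctuation survives in this structure; only the operators and operands remain.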

You should represent your AST explicitly in memory as a tree data structure and provide a facility to walk the tree and write it to a Graphviz DOT file. (You may find the Visitor Design Pattern a good way to structure this code.)
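
One possible shape for such a tree walker, sketched in Python under the assumption of two hypothetical node classes (Num and BinOp), is shown below. Each node class has an accept method that dispatches to the matching visit method, and the DotWriter visitor collects DOT node and edge statements. The names are illustrative only, and label text is not escaped here, so labels containing double quotes would need extra handling.

```python
import itertools


class Num:
    def __init__(self, value):
        self.value = value

    def accept(self, visitor):
        return visitor.visit_num(self)


class BinOp:
    def __init__(self, op, left, right):
        self.op = op
        self.left = left
        self.right = right

    def accept(self, visitor):
        return visitor.visit_binop(self)


class DotWriter:
    """Visitor that turns an AST into the text of a DOT digraph."""

    def __init__(self):
        self.lines = []
        self._ids = itertools.count()

    def _new_node(self, label):
        # Give every AST node a unique DOT identifier: n0, n1, ...
        name = f"n{next(self._ids)}"
        self.lines.append(f'  {name} [label="{label}"];')
        return name

    def visit_num(self, node):
        return self._new_node(str(node.value))

    def visit_binop(self, node):
        name = self._new_node(node.op)
        for child in (node.left, node.right):
            child_name = child.accept(self)
            self.lines.append(f"  {name} -> {child_name};")
        return name

    def render(self, root):
        # Reset state so one writer can be reused for several trees.
        self.lines = []
        self._ids = itertools.count()
        root.accept(self)
        return "digraph ast {\n" + "\n".join(self.lines) + "\n}\n"


tree = BinOp("+", BinOp("*", Num(3), Num(4)), Num(7))
dot_text = DotWriter().render(tree)
```

Keeping the DOT logic in a visitor means that a later pass (such as a type checker) can be added as another visitor without changing the node classes.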

In addition, you should add a type checker that walks the AST and verifies that the types of expressions, subexpressions, etc., match the TL07 type rules. (It is OK to just go by the informal rules.) The type checker should also annotate the (sub-)expression nodes, etc., in the AST with their types, and you should modify the Graphviz AST output to show the type annotations in some way (perhaps by coloring nodes).
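
The following Python sketch shows one way a checker might annotate nodes bottom-up and map types to DOT colors. The rule table here is an assumption made for illustration (integer arithmetic yields "int", an integer comparison yields "bool"); it is not the TL07 rule set, which you should take from the language description. The node classes, the TypeCheckError exception, and the color choices are likewise hypothetical.

```python
class TypeCheckError(Exception):
    """Raised when operand types do not match the (assumed) rules."""


class Num:
    def __init__(self, value):
        self.value = value
        self.type = None  # filled in by check()


class BoolLit:
    def __init__(self, value):
        self.value = value
        self.type = None


class BinOp:
    def __init__(self, op, left, right):
        self.op = op
        self.left = left
        self.right = right
        self.type = None


# ASSUMED rules for illustration: op -> (left type, right type, result type).
RULES = {
    "+": ("int", "int", "int"),
    "*": ("int", "int", "int"),
    "<": ("int", "int", "bool"),
}


def check(node):
    """Annotate node (and its subtrees) with a type and return that type."""
    if isinstance(node, Num):
        node.type = "int"
    elif isinstance(node, BoolLit):
        node.type = "bool"
    elif isinstance(node, BinOp):
        left = check(node.left)
        right = check(node.right)
        want_left, want_right, result = RULES[node.op]
        if (left, right) != (want_left, want_right):
            raise TypeCheckError(
                f"'{node.op}' expects ({want_left}, {want_right}), "
                f"got ({left}, {right})"
            )
        node.type = result
    else:
        raise TypeError(f"unexpected node: {node!r}")
    return node.type


# One way to show types in the DOT output: a fill color per type.
TYPE_COLORS = {"int": "lightblue", "bool": "lightyellow"}


def dot_attrs(label, node_type):
    color = TYPE_COLORS[node_type]
    return f'[label="{label}", style=filled, fillcolor={color}]'


good = BinOp("<", BinOp("+", Num(1), Num(2)), Num(3))
good_type = check(good)
```

Because the annotation is stored on the node itself, the DOT writer only needs to read node.type when it emits each node.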

In order to type check the program, you will need to create a symbol table to associate each appearance of a variable with that variable's declared type. The best way to do this is probably to create a hash table indexed by name, with an entry containing information about each declared entity. (For TL07, it is sufficient to track just the name and type of each declared variable.) Then, when the AST generation code comes across a variable in a procedure body, it can look that name up in the symbol table and store a pointer/reference to that symbol table entry in the AST node.
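
A minimal Python sketch of this arrangement follows, using a dict as the hash table. The names Symbol, SymbolTable, VarRef, and make_var_ref are invented for the example. Treating a repeated declaration or an undeclared name as an error is an assumption of this sketch; check the TL07 rules for the required behavior.

```python
from dataclasses import dataclass


@dataclass
class Symbol:
    """Symbol table entry: just a name and a declared type."""
    name: str
    type: str


class SymbolTable:
    def __init__(self):
        self._entries = {}  # hash table indexed by variable name

    def declare(self, name, type_name):
        if name in self._entries:
            # Assumption: redeclaring a variable is reported as an error.
            raise KeyError(f"variable declared twice: {name}")
        entry = Symbol(name, type_name)
        self._entries[name] = entry
        return entry

    def lookup(self, name):
        try:
            return self._entries[name]
        except KeyError:
            raise NameError(f"undeclared variable: {name}") from None


@dataclass
class VarRef:
    """AST node for a variable use; holds a reference to its table entry."""
    symbol: Symbol


def make_var_ref(table, name):
    # Called by the AST builder when it meets a variable in a procedure body.
    return VarRef(table.lookup(name))


table = SymbolTable()
table.declare("n", "int")
first_use = make_var_ref(table, "n")
second_use = make_var_ref(table, "n")
```

Both uses of "n" point at the same entry, so the type checker can read the declared type directly from the node without another lookup.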

The Graphviz AST output should have a ".ast.dot" extension, and your program should continue to output the parse tree as well. So if the input file is "simple.tl07", the compiler should (by default) output the parse tree as "simple.pt1.dot" and the AST as "simple.ast.dot".
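
The naming convention can be sketched in Python as follows. The function name default_outputs is hypothetical, and writing the outputs next to the input file is an assumption of this sketch rather than something stated above.

```python
from pathlib import Path


def default_outputs(input_path):
    """Return (parse tree path, AST path) derived from the input file name."""
    path = Path(input_path)
    base = path.stem  # "simple.tl07" -> "simple"
    parse_tree = path.with_name(base + ".pt1.dot")
    ast = path.with_name(base + ".ast.dot")
    return parse_tree, ast


parse_tree_path, ast_path = default_outputs("simple.tl07")
```

For the input "simple.tl07" this yields the two names given above; an input such as "tests/simple.tl07" would produce outputs in the same "tests" directory.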

You are also required to adequately test your compiler and submit your test cases along with your source code. Testing is important, and some advocates of Test Driven Development go so far as to recommend that you do not write any code without first creating a test case that fails without that code. I may also provide some test cases.

Deliverables and Submission

Please see revised packaging and submission instructions.

Grading

Please see revised grading notes.

Errata/Clarifications

There may need to be corrections, clarifications, or other modifications to these instructions. You are responsible for monitoring the class web site, monitoring your CS account mailboxes, and listening during lecture for announcements related to this assignment.