Submission Deadline: 5pm, September 15, 2005
In this class, you will write an optimizing compiler for a toy programming language, which we will call TL05, targeting the SPIM MIPS simulator. This project is tentatively divided into six parts. The first three involve work on the front end (scanning and parsing, type checking, and IR design and generation), and the last three involve work on optimization and code generation.
The TL05 programming language is based on the P2K language used by Michael Franz in his Advanced Compiler Construction class at the University of California, Irvine. The P2K language was a simplified subset of Pascal, and TL05 has been simplified even further, so that you can experience writing an optimizing compiler without being burdened by all of the complexities and details of a complete, standard programming language.
You may choose C, C++, or Java to implement your compiler, but it must compile and run on the machines in the Computer Science Department's Linux lab. (If you choose C or C++, you will also need to create a makefile that will build an executable from your source files.) The entire compiler must be your own, independent work, and each assignment will build on the work of the previous assignments.
Your first assignment is to write a scanner and parser for the TL05 language. A BNF grammar and a list of lexical items for the TL05 language are provided. If you wish, you may use an automated tool, such as lex, to create the code for your scanner, but you may also find that TL05 is simple enough that it is just as easy to write your scanner by hand.
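To give a sense of how small a hand-written scanner can be, here is a minimal sketch in Java. The token kinds (`IDENT`, `NUM`, `ASSIGN`, `PLUS`, `SEMI`) and the class names `Tok` and `SketchScanner` are assumptions made for illustration only; the actual TL05 lexical items are the ones in the provided list, and a real scanner would also need to handle keywords and track line numbers for error messages.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical token kinds for illustration; NOT the real TL05 lexical items.
enum TokKind { IDENT, NUM, ASSIGN, PLUS, SEMI, EOF }

final class Tok {
    final TokKind kind;
    final String text;
    Tok(TokKind kind, String text) { this.kind = kind; this.text = text; }
    @Override public String toString() { return kind + "(" + text + ")"; }
}

final class SketchScanner {
    // Converts the whole input into a token list, ending with an EOF token.
    // Throws IllegalArgumentException on a character it does not recognize.
    static List<Tok> scan(String src) {
        List<Tok> out = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;                                   // skip blanks and newlines
            } else if (Character.isLetter(c)) {
                int start = i;                         // identifier: letter (letter | digit)*
                while (i < src.length() && Character.isLetterOrDigit(src.charAt(i))) i++;
                out.add(new Tok(TokKind.IDENT, src.substring(start, i)));
            } else if (Character.isDigit(c)) {
                int start = i;                         // number: digit+
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                out.add(new Tok(TokKind.NUM, src.substring(start, i)));
            } else if (c == ':' && i + 1 < src.length() && src.charAt(i + 1) == '=') {
                out.add(new Tok(TokKind.ASSIGN, ":="));
                i += 2;
            } else if (c == '+') {
                out.add(new Tok(TokKind.PLUS, "+"));
                i++;
            } else if (c == ';') {
                out.add(new Tok(TokKind.SEMI, ";"));
                i++;
            } else {
                throw new IllegalArgumentException(
                    "lexical error at offset " + i + ": unexpected '" + c + "'");
            }
        }
        out.add(new Tok(TokKind.EOF, ""));
        return out;
    }
}
```

Calling `SketchScanner.scan("x := y1 + 42;")` yields six tokens followed by the `EOF` marker. Producing an explicit `EOF` token is a design choice that lets the parser check for end of input without a special case.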
The TL05 language is intended to be LL(1), and you are required to write, "by hand," a recursive-descent top-down parser for it. Your parser should have one method or function corresponding to each non-terminal in the language's grammar, but you may find that you need to modify the BNF grammar in order to write a backtrack-free parser (have a look at the algorithms in Figures 3.4 and 3.6 of your textbook for ideas about the kinds of modifications you might have to make).
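One common modification of this kind is removing left recursion. The sketch below uses a made-up two-rule expression grammar, not the TL05 grammar: the left-recursive rule `expr -> expr "+" term | term` is rewritten as `expr -> term exprTail` and `exprTail -> "+" term exprTail | ε`, and each non-terminal then becomes one method that decides what to do by looking only at the next token. The class name `SketchRecognizer` and the whitespace-separated token format are assumptions that keep the example short; it only accepts or rejects input and builds no tree.

```java
// Recognizer for a HYPOTHETICAL grammar (not TL05):
//   expr     -> term exprTail
//   exprTail -> "+" term exprTail | epsilon
//   term     -> NUM | "(" expr ")"
final class SketchRecognizer {
    private final String[] toks;   // tokens, pre-split on whitespace for brevity
    private int pos = 0;

    SketchRecognizer(String input) {
        String trimmed = input.trim();
        this.toks = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // One token of lookahead; "<eof>" stands for end of input.
    private String peek() { return pos < toks.length ? toks[pos] : "<eof>"; }

    private void expect(String t) {
        if (!peek().equals(t)) {
            throw new IllegalStateException(
                "syntax error: expected " + t + " but found " + peek());
        }
        pos++;
    }

    void expr() { term(); exprTail(); }

    void exprTail() {
        if (peek().equals("+")) {      // lookahead selects the "+" alternative
            pos++;
            term();
            exprTail();
        }
        // any other lookahead: take the epsilon alternative and consume nothing
    }

    void term() {
        if (peek().equals("(")) {
            pos++;
            expr();
            expect(")");
        } else if (peek().matches("[0-9]+")) {
            pos++;
        } else {
            throw new IllegalStateException(
                "syntax error: expected a number or ( but found " + peek());
        }
    }

    // True if the whole input is a valid expr, with no tokens left over.
    static boolean accepts(String input) {
        SketchRecognizer p = new SketchRecognizer(input);
        try {
            p.expr();
            return p.pos == p.toks.length;
        } catch (IllegalStateException e) {
            return false;
        }
    }
}
```

Because every choice is made from a single token of lookahead, no method ever has to undo work and try another alternative, which is what backtrack-free means here.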
For a lexically or syntactically invalid program, your scanner or parser should output an appropriate error message. For a syntactically valid input program, the output of your parser should be an abstract syntax tree (AST) for the input program. (An AST is a parse tree that may have been simplified in order to better reflect the semantic structure of the program rather than the BNF grammar.) The individual functions making up the recursive descent parser should return a data structure containing the portion of the AST/parse-tree corresponding to the derivation from that non-terminal. You should also implement a facility for outputting the AST data structure as a Graphviz DOT file, so that it can be rendered and viewed graphically.
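As an illustration of the DOT output facility, the following sketch walks a tree and emits one DOT node statement per AST node and one edge statement per parent-child link. The `AstNode` and `DotWriter` classes and the single `label` field are assumptions; your own AST classes will likely carry more information (node kinds, source positions, and so on).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical generic AST node: a label plus an ordered list of children.
final class AstNode {
    final String label;
    final List<AstNode> children = new ArrayList<>();
    AstNode(String label, AstNode... kids) {
        this.label = label;
        for (AstNode k : kids) children.add(k);
    }
}

final class DotWriter {
    private final StringBuilder sb = new StringBuilder();
    private int nextId = 0;   // each node gets a unique DOT identifier n0, n1, ...

    // Returns the text of a DOT digraph describing the tree rooted at root.
    static String toDot(AstNode root) {
        DotWriter w = new DotWriter();
        w.sb.append("digraph AST {\n");
        w.emit(root);
        w.sb.append("}\n");
        return w.sb.toString();
    }

    // Emits the node, then each child subtree followed by the edge to it.
    private int emit(AstNode n) {
        int id = nextId++;
        sb.append("  n").append(id)
          .append(" [label=\"").append(escape(n.label)).append("\"];\n");
        for (AstNode c : n.children) {
            int childId = emit(c);
            sb.append("  n").append(id).append(" -> n").append(childId).append(";\n");
        }
        return id;
    }

    // Backslashes and double quotes must be escaped inside a quoted DOT label.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

If the returned string is saved to a file such as `ast.dot`, it can be rendered with Graphviz, for example with `dot -Tpng ast.dot -o ast.png`. Numbering the nodes rather than using their labels as identifiers keeps two nodes with the same label (say, two uses of `x`) distinct in the picture.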
You are also required to adequately test your compiler and submit your test cases along with your source code. Agile programming techniques recommend that you do not write any code without first creating a test case that fails without that code. I may also provide some test cases.
When you have completed your assignment, you should create a tar archive or zip file containing your source files (with the correct package-based directory structure if you're using Java), the makefile (if your compiler is in C or C++), and your test cases. These should be attached to an email to vonronne@cs.utsa.edu. The subject of this email should contain the text "CS 5363 Submission #1", and the body should contain a "write up". This "write up" must contain:
As a matter of policy, late work will not be accepted. Partially completed assignments will, however, be evaluated for partial credit. Projects will be evaluated using a rubric similar to the following:
In addition, a penalty of up to 50% may be deducted if the program doesn't compile and run on the machines in the department's Linux lab.
There may need to be corrections, clarifications, or other modifications to these instructions. You are responsible for monitoring the class web site, monitoring your CS account mailboxes, and listening during lecture for announcements related to this assignment.
9/12/2005: Your parser implementation should be backtrack-free, but your final AST should not reflect any left-factoring you had to do to create an LL(1) grammar.
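One way to read this requirement is sketched below, using a made-up conditional statement rather than the real TL05 grammar. Suppose the original rule is `stmt -> "if" NAME "then" stmt "end" | "if" NAME "then" stmt "else" stmt "end" | NAME`, which is left-factored into `stmt -> "if" NAME "then" stmt ifTail | NAME` and `ifTail -> "end" | "else" stmt "end"`. The helper method for `ifTail` exists in the parser, but it only hands back the optional else branch, so the resulting tree has a single `if` node with two or three children and no `ifTail` node. All names here (`Node`, `IfSketchParser`, `ifTail`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical AST node, printed as an s-expression for easy inspection.
final class Node {
    final String label;
    final List<Node> kids = new ArrayList<>();
    Node(String label, Node... kids) {
        this.label = label;
        this.kids.addAll(Arrays.asList(kids));
    }
    @Override public String toString() {
        if (kids.isEmpty()) return label;
        StringBuilder sb = new StringBuilder("(" + label);
        for (Node k : kids) sb.append(' ').append(k);
        return sb.append(')').toString();
    }
}

final class IfSketchParser {
    private static final List<String> KEYWORDS = Arrays.asList("if", "then", "else", "end");
    private final String[] toks;   // tokens, pre-split on whitespace for brevity
    private int pos = 0;

    IfSketchParser(String src) { this.toks = src.trim().split("\\s+"); }

    private String peek() { return pos < toks.length ? toks[pos] : "<eof>"; }

    private void expect(String t) {
        if (!peek().equals(t)) {
            throw new IllegalStateException("expected " + t + " but found " + peek());
        }
        pos++;
    }

    // stmt -> "if" NAME "then" stmt ifTail | NAME
    Node stmt() {
        if (peek().equals("if")) {
            pos++;
            Node cond = name();
            expect("then");
            Node thenPart = stmt();
            Node elsePart = ifTail();          // null when there is no else branch
            return elsePart == null
                ? new Node("if", cond, thenPart)
                : new Node("if", cond, thenPart, elsePart);
        }
        return name();
    }

    // ifTail -> "end" | "else" stmt "end"
    // Exists only because of left-factoring; it creates no AST node of its own.
    private Node ifTail() {
        if (peek().equals("else")) {
            pos++;
            Node elsePart = stmt();
            expect("end");
            return elsePart;
        }
        expect("end");
        return null;
    }

    private Node name() {
        String t = peek();
        if (KEYWORDS.contains(t) || !t.matches("[a-z][a-z0-9]*")) {
            throw new IllegalStateException("expected a name but found " + t);
        }
        pos++;
        return new Node(t);
    }
}
```

For example, `new IfSketchParser("if c then a else b end").stmt()` produces a tree that prints as `(if c a b)`, the same shape the un-factored grammar would suggest.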