CS 5363: The TL05 Language

Project Overview

In this class, you will write an optimizing compiler for a toy programming language, which we will call TL05, targeting the SPIM MIPS simulator. This project is tentatively divided into six parts. The first three involve work on the front end (scanning and parsing, type checking, IR design and generation), and the last three involve work on optimization and code generation.

The TL05 programming language is based on the P2K language used by Michael Franz in his Advanced Compiler Construction class at the University of California, Irvine. The P2K language was a simplified subset of Pascal, and TL05 has been simplified even further, so that you can experience writing an optimizing compiler without being burdened by all of the complexities and details of a complete, standard programming language.

Language Features

Lexical Features

The TL05 language is lexically simple. All lexical items are separated by whitespace (space, tab, or return). All identifiers start with a lowercase letter and may contain only digits and lowercase letters. All keywords are made up of capital letters. In addition, the symbols "[", "]", "(", ")", ":=", and ";" are used.
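Because every lexical item is delimited by whitespace, a first-cut scanner can split the input and classify each piece. The following Python sketch illustrates this; the function names and the token class names are illustrative, and treating numbers as plain runs of digits is an assumption, since the precise lexical definitions are not reproduced in this section.

```python
import re

# Symbols named in the Lexical Features paragraph.
SYMBOLS = {"[", "]", "(", ")", ":=", ";"}

def classify(lexeme):
    """Return a coarse token class for one whitespace-delimited lexeme."""
    if lexeme in SYMBOLS:
        return "symbol"
    if re.fullmatch(r"[a-z][a-z0-9]*", lexeme):
        return "ident"      # lowercase letter, then lowercase letters or digits
    if re.fullmatch(r"[A-Z]+", lexeme):
        return "keyword"    # keywords consist of capital letters only
    if re.fullmatch(r"[0-9]+", lexeme):
        return "num"        # assumption: numbers are plain runs of digits
    raise ValueError(f"illegal lexeme: {lexeme!r}")

def scan(source):
    # str.split() with no argument splits on any run of whitespace.
    return [(classify(lexeme), lexeme) for lexeme in source.split()]
```

For example, scan("x := 3 ;") yields an ident, a symbol, a num, and a symbol, in that order. A real scanner would go on to map each keyword and symbol to the specific token named in the grammar.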

Data Types

TL05 supports 32-bit integers ("INT"), booleans ("BOOL"), arrays of these, and arrays of arrays of these, etc. The syntax for an array type is "ARRAY num OF <type>", where num is the size of the array and <type> is the type of the elements of that array. Variables are always declared to be of a particular type. There is also a formal set of type rules.
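One possible in-compiler representation of these types is a small recursive data structure, sketched below in Python. The class and function names are hypothetical and are not part of the project specification.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class IntType:
    pass

@dataclass(frozen=True)
class BoolType:
    pass

@dataclass(frozen=True)
class ArrayType:
    size: int          # the "num" in "ARRAY num OF ..."
    element: "Type"    # element type, which may itself be an array

Type = Union[IntType, BoolType, ArrayType]

def show(t):
    """Render a type using TL05's own syntax."""
    if isinstance(t, IntType):
        return "INT"
    if isinstance(t, BoolType):
        return "BOOL"
    return f"ARRAY {t.size} OF {show(t.element)}"
```

Frozen dataclasses compare structurally, so two independently built ArrayType(10, IntType()) values are equal, which is convenient when checking that two types match.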

Operators

TL05 has several infix binary operators that work on integer operands. Multiplication "MUL", division "DIV", modulus "MOD", addition "PLUS", and subtraction "MINUS" produce integer results. The comparison operators (i.e., equals "EQ", not equal "NE", less than "LT", less-than or equal-to "LTE", greater than "GT", and greater-than or equal-to "GTE") all produce boolean results.
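The operator rules above can be captured in a small lookup, as in the following Python sketch. The function name is hypothetical, and requiring both operands to be INT for every operator (including EQ and NE) is an assumption drawn from this paragraph; the formal type rules take precedence.

```python
ARITHMETIC = {"MUL", "DIV", "MOD", "PLUS", "MINUS"}
COMPARISON = {"EQ", "NE", "LT", "LTE", "GT", "GTE"}

def result_type(op, left, right):
    """Return the result type of `left op right`, with types given as "INT" or "BOOL"."""
    if op not in ARITHMETIC and op not in COMPARISON:
        raise ValueError(f"unknown operator: {op}")
    # Assumption: every binary operator takes two integer operands.
    if left != "INT" or right != "INT":
        raise TypeError(f"{op} expects INT operands, got {left} and {right}")
    return "INT" if op in ARITHMETIC else "BOOL"
```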

Control Structures

TL05 is a structured programming language. The only control structures supported are IF and WHILE statements. Both take a boolean expression that guards the body of the control structure. In the case of an IF statement, the statements after the THEN are executed if the expression is true, and the statements after the ELSE (if there is one) are executed if the expression is false. In the case of the WHILE statement, the loop is exited if the expression is false; otherwise, the body is executed and the expression is then re-evaluated.
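The behaviour of IF and WHILE described above can be illustrated with a tiny statement interpreter. This Python sketch assumes a hypothetical tuple encoding of statements and represents expressions as Python callables over the variable environment; neither is prescribed by the project.

```python
def execute(statements, env):
    """Run a list of statements against the variable environment `env`."""
    for stmt in statements:
        kind = stmt[0]
        if kind == "assign":
            _, name, expr = stmt
            env[name] = expr(env)
        elif kind == "if":
            # A missing ELSE clause is encoded as an empty statement list.
            _, cond, then_body, else_body = stmt
            execute(then_body if cond(env) else else_body, env)
        elif kind == "while":
            _, cond, body = stmt
            # The guard is re-evaluated after each pass through the body.
            while cond(env):
                execute(body, env)
        else:
            raise ValueError(f"unknown statement kind: {kind}")
    return env
```

Running a WHILE loop that adds i to total and increments i while i is at most 5, starting from i = 1 and total = 0, leaves total at 15 and i at 6.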

Assignment

Assignments are a kind of statement rather than a kind of operator. The ":=" symbol separates the left-hand side (the variable or array element being assigned to) from the right-hand side, which is an expression that must be of the same type as the left-hand side.
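A type checker might enforce this rule roughly as follows. In this Python sketch, types are encoded as the strings "INT" and "BOOL" or as tuples of the form ("ARRAY", size, element); the encoding, the symbol-table shape, and the function names are all hypothetical.

```python
def mem_cell_type(symbols, name, indexed):
    """Type of the left-hand side: a plain variable or one array element."""
    declared = symbols[name]
    if not indexed:
        return declared
    if not (isinstance(declared, tuple) and declared[0] == "ARRAY"):
        raise TypeError(f"{name} is not an array")
    return declared[2]  # indexing yields the element type

def check_assignment(symbols, name, indexed, rhs_type):
    """Raise TypeError unless both sides of the assignment have the same type."""
    lhs_type = mem_cell_type(symbols, name, indexed)
    if lhs_type != rhs_type:
        raise TypeError(f"cannot assign {rhs_type} to {name} of type {lhs_type}")
    return lhs_type
```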

Built-in Procedures

TL05 does not support user-defined functions or procedures, but it does support two built-in procedures, WRITEINT and WRITELN, which output an integer or a newline to the console (respectively), and one built-in function, READINT, which reads an integer from the console. The syntax for these is hard-coded into TL05's BNF grammar.
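For intuition, the three built-ins correspond roughly to the Python helpers below. This is only a sketch: the helper names are hypothetical, and it assumes that WRITEINT prints the number with no trailing newline and that READINT consumes one line of input containing a single integer, neither of which is stated in this section.

```python
import io
import sys

def write_int(value, out=sys.stdout):
    # Assumption: WRITEINT prints only the digits; WRITELN ends the line.
    out.write(str(value))

def write_ln(out=sys.stdout):
    out.write("\n")

def read_int(inp=sys.stdin):
    # Assumption: one integer per input line.
    return int(inp.readline())
```

Passing io.StringIO objects for out and inp makes it possible to exercise the helpers without a console.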

Lexical Elements

Note: If the definition of a lexical element is in quotes, then it is meant to match the contained string exactly. Otherwise, it is a regular expression. Square brackets in regular expressions are used as an abbreviation for matching ranges of characters. For example, [0-9] matches any digit, and [a-zA-Z] matches any English letter, uppercase or lowercase.
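The convention in this note can be mirrored directly in Python, whose re module uses the same bracket notation for character ranges. The helper name below is hypothetical.

```python
import re

def matches(definition, lexeme):
    """Apply a lexical definition: quoted means exact string, otherwise a regex."""
    if len(definition) >= 2 and definition.startswith('"') and definition.endswith('"'):
        return lexeme == definition[1:-1]
    # fullmatch requires the whole lexeme to match, not just a prefix.
    return re.fullmatch(definition, lexeme) is not None
```

With this helper, the quoted definition for ":=" matches only the string ":=", [0-9] matches "7", and [a-zA-Z] matches "q" but not "5".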

Numbers, Literals, and Identifiers:

Symbols:

Operators:

Keywords:

Built-in Procedures:

BNF Grammar

<program> ::= PROGRAM ident <declarations> BEGIN <statementSequence> END

<declarations> ::= VAR ident AS <type> SC <declarations>
               | ε

<type> ::= ARRAY num OF INT
       | ARRAY num OF BOOL
       | INT
       | BOOL

<statementSequence> ::= <statement> SC <statementSequence>
                    | ε

<statement> ::= <assignment>
            | <ifStatement>
            | <whileStatement>
            | <writeInt>
            | <writeLn>

<assignment> ::= <memCell> ASGN <expression>
             | <memCell> ASGN READINT

<memCell> ::= ident
          | ident LB <expression> RB

<ifStatement> ::= IF <expression> THEN <statementSequence> <elseClause> END

<elseClause> ::= ELSE <statementSequence>
             | ε

<whileStatement> ::= WHILE <expression> DO <statementSequence> END

<writeInt> ::= WRITEINT <expression>

<writeLn> ::= WRITELN

<expression> ::= <simpleExpression>
             | <simpleExpression> OP4 <simpleExpression>

<simpleExpression> ::= <term> OP3 <term>
                   | <term>

<term> ::= <factor> OP2 <factor>
       | <factor>

<factor> ::= <memCell> 
         | lit 
         | LP <expression> RP
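The expression portion of this grammar maps naturally onto a recursive-descent parser, with one method per nonterminal. The Python sketch below is one possible shape, not a required design. It assumes the scanner has already produced (kind, text) pairs whose kinds are the grammar's terminal names (ident, lit, LP, RP, LB, RB, OP2, OP3, OP4); which operator belongs to which OP class is not shown in this section, so the classification is left to the caller. The class name and the tuple-based tree format are hypothetical.

```python
class Parser:
    def __init__(self, tokens):
        self.tokens = tokens  # list of (kind, text) pairs
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else None

    def expect(self, kind):
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind}, found {self.peek()}")
        text = self.tokens[self.pos][1]
        self.pos += 1
        return text

    def binary(self, operand, op_kind):
        # Each level allows at most one operator, exactly as the grammar is written.
        left = operand()
        if self.peek() == op_kind:
            op = self.expect(op_kind)
            return (op, left, operand())
        return left

    def expression(self):
        return self.binary(self.simple_expression, "OP4")

    def simple_expression(self):
        return self.binary(self.term, "OP3")

    def term(self):
        return self.binary(self.factor, "OP2")

    def factor(self):
        if self.peek() == "lit":
            return ("lit", self.expect("lit"))
        if self.peek() == "LP":
            self.expect("LP")
            inner = self.expression()
            self.expect("RP")
            return inner
        return self.mem_cell()

    def mem_cell(self):
        name = self.expect("ident")
        if self.peek() == "LB":
            self.expect("LB")
            index = self.expression()
            self.expect("RB")
            return ("index", name, index)
        return ("var", name)

def parse_expression(tokens):
    parser = Parser(tokens)
    tree = parser.expression()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected {parser.peek()} after expression")
    return tree
```

Note that, as written, the grammar permits only one operator per precedence level, so an input such as a PLUS b PLUS c is rejected by this sketch unless parentheses are added.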

Informal Semantics

Errata/Clarifications