This parser uses one recursive function for each grammar rule (that is, for each non-terminal symbol in the language). For simplicity one can just use the non-terminal as the name of the function. The body of each function mirrors the right side of the corresponding rule, and the next input symbol determines which function to call.
Perhaps the hardest part of a recursive descent parser is the scanning: repeatedly fetching the next token from the scanner. It is tricky to decide when to scan, and the parser doesn't work at all if there is an extra scan or a missing scan. In the examples here and in the next recitation, all tokens are just single characters, so the scanner is simple, but it is still hard to do the scanning at the right places in the program.
Initial Example:
P ---> E
E ---> E + T | E - T | T
T ---> T * S | T / S | S
S ---> F ^ S | F
F ---> ( E ) | char
One simple way to eliminate left recursion is to introduce new notation: curly brackets, where {xx} means "zero or more repetitions of xx", and parentheses () for grouping, along with the or-symbol |. Because of the many metasymbols, it is a good idea to enclose all terminals in single quotes. Also put a '#' at the end to mark the end of the input. The resulting grammar looks as follows:
P ---> E '#'
E ---> T {('+'|'-') T}
T ---> S {('*'|'/') S}
S ---> F '^' S | F
F ---> '(' E ')' | char
Now the grammar is suitable for creation of a recursive descent parser.
Notice that this is a different grammar that describes the same language, that is, the same sentences or strings of terminal symbols. A given sentence will have a parse tree similar to the one given by the previous grammar, but not necessarily the same parse tree.
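As a rough illustration of how the bracketed grammar maps onto code, here is a minimal Python sketch of a recognizer for it. The names `Parser`, `ParseError`, `scan`, `expect`, and `is_legal` are made up for this sketch. It assumes that `char` means a single letter or digit and that blanks are ignored. Each `{ ... }` becomes a `while` loop, each alternative becomes an `if`, and every function returns with `self.tok` holding the first token it did not consume.

```python
class ParseError(Exception):
    """Raised when the input is not a sentence of the grammar."""


class Parser:
    def __init__(self, text):
        # Assumption: blanks are ignored and every token is one character.
        self.chars = [c for c in text if not c.isspace()]
        self.pos = 0
        self.tok = ''
        self.scan()  # prime the first token before any rule function runs

    def scan(self):
        # Fetch the next token; '' means the input ran out.
        if self.pos < len(self.chars):
            self.tok = self.chars[self.pos]
            self.pos += 1
        else:
            self.tok = ''

    def expect(self, ch):
        # Check for a required terminal, then scan past it.
        if self.tok != ch:
            raise ParseError(f"expected {ch!r}, found {self.tok!r}")
        self.scan()

    def P(self):  # P ---> E '#'
        self.E()
        self.expect('#')

    def E(self):  # E ---> T {('+'|'-') T}
        self.T()
        while self.tok in ('+', '-'):
            self.scan()
            self.T()

    def T(self):  # T ---> S {('*'|'/') S}
        self.S()
        while self.tok in ('*', '/'):
            self.scan()
            self.S()

    def S(self):  # S ---> F '^' S | F
        self.F()
        if self.tok == '^':
            self.scan()
            self.S()

    def F(self):  # F ---> '(' E ')' | char
        if self.tok == '(':
            self.scan()
            self.E()
            self.expect(')')
        elif self.tok.isalnum():  # assumption: char is a letter or digit
            self.scan()
        else:
            raise ParseError(f"unexpected token {self.tok!r}")


def is_legal(text):
    """Return True if text is a sentence of the grammar, False otherwise."""
    try:
        Parser(text).P()
        return True
    except ParseError:
        return False


print(is_legal("a+b*(c-d)^e#"))  # a legal sentence
print(is_legal("a+*b#"))         # illegal: '*' cannot start a T
```

Note the scanning discipline: a rule function never scans on entry, and it scans exactly once for each terminal it accepts. Keeping to one such convention throughout is one way to avoid the extra or missing scans mentioned earlier.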
The Grammar: Consider the following expanded version of this language:
M ---> { ( S | D ) } '#'
S ---> I | W | A | P | C | G
D ---> '(' id '(' [ id { ',' id } ] ')' { S } ')'
I ---> '[' E '?' { S } ':' { S } ']' | '[' E '?' { S } ']'
W ---> '{' E '?' { S } '}'
A ---> id '=' E ';'
P ---> '<' E ';'
G ---> '>' id ';'
C ---> '<' ( 'B' | 'T' | 'N' ) ';'
E ---> Q [ ('&' | '|') Q ]
Q ---> R [ ('<' | '>' | '<=' | '>=' | '==' | '!=' ) R ]
R ---> T { ('+' | '-') T }
T ---> U { ('*' | '/' | '%') U }
U ---> F '^' U | F
F ---> ['+' | '-' | '!'] ('(' E ')' | id | num |
id '(' [ E { ',' E } ] ')' )
id ---> letter { letter | digit }
num ---> digit { digit }
For a more colorful grammar that might be easier to understand, see here.
The terminal "letter" stands for a single lower-case letter, and "digit" stands for a single digit.
Just to help with understanding, here is the intuitive meaning of each of the above non-terminals:
| Symbol | Meaning |
|---|---|
| M | Main Program |
| S | Statement |
| D | Function Definition |
| I | If-Then-[Else] Statement |
| W | While Statement |
| A | Assignment Statement |
| P | Put or Print (integer) |
| C | Print Character |
| G | Get (integer) |
| E | Expression (logical or arith) |
| Q | Relational Expression (without \| or &) |
| R | Arithmetic Expression (without relational ops) |
| T | Term (without + or -) |
| U | Ugly Term (without * or / or % either) |
| F | Factor (parenthesized, with unary + or -) |
Note that the brackets play two roles. Quoted, as '[' ']' and '{' '}', they are terminals of the language: [ ] encloses an if-then-else and { } encloses a while. Unquoted, they are BNF metasymbols: [ ] marks an optional item and { } marks zero or more repetitions.
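In this grammar, almost every kind of statement begins with a different token, so the choice of which function to call can usually be made from the next token alone. The following Python sketch shows that decision. The function name `rule_for` is hypothetical, and the sketch assumes single-character tokens with lower-case letters as identifiers.

```python
def rule_for(tok):
    """Return the non-terminal to try for a statement or definition starting with tok."""
    if tok == '(':
        return 'D'       # function definition
    if tok == '[':
        return 'I'       # if-then-[else]
    if tok == '{':
        return 'W'       # while
    if tok == '>':
        return 'G'       # get
    if tok == '<':
        return 'P or C'  # both begin with '<'; the token after it decides
    if len(tok) == 1 and 'a' <= tok <= 'z':
        return 'A'       # assignment begins with an identifier
    return None          # not the start of any statement (for example '#')


for t in ['[', '{', 'x', '<', '#']:
    print(repr(t), '->', rule_for(t))
```

The one case that cannot be settled by the first token is '<', which begins both P and C.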
You are to write a recursive descent parser for this language. To keep things simple, you can initially make all the tokens single characters, so that the scanner (lexical analyser) can be very simple, as shown in the earlier examples. Later you can add identifiers that are more than one letter, and integer constants more than one digit.
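When the time comes to move past single-character tokens, a scanner along the following lines may be enough. This is only a sketch in Python: the function name `tokenize` and the token kinds `'id'`, `'num'`, and `'sym'` are invented here, and it assumes that the two-character operators `<=`, `>=`, `==`, and `!=` are always written without a blank in between.

```python
def is_letter(c):
    return 'a' <= c <= 'z'   # the grammar's "letter" is a lower-case letter


def is_digit(c):
    return '0' <= c <= '9'


def tokenize(text):
    """Split source text into (kind, lexeme) pairs."""
    tokens = []
    i, n = 0, len(text)
    while i < n:
        c = text[i]
        if c.isspace():
            i += 1                              # blanks separate tokens
        elif is_letter(c):                      # id ---> letter { letter | digit }
            j = i + 1
            while j < n and (is_letter(text[j]) or is_digit(text[j])):
                j += 1
            tokens.append(('id', text[i:j]))
            i = j
        elif is_digit(c):                       # num ---> digit { digit }
            j = i + 1
            while j < n and is_digit(text[j]):
                j += 1
            tokens.append(('num', text[i:j]))
            i = j
        elif text[i:i + 2] in ('<=', '>=', '==', '!='):
            tokens.append(('sym', text[i:i + 2]))   # try two characters first
            i += 2
        else:
            tokens.append(('sym', c))           # everything else, including B, T, N
            i += 1
    return tokens


print(tokenize("count1 = count1 + 10;"))
print(tokenize("[ a <= b ? < 1; ]"))
```

With a token list like this, the parser's scan step just advances an index, and the rule functions compare token kinds instead of raw characters.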
A true parser reads a sentence and reports whether or not it is legal. In doing so, however, its function calls and returns carry out a complete traversal of the parse tree (as will be illustrated in class). The examples above have extra temporary output illustrating the calls and returns, so you can have confidence that your parser is working correctly. You should have temporary output also, though it does not have to be the same as in the examples above.
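One possible way to produce such temporary output is to wrap each rule function so that it reports when it is entered and when it returns. The Python sketch below does this for a small fragment of the grammar only (R and T, with U cut down to a single letter or digit). The names `traced` and `TracingParser` and the log format are inventions of this sketch, not a required format.

```python
import functools


def traced(rule):
    """Wrap a rule function so that each call and return is logged with indentation."""
    @functools.wraps(rule)
    def wrapper(self):
        indent = '  ' * self.depth
        self.log.append(f"{indent}enter {rule.__name__}, next = {self.tok!r}")
        self.depth += 1
        rule(self)
        self.depth -= 1
        self.log.append(f"{indent}exit {rule.__name__}, next = {self.tok!r}")
    return wrapper


class TracingParser:
    def __init__(self, text):
        self.chars = [c for c in text if not c.isspace()]
        self.pos = 0
        self.depth = 0   # current nesting depth of rule calls
        self.log = []    # collected trace lines

    @property
    def tok(self):
        # The next unconsumed token, or '' when the input has run out.
        return self.chars[self.pos] if self.pos < len(self.chars) else ''

    def scan(self):
        self.pos += 1

    @traced
    def R(self):  # R ---> T { ('+' | '-') T }
        self.T()
        while self.tok in ('+', '-'):
            self.scan()
            self.T()

    @traced
    def T(self):  # T ---> U { ('*' | '/' | '%') U }
        self.U()
        while self.tok in ('*', '/', '%'):
            self.scan()
            self.U()

    @traced
    def U(self):  # simplified stand-in for U: a single letter or digit
        if not self.tok.isalnum():
            raise ValueError(f"unexpected token {self.tok!r}")
        self.scan()


p = TracingParser("a+b#")
p.R()
print("\n".join(p.log))
```

The indentation of the log follows the depth of the call stack, so the printed trace has the shape of the parse tree being traversed.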
What to turn in: Turn in a working version of your parser. You should start with very simple input as you are developing your parser -- perhaps just a single assignment statement at first, and then other single statements.
f = 0; g = 1; n = 0;
{ n - 8*5 ?
h = f + g;
< n; < B; < f; < B;
[ f%2 ? < 1; : < 2; ]
< N;
n = n + 1;
f = g; g = h;
}
#
To understand what this program does, you need to know the semantics of the P (Put an integer using <) and C (Put a character using <) constructs. When followed by an expression, < prints the value of that expression as an integer, without any extra blanks or newlines. When followed by an upper-case letter, < instead prints a special character chosen by that letter: B for a blank, T for a tab, N for a newline, etc.
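Because P and C both begin with <, the parser has to look at the token after < before it knows which rule applies. A hedged Python sketch of that step follows. The names `parse_put`, `skip_expression`, and `SPECIAL` are hypothetical, tokens are assumed to be a list of strings, and `skip_expression` is only a placeholder for the real E function (it accepts anything up to the next ';').

```python
SPECIAL = {'B': ' ', 'T': '\t', 'N': '\n'}  # from the text: blank, tab, newline


def skip_expression(tokens, pos):
    """Placeholder for the real E function: skip ahead to the next ';'."""
    while pos < len(tokens) and tokens[pos] != ';':
        pos += 1
    return pos


def parse_put(tokens, pos, parse_expression=skip_expression):
    """Parse one statement starting with '<'; return (rule, position after ';')."""
    if pos >= len(tokens) or tokens[pos] != '<':
        raise ValueError("expected '<'")
    pos += 1                                    # scan past '<'
    if pos < len(tokens) and tokens[pos] in SPECIAL:
        rule = 'C'                              # upper-case letter: print a character
        pos += 1
    else:
        rule = 'P'                              # otherwise an expression must follow
        pos = parse_expression(tokens, pos)
    if pos >= len(tokens) or tokens[pos] != ';':
        raise ValueError("expected ';'")
    return rule, pos + 1                        # scan past ';'


print(parse_put(['<', 'B', ';'], 0))
print(parse_put(['<', 'f', '%', '2', ';'], 0))
```

This works because identifiers are built from lower-case letters, so an upper-case B, T, or N can never be the start of an expression.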
The above program prints the integers from 0 to 39, followed on the same line by the corresponding Fibonacci number, followed by a 1 if the Fibonacci number is odd, and a 2 if it is even.
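For checking purposes, the same computation can be written out in Python. This transliteration assumes the semantics implied by the description above: a while loop `{ E ? ... }` repeats as long as E is non-zero, and an if `[ E ? ... : ... ]` takes its first branch when E is non-zero. The function name `fib_table` is made up for this sketch.

```python
def fib_table():
    """Return the lines the program above would print, under the stated assumptions."""
    lines = []
    f, g, n = 0, 1, 0                # f = 0; g = 1; n = 0;
    while n - 8 * 5 != 0:            # { n - 8*5 ?
        h = f + g                    #   h = f + g;
        line = f"{n} {f} "           #   < n; < B; < f; < B;
        if f % 2 != 0:               #   [ f%2 ? < 1; : < 2; ]
            line += "1"
        else:
            line += "2"
        lines.append(line)           #   < N;  (ends the output line)
        n = n + 1                    #   n = n + 1;
        f, g = g, h                  #   f = g; g = h;
    return lines                     # }


for line in fib_table():
    print(line)
```

Comparing this against the traversal order of your parser can help confirm that statements are being recognized in the intended nesting.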
f = 1; g = 2; n = 3; > m;
{ m - n ?
< n; < T; < g; < T;
j = g; d = 2; t = 1;
{ t ?
[ j%d ? e = 0; : e = 1; ]
{ e ?
j = j/d; < d; < B;
[ j%d ? e = 0; : e = 1; ]
}
[ j - 1 ? t = 1; : t = 0; ]
[ d - 2 ? d = d + 2; : d = 3; ]
}
< N;
n = n + 1;
h = f + g;
f = g; g = h;
}
#
This program produces the first m Fibonacci numbers along with their prime factorizations, where a value for m is read from the terminal using the > construct. Here is one possible result of running your parser with the above input: Parser output.
( g (x, y)
[ y ? g = g(y, x%y); : g = x; ]
)
> a; > b;
c = g(a, b);
< a; < B; < b; < B; < c; < N;
#
This program reads two integers, calculates their gcd, and prints it.
> a; > b; < a; < B; < b; < N; [ a < b ? < 1; : < 0; ] [ a > b ? < 1; : < 0; ] [ a <= b ? < 1; : < 0; ] [ a >= b ? < 1; : < 0; ] [ a == b ? < 1; : < 0; ] [ a != b ? < 1; : < 0; ] #
This program reads two integers, with the output depending on the integers read.
> a; > b; < a; < B; < b; < N; < a & b; < N; < a | b; < N; #
> x; > y;
< x; < B; < y; < N;
u = 1; v = 0; w = x; a = 0; b = 1; c = y;
{c != 0 ?
q = w/c;
r = u - a*q; s = v - b*q; t = w - c*q;
u = a; v = b; w = c;
a = r; b = s; c = t;
< q; < T; < u; < T; < v; < T; < w; < T;
< a; < T; < b; < T; < c; < N;
}
< u; < B; < v; < B; < w; < N;
#
> a; > b;
< a; < B; < b; < N;
x = a; y = b; z = 1; r = 1;
{ (y > 0) ?
{ (y%2 == 0) ?
x = x*x;
y = y/2;
< r; < T; < x; < T; < y; < T; < z; < N;
}
z = z*x;
y = y - 1;
< r; < T; < x; < T; < y; < T; < z; < N;
}
< a; < B; < b; < B; < z; < N;
#
Key ideas: A recursive descent parser is particularly easy to write by hand and, unlike other kinds of parsers, requires no special software tools. It is easy to "cheat" with this parser, doing whatever you like inside the hand-written functions. That capability can also lead to poorer code, particularly for a compiler that has to evolve over time.