 CS 3723/3721
 Programming Languages
 Spring 2005

 Recitation 2
 Finite State Machines
    Week 2: Jan 24-28
 Due (on time): 2005-01-31  23:59:59
 Due (late):        2005-02-04  23:59:59

Recitation 2 must be submitted following directions at: submissions with deadlines
  • 2005-01-31  23:59:59 (that's Monday, 31 January 2005, 11:59:59 pm) for full credit.
  • 2005-02-04  23:59:59 (that's Friday, 4 February 2005, 11:59:59 pm) for 75% credit.


Overview: A compiler first breaks the sequence of input characters of a program into a sequence of basic units called tokens, in a process called lexical analysis or scanning. Examples of tokens are identifiers, keywords, constants, operators, and special characters. Recitation 3 will study the scanning process in more detail. This recitation focuses on one fairly complex token, a floating point constant. It also has you consider C-style comments, in order to recognize them and eliminate them from the source program.
Writing a recognizer for individual tokens: The most common way to write such a recognizer is to first create a finite state machine (FSM) that describes the token. Then the FSM is converted to a program, either by hand or by using an automated software tool (such as lex in Unix) to produce the program.


C-style comments: C-style comments use an initial /* and a terminal */ to form a comment. This is a very good illustration of using an FSM to help construct a recognizer, that is, a program that can tell when it has encountered a comment. A recognizer for these comments is illustrated here: C-style comments.
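As a rough illustration of turning such an FSM into code, here is a minimal Java sketch of a four-state comment remover. The class name CommentStripper, the state names, and the choice to replace each comment with a single space (as a C compiler does, so that neighboring tokens do not run together) are assumptions made for this example and are not taken from the linked recognizer. The sketch ignores string and character literals and does not report an unterminated comment.

```java
// Hypothetical sketch: names and structure are illustrative only.
class CommentStripper {
    // CODE: outside a comment; SLASH: just saw '/';
    // COMMENT: inside a comment; STAR: inside a comment, just saw '*'.
    enum State { CODE, SLASH, COMMENT, STAR }

    static String strip(String src) {
        StringBuilder out = new StringBuilder();
        State state = State.CODE;
        for (int i = 0; i < src.length(); i++) {
            char c = src.charAt(i);
            switch (state) {
                case CODE:
                    if (c == '/') state = State.SLASH;
                    else out.append(c);
                    break;
                case SLASH:
                    if (c == '*') {
                        state = State.COMMENT;           // "/*" opens a comment
                    } else if (c == '/') {
                        out.append('/');                 // emit the earlier slash, stay in SLASH
                    } else {
                        out.append('/').append(c);       // the slash was ordinary text
                        state = State.CODE;
                    }
                    break;
                case COMMENT:
                    if (c == '*') state = State.STAR;
                    break;
                case STAR:
                    if (c == '/') {
                        out.append(' ');                 // comment closed: leave one space
                        state = State.CODE;
                    } else if (c != '*') {
                        state = State.COMMENT;           // a lone '*' inside the comment
                    }
                    break;
            }
        }
        if (state == State.SLASH) out.append('/');       // input ended right after a '/'
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(strip(".23/***/2e2/* *//**/.84e-2/* some tricky comments */"));
    }
}
```

Note the STAR state: a run of several stars, as in /***/, must stay in STAR rather than fall back to COMMENT, or the closing */ would be missed.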


Floating point constants: This is the most complex token that will be worked with in this and the next recitation. For our purposes here, a floating point constant consists of any number (including 0) of digits, followed by an optional decimal point (.), followed by any number (including 0) of digits, followed by an optional exponent part. The exponent part is an e or E, followed by an optional sign (+ or -), followed by 1 or more digits for the exponent. There must be at least one digit before or after the decimal point (or both), and if there is no decimal point there must be at least one digit. A true floating point constant must have either a decimal point or the exponent part (or both).

This and the next recitation will include an integer constant as a special case, since it just has no decimal point and no exponent part.
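One way to double-check your reading of this definition is to write it down as a regular expression and try it on a few strings. The Java sketch below does that with java.util.regex. The class name FloatSpec and the method isConstant are made up for this example, and the pattern is only one reading of the rules above (integers included as a special case). It is meant as a cross-check for testing, not as a substitute for building the FSM by hand.

```java
import java.util.regex.Pattern;

// Hypothetical helper for cross-checking a hand-written recognizer; the class
// and method names are illustrative and not part of the recitation materials.
class FloatSpec {
    // Either digits with an optional point and more digits, or a point followed
    // by digits; then an optional exponent: e or E, optional sign, 1+ digits.
    private static final Pattern CONSTANT =
        Pattern.compile("([0-9]+\\.?[0-9]*|\\.[0-9]+)([eE][+-]?[0-9]+)?");

    static boolean isConstant(String s) {
        return CONSTANT.matcher(s).matches();   // must match the whole string
    }

    public static void main(String[] args) {
        String[] samples = { "1.2e2", ".2e2", "1.e2", "1e2", "1.", "1.0e-444", ".e-2" };
        for (String s : samples) {
            System.out.println(s + " -> " + isConstant(s));
        }
    }
}
```

Under this reading, every sample except the last (.e-2) is accepted.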

Actual floating point constants in Java can also have an optional trailing f or F for float constants and an optional trailing d or D for double constants. (With no trailing letter the constant is double by default.) You should ignore these possibilities. You should also ignore the limitation on the size of the exponent, so the constant 1.0e-444 in the programs below (illegal in Java, but legal in C) should be accepted by your FSM. Any initial sign (+ or -) is always treated as a separate operator, and it should not be present in the input for this recitation.

Here are two short programs that try out various floating point constants in C and Java. The programs show only a few of the possibilities. All the constants shown are legal except for the last one in C and the last two in Java (all commented out). Your code should accept all but the last one, .e-2.

C Program:

// doub.c: try out double constants
#include <stdio.h>

int main() {   /* below "d" stands for "digit" */
   printf("1.2e2 = %f\n", 1.2e2); /* normal */
   printf(".2e2 = %f\n", .2e2); /* no d before . */
   printf("1.e2 = %f\n", 1.e2); /* no d after . */
   printf("1e2 = %f\n", 1e2); /* no . */
   printf("1.2E2 = %f\n", 1.2E2); /* cap E */
   printf("1.2 = %f\n", 1.2); /* no e or E */
   printf(".2 = %f\n", .2);/*no e, no d before . */
   printf("1. = %f\n", 1.);/* no e, no d after . */
   printf(  /* too many digits */
      "333333333.222222222e-2 = %16.12f\n",
       333333333.222222222e-2);
   printf("1.0e-444 = %20.16f\n", 1.0e-444);

   /* printf(".e-2 = %f\n", .e-2);*/ /* only . */
}

Java Program:

// Doub: try out double constants
public class Doub {

   public static void main(String[] args) {
      System.out.println("1.2e2 = " + 1.2e2);
      System.out.println(".2e2 = " + .2e2);
      System.out.println("1.e2 = " + 1.e2);
      System.out.println("1e2 = " + 1e2);
      System.out.println("1.2E2 = " + 1.2E2);
      System.out.println("1.2 = " + 1.2);
      System.out.println(".2 = " + .2);
      System.out.println("1. = " + 1.);
      System.out.println( // too many digits
          "333333333.222222222e-2 = "
         + 333333333.222222222e-2);
      // System.out.println("1.0e-444 = " +
      //     1.0e-444);
      // System.out.println(".e-2 = "+.e-2);
   }
}
C Run and Output:

% cc -o doub doub.c
% doub
1.2e2 = 120.000000
.2e2 = 20.000000
1.e2 = 100.000000
1e2 = 100.000000
1.2E2 = 120.000000
1.2 = 1.200000
.2 = 0.200000
1. = 1.000000
333333333.222222222e-2 = 3333333.3322222223505378
1.0e-444 =  0.0000000000000000

Java Run and Output:

% javac Doub.java
% java Doub
1.2e2 = 120.0
.2e2 = 20.0
1.e2 = 100.0
1e2 = 100.0
1.2E2 = 120.0
1.2 = 1.2
.2 = 0.2
1. = 1.0
333333333.222222222e-2 = 3333333.3322222224

The two commented-out constants in the Java code produce compile-time errors if they are uncommented.

One method for constructing a program to recognize floating point constants starts with a complete FSM that exactly recognizes these constants. Such an FSM is fairly complex, since the many constraints (such as "at least one digit before or after the decimal point") each have to be encoded in the states themselves. The FSM needs a separate state for each combination of parts seen so far, along with fairly intricate transitions. This method is similar to the example for recognizing C-style comments, but more complicated.

An easier approach is to start with a "skeleton" FSM that has the basic parts: initial digits, decimal point, trailing digits, letter e, sign on exponent, exponent. Then the constraints can be handled with boolean flags or by other means, so that if there are no initial digits and the constant starts with a decimal point, the flag tells you that there must be trailing digits, and similarly for other constraints.
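The skeleton-plus-flags idea might look roughly like the following Java sketch. The class name FloatSkeleton, the method isConstant, and the flag names are hypothetical, and the sketch checks a whole string at once rather than reading characters from a source file, so it would need adapting before it fits into a scanner. It treats an integer as a special case of a constant, as described earlier.

```java
// Hypothetical sketch of the "skeleton FSM with flags" approach.
class FloatSkeleton {
    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    static boolean isConstant(String s) {
        int i = 0;
        int n = s.length();
        boolean leadingDigits = false;   // saw a digit before the point
        boolean trailingDigits = false;  // saw a digit after the point

        // Part 1: initial digits.
        while (i < n && isDigit(s.charAt(i))) { i++; leadingDigits = true; }

        // Parts 2 and 3: optional decimal point, then trailing digits.
        if (i < n && s.charAt(i) == '.') {
            i++;
            while (i < n && isDigit(s.charAt(i))) { i++; trailingDigits = true; }
        }

        // Constraint: at least one digit before or after the point.
        if (!leadingDigits && !trailingDigits) return false;

        // Parts 4 to 6: optional exponent (e or E, optional sign, digits).
        if (i < n && (s.charAt(i) == 'e' || s.charAt(i) == 'E')) {
            i++;
            if (i < n && (s.charAt(i) == '+' || s.charAt(i) == '-')) i++;
            boolean exponentDigits = false;
            while (i < n && isDigit(s.charAt(i))) { i++; exponentDigits = true; }
            if (!exponentDigits) return false;   // constraint: 1 or more exponent digits
        }

        return i == n;   // nothing may be left over
    }

    public static void main(String[] args) {
        System.out.println(isConstant("1.e2"));   // legal: no digit after the point
        System.out.println(isConstant(".e-2"));   // illegal: only a point
    }
}
```

Each while loop or if statement corresponds to one part of the skeleton, and the flags carry the constraints that would otherwise need extra states.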


Recitation work: You should write a program in either C++ or Java that recognizes C-style comments, in order to discard them, and that recognizes floating point constants. Of course the program for comments has been given to you, but you need to adapt it to your program and possibly rewrite it. Your program should also calculate the value of the floating point constant as a double. It should do this "from scratch" as described below, rather than using a library function.


Hints on calculating the value of a scanned double: In C it would be possible to use a library function to convert a string of characters representing a floating point number to an actual double. You could use sscanf, for example. In ordinary programming it is usually a good idea to use library functions rather than rewrite them from scratch. In this recitation, however, we are studying some of the low-level language mechanisms, and for this assignment you are not to use the library function. Instead, you should realize that given code such as: char ch = '4'; int i = ch - '0'; the variable i takes on the integer value 4. At each stage of reading the initial digits of the constant, you can multiply the value so far by 10 and add in the next integer value. Then you have to take the "." and the exponent part appropriately into account.

Java also has library functions (such as Double.parseDouble(String)) that should not be used. The same trick also works in Java.
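The multiply-by-10 trick can be sketched in Java as follows, handling the digits and the decimal point but not yet the exponent. The class name MantissaDemo and the method mantissa are hypothetical, and the sketch assumes its input has already been checked by the recognizer. One possible design choice shown here is to keep accumulating digits after the decimal point as if they were part of a whole number, while recording a power of 10 to divide by once at the end; a single division tends to lose less accuracy than repeated divisions.

```java
// Hypothetical sketch: computes the value of digits with an optional decimal
// point, such as "62.47", without any library conversion function.
class MantissaDemo {
    static double mantissa(String s) {
        double value = 0.0;        // all digits read so far, as a whole number
        double scale = 1.0;        // 10 to the power (number of digits after '.')
        boolean seenPoint = false;
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch == '.') {
                seenPoint = true;
                continue;
            }
            int digit = ch - '0';              // '4' becomes 4
            value = value * 10 + digit;        // shift left one place, add the digit
            if (seenPoint) scale *= 10;        // one more place after the point
        }
        return value / scale;
    }

    public static void main(String[] args) {
        System.out.println(mantissa("62.47"));
        System.out.println(mantissa("14."));
        System.out.println(mantissa(".23"));
    }
}
```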

You should not use the pow function in C or the static Math.pow() method in Java to handle the exponent; handle this "from scratch" as well. (You don't need the full power of pow, just some multiplications or divisions by 10. On Sun systems, pow is part of the math library, so if you did use it you would need an extra -lm option on the cc command, as in "% cc -o source source.c -lm".)
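A minimal Java sketch of handling the exponent with repeated multiplications or divisions by 10 is shown below. The class name ExponentDemo and both method names are made up for this example, and the exponent text is assumed to have been validated already (an e or E, an optional sign, then 1 or more digits). Repeated scaling can introduce small rounding errors in the last digits, and a very negative exponent such as -444 simply underflows to 0.

```java
// Hypothetical sketch: applies an exponent without using Math.pow().
class ExponentDemo {
    // Converts exponent text such as "e-2", "E5", or "e+12" to an int.
    static int exponentValue(String exp) {
        int i = 1;                                   // skip the 'e' or 'E'
        boolean negative = false;
        if (exp.charAt(i) == '+' || exp.charAt(i) == '-') {
            negative = (exp.charAt(i) == '-');
            i++;
        }
        int value = 0;
        for (; i < exp.length(); i++) {
            value = value * 10 + (exp.charAt(i) - '0');
        }
        return negative ? -value : value;
    }

    // Multiplies or divides by 10 once for each unit of the exponent.
    static double scale(double mantissa, int exponent) {
        double result = mantissa;
        for (int k = 0; k < exponent; k++) result *= 10;   // positive exponent
        for (int k = 0; k > exponent; k--) result /= 10;   // negative exponent
        return result;
    }

    public static void main(String[] args) {
        System.out.println(scale(2.0, exponentValue("e2")));
        System.out.println(scale(0.84, exponentValue("e-2")));
    }
}
```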


What you should submit: Refer to the submissions directions and to deadlines at the top of this page. The text file that you submit should first have Your Name, the Course Number, and the Recitation Number. The rest of the file should have the following in it, in the order below, and clearly labeled, including at the beginning the appropriate item letters: a and b.

 Contents of submission for Recitation 2:

Last Name, First Name; Course Number; Recitation Number (2).

a. The Java or C++ source file or files for your program. Everything should be run together into one file, with reasonable separators between components (the separate source files and the output). The code should be reasonably organized and written, with special emphasis on header comments. (Not much emphasis on inline comments.)

b. You should give the results of a run using the following source file for input. In case your program works for some inputs but not for the complete source file below, you should explain this and use simpler source that your program will handle (for some loss of credit).

    
    62.47 /* simple form */
    14.   /* no digit after . */
    /* 53.53 a number inside a comment */
    .23/***/2e2/* *//**/.84e-2/* some tricky comments */
    345.578e5
    1234.5678
    $ /* optional, in case you need an eof sentinel */
    

Output format: In b above, your output should be the value of each scanned floating point constant. For example, with the input in b above, your output should look something like the following (though the specific output will depend on the particular language and formatting used):

    
    62.47
    14.0
    0.23
    200
    0.0084
    34557800.
    1234.5678
    

End-of-file problems: Sometimes students have trouble recognizing end-of-file and spend a lot of time on this simple problem. Instead, you may use a "$" character to represent end-of-file (as shown above), or you may use the system EOF that is provided. (So do not get hung up on end-of-file.)
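If you read the input one character at a time in Java, both options can be handled in the same loop, roughly as in this sketch. The class name SentinelReader and the method readAll are hypothetical. Reader.read() returns -1 at the system end-of-file, so the loop stops on either that value or the "$" sentinel, whichever comes first.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical sketch: collects input up to a '$' sentinel or the real end-of-file.
class SentinelReader {
    static String readAll(Reader in) {
        StringBuilder text = new StringBuilder();
        try {
            int c;
            // read() returns -1 at end-of-file; '$' is the optional sentinel.
            while ((c = in.read()) != -1 && c != '$') {
                text.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(readAll(new StringReader("1234.5678\n$ /* optional */")));
    }
}
```

To read from standard input instead, the same method could be given new java.io.InputStreamReader(System.in).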


Key ideas: Finite state machines (FSMs) can be used to aid in constructing a program to recognize tokens of a programming language such as floating point constants or C-style comments.


Revision date: 2005-01-22. (Please use ISO 8601, the International Standard.)