LESSON 2: Working with line graphs

FOCUS QUESTION: How do I display trends in data?

EXAMPLE 1: Load the New York City contagious disease data set

   load NYCDiseases.mat;     % Load the NYC disease data

Questions Answers
How do I start a new cell? Create a line starting with two percent signs followed by a space. After placing the cursor where you want to insert the cell, you can either type these characters on the keyboard or click the Insert cell break icon.
What if I forget the space? The lines that follow will be part of the previous cell.
What does the single percent sign designate? The % marks the start of a comment.
Does MATLAB execute comments? No, comments are for the benefit of the user and are ignored during execution.
What kind of files have a .mat extension? MATLAB MAT-files have a .mat file extension. They are used to store MATLAB variables in an efficient manner.
How many variables can you read in with a single load? You can read in all of the variables that are stored in the .mat file.
Is the format of NYCDiseases.mat different from that of the count.dat file of Lesson 1? Yes, count.dat was an ordinary text file that you could read using WordPad or a browser. MATLAB created a single variable to hold the data of this file based on the name of the file. In contrast, NYCDiseases.mat was created in MATLAB and holds several variables. You can't read this file using WordPad or another text editor.
What if I only want to read particular variables from a MAT-file? You can read particular variables of the MAT-file by listing them after the file name in the load command.
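
The selective load described above can be tried on a throwaway file. This is a minimal sketch: the variables a and b and the temporary file name are made up for illustration and are not part of NYCDiseases.mat.

```matlab
% Build a small MAT-file to experiment with (hypothetical variables a and b).
a = [1 2 3];
b = 'hello';
fname = [tempname, '.mat'];      % Temporary file name ending in .mat
save(fname, 'a', 'b');           % Store both variables in the file

clear a b                        % Remove the variables from the workspace
load(fname, 'a');                % Read back only the variable a
hasA = exist('a', 'var') == 1;   % True because a was requested
hasB = exist('b', 'var') == 1;   % False because b was not requested
delete(fname);                   % Clean up the temporary file
```

For the lesson's data, the equivalent command form would be load NYCDiseases.mat measles years, assuming the file stores variables under those names.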

EXAMPLE 2: Define appropriate variables for analysis

   measles1931 = measles(1, :);   % Measles cases in 1931 (row 1 of measles)
   measles1941 = measles(11, :); % Measles cases in 1941 (row 11 of measles)
   measlesMay = measles(:, 5);   % Measles cases in May (column 5 of measles)
   measlesSpring = measles(:, [3, 4, 5]); % Measles for March, April, May

Questions Answers
What happens if I omit the semicolon after the first statement? MATLAB outputs the values of the result (measles1931 = measles(1, :)) in the Command Window. Notice that if you take out the semicolon, you will see an orange line underneath the equals sign (=). This orange underline, which is called a warning, designates a potential problem with your script. Red underlines mark actual errors that you must fix. Place your cursor over the underline to see what the problem is and how to fix it.
What does the equals sign do? The equals sign (=) is the assignment operator. MATLAB computes a value from the expression on the right and assigns the result to the variable on the left.
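
The row and column selections in EXAMPLE 2 can be checked on a small array where the answers are easy to see. This is a sketch with illustrative values; the names A, row2, col3, and cols are not from the lesson.

```matlab
% A small 3-by-4 stand-in array (illustrative values, not the NYC data).
A = [11 12 13 14;
     21 22 23 24;
     31 32 33 34];

row2 = A(2, :);          % Row 2, all columns: [21 22 23 24]
col3 = A(:, 3);          % All rows, column 3: [13; 23; 33]
cols = A(:, [1, 2]);     % All rows, columns 1 and 2 (a 3-by-2 array)
```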

EXAMPLE 3: Plot the measles cases for 1931

   figure                       % Create a new figure window
   plot(measles1931);           % Plot 1931 measles (y-axis) against 1..12
   xlabel('Month')              % Always label your axes
   ylabel('Case count')
   title('Measles cases NYC: 1931')

Questions Answers
What is xlabel? The xlabel identifier names a MATLAB function that sets the x-axis label on the current axis. Similarly, the ylabel function sets the y-axis label on the current axis, and the title function sets a title over the current axis.
Why call a function such as xlabel rather than editing the x-axis directly using the plot tools? Calling xlabel, ylabel, and title documents the purpose of the graph in your script as well as labeling the graph.
Why is the word Month enclosed in single quotes? The single quotes specify that Month is to be treated as a message or string. The actual label is the word Month. If you omitted the single quotes, MATLAB would treat Month as a variable and use its value (rather than its name) as the x-axis label.

EXAMPLE 4: Plot the measles cases for the month of May

   figure                    % New figure
   plot(years, measlesMay);  % Plot May measles (y-axis) against years (1931 .. 1971)
   xlabel('Year');           % Label the x-axis
   ylabel('Case count');     % Label the y-axis
   title('NYC measles cases for month of May');  % Put a title on the graph

EXAMPLE 5: Plot the spring measles cases

   figure                       % New figure
   plot(years, measlesSpring);  % Plot spring measles (y-axis) against years (1931 .. 1971)
   xlabel('Year');              % Label the x-axis
   ylabel('Case count');        % Label the y-axis
   title('NYC measles cases for spring');  % Put a title on the graph
   legend({'March', 'April', 'May'})  % Use a legend to identify multiple lines

Questions Answers
What is a legend? A legend is an annotation on a plot identifying the type of data represented by objects in the plot.
What does the word legend represent in MATLAB? The legend identifier names a MATLAB function that provides annotations for the objects on a graph. The function arguments (the items in parentheses) specify how to identify the plot objects in the annotation.
Why is 'March' enclosed in quotes? MATLAB uses single quotes to distinguish strings or labels from variable and function names.
When should I use a legend? If your axes have more than one graph, you should always use a legend. By calling the legend function rather than waiting to edit the plot later, you will provide documentation for your graph in your script.
Why was it better to use plot(x, y) rather than plot(y) for this example? If you omit the x values, MATLAB plots against the values 1, 2, ... . In this case, these values should be 1931, 1932, ... . You would need to hand edit the x-axis labels later, a troublesome and error-prone operation.
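
One way to see the difference between plot(y) and plot(x, y) is to ask the plotted line which x values it holds. The sketch below uses made-up data and a hidden figure; the variable names are illustrative only.

```matlab
y = [5 9 4];                      % Illustrative values only
x = [1931 1932 1933];             % Hypothetical year values

figure('Visible', 'off');         % Hidden figure so the sketch runs quietly
h1 = plot(y);                     % No x values given
xDefault = get(h1, 'XData');      % MATLAB supplies 1, 2, 3

h2 = plot(x, y);                  % x values given explicitly
xGiven = get(h2, 'XData');        % The years are used as x positions
close(gcf);                       % Close the hidden figure
```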

EXAMPLE 6: Compare the measles cases for the years 1931 and 1941

   figure                       % New figure
   hold on                      % Draw multiple graphs in same figure
   plot(measles1931, '-sb');    % Draw 1931 measles with blue(b) squares(s)
   plot(measles1941, '-ok');    % Draw 1941 measles with black(k) circles(o)
   hold off                     % No more graphs
   xlabel('Month')              % Label x axis
   ylabel('Case count')         % Label y axis
   title('Measles cases NYC')   % Put a title on the graph
   legend({'1931', '1941'})     % Use a legend to identify two graphs

Questions Answers
What if I omit hold on? By default, MATLAB replaces one plot by another when you give multiple plot commands. If you didn't call hold on, you would only see the graph drawn by the last plot command, the 1941 measles cases.
What does hold off do? The hold off command turns off the MATLAB hold state so that subsequent plots are not added to this figure.
What does the '-sb' mean in the first plot command? The 'b' is a short cut for setting the plot line color to blue. The s specifies that the individual data points should be plotted using square markers. The - indicates that successive points should be connected by straight lines, making the graph a line graph.
What does the '-ok' mean in the second plot command? The 'k' is a short cut for setting the plot line color to black. The o specifies that the individual data points should be plotted using circular markers. The - indicates that successive points should be connected by straight lines, making the graph a line graph.
Why use different markers on individual graphs plotted on the same axis in addition to plotting the graphs in different colors? Colors are not always distinguishable when graphs are printed in black and white. Furthermore, people with different forms of color blindness may not be able to distinguish one line from another.
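What happens if I don't set the line colors for the individual plots? In older MATLAB releases (before R2014b), all of the lines will appear in the same color (blue). Newer releases cycle through the default color order instead, so each line gets its own color. Either way, you could always use the plot tools to edit the graphs later.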
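
The effect of hold on and of the line specifications can be confirmed by querying the graphics objects. This is a sketch with made-up data in a hidden figure; the handle names h1 and h2 are illustrative.

```matlab
figure('Visible', 'off');                 % Hidden figure so the sketch runs quietly
hold on                                   % Keep earlier lines when plotting again
h1 = plot([1 2 3], '-sb');                % Solid blue line with square markers
h2 = plot([3 2 1], '-ok');                % Solid black line with circle markers
hold off

nLines = numel(get(gca, 'Children'));     % Both lines are on the axes: 2
color1 = get(h1, 'Color');                % 'b' is stored as the RGB triple [0 0 1]
color2 = get(h2, 'Color');                % 'k' is stored as the RGB triple [0 0 0]
style1 = get(h1, 'LineStyle');            % '-' means a solid connecting line
close(gcf);                               % Close the hidden figure
```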

EXAMPLE 7: Adding the rows or columns of an array to summarize data

   measlesByMonth = sum(measles);    % Sum each column of measles
   measlesByYear = sum(measles, 2);  % Sum each row of measles
   measlesTotal = sum(measles(:));   % Find the total number of measles cases

Questions Answers
What does sum(measles) do? This command adds up each column of measles.
How big is sum(measles)? Since measles is an array with 41 rows and 12 columns, the result will be a single row of 12 entries.
Since measlesByMonth has one row of 12 columns, why doesn't plot(measlesByMonth) show 12 different lines, each consisting of one point? The MATLAB plot command normally plots each column of an array as a separate line. However, if the array only has one row, MATLAB plots all of its values as a single line.
How does sum(measles) differ from sum(measles, 1)? For this array they are the same. If you omit the dimension argument, MATLAB sums along the first dimension whose size is not 1, which is dimension 1 (down the columns) for measles.
What does sum represent? This identifier represents a MATLAB function whose name is sum. The items enclosed in parentheses are the arguments to the function. MATLAB sends these values to the function when it executes (calls) the function.
What does sum(measles, 2) do? This function call returns a single column consisting of the row sums of measles.
How big is sum(measles, 2)? Since measles is an array with 41 rows and 12 columns, the result will be a single column of 41 entries.
What happens if I omit the 2? MATLAB uses the default value, which is 1. The result will be an array of column sums, not row sums.
How big is measlesByYear? The measlesByYear variable holds a single column of 41 values corresponding to the 41 row sums of measles.
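
The three forms of sum used in EXAMPLE 7 can be compared on a small array. This is a sketch with illustrative values; the names A, colSums, rowSums, and total are not from the lesson.

```matlab
A = [1 2 3;
     4 5 6];             % 2 rows, 3 columns (illustrative values)

colSums = sum(A);        % Same as sum(A, 1) here: [5 7 9]
rowSums = sum(A, 2);     % Row sums as a column: [6; 15]
total   = sum(A(:));     % Sum of every element: 21
```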

EXAMPLE 8: Plot yearly total (in thousands of cases) of measles by year

   figure                                 % Create a new figure window
   plot(years, measlesByYear./1000);      % Draw a line graph
   xlabel('Year');                        % Label the x-axis
   ylabel('Total cases (in thousands)');  % Label the y-axis
   title('NYC measles cases');            % Put a title on the graph

Questions Answers
What does A./B represent in MATLAB? A./B designates element-by-element division of the array A by the array B. MATLAB creates a new array in which the elements are the elements of A divided by the corresponding elements of B. If B is just a single number, then all of the elements in A are divided by the value of B.
What did dividing measlesByYear by 1000 accomplish? Without the division, the y-axis in EXAMPLE 8 would run from 0 to 80,000. MATLAB would shorten the labels and display an exponent of 10^4 above the y-axis to make the display more readable. However, the exponent is easy to miss, and the 10^4 is hard to comprehend. By converting to thousands and adjusting the y-axis label accordingly, you will make the sizes easier for your viewer to comprehend.
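
Element-by-element division can be tried with both an array and a single number on the right. This is a sketch with illustrative values; the names A, B, byArray, and byScalar are made up.

```matlab
A = [1000 2000 3000];
B = [10 20 30];

byArray  = A ./ B;       % Each element of A divided by the matching element of B: [100 100 100]
byScalar = A ./ 1000;    % Every element of A divided by 1000: [1 2 3]
```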

EXAMPLE 9*: Try plotting the entire measles array

   figure                       % Create a new figure window
   plot(measles);               % Draw a line graph

Questions Answers
Why are there 12 lines on this graph? The plot(measles) command graphs each column of the array measles as a separate line. The measles variable has 12 columns, corresponding to the 12 months of the year.
What does the x-axis represent on the graph? The x-axis represents the 41 years in the study.
Why are the disease counts plotted against the numbers 1 to 41? Since we didn't give the x values explicitly, MATLAB plots against the row numbers of the array.
Why does the x-axis scale run from 0 to 45 when there are only 41 rows in measles? MATLAB tries to choose even numbers (e.g., multiples of 2, 5 or 10) for the scale and tick marks to make the graph easier to read.
What if I don't like the scale that MATLAB chooses for an axis? You can always override MATLAB's choice by editing the plot and making the scale or tick marks for an axis manual rather than auto. We did this in Lesson 1.
Why does the y-axis only go from 0 to 3 when many months of the reporting period have hundreds of cases of measles? Actually, the y-axis goes from 0 to 30,000 (i.e., 3 x 10^4). MATLAB converts the tick marks to scientific notation and displays the exponent at the top of the axis when the values are large.
How does MATLAB choose the colors for plotting the lines? MATLAB has user-settable properties to control just about everything about the visual appearance of graphs. It uses the ColorOrder property to control the colors. You can look at and modify the properties of a figure through the Property Editor when you are editing a figure.

EXAMPLE 10*: Attempt to plot all the measles data as a single time series

   figure                        % Create a new figure window
   plot(measles(:));             % Draw a line graph of end-to-end columns

Questions Answers
What does measles(:) mean? measles(:) is the linear representation of the measles array. Form the linear representation by placing the columns of measles end-to-end to make a single vertical column.
Does the linear representation of an array contain the same values as the original array? Yes, the values don't change, only the arrangement in rows and columns.
What is the order of elements in the linear representation of an array? All of the elements from the first column (in order) are followed by all of the elements of the second column (in order), etc.
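
The column-by-column ordering can be seen on a small array. This is a sketch with illustrative values; the names A and v are not from the lesson.

```matlab
A = [1 2 3;
     4 5 6];
v = A(:);    % Columns placed end-to-end: [1; 4; 2; 5; 3; 6]
```

If the rows stand for years and the columns for months, this ordering jumps from year to year within each month, which is why EXAMPLE 10 does not produce a chronological series.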

EXAMPLE 11*: Correctly plot the measles data as a single time series

   measlesFlip = measles';    % Flip measles to make rows into columns
   figure                     % Create a new figure window
   plot(measlesFlip(:));      % Draw a line graph

Questions Answers
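What does the prime mean? The prime (') is the transpose operator. Strictly speaking, ' is the complex conjugate transpose and .' is the plain transpose, but the two give the same result for real data such as measles.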
What does the transpose operator do? The transpose operation flips an array on its main diagonal, making rows into columns and columns into rows. Thus, measles' is a new array with the same values as measles, but the rows and columns have been interchanged.
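
Transposing before taking the linear representation places the rows of the original array end-to-end. The sketch below uses a small illustrative array in which rows play the role of years and columns the role of months; the names A, At, and series are made up.

```matlab
A = [1 2 3;
     4 5 6];          % Row 1 is the first "year", row 2 the second
At = A';              % 3-by-2 array: [1 4; 2 5; 3 6]
series = At(:);       % Rows of A end-to-end: [1; 2; 3; 4; 5; 6]
```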

EXAMPLE 12*: Define the x-axis scale for the single time series

   yearStart = 1931;                       % Start of the scale
   yearInc = 1/12;                         % Scale has one month intervals
   yearEnd = 1972 - yearInc;               % End of the scale
   yearScale = yearStart:yearInc:yearEnd;  % Yearly scale (with month increments)

Questions Answers
What does the notation a:b mean? a:b creates a row of values that are one apart, starting with the value of a. None of the values can be bigger than b. Thus, 1931:1971 is a single row consisting of the values 1931, 1932, ..., 1971.
Does a:b still work if a and b are not integers? Yes, the list still starts with the value of a, and the values are still one apart. However, the list may end with a value less than b, if a and b are not separated by an integral value.
How would I produce a column containing the values 1931, 1932, ..., 1971? Use the transpose operator to turn a row into a column. Thus (1931:1971)' is a column containing the values 1931, 1932, ... 1971.
What does a:b:c do? This expression creates a row vector with the values a, a + b, a + 2*b, ..., c.
What if b doesn't evenly divide c - a? The row vector will end with the largest value of a + n*b that is less than c (for a positive increment b).
Why is yearInc equal to 1/12? The points are spaced one month apart, but the units are in years. One month is 1/12 of a year.
Why is the ending point not 1972? The last point in the data set represents December of 1971, not January of 1972.
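
The colon forms discussed above can be tried on short ranges. The last two lines sketch an alternative way to build the month-spaced scale from an integer count, which does not rely on a fractional step landing exactly on the end value. The name yearScaleAlt is hypothetical, and the count assumes 41 years of 12 monthly values, as in the measles array.

```matlab
r1 = 1:4;            % [1 2 3 4]
r2 = 1.5:4;          % [1.5 2.5 3.5], stops before 4
r3 = 0:2:7;          % [0 2 4 6], 7 is not reached
c1 = (1:3)';         % Column [1; 2; 3]

% Month-spaced scale from an integer count (assumes 41 * 12 = 492 points).
nPoints = 41 * 12;
yearScaleAlt = 1931 + (0:nPoints - 1) / 12;   % Starts at 1931, ends at December 1971
```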

EXAMPLE 13*: Plot the measles data as a single time series, setting x-axis scale

   figure                                    % Create a new figure window
   plot(yearScale, measlesFlip(:)./1000);    % Draw a line graph
   xlabel('Year');                           % Label the x-axis
   ylabel('Cases (in thousands)');           % Label the y-axis
   title('NYC measles cases');               % Put a title on the graph

These questions were written by Kay A. Robbins of the University of Texas at San Antonio and last modified on 31-Dec-2010. Please contact krobbins@cs.utsa.edu with comments or suggestions.