LESSON: Error bars and measures of dispersion

FOCUS QUESTION: How can I depict uncertainty and variability in data?

This lesson discusses various ways of putting error bars on graphs.

In this lesson you will:
  • Examine measures of spread or dispersion.
  • Display error bars on different types of charts.
  • Get additional practice with plot properties.
[Image: Measles vaccination rates, WHO 2007]

DATA FOR THIS LESSON

File: NYCDiseases.mat
Description: The data set contains the monthly totals of new cases of measles, mumps, and chicken pox in New York City during the years 1931-1971.

The file is organized into the following variables:

  • measles - an array containing the monthly cases of measles (one row per year, one column per month)
  • mumps - an array containing the monthly cases of mumps
  • chickenPox - an array containing the monthly cases of chicken pox
  • years - a vector containing the years 1931 through 1971
The data was extracted from the Hipel-McLeod Time Series Datasets Collection, available at http://www.stats.uwo.ca/faculty/aim/epubs/mhsets/readme-mhsets.html.

The data was first published in: Yorke, J.A. and London, W.P. (1973). "Recurrent Outbreaks of Measles, Chickenpox and Mumps", American Journal of Epidemiology, Vol. 98, pp. 469.

SETUP FOR LESSON

EXAMPLE 1: Load the data about New York contagious diseases

Create a new cell in which you type and execute:

   load NYCDiseases.mat;    % Load the disease data

You should see measles, mumps, chickenPox, and years variables in the Workspace Browser.

EXAMPLE 2: Compute the overall mean and standard deviation of measles and chickenpox

Create a new cell in which you type and execute:

   measlesAver = mean(measles(:));         % Calculate overall average measles
   measlesSD = std(measles(:), 1);         % Calculate overall std measles
   chickenPoxAver = mean(chickenPox(:));   % Calculate overall average chickenpox
   chickenPoxSD = std(chickenPox(:), 1);   % Calculate overall std chickenpox

You should see the variables measlesAver, measlesSD, chickenPoxAver, and chickenPoxSD in your Workspace Browser.

Note: the second argument 1 tells std to normalize by N, which gives the population standard deviation. The default (0) normalizes by N-1 and gives the sample standard deviation.
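The following sketch makes the two choices in EXAMPLE 2 concrete. It shows why the code writes measles(:) rather than measles, and what the second argument of std changes. The matrix A and its values are made up for illustration and are not from NYCDiseases.mat. A(:) stacks every element into a single column, so the statistics cover all elements at once.

```matlab
% Hypothetical toy data standing in for the measles array
% (rows = years, columns = months); the values are made up
A = [2 4 4; 4 5 7];

overallMean = mean(A(:));        % A(:) stacks all 6 elements into one column
columnMeans = mean(A);           % without (:), mean returns one value per column

N = numel(A);
dev2 = (A(:) - overallMean).^2;  % squared deviations from the overall mean

popSD    = std(A(:), 1);         % flag 1: normalize by N (population SD, as in EXAMPLE 2)
sampleSD = std(A(:));            % default flag 0: normalize by N-1 (sample SD)

% The same two values computed directly from their definitions
popSDByHand    = sqrt(sum(dev2) / N);
sampleSDByHand = sqrt(sum(dev2) / (N - 1));

fprintf('population SD = %.4f, sample SD = %.4f\n', popSD, sampleSD);
```

Dividing by N always gives a slightly smaller value than dividing by N-1. With many observations the difference is small, but a figure should still state which one it uses.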

EXERCISE 1: Create variables to hold the overall average and overall standard deviation of the traffic data of Lesson 1.

EXAMPLE 3: Compare the overall average and SD of monthly counts of measles and chickenpox

Create a new cell in which you type and execute:

   figure
   hold on
   errorbar(1, measlesAver./1000, measlesSD./1000, 'rs');
   errorbar(2, chickenPoxAver./1000,chickenPoxSD./1000, 'ko');
   hold off
   xlabel('Disease')
   ylabel('Monthly averages (in thousands)')
   title('Childhood diseases NYC: 1931-1971 (SD error bars)')
   set(gca, 'XTickMode', 'manual', 'XTick', 1:2,  ...
         'XTickLabelMode', 'manual', 'XTickLabel', {'Measles', 'Chicken Pox'},...
         'XLim',[0.5,2.5])

You should see a Figure Window with a labeled error bar plot.

EXERCISE 2: Limitations of standard deviation
Look at the graph from EXAMPLE 3. What doesn't make sense? (Hint: what number does the bottom of the measles SD error bar represent?)

EXERCISE 3: Copy the code in EXAMPLE 3 and modify it to also include mumps.

EXAMPLE 4: Compute mean and standard deviation of monthly measles cases by year

Create a new cell in which you type and execute:

   measlesByYearAver = mean(measles, 2); % Average monthly measles by year
   measlesByYearSD = std(measles, 1, 2); % Std monthly measles by year

You should see the variables measlesByYearAver and measlesByYearSD in your Workspace Browser.

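Note: as before, the second argument 1 gives the population standard deviation (normalized by N) rather than the sample standard deviation (normalized by N-1). The third argument, 2, computes the statistic along each row, so you get one value per year.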
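The dimension argument of mean and std selects the direction of the computation. In the sketch below, the 3-by-4 matrix cases is a hypothetical stand-in for the measles array (rows = years, columns = months), and its values are made up. Dimension 2 collapses the months of each year. Dimension 1 collapses the years of each month, which is how EXAMPLE 7 uses median(measles, 1).

```matlab
% Hypothetical toy data: 3 "years" (rows) by 4 "months" (columns)
cases = [10 20 30 40;
          5  5  5  5;
          0  8  0  8];

byYearAver = mean(cases, 2);     % 3-by-1: average of the months in each row
byYearSD   = std(cases, 1, 2);   % 3-by-1: population SD of each row

byMonthAver = mean(cases, 1);    % 1-by-4: average of the years in each column

disp(byYearAver')
disp(byYearSD')
```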

EXERCISE 4: Create variables to hold the average and standard deviation by hour of the traffic data of Lesson 1.

EXERCISE 5: Plot the mean with SD error bars for the traffic data. Use the data computed from Exercise 4.

EXAMPLE 5: Plot the SD error bars for measles monthly counts by year

Create a new cell in which you type and execute:

   figure
   errorbar(years, measlesByYearAver./1000, measlesByYearSD./1000, 'ks');
   xlabel('Year');
   ylabel('Monthly averages (in thousands)')
   title('Measles NYC: 1931-1971 (SD error bars)')
   set(gca, 'YLimMode', 'manual', 'YLim', [0, 20])

You should see a Figure Window with a labeled error bar plot.

EXERCISE 6: Ethics of data presentation
Look at the figure generated in EXAMPLE 5. Can you see the bottom ends of all the error bars? Where are they? MATLAB lets us set the limits of our axes. The set(gca, 'YLimMode', 'manual', 'YLim', [0, 20]) call above sets the lower limit to 0, so we are intentionally not displaying the full extent of the error bars. Values below zero don't make sense for case counts (can a month have fewer than 0 new cases?), but could hiding them be considered unethical? Present a short argument both for and against.
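For evidence in this discussion, you can count how many error bars the axis limit actually cuts off, that is, how many years have a mean minus one SD below zero. The sketch below uses hypothetical yearly values. With the lesson data you would use measlesByYearAver./1000 and measlesByYearSD./1000 in place of yearAver and yearSD.

```matlab
% Hypothetical yearly means and SDs (in thousands), not the real measles values
yearAver = [1.2 8.5 0.6 12.0 2.3];
yearSD   = [1.5 6.0 0.9  7.5 1.1];

lowerEnds  = yearAver - yearSD;   % bottom of each symmetric SD error bar
isClipped  = lowerEnds < 0;       % bars that a lower YLim of 0 would cut off
numClipped = sum(isClipped);

fprintf('%d of %d error bars extend below zero\n', numClipped, numel(yearAver));
```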

EXAMPLE 6: Plot the SD error bars on a bar chart for measles

Create a new cell in which you type and execute:

   figure
   hold on
   errorbar(years, measlesByYearAver./1000, measlesByYearSD./1000, 'ks');
   bar(years, measlesByYearAver./1000, 'FaceColor', [0.5, 0.5, 1])
   plot(years, measlesByYearAver./1000, 'LineStyle', 'none', ...
       'Marker', 's', 'MarkerEdgeColor','k', 'MarkerFaceColor','r')
   hold off
   xlabel('Year');
   ylabel('Monthly averages (in thousands)')
   title('Measles NYC: 1931-1971 (SD error bars)')
   set(gca, 'YLimMode', 'manual', 'YLim', [0, 20])

You should see a Figure Window with a labeled error bar plot.

EXAMPLE 7: Compute median, MAD and IQR by month for measles

Create a new cell in which you type and execute:

   measlesByMonthMedian = median(measles, 1);       % Median by month
   measlesByMonthMAD = mad(measles, 1, 1);          % Median absolute deviation by month
   measlesByMonthIQR = prctile(measles, [25, 75]);  % 25th and 75th percentiles by month

You should see three new variables, measlesByMonthMedian, measlesByMonthMAD, and measlesByMonthIQR, in your Workspace Browser.

The rows of measlesByMonthIQR correspond to the percentiles, and the columns correspond to the months.
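The sketch below shows this row/column layout on a small scale. It applies prctile to a hypothetical 4-by-3 matrix (rows = years, columns = months; the values are made up). It also checks that the 50th percentile of each column agrees with median(cases, 1), since the median is the 50th percentile.

```matlab
% Hypothetical toy data: 4 "years" (rows) by 3 "months" (columns)
cases = [3 10 7;
         1 12 9;
         4 11 2;
         2 15 5];

Q = prctile(cases, [25, 75]);   % row 1 = 25th percentiles, row 2 = 75th percentiles
disp(size(Q))                   % 2 rows (percentiles) by 3 columns (months)

% The 50th percentile of each column should match the column median
P50 = prctile(cases, 50);
maxDiff = max(abs(P50 - median(cases, 1)));
```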

EXAMPLE 8: Plot median monthly measles with IQR for error bars

Create a new cell in which you type and execute:

   xPositions = 1:12;
   lowerDist = measlesByMonthMedian - measlesByMonthIQR(1, :);  % Length of lower wing
   upperDist = measlesByMonthIQR(2, :) - measlesByMonthMedian;  % Length of upper wing
   figure
   errorbar(xPositions, measlesByMonthMedian./1000, ...
       lowerDist./1000, upperDist./1000, '-m*')
   xlabel('Month');
   ylabel('Cases in thousands')
   title('Measles cases in NYC: 1931-1971')
   legend('Median (IQR error bars)', 'Location', 'Northeast') % Upper right

You should see three new variables, xPositions, lowerDist, and upperDist, in your Workspace Browser.

You should see a Figure Window with median/IQR error bars.
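errorbar(X, Y, L, U) expects the lengths of the lower and upper wings, not the positions of their ends. That is why EXAMPLE 8 subtracts before plotting. The sketch below uses hypothetical medians and quartiles for three months. It shows the conversion and checks that the wings reach exactly from the 25th to the 75th percentile.

```matlab
% Hypothetical medians and quartiles for three months (not the lesson data)
med = [5 9 4];
q25 = [3 6 4];
q75 = [7 10 7];

lowerLen = med - q25;   % length of the lower wing: [2 3 0]
upperLen = q75 - med;   % length of the upper wing: [2 1 3]

% Adding the lengths back to the median recovers the quartiles
lowerEndsMatch = isequal(med - lowerLen, q25);
upperEndsMatch = isequal(med + upperLen, q75);

% Months with unequal wings are skewed, which symmetric SD bars cannot show
isSkewed = lowerLen ~= upperLen;
```

These are the vectors you would pass as errorbar(1:3, med, lowerLen, upperLen).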

EXERCISE 7: Copy the code for EXAMPLE 8 into a new cell. Add a line graph of the average monthly measles cases (black line, no markers or error bars). Update the legend appropriately.

EXERCISE 8: Create a new figure in which you plot the yearly averages for measles and chickenpox on the same graph. The graphs should have SD error bars.

EXAMPLE 9: Plot IQR and MAD error bars on the same graph

Create a new cell in which you type and execute:

   figure
   hold on
   errorbar(xPositions-0.1, measlesByMonthMedian./1000, ...
                lowerDist./1000, upperDist./1000, 'm*')
   errorbar(xPositions+0.1, measlesByMonthMedian./1000, ...
                 measlesByMonthMAD./1000, 'ks')
   hold off
   xlabel('Month');
   ylabel('Median in thousands')
   title('Measles cases in NYC: 1931-1971')
   legend('IQR error bars', 'MAD error bars', 'Location', 'Northeast')

You should see a Figure Window with two sets of error bars.

SUMMARY OF SYNTAX

  • errorbar(Y, E): Create a plot of the values in Y, similar to plot(Y). Each value in E gives the length of the error bar wings that extend above and below the corresponding value in Y.
  • errorbar(X, Y, E): Like errorbar(Y, E), except that the values of X give the horizontal positions instead of the integers 1, 2, ... .
  • errorbar(X, Y, L, U): Like errorbar(X, Y, E), except that L and U give the lengths of the lower and upper wings of the error bars, respectively.
  • mad(X): Compute the mean (average) absolute deviation of X along the first non-singleton dimension. For a 2D array, this gives one value per column.
  • mad(X, 0, 1): Compute the mean absolute deviation of X along dimension 1 (one value per column). Note: if the second argument is 1, mad computes the median absolute deviation instead.
  • mad(X, 0, 2): Compute the mean absolute deviation of X along dimension 2 (one value per row). Note: if the second argument is 1, mad computes the median absolute deviation instead.
  • Y = prctile(X, p): Compute the percentiles of X specified by the vector p. When X is a 2D array, prctile works on each column, and the i-th row of Y contains the p(i)-th percentiles.
  • std(X): Compute the standard deviation of X along the first non-singleton dimension, normalizing by N-1 (the sample standard deviation). For a 2D array, this gives one value per column.
  • std(X, 0, 1): Compute the sample standard deviation of X along dimension 1 (one value per column). Note: if the second argument is 1, std normalizes by N instead, giving the population standard deviation used in this lesson.
  • std(X, 0, 2): Compute the sample standard deviation of X along dimension 2 (one value per row). Note: if the second argument is 1, std normalizes by N instead, giving the population standard deviation.
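The two flags of mad differ in the center they measure from and in how they average the deviations. The sketch below computes both by hand for a small made-up vector with one large outbreak value. Based on the descriptions above, the results should match mad(x, 0) and mad(x, 1). The code avoids calling mad directly, so it runs without the toolbox that provides it.

```matlab
% Hypothetical monthly counts with one outbreak month (30)
x = [2 4 4 5 7 9 30];

% Flag 0: mean absolute deviation from the mean
meanAbsDev = mean(abs(x - mean(x)));       % compare with mad(x, 0)

% Flag 1: median absolute deviation from the median
medAbsDev = median(abs(x - median(x)));    % compare with mad(x, 1)

% For contrast, the population standard deviation (normalized by N)
popSD = std(x, 1);

fprintf('mean abs dev = %.3f, median abs dev = %.3f, SD = %.3f\n', ...
        meanAbsDev, medAbsDev, popSD);
```

The outbreak value inflates the SD and the mean absolute deviation far more than the median absolute deviation. This makes the median absolute deviation a natural companion to the median.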

This lesson was written by Kay A. Robbins of the University of Texas at San Antonio and last modified by Dawn Roberson on 26-Jan-2018. Please contact kay.robbins@utsa.edu with comments or suggestions. The photo shows worldwide measles vaccination rates (WHO 2007); source: http://en.wikipedia.org/wiki/File:Measles_vaccination_worldwide.png.