# LESSON: Error bars and measures of dispersion

FOCUS QUESTION: How can I depict uncertainty and variability in data?

This lesson discusses various ways of putting error bars on graphs.

In this lesson you will:

• Examine measures of spread or dispersion.
• Display error bars on different types of charts.
• Get additional practice with plot properties.

## DATA FOR THIS LESSON

File: NYCDiseases.mat

Description: The data set contains the monthly totals of new cases of measles, mumps, and chicken pox for New York City during the years 1931-1971. The file is organized into the following variables:

• measles - an array containing the monthly cases of measles
• mumps - an array containing the monthly cases of mumps
• chickenPox - an array containing the monthly cases of chicken pox
• years - a vector containing the years 1931 through 1971

The data was extracted from the Hipel-McLeod Time Series Datasets Collection, available at http://www.stats.uwo.ca/faculty/aim/epubs/mhsets/readme-mhsets.html. The data was first published in: Yorke, J.A. and London, W.P. (1973). "Recurrent Outbreaks of Measles, Chickenpox and Mumps", American Journal of Epidemiology, Vol. 98, pp. 469.

## SETUP FOR LESSON

• Create an ErrorBars directory on your V: drive and make it your current directory.
• Download the NYCDiseases.mat to your ErrorBars directory.
• Create an ErrorBarLesson script file in your ErrorBars directory.

## EXAMPLE 1: Load the data about New York contagious diseases

Create a new cell in which you type and execute:

```
load NYCDiseases.mat;    % Load the disease data
```

You should see measles, mumps, chickenPox, and years variables in the Workspace Browser.

## EXAMPLE 2: Compute the overall mean and standard deviation of measles and chickenpox

Create a new cell in which you type and execute:

```
measlesAver = mean(measles(:));         % Calculate overall average measles
measlesSD = std(measles(:), 1);         % Calculate overall std measles
chickenPoxAver = mean(chickenPox(:));   % Calculate overall average chickenpox
chickenPoxSD = std(chickenPox(:), 1);   % Calculate overall std chickenpox
```

You should see the following variables in your Workspace Browser:

• measlesAver - overall average of measles
• measlesSD - overall standard deviation of measles
• chickenPoxAver - overall average of chickenpox
• chickenPoxSD - overall standard deviation of chickenpox

Note: we used the population standard deviation (second argument 1, which normalizes by N), not the sample standard deviation (which normalizes by N-1).
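The difference between the two normalizations can be seen on a small vector. The following is a minimal sketch, not part of the original lesson; the values in `x` are made up for illustration and do not come from NYCDiseases.mat.

```matlab
% Made-up sample values, for illustration only (not from NYCDiseases.mat)
x = [2 4 4 4 5 5 7 9];
n = numel(x);

sdSample = std(x, 0);       % Normalizes by n-1 (MATLAB's default)
sdPopulation = std(x, 1);   % Normalizes by n (the form used in this lesson)

% The same two quantities written out from their definitions
sumSq = sum((x - mean(x)).^2);            % Sum of squared deviations
sdSampleManual = sqrt(sumSq / (n - 1));
sdPopulationManual = sqrt(sumSq / n);

fprintf('Sample SD: %.4f, population SD: %.4f\n', sdSample, sdPopulation);
```

The population form is never larger than the sample form, and the gap shrinks as the number of values grows, so for the full measles array the two should be close.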

EXERCISE 1: Create variables to hold the overall average and overall standard deviation of the traffic data of Lesson 1.

## EXAMPLE 3: Compare overall average and SD of monthly counts of measles and chickenpox

Create a new cell in which you type and execute:

```
figure
hold on
errorbar(1, measlesAver./1000, measlesSD./1000, 'rs');
errorbar(2, chickenPoxAver./1000,chickenPoxSD./1000, 'ko');
hold off
xlabel('Disease')
ylabel('Monthly averages (in thousands)')
title('Childhood diseases NYC: 1931-1971 (SD error bars)')
set(gca, 'XTickMode', 'manual', 'XTick', 1:2,  ...
'XTickLabelMode', 'manual', 'XTickLabel', {'Measles', 'Chicken Pox'},...
'XLim',[0.5,2.5])
```

You should see a Figure Window with a labeled error bar plot.

EXERCISE 2: Limitations of standard deviation
Look at the graph from EXAMPLE 3. What doesn't make sense? (Hint: what does the value at the bottom of the lower measles SD error bar mean?)

EXERCISE 3: Copy the code in EXAMPLE 3 and modify it to also include mumps.

## EXAMPLE 4: Compute mean and standard deviation of monthly measles cases by year

Create a new cell in which you type and execute:

```
measlesByYearAver = mean(measles, 2); % Average monthly measles by year
measlesByYearSD = std(measles, 1, 2); % Std monthly measles by year
```

You should see the following variables in your Workspace Browser:

• measlesByYearAver - a 41 x 1 array of average monthly measles cases by year
• measlesByYearSD - a 41 x 1 array of standard deviations of monthly measles cases by year

Note: we used the population standard deviation (second argument 1, which normalizes by N), not the sample standard deviation (which normalizes by N-1).
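The dimension argument decides whether statistics are taken along rows or along columns. The following is a minimal sketch on a made-up 2 x 3 array `A` (standing in for 2 "years" by 3 "months"); it is not the lesson data.

```matlab
% Made-up 2 x 3 array: each row plays the role of a year,
% each column the role of a month (illustration only)
A = [1 2 3;
     4 6 8];

rowMean = mean(A, 2);    % Average across each row    -> 2 x 1 column vector
rowSD = std(A, 1, 2);    % Population SD of each row  -> 2 x 1 column vector
colMean = mean(A, 1);    % Average down each column   -> 1 x 3 row vector

disp(size(rowMean))      % One value per row ("per year")
disp(size(colMean))      % One value per column ("per month")
```

With dimension 2, the result has one entry per row, which is why EXAMPLE 4 produces 41 x 1 arrays from a table with one row per year.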

EXERCISE 4: Create variables to hold the average and standard deviation by hour of the traffic data of Lesson 1.

EXERCISE 5: Plot the mean with SD error bars for the traffic data. Use the data computed from Exercise 4.

## EXAMPLE 5: Plot the SD error bars for measles monthly counts by year

Create a new cell in which you type and execute:

```
figure
errorbar(years, measlesByYearAver./1000, measlesByYearSD./1000, 'ks');
xlabel('Year');
ylabel('Monthly averages (in thousands)')
title('Measles NYC: 1931-1971 (SD error bars)')
set(gca, 'YLimMode', 'manual', 'YLim', [0, 20])
```

You should see a Figure Window with a labeled error bar plot.

EXERCISE 6: Ethics of data presentation
Look at the figure generated in EXAMPLE 5. Can you see the caps that mark the bottoms of all the error bars? Where are they? MATLAB allows us to set the limits of our graphs. Here this is done with the YLim property in the set command above, where we set the lower limit to 0. By setting the lower limit at 0, we are intentionally not displaying the full extent of the error bars. Values below zero don't make sense (can you have fewer than 0 new cases in a month?), but could hiding them be considered unethical? Present a short argument both for and against.

## EXAMPLE 6: Plot the SD error bars on a bar chart for measles

Create a new cell in which you type and execute:

```
figure
hold on
errorbar(years, measlesByYearAver./1000, measlesByYearSD./1000, 'ks');
bar(years, measlesByYearAver./1000, 'FaceColor', [0.5, 0.5, 1])
plot(years, measlesByYearAver./1000, 'LineStyle', 'none', ...
'Marker', 's', 'MarkerEdgeColor','k', 'MarkerFaceColor','r')
hold off
xlabel('Year');
ylabel('Monthly averages (in thousands)')
title('Measles NYC: 1931-1971 (SD error bars)')
set(gca, 'YLimMode', 'manual', 'YLim', [0, 20])
```

You should see a Figure Window with a labeled error bar plot.

## EXAMPLE 7: Compute median, MAD and IQR by month for measles

Create a new cell in which you type and execute:

```
measlesByMonthMedian = median(measles, 1);       % Median by month
measlesByMonthMAD = mad(measles, 1, 1);          % Median absolute deviation by month
measlesByMonthIQR = prctile(measles, [25, 75]);  % 25th and 75th %-tile
```

You should see the following 3 variables in your Workspace Browser:

• measlesByMonthMedian - the median measles by month
• measlesByMonthMAD - median absolute deviation (MAD) by month
• measlesByMonthIQR - the 25th and 75th percentiles (the bounds of the interquartile range, IQR) of measles by month

The rows of measlesByMonthIQR correspond to the percentiles, and the columns correspond to the months.
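To see what the median absolute deviation measures, it can be written out with base functions. The following is a minimal sketch on a made-up array `A` with one outlier; the result should agree with `mad(A, 1, 1)`, but the manual form is used here so the sketch does not depend on that function.

```matlab
% Made-up 5 x 2 array (illustration only); column 1 contains an outlier
A = [  1 10;
       2 20;
       3 30;
       4 40;
     100 50];

colMedian = median(A, 1);                             % Median of each column
absDev = abs(A - repmat(colMedian, size(A, 1), 1));   % Distance of each value from its column median
colMAD = median(absDev, 1);                           % Median absolute deviation of each column

colSD = std(A, 1, 1);                                 % Population SD of each column, for comparison

disp(colMedian)
disp(colMAD)
disp(colSD)
```

The outlier in the first column inflates the standard deviation far more than the MAD, which is one reason to pair the median with MAD or IQR error bars for skewed data.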

## EXAMPLE 8: Plot median monthly measles with IQR for error bars

Create a new cell in which you type and execute:

```
xPositions = 1:12;
lowerDist = measlesByMonthMedian - measlesByMonthIQR(1, :);  % Bottom
upperDist = measlesByMonthIQR(2, :) - measlesByMonthMedian;  % Top bar
figure
errorbar(xPositions, measlesByMonthMedian./1000, ...
lowerDist./1000, upperDist./1000, '-m*')
xlabel('Month');
ylabel('Cases in thousands')
title('Measles cases in NYC: 1931-1971')
legend('Median (IQR error bars)', 'Location', 'Northeast') % Upper right
```

You should see the following 3 variables in your Workspace Browser:

• lowerDist - lengths of lower edges of IQR error bars for median
• upperDist - lengths of upper edges of IQR error bars for median
• xPositions - vector with the values 1..12
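Note that `errorbar(X, Y, L, U)` expects the lengths of the wings, not the absolute positions of their ends, which is why EXAMPLE 8 subtracts before plotting. The following is a minimal sketch with made-up medians and percentiles (the names `med`, `q25`, `q75`, `lowerLen`, and `upperLen` are hypothetical and not part of the lesson).

```matlab
% Made-up medians and percentiles for three groups (illustration only)
med = [5  8 6];    % Medians
q25 = [3  7 2];    % 25th percentiles (absolute positions)
q75 = [9 10 7];    % 75th percentiles (absolute positions)

% Convert absolute percentile positions into wing lengths
lowerLen = med - q25;   % Distance from the median down to the 25th percentile
upperLen = q75 - med;   % Distance from the median up to the 75th percentile

figure
errorbar(1:3, med, lowerLen, upperLen, 'ks')   % Asymmetric error bars
set(gca, 'XLim', [0.5, 3.5])
```

Because the median always lies between the 25th and 75th percentiles, both wing lengths are non-negative, and the two wings generally differ in length when the data are skewed.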

You should see a Figure Window with median/IQR error bars.

EXERCISE 7: Copy the code for EXAMPLE 8 into a new cell. Add a line graph of the average monthly measles cases (black line, no markers or error bars). Update the legend appropriately.

EXERCISE 8: Create a new figure in which you plot the yearly averages for measles and chickenpox on the same graph. The graphs should have SD error bars.

## EXAMPLE 9: Plot IQR and MAD error bars on the same graph

Create a new cell in which you type and execute:

```
figure
hold on
errorbar(xPositions-0.1, measlesByMonthMedian./1000, ...
lowerDist./1000, upperDist./1000, 'm*')
errorbar(xPositions+0.1, measlesByMonthMedian./1000, ...
measlesByMonthMAD./1000, 'ks')
hold off
xlabel('Month');
ylabel('Median in thousands')
title('Measles cases in NYC: 1931-1971')
legend('IQR error bars', 'MAD error bars', 'Location', 'Northeast')
```

You should see a Figure Window with two sets of error bars.

## SUMMARY OF SYNTAX

| MATLAB syntax | Description |
| --- | --- |
| `errorbar(Y, E)` | Create a plot of the values of `Y` similar to `plot(Y)`. The corresponding values in `E` give the length of each wing of the error bars that extend above and below the corresponding values in `Y`. |
| `errorbar(X, Y, E)` | Create a plot similar to `errorbar(Y, E)` except that this function uses the values of `X` for the horizontal positions rather than using the integers 1, 2, ... . |
| `errorbar(X, Y, L, U)` | Create a plot similar to `errorbar(X, Y, E)` except that this function uses the values of `L` and `U` to determine the lengths of the lower and upper wings of the error bars, respectively. |
| `mad(X)` | Compute the average or mean absolute deviation for the array `X` along the first non-singleton dimension. For 2D arrays, this works down the rows (resulting in the mean absolute deviations of the columns). |
| `mad(X, 0, 1)` | Compute the average or mean absolute deviation for the array `X` along dimension 1 (resulting in the mean absolute deviations of the columns). Note: If the second argument is 1, the median absolute deviation is computed instead. |
| `mad(X, 0, 2)` | Compute the average or mean absolute deviation for the array `X` along dimension 2 (resulting in the mean absolute deviations of the rows). Note: If the second argument is 1, the median absolute deviation is computed instead. |
| `Y = prctile(X, p)` | Compute a vector of the percentiles of the vector `X`. The vector `p` specifies the percentiles. When `X` is a 2D array, the i-th row of `Y` contains the percentile `p(i)` of each column of `X`. |
| `std(X)` | Compute the sample standard deviation (normalized by N-1, the square root of the unbiased estimate of the population variance) for the array `X` along the first non-singleton dimension. For 2D arrays, this works down the rows (resulting in the standard deviations of the columns). |
| `std(X, 0, 1)` | Compute the sample standard deviation (normalized by N-1) for the array `X` along dimension 1 (resulting in the standard deviations of the columns). Note: If the second argument is 1, the population standard deviation (normalized by N) is computed instead. |
| `std(X, 0, 2)` | Compute the sample standard deviation (normalized by N-1) for the array `X` along dimension 2 (resulting in the standard deviations of the rows). Note: If the second argument is 1, the population standard deviation (normalized by N) is computed instead. |

This lesson was written by Kay A. Robbins of the University of Texas at San Antonio and last modified by Dawn Roberson on 26-Jan-2018. Please contact kay.robbins@utsa.edu with comments or suggestions. The photo shows the rate of measles vaccination worldwide (WHO 2007): http://en.wikipedia.org/wiki/File:Measles_vaccination_worldwide.png.