LESSON: Error bars questions

FOCUS QUESTION: How can I depict uncertainty and variability in data?


EXAMPLE 1: Load the data about New York contagious diseases

   load NYCDiseases.mat;

EXAMPLE 2: Compute the overall mean and standard deviation of measles and chickenPox

   measlesAver = mean(measles(:));
   measlesSD = std(measles(:), 1);
   chickenPoxAver = mean(chickenPox(:));
   chickenPoxSD = std(chickenPox(:), 1);

Questions Answers
What is std? The MATLAB std function finds the standard deviation of a list of values.
How do I compute the standard deviation of a list of numbers by hand? See the handout on statistical indicators for the formula.
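What does the 1 represent in an expression such as std(count(:), 1), where count is an array like measles? This is tricky. The second parameter of the std function specifies the type of standard deviation to compute, not the dimension. A value of 1 requests the ordinary standard deviation, which divides the sum of squared deviations by the number of values N. The function call std(count(:)), which is equivalent to std(count(:), 0) or std(count(:), 0, 1), instead divides by N - 1 and finds the unbiased estimator of the standard deviation of the column vector count(:).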
What does std(count, 1, 1) represent? The function call std(count, 1, 1), which is equivalent to std(count, 1), finds the ordinary sample standard deviation of the columns of count.
What is the difference between std(count, 1, 1) and std(count, 0, 1)? Both versions compute a value for each column (as specified by the third parameter). However, std(count, 0, 1) computes the unbiased estimate of the population standard deviation, while std(count, 1, 1) computes the actual sample standard deviation. We'll talk about population estimates later in the course. For now, we'll stick to the ordinary standard deviation.
How would I compute the ordinary standard deviation of the rows? Use std(count, 1, 2) to compute the ordinary standard deviation of the rows of count.
What happens if I use std(count, 2, 1)? When it is a scalar, the second argument of std must be either 0 or 1 (std also accepts a vector of weights in that position). A scalar such as 2 gives an error when the statement is executed.
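
The two normalizations can be checked on a small made-up array. The following is a minimal sketch, assuming a hypothetical 3 x 2 array named count (not the lesson data); it writes out both formulas by hand and compares them with std.

```matlab
% Hypothetical data for illustration only: 3 rows, 2 columns
count = [2, 10; 4, 20; 6, 60];
n = size(count, 1);                       % number of values in each column

% Deviations of each value from its column mean
dev = count - repmat(mean(count, 1), n, 1);

% Divide by n: the ordinary standard deviation, as in std(count, 1, 1)
sdOrdinary = sqrt(sum(dev.^2, 1) ./ n);

% Divide by n - 1: the default normalization, as in std(count, 0, 1)
sdUnbiased = sqrt(sum(dev.^2, 1) ./ (n - 1));

disp([sdOrdinary; std(count, 1, 1)])      % the two rows should agree
disp([sdUnbiased; std(count, 0, 1)])      % the two rows should agree
```

The value that divides by n - 1 is always the larger of the two, and the gap shrinks as the number of values grows.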

EXAMPLE 3: Compare the overall average and SD of monthly counts of measles and chicken pox

   hold on
   errorbar(1, measlesAver./1000, measlesSD./1000, 'rs');
   errorbar(2, chickenPoxAver./1000,chickenPoxSD./1000, 'ko');
   hold off
   ylabel('Cases (in thousands)')
   title('Childhood diseases NYC: 1931-1971 (SD error bars)')
   set(gca, 'XTick', 1:2, 'XTickMode', 'manual', ...
           'XTickLabelMode', 'manual', 'XTickLabel', {'Measles', 'Chicken Pox'})

Questions Answers
What are error bars? Error bars are a visual device that is generally used to convey uncertainty. However, you can choose to use the error bars to represent anything you wish.
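What do error bars actually represent in this example? Rather than depicting actual errors, these error bars indicate how widely the monthly case counts of each disease are spread around the mean.
What determines the location of the top and bottom of the error bars? In this example, errorbar is called with three numeric arguments: the first gives the x position, the second gives the center of the bar, and the third gives the distance from the center in either direction. In the two-argument form discussed in the following questions, the first argument specifies the centers and the second specifies the distances.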
How many lines and points does errorbar(X, Y) plot? The errorbar function plots a point for each element in X. Each column of X is treated as a data set, and its adjacent points are connected by lines of the same color. The plot shows a different line for each column of X. Note that if X and Y are vectors, errorbar draws one curve, regardless of whether these vectors are row vectors or column vectors.
Must X and Y have the same number of elements for errorbar(X, Y) to work? Yes, X and Y must be the same size.
Where are the error bars for errorbar(X, Y) located when X = [10, 30, 20, 10] and Y = [5, 2, 3, 1]? The tops of the error bars are located at 15, 32, 23, and 11, respectively. The bottoms of the error bars are located at 5, 28, 17, and 9, respectively. The markers of the error bars are located at 10, 30, 20, and 10, respectively.
Where are the error bars for errorbar(X, Y) located when X = [10; 30; 20; 10] and Y = [5; 2; 3; 1]? The results are the same as in the previous question.
What does 'XTick' designate? The string 'XTick' is an example of a property. Property arguments are always specified in pairs of property name followed by the property value. XTick specifies the locations of the tick marks on the horizontal axis. Its value should be a vector of locations.
What does 'XTickLabel' designate? XTickLabel is a property specifying the labels of tick marks. Often the value of the XTickLabel property is a vector of strings or a cell array of strings. In this example, it is a 1 x 2 cell array with strings naming the two diseases.
What is gca used for? The gca function returns a handle to the current axis, allowing us to access and set the axis properties from MATLAB programs.
How many properties can I change with a single call to set? The set function doesn't limit the number of properties that can be specified.
How do I find out what axis properties are available for setting? Use get(gca) to find out what properties are available for modification.
Will get(gca) give me all the properties of the figure? No, only the properties associated with the current axis are accessible. The figure window itself has its own properties (accessible by get(gcf)). Each axis on a figure with multiple axes has its own properties. In fact each object on the figure has its own properties. We'll discuss these properties later in the course.
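
The error bar positions from the questions above can be checked with a few lines of arithmetic. This is a small sketch using the X and Y values from that question; the tick labels 'A' through 'D' are hypothetical and only show how set works with a cell array.

```matlab
X = [10, 30, 20, 10];     % centers of the error bars (values from the question above)
Y = [5, 2, 3, 1];         % distance from each center to the end of each wing

tops = X + Y;             % upper ends of the error bars
bottoms = X - Y;          % lower ends of the error bars
disp([tops; bottoms])

% Two-argument form: the bars are drawn at horizontal positions 1:4
errorbar(X, Y, 'ko')
set(gca, 'XTick', 1:4, 'XTickLabel', {'A', 'B', 'C', 'D'})   % hypothetical labels
```

Because no x coordinates are given, errorbar places the bars at positions 1 through 4, which is why the tick marks are set at 1:4.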

EXAMPLE 4: Compute the mean and standard deviation of measles by year

   measlesByYearAver = mean(measles, 2);
   measlesByYearSD = std(measles, 1, 2);

EXAMPLE 5: Plot the SD error bars for measles monthly counts by year

   errorbar(years, measlesByYearAver./1000, measlesByYearSD./1000, 'ks');
   ylabel('Monthly averages (in thousands)')
   title('Measles NYC: 1931-1971 (SD error bars)')
   set(gca, 'YLimMode', 'manual', 'YLim', [0, 20])

EXAMPLE 6: Plot the SD error bars on a bar chart for measles

   hold on
   errorbar(years, measlesByYearAver./1000, measlesByYearSD./1000, 'ks');
   bar(years, measlesByYearAver./1000, 'FaceColor', [0.5, 0.5, 1])
   plot(years, measlesByYearAver./1000, 'LineStyle', 'none', ...
       'Marker', 's', 'MarkerEdgeColor','k', 'MarkerFaceColor','r')
   hold off
   ylabel('Monthly averages (in thousands)')
   title('Measles NYC: 1931-1971 (SD error bars)')
   set(gca, 'YLimMode', 'manual', 'YLim', [0, 20])

Questions Answers
What happens if I put the errorbar after the bar in this example? The lower wings of the error bars are visible on top of the bars.
What is 'FaceColor'? FaceColor is a property of bar that specifies the color of the bars. The color value should be a three-element row vector with values between 0 and 1 specifying how much red, green and blue, respectively, should make up the color. The example has a blue component that is twice as big as the red and green components. White corresponds to a vector of 3 ones, while black corresponds to a vector of 3 zeros.
How can I set the x positions of error bars? Call the errorbar function with three arguments: the x coordinates, the y coordinates, and the length of the error bar wings.
Why did this example explicitly set the error bar positions? Otherwise, the horizontal scale would be labeled with the values 1 .. 41 instead of the actual years.

EXAMPLE 7: Compute median, MAD and IQR by month for measles

   measlesByMonthMedian = median(measles, 1);
   measlesByMonthMAD = mad(measles, 1, 1);
   measlesByMonthIQR = prctile(measles, [25, 75]);

Questions Answers
Does measlesByMonthIQR really represent the interquartile range? No, the interquartile range (IQR) is the difference between the 75th percentile and the 25th percentile. The measlesByMonthIQR variable instead holds the two percentiles themselves: each column contains the 25th and 75th percentiles for one month.
Why is measlesByMonthIQR a 2 x 12 array? The measles variable has 12 columns, one for each month. The prctile function treats each column as a data set and finds the specified percentiles of each column. The result for each column is a column vector of size 2, since we asked for two percentile values. Putting the results together for the 12 data sets gives the 2 x 12 result.
What does measlesByMonthIQR(2, 1) represent? The value is the 75th percentile for the January monthly counts of measles.
What does 'Location' designate in the legend function? 'Location' is a property specifying where the legend should be placed on the axes. This property is useful for preventing the legend from overlapping with the graph.
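
The shape of the prctile result, and the step from percentiles to an actual interquartile range, can be seen on a small array. This is a minimal sketch, assuming a hypothetical 5 x 3 array named counts (5 years by 3 months) in place of the lesson data.

```matlab
% Hypothetical counts for illustration: 5 years (rows) by 3 months (columns)
counts = [1, 10, 100; 2, 20, 200; 3, 30, 300; 4, 40, 400; 5, 50, 500];

% Row 1 holds the 25th percentiles, row 2 holds the 75th percentiles
quartiles = prctile(counts, [25, 75]);
disp(size(quartiles))                    % 2 rows, one column per month

% The interquartile range itself is the difference between the two rows
iqrByMonth = quartiles(2, :) - quartiles(1, :);
disp(iqrByMonth)
```

Keeping the two percentiles separate, as the lesson does, is what makes it possible to draw error bars of different lengths above and below the median.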

EXAMPLE 8: Plot median monthly measles with IQR for error bars

   xPositions = 1:12;
   lowerDist = measlesByMonthMedian - measlesByMonthIQR(1, :);
   upperDist = measlesByMonthIQR(2, :) - measlesByMonthMedian;
   errorbar(xPositions, measlesByMonthMedian./1000, ...
       lowerDist./1000, upperDist./1000, '-m*')
   ylabel('Cases in thousands')
   title('Measles cases in NYC: 1931-1971')
   legend('Median (IQR error bars)', 'Location', 'Northeast')

Questions Answers
Why do the IQR error bars need different values for the distances above and below? The median does not necessarily fall in the center of the interquartile range. In other words, the 75th percentile minus the median does not necessarily have the same value as the median minus the 25th percentile.
Why did we connect the error bars with a line? Since there is a time relationship between the months, successive bars have a time ordering. The lines emphasize this ordering visually, but they aren't required.
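
The unequal wing lengths are easy to see with made-up numbers. The sketch below assumes hypothetical medians and quartiles for three months (the names med, q25 and q75 are illustrative and are not part of the lesson data) and passes separate lower and upper distances to errorbar.

```matlab
% Hypothetical medians and quartiles for three months (illustration only)
med = [10, 20, 15];
q25 = [8, 12, 14];
q75 = [15, 22, 19];

lowerDist = med - q25;    % length of the wing below each median
upperDist = q75 - med;    % length of the wing above each median
disp([lowerDist; upperDist])

% x positions, centers, lower lengths, upper lengths, then the line style
errorbar(1:3, med, lowerDist, upperDist, '-m*')
```

In the first month the upper wing is longer than the lower one, and in the second month the reverse is true, so a single symmetric distance could not describe either bar.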

EXAMPLE 9: Plot IQR and MAD error bars on the same graph

   hold on
   errorbar(xPositions-0.1, measlesByMonthMedian./1000, ...
                lowerDist./1000, upperDist./1000, 'm*')
   errorbar(xPositions+0.1, measlesByMonthMedian./1000, ...
                 measlesByMonthMAD./1000, 'ks')
   hold off
   ylabel('Median in thousands')
   title('Measles cases in NYC: 1931-1971')
   legend('IQR error bars', 'MAD error bars', 'Location', 'Northeast')

Questions Answers
Why did we not connect the error bars with a line? We could have connected each group separately but decided that it would be visually confusing to do so.

This lesson was written by Kay A. Robbins of the University of Texas at San Antonio and last modified by Dawn Roberson on 21-Jan-2014. Please contact kay.robbins@utsa.edu with comments or suggestions.