LESSON 13 QUESTIONS: Depicting confidence and significance

FOCUS QUESTION: How reliable is my estimate of population mean?

Contents


EXAMPLE 1: Load the Fisher iris data

   load fisheriris;


EXAMPLE 2: Compute the ends of the 95% confidence intervals for sepal length

   sepalLens = reshape(meas(:, 1), 50, 3);  % Make sepal lengths 50 x 3
   sampleSize = size(sepalLens, 1); % Number of rows (points in the sample)
   sLenMeans = mean(sepalLens); % Calculate the mean sepal length each species
   sLenSEMs = std(sepalLens)./ sqrt(sampleSize);
   sLenCIEnds = sLenSEMs.* 1.96; % Calculate the size of confidence interval

Questions Answers
What is a random sample? A random sample is a subset of values selected at random from a population. In common English usage, a sample often refers to a single value, but in statistics, a sample represents a group of values.
How do I estimate the true mean of a population if I only have one sample of that population? The usual way is to calculate the sample mean and use this value as an "estimate" of the true population mean. In MATLAB, the mean of the sample held in vector x is just mean(x).
How can I get a good estimate of the true population mean? Usually larger random samples result in better estimates of the true population mean. Many biological research projects use very small samples because of the difficulty or expense of doing the experiment. Making sure that the sample is picked at random is very important and difficult in practice. If you take a different sample, you will usually get a different estimate. Taking multiple random samples and pooling the estimates is useful too.
How do I estimate the true standard deviation of a population if I only have one sample of that population? The usual way is to calculate the unbiased estimator of the standard deviation. In MATLAB, the unbiased estimator of population standard deviation for a sample held in vector x is just std(x).
How is std(x) calculated? The std(x) is the unbiased sample estimate of the population standard deviation. The evaluation takes the square root of the squared error of the sample values from the sample mean divided by the length of the sample minus 1. See the handout on Populations and samples for the actual formulas.
Does std(x) represent the standard deviation of the sample in the vector x? No, std(x) is not the standard deviation of x.
How can I calculate the actual sample standard deviation of x? Use std(x, 1) to calculate the actual standard deviation of the values contained in the vector x.
How is std(x, 1) calculated? Calculate std(x, 1) by taking the square root of the variance of x. In other words, the standard deviation is just the square root of the MSE of the values from the mean.
Why not just use the sample standard deviation to estimate the population standard deviation? Statisticians have found that, on the whole, sample standard deviation underestimates the population standard deviation, especially for small samples. In other words, sample standard deviation is a "biased" estimator of population standard deviation. They came up with a fudge factor to give a better estimate.
What does SEM stand for? SEM stands for standard error of the mean.
What does SEM represent? SEM gives the standard deviation of different sample estimates of the population mean.
How can I estimate SEM if I have only one sample? Suppose the sample is in the vector x. Calculate SEM as std(x)./ sqrt(length(x)). In other words, estimate SEM by dividing an estimate of the population standard deviation by the square root of the sample size.


EXAMPLE 3: Output population estimates for Iris setosa

   fprintf('Population estimate of mean sepal length for Iris setosa: %g\n',  ...
           sLenMeans(1))
   fprintf('95%% confidence interval for this estimate: [%g, %g]\n', ...
           sLenMeans(1) - sLenCIEnds(1), sLenMeans(1) + sLenCIEnds(1));
   fprintf('Standard error (SE or SEM): %g\n', sLenSEMs(1));
   fprintf('Relative standard error (RSE): %g%%\n', ...
       100*sLenSEMs(1)./sLenMeans(1));
Population estimate of mean sepal length for Iris setosa: 5.006
95% confidence interval for this estimate: [4.90829, 5.10371]
Standard error (SE or SEM): 0.0498496
Relative standard error (RSE): 0.995796%

Questions Answers
What is the 95% confidence interval for the mean? Roughly, we are 95% certain that the true population mean falls within this interval. More specifically, If we take a large number of different samples and compute the confidence interval for each, these 95% of these confidence intervals will contain the true mean.
Does a confidence interval represent a single value? No, a confidence interval is always a vector of two values specifying a range. The first value is the smaller of the two values.
How do I estimate the 95% confidence interval for the population mean given a random sample of that population? Suppose the sample is in the vector x. Follow these steps:
  • Estimate the population mean, m (by using the sample mean, mean(x)).
  • Calculate the SEM (by using std(x)./sqrt(n) where n is the number of elements in x).
  • The confidence interval is just [m - SEM*1.96, m + SEM*1.96].
Does the calculation of the 95% confidence interval for the population mean make any assumptions about the population? Yes, the formula assumes that the population is normally distributed and may not be accurate for populations that are not normally distributed. Furthermore, the sample must be drawn at random from the population.
Why does the fprintf statement use two percent signs (%%) in a row to print out a single percent sign? Normally fprintf interprets the percent sign (%) as the start of a format specifier for a variable value. Two percent signs in a row tell MATLAB to interpret these characters as a single percent rather than as the beginning of a format specifier.


EXAMPLE 4: Plot mean sepal length using 95% confidence interval error bars

   irisSpecies = {'Setosa', 'Virginica', 'Versicolor'}; % Use a standard legend
   irisTitle = 'Comparison of three iris species';      % Use a standard title
   figure
   errorbar(sLenMeans, sLenCIEnds, 'ks'); % Error bars use black squares
   set(gca, 'XTick', 1:3, 'XTickLabel', irisSpecies) % Set ticks and tick labels
   xlabel('Species of Iris');
   ylabel('Sepal length in mm')
   title(irisTitle)
   legend('Mean (95% CI error bars)', 'Location', 'Southeast') % Put in lower right

Questions Answers
What do the error bars on this graph represent? The error bars represent represent the 95% confidence intervals for the estimates of the true population mean sepal lengths of three iris species.
What can I say about the average sepal length of Iris versicolor? Roughly, we are 95% certain that the average sepal length of Iris versicolor is between 6.4 mm and 6.8 mm. The statistical interpretation is that 95% of the samples give confidence intervals that contain the true mean. In other words, there is a 95% probability that this sample is one of those "good" samples. In this case the true mean will fall in the calculated confidence (be between 6.4 mm and 6.8 mm).
How does Iris versicolor compare with the other two species? The error bars for the other two species are lower and smaller. These observations indicate that Iris versicolor has longer sepals than the other two species (e.g., the error bar for Iris versicolor is higher). Iris versicolor sepals also appear to vary more in length than sepals of the other species (e.g., the error bar for Iris versicolor is longer). Furthermore, none of the error bars overlap indicating that the three species are characterized by different length sepals.
Why wasn't sLength95CIEnds explicitly added to and subtracted from sLengthMeans to compute the ends of the confidence intervals as in EXAMPLE 3? The errorbar function expects the center position of the bar and the distance above (or below) the bar. This function takes care of the computation performed manually in EXAMPLE 3.


EXAMPLE 5: Plot mean sepal length with SEM and 95% confidence interval error bars

   xPositions = [1, 2, 3];  % Use these as base x-axis error bar positions
   figure
   hold on
   errorbar(xPositions-0.1, sLenMeans, sLenSEMs, 'g^') % Green up triangles
   errorbar(xPositions+0.1, sLenMeans, sLenCIEnds, 'bv') % Blue down triangles
   set(gca, 'XTick', 1:3, 'XTickLabel', irisSpecies) % Set ticks and tick labels
   xlabel('Species of Iris');
   ylabel('Sepal length in mm')
   title(irisTitle)
   legend({'Mean (SEM error bars)', 'Mean (95% CI error bars)'}, 'Location', 'Southeast')
   hold off

Questions Answers
Are the SEM error bars always smaller than the 95% confidence intervals computed from the SEMs? Yes, they are approximately half as long.
Why are the x positions of these error bars adjusted by +/- 0.1? If you don't adjust the x values or you omit the x positions for the error bars entirely, MATLAB plots both sets of bars at the positions 1, 2 and 3, respectively. The overplotted error bars are hard to distinguish.
Why an offset of 0.1? The size of the offset is up to you. Pick an offset that is small enough so that the error bars for each species are close together and distinguishable from those of the other species.


EXAMPLE 6: Load the circadian rhythm data from Lab 1

   load circadian.mat


EXAMPLE 7: Estimate hourly temperature mean and 95% CI (horse 1)

   hPtsPerDay = 12;                              % 12 data points per day
   hSampleSz = length(horse(:, 1))./hPtsPerDay;     % Sample size
   horse1 = reshape(horse(:, 1), hPtsPerDay, hSampleSz); % Extract and reshape data from first horse
   horse1Means = mean(horse1, 2);         % Average for each hour over all days
   horse1STDs = std(horse1, 0, 2);        % Unbiased sd for each hour over all days
   horse1SEMs = horse1STDs./sqrt(hSampleSz); % Standard error of mean for each hour over all days
   horse1CIEnds = horse1SEMs.* 1.96;      % 95% confidence interval size

Questions Answers
How big is horse(:, 1)? The horse(:, 1) array has 120 elements corresponding to data taken every two hours for 10 days.
What is the shape of horse1? The horse1 array has 12 rows and 10 columns. Each row corresponds to an hour of the day, and each column corresponds to a day of the experiment.
What is the purpose of reshaping the data for horse 1? You want to compute the average for each hour over all days of the experiment. After reshaping, each row of horse1 holds the measurements made at a particular time of day. Simply average each row of horse1.
Why average over all days of the experiment? The assumption is that horse body temperature is a function of time of day. The measurements at a particular time should be clustered around a true value for that time, but are subject to experimental noise (with some being higher and some lower). The errors should cancel out in averaging.
How do I think of this averaging in terms of populations and samples? Again, the assumption is that the horse body temperature depends on the time of day. For a particular time of day, the horse has a true value of body temperature. An actual measured value will vary from the true value due to biological variability and experimental noise. To compensate for this variability, measure the temperature at this time on several different days (the sample). Use this sample to predict the true population mean (the average body temperature at this time over all possible days).
What is the population in this example? You can argue that the population is the temperature at this time for this horse over all days. Alternatively, you can argue that the population is all horses or all female horses.
How do the samples correspond to horse1? Each row of horse1 correspond to a sample from a different experiment. The horse1 array actually represents 12 different experiments. Row 1 corresponds to the experiment "measuring the horse's body temperature at 8 am over 10 different days". Similarly, row 2 corresponds to "measuring the horse's body temperature at 10 am over 10 different days".


EXAMPLE 8: Plot hourly mean temperature using 95% CI error bars (1 horse)

   hHours = horseHours(1:hPtsPerDay);        % Extract hour of day for means
   figure
   errorbar(hHours, horse1Means, horse1CIEnds, '-ks');  % Error bars use black squares
   xlabel('Hours (from midnight)');
   ylabel('Body temperature ( ^oC)')
   title('Mean temperature variation of thoroughbred horse (10-day average)')
   legend('Mean 1 horse (95% CI error bars)', 'Location', 'Southeast') % Put in lower right

Questions Answers
Why did this example define the hHours variable? The data started at 8 am rather than at 1 am and is spaced 2 hours apart rather than 1 hour apart. If you don't provide x values, the errorbar function plots the error bars at positions 1, 2, ... rather than at positions 8, 10, ... . You would then have to hand-label the tick marks to correct. As a rule, you should try to explicitly set your x axis values.


EXAMPLE 9: Estimate hourly temperature mean and 95% CI (5 horses)

   hAllSampleSz = length(horse(:))./hPtsPerDay;     % Sample size when all horses used
   horseR = reshape(horse, hPtsPerDay, hAllSampleSz); % Each row corresponds to an hour
   horseMeans = mean(horseR, 2);
   horseSTDs = std(horseR, 0, 2);
   horseSEMs = horseSTDs./sqrt(hAllSampleSz);
   horseCIEnds = horseSEMs.* 1.96;

Questions Answers
How big is horseR? The horse array has 120 rows and 5 columns for a total of 600 elements. After the reshape, the horseR array has 12 rows and 50 columns. Each row corresponds to a specific hour of the day (starting with 8 am). Each column corresponds to measurements of body temperature at the time. There are 50 measurements (5 horses measured for 10 days).
What is the purpose of reshaping the data? You want to compute the average for each hour over all horses for all days of the experiment. After reshaping, each row of horseR holds the measurements made at a particular time of day. Simply average each row of horseR.
Why average over all days of the experiment? The assumption is that horse body temperature is a function of time of day. The measurements at a particular time should be clustered around a true value for that time, but are subject to experimental noise (with some being higher and some lower). The errors should cancel out in the average.
How do I think of this averaging in terms of populations and samples? Again, the assumption is that the horse body temperature depends on the time of day. For a particular time of day, horses have a particular value of the body temperature on average. To get a true indication of the population average, measure the temperature at this time on several different days for several different horses (the sample). Use this sample to predict the true population mean (the average temperature at this time over all days).
What is the population in this example? You can argue that the population is all thoroughbred horses or maybe all female thoroughbred horses.
How do the samples correspond to horseR? Each row of horseR correspond to a sample from a different experiment. The horseR array actually represents 12 different experiments. Row 1 corresponds to the experiment "measuring horse body temperature at 8 am for 5 horses over 10 different days". Similarly, row 2 corresponds to "measuring horse body temperature at 10 am for 5 horses over 10 different days".


EXAMPLE 10: Plot hourly mean temperature using 95% CI error bars (5 horses)

   figure
   errorbar(hHours, horseMeans, horseCIEnds, '-ro');  % Error bars use red circles
   xlabel('Hours (from midnight)');
   ylabel('Body temperature ( ^oC)')
   title('Mean temperature variation of thoroughbred horse (10-day average)')
   legend('Mean 5 horses (95% CI error bars)', 'Location', 'Southeast') % Put in lower right

Questions Answers
Why are the error bars smaller than in EXAMPLE 8? The error bars represent the 95% confidence intervals in both examples. However, the sample size for EXAMPLE 8 is 10, while the sample size of EXAMPLE 10 is 50. Since the SEM is the population standard deviation divided by the square root of the sample size, you would expect the error bars to be smaller for EXAMPLE 10.
Are the SEM error bars always smaller if the sample size is bigger? Although this is the expected behavior, it is not guaranteed. The larger sample could include some outliers, making the standard deviation estimate larger. This effect could cancel out the effect of the larger sample size. If you observe such a case, you should look very carefully at the data, because it is likely that it doesn't follow a nice normal distribution.
Why does the curve of EXAMPLE 10 look smoother than the curve of EXAMPLE 8? Again, the assumption is that the body temperature of the horse is a smooth function of time of day. Measurements of a particular horse on a particular day will vary from this relationship, with some being higher and some lower (provided the experiment doesn't have some bias in a particular direction). Thus, by averaging over multiple horses and multiple days, you would expect these variations to be smoothed out. If you observe that this isn't the case for your experiments, then you want to look for sources of bias or other errors.


EXAMPLE 11: Show results from 1 horse and 5 horses on same graph

   figure
   hold on
   errorbar(hHours, horse1Means, horse1CIEnds, '-ks');
   errorbar(hHours, horseMeans, horseCIEnds, '-ro');
   xlabel('Hours (from midnight)');
   ylabel('Body temperature ( ^oC)')
   title('Mean temperature variation of thoroughbred horse (10-day average)')
   legend({'Mean 1 horse (95% CI error bars)' 'Mean 5 horses (95% CI error bars)'}, ...
       'Location', 'Southeast') % Put in lower right


This lesson was written by Kay A. Robbins of the University of Texas at San Antonio and last modified on 31-Dec-2010. Please contact krobbins@cs.utsa.edu with comments or suggestions. The image is a cladogram showing the ancestry of the mammals discussed in Richard Dawkins' book The Ancestor's Tale and generated by Fred Hsu from ITOL: Interactive Tree of Life (http://itol.embl.de/). See http://en.wikipedia.org/wiki/The_Ancestor%27s_Tale for a description of The Ancestor's Tale.