LESSON 13 QUESTIONS: Depicting confidence and significance
FOCUS QUESTION: How reliable is my estimate of population mean?
Contents
- EXAMPLE 1: Load the Fisher iris data
- EXAMPLE 2: Compute the ends of the 95% confidence intervals for sepal length
- EXAMPLE 3: Output population estimates for Iris setosa
- EXAMPLE 4: Plot mean sepal length using 95% confidence interval error bars
- EXAMPLE 5: Plot mean sepal length with SEM and 95% confidence interval error bars
- EXAMPLE 6: Load the circadian rhythm data from Lab 1
- EXAMPLE 7: Estimate hourly temperature mean and 95% CI (horse 1)
- EXAMPLE 8: Plot hourly mean temperature using 95% CI error bars (1 horse)
- EXAMPLE 9: Estimate hourly temperature mean and 95% CI (5 horses)
- EXAMPLE 10: Plot hourly mean temperature using 95% CI error bars (5 horses)
- EXAMPLE 11: Show results from 1 horse and 5 horses on same graph
EXAMPLE 1: Load the Fisher iris data
load fisheriris;
EXAMPLE 2: Compute the ends of the 95% confidence intervals for sepal length
sepalLens = reshape(meas(:, 1), 50, 3); % Make sepal lengths 50 x 3 sampleSize = size(sepalLens, 1); % Number of rows (points in the sample) sLenMeans = mean(sepalLens); % Calculate the mean sepal length each species sLenSEMs = std(sepalLens)./ sqrt(sampleSize); sLenCIEnds = sLenSEMs.* 1.96; % Calculate the size of confidence interval
| Questions | Answers |
| What is a random sample? | A random sample is a subset of values selected at random from a population. In common English usage, a sample often refers to a single value, but in statistics, a sample represents a group of values. |
| How do I estimate the true mean of a population if I only have one sample of that population? | The usual way is to calculate the sample mean and use
this value as an "estimate" of the true population mean. In MATLAB,
the mean of the sample held in vector x is just
mean(x). |
| How can I get a good estimate of the true population mean? | Usually larger random samples result in better estimates of the true population mean. Many biological research projects use very small samples because of the difficulty or expense of doing the experiment. Making sure that the sample is picked at random is very important and difficult in practice. If you take a different sample, you will usually get a different estimate. Taking multiple random samples and pooling the estimates is useful too. |
| How do I estimate the true standard deviation of a population if I only have one sample of that population? | The usual way is to calculate the unbiased estimator of the
standard deviation. In MATLAB, the unbiased estimator of population
standard deviation for a sample held in vector x is just
std(x). |
How is std(x) calculated? |
The std(x) is the unbiased sample estimate of
the population standard deviation. The evaluation takes the square root of the squared
error of the sample values from the sample mean divided by
the length of the sample minus 1. See the handout on Populations
and samples for the actual formulas. |
Does std(x) represent
the standard deviation of the sample in the vector x? |
No, std(x) is not the standard deviation of x. |
How can I calculate the actual sample standard deviation of x? |
Use std(x, 1) to calculate the actual standard
deviation of the values contained in the vector x. |
How is std(x, 1) calculated? |
Calculate std(x, 1) by taking the square root of the
variance of x.
In other words, the standard deviation is just the square root of
the MSE of the values from the mean. |
| Why not just use the sample standard deviation to estimate the population standard deviation? | Statisticians have found that, on the whole, sample standard deviation underestimates the population standard deviation, especially for small samples. In other words, sample standard deviation is a "biased" estimator of population standard deviation. They came up with a fudge factor to give a better estimate. |
| What does SEM stand for? | SEM stands for standard error of the mean. |
| What does SEM represent? | SEM gives the standard deviation of different sample estimates of the population mean. |
| How can I estimate SEM if I have only one sample? | Suppose the sample is in the vector x.
Calculate SEM as std(x)./ sqrt(length(x)). In other
words, estimate SEM by dividing
an estimate of the population standard deviation
by the square root of the sample size. |
EXAMPLE 3: Output population estimates for Iris setosa
fprintf('Population estimate of mean sepal length for Iris setosa: %g\n', ... sLenMeans(1)) fprintf('95%% confidence interval for this estimate: [%g, %g]\n', ... sLenMeans(1) - sLenCIEnds(1), sLenMeans(1) + sLenCIEnds(1)); fprintf('Standard error (SE or SEM): %g\n', sLenSEMs(1)); fprintf('Relative standard error (RSE): %g%%\n', ... 100*sLenSEMs(1)./sLenMeans(1));
Population estimate of mean sepal length for Iris setosa: 5.006 95% confidence interval for this estimate: [4.90829, 5.10371] Standard error (SE or SEM): 0.0498496 Relative standard error (RSE): 0.995796%
| Questions | Answers |
| What is the 95% confidence interval for the mean? | Roughly, we are 95% certain that the true population mean falls within this interval. More specifically, If we take a large number of different samples and compute the confidence interval for each, these 95% of these confidence intervals will contain the true mean. |
| Does a confidence interval represent a single value? | No, a confidence interval is always a vector of two values specifying a range. The first value is the smaller of the two values. |
| How do I estimate the 95% confidence interval for the population mean given a random sample of that population? | Suppose the sample is in the vector x. Follow these steps:
|
| Does the calculation of the 95% confidence interval for the population mean make any assumptions about the population? | Yes, the formula assumes that the population is normally distributed and may not be accurate for populations that are not normally distributed. Furthermore, the sample must be drawn at random from the population. |
Why does the fprintf statement use two
percent signs (%%) in a row to print out a single percent sign? |
Normally fprintf interprets the percent sign (%)
as the start of a format specifier for a variable value.
Two percent signs in a row tell MATLAB to interpret these
characters as a single percent rather than as the beginning of
a format specifier. |
EXAMPLE 4: Plot mean sepal length using 95% confidence interval error bars
irisSpecies = {'Setosa', 'Virginica', 'Versicolor'}; % Use a standard legend
irisTitle = 'Comparison of three iris species'; % Use a standard title
figure
errorbar(sLenMeans, sLenCIEnds, 'ks'); % Error bars use black squares
set(gca, 'XTick', 1:3, 'XTickLabel', irisSpecies) % Set ticks and tick labels
xlabel('Species of Iris');
ylabel('Sepal length in mm')
title(irisTitle)
legend('Mean (95% CI error bars)', 'Location', 'Southeast') % Put in lower right
| Questions | Answers |
| What do the error bars on this graph represent? | The error bars represent represent the 95% confidence intervals for the estimates of the true population mean sepal lengths of three iris species. |
| What can I say about the average sepal length of Iris versicolor? | Roughly, we are 95% certain that the average sepal length of Iris versicolor is between 6.4 mm and 6.8 mm. The statistical interpretation is that 95% of the samples give confidence intervals that contain the true mean. In other words, there is a 95% probability that this sample is one of those "good" samples. In this case the true mean will fall in the calculated confidence (be between 6.4 mm and 6.8 mm). |
| How does Iris versicolor compare with the other two species? | The error bars for the other two species are lower and smaller. These observations indicate that Iris versicolor has longer sepals than the other two species (e.g., the error bar for Iris versicolor is higher). Iris versicolor sepals also appear to vary more in length than sepals of the other species (e.g., the error bar for Iris versicolor is longer). Furthermore, none of the error bars overlap indicating that the three species are characterized by different length sepals. |
Why wasn't
sLength95CIEnds explicitly added to and
subtracted from sLengthMeans to
compute the ends of the confidence intervals as in EXAMPLE 3? |
The errorbar function expects the center position
of the bar and the distance above (or below) the bar. This function
takes care of the computation performed manually in EXAMPLE 3. |
EXAMPLE 5: Plot mean sepal length with SEM and 95% confidence interval error bars
xPositions = [1, 2, 3]; % Use these as base x-axis error bar positions figure hold on errorbar(xPositions-0.1, sLenMeans, sLenSEMs, 'g^') % Green up triangles errorbar(xPositions+0.1, sLenMeans, sLenCIEnds, 'bv') % Blue down triangles set(gca, 'XTick', 1:3, 'XTickLabel', irisSpecies) % Set ticks and tick labels xlabel('Species of Iris'); ylabel('Sepal length in mm') title(irisTitle) legend({'Mean (SEM error bars)', 'Mean (95% CI error bars)'}, 'Location', 'Southeast') hold off
| Questions | Answers |
| Are the SEM error bars always smaller than the 95% confidence intervals computed from the SEMs? | Yes, they are approximately half as long. |
| Why are the x positions of these error bars adjusted by +/- 0.1? | If you don't adjust the x values or you omit the x positions for the error bars entirely, MATLAB plots both sets of bars at the positions 1, 2 and 3, respectively. The overplotted error bars are hard to distinguish. |
| Why an offset of 0.1? | The size of the offset is up to you. Pick an offset that is small enough so that the error bars for each species are close together and distinguishable from those of the other species. |
EXAMPLE 6: Load the circadian rhythm data from Lab 1
load circadian.mat
EXAMPLE 7: Estimate hourly temperature mean and 95% CI (horse 1)
hPtsPerDay = 12; % 12 data points per day hSampleSz = length(horse(:, 1))./hPtsPerDay; % Sample size horse1 = reshape(horse(:, 1), hPtsPerDay, hSampleSz); % Extract and reshape data from first horse horse1Means = mean(horse1, 2); % Average for each hour over all days horse1STDs = std(horse1, 0, 2); % Unbiased sd for each hour over all days horse1SEMs = horse1STDs./sqrt(hSampleSz); % Standard error of mean for each hour over all days horse1CIEnds = horse1SEMs.* 1.96; % 95% confidence interval size
| Questions | Answers |
How big is horse(:, 1)? |
The horse(:, 1) array has 120 elements corresponding
to data taken every two hours for 10 days. |
What is the shape of horse1? |
The horse1 array has 12 rows and 10 columns. Each
row corresponds to an hour of the day, and each column corresponds
to a day of the experiment. |
| What is the purpose of reshaping the data for horse 1? | You want to compute the average for each hour over all days
of the experiment. After reshaping, each row of
horse1 holds the measurements made at
a particular time of day. Simply average each row of horse1. |
| Why average over all days of the experiment? | The assumption is that horse body temperature is a function of time of day. The measurements at a particular time should be clustered around a true value for that time, but are subject to experimental noise (with some being higher and some lower). The errors should cancel out in averaging. |
| How do I think of this averaging in terms of populations and samples? | Again, the assumption is that the horse body temperature depends on the time of day. For a particular time of day, the horse has a true value of body temperature. An actual measured value will vary from the true value due to biological variability and experimental noise. To compensate for this variability, measure the temperature at this time on several different days (the sample). Use this sample to predict the true population mean (the average body temperature at this time over all possible days). |
| What is the population in this example? | You can argue that the population is the temperature at this time for this horse over all days. Alternatively, you can argue that the population is all horses or all female horses. |
How do the samples correspond to horse1? |
Each row of horse1 correspond to a sample from a
different experiment. The horse1 array actually represents
12 different experiments. Row 1 corresponds to the experiment
"measuring the horse's body temperature at 8 am over 10 different days".
Similarly, row 2 corresponds to "measuring the horse's body temperature
at 10 am over 10 different days".
|
EXAMPLE 8: Plot hourly mean temperature using 95% CI error bars (1 horse)
hHours = horseHours(1:hPtsPerDay); % Extract hour of day for means figure errorbar(hHours, horse1Means, horse1CIEnds, '-ks'); % Error bars use black squares xlabel('Hours (from midnight)'); ylabel('Body temperature ( ^oC)') title('Mean temperature variation of thoroughbred horse (10-day average)') legend('Mean 1 horse (95% CI error bars)', 'Location', 'Southeast') % Put in lower right
| Questions | Answers |
Why did this example define the hHours variable? |
The data started at 8 am rather than at 1 am and is spaced
2 hours apart rather than 1 hour apart. If you don't provide
x values, the errorbar function plots the error bars
at positions 1, 2, ... rather than at positions 8, 10, ... . You
would then have to hand-label the tick marks to correct. As a rule,
you should try to explicitly set your x axis values. |
EXAMPLE 9: Estimate hourly temperature mean and 95% CI (5 horses)
hAllSampleSz = length(horse(:))./hPtsPerDay; % Sample size when all horses used horseR = reshape(horse, hPtsPerDay, hAllSampleSz); % Each row corresponds to an hour horseMeans = mean(horseR, 2); horseSTDs = std(horseR, 0, 2); horseSEMs = horseSTDs./sqrt(hAllSampleSz); horseCIEnds = horseSEMs.* 1.96;
| Questions | Answers |
How big is horseR? |
The horse array has 120 rows and 5 columns for
a total of 600 elements. After the reshape,
the horseR array has 12 rows and 50 columns.
Each row corresponds to a specific hour of the day (starting with
8 am). Each column corresponds to measurements of body temperature
at the time. There are 50 measurements (5 horses measured for 10 days). |
| What is the purpose of reshaping the data? | You want to compute the average for each hour over all horses for all days
of the experiment. After reshaping, each row of
horseR holds the measurements made at
a particular time of day. Simply average each row of horseR. |
| Why average over all days of the experiment? | The assumption is that horse body temperature is a function of time of day. The measurements at a particular time should be clustered around a true value for that time, but are subject to experimental noise (with some being higher and some lower). The errors should cancel out in the average. |
| How do I think of this averaging in terms of populations and samples? | Again, the assumption is that the horse body temperature depends on the time of day. For a particular time of day, horses have a particular value of the body temperature on average. To get a true indication of the population average, measure the temperature at this time on several different days for several different horses (the sample). Use this sample to predict the true population mean (the average temperature at this time over all days). |
| What is the population in this example? | You can argue that the population is all thoroughbred horses or maybe all female thoroughbred horses. |
How do the samples correspond to horseR? |
Each row of horseR correspond to a sample from a
different experiment. The horseR array actually represents
12 different experiments. Row 1 corresponds to the experiment
"measuring horse body temperature at 8 am for 5 horses
over 10 different days".
Similarly, row 2 corresponds to "measuring horse body temperature
at 10 am for 5 horses over 10 different days".
|
EXAMPLE 10: Plot hourly mean temperature using 95% CI error bars (5 horses)
figure errorbar(hHours, horseMeans, horseCIEnds, '-ro'); % Error bars use red circles xlabel('Hours (from midnight)'); ylabel('Body temperature ( ^oC)') title('Mean temperature variation of thoroughbred horse (10-day average)') legend('Mean 5 horses (95% CI error bars)', 'Location', 'Southeast') % Put in lower right
| Questions | Answers |
| Why are the error bars smaller than in EXAMPLE 8? | The error bars represent the 95% confidence intervals in both examples. However, the sample size for EXAMPLE 8 is 10, while the sample size of EXAMPLE 10 is 50. Since the SEM is the population standard deviation divided by the square root of the sample size, you would expect the error bars to be smaller for EXAMPLE 10. |
| Are the SEM error bars always smaller if the sample size is bigger? | Although this is the expected behavior, it is not guaranteed. The larger sample could include some outliers, making the standard deviation estimate larger. This effect could cancel out the effect of the larger sample size. If you observe such a case, you should look very carefully at the data, because it is likely that it doesn't follow a nice normal distribution. |
| Why does the curve of EXAMPLE 10 look smoother than the curve of EXAMPLE 8? | Again, the assumption is that the body temperature of the horse is a smooth function of time of day. Measurements of a particular horse on a particular day will vary from this relationship, with some being higher and some lower (provided the experiment doesn't have some bias in a particular direction). Thus, by averaging over multiple horses and multiple days, you would expect these variations to be smoothed out. If you observe that this isn't the case for your experiments, then you want to look for sources of bias or other errors. |
EXAMPLE 11: Show results from 1 horse and 5 horses on same graph
figure hold on errorbar(hHours, horse1Means, horse1CIEnds, '-ks'); errorbar(hHours, horseMeans, horseCIEnds, '-ro'); xlabel('Hours (from midnight)'); ylabel('Body temperature ( ^oC)') title('Mean temperature variation of thoroughbred horse (10-day average)') legend({'Mean 1 horse (95% CI error bars)' 'Mean 5 horses (95% CI error bars)'}, ... 'Location', 'Southeast') % Put in lower right
This lesson was written by Kay A. Robbins of the University of Texas at San Antonio and last modified on 31-Dec-2010. Please contact krobbins@cs.utsa.edu with comments or suggestions. The image is a cladogram showing the ancestry of the mammals discussed in Richard Dawkins' book The Ancestor's Tale and generated by Fred Hsu from ITOL: Interactive Tree of Life (http://itol.embl.de/). See http://en.wikipedia.org/wiki/The_Ancestor%27s_Tale for a description of The Ancestor's Tale.