LESSON: Basic Statistics

FOCUS QUESTION: How can I find typical characteristics and central tendencies of data?

This lesson shows you how to calculate and display statistical indicators such as mean, median, maximum and minimum.

In this lesson you will:
  • Calculate the mean, median, max, min, and standard deviation of various data groupings.
  • Output results to the Command Window.

[Image: Measles rash]

DATA FOR THIS LESSON

File: NYCDiseases.mat
Description: The data set contains the monthly totals of new cases of measles, mumps, and chicken pox in New York City for the years 1931-1971.

The file is organized into the following variables:

  • measles - an array containing the monthly cases of measles
  • mumps - an array containing the monthly cases of mumps
  • chickenPox - an array containing the monthly cases of chicken pox
  • years - a vector containing the years 1931 through 1971

The data was extracted from the Hipel-McLeod Time Series Datasets Collection, available at http://www.stats.uwo.ca/faculty/aim/epubs/mhsets/readme-mhsets.html.

The data was first published in: Yorke, J.A. and London, W.P. (1973). "Recurrent Outbreaks of Measles, Chickenpox and Mumps", American Journal of Epidemiology, Vol. 98, pp. 469.

SETUP FOR LESSON

EXAMPLE 1: Load the data about New York contagious diseases

Create a new cell in which you type and execute:

   load NYCDiseases.mat;    % Load the disease data

You should see the variables measles, mumps, chickenPox, and years in the Workspace Browser.

EXAMPLE 2: Calculate overall average monthly mumps cases

Create a new cell in which you type and execute:

   mumpsAver = mean(mumps(:));   % Average of entire array

You should see a new variable mumpsAver in the Workspace Browser. It holds the overall average number of mumps cases per month.
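
To see what the (:) contributes, here is a minimal sketch using a small made-up array (the values in X are hypothetical and are not taken from NYCDiseases.mat). Without (:), mean returns one average per column. With (:), the array is first reshaped into a single column, so mean returns one overall value.

```matlab
% Hypothetical 2-by-3 array, used only for illustration
X = [1 2 3; 4 5 6];

colAver = mean(X);       % One average per column: [2.5 3.5 4.5]
allAver = mean(X(:));    % Average of all six values: 3.5
```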

EXERCISE 1: Create a variable to hold the overall average of the traffic data in count.dat.

EXAMPLE 3: Output the overall monthly average number of mumps cases

Create a new cell in which you type and execute:

   fprintf('Average mumps cases per month: %g\n', mumpsAver);

You should see the following output in the Command Window:

Average mumps cases per month: 502.24
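
The %g format prints up to six significant digits by default, drops trailing zeros, and switches to exponential notation for large values. The sketch below illustrates this with sprintf, which follows the same format rules as fprintf but returns the text instead of printing it. The input numbers are made up for illustration.

```matlab
% Hypothetical values, chosen only to show how %g behaves
s1 = sprintf('%g', 502.2397);   % '502.24'      (rounded to six significant digits)
s2 = sprintf('%g', 380.5);      % '380.5'       (no trailing zeros added)
s3 = sprintf('%g', 1234567);    % '1.23457e+06' (exponential notation)
```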

EXERCISE 2: Output the overall average traffic computed in EXERCISE 1 on the same line with an informative message.

EXAMPLE 4: Output the overall median, maximum, and minimum monthly mumps cases

Create a new cell in which you type and execute:

   fprintf('mumps: median = %g [max = %g and min = %g]\n', ...
            median(mumps(:)), max(mumps(:)), min(mumps(:)));

You should see the following output in the Command Window:

mumps: median = 380.5 [max = 1956 and min = 50]

EXERCISE 3: The mumps mean is about 500, but the median is about 380. That is a large difference. What does it tell you about the data?

EXAMPLE 5: Calculate the averages of mumps by month and by year

Create a new cell in which you type and execute:

   mumpsMonthAver = mean(mumps, 1);     % Average down each column (one value per month)
   mumpsYearAver = mean(mumps, 2);      % Average across each row (one value per year)

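You should see two new variables in the Workspace Browser: mumpsMonthAver, a row vector with one average for each of the 12 months, and mumpsYearAver, a column vector with one average for each of the 41 years.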
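
The effect of the dimension argument is easiest to check on a small array. The following is a minimal sketch with made-up numbers (T is a hypothetical array, not part of the lesson data): averaging along dimension 1 collapses the rows and leaves a row vector, while averaging along dimension 2 collapses the columns and leaves a column vector.

```matlab
% Hypothetical 2-by-3 array, used only for illustration
T = [10 20 30; 30 40 50];

colAver = mean(T, 1);    % 1-by-3 row vector:    [20 30 40]
rowAver = mean(T, 2);    % 2-by-1 column vector: [20; 40]
```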

EXERCISE 4: Define variables for the averages of the traffic data by hour and by intersection.

EXAMPLE 6: Output the individual monthly averages of mumps

Create a new cell in which you type and execute:

   fprintf('Monthly averages of mumps: [ '); % Output a leading string
   fprintf('%g ', mumpsMonthAver);           % Output each element
   fprintf(']\n');                           % Output last ] and newline

You should see the following output in the Command Window:

Monthly averages of mumps: [ 492.146 574.049 842.951 912.073 879.805 786.268 453.585 229.463 146.463 152.878 218.61 338.585 ]
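
A single '%g ' can print a whole vector because fprintf reuses the format for every remaining value when it receives more values than the format has conversion specifiers. The sketch below shows the same behavior with sprintf, which applies the same rules but returns the text. The vector v and the variable names are hypothetical.

```matlab
% Hypothetical vector, used only for illustration
v = [1.5 2 3.25];

s = sprintf('%g ', v);    % Format is reused for each element: '1.5 2 3.25 '
txt = ['[ ' s ']'];       % Add the brackets: '[ 1.5 2 3.25 ]'
```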

EXERCISE 5: Output hourly averages of traffic.

EXAMPLE 7: Calculate and output the monthly maximum of mumps by year

Create a new cell in which you type and execute:

   fprintf('Yearly maxima of mumps: [ ');
   fprintf('%g ', max(mumps, [], 2));
   fprintf(']\n');

You should see the following output in the Command Window:

Yearly maxima of mumps: [ 329 901 1604 547 1938 668 1200 1738 555 1485 1261 1272 1070 859 793 1138 883 1956 596 1342 1003 838 1659 1220 945 1844 769 774 1183 754 717 803 1078 342 926 1020 500 607 639 527 300 ]
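
The empty brackets in max(mumps, [], 2) are a placeholder for the second argument. If they are omitted, max treats the 2 as a value to compare against rather than as a dimension. The following minimal sketch uses a made-up array (X is hypothetical) to show the difference.

```matlab
% Hypothetical 2-by-3 array, used only for illustration
X = [1 5 3; 4 2 6];

rowMax = max(X, [], 2);   % Maximum of each row:    [5; 6]
colMax = max(X, [], 1);   % Maximum of each column: [4 5 6]
cmp    = max(X, 2);       % Element-wise larger of X and 2: [2 5 3; 4 2 6]
```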

EXERCISE 6: Output the minimum traffic counts by hour.

EXAMPLE 8: Output the overall mean and median of measles, mumps and chicken pox in tabular form

Create a new cell in which you type and execute:

   fprintf('           Measles      Mumps    Chicken pox\n'); % Output the title
   fprintf('Mean:     %8.1f   %8.1f   %8.1f\n', ...
       mean(measles(:)), mean(mumps(:)), mean(chickenPox(:)));
   fprintf('Median :  %8.1f   %8.1f   %8.1f\n', ...
       median(measles(:)), median(mumps(:)), median(chickenPox(:)));

You should see the following output in the Command Window:

           Measles      Mumps    Chicken pox
Mean:       1418.6      502.2      732.2
Median :     359.5      380.5      602.5
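
The columns line up because %8.1f pads each number on the left with spaces until it is at least 8 characters wide and always shows one digit after the decimal point. Here is a minimal sketch using sprintf with made-up values:

```matlab
% Hypothetical values, chosen only to show the padding
a = sprintf('%8.1f', 502.2397);   % '   502.2' (3 spaces of padding)
b = sprintf('%8.1f', 7);          % '     7.0' (5 spaces of padding)

widths = [numel(a) numel(b)];     % Both fields are 8 characters wide: [8 8]
```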

EXERCISE 7: Add rows to the table of EXAMPLE 8.
These rows should output the overall maximum and minimum for the three diseases.

EXERCISE 8: Relationships between mean, median, max and min.
The output of EXERCISE 7 shows a larger difference between the mean and median for measles than for mumps. What does that tell you?

EXERCISE 9: Create a table for traffic similar to that of EXAMPLE 8 and EXERCISE 7.
The columns correspond to the three intersections and the rows correspond to the different statistical indicators.

SUMMARY OF SYNTAX

MATLAB syntax - Description

  • mean(X) - Compute the averages of X along the first non-singleton dimension. For 2D arrays, this averages across the rows, resulting in the column averages.
  • mean(X, 1) - Compute the averages across dimension 1 of X, resulting in the column averages.
  • mean(X, 2) - Compute the averages across dimension 2 of X, resulting in the row averages.
  • median(X) - Compute the medians of the array X along the first non-singleton dimension. For 2D arrays, this computes the medians across the rows, resulting in the column medians.
  • median(X, 1) - Compute the medians across dimension 1 of X, resulting in the column medians.
  • median(X, 2) - Compute the medians across dimension 2 of X, resulting in the row medians.
  • max(X) - Compute the maxima of array X along the first non-singleton dimension. For 2D arrays, this computes the maxima across the rows, resulting in the column maxima.
  • max(X, [], 1) - Compute the maxima of array X across dimension 1, resulting in the column maxima.
  • max(X, [], 2) - Compute the maxima of array X across dimension 2, resulting in the row maxima.
  • min(X) - Compute the minima of array X along the first non-singleton dimension. For 2D arrays, this computes the minima across the rows, resulting in the column minima.
  • min(X, [], 2) - Compute the minima of array X across dimension 2, resulting in the row minima.
  • fprintf('My_Message') - Output My_Message to the Command Window.
  • fprintf('The value %g is larger than zero\n', X) - Substitute the value of the variable X at the point where the %g occurs in the message. The value of X must be numeric.
  • fprintf('Another message is %s\n', A) - Substitute the value of the variable A at the point where the %s occurs in the message. A must contain a string.
  • fprintf('The value %5.2f\n', X) - Substitute the value of the variable X at the point where the %5.2f occurs in the message. Output the value as a fixed-point number with a minimum field width of 5 and 2 digits to the right of the decimal point. X must be a number.

This lesson was written by Kay A. Robbins of the University of Texas at San Antonio and last updated by Dawn Roberson on 28-Jan-2018. Please contact krobbins@cs.utsa.edu with comments or suggestions. The image is <http://en.wikipedia.org/wiki/File:RougeoleDP.jpg> -- original source Barbara Rice of CDC.