LESSON: Basic Statistics
FOCUS QUESTION: How can I find typical characteristics and central tendencies of data?
This lesson shows you how to calculate and display statistical indicators such as mean, median, maximum and minimum.
In this lesson you will:

Contents
 DATA FOR THIS LESSON
 SETUP FOR LESSON
 EXAMPLE 1: Load the data about New York contagious diseases
 EXAMPLE 2: Calculate overall average monthly mumps cases
 EXAMPLE 3: Output the overall monthly average number of mumps cases
 EXAMPLE 4: Output the overall median, maximum, and minimum monthly mumps cases
 EXAMPLE 5: Calculate the averages of mumps by month and by year
 EXAMPLE 6: Output the individual monthly averages of mumps
 EXAMPLE 7: Calculate and output the monthly maximum of mumps by year.
 EXAMPLE 8: Output the overall mean and median of measles, mumps and chicken pox in tabular form
 SUMMARY OF SYNTAX
DATA FOR THIS LESSON
File  Description 
NYCDiseases.mat 
The data set contains the monthly totals
of the number of new cases of measles, mumps, and chicken pox for
New York City during the years 1931 to 1971.
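The file is organized into the following variables: measles, mumps, and chickenPox (arrays of monthly counts with one row per year and one column per month) and years.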
The data was first published in: Yorke, J.A. and London, W.P. (1973). "Recurrent Outbreaks of Measles, Chickenpox and Mumps", American Journal of Epidemiology, Vol. 98, pp. 469. 
SETUP FOR LESSON
 Create a BasicStats directory on your V: drive and make it your current directory.
 Download NYCDiseases.mat to your BasicStats directory.
 Create a BasicStatsLesson.m script file in your BasicStats directory. Enter each of the examples in a new cell in this script.
EXAMPLE 1: Load the data about New York contagious diseases
Create a new cell in which you type and execute:
load NYCDiseases.mat; % Load the disease data
You should see measles, mumps, chickenPox, and years variables in the Workspace Browser.
EXAMPLE 2: Calculate overall average monthly mumps cases
Create a new cell in which you type and execute:
mumpsAver = mean(mumps(:)); % Average of entire array
You should see a new variable mumpsAver in the Workspace Browser. It holds the overall average number of mumps cases per month.
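The colon index matters here because mean applied to a 2D array returns one average per column rather than a single number. A minimal sketch of the difference, using a small made-up array demoA (hypothetical values, not the lesson data):

```matlab
% demoA is a hypothetical 2-by-3 array used only for illustration
demoA = [1 2 3; 4 5 6];

colAver = mean(demoA);         % one average per column: [2.5 3.5 4.5]
overallAver = mean(demoA(:));  % demoA(:) stacks all six values into one column,
                               % so the result is a single number: 3.5
```

The same idea applies to mumps(:), which turns the whole array into one long column before averaging.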
EXAMPLE 3: Output the overall monthly average number of mumps cases
Create a new cell in which you type and execute:
fprintf('Average mumps cases per month: %g\n', mumpsAver);
You should see the following output in the Command Window:
Average mumps cases per month: 502.24
EXAMPLE 4: Output the overall median, maximum, and minimum monthly mumps cases
Create a new cell in which you type and execute:
fprintf('mumps: median = %g [max = %g and min = %g]\n', ...
    median(mumps(:)), max(mumps(:)), min(mumps(:)));
You should see the following output in the Command Window:
mumps: median = 380.5 [max = 1956 and min = 50]
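A median that ends in .5 can look odd for data made of whole-number counts. It arises when the number of values is even, because the median is then the average of the two middle values. A small sketch with made-up numbers (demoCounts is hypothetical, not part of the lesson data):

```matlab
% demoCounts is a hypothetical set of four counts (an even number of values)
demoCounts = [10 40 20 30];

demoMedian = median(demoCounts);  % sorted: 10 20 30 40, so (20 + 30)/2 = 25
demoMax = max(demoCounts);        % 40
demoMin = min(demoCounts);        % 10

fprintf('demo: median = %g [max = %g and min = %g]\n', ...
    demoMedian, demoMax, demoMin);
```

This should print: demo: median = 25 [max = 40 and min = 10]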
EXAMPLE 5: Calculate the averages of mumps by month and by year
Create a new cell in which you type and execute:
mumpsMonthAver = mean(mumps, 1); % Average over the rows
mumpsYearAver = mean(mumps, 2); % Average over the columns
You should see two new variables in the Workspace Browser:
 mumpsMonthAver: the column averages of mumps
 mumpsYearAver: the row averages of mumps
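One way to keep the two dimensions straight is to check the size of each result. The sketch below uses a made-up 3-by-2 array demoC (hypothetical values) in which rows stand in for years and columns stand in for months:

```matlab
% demoC is hypothetical: 3 rows ("years") by 2 columns ("months")
demoC = [10 20; 30 40; 50 60];

demoColAver = mean(demoC, 1);  % collapses dimension 1 (rows): [30 40]
demoRowAver = mean(demoC, 2);  % collapses dimension 2 (columns): [15; 35; 55]

size(demoColAver)              % displays 1 2, one value per column
size(demoRowAver)              % displays 3 1, one value per row
```

The dimension passed to mean is the one that disappears from the result, which is why mean(mumps, 1) leaves one value per month.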
EXAMPLE 6: Output the individual monthly averages of mumps
Create a new cell in which you type and execute:
fprintf('Monthly averages of mumps: [ '); % Output a leading string
fprintf('%g ', mumpsMonthAver); % Output each element
fprintf(']\n'); % Output last ] and newline
You should see the following output in the Command Window:
Monthly averages of mumps: [ 492.146 574.049 842.951 912.073 879.805 786.268 453.585 229.463 146.463 152.878 218.61 338.585 ]
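The middle fprintf call works because fprintf reuses its format string for every element of the array it is given. A minimal sketch with a made-up vector demoV (hypothetical values); sprintf is used alongside fprintf only to show the same text captured as a string:

```matlab
% demoV is a hypothetical vector used only for illustration
demoV = [1.5 2.25 3];

fprintf('Values: [ ');        % leading text, no newline yet
fprintf('%g ', demoV);        % the format '%g ' is applied once per element
fprintf(']\n');               % closing bracket and newline

demoText = sprintf('%g ', demoV);  % same formatting, returned as '1.5 2.25 3 '
```

This should print: Values: [ 1.5 2.25 3 ]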
EXAMPLE 7: Calculate and output the monthly maximum of mumps by year.
Create a new cell in which you type and execute:
fprintf('Yearly maxima of mumps: [ ');
fprintf('%g ', max(mumps, [], 2));
fprintf(']\n');
You should see the following output in the Command Window:
Yearly maxima of mumps: [ 329 901 1604 547 1938 668 1200 1738 555 1485 1261 1272 1070 859 793 1138 883 1956 596 1342 1003 838 1659 1220 945 1844 769 774 1183 754 717 803 1078 342 926 1020 500 607 639 527 300 ]
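The empty brackets in max(mumps, [], 2) are easy to forget. Without them, MATLAB treats the second argument as a value to compare against rather than as a dimension. The sketch below illustrates this on a made-up array demoD (hypothetical values) and also shows the optional second output of max, which reports where each maximum was found:

```matlab
% demoD is a hypothetical 2-by-3 array used only for illustration
demoD = [3 9 1; 8 2 7];

[demoRowMax, demoMaxCol] = max(demoD, [], 2);  % maxima along dimension 2
% demoRowMax is [9; 8]   (largest value in each row)
% demoMaxCol is [2; 1]   (column in which each maximum occurs)

demoCompare = max(demoD, 2);  % no []: compares every element with the number 2
% demoCompare is [3 9 2; 8 2 7]
```

Applied to the lesson data, the second output would give the month in which each yearly maximum occurred.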
EXAMPLE 8: Output the overall mean and median of measles, mumps and chicken pox in tabular form
Create a new cell in which you type and execute:
fprintf(' Measles Mumps Chicken pox\n'); % Output the title
fprintf('Mean: %8.1f %8.1f %8.1f\n', ...
    mean(measles(:)), mean(mumps(:)), mean(chickenPox(:)));
fprintf('Median : %8.1f %8.1f %8.1f\n', ...
    median(measles(:)), median(mumps(:)), median(chickenPox(:)));
You should see the following output in the Command Window:
Measles Mumps Chicken pox
Mean: 1418.6 502.2 732.2
Median : 359.5 380.5 602.5
These rows should output the overall maximum and minimum for the three diseases.
EXERCISE 8: Relationships between mean, median, max and min.
The output of Exercise 7 shows a larger difference between the mean and the median for measles than for mumps. What does that tell you?
EXERCISE 9: Create a table for traffic similar to that of EXAMPLE 8 and EXERCISE 6.
The columns correspond to the three intersections and the rows correspond to the different statistical indicators.
SUMMARY OF SYNTAX
MATLAB syntax  Description 
mean(X)  Compute the averages of X along the first nonsingleton dimension. For 2D arrays, this averages across the rows resulting in the column averages. 
mean(X, 1)  Compute the averages across dimension 1 of X resulting in the column averages. 
mean(X, 2)  Compute the averages across dimension 2 of X resulting in the row averages. 
median(X)  Compute the medians of the array X along the first nonsingleton dimension. For 2D arrays, this computes the median across the rows resulting in the column medians. 
median(X, 1)  Compute the medians across dimension 1 of X resulting in the column medians. 
median(X, 2)  Compute the medians across dimension 2 of X resulting in the medians of the rows. 
max(X)  Compute the maxima of array X along the first nonsingleton dimension. For 2D arrays, this computes the maxima across the rows resulting in column maxima. 
max(X, [], 1)  Compute the maxima of array X across dimension 1 resulting in the column maxima. 
max(X, [], 2)  Compute the maxima of array X across dimension 2 resulting in the row maxima. 
min(X)  Compute the minima of array X along the first nonsingleton dimension. For 2D arrays, this computes the minima across the rows resulting in column minima. 
min(X, [], 2)  Compute the minima of array X across dimension 2 resulting in the row minima. 
fprintf('My_Message')  Output My_Message to the Command Window. 
fprintf('The value %g is larger than zero\n', X)  Substitute the value of the variable X at the point where the %g occurs in the message. The value of X must be numeric. 
fprintf('Another message is %s\n', A)  Substitute the value of the variable A at the point where the %s occurs in the message. A must contain a string. 
fprintf('The value %5.2f\n', X)  Substitute the value of the variable X at the point where the %5.2f occurs in the message. Output the value as a fixed point number of overall width 5 and 2 places to the right of the decimal. X must be a number. 
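The format specifiers in the table can be compared side by side on a single value. This is a small sketch with made-up variables demoX and demoName (both hypothetical); sprintf is used for two of the cases so that the padding is visible as a string:

```matlab
% demoX and demoName are hypothetical values used only for illustration
demoX = 3.14159;
demoName = 'mumps';

fprintf('%g\n', demoX);          % compact form: 3.14159
fprintf('%s\n', demoName);       % string substitution: mumps

demoFixed = sprintf('%5.2f', demoX);  % ' 3.14'    : width 5, 2 decimals
demoWide = sprintf('%8.1f', demoX);   % '     3.1' : width 8, 1 decimal
```

The width counts every character, including the decimal point, and shorter values are padded with spaces on the left. That padding is what lines up the columns when the same specifier is repeated on each row of a table.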
This lesson was written by Kay A. Robbins of the University of Texas at San Antonio and last updated by Dawn Roberson on 28-Jan-2018. Please contact krobbins@cs.utsa.edu with comments or suggestions. The image is <http://en.wikipedia.org/wiki/File:RougeoleDP.jpg>, original source Barbara Rice of CDC.