LESSON QUESTIONS: Basic stats questions

FOCUS QUESTION: How can I depict typical characteristics and central tendencies of data?

EXAMPLE 1: Load the data about New York contagious diseases

   load NYCDiseases.mat;

EXAMPLE 2: Calculate overall average monthly mumps cases

   mumpsAver = mean(mumps(:));

Questions Answers
What is mumps(:)? This is the linear representation of the mumps array. mumps(:) treats mumps as though all of its columns were laid end to end to form a single column.
What is mean? mean is a MATLAB function that returns an average based on the value of its arguments. When its argument is a single row or column, the result is the average of all of the values in the array.
What are the arguments of a function? The arguments are the input values passed to the function so that it can perform its computation. These values appear in parentheses after the function name.
How could I compute the average of a list of numbers by hand? Add up all of the numbers and divide by the number of values.
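
As a small illustration of the answers above, the following sketch uses a made-up 2-by-3 array A (hypothetical values, not taken from NYCDiseases.mat) to show how A(:) stacks the columns and that mean agrees with the by-hand calculation.

```matlab
% A is a hypothetical 2-by-3 array used only for illustration
A = [1 2 3; 4 5 6];

colA = A(:);                        % columns laid end to end: [1; 4; 2; 5; 3; 6]
byHand = sum(colA) / numel(colA);   % add up the values, divide by how many there are
builtIn = mean(A(:));               % the same average using the mean function

fprintf('by hand = %g, mean = %g\n', byHand, builtIn);   % by hand = 3.5, mean = 3.5
```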

EXAMPLE 3: Output the overall monthly average number of mumps cases

   fprintf('Average mumps cases per month: %g\n', mumpsAver);
Average mumps cases per month: 502.24

Questions Answers
What does fprintf do? The fprintf function in the example outputs to the Command Window. The fprintf function can also output to a file.
What does the first argument of fprintf represent? The first argument is the format string, which controls the appearance of the output. This string may contain a number of format specifiers for outputting variable values. Each format specifier starts with a %.
What is %g? The %g is a format specifier or rule for outputting a numeric value. Use the %s format specifier when outputting a string.
What does \n mean? The \n in a format string causes the output to start on a new line.
How many % specifications can I use in a format string? Use as many as are needed to output the variables. Recall that you need a % specification for each variable. If the format string doesn't have enough, MATLAB reuses the format string from the beginning.
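
The format string rules above can be tried out with sprintf, which follows the same format rules as fprintf but returns the text instead of printing it. This is only a sketch; the month name and the numbers are made up for illustration.

```matlab
% One specifier per value: %s for the string, %g for the number
s1 = sprintf('%s had %g cases\n', 'March', 843);

% One specifier but three values: the format string is reused for each value
s2 = sprintf('%g\n', 10, 20, 30);

fprintf('%s', s1);   % March had 843 cases
fprintf('%s', s2);   % 10, 20, and 30, each on its own line
```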

EXAMPLE 4: Output the overall median, maximum, and minimum monthly mumps cases

   fprintf('mumps: median = %g [max = %g and min = %g]\n', ...
            median(mumps(:)), max(mumps(:)), min(mumps(:)));
mumps: median = 380.5 [max = 1956 and min = 50]

Questions Answers
What is median? The MATLAB median function finds the middle value after sorting the list of numbers. That is, half of the values in the list are less than or equal to this value and half are greater than or equal to the value.
Why is the median useful? The median sometimes gives a more representative value than the average does, particularly if the list has some extreme values.
How do I compute the median of a list of numbers by hand? Sort the values in increasing order and take the center value in the list. For a list containing an even number of values, take the average of the two middle values.
What does the 1 represent in the expression median(mumps, 1)? The second argument of the median function specifies the dimension over which to find the median, just as sum's second argument does.
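
The by-hand procedure for the median can be sketched as follows. The list v is hypothetical (it is not the mumps data) and contains one extreme value, so the median and the average differ noticeably.

```matlab
% Hypothetical list with one extreme value (1960)
v = [50 380 400 420 1960 390];

sorted = sort(v);                                % [50 380 390 400 420 1960]
n = numel(sorted);                               % 6 values, an even count
byHand = (sorted(n/2) + sorted(n/2 + 1)) / 2;    % average of the two middle values

% Prints: median = 395 (by hand 395), mean = 600
fprintf('median = %g (by hand %g), mean = %g\n', median(v), byHand, mean(v));
```

The single extreme value pulls the average up to 600, while the median stays near the bulk of the values.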

EXAMPLE 5: Calculate the averages of mumps by month and by year

   mumpsMonthlyAver = mean(mumps, 1);
   mumpsYearlyAver = mean(mumps, 2);

Questions Answers
What does the value 1 represent in the expression mean(mumps, 1)? The second argument of the mean function specifies the dimension over which to find the average, just as sum's second argument does.
What is the difference between mean(mumps) and mean(mumps, 2)? The mean(mumps) function call is equivalent to mean(mumps, 1). It computes a row vector containing the column averages of the mumps array. The function call mean(mumps, 2) computes a column vector containing the row averages of the mumps array.
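
To see the effect of the dimension argument on a small scale, here is a sketch with a made-up 2-by-3 array M standing in for mumps (the values are hypothetical; rows play the role of years and columns the role of months).

```matlab
% M is a hypothetical array: 2 rows (years) by 3 columns (months)
M = [10 20 30; 30 40 50];

colAver = mean(M, 1);   % 1-by-3 row vector of column averages: [20 30 40]
rowAver = mean(M, 2);   % 2-by-1 column vector of row averages: [20; 40]

% mean(M) with no second argument also works down the columns
sameAsDefault = isequal(mean(M), colAver);   % true
```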

EXAMPLE 6: Output the individual monthly averages of mumps

   fprintf('Monthly averages of mumps: [ ');
   fprintf('%g ', mumpsMonthlyAver);
   fprintf(']\n');
Monthly averages of mumps: [ 492.146 574.049 842.951 912.073 879.805 786.268 453.585 229.463 146.463 152.878 218.61 338.585 ]

Questions Answers
What does the first fprintf print? The statement has no variables to display and the format string has no format specifiers. Therefore this fprintf just displays the format string.
What does the second fprintf print? MATLAB needs a format specifier for each element of the mumpsMonthlyAver array. When the format specifiers run out, MATLAB starts from the beginning of the entire format string. Thus, MATLAB prints each element of the array followed by a blank.
If the array X has a single row of 10 values what is the output of fprintf('X = %g\n', X)? The format string only specifies how to print one value, so MATLAB reuses the format string in its entirety for each of the 10 values. The result will be ten lines each containing X = followed by one of the values in X.
If X is an array with 2 rows and 3 columns, in what order does the statement fprintf('X = %g\n', X) output the values of X? MATLAB uses the order of the array's linear representation. That is, it prints the values in the first column of X, followed by those in the second column, and then the third.
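
The output order for a two-dimensional array can be checked with a short sketch. X is a made-up 2-by-3 array, and sprintf is used to capture the text that fprintf would print.

```matlab
% X is a hypothetical array with 2 rows and 3 columns
X = [1 2 3; 4 5 6];

s = sprintf('%g ', X);    % format string reused for each value, column by column
fprintf('[ %s]\n', s);    % prints [ 1 4 2 5 3 6 ]
```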

EXAMPLE 7: Calculate and output the maximum monthly mumps count for each year

   fprintf('Yearly maxima of mumps: [ ');
   fprintf('%g ', max(mumps, [], 2));
   fprintf(']\n');
Yearly maxima of mumps: [ 329 901 1604 547 1938 668 1200 1738 555 1485 1261 1272 1070 859 793 1138 883 1956 596 1342 1003 838 1659 1220 945 1844 769 774 1183 754 717 803 1078 342 926 1020 500 607 639 527 300 ]

Questions Answers
What does max(mumps) do? The max(mumps) function call results in a row vector containing the largest value from each column of mumps.
Is max(mumps) the same as max(mumps, 1)? No, the second argument of the max function does not correspond to the dimension. When the second argument isn't empty, max does an element-by-element comparison between the corresponding first and second arguments. In the example, the result is the same size as mumps but each element is the larger of the corresponding element in mumps and the value 1.
Is max(mumps) the same as max(mumps, [], 1)? Yes, they are the same. Both result in a row vector that consists of the largest value from each column.
What does max(mumps, [], 2) do? The function call results in a column vector containing the maximum values of each row.
What does [] mean? The notation [] means an empty array, that is, an array with 0 rows and 0 columns.
What does min(mumps) do? The function call results in a row vector containing the smallest value from each column of mumps.
How do the min and max functions compare? Their arguments follow the same rules. The difference is that where min finds the smallest values, max finds the largest values.
Is the value -0.002 smaller than -1? No. Although -0.002 has a smaller magnitude (absolute value) than -1, it is the larger of the two when compared. That is, -0.002 lies to the right of -1 on the number line.
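
The different calling forms of max and min can be compared side by side on a small array. This is only a sketch; M is a made-up 2-by-3 array, not part of the disease data.

```matlab
% M is a hypothetical 2-by-3 array
M = [3 -1 7; 0 5 -2];

colMax   = max(M);          % largest value in each column: [3 5 7]
colMax1  = max(M, [], 1);   % same result as max(M)
rowMax   = max(M, [], 2);   % largest value in each row: [7; 5]
atLeast1 = max(M, 1);       % element-by-element comparison with 1: [3 1 7; 1 5 1]
rowMin   = min(M, [], 2);   % smallest value in each row: [-1; -2]
```

Note that max(M, 1) returns an array the same size as M rather than a vector, which is why the empty array [] is needed as a placeholder when a dimension is given.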

EXAMPLE 8: Output overall mean and median of measles, mumps and chicken pox in tabular form

Create a new cell in which you type and execute:

   fprintf('           Measles      Mumps    Chicken pox\n'); % Output the title
   fprintf('Mean:     %8.1f   %8.1f   %8.1f\n', ...
       mean(measles(:)), mean(mumps(:)), mean(chickenPox(:)));
   fprintf('Median:   %8.1f   %8.1f   %8.1f\n', ...
       median(measles(:)), median(mumps(:)), median(chickenPox(:)));
           Measles      Mumps    Chicken pox
Mean:       1418.6      502.2      732.2
Median:      359.5      380.5      602.5

Questions Answers
What is the difference between a %g format specifier and a %f format specifier? Both specifiers are designed to output numerical values of variables in the output list. The %g specifier is a general numeric specifier that tries to figure out a good display based on the value. Thus %g might display using integer, decimal, or exponential notation. The %f specifier always displays in decimal notation and is useful for presenting tabular values.
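What does the %8.1f actually mean? The f specifies decimal notation. The 8 is the minimum field width: MATLAB uses at least 8 character positions in total (including the sign, the decimal point, and the digits after the decimal point) and pads shorter values with blanks on the left. The .1 says always display one digit after the decimal point.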
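
The differences among the specifiers can be seen by formatting the same values in several ways. In this sketch the values are made up, and sprintf is used so that the resulting text can be inspected.

```matlab
x = 1418.6341;   % hypothetical value

g = sprintf('%g', x);          % '1418.63'      (general format, 6 significant digits)
f = sprintf('%f', x);          % '1418.634100'  (decimal notation, 6 decimal places)
w = sprintf('%8.1f', x);       % '  1418.6'     (padded to 8 characters, 1 decimal place)
big = sprintf('%g', 1234567);  % '1.23457e+06'  (%g switches to exponential notation)

fprintf('|%s|%s|%s|%s|\n', g, f, w, big);
```

Because %8.1f always produces the same width and the same number of decimal places, values printed with it line up in columns, which is why it suits tabular output.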

These lesson questions were written by Kay A. Robbins of the University of Texas at San Antonio and last modified by Dawn Roberson on 4-Jan-2014. Please contact kay.robbins@utsa.edu with comments or suggestions.