LESSON 5 QUESTIONS: Basic statistical indicators

FOCUS QUESTION: How can I depict typical characteristics and central tendencies of data?

EXAMPLE 1: Load the data about New York contagious diseases

   load NYCDiseases.mat;    % Load the disease data

EXAMPLE 2: Calculate overall average monthly measles cases

   measlesAverage = mean(measles(:));   % Average of entire array

Questions Answers
What is measles(:)? This is the linear representation of the measles array. measles(:) treats measles as though all of its columns were laid end to end to form a single column.
What is mean? mean is a MATLAB function that returns an average based on the value of its arguments. When its argument is a single row or column, the result is the average of all of the values in the array.
What are the arguments of a function? The arguments (i.e., input arguments) are the values passed to the function so that it can perform its computation. These values appear in parentheses after the function name.
What does the value 1 represent in the expression mean(count, 1)? The second argument of the mean function specifies the dimension over which to find the average, similar to the way sum's second argument does.
How could I compute the average of a list of numbers by hand? Add up all of the numbers and divide by the number of values.
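
The answers above can be checked on a small array. The following is a minimal sketch that is not part of the original lesson; the array A is a made-up example rather than the disease data.

```matlab
% Hypothetical 2-by-3 array used only for illustration
A = [1 2 3; 4 5 6];

% A(:) lays the columns end to end to form a single column
linear = A(:);                          % 1; 4; 2; 5; 3; 6

% Average by hand: add up all of the values and divide by their count
byHand = sum(linear) / numel(linear);

% The same average from the built-in mean function
builtIn = mean(A(:));

fprintf('By hand: %g, mean: %g\n', byHand, builtIn);   % By hand: 3.5, mean: 3.5
```

Both computations give 3.5 here because mean applied to a single column averages every value in it.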

EXAMPLE 3: Output the overall monthly average number of measles cases

   fprintf('Average measles cases per month: %g\n', measlesAverage);
Average measles cases per month: 1418.59

Questions Answers
What does fprintf do? The fprintf function in the example outputs to the Command Window. The fprintf function can also output to a file.
What does the first argument of fprintf represent? The first argument is the format string, which controls the appearance of the output. This string may contain a number of format specifications for outputting variable values. Each format specification starts with a %.
What is %g? The %g is a format specification or rule for outputting a numeric value. Use the %s format specification when outputting a string.
What does \n mean? The \n in a format string causes the output to start on a new line.
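
The format specifications described above can be tried on their own. This is a small sketch with made-up values; the variable names are hypothetical, and sprintf (which the lesson does not use) appears only to show the formatted text as a value.

```matlab
% Hypothetical values used only for illustration
disease = 'measles';
average = 696.1219512;

% %s outputs text; %g outputs a number with up to 6 significant digits by default
fprintf('Average %s cases: %g\n', disease, average);   % Average measles cases: 696.122

% Without \n, the next output continues on the same line
fprintf('first ');
fprintf('second\n');                                    % first second

% sprintf builds the same text as a character array instead of displaying it
formatted = sprintf('%g', average);                     % '696.122'
```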

EXAMPLE 4: Calculate the individual monthly and yearly averages of measles

   measlesMonthlyAverages = mean(measles);     % Average the columns
   measlesYearlyAverages = mean(measles, 2);   % Average the rows

Questions Answers
What is the difference between mean(measles) and mean(measles, 2)? The mean(measles) function call is equivalent to mean(measles, 1). This call computes the column averages of the measles array. The MATLAB function call mean(measles, 2) computes the row averages of the measles array.
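
The effect of the dimension argument is easiest to see on a small array. The sketch below uses a made-up 2-by-3 array instead of the measles data.

```matlab
% Hypothetical array: 2 rows, 3 columns
A = [1 2 3; 4 5 6];

columnAverages = mean(A);      % same as mean(A, 1): [2.5 3.5 4.5]
rowAverages = mean(A, 2);      % one average per row: [2; 5]
overallAverage = mean(A(:));   % average of every element: 3.5
```

Averaging the columns gives a row vector with one entry per column, while averaging the rows gives a column vector with one entry per row.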

EXAMPLE 5: Output the individual monthly averages of measles

   fprintf('Monthly averages of measles: [ '); % Output a leading string
   fprintf('%g ', measlesMonthlyAverages);     % Output each element
   fprintf(']\n');                             % Output ending ] and newline
Monthly averages of measles: [ 940.39 1816.39 3428.2 3855.12 3159.73 2100.54 696.122 192.195 80.0732 100.854 193.976 459.537 ]

Questions Answers
What does the first fprintf print? The statement has no variables to display and the format string has no format specifiers. Therefore this fprintf just displays the format string.
What does the second fprintf print? MATLAB needs a format specification for each element of the measlesMonthlyAverages array. When the format specifications run out, MATLAB starts from the beginning of the entire format string. Thus, MATLAB prints each element of the array followed by a blank.
If the array x has a single row of 10 values, what is the output of fprintf('X = %g\n', x)? The format string only specifies how to print one value, so MATLAB reuses the format string in its entirety for each of the 10 values. The result will be ten lines each containing X = followed by one of the values in x.
If x is an array with 2 rows and 3 columns, in what order does the statement fprintf('X = %g\n', x) output the values of x? MATLAB uses the order of the array's linear representation. That is, it prints the values in the first column of x, followed by those in the second column, and then the third.
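
The reuse of the format string and the column-by-column order can be seen with a small made-up array. This is an illustrative sketch, not part of the original lesson.

```matlab
% Hypothetical array with 2 rows and 3 columns
x = [1 2 3; 4 5 6];

% One format specification: the format string is reused for every element,
% taken in the order of the linear representation x(:)
fprintf('X = %g\n', x);
% X = 1
% X = 4
% X = 2
% X = 5
% X = 3
% X = 6

% Two format specifications: each pass through the format string uses two values
fprintf('%g and %g\n', x);
% 1 and 4
% 2 and 5
% 3 and 6
```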

EXAMPLE 6: Create your own printList function

Questions Answers
Why are functions so important in data analysis? Functions allow you to generalize the code you write so that you can apply the same sequence of commands to other data.
What are title and list in the definition of printList? The title and list variables are called the parameters or input arguments of the printList function.
How do title and list get their values? In order to use printList, you must use the function name followed by two values in parentheses. These values are called the arguments of the function. MATLAB assigns their values to title and list while evaluating printList. For example, you could use printList('X = ', x) to output the variable x.
What determines the order of a function's arguments? The order is determined by the order that they appear in the function definition.
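
The definition of printList that the questions above refer to is not reproduced in this text. The following is a sketch of what such a function might look like, reconstructed from the three fprintf statements of Example 5 and the output format shown in Example 7; the definition actually used in the lesson may differ. It is assumed to be saved in a file named printList.m.

```matlab
function printList(title, list)
% printList  Output a title followed by the elements of list in brackets.
%   Sketch reconstructed from Example 5; not the lesson's original code.
    fprintf('%s: [ ', title);   % Output the title, a colon, and the opening [
    fprintf('%g ', list);       % Output each element followed by a blank
    fprintf(']\n');             % Output ending ] and newline
end
```

With this sketch, the call printList('Squares', [1 4 9]) would display Squares: [ 1 4 9 ] on a line by itself. The first argument is assigned to title and the second to list because that is the order in which the parameters appear in the function line.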

EXAMPLE 7: Output the monthly averages of measles by calling printList

   printList('Monthly averages of measles', measlesMonthlyAverages);
Monthly averages of measles: [ 940.39 1816.39 3428.2 3855.12 3159.73 2100.54 696.122 192.195 80.0732 100.854 193.976 459.537 ]

Questions Answers
What would printList('', measlesMonthlyAverages) do? Since '' denotes the empty string, printList would display the list of space-separated values in square brackets as before. The output would be preceded by only a colon.
Would printList('Monthly averages of mumps', measlesMonthlyAverages) cause an error? No. MATLAB detects errors in syntax and errors that prevent a statement from executing, but it does not look at the meaning of what you write. Although the above message misidentifies the values, MATLAB would have no problem executing this statement.

EXAMPLE 8: Output the monthly averages, medians, and standard deviations of mumps

   printList('Monthly averages of mumps', mean(mumps));
   printList('Monthly medians of mumps', median(mumps));
   printList('Monthly standard deviations of mumps', std(mumps, 1));
Monthly averages of mumps: [ 492.146 574.049 842.951 912.073 879.805 786.268 453.585 229.463 146.463 152.878 218.61 338.585 ]
Monthly medians of mumps: [ 412 490 745 838 799 717 437 211 145 137 201 284 ]
Monthly standard deviations of mumps: [ 243.269 301.026 437.967 423.423 377.872 299.881 188.211 95.5733 51.0715 50.3729 85.937 163.738 ]

Questions Answers
What is median? The MATLAB median function finds the middle value after sorting the list of numbers. That is, half of the values in the list are less than or equal to this value and half are greater than or equal to the value.
Why is the median useful? The median sometimes gives a better representative value than the average does, particularly if the list has some extreme values.
How do I compute the median of a list of numbers by hand? Sort the values in increasing order and take the center value in the list. For a list containing an even number of values, take the average of the two middle values.
What does the 1 represent in the expression median(count, 1)? The second argument of the median function specifies the dimension over which to find the median, similar to the way sum's second argument does.
What is std? The MATLAB std function finds the standard deviation of a list of values.
How do I compute the standard deviation of a list of numbers by hand? See the handout on statistical indicators for the formula.
What does the 1 represent in the expression std(count, 1)? This is tricky. The second argument of the std function specifies the type of standard deviation to compute, not the dimension. The function call std(count), which is equivalent to std(count, 0, 1), finds the sample standard deviation of the columns of count. It divides by N - 1, where N is the number of values in a column, and is based on the unbiased estimator of the variance. The function call std(count, 1), which is equivalent to std(count, 1, 1), divides by N instead and finds the ordinary (population) standard deviation of the columns of count. The difference is subtle and will be discussed in class.
How would I compute the ordinary standard deviation of the rows? Use std(count, 1, 2) to compute the ordinary standard deviation of the rows of count.
What happens if I use std(count, 2, 1)? A scalar second argument of std must be 0 or 1; the empty array [] is also accepted. The value 2 gives an error when the statement is executed.
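
The by-hand rules for the median and the two kinds of standard deviation can be checked on small lists. The sketch below uses made-up values, and the formulas are the standard textbook ones; the notation in the class handout may differ.

```matlab
% Hypothetical list with one extreme value
x = [1 2 3 4 100];
fprintf('mean = %g, median = %g\n', mean(x), median(x));   % mean = 22, median = 3

% Even number of values: the median is the average of the two middle values
evenMedian = median([1 2 3 4]);                        % 2.5

% Hypothetical list whose mean is 5
y = [2 4 4 4 5 5 7 9];
n = numel(y);
squaredDeviations = (y - mean(y)).^2;                  % these sum to 32

% Divide by n - 1: matches std(y), which is the same as std(y, 0)
sampleStd = sqrt(sum(squaredDeviations) / (n - 1));

% Divide by n: matches std(y, 1)
ordinaryStd = sqrt(sum(squaredDeviations) / n);

fprintf('std(y)    = %g, by hand = %g\n', std(y), sampleStd);        % both 2.13809
fprintf('std(y, 1) = %g, by hand = %g\n', std(y, 1), ordinaryStd);   % both 2

% The third argument selects the dimension: ordinary std of each row
A = [1 3; 2 6];
rowStd = std(A, 1, 2);                                 % [1; 2]
```

The extreme value 100 pulls the mean of x up to 22, while the median stays at 3, which is why the median can be the more representative value.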

EXAMPLE 9: Output the monthly and yearly maxima and minima of measles

   printList('Monthly maxima of measles', max(measles));
   printList('Monthly minima of measles', min(measles));
   printList('Yearly maxima of measles', max(measles, [], 2));
   printList('Yearly minima of measles', min(measles, [], 2));
Monthly maxima of measles: [ 6336 13226 25826 22741 8634 6253 1975 453 184 354 1050 2996 ]
Monthly minima of measles: [ 39 52 57 78 83 79 35 28 18 11 12 21 ]
Yearly maxima of measles: [ 7095 2537 9635 1414 6813 8792 3546 10018 969 2996 25826 557 5760 8498 358 6597 1682 6909 1008 5428 1915 8616 1122 10720 1865 6064 1949 7634 837 6780 1043 7875 1289 3338 1199 2349 83 494 1301 185 844 ]
Yearly minima of measles: [ 43 118 50 45 67 87 84 56 36 90 55 34 142 21 18 56 88 32 100 73 184 40 164 59 98 110 170 47 97 43 109 58 168 49 83 24 11 39 31 39 12 ]

Questions Answers
What does max(measles) do? The max(measles) function call results in a row vector containing the largest value from each column of measles.
Is max(measles) the same as max(measles, 1)? No, the second argument of the max function does not correspond to the dimension. When the second argument isn't empty, max does an element-by-element comparison between corresponding elements of the first and second arguments. In the example, the result is the same size as measles, but each element is the larger of the corresponding element in measles and the value 1.
Is max(measles) the same as max(measles, [], 1)? Yes, they are the same. Both result in a row vector that consists of the largest value from each column.
What does max(measles, [], 2) do? The function call results in a column vector containing the maximum values of each row.
What does [] mean? The notation [] means an empty array, that is, an array with 0 rows and 0 columns.
What does min(measles) do? The function call results in a row vector containing the smallest value from each column of measles.
How do the min and max functions compare? Their arguments follow the same rules. The difference is that where min finds the smallest values, max finds the largest values.
Is the value -0.002 smaller than -1? No. While -0.002 has a smaller magnitude (absolute value) than -1, it is larger in the sense of comparison. That is, -0.002 lies to the right of -1 on the number line.
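
The different forms of max and min discussed above can be compared side by side. This sketch uses a small made-up array rather than the measles data.

```matlab
% Hypothetical array: 2 rows, 3 columns
A = [3 -1 0; -5 2 7];

columnMax = max(A);            % largest value in each column: [3 2 7]
columnMin = min(A);            % smallest value in each column: [-5 -1 0]
rowMax = max(A, [], 2);        % largest value in each row: [3; 7]
rowMin = min(A, [], 2);        % smallest value in each row: [-1; -5]

% A nonempty second argument is compared element by element, not used as a dimension
atLeastOne = max(A, 1);        % [3 1 1; 1 2 7]

% Comparison follows the number line, not the magnitude
isSmaller = -0.002 < -1;       % false
```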

EXAMPLE 10: Output the overall maximum and minimum monthly measles cases

   fprintf('Measles: max = %g, min = %g\n', max(measles(:)), min(measles(:)));
Measles: max = 25826, min = 11

Questions Answers
What does max(measles(:)) do? Since measles(:) forms a single column, max(measles(:)) finds the largest value in the entire array.
What does min(measles(:)) do? Since measles(:) forms a single column, min(measles(:)) finds the smallest value in the entire array.
Do function calls have to be on a separate line? No, as this example illustrates, function calls result in return values that can be used within other expressions in the same way as variables.
Do functions always return a single value? No, a function can return any number of values, depending on how its output arguments are defined. Our printList function did not return any values. As used here, the max and min functions each return a single value, but MATLAB has many functions that can return multiple items.
How many % specifications can I use in a format string? Use as many as are needed to output the variables. Recall that you need a % specification for each variable. If the format string doesn't have enough, MATLAB reuses the format string from the beginning.
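
Function calls used inside expressions, and functions that return more than one value, can be illustrated with a short sketch. The array A and the variable names are made up for this example; size and the two-output form of max are standard MATLAB but are not used elsewhere in this lesson.

```matlab
% Hypothetical array: 2 rows, 3 columns
A = [4 9 2; 7 1 8];

% Return values can be used directly inside a larger expression
spread = max(A(:)) - min(A(:));        % 9 - 1 = 8

% size returns two values here: the number of rows and of columns
[numRows, numCols] = size(A);          % 2 and 3

% max can also return the position of the largest value in A(:)
[largest, position] = max(A(:));       % largest = 9, position = 3

% One % specification per value to output
fprintf('Rows = %g, columns = %g, spread = %g\n', numRows, numCols, spread);
% Rows = 2, columns = 3, spread = 8
```

The position is 3 because the linear representation A(:) is 4, 7, 9, 1, 2, 8, so the value 9 is the third element.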

These lesson questions were written by Kay A. Robbins of the University of Texas at San Antonio and last modified on 31-Dec-2010. Please contact krobbins@cs.utsa.edu with comments or suggestions.