LESSON 5 QUESTIONS: Basic statistical indicators
FOCUS QUESTION: How can I depict typical characteristics and central tendencies of data?
Contents
- EXAMPLE 1: Load the data about New York contagious diseases
- EXAMPLE 2: Calculate overall average monthly measles cases
- EXAMPLE 3: Output the overall monthly average number of measles cases
- EXAMPLE 4: Calculate the individual monthly and yearly averages of measles
- EXAMPLE 5: Output the individual monthly averages of measles
- EXAMPLE 6: Create your own printList function
- EXAMPLE 7: Output the monthly averages of measles by calling printList
- EXAMPLE 8: Output the monthly averages and medians of mumps
- EXAMPLE 9: Output the monthly and yearly maxima and minima of measles
- EXAMPLE 10: Output the overall maximum and minimum monthly measles cases
EXAMPLE 1: Load the data about New York contagious diseases
load NYCDiseases.mat; % Load the disease data
EXAMPLE 2: Calculate overall average monthly measles cases
measlesAverage = mean(measles(:)); % Average of entire array
| Questions | Answers |
| What is measles(:)? | This is the linear representation of the measles array. measles(:) treats measles as though all of its columns were laid end to end to form a single column. |
| What is mean? | mean is a MATLAB function that returns the average of the values in its first argument. When that argument is a single row or column, the result is the average of all of the values in the array. When it is a matrix, mean returns a row vector containing the average of each column. |
| What are the arguments of a function? | The arguments (i.e., input arguments) are the values passed to the function so that it can perform its computation. These values appear in parentheses after the function name. |
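| What does the value 1 represent in the expression mean(count, 1)? | The second argument of the mean function specifies the dimension over which to find the average, similar to the way sum's second argument does. A value of 1 averages down each column, giving one average per column; a value of 2 averages across each row, giving one average per row. |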
| How could I compute the average of a list of numbers by hand? | Add up all of the numbers and divide by the number of values. |
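To see the linear representation, the by-hand average, and the dimension argument side by side, here is a small sketch on a hypothetical 3-by-2 array named count. The numbers are invented for illustration; they are not part of the NYCDiseases data.

```matlab
% Hypothetical 3-by-2 array (invented values, for illustration only)
count = [1 2; 3 4; 5 6];

% Linear representation: the columns laid end to end in one column
asColumn = count(:);                       % [1; 3; 5; 2; 4; 6]

% Overall average, computed two ways
overallMean = mean(count(:));              % built-in mean
byHand = sum(count(:)) / numel(count);     % add up the values, divide by how many

% Averages along each dimension
colMeans = mean(count, 1);                 % one average per column: [3 4]
rowMeans = mean(count, 2);                 % one average per row: [1.5; 3.5; 5.5]

fprintf('Overall mean: %g (by hand: %g)\n', overallMean, byHand);
```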
EXAMPLE 3: Output the overall monthly average number of measles cases
fprintf('Average measles cases per month: %g\n', measlesAverage);
Average measles cases per month: 1418.59
| Questions | Answers |
| What does fprintf do? | In this example, fprintf writes formatted text to the Command Window. fprintf can also write to a file when its first argument is a file identifier, such as one returned by fopen. |
| What does the first argument of fprintf represent? | The first argument is the format string, which controls the appearance of the output. This string may contain a number of format specifications for outputting variable values. Each format specification starts with a %. |
| What is %g? | The %g is a format specification, or rule, for outputting a numeric value. Use the %s format specification when outputting a string. |
| What does \n mean? | The \n in a format string causes the output to start on a new line. |
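The sketch below tries %g, %s, and \n using sprintf, which accepts the same format string as fprintf but returns the text instead of displaying it. The value 1418.59 is copied from the output above as a stand-in for measlesAverage; the second line is an invented illustration.

```matlab
% Stand-in for measlesAverage (value copied from the example output)
avgValue = 1418.59;

% %g prints a number compactly; \n ends the line
line1 = sprintf('Average measles cases per month: %g\n', avgValue);

% %s prints a string; several specifications are filled in order
line2 = sprintf('%s: %g cases\n', 'measles', 25826);

% Display both strings; the single %s is reused for the second argument
fprintf('%s', line1, line2);
```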
EXAMPLE 4: Calculate the individual monthly and yearly averages of measles
measlesMonthlyAverages = mean(measles); % Average the columns
measlesYearlyAverages = mean(measles, 2); % Average the rows
| Questions | Answers |
| What is the difference between mean(measles) and mean(measles, 2)? | The mean(measles) function call is equivalent to mean(measles, 1). This call computes the column averages of the measles array, giving one average per month. The function call mean(measles, 2) computes the row averages of the measles array, giving one average per year. |
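A quick way to check which dimension is averaged is to look at the shape of the result. This sketch uses a hypothetical 2-by-3 array, with rows playing the role of years and columns the role of months; the values are invented.

```matlab
% Hypothetical 2-by-3 array: 2 "years" (rows) by 3 "months" (columns)
cases = [1 2 3; 4 5 6];

columnAverages = mean(cases);      % same as mean(cases, 1): [2.5 3.5 4.5]
rowAverages    = mean(cases, 2);   % one value per row: [2; 5]

disp(size(columnAverages));   % 1-by-3, a row vector
disp(size(rowAverages));      % 2-by-1, a column vector
```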
EXAMPLE 5: Output the individual monthly averages of measles
fprintf('Monthly averages of measles: [ '); % Output a leading string
fprintf('%g ', measlesMonthlyAverages); % Output each element
fprintf(']\n'); % Output ending ] and newline
Monthly averages of measles: [ 940.39 1816.39 3428.2 3855.12 3159.73 2100.54 696.122 192.195 80.0732 100.854 193.976 459.537 ]
| Questions | Answers |
| What does the first fprintf print? | The statement has no variables to display and the format string has no format specifications. Therefore this fprintf just displays the format string. |
| What does the second fprintf print? | The format string '%g ' contains a single format specification, but measlesMonthlyAverages has 12 elements. When the format specifications run out, MATLAB starts again from the beginning of the entire format string. Thus, MATLAB prints each element of the array followed by a blank. |
| If the array x has a single row of 10 values, what is the output of fprintf('X = %g\n', x)? | The format string only specifies how to print one value, so MATLAB reuses the entire format string for each of the 10 values. The result is ten lines, each containing X = followed by one of the values in x. |
| If x is an array with 2 rows and 3 columns, in what order does the statement fprintf('X = %g\n', x) output the values of x? | MATLAB uses the order of the array's linear representation. That is, it prints the values in the first column of x, followed by those in the second column, and then those in the third. |
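The reuse rule and the column-by-column order can be checked with sprintf, which formats exactly like fprintf but returns the text. The 2-by-3 array is hypothetical, and the three averages are copied from the output of Example 5.

```matlab
% Hypothetical 2-by-3 array
x = [1 2 3; 4 5 6];

% One %g specification, six values: the whole format string is reused once
% per value, and the values are taken in linear (column-by-column) order.
xLines = sprintf('X = %g\n', x);
fprintf('%s', xLines);   % X = 1, X = 4, X = 2, X = 5, X = 3, X = 6, one per line

% The three-call pattern of Example 5, collected into one string
someAverages = [940.39 1816.39 3428.2];   % first three monthly averages above
bracketed = [sprintf('Monthly averages of measles: [ '), ...
             sprintf('%g ', someAverages), sprintf(']\n')];
fprintf('%s', bracketed);
```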
EXAMPLE 6: Create your own printList function
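The questions below refer to the definition of printList, but the definition itself is missing from this copy of the lesson. The function file below is a hypothetical reconstruction, written to be consistent with the three fprintf calls of Example 5 and the output shown in Example 7; the author's original may differ. It assumes the file is saved as printList.m in a folder on the MATLAB path.

```matlab
function printList(title, list)
% printList  Print a title, a colon, and the values of list in brackets.
%   printList(title, list) writes one line such as
%   "Monthly averages of measles: [ 940.39 1816.39 ... ]".
%   Hypothetical reconstruction, not the lesson's original code.
    fprintf('%s', title);    % the title (prints nothing if title is '')
    fprintf(': [ ');         % printed separately so the colon always appears
    fprintf('%g ', list);    % each value followed by a blank, in linear order
    fprintf(']\n');          % closing bracket and newline
end
```

Printing the title through '%s' rather than using it as the format string keeps characters such as % or \ in a title from being interpreted as format specifications.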
| Questions | Answers |
| Why are functions so important in data analysis? | Functions allow you to generalize the code you write so that you can apply the same sequence of commands to other data. |
| What are title and list in the definition of printList? | The title and list variables are called the parameters or input arguments of the printList function. |
| How do title and list get their values? | To use printList, you write the function name followed by two values in parentheses. These values are called the arguments of the function. MATLAB assigns their values to title and list while evaluating printList. For example, you could use printList('X = ', x) to output the variable x. |
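| What determines the order of a function's arguments? | The order in which the parameters appear in the function definition. The first argument in a call is assigned to the first parameter, the second argument to the second parameter, and so on. |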
EXAMPLE 7: Output the monthly averages of measles by calling printList
printList('Monthly averages of measles', measlesMonthlyAverages);
Monthly averages of measles: [ 940.39 1816.39 3428.2 3855.12 3159.73 2100.54 696.122 192.195 80.0732 100.854 193.976 459.537 ]
| Questions | Answers |
| What would printList('', measlesMonthlyAverages) do? | Since '' denotes the empty string, printList would display the space-separated values in square brackets as before, but with no title, so the output would be preceded by just a colon. |
| Would printList('Monthly averages of mumps', measlesMonthlyAverages) cause an error? | No. MATLAB reports an error only when it cannot execute a statement, for example because of a syntax mistake or an undefined variable. It does not check whether what you write makes sense. Although this title mislabels the measles values as mumps values, MATLAB would execute the statement without complaint. |
EXAMPLE 8: Output the monthly averages and medians of mumps
printList('Monthly averages of mumps', mean(mumps));
printList('Monthly medians of mumps', median(mumps));
printList('Monthly standard deviations of mumps', std(mumps, 1));
Monthly averages of mumps: [ 492.146 574.049 842.951 912.073 879.805 786.268 453.585 229.463 146.463 152.878 218.61 338.585 ]
Monthly medians of mumps: [ 412 490 745 838 799 717 437 211 145 137 201 284 ]
Monthly standard deviations of mumps: [ 243.269 301.026 437.967 423.423 377.872 299.881 188.211 95.5733 51.0715 50.3729 85.937 163.738 ]
| Questions | Answers |
| What is median? | The MATLAB median function finds the middle value after sorting the list of numbers. That is, half of the values in the list are less than or equal to this value and half are greater than or equal to it. |
| Why is the median useful? | The median sometimes gives a better representative value than the average does, particularly if the list has some extreme values. |
| How do I compute the median of a list of numbers by hand? | Sort the values in increasing order and take the center value in the list. For a list containing an even number of values, take the average of the two middle values. |
| What does the 1 represent in the expression median(count, 1)? | The second argument of the median function specifies the dimension over which to find the median, similar to the way sum's second argument does. |
| What is std? | The MATLAB std function finds the standard deviation of a list of values, a measure of how spread out the values are around their average. |
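| How do I compute the standard deviation of a list of numbers by hand? | Subtract the average from each value, square the differences, add them up, divide by the number of values N (or by N - 1 for the sample version), and take the square root. See the handout on statistical indicators for the formula. |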
| What does the 1 represent in the expression std(count, 1)? | This is tricky. The second argument of the std function specifies the type of standard deviation to compute (how to normalize), not the dimension. The function call std(count), which is equivalent to std(count, 0, 1), divides by N - 1 and finds the sample standard deviation of the columns of count; its square is the unbiased estimator of the variance. The function call std(count, 1), which is equivalent to std(count, 1, 1), divides by N and finds the population standard deviation of the columns of count. The difference is subtle and will be discussed in class. |
| How would I compute the ordinary standard deviation of the rows? | Use std(count, 1, 2) to compute the ordinary standard deviation of the rows of count. |
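| What happens if I use std(count, 2, 1)? | For the purposes of this lesson, the second argument of std must be 0, 1, or []. (MATLAB also accepts a vector of weights in that position, which this lesson does not use.) A value such as 2 gives an error when the statement is executed. |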
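The by-hand recipes for the median and the standard deviation can be compared against median and std directly. This sketch uses a hypothetical list of 8 values chosen so the arithmetic is easy to follow; with N = 8, dividing the sum of squared deviations by N gives what std(x, 1) computes, and dividing by N - 1 gives what std(x) computes.

```matlab
x = [2 4 4 4 5 5 7 9];   % hypothetical list of 8 values
N = numel(x);

% Median by hand: sort, then average the two middle values (N is even)
s = sort(x);
medianByHand = (s(N/2) + s(N/2 + 1)) / 2;   % (4 + 5) / 2 = 4.5

% Standard deviation by hand
mu = mean(x);                                % 5
sumSq = sum((x - mu) .^ 2);                  % 32
popStd    = sqrt(sumSq / N);                 % divide by N:     2
sampleStd = sqrt(sumSq / (N - 1));           % divide by N - 1: about 2.138

fprintf('median %g, std(x, 1) %g, std(x) %g\n', median(x), std(x, 1), std(x));
```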
EXAMPLE 9: Output the monthly and yearly maxima and minima of measles
printList('Monthly maxima of measles', max(measles));
printList('Monthly minima of measles', min(measles));
printList('Yearly maxima of measles', max(measles, [], 2));
printList('Yearly minima of measles', min(measles, [], 2));
Monthly maxima of measles: [ 6336 13226 25826 22741 8634 6253 1975 453 184 354 1050 2996 ]
Monthly minima of measles: [ 39 52 57 78 83 79 35 28 18 11 12 21 ]
Yearly maxima of measles: [ 7095 2537 9635 1414 6813 8792 3546 10018 969 2996 25826 557 5760 8498 358 6597 1682 6909 1008 5428 1915 8616 1122 10720 1865 6064 1949 7634 837 6780 1043 7875 1289 3338 1199 2349 83 494 1301 185 844 ]
Yearly minima of measles: [ 43 118 50 45 67 87 84 56 36 90 55 34 142 21 18 56 88 32 100 73 184 40 164 59 98 110 170 47 97 43 109 58 168 49 83 24 11 39 31 39 12 ]
| Questions | Answers |
| What does max(measles) do? | The max(measles) function call results in a row vector containing the largest value from each column of measles. |
| Is max(measles) the same as max(measles, 1)? | No. The second argument of the max function does not correspond to the dimension. When the second argument isn't empty, max does an element-by-element comparison between corresponding elements of the first and second arguments. In this case, the result is the same size as measles, but each element is the larger of the corresponding element of measles and the value 1. |
| Is max(measles) the same as max(measles, [], 1)? | Yes, they are the same. Both result in a row vector that consists of the largest value from each column. |
| What does max(measles, [], 2) do? | The function call results in a column vector containing the maximum value of each row. |
| What does [] mean? | The notation [] means an empty array, that is, an array with 0 rows and 0 columns. |
| What does min(measles) do? | The function call results in a row vector containing the smallest value from each column of measles. |
| How do the min and max functions compare? | Their arguments follow the same rules. The difference is that min finds the smallest values, whereas max finds the largest values. |
| Is the value -0.002 smaller than -1? | No. While -0.002 has a smaller magnitude (absolute value) than -1, it is larger in the sense of comparison. That is, -0.002 lies to the right of -1 on the number line. |
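The different forms of max and min discussed above can be compared on a hypothetical 2-by-2 array that includes a negative value, so the element-by-element comparison with 1 is visible.

```matlab
A = [3 -1; 0 5];   % hypothetical 2-by-2 array

colMax    = max(A);          % largest in each column: [3 5]
colMaxDim = max(A, [], 1);   % same as max(A)
rowMax    = max(A, [], 2);   % largest in each row: [3; 5]
clipped   = max(A, 1);       % element-by-element with 1: [3 1; 1 5]
colMin    = min(A);          % smallest in each column: [0 -1]
rowMin    = min(A, [], 2);   % smallest in each row: [-1; 0]

% -1 lies further left on the number line than -0.002
smaller = min(-0.002, -1);   % -1
```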
EXAMPLE 10: Output the overall maximum and minimum monthly measles cases
fprintf('Measles: max = %g, min = %g\n', max(measles(:)), min(measles(:)));
Measles: max = 25826, min = 11
| Questions | Answers |
| What does max(measles(:)) do? | Since measles(:) forms a single column, max(measles(:)) finds the largest value in the entire array. |
| What does min(measles(:)) do? | Since measles(:) forms a single column, min(measles(:)) finds the smallest value in the entire array. |
| Do function calls have to be on a separate line? | No. As this example illustrates, function calls produce return values that can be used within other expressions in the same way as variables. |
| Do functions always return a single value? | No. A function can return any number of values, depending on how its output arguments are defined. Our printList function does not return any values. As used in this lesson, the max and min functions each return a single result, but they can also return a second output, the index of the largest or smallest value, as in [m, i] = max(x). MATLAB has many other functions that return multiple items. |
| How many % specifications can I use in a format string? | Use as many as are needed to output the values. Recall that each value needs a % specification. If the format string doesn't have enough, MATLAB reuses the format string from the beginning. |
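Nested function calls, a second output from max, and a format string with two specifications that is reused can all be tried on small hypothetical inputs, as in this sketch. sprintf is used so the formatted text can be inspected before it is displayed.

```matlab
A = [3 -1; 0 5];   % hypothetical 2-by-2 array

% Function calls used directly as arguments, as in Example 10
summaryLine = sprintf('max = %g, min = %g\n', max(A(:)), min(A(:)));
fprintf('%s', summaryLine);   % max = 5, min = -1

% max can also return a second output: the position of the largest value
[largest, position] = max([4 9 2]);   % largest = 9, position = 2

% Two specifications per pass; the format string is reused for the next pair
pairs = sprintf('%d->%d\n', [1 10 2 20]);
fprintf('%s', pairs);   % 1->10, then 2->20 on the next line
```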
These lesson questions were written by Kay A. Robbins of the University of Texas at San Antonio and last modified on 31-Dec-2010. Please contact krobbins@cs.utsa.edu with comments or suggestions.