LESSON 8: Vector logic for extracting data
FOCUS QUESTION: How can I extract the rows and columns of an array based on data characteristics?
Contents
- EXAMPLE 1: Load the consolidated sleep diary data
- EXAMPLE 2: Calculate the number of students in section 3 ( == )
- EXAMPLE 3: Calculate the average minutesToSleep of students in section 3 (indexing)
- EXAMPLE 4: Calculate the number of women in the cohort (use strcmp to compare strings)
- EXAMPLE 5: Calculate the % of women in the cohort
- EXAMPLE 6: Calculate the number of women in section 3 (use &)
- EXAMPLE 7: Calculate the number of students in section 2 or in section 3 (use |)
- EXAMPLE 8: Calculate % of wakeups that used an alarm
- EXAMPLE 9: Calculate the number of wakeups that were 7:30 am or later (use >= )
- EXAMPLE 10: Calculate % of wakeups between 7:30 am and 9:45 am ( &)
- EXAMPLE 11: Calculate % of wakeups that are after 7:30 am or don't use an alarm ( | and ~)
- EXAMPLE 12: Find the subjects with the earliest average wakeup
- EXAMPLE 13: Find number of bedtimes between 10:30 pm and 2:30 am (relative date)
EXAMPLE 1: Load the consolidated sleep diary data
load diaries.mat; % Load the sleep diaries
| Questions | Answers |
| --- | --- |
| Where do the variables come from when this file is loaded? | The file was created by saving variables from a MATLAB workspace using the save command. When you load this type of file, MATLAB recreates the saved variables along with their values. |
| What is the .MAT format? | The .MAT format is a binary format that allows you to save an entire workspace or multiple variables, including complex structures in a single file. |
| What are the advantages of saving data in .MAT format? | The .MAT format efficiently stores variables and allows you to resume working in a workspace that you previously created. Thus, you don't have to reprocess data to put it in the form you need. |
| What are the disadvantages of saving data in .MAT format? | The .MAT format is proprietary, meaning that it belongs to MathWorks. Most other applications do not recognize files stored in .MAT format, and you cannot examine the contents of such a file using a text editor. |
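The save and load round trip described above can be tried on a small scale. The following is a minimal sketch, not the way diaries.mat was actually produced: the variables scores and names and the temporary file name are hypothetical and exist only for this illustration.

```matlab
% Hypothetical variables, used only for this illustration
scores = [90 85 77];
names = {'Ann', 'Bob', 'Cy'};

fileName = [tempname, '.mat'];       % Temporary file name (assumption)
save(fileName, 'scores', 'names');   % Save two variables in .MAT format
whos('-file', fileName);             % List the variables stored in the file

clear scores names;                  % Remove the variables from the workspace
load(fileName);                      % Recreate them with their saved values
fprintf('Loaded %g scores and %g names\n', length(scores), length(names));
delete(fileName);                    % Clean up the temporary file
```

After the load, scores and names are back in the workspace with the values they had when they were saved.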
EXAMPLE 2: Calculate the number of students in section 3 ( == )
sect3 = (section == 3);       % sect3 has 1's corresponding to section 3 students
totalSect3 = sum(sect3);      % Add up the trues (1's) to find number of students
fprintf('%g students in section 3\n', totalSect3);
| Questions | Answers |
| --- | --- |
| My Workspace Browser indicates that sect3 is a logical array. What does that mean? | Logical array element values are either true or false. |
| Why are sect3's values displayed as 1 or 0 rather than true or false? | MATLAB represents the logical values true and false by 1 and 0, respectively. You can use either representation. |
| Why not just make sect3 be integer or double? | Because sect3 is a logical array, you know that its values will only be 1 (true) or 0 (false) and not some other numerical value. |
| Can I do arithmetic on logical values? | Yes, you can use logical values in arithmetic expressions. MATLAB just converts logical values to 1's and 0's before doing the calculation. |
52 students in section 3
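To see how logical values behave in arithmetic, here is a minimal sketch that uses a small hypothetical section vector rather than the one stored in diaries.mat.

```matlab
section = [1 3 2 3 3];             % Hypothetical section numbers for 5 students
sect3 = (section == 3);            % Logical vector: 0 1 0 1 1
fprintf('Class of sect3: %s\n', class(sect3));
total = sum(sect3);                % Each true counts as 1, so total is 3
fraction = mean(sect3);            % Fraction of students in section 3 (0.6)
asDouble = sect3 + 0;              % Result of arithmetic is double, not logical
fprintf('%g of %g students (fraction %g)\n', total, length(section), fraction);
```

The comparison produces a logical array, while sum, mean, and + all return ordinary double values.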
EXAMPLE 3: Calculate the average minutesToSleep of students in section 3 (indexing)
minutesSect3 = toSleepMinutes(:, sect3);   % Pick out columns of section 3 students
meanMinutes3 = mean(minutesSect3(:));      % Find overall mean
fprintf('Average minutes to sleep for section 3 students = %g\n', ...
    meanMinutes3);
| Questions | Answers |
| --- | --- |
| What is the purpose of using sect3 as the column specifier of toSleepMinutes? | This type of specifier allows you to select rows and columns based on a logical condition. MATLAB picks out the columns of toSleepMinutes corresponding to the positions where the specifier has 1's (trues). |
| What is the size of minutesSect3 and why? | The toSleepMinutes array has 21 rows and 144 columns. The variable sect3 is a vector of length 144. (The length of sect3 matches the number of columns of toSleepMinutes, so each element of sect3 corresponds to exactly one column.) Since sect3 has 52 ones corresponding to the 52 students in section 3, minutesSect3 will have 21 rows and 52 columns. |
Average minutes to sleep for section 3 students = 17.5321
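The column selection can be checked by hand on a tiny array. This sketch assumes a hypothetical 2-by-4 array named minutes in place of toSleepMinutes.

```matlab
% Hypothetical data: 2 days (rows) by 4 students (columns)
minutes = [10 20 30 40
           12 22 32 42];
section = [3 1 3 2];                    % Hypothetical section of each student
sect3 = (section == 3);                 % 1 0 1 0
picked = minutes(:, sect3);             % All rows, but only columns 1 and 3
fprintf('Size of picked: %g by %g\n', size(picked));
overallMean = mean(picked(:));          % Mean of 10, 12, 30 and 32
```

The index must be logical for this to work. A double vector holding the same 0's and 1's, such as double(sect3), causes an error because 0 is not a valid subscript.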
EXAMPLE 4: Calculate the number of women in the cohort (use strcmp to compare strings)
women = strcmp(gender, 'female');   % women has 1's in positions corresponding to females
totalWomen = sum(women);            % Add up the trues (1's) to find number of women
fprintf('%g women in the cohort\n', totalWomen);
| Questions | Answers |
| --- | --- |
| What does strcmp(s, A) do? | The strcmp function creates a logical vector that is the same size as A. The result has 1's in the locations where A contains the string s. The variable s contains a single string, and the variable A is a cell array of strings. |
| Why is gender a cell array rather than an array of char? | Cell array elements can be of different lengths. Strings such as 'female' and 'male' have different lengths, and the rows of a char array must all have the same length. We will almost always use cell arrays to represent arrays of strings. |
| How can I distinguish a cell array from an ordinary array? | Use braces ({ }) to designate cell arrays and square brackets ([ ]) to designate ordinary arrays. |
74 women in the cohort
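A small experiment shows what strcmp returns for a cell array. The gender values below are hypothetical and include one capitalized entry to show that strcmp is case sensitive. The strcmpi function, which this lesson does not otherwise use, ignores case.

```matlab
gender = {'female', 'male', 'female', 'Female'};   % Hypothetical cell array
isFemale = strcmp(gender, 'female');          % 1 0 1 0 (exact match only)
isFemaleAnyCase = strcmpi(gender, 'female');  % 1 0 1 1 (case ignored)
fprintf('%g exact matches, %g matches ignoring case\n', ...
    sum(isFemale), sum(isFemaleAnyCase));
```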
EXAMPLE 5: Calculate the % of women in the cohort
totalStudents = length(gender);                  % gender has an entry for each student
percentWomen = 100.*totalWomen./totalStudents;   % Percentage of students who are women
fprintf('%g%% of the students in the cohort are women\n', percentWomen);
51.3889% of the students in the cohort are women
EXAMPLE 6: Calculate the number of women in section 3 (use &)
Create a new cell in which you type and execute:
womenSect3 = women & sect3;       % 1's in positions of women in section 3
totalWomen3 = sum(womenSect3);    % Add up the trues (1's)
fprintf('%g women in section 3\n', totalWomen3);
30 women in section 3
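| Questions | Answers |
| --- | --- |
| What does A & B mean? | The & operator performs an element-by-element logical AND. The result has 1 (true) in each position where the corresponding elements of A and B are both true, and 0 (false) otherwise. In the example, womenSect3 has 1's in the positions of students who are both women and in section 3. |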
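The element-by-element behavior of & is easiest to see on short vectors. This sketch uses hypothetical values for women and section.

```matlab
women = logical([1 0 1 1 0]);   % Hypothetical: students 1, 3 and 4 are women
section = [3 3 2 3 1];          % Hypothetical section numbers
sect3 = (section == 3);         % 1 1 0 1 0
womenSect3 = women & sect3;     % 1 0 0 1 0 (true only where both are true)
fprintf('%g women in section 3\n', sum(womenSect3));
```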
EXAMPLE 7: Calculate the number of students in section 2 or in section 3 (use |)
sect2or3 = (section == 2) | (section == 3);   % sect2or3 has 1's for students in section 2 or 3
total2or3 = sum(sect2or3);                    % Add up the trues (1's)
fprintf('%g students in sections 2 and 3\n', total2or3);
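| Questions | Answers |
| --- | --- |
| What does A \| B mean? | The \| operator performs an element-by-element logical OR. The result has 1 (true) in each position where the corresponding element of A or the corresponding element of B (or both) is true, and 0 (false) otherwise. In the example, sect2or3 has 1's in the positions of students who are in section 2 or in section 3. |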
98 students in sections 2 and 3
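The following sketch applies | to a short hypothetical section vector. It also shows ismember, which this lesson does not use, as one possible alternative when testing membership in a list of values.

```matlab
section = [1 2 3 2 4 3];                      % Hypothetical section numbers
sect2or3 = (section == 2) | (section == 3);   % 0 1 1 1 0 1
sameResult = ismember(section, [2 3]);        % Alternative test for membership
fprintf('%g students in sections 2 and 3\n', sum(sect2or3));
```

Both approaches give the same logical vector. The ismember form stays short when the list of sections grows.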
EXAMPLE 8: Calculate % of wakeups that used an alarm
totalAlarms = sum(useAlarm(:));                % Add up the trues (1's)
[numDays, numDiaries] = size(bedTimes);        % How many rows and columns?
totalEntries = numDays*numDiaries;             % Total number of entries
percentAlarm = 100*totalAlarms/totalEntries;   % Percentage of total entries
fprintf('%g%% of the wake-ups used an alarm\n', percentAlarm);
| Questions | Answers |
| --- | --- |
| Why use useAlarm(:) in the calculation of totalAlarms? | We wanted to compute the total number of 1's in useAlarm. The colon operator (:) arranges the columns of useAlarm into a single column. The result of sum(useAlarm(:)) is a single value. |
| What is the result of sum(useAlarm)? | The result is a row vector of 144 elements corresponding to the column sums of useAlarm. |
| How could I get the total of useAlarm without using the linear representation (:)? | You could apply the sum function twice: sum(sum(useAlarm)). The inner sum creates a vector of column sums. The outer sum adds these column sums to find a single number. |
| Why was percentAlarm calculated using * and / instead of .* and ./? | Since totalAlarms and totalEntries are just numbers rather than arrays, ordinary multiplication and division work. |
| Could I use .* and ./ in the calculation of percentAlarm? | Yes. When the operands are just numbers (scalars), .* and ./ give the same results as ordinary * and /. The results differ for array operands: .* and ./ work element by element, while * and / have special meanings for matrix operands. |
66.2698% of the wake-ups used an alarm
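The three ways of summing can be compared on a small array. This sketch uses a hypothetical 2-by-3 useAlarm array and the numel function to count the entries.

```matlab
useAlarm = logical([1 0 1; 0 0 1]);   % Hypothetical: 2 days by 3 diaries
columnSums = sum(useAlarm);           % Row vector of column sums: 1 0 2
total1 = sum(useAlarm(:));            % Total from the single-column form
total2 = sum(sum(useAlarm));          % Same total by summing twice
percentAlarm = 100*total1/numel(useAlarm);   % 3 of 6 entries
fprintf('%g%% of the wake-ups used an alarm\n', percentAlarm);
```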
EXAMPLE 9: Calculate the number of wakeups that were 7:30 am or later (use >= )
wakeupHours = (wakeTimes - floor(wakeTimes))*24;   % Fractional part of wakeTimes in hours
wakeGE730 = (wakeupHours >= 7.5);                  % Which are >= 7:30 am?
totalWakeGE730 = sum(wakeGE730(:));                % Number of wake-ups at 7:30 am or later
fprintf('%g wake-ups are 7:30 am or later\n', totalWakeGE730);
1971 wake-ups are 7:30 am or later
| Questions | Answers |
| --- | --- |
| What is the floor function? | The floor function rounds each element of its operand down to the nearest integer. For positive values, this throws away the fractional part. Since wakeTimes is an array, floor creates an array of integers that is the same size as wakeTimes. |
| Why multiply by 24 to compute wakeupHours? | The expression wakeTimes - floor(wakeTimes) is an array containing the wake-up times in units of fraction of a day. Multiply by 24 to convert this expression to wake-up hour. |
| What does A >= B mean? | The result of A >= B is an array of 0's and 1's that is the same size as the arrays A and B. The result has 1 in each entry where the corresponding element of A is greater than or equal to the corresponding element of B, and 0 otherwise. If B is a single number, as in the example, MATLAB compares every element of A with that number. Use A >= B to find the locations where the element of A is at least as large as the corresponding element of B. |
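The conversion from a date value to an hour of the day can be traced with two made-up wake-up times. This sketch assumes that wakeTimes holds serial date numbers of the kind returned by datenum, where the integer part counts days and the fractional part is the time of day. The specific dates are hypothetical.

```matlab
% Hypothetical wake-ups: Oct 5, 2010 at 7:45 am and Oct 6, 2010 at 6:15 am
wakeTimes = [datenum(2010, 10, 5, 7, 45, 0), datenum(2010, 10, 6, 6, 15, 0)];
dayPart = floor(wakeTimes);               % Whole days (midnight of each day)
wakeupHours = (wakeTimes - dayPart)*24;   % About 7.75 and 6.25 hours
wakeGE730 = (wakeupHours >= 7.5);         % 1 0
fprintf('Wake-up hours: %g and %g\n', wakeupHours);
```

Because the hours come from floating-point arithmetic, they are close to 7.75 and 6.25 rather than guaranteed to be exact.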
EXAMPLE 10: Calculate % of wakeups between 7:30 am and 9:45 am ( &)
wakeBetween = (7.5 <= wakeupHours) & (wakeupHours <= 9.75);   % & means both
betweenPercent = 100*sum(wakeBetween(:))/totalEntries;        % Percentage of total entries
fprintf('%g%% of the wake-ups are between 7:30 am and 9:45 am\n', betweenPercent);
34.7222% of the wake-ups are between 7:30 am and 9:45 am
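| Questions | Answers |
| --- | --- |
| What does A & B mean? | The & operator performs an element-by-element logical AND. The result has 1 (true) in each position where the corresponding elements of A and B are both true, and 0 (false) otherwise. In the example, wakeBetween has 1's in the positions of wake-ups that are both at or after 7:30 am and at or before 9:45 am. |
| Why not just write 7.5 <= wakeupHours <= 9.75 to designate the wake up hours between 7:30 and 9:45 am? | Although this expression evaluates without an error, it does not give the correct result. For example, 3 <= 4 <= 2 is true. The reason is as follows. The <= operator takes two numerical values for comparison and returns a logical value. MATLAB evaluates the comparisons from left to right. In the example, 3 <= 4 is true. MATLAB converts the true to a 1 for the second comparison. The second comparison then becomes 1 <= 2, which is true. |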
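The difference between the chained comparison and the & form shows up clearly on a short vector. The values in x are hypothetical wake-up hours.

```matlab
x = [5 8 11];                          % Hypothetical wake-up hours
chained = (7.5 <= x <= 9.75);          % Evaluated as (7.5 <= x) <= 9.75
correct = (7.5 <= x) & (x <= 9.75);    % Both conditions tested separately
fprintf('Chained form marks %g values, correct form marks %g\n', ...
    sum(chained), sum(correct));
```

In the chained form, 7.5 <= x first gives 0 1 1. Each of these values is at most 9.75, so every element of chained is true. Only the second element of correct is true.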
EXAMPLE 11: Calculate % of wakeups that are after 7:30 am or don't use an alarm ( | and ~)
Create a new cell in which you type and execute:
orWakeups = (wakeupHours > 7.5) | ~useAlarm;      % | either one or both
orPercent = 100*sum(orWakeups(:))/totalEntries;   % Percentage of total entries
fprintf(['%g%% of the wake-ups are either after 7:30 am ', ...
    'or without an alarm\n'], orPercent);
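| Questions | Answers |
| --- | --- |
| What does ~A mean? | The ~ operator performs an element-by-element logical NOT. The result has 1 (true) in each position where the corresponding element of A is false, and 0 (false) otherwise. In the example, ~useAlarm has 1's in the positions of wake-ups that did not use an alarm. |
| What does A \| B mean? | The \| operator performs an element-by-element logical OR. The result has 1 (true) in each position where the corresponding element of A or the corresponding element of B (or both) is true, and 0 (false) otherwise. In the example, orWakeups has 1's in the positions of wake-ups that were after 7:30 am, that did not use an alarm, or both. |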
75.5622% of the wake-ups are either after 7:30 am or without an alarm
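A short hypothetical example shows how ~ and | combine. It also checks one consequence of the logic: the wake-ups left out by the OR condition are exactly those at or before 7:30 am that did use an alarm.

```matlab
wakeupHours = [6.5 8.0 9.0 7.0];              % Hypothetical wake-up hours
useAlarm = logical([1 1 0 0]);                % Hypothetical alarm use
noAlarm = ~useAlarm;                          % 0 0 1 1
orWakeups = (wakeupHours > 7.5) | ~useAlarm;  % 0 1 1 1
neither = ~orWakeups;                         % 1 0 0 0
sameNeither = (wakeupHours <= 7.5) & useAlarm;   % Also 1 0 0 0
fprintf('%g of %g wake-ups satisfy the OR condition\n', ...
    sum(orWakeups), length(orWakeups));
```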
EXAMPLE 12: Find the subjects with the earliest average wakeup
averWakeup = mean(wakeupHours);               % Subject average wake up hour
earliest = min(averWakeup);                   % Earliest average wake up hour
earliestSub = find(averWakeup == earliest);   % Pick earliest subjects
fprintf('Earliest average wakeup time: %g\nEarliest subject(s):', earliest);
fprintf(' %g', earliestSub);                  % Separate print in case more than 1
fprintf('\n');                                % Start a new line
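| Questions | Answers |
| --- | --- |
| What good is find? | The find function returns the positions (indices) of the nonzero elements of its argument. It is useful when you need to know the actual positions of the items being selected. In the example, earliestSub contains the column numbers of the subjects whose average wake-up hour equals the earliest value. |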
Earliest average wakeup time: 5.50243
Earliest subject(s): 91
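The reason for using find rather than the second output of min appears when there is a tie. This sketch uses hypothetical averages in which two subjects share the earliest value.

```matlab
averWakeup = [6.5 5.5 7.0 5.5];               % Hypothetical subject averages
earliest = min(averWakeup);                   % Earliest average wake-up hour
earliestSub = find(averWakeup == earliest);   % Positions 2 and 4
[~, firstOnly] = min(averWakeup);             % Position of the first minimum only
fprintf('Earliest subject(s):');
fprintf(' %g', earliestSub);
fprintf('\n');
```

The == comparison is safe here because earliest is copied directly from averWakeup. Comparing values that come from separate floating-point calculations would usually need a tolerance.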
EXAMPLE 13: Find number of bedtimes between 10:30 pm and 2:30 am (relative date)
bed = (bedTimes - floor(wakeTimes))*24;                 % Hours relative to 0:00 of wake-up day
bedBetween = (-1.5 <= bed) & (bed <= 2.5);              % & means both are true
percentBetween = 100*sum(bedBetween(:))/totalEntries;   % Percentage of total entries
fprintf('%g%% of the bedtimes were between 10:30 pm and 2:30 am\n', ...
    percentBetween);
57.3743% of the bedtimes were between 10:30 pm and 2:30 am
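| Questions | Answers |
| --- | --- |
| Are the values of bed always positive? | No, a negative value represents a bedtime before midnight. |
| Why is bed calculated relative to the wakeTimes rather than bedTimes? | If you used floor(bedTimes) as the reference, every value would fall between 0 and 24, and the interval from 10:30 pm to 2:30 am would wrap around midnight (22.5 to 24 and 0 to 2.5). Measuring relative to midnight of the wake-up day gives bedtimes before midnight negative values, so the interval becomes the single range from -1.5 to 2.5. |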
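The relative-date calculation can be traced with three made-up bedtimes. This sketch assumes that bedTimes and wakeTimes hold serial date numbers of the kind returned by datenum. All dates and times below are hypothetical.

```matlab
wakeDay = datenum(2010, 10, 5);           % Hypothetical wake-up day (midnight)
wakeTimes = wakeDay + [7 7 7]/24;         % Three wake-ups at 7:00 am
bedTimes = [datenum(2010, 10, 4, 23, 0, 0), ...   % 11:00 pm the night before
            datenum(2010, 10, 5, 1, 0, 0), ...    % 1:00 am on the wake-up day
            datenum(2010, 10, 4, 21, 0, 0)];      % 9:00 pm the night before
bed = (bedTimes - floor(wakeTimes))*24;      % About -1, 1 and -3 hours
bedBetween = (-1.5 <= bed) & (bed <= 2.5);   % 1 1 0
fprintf('%g of %g bedtimes are between 10:30 pm and 2:30 am\n', ...
    sum(bedBetween), length(bedBetween));
```

The 11:00 pm bedtime becomes about -1 and the 1:00 am bedtime becomes about 1, so a single pair of comparisons covers both sides of midnight.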
This lesson was written by Kay A. Robbins of the University of Texas at San Antonio and last modified on 31-Dec-2010. Please contact krobbins@cs.utsa.edu with comments or suggestions. The image is a photograph of a nocturnal instrument photographed by Michael Daly on 8/22/2009. The image is available on Wikipedia as http://en.wikipedia.org/wiki/Nocturnal_%28instrument%29.