LESSON 8: Vector logic for extracting data

FOCUS QUESTION: How can I extract the rows and columns of an array based on data characteristics?

This lesson demonstrates how to use relational and logical operators to extract data for analysis.

In this lesson you will:
  • Use relational operators (>, <, >=, <=, ==, ~=) to pick out pieces of the data.
  • Use logical operators (&, |, ~) to combine tests.
  • Extract two groups based on a condition.
  • Work with realistic data.
[Image: nocturnal instrument]


DATA FOR THIS LESSON

File: diaries.mat
Description:
  • The data set contains sleep diary data for a cohort, stored in MATLAB variables.
  • The arrays have a column for each person.
  • The vectors have an element for each person.
  • The values in column n correspond to the same person as the value in position n of each vector.
  • The file contains the following variables:
    • bedTimes - array of bed times in decimal-date format.
    • dayCaffeine - array of daytime caffeine indicators.
    • gender - vector of male/female gender designators.
    • nightCaffeine - array of evening caffeine indicators.
    • section - vector of section indicators. The possible section numbers are 0, 1, 2, and 3. Section 0 contains only a single instructor. The remaining values correspond to course section numbers.
    • toSleepMinutes - an array of number of minutes to fall asleep.
    • useAlarm - array of alarm use indicators.
    • wakeTimes - array of wakeup times in decimal-date format.
  • The data was originally gathered by students taking CS 1173 in the fall 2009 semester, then anonymized and randomized so that individuals cannot be identified.
  • The first column of each array represents the instructor's values; the remaining columns represent individual students.
  • Diaries were recorded for 21 days (from September 23, 2009 to October 13, 2009).

SETUP FOR LESSON 8


EXAMPLE 1: Load the consolidated sleep diary data

Create a new cell in which you type and execute:

    load diaries.mat;  % Load the sleep diaries

You should see 8 variables in the Workspace Browser: bedTimes, dayCaffeine, gender, nightCaffeine, section, toSleepMinutes, useAlarm, and wakeTimes.

NOTE: All of the times are represented as doubles, which are real numbers. The integer part of the time gives the number of days since a reference day (in our case Jan 1, 0 AD) and the fractional part gives time on the current day represented as a fraction of 24 hours. You can use the datestr function to find out what date and time this double corresponds to (e.g., datestr(x) gives a string with the readable form of the date and time corresponding to the value x).
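
The split into a day part and a time-of-day part described in the note can be sketched as follows. The date used here is a made-up value built with datenum, not an entry from diaries.mat.

```matlab
% Hypothetical value: 7:30 am on September 23, 2009, as a date number
x = datenum(2009, 9, 23, 7, 30, 0);
dayPart = floor(x);              % Integer part: whole days since the reference day
hourOfDay = (x - dayPart)*24;    % Fractional part of the day converted to hours
readable = datestr(x);           % Readable date and time as a string
fprintf('%s is %g hours after midnight\n', readable, hourOfDay);
```

The same floor-and-subtract step is what later examples apply to whole arrays of times at once.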

In the space below, draw a picture of each of the 8 arrays and label the rows and columns.  


EXAMPLE 2: Calculate the number of students in section 3 (==)

Create a new cell in which you type and execute:

   sect3 = (section == 3);     % sect3 has 1's corresponding to section 3 students
   totalSect3 = sum(sect3);    % Add up the true's (1's) to find number of students
   fprintf('%g students in section 3\n', totalSect3);

You should see 2 new variables in the Workspace Browser: sect3 and totalSect3.

You should also see the following output in the Command Window:

52 students in section 3
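
To see what the comparison produces element by element, here is a minimal sketch on a small made-up vector (these section numbers are hypothetical and much shorter than the real section vector):

```matlab
section = [0 1 3 2 3 3 1];     % Made-up section numbers, not from diaries.mat
sect3 = (section == 3);        % Logical vector: [0 0 1 0 1 1 0]
totalSect3 = sum(sect3);       % Each true counts as 1, so the sum is 3
fractSect3 = mean(sect3);      % Fraction of entries that are true (3 out of 7)
fprintf('%g of %g entries are in section 3\n', totalSect3, length(section));
```

Because true behaves as 1 and false as 0, sum counts the matches and mean gives the matching fraction.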

In the space below: Define a variable called sect4 that is a logical vector of the same length as section and has 1's (true) in the entries corresponding to students in section 4.

Enter your definition in this cell and execute the cell to create this variable.


EXAMPLE 3: Calculate the average toSleepMinutes of students in section 3 (indexing)

Create a new cell in which you type and execute:

   minutesSect3 = toSleepMinutes(:, sect3);  % Pick out columns of section 3 students
   meanMinutes3 = mean(minutesSect3(:));          % Find overall mean
   fprintf('Average minutes to sleep for section 3 students = %g\n', ...
       meanMinutes3);

You should see 2 new variables in your Workspace Browser: minutesSect3 and meanMinutes3.

You should also see the following output in the Command Window:

Average minutes to sleep for section 3 students = 17.5321
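
The column selection in this example relies on logical indexing: a logical vector used as a column index keeps only the columns where it is true. A small sketch with made-up numbers (the array and the group vector are hypothetical):

```matlab
minutes = [10 20 30 40; 50 60 70 80];   % 2 days by 4 people (made-up values)
inGroup = logical([0 1 0 1]);           % true for the people to keep
groupMinutes = minutes(:, inGroup);     % Keeps columns 2 and 4: [20 40; 60 80]
groupMean = mean(groupMinutes(:));      % Overall mean of the kept values: 50
perPersonMean = mean(groupMinutes);     % Mean of each kept column: [40 60]
fprintf('Overall mean = %g\n', groupMean);
```

The index must be of class logical. A numeric vector such as [0 1 0 1] would be rejected as an index because 0 is not a valid position.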

In the space below: Enter your definitions in this cell and execute the cell to create these variables.


EXAMPLE 4: Calculate the number of women in the cohort (strcmp to compare strings)

Create a new cell in which you type and execute:

   women = strcmp(gender, 'female');  % women has 1's in positions corresponding to females
   totalWomen = sum(women);           % Add up the trues (1's) to find number of women
   fprintf('%g women in the cohort\n', totalWomen);

You should see 2 new variables in your Workspace Browser: women and totalWomen.

You should also see the following output in the Command Window:

74 women in the cohort
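
As a small illustration of how strcmp treats a cell array of strings, the sketch below uses a made-up cell array (labels is hypothetical and deliberately includes a capitalized entry). It also uses strcmpi, the case-insensitive variant, which the lesson itself does not use.

```matlab
labels = {'female', 'male', 'Female', 'female'};   % Made-up labels
exact = strcmp(labels, 'female');      % Case-sensitive match: [1 0 0 1]
anyCase = strcmpi(labels, 'female');   % Ignores case: [1 0 1 1]
fprintf('%g exact matches, %g matches ignoring case\n', sum(exact), sum(anyCase));
```

The result is a logical array with one entry per cell, so it can be summed or used as an index like the vectors in the earlier examples.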

In the space below: Define a variable called men that holds a logical vector with 1's (true) in the positions corresponding to male students.

Enter your definition in this cell and execute the cell to create the variables.


EXAMPLE 5: Calculate the % of women in the cohort

Create a new cell in which you type and execute:

   totalStudents = length(gender);  % gender has an entry for each student
   percentWomen = 100.*totalWomen./totalStudents;
   fprintf('%g%% of the students in the cohort are women\n', percentWomen);

You should see 2 new variables in your Workspace Browser: totalStudents and percentWomen.

You should also see the following output in the Command Window:

51.3889% of the students in the cohort are women

In the space below: Define a variable called fractMen that holds the fraction of students in the cohort who are male.

Enter your definition in this cell and execute the cell to create the variables.


EXAMPLE 6: Calculate the number of women in section 3 ( & )

Create a new cell in which you type and execute:

   womenSect3 = women & sect3;        % 1's in positions of women in section 3
   totalWomen3 = sum(womenSect3);     % Add up the trues (1's)
   fprintf('%g women in section 3\n', totalWomen3);

You should see 2 new variables in your Workspace Browser: womenSect3 and totalWomen3.

You should also see the following output in the Command Window:

30 women in section 3


EXAMPLE 7: Calculate the number of students in section 2 or in section 3 ( | )

Create a new cell in which you type and execute:

   sect2or3 = (section == 2) | (section == 3);  % sect2or3 has 1's for students in section 2 or 3
   total2or3 = sum(sect2or3);                   % Add up the trues (1's)
   fprintf('%g students in sections 2 and 3\n', total2or3);

You should see 2 new variables in your Workspace Browser: sect2or3 and total2or3.

You should also see the following output in the Command Window:

98 students in sections 2 and 3
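
The difference between & and | is easiest to see on a few elements. The sketch below uses made-up vectors (section and isFemale here are hypothetical, not the lesson data) and also checks a counting identity: the number of trues in A | B equals the trues in A plus the trues in B minus the trues in A & B.

```matlab
section  = [1 2 3 3 2 1];                 % Made-up section numbers
isFemale = logical([1 0 1 0 1 1]);        % Made-up gender indicator
sect2 = (section == 2);                   % [0 1 0 0 1 0]
sect3 = (section == 3);                   % [0 0 1 1 0 0]
femaleAndSect3 = isFemale & sect3;        % Both true: [0 0 1 0 0 0]
sect2or3 = sect2 | sect3;                 % Either true: [0 1 1 1 1 0]
femaleOrSect3 = isFemale | sect3;         % Either or both: [1 0 1 1 1 1]
countOr = sum(femaleOrSect3);             % 5
countCheck = sum(isFemale) + sum(sect3) - sum(femaleAndSect3);  % 4 + 2 - 1 = 5
fprintf('%g in the OR group, %g by the counting identity\n', countOr, countCheck);
```

When the two conditions can never both hold, as with section == 2 and section == 3, the AND count is zero and the OR count is simply the sum of the two separate counts.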


EXAMPLE 8: Calculate % of wakeups that used an alarm

Create a new cell in which you type and execute:

   totalAlarms = sum(useAlarm(:));              % Add up the trues (1's)
   [numDays, numDiaries] = size(bedTimes);      % How many rows and columns?
   totalEntries = numDays*numDiaries;           % Total number of entries
   percentAlarm = 100*totalAlarms/totalEntries; % Percentage of total entries
   fprintf('%g%% of the wake-ups used an alarm\n', percentAlarm);

You should see 5 new variables in your Workspace Browser: totalAlarms, numDays, numDiaries, totalEntries, and percentAlarm.

You should also see the following output in the Command Window:

66.2698% of the wake-ups used an alarm
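
A minimal sketch of the same percentage calculation on a tiny made-up array follows. It uses numel and mean as alternatives to multiplying the row and column counts; these are not the calls used in the example above, but they give the same result here.

```matlab
useAlarm = logical([1 0 1; 1 1 0]);           % 2 days by 3 people (made-up values)
totalAlarms = sum(useAlarm(:));               % (:) stacks all entries in one column: 4
totalEntries = numel(useAlarm);               % Number of elements in the array: 6
percentAlarm = 100*totalAlarms/totalEntries;  % 100*4/6
percentAlt = 100*mean(useAlarm(:));           % Same value, computed from the mean
fprintf('%g%% of the entries are true\n', percentAlarm);
```

Without the (:), sum would add down each column and return one total per person rather than a single overall count.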


EXAMPLE 9: Calculate the number of wakeups that were 7:30 am or later ( >= )

Create a new cell in which you type and execute:

   wakeupHours = (wakeTimes - floor(wakeTimes))*24; % Fractional part of wakeTimes, in hours
   wakeGE730 = (wakeupHours >= 7.5);                % Which are >= 7:30 am?
   totalWakeGE730 = sum(wakeGE730(:));              % Number of wake-ups at or after 7:30 am
   fprintf('%g wake-ups are 7:30 am or later\n', totalWakeGE730);

You should see 3 new variables in your Workspace Browser: wakeupHours, wakeGE730, and totalWakeGE730.

You should also see the following output in the Command Window:

1971 wake-ups are 7:30 am or later


EXAMPLE 10: Calculate % of wakeups between 7:30 am and 9:45 am ( & )

Create a new cell in which you type and execute:

   wakeBetween = (7.5 <= wakeupHours) & (wakeupHours <= 9.75);  % & means both
   betweenPercent = 100*sum(wakeBetween(:))/totalEntries; % Percentage of total entries
   fprintf('%g%% of the wake-ups are between 7:30 am and 9:45 am\n', betweenPercent);

You should see 2 new variables in your Workspace Browser: wakeBetween and betweenPercent.

You should also see the following output in the Command Window:

34.7222% of the wake-ups are between 7:30 am and 9:45 am
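
A range test needs two separate comparisons joined with &. The sketch below, on made-up hours, contrasts this with the chained form 7.5 <= x <= 9.75, which MATLAB evaluates from left to right and which therefore does not test a range.

```matlab
hours = [6 8 10];                               % Made-up wake-up hours
between = (7.5 <= hours) & (hours <= 9.75);     % Correct range test: [0 1 0]
chained = 7.5 <= hours <= 9.75;                 % Evaluated as (7.5 <= hours) <= 9.75
% The first comparison gives [0 1 1]; those 0's and 1's are all <= 9.75,
% so chained is [1 1 1] no matter what the hours are.
fprintf('Correct test finds %g, chained form finds %g\n', sum(between), sum(chained));
```

The chained form runs without an error, which makes this mistake easy to miss.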


EXAMPLE 11: Calculate % of wakeups that are after 7:30 am or don't use an alarm ( | and ~)

Create a new cell in which you type and execute:

   orWakeups = (wakeupHours > 7.5) | ~useAlarm;    % | either one or both
   orPercent = 100*sum(orWakeups(:))/totalEntries; % Percentage of total entries
   fprintf(['%g%% of the wake-ups are either after 7:30 am ', ...
            'or without an alarm\n'], orPercent);

You should see 2 new variables in your Workspace Browser: orWakeups and orPercent.

You should also see the following output in the Command Window:

75.5622% of the wake-ups are either after 7:30 am or without an alarm
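
The sketch below traces ~ and | on four made-up entries (hours and useAlarm here are hypothetical). It also checks that the entries satisfying neither condition are exactly the complement of the OR result.

```matlab
hours = [6 8 7 9];                    % Made-up wake-up hours
useAlarm = logical([1 1 0 0]);        % Made-up alarm indicators
late = (hours > 7.5);                 % [0 1 0 1]
noAlarm = ~useAlarm;                  % Flips each value: [0 0 1 1]
either = late | noAlarm;              % Late, or no alarm, or both: [0 1 1 1]
neither = ~late & useAlarm;           % Early and with an alarm: [1 0 0 0]
sameThing = isequal(either, ~neither);   % true: the two groups are complements
fprintf('%g entries are late or without an alarm\n', sum(either));
```

If useAlarm were stored as numeric 0's and 1's instead of logical values, ~useAlarm would still work, since nonzero values are treated as true.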


EXAMPLE 12: Find the subjects with the earliest average wakeup

Create a new cell in which you type and execute:

   averWakeup = mean(wakeupHours);     % Subject average wake up hour
   earliest = min(averWakeup);         % Earliest average wake up hour
   earliestSub = find(averWakeup == earliest); % Pick earliest subjects
   fprintf('Earliest average wakeup time: %g\nEarliest subject(s):', earliest);
   fprintf(' %g', earliestSub);        % Separate print in case more than 1
   fprintf('\n');                      % Start a new line

You should see 3 new variables in your Workspace Browser: averWakeup, earliest, and earliestSub.

You should also see the following output in the Command Window:

Earliest average wakeup time: 5.50243
Earliest subject(s): 91
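
Comparing against the minimum and calling find returns every position that ties for the minimum, which is why the example prints the subjects in a separate fprintf. A sketch with made-up averages containing a tie:

```matlab
averWakeup = [7.2 5.5 8.0 5.5];               % Made-up average wake-up hours
earliest = min(averWakeup);                   % 5.5
earliestSub = find(averWakeup == earliest);   % All positions of the minimum: [2 4]
[~, firstOnly] = min(averWakeup);             % Second output of min: first position only (2)
fprintf('Earliest subject(s):');
fprintf(' %g', earliestSub);                  % Format repeats for each element
fprintf('\n');
```

The == test is safe here because earliest is copied directly from averWakeup rather than computed separately.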


EXAMPLE 13: Calculate % of bedtimes between 10:30 pm and 2:30 am (relative date)

Create a new cell in which you type and execute:

   bed = (bedTimes - floor(wakeTimes))*24;     % Hours relative to 0:00 of wake-up day
   bedBetween = (-1.5 <= bed) & (bed <= 2.5);  % & means both are true
   percentBetween = 100*sum(bedBetween(:))/totalEntries;   % Percentage of total entries
   fprintf('%g%% of the bedtimes were between 10:30 pm and 2:30 am\n', ...
       percentBetween);

You should see 3 new variables in your Workspace Browser: bed, bedBetween, and percentBetween.

You should also see the following output in the Command Window:


57.3743% of the bedtimes were between 10:30 pm and 2:30 am
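
Measuring bedtimes from midnight of the wake-up day makes bedtimes before midnight negative, so a window that crosses midnight becomes an ordinary range test. A sketch with three made-up bedtimes (all dates here are hypothetical values built with datenum):

```matlab
wake = datenum(2009, 9, 24, 7, 0, 0);             % Woke at 7:00 am on Sep 24 (made up)
wakeTimes = [wake, wake, wake];
bedTimes = [datenum(2009, 9, 23, 23, 0, 0), ...   % 11:00 pm the night before
            datenum(2009, 9, 24, 1, 0, 0), ...    % 1:00 am on the wake-up day
            datenum(2009, 9, 23, 21, 0, 0)];      % 9:00 pm the night before
bed = (bedTimes - floor(wakeTimes))*24;           % Approximately [-1 1 -3]
bedBetween = (-1.5 <= bed) & (bed <= 2.5);        % 10:30 pm to 2:30 am: [1 1 0]
fprintf('%g of %g bedtimes fall in the window\n', sum(bedBetween), length(bed));
```

Here -1.5 stands for 10:30 pm (an hour and a half before midnight) and 2.5 stands for 2:30 am.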

SUMMARY OF SYNTAX

MATLAB syntax Description
ind = find(x) returns the positions of the non-zero elements of x.
Y = floor(X) returns an array Y that is the same size as the array X. Each element of Y is the largest integer that is less than or equal to the corresponding element of X. For example, floor(3.5) is 3, while floor(-2) is -2.
n = length(A) returns the number of elements in the longest dimension of A.
Logical element-wise operators:
&, |, ~
are used to combine arrays based on the logical element-wise operators AND, OR, and NOT. The logical value true corresponds to the value 1, and the logical value false corresponds to the value 0. If you apply a logical element-wise operator to an array of numerical values rather than logical values, MATLAB first creates a logical array with true corresponding to the nonzero elements and false corresponding to the zero elements.
  • and: A & B is true only in positions where both A and B are true.
  • or: A | B is true in the positions where A or B or both are true.
  • not: ~A is true only in the positions where A is false.
Relational element-wise operators:
>, <, >=, <=, ==, ~=
are used to combine arrays based on a comparison of their values. The result of applying a relational operator is a logical array of the same size as the input operands. The result has true in the positions where the relationship is true and false elsewhere.
  • greater than: A > B
  • greater than or equal: A >= B
  • less than: A < B
  • less than or equal: A <= B
  • equal: A == B
  • not equal: A ~= B
[rows, col] = size(A) returns the number of rows and the number of columns of a two-dimensional array A.
strcmp(s1, s2) returns true if the string s1 is equal to the string s2 and false otherwise.
strcmp(s1, C) returns a logical array of the same size as the cell array of strings C. The result has true in the positions corresponding to the entries of C that match s1 and false elsewhere.


This lesson was written by Kay A. Robbins of the University of Texas at San Antonio and last modified on 31-Dec-2010. Please contact krobbins@cs.utsa.edu with comments or suggestions. The image is a photograph of a nocturnal instrument photographed by Michael Daly on 8/22/2009. The image is available on Wikipedia as http://en.wikipedia.org/wiki/Nocturnal_%28instrument%29.