CS 1173 Data Analysis and Visualization Using MATLAB
Lesson summary

Recommended strategies:

You should print each lesson (and its questions) and watch the corresponding videos before we cover the lesson in class. You can make notes on the lesson printout while watching the videos. During class we will work on some of the exercises with the goal of helping you learn to solve problems and program independently.



Please save the videos on your home machine for watching rather than downloading them each time you want to watch. We have limited server capacity and this will take a load off our systems. Thanks!


Quizzes (Hybrid sections only):

The quizzes are short automatically graded assessments administered on Blackboard through the learning modules. You can retake these as often as you wish and the highest grade counts. To be effective in the class, you should do the learning modules before the lesson is covered in class.


Sleep diary

Starting Thursday, January 11, 2018, you will record some information about your sleep patterns for 21 days (HW4). You will use the data you gather in laboratories 2, 3 and 4. Your sleep diary data will be validated during class. We will also consolidate and anonymize the data from all sections for various analyses during the course.


Alternative links to the videos

We have provided several alternative links to the videos in case you have trouble accessing them through the Learning Modules. The most efficient way to access a video is to right-click on the UTSA-mp4 link and save the .mp4 file to your machine. You can then play it anytime without network access. We recommend this approach. We have also staged the videos on a server at UTSA with a player (the UTSA link).


Why data analysis?

Is data making the scientific method obsolete?
In the TEDMED talk http://www.youtube.com/watch?v=dtNMA46YgX4 (14:06 min)
Atul Butte addresses this question and talks about the profound changes that data is making in medical research.
(See also the longer NIH lecture http://www.youtube.com/watch?v=o4KNG7nd938 entitled Translational Bioinformatics: Transforming 300 Billion Points of Data)


The big-data revolution in health care:
In the TED talk http://www.youtube.com/watch?v=Mb8x6vLcggc (16:18 min)
Joel Selanikio, founder of Magpi, talks about how a simple marriage of technology and data can profoundly change healthcare in developing countries.



Lesson title and links     Analysis/visualization concepts     MATLAB/computing concepts     Mathematics/statistics concepts     Labs/projects
1 Getting started in MATLAB
Lesson (pdf)     Questions (pdf)

FOCUS QUESTION: How do I start using MATLAB?

Supporting videos:
Running an analysis (3:17 mins)
    UTSA    UTSA-mp4
Labeling a graph (1:40 mins)
    UTSA    UTSA-mp4
Using cells (3:22 mins)
    UTSA    UTSA-mp4

Supporting handouts:
  • Arrays as tables with rows and columns
  • Plotting the columns of an array against the integers from 1:n
  • Requirements for a well-designed plot
  • MATLAB environment and windows (Command, History, Workspace, Editor)
  • Changing the current directory
  • Loading data
  • Creating and running MATLAB scripts
  • Using MATLAB cell mode
  • The plot command
  • The xlabel, ylabel, title, and legend commands
  • Defining variables
  • Representing complex structures (such as arrays) symbolically and working with them in equations.
Pretest (HW1)
Sleep diary
2 Working with line graphs
Lesson (pdf)     Questions (pdf)

FOCUS QUESTION: How do I display trends in data?

Supporting videos:
MATLAB workspace (3:17 mins)
    UTSA    UTSA-mp4
Setting up a project (4:47 mins)
    UTSA    UTSA-mp4
Working with arrays and variables (6:22 mins)
    UTSA     UTSA-mp4
Line graphs in MATLAB (8:34 mins)
    UTSA     UTSA-mp4

Supporting handouts:
    Array basics
  • Different ways to use line graphs
  • Setting explicit x-axis values using x-y plots
  • Using markers and colors to distinguish plots
  • Rescaling to make graphs more readable
  • Using colons to pick out rows and columns
  • Using element-wise division (./)
  • Using hold on and hold off to display multiple graphs on the same axis.
  • Performing arithmetic operations such as addition and multiplication on arrays
  • Using indexing to manipulate arrays
  • Concept of array dimension
  • Row and column operations
  • Writing an equation to calculate a quantity described in words
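The array and plotting ideas in this lesson can be sketched in a few lines. This is a minimal illustration, not the lesson's own script: the data values and the variable names (years, counts, totals) are made up for the example.

```matlab
% Hypothetical data (values made up): rows are years, columns are two groups
years  = [2001; 2002; 2003; 2004];
counts = [10 40; 20 50; 30 60; 40 80];
totals = [100; 100; 200; 200];

% A colon picks out a whole column
groupA = counts(:, 1);
groupB = counts(:, 2);

% Element-wise division rescales each count by the total for its year
fracA = groupA ./ totals;
fracB = groupB ./ totals;

% x-y plots with explicit x values; markers and colors distinguish the lines
figure
hold on
plot(years, fracA, 'b-o')
plot(years, fracB, 'r-s')
hold off
xlabel('Year')
ylabel('Fraction of total')
title('Fraction of total by group (made-up data)')
legend('Group A', 'Group B')
```

Without hold on, the second plot command would replace the first line instead of adding to it.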

Lab 1
3 Introducing the sum function
Lesson (pdf)     Questions (pdf)

FOCUS QUESTION: How can I transform the data to give more meaningful results?

Supporting videos:
The MATLAB sum function (4:28 min):
    UTSA     UTSA-mp4     Transcript
MATLAB linear representation of arrays (1:30 min):
    UTSA     UTSA-mp4     Transcript
Transposing an array (2:55 min):
    UTSA     UTSA-mp4     Transcript

Supporting handouts:
MATLAB sum function
  • Plotting summary information rather than individual data points
  • Using the sum function to summarize the dataset
  • Plotting pie charts
  • Using colons to specify ranges and increments
  • Using the linear representation (:) of an array for reordering
  • The sum function for adding up rows or columns
  • The transpose operator (') for flipping an array
  • The pie command
  • Combining and scaling arrays and vectors
  • Array transpose
  • Working with ranges and subintervals
  • Applying functions that map a 2D array to a vector (mapping from one vector space to another)
  • Function composition
  • Word problems requiring multiple function transformations.
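As a rough sketch of how sum, the colon operator, and transpose fit together, consider the small made-up array below. The variable names are hypothetical and chosen only for this example.

```matlab
% Hypothetical 2-by-3 array (values made up)
A = [1 2 3; 4 5 6];

colTotals  = sum(A);       % adds down each column: [5 7 9]
rowTotals  = sum(A, 2);    % adds across each row:  [6; 15]
grandTotal = sum(A(:));    % A(:) stacks the columns into one vector: 21

At    = A';                % transpose flips rows and columns: 3-by-2
evens = 2:2:10;            % colon with an increment: [2 4 6 8 10]

% Function composition: fraction of the grand total found in each column
colFractions = sum(A) ./ sum(A(:));

% Pie chart of the summary information rather than the individual values
figure
pie(colTotals)
title('Share of total by column (made-up data)')
```

Note that sum maps the 2D array to a vector, and applying sum a second time (or using A(:)) maps it to a single number.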
4 Bar charts
Lesson (pdf)     Questions (pdf)    

FOCUS QUESTION: How can I show proportions and relative sizes of different data groups?

Supporting videos:
Bar chart basics in MATLAB (3:41 min)
    UTSA    UTSA-mp4    Transcript
Grouped and stacked bar charts in MATLAB (3:02 min)
    UTSA    UTSA-mp4    Transcript
  • Bar charts for displaying both proportion and magnitude
  • Grouped or stacked bar charts for comparing multiple data sets
  • Scaling a data set to make the axes more understandable
  • Additional practice with the sum function
  • Using square brackets and commas to assemble an array
  • Additional examples of use of transpose and array assembly
  • The bar function for creating vertical and horizontal bar charts
  • The stack option of the bar function
  • Additional array manipulations
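The bar chart variations above might look like the following sketch. The counts are made up, and the variable names are hypothetical. In current MATLAB documentation the stacking option is spelled 'stacked', and horizontal bars come from the companion function barh.

```matlab
% Hypothetical counts for three categories in two groups (values made up)
groupA = [3 5 2];
groupB = [4 1 6];

% Square brackets and commas assemble the transposed rows side by side,
% giving one row per category and one column per group
data = [groupA', groupB'];

figure
bar(data)                 % grouped bars: one cluster per category
legend('Group A', 'Group B')

figure
bar(data, 'stacked')      % stacked bars: bar height is the category total

figure
barh(data)                % horizontal version of the grouped chart

% The sum function gives the totals that the stacked bars display
categoryTotals = sum(data, 2);
```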
5 Basic stats
Lesson (pdf)     Questions (pdf)

FOCUS QUESTION: How can I find typical characteristics and central tendencies of data?

Supporting videos:
Comparing mean and median (3:44 min)
    UTSA    UTSA-mp4    Transcript
Basic statistics in MATLAB (3:30 min)
    UTSA    UTSA-mp4    Transcript
Array statistics in MATLAB (2:06 min)
    UTSA    UTSA-mp4    Transcript

Supporting handouts:
Statistical indicators
MATLAB max function
MATLAB mean function
MATLAB median function
MATLAB min function
  • Statistical indicators: mean, median, maximum and minimum
  • Outputting information about a data set
  • The mean, median, max, and min functions for expressing basic statistical characteristics
  • The fprintf function for outputting data
  • Working with basic statistical indicators such as mean and median.
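A short sketch of these statistical indicators on a made-up data set with one large value shows how the mean and median can differ. The variable names are hypothetical.

```matlab
% Hypothetical data with one unusually large value (values made up)
x = [2 3 3 4 100];

m   = mean(x);      % 22.4, pulled upward by the large value
med = median(x);    % 3, unaffected by the large value
lo  = min(x);       % 2
hi  = max(x);       % 100

% fprintf writes formatted text to the Command Window
fprintf('Mean: %.1f, median: %g\n', m, med);
fprintf('Min: %g, max: %g\n', lo, hi);
```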
Lab 2
6 Error bars
Lesson (pdf)     Questions (pdf)

FOCUS QUESTION: How can I depict uncertainty and variability in data?

Supporting videos:
Measures of spread (standard deviation, AAD, MAD, etc.) (8:50 min)
    UTSA    UTSA-mp4
Basic errorbars in MATLAB (4:14 min)
    UTSA    UTSA-mp4
Alternative forms of errorbars in MATLAB (2:02 min)
    UTSA    UTSA-mp4
Errorbars with unequal wings in MATLAB (3:50 min)
    UTSA    UTSA-mp4

Supporting handouts:
MATLAB standard deviation function (std)
MATLAB reshape function
  • Using error bars to depict spread
  • Using error bars on bar charts
  • Comparisons of different measures of spread for a highly skewed data set
  • The errorbar function and its variations
  • Creating SD, MAD and IQR error bars
  • Using offsets to avoid overplotting.
  • Using gca to get and set axis properties
  • Interpretation of measures of spread (AAD, MAD, SD, and IQR) as measures of error in using the mean to predict data.
  • Computing estimates of spread including AAD, MAD, SD, and IQR
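The following sketch computes three of these measures of spread and draws them as error bars. It assumes AAD means the average absolute deviation from the mean and MAD means the median absolute deviation from the median; check these against the lesson's definitions. The data and variable names are made up, and IQR is left out because the iqr function is not available in every installation.

```matlab
% Hypothetical skewed data (values made up)
x = [1 2 3 4 10];

sd  = std(x);                         % standard deviation
aad = mean(abs(x - mean(x)));         % assumed definition: average absolute deviation from the mean
mad = median(abs(x - median(x)));     % assumed definition: median absolute deviation from the median

% Error bars: mean +/- SD at position 1, median +/- MAD at position 2
figure
errorbar([1 2], [mean(x) median(x)], [sd mad], 'o')

% gca returns the current axes so that their properties can be set
set(gca, 'XTick', [1 2], 'XTickLabel', {'Mean +/- SD', 'Median +/- MAD'})
xlim([0 3])
ylabel('Value')
title('Two ways to show spread (made-up data)')
```

For this data set the single large value inflates SD far more than MAD, which is the kind of comparison the lesson's skewed example is about.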
7 Sampling
Lesson (pdf)    

FOCUS QUESTION: Do the characteristics of a sample reflect an entire population?

Supporting videos:

Supporting handouts:
Populations and samples
Lesson 7 template (download and unzip)
Lesson 7 script only (download)
Midterm examination
8 Linear models, scatter plots, curve fitting and correlation
Lesson (pdf)     Questions (pdf)

FOCUS QUESTION: How can I determine whether two variables are related?

Supporting handouts:
Example of putting a best fit line on graph in a script

Supporting videos:
Straight lines are handy tools (4:40 min)
    UTSA    UTSA-mp4
Linear models (6:45 min)
    UTSA    UTSA-mp4
Correlation in MATLAB (1:30 min)
    UTSA    UTSA-mp4
Scatter plots and linear fits in MATLAB (5:40 min)
    UTSA    UTSA-mp4
Summary of modeling in MATLAB (0:51 min)
    UTSA    UTSA-mp4

Supporting handouts:
  • Computing the correlation between two data sets
  • Comparing two data sets by plotting them against each other in a scatter plot
  • Computing the best fit line
  • Evaluating the RMS (root mean squared) error between predictions and actual data
  • Adding a linear fit line to a scatter plot using the MATLAB plottools
  • Constructing strings for plot annotation
  • Using xlabel, ylabel, and title to directly annotate a plot
  • The corr function for computing correlations
  • The polyfit function for fitting a polynomial to data
  • The polyval function for evaluating a polynomial at an array of points
  • How is correlation computed?
  • Correlation does not imply causality
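A minimal sketch of correlation, a best fit line, and RMS error on made-up data follows. It uses corrcoef, which is part of base MATLAB, as a stand-in for the corr function named in the lesson (corr belongs to the Statistics and Machine Learning Toolbox). All variable names are hypothetical.

```matlab
% Hypothetical data, roughly following y = 2x (values made up)
x = [1 2 3 4 5];
y = [2.1 3.9 6.2 7.8 10.1];

% corrcoef returns a 2-by-2 matrix; the off-diagonal entry is the correlation
R = corrcoef(x, y);
r = R(1, 2);

% Degree-1 polynomial fit: p(1) is the slope, p(2) is the intercept
p    = polyfit(x, y, 1);
yFit = polyval(p, x);

% RMS error between the predictions and the actual data
rmsError = sqrt(mean((y - yFit).^2));

% Scatter plot with the fit line; sprintf builds the annotation string
figure
plot(x, y, 'ko')
hold on
plot(x, yFit, 'r-')
hold off
xlabel('x')
ylabel('y')
title(sprintf('Linear fit: slope = %.2f, r = %.3f', p(1), r))
```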
9 Histograms
Lesson (pdf)     Questions (pdf)    

FOCUS QUESTION: How can I show proportions and relative sizes of different data groups?

Supporting videos:
Histogram definition (1:57 min)
    UTSA    UTSA-mp4
Histograms with continuous data (2:11 min)
    UTSA    UTSA-mp4
Picking the number of histogram bins (3:21 min)
    UTSA    UTSA-mp4
Reading a histogram (1:50 min)
    UTSA    UTSA-mp4
Histogram features (1:16 min)
    UTSA    UTSA-mp4
Percentages versus counts (3:23 min)
    UTSA    UTSA-mp4
Comparing histograms (3:23 min)
    UTSA    UTSA-mp4

Supporting handouts:
Lesson 9 template (download and unzip)
Lesson 9 script only (download)
  • Using histograms to convey distribution characteristics
  • Comparing the characteristics of common distributions (normal, uniform and exponential)
  • Scaling histograms to show the fraction of values rather than the number of values
  • The hist function for computing frequency tables
  • The stairs function for displaying a stair plot
  • Setting the number of bins and bin positions for a histogram
  • The random function for generating pseudo-random values from a specified distribution
  • Concept of distribution
  • First look at commonly used distributions: normal, uniform, and exponential
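One possible way to compute and scale a histogram is sketched below. It draws normally distributed values with randn from base MATLAB instead of the lesson's random function, so that no toolbox is needed. The sample size, bin count, and variable names are arbitrary choices for illustration.

```matlab
% 1000 pseudo-random values from a standard normal distribution
values = randn(1000, 1);

% hist returns the count in each bin and the bin centers (20 bins here)
[counts, centers] = hist(values, 20);

% Scale the counts so the bars show the fraction of values in each bin
fractions = counts ./ sum(counts);

figure
bar(centers, fractions)
xlabel('Value')
ylabel('Fraction of values')
title('Scaled histogram of normal data')

% The same frequency table drawn as a stair plot
figure
stairs(centers, fractions)
xlabel('Value')
ylabel('Fraction of values')
```

Scaling to fractions makes histograms of data sets with different sizes directly comparable.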
10 Vector logic for specializing plots
Lesson (pdf)     Questions (pdf)    

FOCUS QUESTION: How can I extract the rows and columns of an array based on data characteristics?

Supporting videos:
Logical arrays and indexing (7:52 min)

Supporting handouts:
Lesson 10 template (download and unzip)
Lesson 10 script only (download)
Note: the data for this lesson can be found on Blackboard in the Addl Information section.
  • Using logical operators to pick out subsets of the data
  • Using relational operators to compare data and set ranges
  • Using logical operators & (and), | (or), ~ (not) to express conditions on the data
  • Using vector indexing (logical vectors as array indexes) to select rows or columns
  • Using relational operators < (less than), <= (less than or equal), > (greater than), >= (greater than or equal), == (equal), and ~= (not equal) to compare data values.
  • Logical and relational expressions
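Logical indexing can be sketched on a small made-up table. The column meanings (id, age, score) and the thresholds are hypothetical and are not the lesson's data.

```matlab
% Hypothetical data: columns are id, age, score (values made up)
data = [1 25 70;
        2 31 82;
        3 19 65;
        4 42 90];

age   = data(:, 2);
score = data(:, 3);

% Relational operators produce logical vectors; & combines them element by element
mask = (age >= 20) & (score > 80);

% A logical vector used as a row index keeps only the rows where it is true
selected    = data(mask, :);     % rows with id 2 and 4
notSelected = data(~mask, :);    % rows with id 1 and 3

% | keeps a row if either condition holds
either = data(age < 20 | score == 90, :);   % rows with id 3 and 4
```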
11 Hypothesis testing
Lesson (pdf)     Questions (pdf)

FOCUS QUESTION: How can I tell whether the test group is different from the control group?

Supporting videos (narrated by Mark Doderer):

Hypothesis testing basics (10:50 min)
    UTSA    UTSA-mp4
One sample testing in MATLAB (ttest) (7:18 min)
    UTSA    UTSA-mp4
Two sample testing in MATLAB (ttest2) (5:58 min)
    UTSA    UTSA-mp4
More on sampling and confidence intervals (3:11 min)

Supporting handouts:
Lesson 11 template (download and unzip)
Lesson 11 script only (download)

Soft chalk Lesson:

  • Formulating a testable hypothesis
  • One-sided and two-sided hypothesis tests
  • Understanding significance levels and p-values
  • The ttest function for testing a population mean
  • The ttest2 function for comparing population means
  • Using p-values and confidence intervals to obtain additional detail
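To make the one-sample test less of a black box, the sketch below computes the t statistic by hand using only base MATLAB. The sample values and the hypothesized mean mu0 are made up. The ttest function from the lesson requires the Statistics and Machine Learning Toolbox, so it appears only in a comment.

```matlab
% Hypothetical sample and hypothesized population mean (values made up)
x   = [5.1 4.9 5.6 5.8 6.0 5.2];
mu0 = 5;

n  = length(x);
se = std(x) / sqrt(n);          % standard error of the sample mean
t  = (mean(x) - mu0) / se;      % one-sample t statistic
df = n - 1;                     % degrees of freedom

fprintf('t = %.3f with %d degrees of freedom\n', t, df);

% With the Statistics and Machine Learning Toolbox installed, the call
%   [h, p] = ttest(x, mu0);
% reports the test decision h and the p-value p for the same hypothesis.
```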
12 Box plots
Lesson (pdf)     Questions (pdf)

FOCUS QUESTION: How can I compare the distributions of data sets that have outliers?

Supporting videos:

Supporting handouts:
Box plots
Lesson 12 template (download and unzip)
Lesson 12 script only (download)
  • Comparing distributions using box plots
  • Computing relative data set sizes
  • Observing medians and IQRs
  • The boxplot function for showing distributions
  • Using labeled data in box plots
  • Other variations of the box plots
  • The repmat function
  • Distributions and outliers
  • Percentiles
  • Interquartile range
13 Program control
Lesson (pdf)     Questions (pdf)

FOCUS QUESTION: How can I adapt code for different situations based on data?

Supporting videos:
If Construct (2:21 min)
For Loops (5:57 min)

Supporting handouts:
Lesson 13 template (download and unzip)
Lesson 13 script only (download)
  • Relational expressions
  • Selection (if-else)
  • Loops (for)
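A small sketch combining a for loop with an if-else selection is shown below. The scores and the threshold of 65 are made up for illustration.

```matlab
% Hypothetical scores and threshold (values made up)
scores    = [55 72 90 64];
threshold = 65;
passCount = 0;

% Visit each score in turn and choose a branch based on its value
for k = 1:length(scores)
    if scores(k) >= threshold
        fprintf('Score %d: at or above threshold\n', scores(k));
        passCount = passCount + 1;
    else
        fprintf('Score %d: below threshold\n', scores(k));
    end
end

fprintf('%d of %d scores met the threshold\n', passCount, length(scores));
```

The same count could be obtained without a loop as sum(scores >= threshold); the loop form is useful when each case needs its own action.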
14 Rates of change
Lesson (pdf)     toHeel.txt     toRump.txt     Questions (pdf)

FOCUS QUESTION: How can I characterize rates of change?

Supporting videos:

Supporting handouts:
MATLAB diff function
Lesson 14 template (download and unzip)
Lesson 14 script only (download)
  • Calculating slope, rate of change, or derivative of data
  • Displaying the slope overplotted with the function to emphasize features
  • Plotting data in multiple ranges on the same graph
  • Using gridlines to facilitate reading the graph
  • Calculating per capita growth rates for comparison in different populations
  • Calculating interval midpoints by averaging lower ends with the upper ends
  • The diff function for computing adjacent differences
  • Using a ratio of diff functions to approximate the slope
  • Using end notation in array index selection
  • Dividing a ratio of diff functions by the population to find the growth rate per capita or rate of change per capita
  • Practicing with concepts of previous lessons
  • Slope of a line to measure rate of change between two points.
  • Idea that the slope of the secant line approaches the slope of a curve in the limit
  • Handling of discontinuities
  • Rate of change with respect to time
  • Rate of change with respect to another variable
  • Percentage change
  • Growth rate per capita
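The diff-based calculations above can be sketched as follows. The population figures are made up, and dividing by the population at the start of each interval is an assumption; the lesson may use a different convention.

```matlab
% Hypothetical population data (values made up)
t   = [1950 1960 1970 1980];
pop = [100 120 150 180];

% Ratio of adjacent differences approximates the slope on each interval
slope = diff(pop) ./ diff(t);                  % [2 3 3]

% Interval midpoints: average the lower ends with the upper ends
tMid = (t(1:end-1) + t(2:end)) / 2;            % [1955 1965 1975]

% Per capita growth rate, using the population at the start of each interval (assumption)
perCapita = slope ./ pop(1:end-1);             % [0.02 0.025 0.02]

% Percentage change over each interval
pctChange = 100 * diff(pop) ./ pop(1:end-1);   % [20 25 20]

% Plot the slope at the interval midpoints, with gridlines for readability
figure
plot(tMid, slope, 'o-')
grid on
xlabel('Year')
ylabel('Change in population per year')
title('Rate of change (made-up data)')
```

Note that diff returns one fewer element than its input, which is why the slope is plotted against the midpoints rather than the original time values.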
15 Logarithmic scales
Lesson (pdf)     Questions (pdf)

Data file: WorldPopulation.csv

FOCUS QUESTION: How can I use logarithmic scales to understand rates of growth?

Supporting videos:

Supporting handouts:
Lesson 15 template (download and unzip)
Lesson 15 script only (download)
  • Calculating slope of data expressed on linear and logarithmic scales
  • Plotting data on various types of logarithmic axes
  • Examining rates of growth
  • Performing a linear fit and finding the R² to characterize the quality of the fit.
  • Finding per capita growth rates
  • The semilogx and semilogy functions
  • The loglog function
  • Examining rates of growth
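As an illustration of why logarithmic axes help with growth data, the sketch below fits a line to the base-10 logarithm of a made-up series that doubles at every step. It does not use WorldPopulation.csv, and the variable names are hypothetical.

```matlab
% Hypothetical series that doubles at each time step (values made up)
t = 0:5;
y = 100 * 2.^t;

% Exponential growth is a straight line on a log scale, so fit log10(y) against t
logY   = log10(y);
p      = polyfit(t, logY, 1);       % p(1) is the slope, here log10(2)
fitted = polyval(p, t);

% R^2: the fraction of the variation in log10(y) explained by the fit
ssResidual = sum((logY - fitted).^2);
ssTotal    = sum((logY - mean(logY)).^2);
R2 = 1 - ssResidual / ssTotal;

% semilogy uses a logarithmic y-axis and a linear x-axis
figure
semilogy(t, y, 'o-')
grid on
xlabel('Time step')
ylabel('Value (log scale)')
title(sprintf('Slope of log10(y) = %.3f per step, R^2 = %.3f', p(1), R2))
```

A constant slope on the semilog plot corresponds to a constant per capita growth rate; semilogx and loglog work the same way with the logarithmic scale on the x-axis or on both axes.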

This course summary was last modified August 31, 2017.