CS 1173 Data Analysis and Visualization

 

Why data analysis?

Is data making the scientific method obsolete?
In the TEDMED talk http://www.youtube.com/watch?v=dtNMA46YgX4 (14:06 min),
Atul Butte addresses this question and describes the profound changes that data is making in medical research.
(See also the longer NIH lecture http://www.youtube.com/watch?v=o4KNG7nd938, entitled "Translational Bioinformatics: Transforming 300 Billion Points of Data.")

 

The big-data revolution in health care:
In the TED talk http://www.youtube.com/watch?v=Mb8x6vLcggc (16:18 min),
Joel Selanikio, founder of Magpi, talks about how a simple marriage of technology and data can profoundly change health care in developing countries.

 

Lesson summary

Recommended strategies:

You should attempt each lesson (and its questions) and watch the corresponding videos before we cover the lesson in class. You can make notes while watching the videos. During class we will work on some of the exercises with the goal of helping you learn to solve problems and program independently.

 

Quizzes (Hybrid sections only):

The quizzes are short, automatically graded assessments administered on Blackboard through the learning modules. You can retake them as often as you wish; the highest grade counts. To be effective in class, you should complete the learning modules before the lesson is covered in class. Quiz results contribute to your attendance grade.

 

Sleep diary

Starting August 22, 2018, you will record information about your sleep patterns for 21 days (HW4). You will use the data you gather in Labs 2, 3, and 4. We will validate your sleep diary data during class, and we will also consolidate and anonymize the data from all sections for various analyses during the course.

 

Alternative links to the videos

We have provided alternative links to the videos in case you have trouble accessing them through the Learning Modules.

 

Topics:

Lesson | Title and links | Analysis/visualization concepts | MATLAB/computing concepts | Mathematics/statistics concepts | Labs/projects
1 Getting started in MATLAB
Lesson (pdf)     Questions (pdf)

FOCUS QUESTION: How do I start using MATLAB?

Supporting videos:
MATLAB workspace (3:17 mins)
    Media Library     UTSA   
Working with arrays and variables (6:22 mins)
    Media Library     UTSA    
Setting up a project (4:47 mins)
    Media Library     UTSA   
Using cells (3:22 mins)
    Media Library     UTSA   

Supporting handouts:
    Percentages
  • Arrays as tables with rows and columns
  • Plotting the columns of an array against the integers from 1:n
  • Requirements for a well-designed plot
  • MATLAB environment and windows (Command, History, Workspace, Editor)
  • Changing the current directory
  • Loading data
  • Creating and running MATLAB scripts
  • Using MATLAB cell mode
  • The plot command
  • The xlabel, ylabel, title, and legend commands
  • Defining variables
  • Representing complex structures (such as arrays) symbolically and working with them in equations.
Pretest (HW1)
Sleep diary
2 Working with line graphs
Lesson - or - Questions

FOCUS QUESTION: How do I display trends in data?

Supporting videos:
Line graphs in MATLAB (8:34 mins)
    Media Library     UTSA
Running an analysis (3:17 mins)
    Media Library     UTSA
Labeling a graph (1:40 mins)
    Media Library     UTSA   

Supporting handouts:
    Array basics
  • Different ways to use line graphs
  • Setting explicit x-axis values using x-y plots
  • Using markers and colors to distinguish plots
  • Rescaling to make graphs more readable
  • Using colons to pick out rows and columns
  • Using element-wise division (./)
  • Using hold on and hold off to display multiple graphs on the same axis.
  • Performing arithmetic operations such as addition and multiplication on arrays
  • Using indexing to manipulate arrays
  • Concept of array dimension
  • Row and column operations
  • Writing an equation to calculate a quantity described in words

Lab 1
3 Introducing the sum function
Lesson - or - Questions

FOCUS QUESTION: How can I transform the data to give more meaningful results?

Supporting videos:
The MATLAB sum function (4:28 min):
    Media Library    UTSA     Transcript
MATLAB linear representation of arrays (1:30 min):
    Media Library     UTSA     Transcript
Transposing an array (2:55 min):
    Media Library     UTSA     Transcript

Supporting handouts:
MATLAB sum function
  • Plotting summary information rather than individual data points
  • Using the sum function to summarize the dataset
  • Plotting pie charts
  • Using colons to specify ranges and increments
  • Using the linear representation (:) of an array for reordering
  • The sum function for adding up rows or columns
  • The transpose operator (') for flipping an array
  • The pie command
  • Combining and scaling arrays and vectors
  • Array transpose
  • Working with ranges and subintervals
  • Applying functions that map a 2D array to a vector (mapping from one vector space to another)
  • Function composition
  • Word problems requiring multiple function transformations.
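As an illustration of the Lesson 3 MATLAB concepts (not taken from the lesson materials), the sketch below applies sum along each dimension of a small made-up array, flips the array with the transpose operator ('), and uses the linear representation (:) to add up every element. The array values and variable names are hypothetical.

```matlab
% Made-up 3-by-2 array (illustrative only): rows are days, columns are categories
A = [1 2; 3 4; 5 6];

colTotals  = sum(A);      % sums down each column, giving the row vector [9 12]
rowTotals  = sum(A, 2);   % sums across each row, giving the column vector [3; 7; 11]
grandTotal = sum(A(:));   % A(:) stacks the columns into one long column; total is 21

B = A';                   % transpose: B is 2-by-3, [1 3 5; 2 4 6]
sameTotals = sum(B, 2);   % row sums of B equal the column sums of A: [9; 12]

fprintf('Grand total: %d\n', grandTotal);
```

Passing the column totals to pie, as in pie(colTotals), would show each category's share of the grand total.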
4 Bar charts
Lesson - or - Questions

FOCUS QUESTION: How can I show proportions and relative sizes of different data groups?

Supporting videos:
Bar chart basics in MATLAB (3:41 min)
    Media Library     UTSA    Transcript
Grouped and stacked bar charts in MATLAB (3:02 min)
    Media Library     UTSA    Transcript
  • Bar charts for displaying both proportion and magnitude
  • Grouped or stacked bar charts for comparing multiple data sets
  • Scaling a data set to make the axes more understandable
  • Additional practice with the sum function
  • Using square brackets and commas to assemble an array
  • Additional examples of use of transpose and array assembly
  • The bar function for creating vertical and horizontal bar charts
  • The stack option of the bar function
  • Additional array manipulations
5 Basic stats
Lesson - or - Questions

FOCUS QUESTION: How can I find typical characteristics and central tendencies of data?

Supporting videos:
Comparing mean and median (3:44 min)
    Media Library     UTSA    Transcript
Basic statistics in MATLAB (3:30 min)
    Media Library     UTSA    Transcript
Array statistics in MATLAB (2:06 min)
    Media Library     UTSA    Transcript

Supporting handouts:
Statistical indicators
MATLAB max function
MATLAB mean function
MATLAB median function
MATLAB min function
  • Statistical indicators: mean, median, maximum and minimum
  • Outputting information about a data set
  • The mean, median, max, and min functions for expressing basic statistical characteristics
  • The fprintf function for outputting data
  • Working with basic statistical indicators such as mean and median.
Lab 2
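A minimal sketch of the Lesson 5 functions, assuming a week of made-up sleep-diary values (not real course data): it computes the mean, median, maximum (with its position), and minimum, then prints them with fprintf.

```matlab
% Made-up hours of sleep for one week (illustrative values only)
sleepHours = [7.5 6.0 8.0 5.5 7.0 9.5 6.5];

avgSleep = mean(sleepHours);            % arithmetic mean
midSleep = median(sleepHours);          % middle value after sorting
[maxSleep, maxDay] = max(sleepHours);   % largest value and the position where it occurs
minSleep = min(sleepHours);             % smallest value

fprintf('Mean: %.2f h, median: %.2f h\n', avgSleep, midSleep);
fprintf('Max: %.1f h (day %d), min: %.1f h\n', maxSleep, maxDay, minSleep);
```

Here the mean (about 7.14 h) sits above the median (7.0 h) because the single long night of 9.5 h pulls the mean upward.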
6 Error bars
Lesson - or - Questions

FOCUS QUESTION: How can I depict uncertainty and variability in data?

Supporting videos:
Measures of spread (standard deviation, AAD, MAD, etc.) (8:50 min)
    Media Library     UTSA   
Basic errorbars in MATLAB (4:14 min)
    Media Library     UTSA   
Alternative forms of errorbars in MATLAB (2:02 min)
    Media Library     UTSA   
Errorbars with unequal wings in MATLAB (3:50 min)
    Media Library     UTSA   

Supporting handouts:
MATLAB standard deviation function (std)
MATLAB reshape function
  • Using error bars to depict spread
  • Using error bars on bar charts
  • Comparisons of different measures of spread for a highly skewed data set
  • The errorbar function and its variations
  • Creating SD, MAD and IQR error bars
  • Using offsets to avoid overplotting.
  • Using gca to get and set axis properties
  • Interpretation of measures of spread (AAD, MAD, SD, and IQR) as measures of error in using the mean to predict data.
  • Computing estimates of spread including AAD, MAD, SD, and IQR
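The sketch below compares several Lesson 6 spread measures on a small, made-up, right-skewed sample. It assumes AAD means the average absolute deviation from the mean and MAD means the median absolute deviation from the median; the lesson's own definitions may differ. IQR is left out because percentile conventions vary between functions. Note that std uses N-1 normalization by default.

```matlab
% Made-up, right-skewed sample (illustrative values only)
x = [2 3 3 4 5 6 20];

xMean   = mean(x);
xMedian = median(x);

sd     = std(x);                       % standard deviation (N-1 normalization, the MATLAB default)
aad    = mean(abs(x - xMean));         % average absolute deviation from the mean (assumed definition)
madVal = median(abs(x - xMedian));     % median absolute deviation from the median (assumed definition)

fprintf('mean %.2f  median %.2f\n', xMean, xMedian);
fprintf('SD %.2f  AAD %.2f  MAD %.2f\n', sd, aad, madVal);
```

The single outlier (20) inflates the SD to about 6.26, while the MAD stays at 1, which is why the choice of spread measure matters for skewed data. The variable is named madVal to avoid shadowing the toolbox function mad.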
7 Sampling
Lesson - or -

FOCUS QUESTION: Do the characteristics of a sample reflect an entire population?

Supporting videos:

Supporting handouts:
Populations and samples
Lesson 7 template (download and unzip)
Lesson 7 script only (download)
Midterm examination
8 Linear models, scatter plots, curve fitting, and correlation
Lesson (pdf)     Questions (pdf)

FOCUS QUESTION: How can I determine whether two variables are related?

Supporting handouts:
Example of putting a best-fit line on a graph in a script

Supporting videos:
Straight lines are handy tools (4:40 min)
    Media Library     UTSA   
Linear models (6:45 min)
    Media Library     UTSA   
Correlation in MATLAB (1:30 min)
    Media Library     UTSA   
Scatter plots and linear fits in MATLAB (5:40 min)
    Media Library     UTSA   
Summary of modeling in MATLAB (0:51 min)
    Media Library     UTSA   

Supporting handouts:
  • Computing the correlation between two data sets
  • Comparing two data sets by plotting them against each other in a scatter plot
  • Computing the best fit line
  • Evaluating the RMS (root mean squared) error between predictions and actual data
  • Adding a linear fit line to a scatter plot using the MATLAB plottools
  • Constructing strings for plot annotation
  • Using xlabel, ylabel, and title to directly annotate a plot
  • The corr function for computing correlations
  • The polyfit function for fitting a polynomial to data
  • The polyval function for evaluating a polynomial at an array of points
  • How is correlation computed?
  • Correlation does not imply causality
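A minimal sketch of the Lesson 8 workflow on made-up paired data: correlation, a best-fit line with polyfit, predictions with polyval, and the RMS error of those predictions. It uses base MATLAB corrcoef in place of the corr function listed above, because corr belongs to the Statistics and Machine Learning Toolbox.

```matlab
% Made-up paired measurements (illustrative values only)
x = [1 2 3 4 5];
y = [2.1 3.9 6.2 7.8 10.1];

R = corrcoef(x, y);        % 2-by-2 correlation matrix
r = R(1, 2);               % correlation between x and y

p = polyfit(x, y, 1);      % degree-1 fit: p(1) is the slope, p(2) is the intercept
yFit = polyval(p, x);      % values predicted by the line at each x

rmsError = sqrt(mean((y - yFit).^2));   % root-mean-square error of the predictions

fprintf('r = %.3f, fit: y = %.3f x + %.3f, RMS error = %.3f\n', r, p(1), p(2), rmsError);
```

For this data the fit is y = 1.99x + 0.05 with r close to 1; a high r still says nothing about whether x causes y.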
9 Histograms
Lesson (pdf)     Questions (pdf)    

FOCUS QUESTION: How can I show proportions and relative sizes of different data groups?

Supporting videos:
Histogram definition (1:57 min)
    Media Library     UTSA   
Histograms with continuous data (2:11 min)
    Media Library     UTSA   
Picking the number of histogram bins (3:21 min)
    Media Library     UTSA   
Reading a histogram (1:50 min)
    Media Library     UTSA   
Histogram features (1:16 min)
    Media Library     UTSA   
Percentages versus counts (3:23 min)
    Media Library     UTSA   
Comparing histograms (3:23 min)
    Media Library     UTSA   

Supporting handouts:
Lesson 9 template (download and unzip)
Lesson 9 script only (download)
  • Using histograms to convey distribution characteristics
  • Comparing the characteristics of common distributions (normal, uniform and exponential)
  • Scaling histograms to show the fraction of values rather than the number of values
  • The hist function for computing frequency tables
  • The stairs function for displaying a stair plot
  • Setting the number of bins and bin positions for a histogram
  • The random function for generating pseudo-random values from a specified distribution
  • Concept of distribution
  • First look at commonly used distributions: normal, uniform, and exponential
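As a sketch of scaling a histogram to fractions (Lesson 9), the example below calls hist with an output argument on made-up data, which returns bin counts instead of drawing a plot, and then divides by the number of values. Newer MATLAB releases recommend histcounts over hist, but hist still works.

```matlab
% Made-up integer data (illustrative values only)
x = [1 2 2 3 3 3 4 4 5];
centers = 1:5;

counts = hist(x, centers);      % with an output argument, hist returns bin counts instead of plotting
fractions = counts / numel(x);  % scale counts to fractions of all values; these sum to 1

% bar(centers, fractions) would draw the scaled histogram
```

Dividing by numel(x) rather than plotting raw counts makes histograms of data sets with different sizes directly comparable.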
10 Vector logic for specializing plots
Lesson (pdf)     Questions (pdf)    

FOCUS QUESTION: How can I extract the rows and columns of an array based on data characteristics?

Supporting videos:
Logical arrays and indexing (7:52 min)
    Media Library

Supporting handouts:
Lesson 10 template (download and unzip)
Lesson 10 script only (download)
Note: the data for this lesson can be found on Blackboard in the Addl Information section.
  • Using logical operators to pick out subsets of the data
  • Using relational operators to compare data and set ranges
  • Using logical operators & (and), | (or), ~ (not) to express conditions on the data
  • Using vector indexing (logical vectors as array indexes) to select rows or columns
  • Using relational operators < (less than), <= (less than or equal), > (greater than), >= (greater than or equal), == (equal), and ~= (not equal) to compare data values.
  • Logical and relational expressions
HW6
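The following sketch shows Lesson 10 style logical indexing on a small made-up array (it does not use the Blackboard data for this lesson). Relational operators produce logical vectors, & and ~ combine and negate them, and a logical vector used as a row index keeps only the matching rows.

```matlab
% Made-up data (illustrative only): column 1 = age, column 2 = hours of sleep
data = [18 7.5; 22 6.0; 19 8.0; 25 5.5; 20 7.0];

ages       = data(:, 1);
sleepHours = data(:, 2);

isShortSleep = sleepHours < 7;          % logical column vector: true where sleep < 7
shortRows    = data(isShortSleep, :);   % keep only the rows where the condition is true

inRange    = ages >= 19 & ages <= 22;   % & combines two relational tests element by element
notInRange = ~inRange;                  % ~ negates the logical vector

fprintf('%d of %d rows have less than 7 hours of sleep\n', sum(isShortSleep), numel(sleepHours));
```

Because true counts as 1, sum(isShortSleep) gives the number of rows that satisfy the condition.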
11 Hypothesis testing
Lesson (pdf)     Questions (pdf)

FOCUS QUESTION: How can I tell whether the test group is different from the control group?

Supporting videos (narrated by Mark Doderer):

Hypothesis testing basics (10:50 min)
    Media Library UTSA   
One sample testing in MATLAB (ttest) (7:18 min)
    Media Library UTSA   
Two sample testing in MATLAB (ttest2) (5:58 min)
    Media Library UTSA   
More on sampling and confidence intervals (3:11 min)
    Media Library UTSA

Supporting handouts:
Lesson 11 template (download and unzip)
Lesson 11 script only (download)
  • Formulating a testable hypothesis
  • One-sided and two-sided hypothesis tests
  • Understanding significance levels and p-values
  • The ttest function for testing a population mean
  • The ttest2 function for comparing two population means
  • Using p-values and confidence intervals to obtain additional detail
12 Box plots
Lesson (pdf)     Questions (pdf)

FOCUS QUESTION: How can I compare the distributions of data sets that have outliers?

Supporting videos:

Supporting handouts:
Box plots
Lesson 12 template (download and unzip)
Lesson 12 script only (download)
  • Comparing distributions using box plots
  • Computing relative data set sizes
  • Observing medians and IQRs
  • The boxplot function for showing distributions
  • Using labeled data in box plots
  • Other variations of box plots
  • The repmat function
  • Distributions and outliers
  • Percentiles
  • Interquartile range
13 Program control
Lesson (pdf)     Questions (pdf)

FOCUS QUESTION: How can I adapt code for different situations based on data?

Supporting videos:
If Construct (2:21 min)
    Media Library
For Loops (5:57 min)
    Media Library

Supporting handouts:
Lesson 13 template (download and unzip)
Lesson 13 script only (download)
  • Relational expressions
  • Selection (if-else)
  • Loops (for)
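A minimal sketch combining the Lesson 13 constructs: a for loop visits each column of a made-up array, and an if-else statement labels the column based on its mean. The 7-hour threshold and the labels are hypothetical choices for illustration.

```matlab
% Made-up data (illustrative only): each column is one week of sleep hours
weeks = [7 6 8; 8 5 7; 6 6 9];

status = cell(1, size(weeks, 2));     % one label per column
for k = 1:size(weeks, 2)              % loop over the columns
    weekMean = mean(weeks(:, k));
    if weekMean >= 7                  % hypothetical threshold of 7 hours
        status{k} = 'enough';
    else
        status{k} = 'short';
    end
end

disp(status)
```

The column means are 7, about 5.67, and 8, so the loop labels the weeks 'enough', 'short', and 'enough'.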
14 Rates of change
Lesson (pdf)     toHeel.txt     toRump.txt     Questions (pdf)

FOCUS QUESTION: How can I characterize rates of change?

Supporting videos:

Supporting handouts:
MATLAB diff function
Lesson 14 template (download and unzip)
Lesson 14 script only (download)
  • Calculating slope, rate of change, or derivative of data
  • Plotting the slope together with the function to emphasize features
  • Plotting data in multiple ranges on the same graph
  • Using gridlines to facilitate reading the graph
  • Calculating per capita growth rates for comparison in different populations
  • Calculating interval midpoints by averaging the lower and upper ends of each interval
  • The diff function for computing adjacent differences
  • Using a ratio of diff functions to approximate the slope
  • Using end notation in array index selection
  • Dividing a ratio of diff functions by the population to find the growth rate per capita or rate of change per capita
  • Practicing with concepts of previous lessons
  • Slope of a line to measure rate of change between two points.
  • Idea that the slope of the secant line approaches the slope of a curve in the limit
  • Handling of discontinuities
  • Rate of change with respect to time
  • Rate of change with respect to another variable
  • Percentage change
  • Growth rate per capita
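The sketch below illustrates several Lesson 14 ideas on made-up population counts: a ratio of diff results approximates the slope on each interval, end notation picks out interval endpoints and midpoints, and dividing by the population gives a per capita rate. Using the population at the start of each interval is an assumption; averaging the two endpoints is another reasonable choice.

```matlab
% Made-up population counts at evenly spaced years (illustrative only)
years = [2000 2005 2010 2015 2020];
pop   = [100 110 125 145 170];

slope    = diff(pop) ./ diff(years);                 % rate of change on each interval (people per year)
midYears = (years(1:end-1) + years(2:end)) / 2;      % interval midpoints, for plotting the slopes

% Per capita growth rate: each interval's rate divided by the population
% at the start of that interval (assumed convention)
perCapita = slope ./ pop(1:end-1);

pctChange = 100 * diff(pop) ./ pop(1:end-1);         % percentage change over each interval
```

There are one fewer slopes than data points, which is why they are plotted against midYears rather than years.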
15 Logarithmic scales
Lesson (pdf)     Questions (pdf)

Data file: WorldPopulation.csv

FOCUS QUESTION: How can I use logarithmic scales to understand rates of growth?

Supporting videos:

Supporting handouts:
Lesson 15 template (download and unzip)
Lesson 15 script only (download)
  • Calculating slope of data expressed on linear and logarithmic scales
  • Plotting data on various types of logarithmic axes
  • Examining rates of growth
  • Performing a linear fit and finding R² to characterize the quality of the fit.
  • Finding per capita growth rates
  • The semilogx and semilogy functions
  • The loglog function
  • Examining rates of growth
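As a sketch of the Lesson 15 idea that exponential growth looks straight on a logarithmic y axis, the example below fits a line to log(y) for made-up data that grows by exactly 10% per step and computes R² as 1 - SS_residual/SS_total on the log scale. The data and variable names are hypothetical; the lesson's WorldPopulation.csv is not used.

```matlab
% Made-up data growing by exactly 10% per step (illustrative only)
t = 0:5;
y = 50 * 1.1 .^ t;

% Exponential growth is a straight line on a log scale, so fit log(y) versus t
p = polyfit(t, log(y), 1);      % p(1) is the slope of log(y); equals log(1.1) for this data
yFitLog = polyval(p, t);

% R^2 = 1 - SS_residual / SS_total, computed on the log scale
ssRes = sum((log(y) - yFitLog).^2);
ssTot = sum((log(y) - mean(log(y))).^2);
R2 = 1 - ssRes / ssTot;

growthPerStep = exp(p(1)) - 1;  % convert the log-scale slope back to a growth fraction (0.10 here)

% semilogy(t, y, 'o-') would display the same data on a logarithmic y axis
```

Because the made-up data is exactly exponential, R² is essentially 1; real population data would give a smaller value.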

This course summary was last modified August 31, 2017.