CS 5163 (Introduction to Data Science)

News and Announcements

10/16: Pandas IO example code is uploaded. Download the code and the data, and run it in an IPython notebook.

10/15: HW3 is available. Due on Oct 29th. Data for Q2, data for Q3.

9/24: HW2 is available. Due on Oct 8th. Code and data for Q4 (43MB). (Alternatively, you can just download this csv file and use pandas to load it).

8/28: HW1 is available. Due on Sept 10th. Code Skeleton.

8/23: Welcome to CS5163 (Intro to Data Science)! Please take some time to complete a background survey.

Overview | Prerequisite | Time and Location | Instructor | Textbooks and Resources | Policies | Lecture Schedule and Slides | Assignments

Overview

This course covers the fundamentals of data science. Topics include data collection, preprocessing and transformation, visualization and exploratory analysis, the mathematical and statistical foundations of data modeling, and an introduction to data mining algorithms. The course currently uses Python as its programming language.

Prerequisite

This course is primarily designed for graduate students in the Computer Science department. A fundamental understanding of data structures and algorithms, and some knowledge of probability and statistics, are expected. Students without a background in algorithms or statistics should consult the instructor prior to taking the course.

Time and Location

We meet in room MH 3.04.10. Lectures are Mon and Wed, 4:00-5:15 PM.

Instructor

Instructor: Dr. Jianhua Ruan
Office location: NPB 3.318
Office hours: Wed 1-3pm or by appointment
Email: jianhua.ruan 'at' utsa 'dot' edu
Phone: (210) 458-6819

Textbooks and Resources

Required:

 

(PDA) Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython
by Wes McKinney

(DSS) Data Science from Scratch: First Principles with Python
by Joel Grus

Free ebook: (TS) Think Stats: Probability and Statistics for Programmers by Allen Downey available from Green Tea Press.

 


Optional:

(PDSH) Python Data Science Handbook: Essential Tools for Working with Data
by Jake VanderPlas

Grading Policy

5% Attendance and participation

30% Homework assignments and in-class exercises

30% Midterm exam

35% Final exam / project

 

I reserve the right to slightly adjust the weights of individual components if necessary.
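As a rough illustration of how the weights above combine, here is a minimal Python sketch. The component names, the `course_score` helper, and the example scores are hypothetical and chosen only for illustration; the weights are the ones listed above and may be adjusted by the instructor.

```python
# Weights from the grading policy (percent of the final grade).
# The dictionary keys are illustrative names, not official labels.
WEIGHTS = {
    "attendance": 5,
    "homework": 30,
    "midterm": 30,
    "final": 35,
}


def course_score(scores):
    """Return the weighted course score on a 0-100 scale.

    `scores` maps each component name to a percentage (0-100).
    """
    # Sanity check: the published weights should add up to 100%.
    assert sum(WEIGHTS.values()) == 100
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical student, for illustration only.
example = {"attendance": 100, "homework": 90, "midterm": 80, "final": 85}
print(course_score(example))  # 85.75
```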

 

Late assignments will not be accepted and will receive a score of zero unless approved by the instructor.

Collaboration Policy

Assignments

Lecture Schedule and Slides

Part I: Course intro, Python tutorial

Part II: Data visualization

Part III: NumPy and vectorized computation

Part IV: Statistics and Probability

Part V: Pandas, data I/O and preprocessing

Part VI: Regression Analysis

Part VII: Performance Evaluation

Tentative lecture topics

Topics | Number of weeks
Python tutorial & basic plotting | 1.5
NumPy, Stats and Prob, linear algebra | 2.5
Pandas, data in/out, cleaning, transformation | 1
Regression | 1
Classification | 3
Clustering | 2
Networks | 1