CS 5163 (Introduction to Data Science)

News and Announcements

10/16: Pandas IO example code is uploaded. Download it and run it in an IPython notebook. Download data.

10/15: HW3 is available. Due on Oct 29th. Data for Q2, data for Q3.

9/24: HW2 is available. Due on Oct 8th. Code and data for Q4 (43 MB). Alternatively, you can just download this CSV file and load it with pandas.

8/28: HW1 is available. Due on Sept 10th. Code Skeleton.

8/23: Welcome to CS 5163 (Intro to Data Science)! Please take some time to complete a background survey.



Overview

This course covers the fundamentals of data science. Topics include data collection, preprocessing and transformation, visualization and exploratory analysis, the mathematical and statistical foundations of data modeling, and an introduction to data mining algorithms. The programming language currently used is Python.


Prerequisite

This course is primarily designed for graduate students in the Computer Science department. A fundamental understanding of data structures and algorithms, and some knowledge of probability and statistics, are expected. Students without a background in algorithms or statistics should consult the instructor before taking the course.

Time and Location

We meet in room MH 3.04.10. Lectures are Mon and Wed, 4:00-5:15 PM.


Instructor: Dr. Jianhua Ruan
Office location: NPB 3.318
Office hours: Wed 1-3pm or by appointment
Email: jianhua.ruan 'at' utsa 'dot' edu
Phone: (210) 458-6819

Textbooks and Resources



(PDA) Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython, by Wes McKinney

(DSS) Data Science from Scratch: First Principles with Python, by Joel Grus

(TS) Think Stats: Probability and Statistics for Programmers, by Allen Downey. Free ebook available from Green Tea Press.

(PDSH) Python Data Science Handbook: Essential Tools for Working with Data, by Jake VanderPlas






Grading Policy

5% Attendance and participation

30% Homework assignments and in-class exercises

30% Midterm exam

35% Final exam / project


I reserve the right to slightly adjust the weights of individual components if necessary.


Late assignments will not be accepted and will receive a score of zero unless approved by the instructor.

Collaboration Policy


Lecture Schedule and Slides

Part I: Course intro, Python tutorial

Part II: Data visualization

Part III: NumPy and vectorized computation

Part IV: Statistics and Probability

Part V: Pandas, data I/O and preprocessing

Part VI: Regression Analysis

Part VII: Performance Evaluation

Tentative lecture topics


Number of weeks

Python tutorial & basic plotting


NumPy, Stats and Prob, linear algebra


Pandas, data in/out, cleaning, transformation