CS 6243: Machine Learning


Papers for Presentations

Lecture Notes

Introduction (Simplified Iris Dataset, Simplified Glass Dataset).

Nearest Neighbor, Decision Trees, Neural Networks, Bayesian Learning (an example created using an earlier version of Weka), Learning Rules, Support Vector Machines.

Bagging and Boosting, Evaluating Hypotheses, Computational Learning Theory, Reinforcement Learning, Unsupervised Learning.


Homework 1, Homework 2, Homework 3, Homework 4, Homework 5, Homework 6.

Homework 7, Homework 8, Homework 9, Homework 10, Homework 11.

Lab 1, Lab 2 (initial Perceptron.java).

Project, Addendum.

The project will be to implement a new classifier class, LinearMachine, in Weka. The project can be done individually or in groups (groups will have more objectives to satisfy). Ideally, the class should include options for different algorithms (perceptron, LMS, weighted majority, winnow, and exponentiated update) and for various parameters (learning rate, incremental/batch learning, margin, epochs). The project will also include a comparison of the algorithm against other Weka algorithms. A more detailed description is forthcoming.
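As a rough illustration of the simplest of these algorithms, here is a minimal standalone sketch of the mistake-driven perceptron update that LinearMachine would wrap. The class and method names are illustrative only; a real submission must instead extend the appropriate Weka classifier interface, which is not shown here.

```java
// Illustrative sketch of the perceptron update rule (not Weka API).
public class PerceptronSketch {
    private final double[] w;   // weights; last entry is the bias
    private final double rate;  // learning rate

    public PerceptronSketch(int numInputs, double rate) {
        this.w = new double[numInputs + 1];
        this.rate = rate;
    }

    // Predict +1 or -1 for one instance.
    public int predict(double[] x) {
        double sum = w[w.length - 1];              // bias term
        for (int i = 0; i < x.length; i++) sum += w[i] * x[i];
        return sum >= 0 ? 1 : -1;
    }

    // One incremental pass (epoch) over the data; returns mistakes made.
    public int trainEpoch(double[][] X, int[] y) {
        int mistakes = 0;
        for (int n = 0; n < X.length; n++) {
            if (predict(X[n]) != y[n]) {           // update only on a mistake
                for (int i = 0; i < X[n].length; i++)
                    w[i] += rate * y[n] * X[n][i];
                w[w.length - 1] += rate * y[n];
                mistakes++;
            }
        }
        return mistakes;
    }

    public static void main(String[] args) {
        // Toy linearly separable problem: logical AND with {-1,+1} labels.
        double[][] X = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
        int[] y = {-1, -1, -1, 1};
        PerceptronSketch p = new PerceptronSketch(2, 0.1);
        // Train until an epoch makes no mistakes (or 50 epochs pass).
        for (int epoch = 0; epoch < 50 && p.trainEpoch(X, y) > 0; epoch++);
        for (int n = 0; n < X.length; n++)
            System.out.println(p.predict(X[n]) == y[n]);
    }
}
```

The other algorithms (LMS, winnow, exponentiated update) differ only in the update step inside trainEpoch, which is one reason a single LinearMachine class with an algorithm option is a natural design.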

If you wish, you may propose and work on an alternative project, subject to the instructor's consent.



The textbook for the course is Tom Mitchell (1997). Machine Learning. McGraw-Hill.


We will be using Weka, a collection of machine learning algorithms implemented in Java. For information on Weka, start here.

Other Readings

A number of other readings will be assigned during the course. They will be identified below under "Required Reading" (e.g., on topics not covered by the book) or "Further Reading" (e.g., for additional detail or a different perspective).

Required Reading

T. G. Dietterich (2003). Machine learning. In Nature Encyclopedia of Cognitive Science. London: Macmillan.

Learning Sets of Rules.
J. R. Quinlan (1990). Learning logical definitions from relations. Machine Learning 5: 239-266.

Support Vector Machines.
Chapter 1 in Bernhard Schölkopf and Alex Smola (2002). Learning with Kernels. MIT Press, Cambridge, MA.

Bagging and Boosting.
L. Breiman (1996). Bagging predictors. Machine Learning, 24:123-140.
Y. Freund and R. E. Schapire (1996). Experiments with a new boosting algorithm. In Proc. International Conference on Machine Learning, pp. 148-156.

Evaluating Hypotheses.
T. G. Dietterich (1998). Approximate statistical tests for comparing supervised classification learning algorithms. Neural Computation, 10:1895-1923.
F. Provost, T. Fawcett, and R. Kohavi (1998). The case against accuracy estimation for comparing induction algorithms. In Proc. Fifteenth Intl. Conf. Machine Learning, pp. 445-453.

Unsupervised Learning.
Mitchell, Section 6.12, pp. 191-196.
S. Kotsiantis and P. Pintelas (2004). Recent advances in clustering: a brief survey, WSEAS Transactions on Information Science and Applications 1:73-81.
P. J. Francis and B. J. Wills (1999). Introduction to principal components analysis. ArXiv Astrophysics e-prints.
R. Agrawal and R. Srikant (1994). Fast algorithms for mining association rules. Proc. of the 20th Int'l Conference on Very Large Databases, pp. 487-499.

Further Reading

General Information (on reserve in library).
Richard O. Duda, Peter E. Hart, and David G. Stork (2001). Pattern Classification. Wiley.
Chapters 18-21 in Stuart J. Russell and Peter Norvig (2003). Artificial Intelligence: A Modern Approach, 2nd Edition. Prentice-Hall.

Nearest Neighbor.
Russell/Norvig, Section 20.4, pp. 733-736.
Duda/Hart/Stork, Sections 4.5 and 4.6, pp. 177-192.

Decision Trees.
Russell/Norvig, Section 18.3, pp. 653-664.
Duda/Hart/Stork, Sections 8.2-8.4, pp. 395-413.

Linear Learning and Artificial Neural Networks.
Russell/Norvig, Section 20.5, pp. 736-748.
Duda/Hart/Stork, Chapters 5 and 6, pp. 215-349.

Bayesian Learning.
Russell/Norvig, Chapters 14 and 15, pp. 482-583.
Russell/Norvig, Sections 20.1-20.3, pp. 712-733.
Duda/Hart/Stork, Chapters 2 and 3, pp. 20-160.

Learning Sets of Rules.
Russell/Norvig, Chapter 19, pp. 678-711.

Support Vector Machines.
Chapter 7 in Bernhard Schölkopf and Alex Smola (2002). Learning with Kernels. MIT Press, Cambridge, MA.
Russell/Norvig, Section 20.6, pp. 749-752.
Duda/Hart/Stork, Section 5.11, pp. 259-265.

Bagging and Boosting.
Russell/Norvig, Section 18.4, pp. 664-668.
Duda/Hart/Stork, Section 9.5, pp. 475-482.
A. Krogh and J. Vedelsby (1995). Neural network ensembles, cross validation and active learning. In G. Tesauro, D. S. Touretzky, and T. K. Leen, eds., Advances in Neural Information Processing Systems, pp. 231-238, MIT Press.
Y. Freund and R. E. Schapire (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55:119-139.

Evaluating Hypotheses.
Duda/Hart/Stork, Section 9.6, pp. 482-495.

Computational Learning Theory.
Chapter 5 in Bernhard Schölkopf and Alex Smola (2002). Learning with Kernels. MIT Press, Cambridge, MA.
Russell/Norvig, Section 18.5, pp. 668-673.

Reinforcement Learning.
R. S. Sutton and A. G. Barto (1998). Reinforcement Learning: An Introduction, MIT Press.
Russell/Norvig, Chapter 21, pp. 763-789.

Unsupervised Learning.
Russell/Norvig, Section 20.3, pp. 724-733.
Duda/Hart/Stork, Sections 3.8 and 3.9, pp. 114-128.
Duda/Hart/Stork, Chapter 10, pp. 517-599.


UTSA has its own machine learning group.

General machine learning resources, online books on machine learning, and resources for specific areas of machine learning are available online.