CS 6243: Machine Learning

Papers for Presentations

S. Cost and S. Salzberg (1993). A weighted nearest neighbor algorithm for learning with symbolic features. Machine Learning, 10(1), 57-78.
Presented by John Salinas 2/1/05.

T. J. Hastie and R. J. Tibshirani (1996). Discriminant adaptive nearest neighbor classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(6):607-616.

S. K. Murthy, S. Kasif, and S. Salzberg (1994). A system for induction of oblique decision trees. Journal of Artificial Intelligence Research, 2: 1-32.

J. J. Oliver (1993). Decision graphs - an extension of decision trees. In Proceedings of the Fourth International Workshop on Artificial Intelligence and Statistics, pp. 343-350. Extended version available as TR 173, Department of Computer Science, Monash University.
Presented by Mark Doderer 2/3/05.

N. Littlestone (1988). Learning quickly when irrelevant attributes abound: a new linear threshold algorithm. Machine Learning, 2:285-318.

J. L. Elman (1990). Finding structure in time. Cognitive Science, 14, 179-211.
Presented by Mark Robinson 2/10/05.

These two papers should be presented together.
M. Riedmiller and H. Braun (1993). A direct adaptive method for faster backpropagation learning: the RPROP algorithm. In Proc. IEEE Conference on Neural Networks, San Francisco.
S. Fahlman (1988). An Empirical Study of Learning Speeds in Backpropagation Networks. Technical Report CMU-CS-88-162, Carnegie Mellon University.

N. Littlestone and M. K. Warmuth (1994). The weighted majority algorithm. Information and Computation, 108(2):212-261.

D. Heckerman (1996). A Tutorial on Learning with Bayesian Networks. Technical Report MSR-TR-95-06, Microsoft Research.

P. Domingos and M. J. Pazzani (1997). On the optimality of the simple Bayesian classifier under zero-one loss. Machine Learning 29:103-130.

S. Muggleton and L. De Raedt (1994). Inductive logic programming: theory and methods. Journal of Logic Programming, 19,20:629-679.

W. W. Cohen (1995). Fast effective rule induction. In Proceedings of the Twelfth International Conference on Machine Learning, Lake Tahoe, California.
Presented by Amitava Karmaker 3/10/05.

J. C. Platt (1998). Fast training of support vector machines using sequential minimal optimization. In Advances in Kernel Methods - Support Vector Learning, B. Scholkopf, C. Burges, and A. Smola, eds., MIT Press.

These two papers should be presented together.
T. Joachims (1998). Text categorization with support vector machines: learning with many relevant features. Proceedings of the European Conference on Machine Learning, pp. 137-142, Springer.
T. Joachims (1999). Transductive inference for text classification using support vector machines. Proceedings of the International Conference on Machine Learning, pp. 200-209, Morgan Kaufmann.

D. H. Wolpert (1992). Stacked generalization. Neural Networks 5:241-259.

J. Friedman, T. Hastie, and R. Tibshirani (1998). Additive logistic regression: a statistical view of boosting. Dept. of Statistics, Stanford University Technical Report.

These papers should be presented together. Focus on one and compare to the other two.
D. Opitz and R. Maclin (1999). Popular ensemble methods: an empirical study. Journal of Artificial Intelligence Research 11:169-198.
E. Bauer and R. Kohavi (1999). An empirical comparison of voting classification algorithms: Bagging, Boosting, and variants. Machine Learning 36:105-139.
T. G. Dietterich (2000). An experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting, and randomization. Machine Learning 40:139-158.
To be presented by Giovanni Gonzalez.

A. Blumer, A. Ehrenfeucht, D. Haussler, and M. K. Warmuth (1987). Occam's razor. Information Processing Letters 24:377-380.

M. Kearns, R. Schapire, and L. Sellie (1994). Toward efficient agnostic learning. Machine Learning 17:115-142.

L. C. Baird (1995). Residual algorithms: reinforcement learning with function approximation. Proceedings of the Twelfth International Conference on Machine Learning, pp. 30-37.
To be presented by Qing Jiang.

W. W. Cohen and Y. Singer (1999). A simple, fast, and effective rule learner. In AAAI-99, Proceedings of the Sixteenth National Conference on Artificial Intelligence, pp. 335-342.

G. H. John, R. Kohavi, and K. Pfleger (1994). Irrelevant features and the subset selection problem. Proc. of the 11th International Conference on Machine Learning, pp. 121-129.

The second paper provides additional background for the first paper.
U. M. Fayyad and K. B. Irani (1993). Multi-interval discretization of continuous-valued attributes for classification learning. In Proc. International Joint Conference on Artificial Intelligence, pp. 1022-1027.
J. Dougherty, R. Kohavi, and M. Sahami (1995). Supervised and unsupervised discretization of continuous features. In Proc. International Conference on Machine Learning.

The second paper provides additional background for the first paper.
Y. Yang and X. Liu (1999). A re-examination of text categorization methods. Proc. ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 42-49.
S. E. Robertson and K. S. Jones (1997). Simple, Proven Approaches to Text Retrieval. Technical Report, Dept. of Information Science, Cambridge University.

K. A. De Jong, W. M. Spears, and D. F. Gordon (1993). Using genetic algorithms for concept learning. Machine Learning, 13:161-188.

J. R. Koza (1998). Genetic programming. In J. G. Williams and A. Kent (eds.), Encyclopedia of Computer Science and Technology, pp. 29-43, Marcel-Dekker.