CS 7641 & 4641 Machine Learning

Spring 2011

Charles Isbell, isbell@cc.gatech.edu
259, College of Computing Building, 385-6491

Required Text:

Machine Learning
by Tom Mitchell, McGraw Hill, 1997


General Information

Machine Learning is a three-credit course on, well, Machine Learning. Machine Learning is that area of Artificial Intelligence that is concerned with computer programs that modify and improve their performance through experience. The area is concerned with issues both theoretical and practical. This particular class is a part of a series of classes in the Intelligence thread and Intelligent Systems area, and as such takes care to present algorithms and approaches in such a way that grounds them in larger systems. We will cover a variety of topics, including: statistical supervised and unsupervised learning methods, randomized search algorithms, Bayesian learning methods, and reinforcement learning. The course also covers theoretical concepts such as inductive bias, the PAC and Mistake-bound learning frameworks, minimum description length principle, and Ockham's Razor. In order to ground these methods the course includes some programming and involvement in a semester-long research project.

Objectives

There are four primary objectives for the course:

  • To provide a broad survey of approaches and techniques in ML

  • To develop a deeper understanding of several major topics in ML

  • To develop the design and programming skills that will help you to build intelligent, adaptive artifacts

  • To develop the basic skills necessary to pursue research in ML

The last objective is the core one: you should develop enough background that you can pursue any desire you have to learn more about specific techniques in ML, either to pursue ML as a research career, or to apply ML techniques in other research areas in interesting (as opposed to uninteresting) ways. This is true for both the undergraduates and the graduates.

Prerequisites

The official prerequisite for this course is an introductory course in artificial intelligence. In particular, those of you with experience in general representational issues in AI, some AI programming, and at least some background (or barring that, willingness to pick up some background) in statistics and information theory should be fine. Any student who did well in an AI course like this one should be fine. You will note that the syllabus for that particular course suggests at least some tentative background in some machine learning techniques as well. Having said all that, the most important prerequisite for enjoying and doing well in this class is your interest in the material. I say that every semester and I know it sounds trite, but it's true. In the end it will be your own motivation to understand the material that gets you through it more than anything else. If you are not sure whether this class is for you, please talk to me.

Resources

  • Readings. The textbook for the course is Machine Learning by Tom Mitchell. We will follow the textbook quite closely for most of the semester, so it is imperative that you have a copy of the book. We will also use supplemental readings, but those will be provided for you.
  • Computing. You will have access to CoC clusters for your programming assignments. You are free to use whatever machines you want to do your work; however, the final result will have to run on the standard CoC boxes. Exactly what this means will be spelled out. This shouldn't be much of a restriction for you.
  • Web. We will use the class web page to post last minute announcements, so check it early and often.

Statement of Academic Honesty

At this point in your academic careers, I feel that it would be impolite to harp on cheating, so I won't. You are all adults, more or less, and are expected to follow the university's code of academic conduct (you know, the honor code). Furthermore, at least some of you are researchers-in-training, and I expect that you understand proper attribution and the importance of intellectual honesty.

For those of you whose grade depends upon a group project, I not only permit collaboration, I require it. Most importantly, each member of a group is expected to participate fully in the development and execution of that group's project. Your final project grade will depend not only on the group's performance but on evidence of your individual contribution.

Finally, unauthorized use of any previous semester course materials, such as tests, quizzes, homework, projects, and any other coursework, is prohibited in this course. In particular, you are not allowed to use old exams. Using these materials will be considered a direct violation of academic policy and will be dealt with according to the GT Academic Honor Code. Furthermore, I do not allow copies of my exams out in the ether (so there should not be any out there for you to use anyway). My policy on that is strict. If you violate the policy in any shape, form or fashion you will be dealt with according to the GT Academic Honor Code. I also have several... friends... from Texas who will help me personally deal with you.

Readings and Lectures

My research area is machine learning, and I'm deeply into the area. Given that and my enormous lung capacity, and my tendency to get distracted, it turns out that I can ramble on about the material for days on end; however, that rather misses the point.

Lectures are meant to summarize the readings and stress the important points. You are expected to come to class having already critically read any assigned material. Your active participation in class is crucial in making the course successful. I completely expect to be interrupted throughout a lecture with questions and maybe even your deep insights into the material. This is less about my teaching than about your learning. My role is merely to assist you in the process of learning more about the area.

Grading

Your final grade is divided into four components: assignments, a group project, a midterm and a final exam.

  • Assignments. There will be three or four graded assignments. They will be about programming and analysis. Generally, they are designed to give you deeper insight into the material and to prepare you for the exams. The programming will be in service of allowing you to run and discuss experiments, do analysis, and so on.

  • Midterm. There will be a written, closed-book midterm roughly halfway through the term. The exam will be in class.

  • Final Exam. There will be a written, closed-book final exam at whatever time is scheduled for our class' final exam.

  • Group Project.
    7641. There is a semester-long group project. The word around campus is that this is both a useful and fun exercise. At the end of the term, you will be required to produce a NIPS-style conference paper, and to give a short presentation. Along the way, your group will turn in a very short proposal and a somewhat longer progress report. The desiderata for the project will be discussed in class.
    4641. You will not be required to do the semester-long project; however, you will act as peer reviewers. This is a task you will take quite seriously. You will read at least two papers that are generated by the students in 7641 and critique them, providing pointers to missing related work, explanations of strengths and weaknesses, and other evaluation. Don't worry: this will be blind reviewing.

Due Dates

All graded assignments are due by the time and date indicated. I will not accept late assignments or give make-up exams. You will get zero credit for any late assignment. The only exceptions will require a note from an appropriate authority and immediate notification of the problem when it arises. Naturally, your excuse must be acceptable. If a meteor landed on your bed and destroyed your assignment, I need a signed note from the meteor. You should also treat assigned readings as, well, assignments that are due at the beginning of each class.

Numbers

Component | 4641 | 7641
Assignments | 40% | 40%
Midterm | 20% | 15%
Final | 25% | 20%
Project* | 15% | 25%

Although class participation is not explicitly graded, I will use your class participation to determine whether your grade can be lifted in case you are right on the edge of two grades. Participation means attending classes, participating in class discussions, asking relevant questions, volunteering to provide answers to questions, and providing constructive criticism and creative suggestions that improve the course.

* It is not possible to pass this class without doing a decent class project, even if you are only taking the class P/F. Anyone auditing the class should speak to me about what I require of auditors. The same is true for reviewers. If you don't turn in reviews that earn passing grades you cannot pass the class.
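As a rough illustration of how the weights in the table above combine, here is a minimal sketch. It assumes each component is scored on a 0-100 scale and that the final score is a plain weighted sum; the syllabus does not spell out either point, and the function name and example scores are hypothetical. It also ignores the participation adjustment and the pass requirement on the project or reviews.

```python
# Weights copied from the table above; everything else is an illustrative assumption.
WEIGHTS = {
    "4641": {"assignments": 0.40, "midterm": 0.20, "final": 0.25, "project": 0.15},
    "7641": {"assignments": 0.40, "midterm": 0.15, "final": 0.20, "project": 0.25},
}


def weighted_score(section, scores):
    """Combine component scores (each assumed to be on a 0-100 scale)."""
    weights = WEIGHTS[section]
    return sum(weights[name] * scores[name] for name in weights)


# Hypothetical scores, for illustration only.
example = {"assignments": 90, "midterm": 80, "final": 70, "project": 100}
print(round(weighted_score("4641", example), 2))
print(round(weighted_score("7641", example), 2))
```

The same set of scores gives a different total in each section, because 7641 shifts weight from the exams to the project.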

Disclaimer

I reserve the right to modify any of these plans as need be during the course of the class; however, I won't do anything capriciously, anything I do change won't be too drastic, and you'll be informed as far in advance as possible.


Schedule

Date | Topic | Reading | Due
1/13 Thr | Introduction and Overview | Chp. 1 |
1/18 Tue | Supervised Learning Review: Neural Networks & Decision Trees | Chp. 3-4 |
1/20 Thr | Group Project Formation | | Group Project: Team Formation (1/24 23:55)
1/25 Tue | Instance-based Learning; Boosting | Chp. 8; 1 2 |
1/27 Thr | Support Vector Machines | 1 2 3 | Group Project: Informal Proposal (1/31 23:55)
2/1 Tue | Bias, Variance (Jon, Peng) | |
2/3 Thr | Bayesian Learning - Bayes Theorem, Maximum Likelihood | Chp. 6 |
2/8 Tue | Bayesian Learning - Bayes Optimal Classifier | Chp. 6 |
2/10 Thr | Bayesian Learning - Naive Bayes | Chp. 6 | Assignment 1: Supervised Learning (2/12 23:55); Group Project: Formal Proposal (2/13 23:55)
2/15 Tue | Computational Learning Theory | Chp. 7 |
2/17 Thr | Computational Learning Theory | Chp. 7 |
2/22 Tue | Randomized Optimization - RHC, SA, GA (Jon); Information Theory, Entropy (Jon) | Chp. 9; 1 |
2/24 Thr | Randomized Optimization - MIMIC | 1 |
3/1 Tue | Midterm Exam | |
3/3 Thr | Midterm Exam Review | | Course Drop (3/4 16:00)
3/8 Tue | Clustering | 1 |
3/10 Thr | EM Algorithm & Impossibility results (clustering and NFL) | Chp. 6; 1 | Assignment 2: Randomized Optimization (3/13 23:55)
3/15 Tue | Canceled | |
3/17 Thr | Feature Selection and Feature Transformation | 1 2 |
3/22 Tue | Spring Break | |
3/24 Thr | Spring Break | |
3/29 Tue | PCA, ICA, Randomized Projections | 1 | Group Project: Progress Report (3/29 23:55)
3/31 Thr | Markov Decision Processes, POMDPs | 1 |
4/5 Tue | Reinforcement Learning | Chp. 13 |
4/7 Thr | Reinforcement Learning (Jon, Peng) | 1 2 | Assignment 3: Unsupervised Learning (4/9 23:55)
4/12 Tue | Game Theory | 1 | Group Project: Paper Submission (4/12 23:55)
4/14 Thr | Game Theory | 1 |
4/19 Tue | Game Theory | 1 2 |
4/21 Thr | Final Presentation | | Group Project: Final Project Material (4/24 23:55)
4/26 Tue | Final Presentation | |
4/28 Thr | Final Presentation | | Assignment 4: Markov Decision Processes (4/30 23:55)
5/3 Tue | Final Exam (14:50 - 17:40) | |
 | Addressing Overfitting, Model Selection | Chp. 5; 1 2 |