CS 7641 & 4641
Machine Learning
Spring 2011
Charles Isbell,
isbell@cc.gatech.edu
259, College of Computing Building, 385-6491
Required Text:
Machine Learning
by Tom Mitchell, McGraw Hill, 1997
General Information
Machine Learning is a three-credit course on, well, Machine
Learning. Machine Learning is that area of Artificial Intelligence
that is concerned with computer programs that modify and improve their
performance through experience. The area is concerned with issues both
theoretical and practical. This particular class is a part of a
series of classes in the Intelligence thread and Intelligent Systems
area, and as such takes care to present algorithms and approaches in
such a way that grounds them in larger systems. We will cover a
variety of topics, including: statistical supervised and unsupervised
learning methods, randomized search algorithms, Bayesian learning
methods, and reinforcement learning. The course also covers
theoretical concepts such as inductive bias, the PAC and Mistake-bound
learning frameworks, minimum description length principle, and
Ockham's Razor. In order to ground these methods the course includes
some programming and involvement in a semester-long research project.
Objectives
There are four primary objectives for the course:
- To provide a broad survey of approaches and techniques in ML
- To develop a deeper understanding of several major topics in ML
- To develop the design and programming skills that will help you
to build intelligent, adaptive artifacts
- To develop the basic skills necessary to pursue research in ML
The last objective is the core one: you should develop enough
background that you can pursue any desire you have to learn more about
specific techniques in ML, either to pursue ML as a research career,
or to apply ML techniques in other research areas in interesting (as
opposed to uninteresting) ways. This is true for both the
undergraduates and the graduates.
Prerequisites
The official prerequisite for this course is an introductory course in
artificial intelligence. In particular, those of you with experience
in general representational issues in AI, some AI programming, and
at least some background (or barring that, willingness to pick up some
background) in statistics and information theory should be fine. Any
student who did well in an AI course like
this one should be fine. You will note that the syllabus for that
particular course suggests at least some tentative background in
some machine learning techniques as well.
Having said all that, the most important prerequisite for enjoying and
doing well in this class is your interest in the material. I say that
every semester and I know it sounds trite, but it's true. In the end
it will be your own motivation to understand the material that gets
you through it more than anything else.
If you are not sure whether this class is for you, please talk to me.
Resources
- Readings. The textbook for the course is
Machine Learning by Tom Mitchell. We will follow the textbook
quite closely for most of the semester, so it is imperative that you
have a copy of the book. We will also use supplemental readings,
but those will be provided for you.
- Computing. You will have access to CoC clusters for your
programming assignments. You are free to use whatever machines you
want to do your work; however, the final result will have to run on
the standard CoC boxes. Exactly what this means will be spelled
out. This shouldn't be much of a restriction for you.
- Web. We will use the class web page to post last-minute
announcements, so check it early and often.
Statement of Academic Honesty
At this point in your academic careers, I feel that it would be
impolite to harp on cheating, so I won't. You are all adults, more or
less, and are expected to follow the university's code of academic
conduct (you know, the honor
code). Furthermore, at least some of you are
researchers-in-training, and I expect that you understand proper
attribution and the importance of intellectual honesty.
For those of you whose grade depends upon a group project, I not only
permit collaboration, I require it. Most importantly, each member of
a group is expected to participate fully in the development and
execution of that group's project. Your final project grade will
depend not only on the group's performance but on evidence of your
individual contribution.
Finally, unauthorized use of any previous semester course materials, such as tests, quizzes, homework, projects, and any other coursework, is prohibited in this course. In particular, you are not allowed to use old exams. Using these materials will be considered a direct violation of academic policy and will be dealt with according to the GT Academic Honor Code. Furthermore, I do not allow copies of my exams out in the ether (so there should not be any out there for you to use anyway). My policy on that is strict. If you violate the policy in any shape, form or fashion you will be dealt with according to the GT Academic Honor Code. I also have several... friends... from Texas who will help me personally deal with you.
Readings and Lectures
My research area is machine learning, and I'm deeply into the
area. Given that, my enormous lung capacity, and my tendency to get
distracted, it turns out that I can ramble on about the material for
days on end; however, that rather misses the point.
Lectures are meant to summarize the readings and stress the important
points. You are expected to come to class having already critically
read any assigned material. Your active participation in class is
crucial in making the course successful. I completely expect to be
interrupted throughout a lecture with questions and maybe even your
deep insights into the material. This is less about my teaching than
about your learning. My role is merely to
assist you in the process of learning more about the area.
Grading
Your final grade is divided into four components: assignments, a group
project, a midterm and a final exam.
Assignments. There will be three or four graded
assignments. They will be about programming and analysis. Generally,
they are designed to give you deeper insight into the material and to
prepare you for the exams. The programming will be in service of
allowing you to run and discuss experiments, do analysis, and so on.
Midterm. There will be a written, closed-book midterm
roughly halfway through the term. The exam will be in class.
Final Exam. There will be a written, closed-book final
exam at whatever time is scheduled for our class' final exam.
Group Project.
7641. There is a semester-long group project.
The word around campus is that this is both a useful and fun exercise.
At the end of the term, you will be required to produce a NIPS-style
conference paper, and to give a short presentation. Along the way,
your group will turn in a very short proposal and a somewhat longer
progress report. The desiderata for the project will be discussed in class.
4641. You will not be required to do the semester-long
project; however, you will act as peer reviewers. This is a task you
will take quite seriously. You will read at least two papers that are
generated by the students in 7641 and critique them, providing
pointers to missing related work, explanations of strengths and
weaknesses, and other evaluation. Don't worry: this will be blind
reviewing.
Due Dates
All graded assignments are due by the time and date indicated. I will
not accept late assignments or make up exams. You will get zero credit
for any late assignment. The only exceptions will require: a
note from an appropriate authority and immediate
notification of the problem when it arises. Naturally, your excuse
must be acceptable. If a meteor landed on your bed and destroyed your
assignment, I need a signed note from the meteor. You should also
treat assigned readings as, well, assignments that are due at the
beginning of each class.
Numbers
Component | 4641 | 7641 |
Assignments | 40% | 40% |
Midterm | 20% | 15% |
Final | 25% | 20% |
Project* | 15% | 25% |
Although class participation is not explicitly graded, I will use your
class participation to determine whether your grade can be lifted in
case you are right on the edge of two grades. Participation means
attending classes, participating in class discussions, asking relevant
questions, volunteering to provide answers to questions, and providing
constructive criticism and creative suggestions that improve the
course.
* It is not possible to pass this class without doing a
decent class project, even if you are only taking the class P/F.
Anyone auditing the class should speak to me about what I require of
auditors. The same is true for reviewers. If you don't turn in
reviews that earn passing grades you cannot pass the class.
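To make the arithmetic behind the table concrete, here is a minimal sketch of how the weights combine into a single number. The weights are copied from the table above; the function name, the dictionary layout, and the sample scores are hypothetical and purely for illustration. The sketch assumes each component is scored from 0 to 100 and that the 4641 "Project" row stands for the peer reviews. It does not model the participation bump, the requirement to do a decent project (or passing reviews), or any mapping to letter grades, none of which the table specifies.

```python
# Component weights copied from the Numbers table (as fractions of the final grade).
WEIGHTS = {
    "4641": {"assignments": 0.40, "midterm": 0.20, "final": 0.25, "project": 0.15},
    "7641": {"assignments": 0.40, "midterm": 0.15, "final": 0.20, "project": 0.25},
}


def weighted_score(section, scores):
    """Return the weighted total (0-100) given per-component percentages."""
    weights = WEIGHTS[section]
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(weights[name] * scores[name] for name in weights)


# Hypothetical scores, for illustration only.
example = {"assignments": 90, "midterm": 80, "final": 85, "project": 95}
print(round(weighted_score("4641", example), 2))
print(round(weighted_score("7641", example), 2))
```

With the same component scores, the two sections can end up with different totals because the project and exam weights differ.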
Disclaimer
I reserve the right to modify any of these plans as need be during the
course of the class; however, I won't do anything capriciously,
anything I do change won't be too drastic, and you'll be informed as
far in advance as possible.
Schedule
Date | Day | Topic | Reading | Due |
1/13 | Thr | Introduction and Overview | Chp. 1 | |
1/18 | Tue | Supervised Learning Review: Neural Networks & Decision Trees | Chp. 3-4 | |
1/20 | Thr | Group Project Formation | | Group Project: Team Formation (1/24 23:55) |
1/25 | Tue | Instance-based Learning; Boosting | Chp. 8 1 2 | |
1/27 | Thr | Support Vector Machines | 1 2 3 | Group Project: Informal Proposal (1/31 23:55) |
2/1 | Tue | Bias, Variance (Jon, Peng) | | |
2/3 | Thr | Bayesian Learning - Bayes Theorem, Maximum Likelihood | Chp. 6 | |
2/8 | Tue | Bayesian Learning - Bayes Optimal Classifier | Chp. 6 | |
2/10 | Thr | Bayesian Learning - Naive Bayes | Chp. 6 | Assignment 1: Supervised Learning (2/12 23:55); Group Project: Formal Proposal (2/13 23:55) |
2/15 | Tue | Computational Learning Theory | Chp. 7 | |
2/17 | Thr | Computational Learning Theory | Chp. 7 | |
2/22 | Tue | Randomized Optimization - RHC, SA, GA (Jon); Information Theory, Entropy (Jon) | Chp. 9 1 | |
2/24 | Thr | Randomized Optimization - MIMIC | 1 | |
3/1 | Tue | Midterm Exam | | |
3/3 | Thr | Midterm Exam Review | | Course Drop (3/4 16:00) |
3/8 | Tue | Clustering | 1 | |
3/10 | Thr | EM Algorithm & Impossibility results (clustering and NFL) | Chp. 6 1 | Assignment 2: Randomized Optimization (3/13 23:55) |
3/15 | Tue | Canceled | | |
3/17 | Thr | Feature Selection and Feature Transformation | 1 2 | |
3/22 | Tue | Spring Break | | |
3/24 | Thr | Spring Break | | |
3/29 | Tue | PCA, ICA, Randomized Projections | 1 | Group Project: Progress Report (3/29 23:55) |
3/31 | Thr | Markov Decision Processes, POMDPs | 1 | |
4/5 | Tue | Reinforcement Learning | Chp. 13 | |
4/7 | Thr | Reinforcement Learning (Jon, Peng) | 1 2 | Assignment 3: Unsupervised Learning (4/9 23:55) |
4/12 | Tue | Game Theory | 1 | Group Project: Paper Submission (4/12 23:55) |
4/14 | Thr | Game Theory | 1 | |
4/19 | Tue | Game Theory | 1 2 | |
4/21 | Thr | Final Presentation | | Group Project: Final Project Material (4/24 23:55) |
4/26 | Tue | Final Presentation | | |
4/28 | Thr | Final Presentation | | Assignment 4: Markov Decision Processes (4/30 23:55) |
5/3 | Tue | Final Exam (14:50 - 17:40) | | |
 | | Addressing Overfitting, Model Selection | Chp. 5 1 2 | |