CSCI-4390/6390: Data Mining, Fall 2011


Class Time: MR 10-11:50AM
Room: Carnegie 113
Instructor Office Hours: 12-1PM, MR, Lally 307


TA: Amina Shabbeer
TA Office Hours: 4-5PM, TW, AE 304
TA Contact: shabba@rpi.edu




Announcements

  • Nov 70: Assignment 6 has been posted.
  • Nov 7: Assignment 5 has been posted.
  • Oct 25: updated chap8.pdf on PCA, kernel PCA and SVD.
  • Oct 24: Assignment 4 has been posted.
  • Oct 14: Assignment 3 has been posted.
  • Sep 25: Assignment 2 has been posted.
  • Sep 17: Assignment 1 has been posted.
  • Sep 14: Activate your Piazza account.
  • Sep 12: Book chapters and lecture notes are posted online after each lecture. Make sure to check the course website.
  • Aug 18: Course website is up, with the tentative calendar and syllabus.



Calendar & Lecture Notes

A tentative sequence of topics to be covered in the classes; changes are likely as the course progresses.

Day: Date Topic Chapters Lecture Notes
M: Aug 29 CLASSES CANCELLED
R: Sep 1 Data Mining Overview & Data Analysis Foundations (DA): Algebraic & Probabilistic Views chap1.pdf dmintro.pptx, lecture1.pdf
M: Sep 5 Labor Day Holiday
R: Sep 8 DA: Numeric Attributes chap2.pdf lecture2.pdf
M: Sep 12 NO CLASS NSF-RPI Workshop on Complex Data
R: Sep 15 DA: Numeric Attributes & Eigenvectors lecture3.pdf
M: Sep 19 DA: Categorical Data chap3.pdf lecture4.pdf
R: Sep 22 DA: Graph Data chap4.pdf lecture5.pdf
M: Sep 26 DA: Graph Models lecture6.pdf
R: Sep 29 DA: Kernel Methods chap5.pdf lecture7.pdf
M: Oct 3 DA: High Dimensional Analysis chap6.pdf lecture8.pdf
R: Oct 6 EXAM I
Tue: Oct 11 NO CLASS
R: Oct 13 DA: Dimensionality Reduction chap8.pdf lecture9.pdf
M: Oct 17 Frequent Pattern Mining (FPM): Itemset Mining chap10.pdf lecture10.pdf
R: Oct 20 FPM: Itemset Summaries & Sequence Mining chap11.pdf, chap12.pdf lecture11.pdf
M: Oct 24 FPM: Sequence Mining, Graph Mining chap13.pdf lecture12.pdf
R: Oct 27 FPM: Graph Mining, Classification (CLASS): Linear Discriminants chap27.pdf lecture13.pdf
M: Oct 31 CLASS: Linear Discriminants, Support Vector Machines (SVM) chap28.pdf lecture14.pdf
R: Nov 3 CLASS: SVMs lecture15.pdf
M: Nov 7 EXAM II
R: Nov 10 CLASS: Bayesian Classifier, Decision Trees chap26.pdf, chap24.pdf lecture16.pdf
M: Nov 14 Clustering (CLUS): Partitional chap16.pdf lecture17.pdf
R: Nov 17 CLUS: Hierarchical Clustering chap17.pdf lecture18.pdf
M: Nov 21 CLUS: Density-based Clustering chap18.pdf lecture19.pdf
R: Nov 24 Thanksgiving Break
M: Nov 28 CLUS: Subspace Clustering chap19.pdf lecture20.pdf
R: Dec 1 Spectral & Graph Clustering chap20.pdf lecture21.pdf
M: Dec 5 Evaluation & Assessment chap21.pdf lecture22.pdf
R: Dec 8 EXAM III




Syllabus

Introduction

Data mining is the process of automatic discovery of patterns, models, changes, associations and anomalies in massive databases. This course will provide an introduction to the main topics in data mining and knowledge discovery, including: algebraic and statistical foundations, pattern mining, classification, and clustering. Emphasis will be laid on the algorithmic approach.

Learning Objectives

After taking this course, students will be

  • able to describe the fundamental data mining tasks like pattern mining, classification and clustering
  • able to analyze the key algorithms for the main tasks
  • able to implement and apply the techniques to real-world datasets

Prerequisites

The prerequisites for this course are data structures and algorithms, and discrete mathematics. Linear algebra and probability & statistics are also effectively prerequisites, though an attempt will be made to review the basic concepts. Assignments will require the use of the Python language, with the NumPy package for numeric computations. You are expected to learn Python on your own via web tutorials, etc. Assignments must be submitted via email to .

Textbook

There is no required text for the course. Notes will be posted online on the course webpage.

The following textbooks are also good references:

  • Introduction to Data Mining, by Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, Addison Wesley, 2006.
  • Data Mining: Concepts and Techniques (2nd edition), by Jiawei Han and Micheline Kamber, Morgan Kaufmann, 2006.

Grading Policy

Your grade will be a combination of the following items.

  • Assignments (40%): The assignments are meant to be practically oriented. You'll be asked to implement some algorithms and apply them to real datasets, to complement the theory. There will be roughly one assignment every two weeks.
  • Exams (60%): There will be three exams covering the main topics of the course. The tentative exam schedule is posted on the class schedule table. There is no comprehensive final exam. All exams are open book.

Other Policies

  • Attendance: Students are strongly encouraged to participate in the class, and should try to attend all classes. Students are responsible for brushing up on any missed material.
  • Laptops: Absolutely no laptops will be allowed in class during lectures. The only exception is during exams, to access the class notes online and to use the calculator. Even during the exam, you may not use any other software (e.g., R, Python, MATLAB, etc.) for the computations, and you may not "browse" for solutions (you are not likely to find anything!).
  • Late Assignments: Most assignments will be due just before midnight on the due date. Students get an automatic one-day extension with a 20% penalty. No late assignments will be accepted after the midnight following the due date.

Academic Integrity

You may consult other members of the class on the assignments, but you must submit your own work. For instance you may discuss general approaches to solving a problem, but you must implement the solution on your own (similarity detection software may be used). Anytime you borrow material from the web or elsewhere, you must acknowledge the source.

The school takes cases of academic dishonesty very seriously; they result in an automatic "F" grade for the course. Students should familiarize themselves with the relevant portion of the Rensselaer Handbook of Student Rights and Responsibilities on this topic.
