CSCI-4390/6390: Data Mining, Fall 2015

Class Time: TF 10-11:50AM
Room: Low 3051
Instructor Office Hours: TF 12-1PM, Lally 307

TA: Niu Xiang
TA Office Hours: MR 4-5PM, AE 119
TA Contact: ,


  • Nov 25, Assign5 has been posted online with due date: 7th Dec, 2015.
  • Nov 16, Assign4 has been posted online with due date: 23rd Nov, 2015.
  • Oct 22, Assign3 has been posted online with due date: 30th Oct, 2015.
  • Sep 29, Assign2 has been posted online with due date: 6th Oct, 2015.
  • Sep 13, Assign1 has been posted online with due date: 21st Sep, 2015.
  • Sep 7: Email invitations for Piazza were sent. If you did not get one, please email me.
  • Aug 6: Course website is up, with the syllabus and tentative calendar. We will use the Piazza site for discussions and Q&A; an invitation to sign up on Piazza will be sent later.

Calendar & Lecture Notes

A tentative sequence of topics to be covered in the classes; changes are likely as the course progresses.

Day | Date | Topic | Readings | Lectures
T | Sep 1 | Data Mining and Analysis: Intro | Chapter 1 | Attach:intro.pptx
F | Sep 4 | Algebraic and Probabilistic Views | Chapter 1 | Attach:slides-chap1.pdf
T | Sep 8 | Numeric Attributes & Eigen-decomposition | Chapter 2 | Attach:slides-chap2.pdf
F | Sep 11 | Eigen-decomposition | Chapters 2, 3 | Attach:slides-chap2.pdf
T | Sep 15 | Categorical Data, High-dimensional Data | Chapters 3, 6 | Attach:slides-chap3.pdf, Attach:slides-chap6.pdf
F | Sep 18 | Dimensionality Reduction, Classification: Linear Discriminants | Chapters 7, 20 | Attach:slides-chap7.pdf, Attach:slides-chap20.pdf
T | Sep 22 | LDA, SVD, Kernels | Chapters 20, 7, 5 | Attach:slides-chap7.pdf, Attach:slides-chap20.pdf
F | Sep 25 | Kernels, SVM | Chapters 5, 21 | Attach:slides-chap5.pdf, Attach:slides-chap21.pdf
T | Sep 29 | SVMs | Chapter 21 | Attach:slides-chap21.pdf
F | Oct 2 | Prof. Jiawei Han Lecture: CBIS Auditorium (9:45am-11am) | |
T | Oct 6 | SVMs, Kernel PCA, Kernel LDA | Chapters 20, 21, 7 | Attach:slides-chap21.pdf, Attach:slides-chap20.pdf, Attach:slides-chap7.pdf
F | Oct 9 | EXAM I | |
T | Oct 13 | NO CLASS (Mon Schedule) | |
F | Oct 16 | Bayes Classifier, Decision Trees | Chapters 18, 19 | Attach:slides-chap18.pdf, Attach:slides-chap19.pdf
T | Oct 20 | Neural Networks | NN-chapter2-M.pdf | Attach:slides-NN.pdf
F | Oct 23 | Classification Evaluation | Chapter 22 | Attach:slides-chap22.pdf
T | Oct 27 | Regression | regression.pdf | Attach:slides-regression.pdf
F | Oct 30 | NO CLASS | |
T | Nov 3 | KMeans/EM Clustering | Chapter 13 | Attach:slides-chap13.pdf
F | Nov 6 | Hierarchical & Density-based Clustering | Chapters 14, 15 | Attach:slides-chap14.pdf, Attach:slides-chap15.pdf
T | Nov 10 | EXAM II | |
F | Nov 13 | Spectral & Graph Clustering | Chapter 16 | Attach:slides-chap16.pdf
T | Nov 17 | Cluster Evaluation | Chapter 17 | Attach:slides-chap17.pdf
F | Nov 20 | Frequent Pattern Mining: Itemset Mining | Chapters 8, 9 | Attach:slides-chap8.pdf, Attach:slides-chap9.pdf
T | Nov 24 | Graph Mining | Chapter 11 | Attach:slides-chap11.pdf
F | Nov 27 | NO CLASS (Thanksgiving Break) | |
T | Dec 1 | Pattern Assessment | |
F | Dec 4 | Graph Analysis | |
T | Dec 8 | Kernels & Graphs | |
F | Dec 11 | EXAM III | |



Data mining is the process of automatic discovery of patterns, models, and anomalies in massive databases. This course will provide an introduction to the main topics in data mining and knowledge discovery, including: algebraic and statistical foundations, pattern mining, classification, regression, and clustering. Emphasis will be laid on the algorithmic approach.

Learning Objectives

After taking this course students will be

  • able to describe the fundamental data mining tasks like pattern mining, classification, regression and clustering
  • able to analyze the key algorithms for the main tasks
  • able to implement and apply the techniques to real world datasets

The pre-requisites for this course include data structures and algorithms and discrete mathematics. Linear algebra and probability & statistics are also pre-requisites, though an attempt will be made to review the basic concepts. Assignments will require the use of the Python language, with the NumPy package for numeric computations. You are expected to learn Python on your own via web tutorials, etc.


The main required textbook for the course is:

Readings from the textbook will be posted on the course schedule, and supplementary material will be provided where necessary.

Grading Policy

Your grade will be a combination of the following items.

  • Exams (50%): There will be three exams covering the main topics of the course. The tentative exam schedule is posted on the class schedule table. There is no comprehensive final exam. All exams are open book.
  • Assignments & HW (30%): The assignments are meant to be practically oriented, though they may include other questions. You'll be asked to implement some algorithms and apply them to real datasets, to complement the theory. There will be roughly one assignment every two weeks (5-6 assignments in total).
  • Data Mining Challenge Problem (20%): We will endeavor to take part in a public data mining competition, e.g., those held by Kaggle. Details will be provided later. This will involve applying data mining methods to real-world challenge tasks, and then assessing the results in a blind evaluation. If this is not feasible for any reason, the 20% will be distributed equally between the other two categories (i.e., exams 60%, assignments 40%).
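
The weighting above can be sketched in Python as a quick sanity check. This is only an illustration: the function name `course_grade`, the 0-100 score scale, and the use of `None` to mean "no challenge problem was held" are assumptions, not part of the official policy.

```python
def course_grade(exams, assignments, challenge=None):
    """Combine component scores into a course grade.

    Assumption: every score is an average on a 0-100 scale.
    challenge=None models the fallback case where the challenge
    problem could not be held.
    """
    if challenge is None:
        # Fallback from the policy: the challenge's 20% is split
        # equally, so exams count 60% and assignments count 40%.
        return 0.60 * exams + 0.40 * assignments
    # Default weights: exams 50%, assignments 30%, challenge 20%.
    return 0.50 * exams + 0.30 * assignments + 0.20 * challenge
```

For example, averages of 80 on exams, 90 on assignments, and 70 on the challenge combine to 81; without the challenge, the same exam and assignment averages combine to 84.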

Other Policies

  • Attendance: Students are strongly encouraged to participate in the class, and should try to attend all classes. Students are responsible for any topics and assignments for the missed classes.
  • Laptops: Absolutely no laptops will be allowed in class during lectures or exams.
  • Late Assignments: Most assignments will be due just before midnight on the due date. Students can get an automatic one-day extension for a 20% grade penalty. No late assignments will be accepted after the midnight following the due date.
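
The late policy can be read as a small rule, sketched below in Python. Several details are assumptions made for illustration only: the function name `late_adjusted_score`, a 23:59:59 cutoff standing in for "just before midnight", the penalty being taken as 20% of the earned score, and an unaccepted submission counting as zero.

```python
from datetime import date, datetime, time, timedelta


def late_adjusted_score(raw_score, due_date, submitted_at):
    """Apply the late policy to one assignment score.

    Assumptions (not stated in the policy): the cutoff is 23:59:59
    on the due date, the penalty removes 20% of the earned score,
    and a submission that is not accepted scores 0.
    """
    deadline = datetime.combine(due_date, time(23, 59, 59))
    if submitted_at <= deadline:
        return float(raw_score)      # on time: no penalty
    if submitted_at <= deadline + timedelta(days=1):
        return raw_score * 0.8       # automatic one-day extension
    return 0.0                       # past the extension: not accepted
```

Under these assumptions, `late_adjusted_score(90, date(2015, 12, 7), datetime(2015, 12, 8, 10, 0))` applies the one-day penalty to an assignment due on Dec 7 and submitted the next morning.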

Academic Integrity

You may consult other members of the class on the assignments, but you must submit your own work. For instance, you may discuss general approaches to solving a problem, but you must implement the solution on your own (similarity detection software may be used). Any time you borrow material from the web or elsewhere, you must acknowledge the source.

Cases of academic dishonesty are taken very seriously by the school and result in an automatic "F" grade for the course. Students should familiarize themselves with the relevant portion of the Rensselaer Handbook of Student Rights and Responsibilities on this topic.

Page last modified on November 25, 2015, at 12:30 AM