CSCI-4390/6390: Data Mining, Fall 2015
Class Time: TF 10-11:50AM
Room: Low 3051
Instructor Office Hours: TF 12-1PM, Lally 307
TA: Niu Xiang
TA Office Hours: MR 4-5PM, AE 119
TA Contact: ,
A tentative sequence of topics to be covered in the classes; changes are likely as the course progresses.
|T: Sep 1||Data Mining and Analysis: Intro||Chapter 1||Attach:intro.pptx|
|F: Sep 4||Algebraic and Probabilistic Views||Chapter 1||Attach:slides-chap1.pdf|
|T: Sep 8||Numeric Attributes & Eigen-decomposition||Chapter 2||Attach:slides-chap2.pdf|
|F: Sep 11||Eigen-decomposition||Chapters 2, 3||Attach:slides-chap2.pdf|
|T: Sep 15||Categorical Data, High-dimensional Data||Chapters 3, 6||Attach:slides-chap3.pdf, Attach:slides-chap6.pdf|
|F: Sep 18||Dimensionality Reduction, Classification: Linear Discriminants||Chapters 7, 20||Attach:slides-chap7.pdf, Attach:slides-chap20.pdf|
|T: Sep 22||LDA, SVD, Kernels||Chapters 20, 7, 5||Attach:slides-chap7.pdf, Attach:slides-chap20.pdf|
|F: Sep 25||Kernels, SVM||Chapters 5, 21||Attach:slides-chap5.pdf, Attach:slides-chap21.pdf|
|T: Sep 29||SVMs||Chapter 21||Attach:slides-chap21.pdf|
|F: Oct 2||Prof. Jiawei Han Lecture: CBIS Auditorium (9:45am-11am)|
|T: Oct 6||SVMs, Kernel PCA, Kernel LDA||Chapters 20, 21, 7||Attach:slides-chap21.pdf, Attach:slides-chap20.pdf, Attach:slides-chap7.pdf|
|F: Oct 9||EXAM I|
|T: Oct 13||NO CLASS (Mon Schedule)|
|F: Oct 16||Bayes Classifier, Decision Trees, Classification Evaluation|
|T: Oct 20||Logistic Regression|
|F: Oct 23||Clustering: Partitional & EM|
|T: Oct 27||Hierarchical, Density-based Clustering|
|F: Oct 30||Spectral & Graph Clustering|
|T: Nov 3||Spectral & Graph Clustering, Evaluation|
|F: Nov 6||Cluster Evaluation & Assessment|
|T: Nov 10||EXAM II|
|F: Nov 13||Frequent Pattern Mining: Itemset Mining|
|T: Nov 17||Pattern Summarization|
|F: Nov 20||Sequence Mining|
|T: Nov 24||Graph Mining|
|F: Nov 27||NO CLASS (Thanksgiving Break)|
|T: Dec 1||Pattern Assessment|
|F: Dec 4||Graph Analysis|
|T: Dec 8||Kernels & Graphs|
|F: Dec 11||EXAM III|
Data mining is the process of automatically discovering patterns, models, and anomalies in massive databases. This course provides an introduction to the main topics in data mining and knowledge discovery, including algebraic and statistical foundations, pattern mining, classification, regression, and clustering. The emphasis is on the algorithmic approach.
After taking this course students will be
The pre-requisites for this course include data structures and algorithms, and discrete mathematics. Linear algebra and probability & statistics are also pre-requisites, though an attempt will be made to review the basic concepts. Assignments will require the use of the Python language, with the NumPy package for numeric computations. You are expected to learn Python on your own via web tutorials, etc.
The main required textbook for the course is:
Readings from the textbook will be posted on the course schedule, and supplementary material will be provided where necessary.
Your grade will be a combination of the following items.
You may consult other members of the class on the assignments, but you must submit your own work. For instance, you may discuss general approaches to solving a problem, but you must implement the solution on your own (similarity detection software may be used). Any time you borrow material from the web or elsewhere, you must acknowledge the source.
The school takes cases of academic dishonesty very seriously; they result in an automatic "F" grade for the course. Students should familiarize themselves with the relevant portion of the Rensselaer Handbook of Student Rights and Responsibilities on this topic.