CSCI-4390/6390: Data Mining, Fall 2012
Class Time: MR 10-11:50AM
Room: Greene 120
Instructor Office Hours: MR 12-1PM, Lally 307
TA: Nilothpal Talukder
TA Office Hours: W 2-4PM, Amos Eaton 119
A tentative sequence of topics to be covered in the classes; changes are likely as the course progresses.
|M: Aug 27||NO CLASS|
|R: Aug 30||NO CLASS|
|M: Sep 3||Labor Day Holiday|
|R: Sep 6||Data Mining and Analysis (DA): Algebraic and Probabilistic Views||Attach:chap1.pdf||Attach:dmintro.pptx, Attach:Lecture1.PDF|
|M: Sep 10||DA: Numeric Attributes||Attach:chap2.pdf||Attach:Lecture2.PDF|
|R: Sep 13||DA: Numeric Attributes: Eigen-decomposition|| ||Attach:Lecture3.PDF|
|M: Sep 17||DA: Dimensionality Reduction||Attach:chap7.pdf||Attach:Lecture4.PDF|
|R: Sep 20||DA: High Dimensional Analysis||Attach:chap6.pdf||Attach:Lecture5.PDF|
|M: Sep 24||DA: Categorical Data &||Attach:chap3.pdf||Attach:Lecture6.PDF|
|R: Sep 27||DA: Kernel Methods||Attach:chap5.pdf||Attach:Lecture7.PDF|
|M: Oct 1||DA: Kernels|| ||Attach:Lecture8.PDF|
|R: Oct 4||EXAM I|
|Tue: Oct 9||Classification (CLASS): Linear Discriminants, SVMs||Attach:chap22.pdf||Attach:Lecture9.PDF|
|R: Oct 11||CLASS: SVMs||Attach:chap23.pdf||Attach:Lecture10.PDF|
|M: Oct 15||CLASS: Bayesian Classifier, Decision Trees||Attach:chap21.pdf, Attach:chap19.pdf||Attach:Lecture11.PDF|
|R: Oct 18||CLASS: Classifier Evaluation||Attach:chap24.pdf||Attach:Lecture12.PDF|
|M: Oct 22||CLASS: Classifier Evaluation|| ||Attach:Lecture13.PDF|
|R: Oct 25||Clustering (CLUS): Partitional||Attach:chap13.pdf||Attach:Lecture14.PDF|
|M: Oct 29||NO CLASS|
|R: Nov 1||EXAM II|
|M: Nov 5||CLUS: EM-based|| ||Attach:Lecture15.PDF|
|R: Nov 8||CLUS: Hierarchical, Density-based Clustering||Attach:chap14.pdf, Attach:chap15.pdf||Attach:Lecture16.PDF|
|M: Nov 12||CLUS: Spectral & Graph Clustering||Attach:chap17.pdf||Attach:Lecture17.PDF|
|R: Nov 15||CLUS: Spectral & Graph Clustering|| ||Attach:Lecture18.PDF|
|M: Nov 19||CLUS: Evaluation & Assessment||Attach:chap18.pdf||Attach:Lecture19.PDF|
|R: Nov 22||Thanksgiving Break|
|M: Nov 26||Frequent Pattern Mining (FPM): Itemset Mining||Attach:chap8.pdf, Attach:chap9.pdf||Attach:Lecture20.PDF|
|R: Nov 29||FPM: Sequence Mining||Attach:chap10.pdf||Attach:Lecture21.PDF|
|M: Dec 3||FPM: Graph Mining||Attach:chap11.pdf||Attach:Lecture22.PDF|
|R: Dec 6||EXAM III|
Data mining is the process of automatic discovery of patterns, models, changes, associations and anomalies in massive databases. This course will provide an introduction to the main topics in data mining and knowledge discovery, including: algebraic and statistical foundations, pattern mining, classification, and clustering. Emphasis will be laid on the algorithmic approach.
After taking this course students will be
The prerequisites for this course include data structures and algorithms, and discrete mathematics. Linear algebra and probability & statistics are also essentially prerequisites, though an attempt will be made to review the basic concepts. Assignments will require the use of the Python language, with the NumPy package for numeric computations. You are expected to learn Python on your own via web tutorials, etc. Assignments must be submitted via email to .
Students will be given draft chapters from the forthcoming book
The following textbooks are also good references:
Your grade will be a combination of the following items.
You may consult other members of the class on the assignments, but you must submit your own work. For instance, you may discuss general approaches to solving a problem, but you must implement the solution on your own (similarity detection software may be used). Any time you borrow material from the web or elsewhere, you must acknowledge the source.
The school takes cases of academic dishonesty very seriously; any such case results in an automatic "F" grade for the course. Students should familiarize themselves with the relevant portion of the Rensselaer Handbook of Student Rights and Responsibilities on this topic.