CSCI-4390/6390: Data Mining, Fall 2009
Class: 10-11:50AM, MR, Low 3045
Instructor Office Hours: 12-1PM, MR
Announcements
Calendar & Lecture Notes/Videos
The following is a tentative sequence of topics to be covered in class; changes are likely as the course progresses.
| Day: Date | Topic | Chapters | Lecture Notes | Video |
|---|---|---|---|---|
| M: Aug 31 | Data Mining Overview | | | |
| R: Sep 3 | Exploratory Data Analysis (EDA): Numeric Attributes | | | Video |
| M: Sep 7 | Labor Day Holiday | | | |
| R: Sep 10 | EDA: Numeric & Categorical Attributes | | | Video |
| M: Sep 14 | Frequent Pattern Mining (FPM): Itemset Mining | | | Video |
| R: Sep 17 | Clustering (CLUS): Partitional (KMeans, EM) | | | Video |
| M: Sep 21 | Classification (CLASS): Decision Trees | | | Video |
| R: Sep 24 | EDA: High Dimensional Data | | | Video |
| M: Sep 28 | EDA: Dimensionality Reduction: PCA | | | Video |
| R: Oct 1 | EDA: Dimensionality Reduction: PCA/SVD | | | Video |
| M: Oct 5 | EXAM I | | | |
| R: Oct 8 | EDA: Linear Discriminant Analysis: LDA | | | Video |
| Tue: Oct 13 | FPM: Itemset Summaries | | | Video |
| R: Oct 15 | FPM: Sequence Mining | | | Video |
| M: Oct 19 | FPM: Sequence Mining, CLASS: Probabilistic | | | Video |
| R: Oct 22 | CLASS: Support Vector Machines (SVM) | | | Video |
| M: Oct 26 | CLASS: SVM contd. | | | Video |
| R: Oct 29 | CLASS: Kernel SVM, Rule-based | | | Video |
| M: Nov 2 | CLASS: Classifier Evaluation | | | Video |
| R: Nov 5 | EXAM II | | | |
| M: Nov 9 | CLUS: Hierarchical/Density-based Clustering | | | Video |
| R: Nov 12 | CLUS: Density-based Clustering (Kernel Density Estimation) | | | Video |
| M: Nov 16 | CLUS: Subspace Clustering | | | Video |
| R: Nov 19 | CLUS: Spectral Clustering | | | Video |
| M: Nov 23 | CLUS: Cluster Validity | | | |
| R: Nov 26 | Thanksgiving Break | | | |
| M: Nov 30 | CLASS: Kernel PCA/LDA | | | |
| R: Dec 3 | EXAM III | | | |
| M: Dec 7 | Social Network Analysis (SNA) | | | |
| R: Dec 10 | SNA: Graph Mining | | | |
Syllabus
Introduction
Data mining is the process of automatic discovery of patterns, models, changes, associations and anomalies in massive databases. This course provides an introduction to the main topics in data mining and knowledge discovery, including statistical foundations, pattern mining, classification, and clustering. The emphasis is on the algorithmic foundations.

Learning Objectives
After taking this course students will be

Prerequisites
The prerequisites for this course are data structures and algorithms, and discrete mathematics. The basics of linear algebra and of probability and statistics will also be very useful. Assignments will require the use of the R software, which students are expected to learn on their own. Assignments must be submitted online at the wiki site; knowledge of pmwiki markup is your responsibility.

Textbook
There is no required text for the course. Notes will be handed out in class. The following textbooks are also good references:

Grading Policy
Your grade will be a combination of the following items. Note that the final distribution is subject to some change depending on the number of assignments, but exams will count for at least 60%.
Attendance: Students are strongly encouraged to participate in class and should try to attend all classes.

Academic Integrity
You may consult other members of the class on the homework, but you must submit your own work. Any time you borrow material from the web or elsewhere, you must acknowledge the source. The school takes cases of academic dishonesty very seriously; they result in an automatic "F" grade for the course. Students should familiarize themselves with the relevant portion of the Rensselaer Handbook of Student Rights and Responsibilities on this topic.