Dmcourse.Main History
[l] FPM: Sequence Mining
[l] FPM: Sequence Mining [l] Attach:chap10.pdf [l] Attach:Lecture21.PDF
[l]
[l] CLUS: Spectral & Graph Clustering [l]
[l] CLUS: Evaluation & Assessment [l] Attach:chap18.pdf
[l] [l] Attach:Lecture18.PDF
[l] Frequent Pattern Mining (FPM): Itemset Mining
[l] CLUS: Evaluation & Assessment [l] Attach:chap18.pdf
[l] FPM: Sequence Mining
[l] Frequent Pattern Mining (FPM): Itemset Mining
[l] FPM: Graph Mining
[l] FPM: Sequence Mining
[l] FPM: Pattern Assessment
[l] FPM: Graph Mining
[l]
- Nov 8: Assign5 has been posted. It is due on 16th Nov, before midnight.
[l] CLUS: EM-based [l] [l] Attach:Lecture15.PDF
[row bgcolor=aliceblue] [l]R: Nov 8
[row bgcolor=aliceblue] [l]R: Nov 8 [l] CLUS: Subspace Clustering
[l] Clustering (CLUS): Hierarchical, Partitional
[l] Clustering (CLUS): Partitional [l] Attach:chap13.pdf
[l] CLUS: Density-based Clustering
[l] CLUS: Hierarchical, Density-based Clustering
- Oct 23: Assign4 has been posted. It is due on 30th Oct, before midnight.
[l] CLASS: Classifier Evaluation
[l] CLASS: Classifier Evaluation [l] Attach:Lecture13.PDF
- Oct 12: Assign3 has been posted. It is due on 19th Oct, before midnight.
[l]
[l] CLASS: SVMs, Bayesian Classifier
[l] CLASS: SVMs
[l] CLASS: Decision Trees
[l] CLASS: Bayesian Classifier, Decision Trees
[l] DA: Kernels, Classification (CLASS): Linear Discriminants [l] Attach:chap22.pdf [l]
[l] DA: Kernels [l] [l] Attach:Lecture8.PDF
[l] CLASS: SVMs
[l] Classification (CLASS): Linear Discriminants, SVMs [l] Attach:chap22.pdf [l]
[l] Classification (CLASS): Linear Discriminants & SVMs
[l] DA: Kernels, Classification (CLASS): Linear Discriminants
[l] CLASS: Bayesian Classifier
[l] CLASS: SVMs, Bayesian Classifier
[l] Classification (CLASS): Bayesian Classifier
[l] Classification (CLASS): Linear Discriminants & SVMs
[l] CLASS: Decision Trees
[l] CLASS: SVMs
[l] CLASS: Linear Discriminants & SVMs
[l] CLASS: Bayesian Classifier
[l] CLASS: SVMs
[l] CLASS: Decision Trees
- Sep 24: Assign2 has been posted. It is due on 1st Oct, before midnight.
[l] DA: Categorical Data & Kernel Methods [l] Attach:chap3.pdf, Attach:chap5.pdf
[l] DA: Categorical Data [l] Attach:chap3.pdf [l] Attach:Lecture6.PDF
[l] DA: Dimensionality Reduction
[l] DA: Kernel Methods [l] Attach:chap5.pdf
[l] DA: Categorical Data & High Dimensional Analysis [l] Attach:chap3.pdf, Attach:chap6.pdf
[l] DA: High Dimensional Analysis [l] Attach:chap6.pdf [l] Attach:Lecture5.PDF
[l] Kernel Methods
[l] DA: Categorical Data & Kernel Methods [l] Attach:chap3.pdf, Attach:chap5.pdf
[l] DA: Dimensionality Reduction & Categorical Data [l] Attach:chap3.pdf , Attach:chap7.pdf
[l] DA: Dimensionality Reduction [l] Attach:chap7.pdf
[l] DA: High Dimensional Analysis
[l] DA: Categorical Data & High Dimensional Analysis [l] Attach:chap3.pdf, Attach:chap6.pdf
- Sep 14: [[Dmcourse/Assign1 | Assign1]] has been posted. It is due on 21st Sep, before midnight.
- Sep 14: Assign1 has been posted. It is due on 21st Sep, before midnight.
- Sep 14: [[Dmcourse/Assign1 | Assign1]] has been posted. It is due on 21st Sep, before midnight.
[l] DA: Numeric and Categorical Attributes
[l] DA: Numeric Attributes
[l] DA: Numeric and Categorical Attributes [l] Attach:chap3.pdf
[l] DA: Numeric Attributes: Eigen-decomposition [l]
[l] DA: Kernel Approach and Graph Analysis
[l] DA: Dimensionality Reduction & Categorical Data [l] Attach:chap3.pdf
[l] DA: Dimensionality Reduction
[l] Kernel Methods
[l] Attach:chap1.pdf
[l] Attach:chap3.pdf
[l] Attach:chap2.pdf [l] Attach:Lecture2.PDF
[l] DA: Kernel Approach
[l] DA: Numeric and Categorical Attributes [l] Attach:chap3.pdf
[l] DA: Graph Analysis
[l] DA: Kernel Approach and Graph Analysis
- Sep 7: Everyone enrolled in the course should have already signed up for a Piazza account (or they should have received an email to do so). Please sign up immediately to receive class announcements and emails.
[!c]Lectures
[l] (Attach:)chap1.pdf, (Attach:)chap2.pdf
[l] (Attach:)chap3.pdf
[l] Attach:chap3.pdf
[l] (Attach:)chap1.pdf, (Attach:)chap2.pdf
[l] (Attach:)chap3.pdf
[l] CLASS: SVMs, FPM: Pattern Assessment
[l] CLASS: SVMs
[l] CLUS: Spectral & Graph Clustering
[l] CLUS: Evaluation & Assessment
[l] CLUS: Evaluation & Assessment
[l] Frequent Pattern Mining (FPM): Itemset Mining
[l] Frequent Pattern Mining (FPM): Itemset Mining
[l] FPM: Sequence Mining
[l] FPM: Sequence Mining
[l] FPM: Graph Mining
[l] FPM: Graph Mining
[l] FPM: Pattern Assessment
[l] DA: Numeric Attributes
[l] DA: Numeric and Categorical Attributes
[l] DA: Categorical Attributes
[l] DA: Kernel Approach
[l] DA: Kernel Approach
[l] DA: Graph Analysis
[l] DA: High Dimensional Analysis
[l] DA: High Dimensional Analysis
[l] DA: Dimensionality Reduction
[l] DA: Dimensionality Reduction
[l] Frequent Pattern Mining (FPM): Itemset Mining
[l] DA: Dimensionality Reduction
[l] FPM: Sequence Mining
[l] Classification (CLASS): Bayesian Classifier
[l] DA: Graph Analysis
[l] CLASS: Decision Trees
[l] DA: Graph Analysis & Mining
[l] CLASS: Linear Discriminants & SVMs
[l] FPM: Graph Mining
[l] CLASS: SVMs, FPM: Pattern Assessment
[l] FPM: Pattern Assessment
[l] CLASS: Classifier Evaluation
[l] Classification (CLASS): Bayesian Classifier
[l] CLASS: Classifier Evaluation
[l] CLASS: Decision Trees
[l] Clustering (CLUS): Hierarchical, Partitional
[l] CLASS: Linear Discriminants & SVMs
[l] CLUS: Density-based Clustering
[l] CLASS: SVMs
[l] CLUS: Subspace Clustering
[l] CLASS: Classifier Evaluation
[l] CLUS: Spectral & Graph Clustering
[l] Clustering (CLUS): Hierarchical, Partitional
[l] CLUS: Spectral & Graph Clustering
[l] CLUS: Density-based Clustering
[l] CLUS: Evaluation & Assessment
[l] CLUS: Spectral & Graph Clustering
[l] Frequent Pattern Mining (FPM): Itemset Mining
[l] CLUS: Subspace Clustering
[l] FPM: Sequence Mining
[l] CLUS: Evaluation & Assessment
[l] FPM: Graph Mining
TA Office Hours: TBA\\
TA Office Hours: W 2-4PM, Amos Eaton 119\\
TA Contact: email:talkun@rpi.edu
TA Contact:
TA Contact: emailto:talkun@rpi.edu
TA Contact: email:talkun@rpi.edu
Room: TBA\\
Room: Greene 120\\
TA: TBA\\
TA: Nilothpal Talukder\\
TA Contact: TBA
TA Contact: hidden-email:gnyhxa@ecv.rqh
[l] DA: Numeric Attributes
[l]DA: Numeric Attributes
[l] DA: Categorical Attributes
[l]Thanksgiving Break
[l]Thanksgiving Break
Instructor Office Hours: 12-1PM, MR, Lally 307\\
Instructor Office Hours: MR 12-1PM, Lally 307\\
There is no required text for the course. Notes will be posted online on the course webpage.
Students will be given draft chapters from the forthcoming book:
- Data Mining and Analysis: Foundations and Algorithms, Mohammed J. Zaki and Wagner Meira, Jr., Cambridge University Press, 2013.
- Attendance: Students are strongly encouraged to participate in the class, and should try to attend all classes. Students are responsible for brushing up on any missed material.
- Laptops: Absolutely no laptops will be allowed in class during lectures. The only exception is during exams, to access the class notes online and to use the calculator. Even during the exam, you may not use any other software (e.g., R, python, matlab, etc.) for the computations, and you may not "browse" for solutions (you are not likely to find anything!).
- Attendance: Students are strongly encouraged to participate in the class, and should try to attend all classes. Students are responsible for any topics and assignments for the missed classes.
- Laptops: Absolutely no laptops will be allowed in class during lectures. The only exception is during exams, to access the class notes online and to use the calculator functions. Even during the exam, you may not use any other software (e.g., R, python, matlab, etc.) for the computations.
[l] DA: Graph Analysis
[l] DA: Graph Analysis
[l] DA: Graph Analysis & Mining
[l] Data Mining and Analysis (DA): Introduction
[l] Data Mining and Analysis (DA): Algebraic and Probabilistic Views
[l]DA: Algebraic and Probabilistic Views
[l]DA: Numeric Attributes
[l] DA: Numeric Attributes
[row bgcolor=aliceblue] [l]R: Sep 20
[row bgcolor=aliceblue] [l]R: Sep 20 [l] DA: High Dimensional Analysis
[l] DA: High Dimensional Analysis
[row bgcolor=aliceblue] [l]R: Sep 27
[row bgcolor=aliceblue] [l]R: Sep 27 [l] Frequent Pattern Mining (FPM): Itemset Mining
[l] Frequent Pattern Mining (FPM): Itemset Mining
[l] FPM: Sequence Mining
[l] FPM: Sequence Mining
[l] DA: Graph Analysis
[l] DA: Graph Analysis
[l] FPM: Graph Mining
[l] FPM: Graph Mining
[l] FPM: Pattern Assessment
[l] FPM: Graph Mining
[l] Classification (CLASS): Bayesian Classifier
[l] FPM: Pattern Assessment
[l] CLASS: Decision Trees
[l] , Classification (CLASS): Linear Discriminants
[l] CLASS: Linear Discriminants & SVMs
[l] CLASS: Bayesian Classifier, Decision Trees
[row bgcolor=aliceblue] [l]R: Nov 15
[row bgcolor=aliceblue] [l]R: Nov 15 [l] Clustering (CLUS): Hierarchical, Partitional
[l] Clustering (CLUS): Hierarchical, Partitional
[l] CLUS: Density-based Clustering
[l] CLUS: Density-based Clustering
[l] CLUS: Spectral & Graph Clustering
[l] CLUS: Spectral & Graph Clustering
[l] CLUS: Subspace Clustering
[l]DA: Numeric Attributes
[l] Data Mining and Analysis (DA): Introduction
[l]DA: Numeric Attributes & Eigenvectors
[l]DA: Algebraic and Probabilistic Views
[l] DA: Categorical Data
[l] DA: Numeric Attributes
[l] DA: Graph Data
[l] DA: Kernel Approach
[l] DA: Graph Models
[l] DA: High Dimensional Analysis
[l] DA: Kernel Methods
[l] DA: Dimensionality Reduction
[l]DA: High Dimensional Analysis
[l] Frequent Pattern Mining (FPM): Itemset Mining
[l]DA: Dimensionality Reduction
[l] FPM: Sequence Mining
[l] Frequent Pattern Mining (FPM): Itemset Mining
[l] DA: Graph Analysis
[l] FPM: Itemset Summaries & Sequence Mining
[l] FPM: Graph Mining
[l] FPM: Sequence Mining, Graph Mining
[l] FPM: Graph Mining
[l] FPM: Graph Mining, Classification (CLASS): Linear Discriminants
[l] FPM: Pattern Assessment
[l] CLASS: SVMs
[l] , Classification (CLASS): Linear Discriminants
[l] CLASS: Bayesian Classifier, Decision Trees
[l] CLASS: SVMs
[l] Clustering (CLUS): Partitional
[l] CLASS: Bayesian Classifier, Decision Trees
[l] CLUS: Hierarchical Clustering
[l] CLASS: Classifier Evaluation
[l] CLUS: Density-based Clustering,
[l] Clustering (CLUS): Hierarchical, Partitional
[l] CLUS: Subspace Clustering
[l] CLUS: Density-based Clustering
[l] Spectral & Graph Clustering
[l] CLUS: Spectral & Graph Clustering
[l] Evaluation & Assessment
[l] CLUS: Evaluation & Assessment
[l] CLASS: Linear Discriminants, Support Vector Machines (SVM)
[l] NO CLASS
[l] CLASS: SVMs
[l] EXAM II
[l]EXAM II
[l] CLASS: SVMs
- Aug 6: Students in the class must sign up for the Piazza course discussion site. All discussions and Q&A will be carried out using Piazza.
[!c]Chapters [!c]Lecture Notes
[!c]Readings
[l] NO CLASS%
[l] NO CLASS
[l] chap2.pdf
[l] lecture2.pdf
[l] NO CLASS NSF-RPI Workshop on Complex Data
[l]
[l] lecture3.pdf
[l] chap3.pdf
[l] lecture4.pdf
[l] chap4.pdf
[l] lecture5.pdf
[l]
[l] lecture6.pdf
[l] chap5.pdf
[l] lecture7.pdf
[l] chap6.pdf
[l] lecture8.pdf
[l] NO CLASS
[l] chap8.pdf
[l] lecture9.pdf
[l] chap10.pdf
[l] lecture10.pdf
[l] chap11.pdf, chap12.pdf
[l] lecture11.pdf
[l] chap13.pdf
[l] lecture12.pdf
[l] chap27.pdf
[l] lecture13.pdf
[l] chap28.pdf
[l] lecture14.pdf
[l]
[l] lecture15.pdf
[l] chap26.pdf, chap24.pdf
[l] lecture16.pdf
[l] chap16.pdf
[l] lecture17.pdf
[l] chap17.pdf
[l] lecture18.pdf
[l] chap18.pdf
[l] lecture19.pdf
[l] chap19.pdf
[l] lecture20.pdf
[l] chap20.pdf
[l] lecture21.pdf
[l] chap21.pdf
[l] lecture22.pdf
- Nov 70: Assignment 6 has been posted.
- Nov 7: Assignment 5 has been posted.
- Oct 25: updated chap8.pdf on PCA, kernel PCA and SVD.
- Oct 24: Assignment 4 has been posted.
- Oct 14: Assignment 3 has been posted.
- Sep 25: Assignment 2 has been posted.
- Sep 17: Assignment 1 has been posted.
- Sep 14: Activate your Piazza account.
- Sep 12: Book chapters, as well as lectures, are posted online after each lecture. Make sure to check the course website.
- Aug 18: Course website is up, with the tentative calendar and syllabus.
- Aug 6: Course website is up, with the syllabus and tentative calendar.
[l]M: Aug 29 [l]CLASSES CANCELLED
[l]M: Aug 27 [l]NO CLASS
[l]R: Sep 1
[l] Data Mining Overview & Data Analysis Foundations (DA): Algebraic & Probabilistic Views
[l] chap1.pdf
[l] Attach:dmintro.pptx,lecture1.pdf
[l]R: Aug 30 [l] NO CLASS%
[l]M: Sep 5 [l]Labor Day Holiday
[l]M: Sep 3 [l]Labor Day Holiday
[l]R: Sep 8
[l]R: Sep 6
[l]M: Sep 12
[l]M: Sep 10
[l]R: Sep 15
[l]R: Sep 13
[l]M: Sep 19
[l]M: Sep 17
[l]R: Sep 22
[l]R: Sep 20
[l]M: Sep 26
[l]M: Sep 24
[l]R: Sep 29
[l]R: Sep 27
[l]M: Oct 3
[l]M: Oct 1
[l]R: Oct 6
[l]R: Oct 4
[l]Tue: Oct 11
[l]Tue: Oct 9
[l]R: Oct 13
[l]R: Oct 11
[l]M: Oct 17
[l]M: Oct 15
[l]R: Oct 20
[l]R: Oct 18
[l]M: Oct 24
[l]M: Oct 22
[l]R: Oct 27
[l]R: Oct 25
[l]M: Oct 31
[l]M: Oct 29
[l]R: Nov 3
[l]R: Nov 1
[l]M: Nov 7
[l]M: Nov 5
[l]R: Nov 10
[l]R: Nov 8
[l]M: Nov 14
[l]M: Nov 12
[l]R: Nov 17
[l]R: Nov 15
[l]M: Nov 21
[l]M: Nov 19
[l]R: Nov 24
[l]R: Nov 22
[l]M: Nov 28
[l]M: Nov 26
[l]R: Dec 1
[l]R: Nov 29
[l]M: Dec 5
[l]M: Dec 3
[l]R: Dec 8
[l]R: Dec 6
CSCI-4390/6390: Data Mining, Fall 2011
CSCI-4390/6390: Data Mining, Fall 2012
Room: Carnegie 113\\
Room: TBA\\
TA: Amina Shabbeer
TA Office Hours: 4-5PM, TW, AE 304
TA Contact: shabba@rpi.edu
TA: TBA
TA Office Hours: TBA
TA Contact: TBA
[l] CLUS: Subspace Clustering, Spectral & Graph Clustering
[l] chap19.pdf, chap20.pdf
[l] CLUS: Subspace Clustering
[l] chap19.pdf
[l] lecture20.pdf
[l] Evaluation & Assessment
[l] Spectral & Graph Clustering
[l] chap20.pdf
[l] CLUS: Density-based Clustering, Subspace Clustering [l] chap18.pdf, chap19.pdf
[l] CLUS: Density-based Clustering,
[l] chap18.pdf
[l] lecture19.pdf
[l] CLUS: Spectral & Graph Clustering
[l] chap20.pdf
[l] CLUS: Subspace Clustering, Spectral & Graph Clustering
[l] chap19.pdf, chap20.pdf
[l] CLUS: Density-based Clustering [l] chap18.pdf
[l] CLUS: Density-based Clustering, Subspace Clustering [l] chap18.pdf, chap19.pdf
[l] CLUS: Subspace Clustering
[row bgcolor=aliceblue] [l]R: Dec 1
[l] chap20.pdf
[row bgcolor=aliceblue]
[l]R: Dec 1
[l] Evaluation & Assessment
[l] CLUS: Graph Clustering
[l] Evaluation & Assessment
[l] CLASS: Bayesian Classifier
[l] CLASS: Bayesian Classifier, Decision Trees
[l] chap26.pdf, chap24.pdf
[l] FPM: Graph Mining
[l] FPM: Graph Mining, Classification (CLASS): Linear Discriminants
[l] lecture13.pdf
[l] chap27.pdf
[l] Classification (CLASS): Linear Discriminants, Support Vector Machines (SVM)
[l] chap27.pdf, chap28.pdf
[l] CLASS: Linear Discriminants, Support Vector Machines (SVM)
[l] chap28.pdf
- Oct 25: updated chap8.pdf on PCA, kernel PCA and SVD.
[l] lecture11.pdf
[l] FPM: Sequence Mining, Graph Mining
[row bgcolor=aliceblue] [l]R: Oct 27
[row bgcolor=aliceblue] [l]R: Oct 27 [l] Classification (CLASS): Linear Discriminants, Support Vector Machines (SVM)
[l] CLASS: Decision Trees
[l] Classification (CLASS): Linear Discriminants, Support Vector Machines (SVM)
[l] CLASS: Bayesian Classifier
[l] CLASS: SVMs & Decision Trees
[l] Clustering (CLUS): Partitional
[l] CLASS: Bayesian Classifier
[l] CLUS: Partitional
[l] Clustering (CLUS): Partitional
[l] FPM: Sequence Mining
[l] FPM: Itemset Summaries & Sequence Mining [l] chap10.pdf, chap11.pdf
[l]DA: Dimensionality Reduction, Frequent Pattern Mining (FPM): Itemset Mining [l] chap8.pdf, chap10.pdf
[l]DA: Dimensionality Reduction
[l] chap8.pdf
[l] lecture9.pdf
[l] FPM: Itemsets and Sequences
[l] Frequent Pattern Mining (FPM): Itemset Mining [l] chap10.pdf [l]
[l] Graph Mining
[l] FPM: Sequence Mining
[l] Classification (CLASS): Linear Discriminants, Support Vector Machines (SVM)CLASS: SVMs
[l] FPM: Graph Mining
[l] CLASS: SVMs
[l] Classification (CLASS): Linear Discriminants, Support Vector Machines (SVM)
[l] NO CLASS
[row bgcolor=aliceblue] [l]R: Oct 13
[row bgcolor=aliceblue] [l]R: Oct 13 [l] FPM: Sequence Mining
[l] FPM:Graph Mining
[l] FPM: Itemsets and Sequences
[l] Classification (CLASS): Linear Discriminants, Support Vector Machines (SVM)
[l] Graph Mining
[l] Classification (CLASS): Linear Discriminants, Support Vector Machines (SVM)CLASS: SVMs
[row bgcolor=aliceblue] [l]R: Oct 27
[row bgcolor=aliceblue] [l]R: Oct 27 [l] CLASS: Decision Trees
[l] CLASS: Decision Trees
[row bgcolor=aliceblue] [l]R: Nov 3
[row bgcolor=aliceblue] [l]R: Nov 3 [l] CLASS: Ensembles & Classifier Assessment
[l]DA: High Dimensional Analysis & Dimensionality Reduction (PCA/SVD)
[l]DA: High Dimensional Analysis
[l] chap6.pdf
[l] lecture8.pdf
[l]Frequent Pattern Mining (FPM): Itemset Mining
[l]DA: Dimensionality Reduction, Frequent Pattern Mining (FPM): Itemset Mining
[l] DA: Graph Models, Kernel Method
[l] DA: Graph Models
[l]
[l] lecture6.pdf
[l] DA: High Dimensional Analysis
[l] DA: Kernel Methods
[l]DA: Dimensionality Reduction (PCA/SVD)
[l]DA: High Dimensional Analysis & Dimensionality Reduction (PCA/SVD)
[l] DA: Graph Models
[l] DA: Graph Data
[l] chap4.pdf
[l] lecture5.pdf
[l] DA: Kernel Method
[l] DA: Graph Models, Kernel Method
- Sep 17: Assignment 1 has been posted.
[l] DA: Graph Data
[l] DA: Categorical Data
[l] chap3.pdf
[l] lecture4.pdf
[l]DA: Numeric & Categorical Attributes
[l]DA: Numeric Attributes & Eigenvectors
[l]
[l] lecture3.pdf
[l] Attach:chap1.pdf
[l] chap1.pdf
[l] Attach:chap2.pdf
[l] chap2.pdf
- Sep 12: Book chapters, as well as lectures, are posted online after each lecture. Make sure to check the course website.
[l]
[l] Attach:chap1.pdf
[l]
[l] Attach:chap2.pdf
TA Office Hours: 4-5PM, TW, AE 217\\
TA Office Hours: 4-5PM, TW, AE 304\\
[l]DA: Numeric & Categorical Attributes
[l]DA: Numeric Attributes
[l]
[l] lecture2.pdf
TA: TBA
TA Office Hours: TBA
TA Contact: TBA
TA: Amina Shabbeer
TA Office Hours: 4-5PM, TW, AE 217
TA Contact: shabba@rpi.edu
[l]Data Mining Overview & Data Analysis Foundations (DA)
[l]CLASSES CANCELLED
[l] DA: Algebraic & Probabilistic Views
[l] Data Mining Overview & Data Analysis Foundations (DA): Algebraic & Probabilistic Views
You are expected to learn python on your own via web tutorials, etc.
You are expected to learn python on your own via web tutorials, etc. Assignments must be submitted via email to .
[l]EDA: Dimensionality Reduction (PCA/SVD)
[l]DA: Dimensionality Reduction (PCA/SVD)
[l]EDA: Frequent Pattern Mining (FPM): Itemset Mining
[l]Frequent Pattern Mining (FPM): Itemset Mining
[l] NO CLASS NSF-RPI Workshop
[l] NO CLASS NSF-RPI Workshop on Complex Data
You are expected to learn python on your own via web tutorials, etc.
The school takes cases of academic dishonesty very seriously, resulting in an automatic "F" grade for the course. Students should familiarize themselves with the relevant portion of the Rensselaer Handbook of Student Rights and Responsibilities on this topic.
The school takes cases of academic dishonesty very seriously, resulting in an automatic "F" grade for the course. Students should familiarize themselves with the relevant portion of the Rensselaer Handbook of Student Rights and Responsibilities on this topic.
Your grade will be a combination of the following items. Note that the final distribution is subject to some change depending on the number of assignments, but exams will be at least 60%.
- Assignments (40%): The assignments are meant to be practically oriented. You'll be asked to run some mining methods on some real datasets, or to implement some algorithms, to complement the theory. There will be roughly one assignment per week, to be submitted via the course wiki site. User accounts will be created after the first day of class.
Your grade will be a combination of the following items.
- Assignments (40%): The assignments are meant to be practically oriented. You'll be asked to implement some algorithms and apply them to real datasets, to complement the theory. There will be roughly one assignment every two weeks.
- Attendance: Students are strongly encouraged to participate in the class, and should try to attend all classes. Students are entirely responsible for brushing up on any missed material.
- Laptops: Absolutely no laptops will be allowed in class during lectures. The only exception is during exams, to access the class notes online and to use the calculator. Even during the exam, you may not use any other software (e.g., R, python, etc.) for the computations, and you may not "browse" for solutions (you are not likely to find anything!).
- Attendance: Students are strongly encouraged to participate in the class, and should try to attend all classes. Students are responsible for brushing up on any missed material.
- Laptops: Absolutely no laptops will be allowed in class during lectures. The only exception is during exams, to access the class notes online and to use the calculator. Even during the exam, you may not use any other software (e.g., R, python, matlab, etc.) for the computations, and you may not "browse" for solutions (you are not likely to find anything!).
Data mining is the process of automatic discovery of patterns, models, changes, associations and anomalies in massive databases. This course will provide an introduction to the main topics in data mining and knowledge discovery, including: statistical foundations, pattern mining, classification, and clustering. Emphasis will be laid on the algorithmic foundations.
Data mining is the process of automatic discovery of patterns, models, changes, associations and anomalies in massive databases. This course will provide an introduction to the main topics in data mining and knowledge discovery, including: algebraic and statistical foundations, pattern mining, classification, and clustering. Emphasis will be laid on the algorithmic approach.
The pre-requisites for this course include data structures and algorithms and discrete mathematics. Linear algebra and probability & statistics are also essentially pre-requisites, though an attempt will be made to review the basic concepts. Assignments will require the use of the R software. Students are expected to learn R on their own. Assignments must be submitted online at the wiki site. Knowledge of pmwiki markup usage will be your responsibility.
The pre-requisites for this course include data structures and algorithms and discrete mathematics. Linear algebra and probability & statistics are also essentially pre-requisites, though an attempt will be made to review the basic concepts. Assignments will require the use of the python language, with NumPy package for numeric computations.
[l] Classification (CLASS): Linear Discriminants
[l] Classification (CLASS): Linear Discriminants, Support Vector Machines (SVM)
[l] CLASS: Support Vector Machines (SVM)
[row bgcolor=aliceblue] [l]R: Oct 27
[row bgcolor=aliceblue] [l]R: Oct 27 [l] CLASS: Decision Trees
[l] CLASS: Decision Trees
[row bgcolor=aliceblue] [l]R: Nov 3
[row bgcolor=aliceblue] [l]R: Nov 3 [l] CLASS: Ensembles & Classifier Assessment
[l] CLASS: Ensembles & Classifier Assessment
[l] Clustering (CLUS): Partitional
[l] Clustering (CLUS): Partitional
[l] CLUS: Partitional
[l]DA: Numeric Attributes
[l]DA: Numeric & Categorical Attributes
[l]DA: Categorical Attributes
[l]DA: Numeric & Categorical Attributes
[l] DA: Graph Models
[row bgcolor=aliceblue] [l]R: Sep 29
[row bgcolor=aliceblue] [l]R: Sep 29 [l] DA: High Dimensional Analysis
[l]EDA: High Dimensional Analysis
[l]EDA: Dimensionality Reduction (PCA/SVD)
[l]EDA: Dimensionality Reduction (PCA/SVD)
[l]EDA: Frequent Pattern Mining (FPM): Itemset Mining
[l]Frequent Pattern Mining (FPM): Itemset Mining
[l] FPM: Sequence Mining
[l]FPM: Sequence Mining
[l] FPM:Graph Mining
[l]FPM:Graph Mining
[l] Classification (CLASS): Linear Discriminants
[l] Classification (CLASS): Linear Discriminant Analysis (LDA)
[l] CLASS: Support Vector Machines (SVM)
[l] CLASS: Support Vector Machines
[l] CLASS: SVMs
[l] CLASS: SVMs
[l] CLASS: Decision Trees
[l]CLASS: Bayesian Classifier
[l] CLASS: Bayesian Classifier
[l] CLASS: Decision Trees & Classifier Assessment
[l] CLASS: Ensembles & Classifier Assessment
[l]Data Mining Overview
[l]Data Mining Overview & Data Analysis Foundations (DA)
[l]Exploratory Data Analysis (EDA): Data Matrix
[l] DA: Algebraic & Probabilistic Views
[l]EDA: Numeric Attributes
[l]DA: Numeric Attributes
[l]Categorical Attributes
[l]DA: Categorical Attributes
[l]Categorical Attributes
[l] DA: Graph Data
[l] EDA: Graph Data Analysis
[l] DA: Graph Models
[l] EDA: Web Centralities
[l] DA: Graph Models
[l]EDA: Graph Models
[l] DA: Kernel Method
[l]EDA: High Dimensional Analysis & Dimensionality Reduction (PCA/SVD)
[l]EDA: Dimensionality Reduction (PCA/SVD)
[l]Classification (CLASS): Linear Discriminant Analysis (LDA)
[l] Classification (CLASS): Linear Discriminant Analysis (LDA)
[l]CLASS: Decision Trees & Classifier Assessment
[l] CLASS: Decision Trees & Classifier Assessment
[l]Clustering (CLUS): Partitional
[l] Clustering (CLUS): Partitional
[l]CLUS: Hierarchical Clustering
[l] CLUS: Hierarchical Clustering
[l]CLUS: Density-based Clustering
[l] CLUS: Density-based Clustering
[l]CLUS: Subspace Clustering
[l] CLUS: Subspace Clustering
[l]CLUS: Spectral Clustering
[l] CLUS: Spectral & Graph Clustering
[l]CLUS: Kernel K-means
[l] CLUS: Graph Clustering
[l] NO CLASS NSF-RPI Workshop%
[l] NO CLASS NSF-RPI Workshop
[l]EDA: Numeric Attributes
[l] NO CLASS NSF-RPI Workshop%
[l]M: Sep 6
[l]M: Sep 5
[l]R: Sep 9
[l]R: Sep 8
[l]M: Sep 13
[l]M: Sep 12
[l]R: Sep 16
[l]R: Sep 15
[l]M: Sep 20
[l]M: Sep 19
[l]R: Sep 23
[l]R: Sep 22
[l]M: Sep 27
[l]M: Sep 26
[l]R: Sep 30
[l]R: Sep 29
[l]M: Oct 4
[l]M: Oct 3
[l]R: Oct 7
[l]R: Oct 6
[l]Tue: Oct 12
[l]Tue: Oct 11
[l]R: Oct 14
[l]R: Oct 13
[l]M: Oct 18
[l]M: Oct 17
[l]R: Oct 21
[l]R: Oct 20
[l]M: Oct 25
[l]M: Oct 24
[l]R: Oct 28
[l]R: Oct 27
[l]M: Nov 1
[l]M: Oct 31
[l]R: Nov 4
[l]R: Nov 3
[l]M: Nov 8
[l]M: Nov 7
[l]R: Nov 11
[l]R: Nov 10
[l]M: Nov 15
[l]M: Nov 14
[l]R: Nov 18
[l]R: Nov 17
[l]M: Nov 22
[l]M: Nov 21
[l]R: Nov 25
[l]R: Nov 24
[l]M: Nov 29
[l]M: Nov 28
[l]R: Dec 2
[l]R: Dec 1
[l]M: Dec 6
[l]M: Dec 5
[l]R: Dec 9
[l]R: Dec 8
[l]M: Aug 30
[l]M: Aug 29
[l]R: Sep 2
[l]R: Sep 1
[l] chap1.pdf
[l] intro.pptx, Lecture1.pdf
[l] [l]Lecture2.pdf
[l] chap2.pdf [l] Lecture3.pdf
[l] [l] Lecture4.pdf
[l] chap3.pdf [l] Lecture5.pdf
[l] [l] Lecture6.pdf
[l] Chap5.pdf
[l] Lecture7.pdf
[l] [l]Lecture8.pdf
[l] [l]Lecture9.pdf
[l] chap7.pdf [l]Lecture10.pdf
[l]chap9.pdf [l]Lecture11.pdf
[l] chap11.pdf [l]Lecture12.pdf
[l] chap13.pdf [l]Lecture13.pdf
[l] chap14.pdf [l]Lecture14.pdf
[l] chap29.pdf
[l]Lecture15.pdf
[l] chap30.pdf
[l]Lecture16.pdf
[l] [l] Lecture17.pdf
[l] chap28.pdf
[l] Lecture18.pdf
[l] chap26.pdf
[l] Lecture19.pdf
[l] chap17.pdf [l] Lecture20.pdf
[l] chap18.pdf [l]
[l] chap20.pdf
[l]
[l] chap21.pdf [l] Lecture21.pdf
[l] chap22.pdf [l] Lecture22.pdf
[l] chap6.pdf, and chap17, sec 17.3
[l] Lecture23.pdf
- Nov 14: Assign5 posted.
- Oct 29: Assign4 posted.
- Oct 17: Assign3 posted.
- Oct 5: updated chap5.pdf is online
- Sep 26: Assign2 posted.
- Sep 26: Check the chapter notes often for updates. Usually there is a date printed on top to indicate if there is a new version.
- Sep 21: Pranay will hold TA hours on the 22nd (Wed) between 12-1:45pm; he will not hold hours on Friday (23rd).
- Sep 15: Assign1 posted. You may also want to check out the Pmwiki guidelines and the Quick R Tutorial.
- Sep 3: Accounts for the Assignment page were mailed out. Contact me if you did not get that.
- Sep 1: First three chapters now posted online.
- Aug 2: Course website is up, with the tentative calendar and syllabus.
- Aug 18: Course website is up, with the tentative calendar and syllabus.
Lecture notes and videos from last year (Fall09) are available here.
CSCI-4390/6390: Data Mining, Fall 2010
CSCI-4390/6390: Data Mining, Fall 2011
TA: Pranay Anchuri
TA Office Hours: 12:00-1:50PM TF
TA Contact: AE106, x2857,
TA: TBA
TA Office Hours: TBA
TA Contact: TBA
[l]CLUS: Graph Clustering
[l]CLUS: Kernel K-means
[l] chap6.pdf, and chap17, sec 17.3
[l] Lecture23.pdf
[l] chap18.pdf [l]
[l] chap20.pdf
[l]
[l] chap21.pdf [l]
[l]
[l] chap28.pdf
[l]CLASS: Clustering (CLUS): Partitional (KMeans, EM)
[l]CLASS: Decision Trees & Classifier Assessment [l] [l] Lecture19.pdf
[l]CLASS: Decision Trees & Naive Bayes
[l]CLASS: Bayesian Classifier [l] [l] Lecture18.pdf
[l] CLASS: SVM + Decision Trees
[l] CLASS: SVMs [l] [l] Lecture17.pdf
[l]CLASS: Probabilistic Method
[l]CLASS: Decision Trees & Naive Bayes
[l] CLASS: SVM
[l] CLASS: Support Vector Machines
[l] chap30.pdf
[l]Lecture16.pdf
[l]Classification (CLASS): Decision Trees
[l]Classification (CLASS): Linear Discriminant Analysis (LDA)
[l] chap29.pdf
[l]Lecture15.pdf
[l] CLASS: Probabilistic Methods
[l] CLASS: SVM
[l] CLASS: Linear Discriminant Analysis (LDA)
[l] CLASS: SVM + Decision Trees
[l]CLASS: Support Vector Machines (SVM)
[l]CLASS: Probabilistic Method
[l]CLASS: Kernel SVMs
[l]CLASS: Clustering (CLUS): Partitional (KMeans, EM)
[l]Clustering (CLUS): Partitional (KMeans, EM)
[l]Clustering (CLUS): Partitional
[l]
[l] chap7.pdf
[l]chap9.pdf [l]Lecture11.pdf
[l] chap11.pdf
[l]Lecture9.pdf
[l]EDA: High Dimensional Analysis & Dimensionality Reduction (PCA/SVD)
[l]EDA: High Dimensional Analysis [l] [l]Lecture10.pdf
[l]EDA: High Dimensional Analysis & Dimensionality Reduction (PCA/SVD)
[row bgcolor=aliceblue] [l]R: Oct 14
[row bgcolor=aliceblue] [l]R: Oct 14 [l]FPM: Sequence Mining
[l]FPM: Sequence Mining
[row bgcolor=aliceblue] [l]R: Oct 21
[row bgcolor=aliceblue] [l]R: Oct 21 [l]Classification (CLASS): Decision Trees
[l]Classification (CLASS): Decision Trees
[row bgcolor=aliceblue] [l]R: Oct 28
[row bgcolor=aliceblue] [l]R: Oct 28 [l] CLASS: Linear Discriminant Analysis (LDA)
[l] CLASS: Linear Discriminant Analysis (LDA)
[row bgcolor=aliceblue] [l]R: Nov 4
[row bgcolor=aliceblue] [l]R: Nov 4 [l]CLASS: Kernel SVMs
[l]Clustering (CLUS): Partitional (KMeans, EM)
[l]CLASS: Kernel SVMs
[l]Clustering (CLUS): Partitional (KMeans, EM)
[row bgcolor=aliceblue] [l]R: Nov 18
[row bgcolor=aliceblue] [l]R: Nov 18 [l]CLUS: Density-based Clustering
[l]CLUS: Subspace Clustering
[l]CLUS: Density-based Clustering
[l]CLUS: Subspace Clustering
[row bgcolor=aliceblue] [l]R: Dec 2
[row bgcolor=aliceblue] [l]R: Dec 2 [l]CLUS: Graph Clustering
[l]Cluster Evaluation
[l]CLUS: Graph Clustering
[l]EDA: Graph Models & High Dimensional Data
[l]EDA: Graph Models
[l]EDA: Dimensionality Reduction (PCA/SVD)
[l]EDA: High Dimensional Analysis & Dimensionality Reduction (PCA/SVD)
[l] EDA: Graph Models
[l] EDA: Web Centralities [l] Lecture7.pdf
[l]EDA: High Dimensional Data
[l]EDA: Graph Models & High Dimensional Data
- Sep 26: Check the chapter notes often for updates. Usually there is a date printed on top to indicate if there is a new version.
Instructor Office Hours: 12-1PM, MR
Instructor Office Hours: 12-1PM, MR, Lally 307
[l] [l] Lecture7.pdf
[l]: EDA: Graph Models
[l] EDA: Graph Models
[l] EDA: Graph Data Analysis Graph Models
[l] EDA: Graph Data Analysis
[l]: EDA: Graph Models
[row bgcolor=aliceblue] [l]R: Sep 30
[row bgcolor=aliceblue] [l]R: Sep 30 [l]EDA: Dimensionality Reduction (PCA/SVD)
[l]EDA: Dimensionality Reduction (PCA/SVD)
[row bgcolor=aliceblue] [l]R: Oct 7
[row bgcolor=aliceblue] [l]R: Oct 7 [l]Frequent Pattern Mining (FPM): Itemset Mining
[l]Frequent Pattern Mining (FPM): Itemset Mining
[row bgcolor=aliceblue] [l]R: Oct 14
[row bgcolor=aliceblue] [l]R: Oct 14 [l]FPM:Graph Mining
[l]FPM:Graph Mining
[row bgcolor=aliceblue] [l]R: Oct 21
[row bgcolor=aliceblue] [l]R: Oct 21 [l] CLASS: Probabilistic Methods
[l] CLASS: Probabilistic Methods
[row bgcolor=aliceblue] [l]R: Oct 28
[row bgcolor=aliceblue] [l]R: Oct 28 [l]CLASS: Support Vector Machines (SVM)
[l]CLASS: Support Vector Machines (SVM)
[row bgcolor=aliceblue] [l]R: Nov 4
[row bgcolor=aliceblue] [l]R: Nov 4 [l]EXAM II
[l]CLASS: Graph Classification
[l]EXAM II
- Sep 21: Pranay will hold TA hours on the 22nd (Wed) between 12-1:45pm; he will not hold hours on Friday (23rd).
[l]Categorical Attributes [l] [l] Lecture6.pdf
[row bgcolor=aliceblue] [l]R: Sep 23
[row bgcolor=aliceblue] [l]R: Sep 23 [l]EDA: High Dimensional Data
[l]EDA: Dimensionality Reduction (PCA)
[l]EDA: High Dimensional Data
[l]EDA: Dimensionality Reduction (SVD)
[l]EDA: Dimensionality Reduction (PCA/SVD)
[l]EDA: Categorical Attributes [l] chap3.pdf
[l]EDA: Numeric Attributes [l] [l] Lecture4.pdf
[l]EDA: Graph Data Analysis Graph Models
[l]Categorical Attributes
chap3.pdf
[l] EDA: Graph Models
[l] EDA: Graph Data Analysis Graph Models
[l] intro.pptx, Lec1.pdf
[l] intro.pptx, Lecture1.pdf
[l]Lec2.pdf
[l]Lecture2.pdf
[l] Intro.pptx, Lec1.pdf
[l] intro.pptx, Lec1.pdf
[l]Lec2.pdf
[l]Lec2.pdf
[l] chap2.pdf
[l] chap2.pdf
[l] chap3.pdf
[l] chap3.pdf
[l]Exploratory Data Analysis (EDA): Numeric Attributes [l] chap2.pdf
[l]Exploratory Data Analysis (EDA): Data Matrix [l] Lec2.pdf
[l]EDA: Categorical Attributes [l] chap3.pdf
[l]EDA: Numeric Attributes [l] chap2.pdf
[l]EDA: Graph Data Analysis
[l]EDA: Categorical Attributes [l] chap3.pdf
[l]EDA: Graph Models
[l]EDA: Graph Data Analysis Graph Models
[l]Frequent Pattern Mining (FPM): Itemset Mining
[l] EDA: Graph Models
[l]Clustering (CLUS): Partitional (KMeans, EM)
[l]EDA: High Dimensional Data
[l]Classification (CLASS): Decision Trees
[l]EDA: Dimensionality Reduction (PCA)
[l]EDA: High Dimensional Data
[l]EDA: Dimensionality Reduction (SVD)
[l]EDA: Dimensionality Reduction (PCA)
[l]Frequent Pattern Mining (FPM): Itemset Mining
[l]FPM: Pattern Significance
[l]Classification (CLASS): Decision Trees
[l]CLASS: Classifier Evaluation
[l]Clustering (CLUS): Partitional (KMeans, EM)
- Sep 3: Accounts for the Assignment page were mailed out. Contact me if you did not get that.
[l] [l] ,
[l] chap1.pdf [l] Intro.pptx, Lec1.pdf
[l] chap2.pdf
[l] chap3.pdf
Lecture notes and videos from last year (Fall09) are available here.
[l]EDA: Eigenvalues Primer; Graph Data Analysis
[l]EDA: Graph Data Analysis
[l]EDA: High Dimensional Data
[l]EDA: Graph Models
[l]EDA: Dimensionality Reduction (PCA)
[row bgcolor=aliceblue] [l]R: Sep 23
[row bgcolor=aliceblue] [l]R: Sep 23 [l]Clustering (CLUS): Partitional (KMeans, EM)
[l]Clustering (CLUS): Partitional (KMeans, EM)
[row bgcolor=aliceblue] [l]R: Sep 30
[row bgcolor=aliceblue] [l]R: Sep 30 [l]EDA: High Dimensional Data
[l]FPM: Itemset Summaries
[l]EDA: Dimensionality Reduction (PCA)
[l] CLASS: Linear Discriminant Analysis (LDA)
[l]FPM: Pattern Significance
CLASS: Probabilistic Methods
[l] CLASS: Probabilistic Methods
[l] CLASS: Linear Discriminant Analysis (LDA)
[row bgcolor=aliceblue] [l]R: Oct 28
[row bgcolor=aliceblue] [l]R: Oct 28 [l]CLASS: Kernel SVMs, Graph Classification
[l]CLASS: Classifier Evaluation
[l]CLASS: Kernel SVMs
[l]CLUS: Hierarchical Clustering
[l]CLASS: Graph Classification
[l]CLUS: Density-based Clustering
[l]CLASS: Classifier Evaluation
[l]CLUS: Subspace Clustering
[l]CLUS: Hierarchical Clustering
[l]CLUS: Spectral Clustering
[l]CLUS: Density-based Clustering
[l]CLUS: Graph Clustering
[l]CLUS: Subspace Clustering
[l]Cluster Evaluation
[l]CLUS: Spectral Clustering
[l]Social Network Analysis (SNA)
[l]CLUS: Graph Clustering
[l]SNA: Graph Mining
[l]Cluster Evaluation
[l]EDA: Numeric & Categorical Attributes
[l]EDA: Categorical Attributes
[l]Frequent Pattern Mining (FPM): Itemset Mining
[l]EDA: Eigenvalues Primer; Graph Data Analysis
[l]Clustering (CLUS): Partitional (KMeans, EM)
[l]EDA: High Dimensional Data
[l]Classification (CLASS): Decision Trees
[l]EDA: Dimensionality Reduction (PCA)
[l]EDA: High Dimensional Data
[l]Frequent Pattern Mining (FPM): Itemset Mining
[l]EDA: Dimensionality Reduction: PCA
[l]Clustering (CLUS): Partitional (KMeans, EM)
[l]EDA: Dimensionality Reduction: PCA/SVD
[l]Classification (CLASS): Decision Trees
[l]EDA: Linear Discriminant Analysis: LDA
[l]FPM: Itemset Summaries
[l]FPM: Itemset Summaries
[row bgcolor=aliceblue] [l]R: Oct 14
[row bgcolor=aliceblue] [l]R: Oct 14 [l]FPM:Graph Mining
[l]FPM:Sequence Mining, CLASS: Probabilistic
[l] CLASS: Linear Discriminant Analysis (LDA)
[l]CLASS: Support Vector Machines (SVM)
CLASS: Probabilistic Methods
[l]CLASS: SVM contd.
[l]CLASS: Support Vector Machines (SVM)
[l]CLASS: Kernel SVM, Rule-based
[l]CLASS: Kernel SVMs, Graph Classification
[l]CLUS: Hierarchical/Density-based Clustering
[l]CLUS: Hierarchical Clustering
[l]CLUS: Density-based Clustering (Kernel Density Estimation)
[l]CLUS: Density-based Clustering
[l]Kernel Methods: Kernel K-means
[l]CLUS: Graph Clustering
[l]Kernel Methods: Kernel PCA/LDA
[l]Cluster Evaluation
- knowledgeable about the fundamental data mining tasks like pattern mining, classification and clustering
- able to understand the key algorithms for the main tasks
- able to describe the fundamental data mining tasks like pattern mining, classification and clustering
- able to analyze the key algorithms for the main tasks
Announcements#Announcements
Calender#Calendar & Lecture Notes
Announcements
Announcements#Announcements
Calendar & Lecture Notes
Calender#Calendar & Lecture Notes
There is no required text for the course. Notes will be handed out in class.
There is no required text for the course. Notes will be posted online on the course webpage.
- Exams (60%): There will be three exams covering the main topics of the course. The tentative exam schedule is posted on the class schedule table. There is no comprehensive final exam.
Attendance: Students are strongly encouraged to participate in the class, and should try to attend all classes.
- Exams (60%): There will be three exams covering the main topics of the course. The tentative exam schedule is posted on the class schedule table. There is no comprehensive final exam. All exams are open book.
Other Policies
- Attendance: Students are strongly encouraged to participate in the class, and should try to attend all classes. Students are entirely responsible for brushing up on any missed material.
- Laptops: Absolutely no laptops will be allowed in class during lectures. The only exception is during exams, to access the class notes online and to use the calculator. Even during the exam, you may not use any other software (e.g., R, python, etc.) for the computations, and you may not "browse" for solutions (you are not likely to find anything!).
- Late Assignments: Most assignments will be due just before midnight on the due date. Students get an automatic one-day extension with a 20% penalty. No late assignments will be accepted after the midnight following the due date.
You may consult other members of the class on the homeworks, but you must submit your own work. For instance you may discuss general approaches to solving a problem, but you must implement the solution on your own (similarity detection software may be used). Anytime you borrow material from the web or elsewhere, you must acknowledge the source.
You may consult other members of the class on the assignments, but you must submit your own work. For instance you may discuss general approaches to solving a problem, but you must implement the solution on your own (similarity detection software may be used). Anytime you borrow material from the web or elsewhere, you must acknowledge the source.
TA Contact: AE106, x2857, anchup@rpi.edu
TA Contact: AE106, x2857, anchup@rpi.edu
TA Office Hours: TBD
TA Contact:
TA Office Hours: 12:00-1:50PM TF
TA Contact: AE106, x2857, anchup@rpi.edu
Class: 10-11:50AM, MR, Room: Carnegie 113\\
Class Time: MR 10-11:50AM
Room: Carnegie 113\\
TA & TA Office Hours: Pranay Anchuri, Hours TBD
TA: Pranay Anchuri
TA Office Hours: TBD
TA Contact:
Class: 10-11:50AM, MR, Room: TBD\\
Class: 10-11:50AM, MR, Room: Carnegie 113\\
TA & TA Office Hours: TBD
TA & TA Office Hours: Pranay Anchuri, Hours TBD
Calendar & Lecture Notes/Videos
Calendar & Lecture Notes
[l]EXAM III
[l]Social Network Analysis (SNA)
[l]Social Network Analysis (SNA)
[row bgcolor=aliceblue] [l]R: Dec 9
[row bgcolor=aliceblue] [l]R: Dec 9 [l]EXAM III
The pre-requisites for this course include data structures and algorithms and discrete mathematics. Basics of linear algebra, and probability & statistics will be very useful as well. Assignments will require the use of the R software. Students are expected to learn R on their own. Assignments must be submitted online at the wiki site. Knowledge of pmwiki markup usage will be your responsibility.
The pre-requisites for this course include data structures and algorithms and discrete mathematics. Linear algebra and probability & statistics are also essentially pre-requisites, though an attempt will be made to review the basic concepts. Assignments will require the use of the R software. Students are expected to learn R on their own. Assignments must be submitted online at the wiki site. Knowledge of pmwiki markup usage will be your responsibility.
You may consult other members of the class on the homeworks, but you must submit your own work. Anytime you borrow material from the web or elsewhere, you must acknowledge the source.
You may consult other members of the class on the homeworks, but you must submit your own work. For instance you may discuss general approaches to solving a problem, but you must implement the solution on your own (similarity detection software may be used). Anytime you borrow material from the web or elsewhere, you must acknowledge the source.
CSCI-4390/6390: Data Mining, Fall 2009
CSCI-4390/6390: Data Mining, Fall 2010
Class: 10-11:50AM, MR, Low 3045
Instructor Office Hours: 12-1PM, MR
Class: 10-11:50AM, MR, Room: TBD
Instructor Office Hours: 12-1PM, MR
TA & TA Office Hours: TBD
- Dec 4: Exam III solutions have been posted.
- Dec 2: Solutions to Assignment 6 posted on the assignment page
- Nov 17: Assignment 6 posted.
- Nov 12: Exam II solutions have been posted.
- Nov 4: Solutions to Assignment 5 posted on the assignment page.
- Oct 31: Solutions to Assignment 4 posted on the assignment page.
- Oct 24: Assignment 5 posted.
- Oct 13: Exam I solutions have been posted.
- Oct 10: Assignment 4 has been posted.
- Oct 4: Solutions for Assignment 3 posted.
- Sep 27: Solutions for Assignments 1 and 2 have been posted on the respective pages.
- Sep 26: Assignment 3 is now available.
- Sep 18: Assignment 2 is now available.
- Sep 12: I have posted the notes below. They are time-stamped so that if I update them, you can check if your copy is the latest one or not.
- Sep 8: Assignment 1 has been posted. See the general R/pmwiki instructions at Assignments and the specific assignment at Assign1.
- Sep 2: Passwords for the assignment submission wiki were sent out yesterday. Contact me if you did not get the email.
- Aug 30: Slight update of the syllabus.
- Aug 19: Course website is up, with the tentative calendar and syllabus.
- Aug 2: Course website is up, with the tentative calendar and syllabus.
[!c]Video
[l]M: Aug 31
[l]M: Aug 30
[l] [l]PDF [l]
[l]R: Sep 3
[l]R: Sep 2
[l]PDF [l]PDF [l]Video
[l]M: Sep 7
[l]M: Sep 6
[l]R: Sep 10
[l]R: Sep 9
[l]PDF [l]PDF [l]Video
[l]M: Sep 14
[l]M: Sep 13
[l]PDF [l]PDF [l]Video
[l]R: Sep 17
[l]R: Sep 16
[l]PDF [l]PDF [l]Video
[l]M: Sep 21
[l]M: Sep 20
[l]PDF [l]PDF [l]Video
[l]R: Sep 24
[l]R: Sep 23
[l]PDF [l]PDF [l]Video
[l]M: Sep 28
[l]M: Sep 27
[l]PDF [l]PDF [l]Video
[l]R: Oct 1
[l]R: Sep 30
[l]PDF [l]PDF [l]Video
[l]M: Oct 5
[l]M: Oct 4
[l]R: Oct 8
[l]R: Oct 7
[l]PDF [l]PDF [l]Video
[l]Tue: Oct 13
[l]Tue: Oct 12
[l]PDF [l]PDF [l]Video
[l]R: Oct 15
[l]R: Oct 14
[l]PDF [l]PDF [l]Video
[l]M: Oct 19
[l]M: Oct 18
[l]PDF [l]PDF [l]Video
[l]R: Oct 22
[l]R: Oct 21
[l]PDF [l]PDF [l]Video
[l]M: Oct 26
[l]M: Oct 25
[l]PDF [l]PDF [l]Video
[l]R: Oct 29
[l]R: Oct 28
[l]PDF [l]PDF [l]Video
[l]M: Nov 2
[l]M: Nov 1
[l] [l]PDF [l]Video
[l]R: Nov 5
[l]R: Nov 4
[l]M: Nov 9
[l]M: Nov 8
[l]PDF [l]PDF [l]Video
[l]R: Nov 12
[l]R: Nov 11
[l]PDF [l]PDF [l]Video
[l]M: Nov 16
[l]M: Nov 15
[l]PDF [l]PDF [l]Video
[l]R: Nov 19
[l]R: Nov 18
[l] PDF [l]PDF [l]Video
[l]M: Nov 23
[l]M: Nov 22
[l]PDF [l]PDF [l]Video
[l]R: Nov 26
[l]R: Nov 25
[l]M: Nov 30
[l]M: Nov 29
[l]PDF [l]PDF [l]Video
[l]R: Dec 3
[l]R: Dec 2
[l]M: Dec 7
[l]M: Dec 6
[l]PDF [l]PDF [l]Video
[l]R: Dec 10
[l]R: Dec 9
[l]PDF [l]PDF [l]Video
- Dec 2: Solutions to Assignment 6 posted on the assignment page
[l]
[l]PDF
[l] [l] [l]
[l]PDF [l]PDF [l]Video
[l]CLUS: Kernel K-means
[l]Kernel Methods: Kernel K-means
[l] [l]
[l]PDF [l]Video
[l]CLASS: Kernel PCA/LDA
[l]Kernel Methods: Kernel PCA/LDA
[l]CLUS: Cluster Validity
[l]CLUS: Kernel K-means
[l]CLUS: Hierarchical/Density-based
[l]CLUS: Hierarchical/Density-based Clustering
[l]CLUS: Density-based (Kernel Density Estimation) [l]
[l]CLUS: Density-based Clustering (Kernel Density Estimation) [l]PDF
[l]
[l]PDF
[l]
[l]PDF
[l]CLUS: Subspace
[l]CLUS: Subspace Clustering
[l] [l]
[l]PDF [l]Video
- Nov 12: [[(Attach:)exam2-sol.pdf | Exam II solutions]] have been posted.
- Nov 12: Exam II solutions have been posted.
- Nov 12: [[(Attach:)exam2-sol.pdf | Exam II solutions]] have been posted.
[l]CLUS: Hierarchical
[l]CLUS: Hierarchical/Density-based
[l]CLUS: Density-based
[l]CLUS: Density-based (Kernel Density Estimation)
[l] [l]
[l]PDF [l]Video
- Nov 4: Solutions to Assignment 5 posted on the assignment page.
[l]CLUS: Density-based
[l]CLUS: Hierarchical
[l]CLUS: Subspace
[l]CLUS: Density-based
[l]CLUS: Subspace contd.
[l]CLUS: Subspace
[l]CLASS: Kernel Methods (Kernel SVM)
[l]CLUS: Spectral Clustering
[l]CLASS: Kernel PCA/LDA
[l]CLUS: Cluster Validity
[l]CLUS: Spectral Clustering
[l]CLASS: Kernel PCA/LDA
[l]CLUS: Hierarchical
[l]CLASS: Classifier Evaluation
[l] [l]
[l]PDF [l]Video
- Oct 31: Solutions to Assignment 4 posted on the assignment page.
[l]CLASS: Instance-based/Rule-based
[l]CLASS: Kernel SVM, Rule-based
[l] [l]
[l]PDF [l]Video
[l] [l] [l]
[l]PDF [l]PDF [l]Video
[l]
[l]PDF
[l]
[l]PDF
[l]
[l]PDF
[l]
[l]PDF
[l] [l]
[l]PDF [l]Video
[l]CLASS: Instance-based/Rule-based
[l]CLASS: Support Vector Machines (SVM)
[l]CLASS: Support Vector Machines (SVM)
[l]CLASS: SVM contd.
[l]CLASS: SVM contd.
[l]CLASS: Instance-based/Rule-based
[l]CLASS: Instance-based/Rule-based
[l]FPM:Sequence Mining, CLASS: Probabilistic
[l]CLASS: Probabilistic
[l]CLASS: Instance-based/Rule-based
[l] Video
[l]Video
[l] [l]
[l]PDF [l]Video
[l]EXAM II
[l]CLUS: Hierarchical
[l]CLUS: Hierarchical
[l]EXAM II
[l]
[l]PDF
[l]
[l]PDF
[l]
[l]EDA: Linear Discriminant Analysis: LDA
[l]EDA: Dimensionality Reduction (PCA/SVD)
[l]EDA: Dimensionality Reduction: PCA
[l]EDA: Linear Discriminant Analysis (LDA)
[l]EDA: Dimensionality Reduction: PCA/SVD
[l] [l]
[l]PDF [l]Video
[l]FPM: Itemset Summaries
[l]
[l]FPM: Sequence Mining
[l]FPM: Itemset Summaries
[l]CLASS: Instance-based/Rule-based
[l]FPM: Sequence Mining
[l]CLASS: Probabilistic
[l]CLASS: Instance-based/Rule-based
[l]CLASS: Support Vector Machines (SVM)
[l]CLASS: Probabilistic
[l]CLASS: SVM contd.
[l]CLASS: Support Vector Machines (SVM)
[l]CLASS: Ensemble Methods
[l]CLASS: SVM contd.
[l] [l] [l]
[l]PDF [l]PDF [l]Video
- Sep 27: Solutions for Assignments 1 and 2 have been posted on the respective pages.
[l]PDF
[l]PDF
[l] [l]
[l]PDF [l]Video
[l]PDF
[l]PDF
[l]PDF
[l]PDF
[l]
[l]PDF
[l]
[l]PDF
[l]PDF
[l]PDF
[l]Clustering (CLUS): Partitional
[l]Clustering (CLUS): Partitional (KMeans, EM)
[l]PDF
[l]EDA: Numeric & Categorical Attributes
[l]EDA: Numeric & Categorical Attributes
[!c]Notes
[!c]Chapters [!c]Lecture Notes
[l]
[l]PDF
[l]PDF [l]Video
[l]
[l]
[l]
[l]
[l]
[l]
[l]
[l]
[l]
[l]
[l]
[l]
[l]
[l]
[l]
[l]
[l]
[l]
[l]
[l]
[l]CLUS: Density-based
[l]CLUS: Density-based
[l]M: Aug 31
[l]M: Aug 31
[l]T: Oct 13 (Monday Schedule)
[l]Tue: Oct 13
(:table border=1 width=100%:) (:cellnr bgcolor=lavender:) Day: Date (:cell bgcolor=lavender:) Topic (:cell bgcolor=lavender:)Notes (:cell bgcolor=lavender:)Video
[table border=1 width=100%] [row bgcolor=lavender] [!c]Day: Date [!c]Topic [!c]Notes [!c]Video
(:cellnr:) M: Aug 31
(:cell:) Data Mining Overview
(:cell:) PDF
(:cell:)
(:cellnr bgcolor=aliceblue:) R: Sep 3
(:cell:) Exploratory Data Analysis (EDA): Numeric Attributes
(:cell:) PDF
(:cell:) Video
[row]
[l]M: Aug 31
[l]Data Mining Overview
[l]PDF
[l]
[row bgcolor=aliceblue]
[l]R: Sep 3
[l]Exploratory Data Analysis (EDA): Numeric Attributes
[l]PDF
[l]Video
(:cellnr:) M: Sep 7 (:cell:) Labor Day Holiday (:cellnr bgcolor=aliceblue:) R: Sep 10 (:cell:) EDA: Numeric & Categorical Attributes (:cell:) PDF (:cell:) Video
[row] [l]M: Sep 7 [l]Labor Day Holiday [row bgcolor=aliceblue] [l]R: Sep 10 [l]EDA: Numeric & Categorical Attributes [l]PDF [l]Video
(:cellnr:) M: Sep 14 (:cell:) Frequent Pattern Mining (FPM): Itemset Mining (:cell:) (:cell:) (:cellnr bgcolor=aliceblue:) R: Sep 17 (:cell:) Clustering (CLUS): Partitional (:cell:) (:cell:)
[row] [l]M: Sep 14 [l]Frequent Pattern Mining (FPM): Itemset Mining [l] [l] [row bgcolor=aliceblue] [l]R: Sep 17 [l]Clustering (CLUS): Partitional [l] [l]
(:cellnr:) M: Sep 21 (:cell:) Classification (CLASS): Decision Trees (:cell:) (:cell:) (:cellnr bgcolor=aliceblue:) R: Sep 24 (:cell:) EDA: High Dimensional Data (:cell:) (:cell:)
[row] [l]M: Sep 21 [l]Classification (CLASS): Decision Trees [l] [l] [row bgcolor=aliceblue] [l]R: Sep 24 [l]EDA: High Dimensional Data [l] [l]
(:cellnr:) M: Sep 28 (:cell:) EDA: Dimensionality Reduction (PCA/SVD) (:cell:) (:cell:) (:cellnr bgcolor=aliceblue:) R: Oct 1 (:cell:) EDA: Linear Discriminant Analysis (LDA) (:cell:) (:cell:)
[row] [l]M: Sep 28 [l]EDA: Dimensionality Reduction (PCA/SVD) [l] [l] [row bgcolor=aliceblue] [l]R: Oct 1 [l]EDA: Linear Discriminant Analysis (LDA) [l] [l]
(:cellnr:) M: Oct 5 (:cell:) EXAM I (:cellnr bgcolor=aliceblue:) R: Oct 8 (:cell:) FPM: Itemset Summaries (:cell:) (:cell:)
[row] [l]M: Oct 5 [l]EXAM I [row bgcolor=aliceblue] [l]R: Oct 8 [l]FPM: Itemset Summaries [l] [l]
(:cellnr:) T: Oct 13 (Monday Schedule) (:cell:) FPM: Sequence Mining (:cell:) (:cell:) (:cellnr bgcolor=aliceblue:) R: Oct 15 (:cell:) CLASS: Instance-based/Rule-based (:cell:) (:cell:)
[row] [l]T: Oct 13 (Monday Schedule) [l]FPM: Sequence Mining [l] [l] [row bgcolor=aliceblue] [l]R: Oct 15 [l]CLASS: Instance-based/Rule-based [l] [l]
(:cellnr:) M: Oct 19 (:cell:) CLASS: Probabilistic (:cell:) (:cell:) (:cellnr bgcolor=aliceblue:) R: Oct 22 (:cell:) CLASS: Support Vector Machines (SVM) (:cell:) (:cell:)
[row] [l]M: Oct 19 [l]CLASS: Probabilistic [l] [l] [row bgcolor=aliceblue] [l]R: Oct 22 [l]CLASS: Support Vector Machines (SVM) [l] [l]
(:cellnr:) M: Oct 26 (:cell:) CLASS: SVM contd. (:cell:) (:cell:) (:cellnr bgcolor=aliceblue:) R: Oct 29 (:cell:) CLASS: Ensemble Methods (:cell:) (:cell:)
[row] [l]M: Oct 26 [l]CLASS: SVM contd. [l] [l] [row bgcolor=aliceblue] [l]R: Oct 29 [l]CLASS: Ensemble Methods [l] [l]
(:cellnr:) M: Nov 2 (:cell:) EXAM II (:cellnr bgcolor=aliceblue:) R: Nov 5 (:cell:) CLUS: Hierarchical (:cell:) (:cell:)
[row] [l]M: Nov 2 [l]EXAM II [row bgcolor=aliceblue] [l]R: Nov 5 [l]CLUS: Hierarchical [l] [l]
(:cellnr:) M: Nov 9 (:cell:) CLUS: Density-based (:cell:) (:cell:) (:cellnr bgcolor=aliceblue:) R: Nov 12 (:cell:) CLUS: Subspace (:cell:) (:cell:)
[row] [l]M: Nov 9 [l]CLUS: Density-based [l] [l] [row bgcolor=aliceblue] [l]R: Nov 12 [l]CLUS: Subspace [l] [l]
(:cellnr:) M: Nov 16 (:cell:) CLUS: Subspace contd. (:cell:) (:cell:) (:cellnr bgcolor=aliceblue:) R: Nov 19 (:cell:) CLASS: Kernel Methods (Kernel SVM) (:cell:) (:cell:)
[row] [l]M: Nov 16 [l]CLUS: Subspace contd. [l] [l] [row bgcolor=aliceblue] [l]R: Nov 19 [l]CLASS: Kernel Methods (Kernel SVM) [l] [l]
(:cellnr:) M: Nov 23 (:cell:) CLASS: Kernel PCA/LDA (:cell:) (:cell:) (:cellnr bgcolor=aliceblue:) R: Nov 26 (:cell:) Thanksgiving Break
[row] [l]M: Nov 23 [l]CLASS: Kernel PCA/LDA [l] [l] [row bgcolor=aliceblue] [l]R: Nov 26 [l]Thanksgiving Break
(:cellnr:) M: Nov 30 (:cell:) CLUS: Spectral Clustering (:cell:) (:cell:) (:cellnr bgcolor=aliceblue:) R: Dec 3 (:cell:) EXAM III
[row] [l]M: Nov 30 [l]CLUS: Spectral Clustering [l] [l] [row bgcolor=aliceblue] [l]R: Dec 3 [l]EXAM III
(:cellnr:) M: Dec 7 (:cell:) Social Network Analysis (SNA) (:cell:) (:cell:) (:cellnr bgcolor=aliceblue:) R: Dec 10 (:cell:) SNA: Graph Mining (:cell:) (:cell:)
[row] [l]M: Dec 7 [l]Social Network Analysis (SNA) [l] [l] [row bgcolor=aliceblue] [l]R: Dec 10 [l]SNA: Graph Mining [l] [l]
(:tableend:)
[tableend]
(:cellnr bgcolor=aliceblue:) R: Sep 10 – EDA: Numeric & Categorical Attributes
(:cellnr bgcolor=aliceblue:) R: Sep 10 (:cell:) EDA: Numeric & Categorical Attributes
(:cellnr bgcolor=lavender:) Topic
(:cellnr bgcolor=lavender:) Day: Date (:cell bgcolor=lavender:) Topic
(:cellnr:) M: Aug 31 – Data Mining Overview
(:cellnr:) M: Aug 31 (:cell:) Data Mining Overview
(:cellnr bgcolor=aliceblue:) R: Sep 3 – Exploratory Data Analysis (EDA): Numeric Attributes
(:cellnr bgcolor=aliceblue:) R: Sep 3 (:cell:) Exploratory Data Analysis (EDA): Numeric Attributes
(:cellnr:) M: Sep 7 – Labor Day Holiday
(:cellnr:) M: Sep 7 (:cell:) Labor Day Holiday
(:cellnr:) M: Sep 14 – Frequent Pattern Mining (FPM): Itemset Mining
(:cellnr:) M: Sep 14 (:cell:) Frequent Pattern Mining (FPM): Itemset Mining
(:cellnr bgcolor=aliceblue:) R: Sep 17 – Clustering (CLUS): Partitional
(:cellnr bgcolor=aliceblue:) R: Sep 17 (:cell:) Clustering (CLUS): Partitional
(:cellnr:) M: Sep 21 – Classification (CLASS): Decision Trees
(:cellnr:) M: Sep 21 (:cell:) Classification (CLASS): Decision Trees
(:cellnr bgcolor=aliceblue:) R: Sep 24 – EDA: High Dimensional Data
(:cellnr bgcolor=aliceblue:) R: Sep 24 (:cell:) EDA: High Dimensional Data
(:cellnr:) M: Sep 28 – EDA: Dimensionality Reduction (PCA/SVD)
(:cellnr:) M: Sep 28 (:cell:) EDA: Dimensionality Reduction (PCA/SVD)
(:cellnr bgcolor=aliceblue:) R: Oct 1 – EDA: Linear Discriminant Analysis (LDA)
(:cellnr bgcolor=aliceblue:) R: Oct 1 (:cell:) EDA: Linear Discriminant Analysis (LDA)
(:cellnr:) M: Oct 5– EXAM I (:cellnr bgcolor=aliceblue:) R: Oct 8 – FPM: Itemset Summaries
(:cellnr:) M: Oct 5 (:cell:) EXAM I (:cellnr bgcolor=aliceblue:) R: Oct 8 (:cell:) FPM: Itemset Summaries
(:cellnr:) T: Oct 13 – (Monday Schedule) FPM: Sequence Mining
(:cellnr:) T: Oct 13 (Monday Schedule) (:cell:) FPM: Sequence Mining
(:cellnr bgcolor=aliceblue:) R: Oct 15 – CLASS: Instance-based/Rule-based
(:cellnr bgcolor=aliceblue:) R: Oct 15 (:cell:) CLASS: Instance-based/Rule-based
(:cellnr:) M: Oct 19 – CLASS: Probabilistic
(:cellnr:) M: Oct 19 (:cell:) CLASS: Probabilistic
(:cellnr bgcolor=aliceblue:) R: Oct 22 – CLASS: Support Vector Machines (SVM)
(:cellnr bgcolor=aliceblue:) R: Oct 22 (:cell:) CLASS: Support Vector Machines (SVM)
(:cellnr:) M: Oct 26 – CLASS: SVM contd.
(:cellnr:) M: Oct 26 (:cell:) CLASS: SVM contd.
(:cellnr bgcolor=aliceblue:) R: Oct 29 – CLASS: Ensemble Methods
(:cellnr bgcolor=aliceblue:) R: Oct 29 (:cell:) CLASS: Ensemble Methods
(:cellnr:) M: Nov 2 – EXAM II (:cellnr bgcolor=aliceblue:) R: Nov 5 – CLUS: Hierarchical
(:cellnr:) M: Nov 2 (:cell:) EXAM II (:cellnr bgcolor=aliceblue:) R: Nov 5 (:cell:) CLUS: Hierarchical
(:cellnr:) M: Nov 9 – CLUS: Density-based
(:cellnr:) M: Nov 9 (:cell:) CLUS: Density-based
(:cellnr bgcolor=aliceblue:) R: Nov 12 – CLUS: Subspace
(:cellnr bgcolor=aliceblue:) R: Nov 12 (:cell:) CLUS: Subspace
(:cellnr:) M: Nov 16 – CLUS: Subspace contd.
(:cellnr:) M: Nov 16 (:cell:) CLUS: Subspace contd.
(:cellnr bgcolor=aliceblue:) R: Nov 19 – CLASS: Kernel Methods (Kernel SVM)
(:cellnr bgcolor=aliceblue:) R: Nov 19 (:cell:) CLASS: Kernel Methods (Kernel SVM)
(:cellnr:) M: Nov 23 – CLASS: Kernel PCA/LDA
(:cellnr:) M: Nov 23 (:cell:) CLASS: Kernel PCA/LDA
(:cellnr bgcolor=aliceblue:) R: Nov 26 – Thanksgiving Break
(:cellnr bgcolor=aliceblue:) R: Nov 26 (:cell:) Thanksgiving Break
(:cellnr:) M: Nov 30 - CLUS: Spectral Clustering
(:cellnr:) M: Nov 30 (:cell:) CLUS: Spectral Clustering
(:cellnr bgcolor=aliceblue:) R: Dec 3 – EXAM III
(:cellnr bgcolor=aliceblue:) R: Dec 3 (:cell:) EXAM III
(:cellnr:) M: Dec 7 – Social Network Analysis (SNA)
(:cellnr:) M: Dec 7 (:cell:) Social Network Analysis (SNA)
(:cellnr bgcolor=aliceblue:) R: Dec 10 - SNA: Graph Mining
(:cellnr bgcolor=aliceblue:) R: Dec 10 (:cell:) SNA: Graph Mining
(:table border=1 width=70% align=left:)
(:table border=1 width=100%:)
\\
(:table border=1 width=80% align=center:)
(:table border=1 width=70% align=left:)
(:table align=center border=1 width=80%:)
(:table border=1 width=80% align=center:)
(:table align=center border=1 width=100%:)
(:table align=center border=1 width=80%:)
(:cell:)Notes (:cell:)Video
(:cell bgcolor=lavender:)Notes (:cell bgcolor=lavender:)Video
[table align=center border=1 width=100%]
(:table align=center border=1 width=100%:) (:cellnr bgcolor=lavender:) Topic (:cell:)Notes (:cell:)Video
(:cellnr:) M: Aug 31 – Data Mining Overview
(:cell:) PDF
(:cell:)
(:cellnr bgcolor=aliceblue:) R: Sep 3 – Exploratory Data Analysis (EDA): Numeric Attributes
(:cell:) PDF
(:cell:) Video
[row bgcolor=lavender] [!c] Mondays [!c] Thursdays
(:cellnr:) M: Sep 7 – Labor Day Holiday (:cellnr bgcolor=aliceblue:) R: Sep 10 – EDA: Numeric & Categorical Attributes (:cell:) PDF (:cell:) Video
[row]
[l]Aug 31 – Data Mining Overview: (PDF
)
[l]Sep 3 – Exploratory Data Analysis (EDA): Numeric Attributes(Notes(PDF))(Video)
(:cellnr:) M: Sep 14 – Frequent Pattern Mining (FPM): Itemset Mining (:cell:) (:cell:) (:cellnr bgcolor=aliceblue:) R: Sep 17 – Clustering (CLUS): Partitional (:cell:) (:cell:)
[row] [l]Sep 7 – Labor Day Holiday [l]Sep 10 – EDA: Numeric & Categorical Attributes (Notes (PDF))(Video)
(:cellnr:) M: Sep 21 – Classification (CLASS): Decision Trees (:cell:) (:cell:) (:cellnr bgcolor=aliceblue:) R: Sep 24 – EDA: High Dimensional Data (:cell:) (:cell:)
[row] [l]Sep 14 – Frequent Pattern Mining (FPM): Itemset Mining [l]Sep 17 – Clustering (CLUS): Partitional
(:cellnr:) M: Sep 28 – EDA: Dimensionality Reduction (PCA/SVD) (:cell:) (:cell:) (:cellnr bgcolor=aliceblue:) R: Oct 1 – EDA: Linear Discriminant Analysis (LDA) (:cell:) (:cell:)
[row] [l]Sep 21 – Classification (CLASS): Decision Trees [l]Sep 24 – EDA: High Dimensional Data
(:cellnr:) M: Oct 5 – EXAM I (:cellnr bgcolor=aliceblue:) R: Oct 8 – FPM: Itemset Summaries (:cell:) (:cell:)
[row] [l]Sep 28 – EDA: Dimensionality Reduction (PCA/SVD) [l]Oct 1 – EDA: Linear Discriminant Analysis (LDA)
(:cellnr:) T: Oct 13 – (Monday Schedule) FPM: Sequence Mining (:cell:) (:cell:) (:cellnr bgcolor=aliceblue:) R: Oct 15 – CLASS: Instance-based/Rule-based (:cell:) (:cell:)
[row] [l]Oct 5 – EXAM I [l]Oct 8 – FPM: Itemset Summaries
(:cellnr:) M: Oct 19 – CLASS: Probabilistic (:cell:) (:cell:) (:cellnr bgcolor=aliceblue:) R: Oct 22 – CLASS: Support Vector Machines (SVM) (:cell:) (:cell:)
[row] [l]Oct 13 – (Monday Schedule) FPM: Sequence Mining [l]Oct 15 – CLASS: Instance-based/Rule-based
(:cellnr:) M: Oct 26 – CLASS: SVM contd. (:cell:) (:cell:) (:cellnr bgcolor=aliceblue:) R: Oct 29 – CLASS: Ensemble Methods (:cell:) (:cell:)
[row] [l]Oct 19 – CLASS: Probabilistic [l]Oct 22 – CLASS: Support Vector Machines (SVM)
(:cellnr:) M: Nov 2 – EXAM II (:cellnr bgcolor=aliceblue:) R: Nov 5 – CLUS: Hierarchical (:cell:) (:cell:)
[row] [l]Oct 26 – CLASS: SVM contd. [l]Oct 29 – CLASS: Ensemble Methods
(:cellnr:) M: Nov 9 – CLUS: Density-based (:cell:) (:cell:) (:cellnr bgcolor=aliceblue:) R: Nov 12 – CLUS: Subspace (:cell:) (:cell:)
[row] [l]Nov 2 – EXAM II [l]Nov 5 – CLUS: Hierarchical
(:cellnr:) M: Nov 16 – CLUS: Subspace contd. (:cell:) (:cell:) (:cellnr bgcolor=aliceblue:) R: Nov 19 – CLASS: Kernel Methods (Kernel SVM) (:cell:) (:cell:)
[row] [l]Nov 9 – CLUS: Density-based [l]Nov 12 – CLUS: Subspace
(:cellnr:) M: Nov 23 – CLASS: Kernel PCA/LDA (:cell:) (:cell:) (:cellnr bgcolor=aliceblue:) R: Nov 26 – Thanksgiving Break
[row] [l]Nov 16 – CLUS: Subspace contd. [l]Nov 19 – CLASS: Kernel Methods (Kernel SVM)
(:cellnr:) M: Nov 30 – CLUS: Spectral Clustering (:cell:) (:cell:) (:cellnr bgcolor=aliceblue:) R: Dec 3 – EXAM III
[row] [l]Nov 23 – CLASS: Kernel PCA/LDA [l]Nov 26 – Thanksgiving Break
(:cellnr:) M: Dec 7 – Social Network Analysis (SNA) (:cell:) (:cell:) (:cellnr bgcolor=aliceblue:) R: Dec 10 – SNA: Graph Mining (:cell:) (:cell:)
[row] [l]Nov 30 – CLUS: Spectral Clustering [l]Dec 3 – EXAM III
[row] [l]Dec 7 – Social Network Analysis (SNA) [l]Dec 10 – SNA: Graph Mining
[tableend]
(:tableend:)
- Sep 12: I have posted the notes below. They are time-stamped so that if I update them, you can check whether your copy is the latest version.
[l]Sep 3 – Exploratory Data Analysis (EDA): Numeric Attributes(Notes (PDF))(Video)
[l]Sep 3 – Exploratory Data Analysis (EDA): Numeric Attributes(Notes(PDF))(Video)
[l]Sep 3 – Exploratory Data Analysis (EDA): Numeric Attributes (Notes (PDF)) (Video)
[l]Sep 3 – Exploratory Data Analysis (EDA): Numeric Attributes(Notes (PDF))(Video)
[l]Sep 10 – EDA: Numeric & Categorical Attributes (Notes (PDF))(Video)
[l]Sep 3 – Exploratory Data Analysis (EDA): Numeric Attributes (Video)
[l]Sep 3 – Exploratory Data Analysis (EDA): Numeric Attributes (Notes (PDF)) (Video)
[l]Sep 10 – EDA: Numeric & Categorical Attributes (Video)
[l]Sep 10 – EDA: Numeric & Categorical Attributes (Notes (PDF))(Video)
[l]Sep 14 – Frequent Pattern Mining (FPM): Itemset Mining
[l]Sep 21 – Classification (CLASS): Decision Trees [l]Sep 24 – EDA: High Dimensional Data
[l]Sep 10 – EDA: Numeric & Categorical Attributes Video)
[l]Sep 10 – EDA: Numeric & Categorical Attributes (Video)
[l]Sep 10 – Frequent Pattern Mining (FPM): Itemset Mining
[l]Sep 10 – EDA: Numeric & Categorical Attributes Video)
[l]Sep 14 – Clustering (CLUS): Partitional [l]Sep 17 – Classification (CLASS): Decision Trees
[l]Sep 14 – Frequent Pattern Mining (FPM): Itemset Mining [l]Sep 17 – Clustering (CLUS): Partitional
[l]Sep 21 – EDA: High Dimensional Data [l]Sep 24 – EDA: Dimensionality Reduction (PCA/SVD)
[l]Sep 21 – Classification (CLASS): Decision Trees [l]Sep 24 – EDA: High Dimensional Data
[l]Sep 28 – EDA: SVD contd.
[l]Sep 28 – EDA: Dimensionality Reduction (PCA/SVD)
- Sep 8: Assignment 1 has been posted. See the general R/pmwiki instructions at Assignments, and the specific assignment at Assign1.
[l]Aug 31 – Data Mining Overview: PDF
[l]Sep 3 – Exploratory Data Analysis (EDA): Numeric Attributes Video
[l]Aug 31 – Data Mining Overview: (PDF
)
[l]Sep 3 – Exploratory Data Analysis (EDA): Numeric Attributes (Video)
Calendar & Lecture Notes
Calendar & Lecture Notes/Videos
[l]Aug 31 – Data Mining Overview
[l]Sep 3 – Exploratory Data Analysis (EDA): Numeric and Categorical
[l]Aug 31 – Data Mining Overview: PDF
[l]Sep 3 – Exploratory Data Analysis (EDA): Numeric Attributes Video
[table align=center border=1]
[table align=center border=1 width=100%]
Class: 10-11:50AM, MR, Low 3045
Class: 10-11:50AM, MR, Low 3045\\
Calendar
Calendar & Lecture Notes
[l]Aug 31 – Data Mining Overview
- Sep 2: Passwords for the assignment submission wiki were sent out yesterday. Contact me if you did not get the email.
CSCI-4390/6390: Data Mining, Fall 2009
Class: 10-11:50AM, MR, Low 3045\\
Instructor Office Hours: 12-1PM, MR
Announcements
(:table border=1 bgcolor=aliceblue width=100%:) (:cell:) (:div style="height: 200px; overflow: auto; text-align: justify; padding-top: 10px; padding-left:10px; padding-right:10px;" :)
- Aug 30: Slight update of the syllabus.
- Aug 19: Course website is up, with the tentative calendar and syllabus.
(:divend:) (:tableend:)
Calendar
A tentative sequence of topics to be covered in class; changes are likely as the course progresses.
[table align=center border=1]
[row bgcolor=lavender] [!c] Mondays [!c] Thursdays
[row] [l]Aug 31 – Data Mining Overview [l]Sep 3 – Exploratory Data Analysis (EDA): Numeric and Categorical
[row] [l]Sep 7 – Labor Day Holiday [l]Sep 10 – Frequent Pattern Mining (FPM): Itemset Mining
[row] [l]Sep 14 – Clustering (CLUS): Partitional [l]Sep 17 – Classification (CLASS): Decision Trees
[row] [l]Sep 21 – EDA: High Dimensional Data [l]Sep 24 – EDA: Dimensionality Reduction (PCA/SVD)
[row] [l]Sep 28 – EDA: SVD contd. [l]Oct 1 – EDA: Linear Discriminant Analysis (LDA)
[row] [l]Oct 5 – EXAM I [l]Oct 8 – FPM: Itemset Summaries
[row] [l]Oct 13 – (Monday Schedule) FPM: Sequence Mining [l]Oct 15 – CLASS: Instance-based/Rule-based
[row] [l]Oct 19 – CLASS: Probabilistic [l]Oct 22 – CLASS: Support Vector Machines (SVM)
[row] [l]Oct 26 – CLASS: SVM contd. [l]Oct 29 – CLASS: Ensemble Methods
[row] [l]Nov 2 – EXAM II [l]Nov 5 – CLUS: Hierarchical
[row] [l]Nov 9 – CLUS: Density-based [l]Nov 12 – CLUS: Subspace
[row] [l]Nov 16 – CLUS: Subspace contd. [l]Nov 19 – CLASS: Kernel Methods (Kernel SVM)
[row] [l]Nov 23 – CLASS: Kernel PCA/LDA [l]Nov 26 – Thanksgiving Break
[row] [l]Nov 30 – CLUS: Spectral Clustering [l]Dec 3 – EXAM III
[row] [l]Dec 7 – Social Network Analysis (SNA) [l]Dec 10 – SNA: Graph Mining
[tableend]
Syllabus
(:table border=1 bgcolor=aliceblue width=100%:) (:cell:) (:div style="height: 400px; overflow: auto; text-align: justify; padding-top: 10px; padding-left:10px; padding-right:10px;" :)
Introduction
Data mining is the process of automatic discovery of patterns, models, changes, associations, and anomalies in massive databases. This course will provide an introduction to the main topics in data mining and knowledge discovery, including statistical foundations, pattern mining, classification, and clustering. Emphasis will be placed on the algorithmic foundations.
Learning Objectives
After taking this course, students will be:
- knowledgeable about the fundamental data mining tasks, such as pattern mining, classification, and clustering
- able to understand the key algorithms for these tasks
- able to implement and apply these techniques to real-world datasets
Prerequisites
The prerequisites for this course include data structures and algorithms, and discrete mathematics. A basic knowledge of linear algebra and of probability & statistics will also be very useful. Assignments will require the use of the R software; students are expected to learn R on their own (a minimal example is sketched below). Assignments must be submitted online at the wiki site, and learning the pmwiki markup is likewise your responsibility.
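To give a rough idea of the kind of R usage the assignments involve, here is a minimal, purely illustrative sketch. The file name mydata.csv, the assumption that all attributes are numeric, and the choice of k = 3 clusters are hypothetical and are not taken from any actual assignment; each assignment will specify its own data and tasks.

[@
# Illustrative sketch only: hypothetical data file, all attributes assumed numeric.
D <- read.csv("mydata.csv")

# Exploratory data analysis: per-attribute summaries, mean vector, covariance matrix.
summary(D)
colMeans(D)
cov(D)

# Dimensionality reduction via PCA (built-in prcomp), standardizing the attributes.
pc <- prcomp(D, scale. = TRUE)
summary(pc)          # variance explained by each principal component

# A simple k-means clustering run; k = 3 is an arbitrary choice for illustration.
km <- kmeans(scale(D), centers = 3)
table(km$cluster)    # cluster sizes
@]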
Textbook
There is no required text for the course. Notes will be handed out in class.
The following textbooks are also good references:
- Introduction to Data Mining, by Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, Addison Wesley, 2006.
- Data Mining: Concepts and Techniques (2nd edition), by Jiawei Han and Micheline Kamber, Morgan Kaufmann, 2006.
Grading Policy
Your grade will be a combination of the following items. Note that the final distribution is subject to some change depending on the number of assignments, but exams will count for at least 60%.
- Assignments (40%): The assignments are meant to be practically oriented. You'll be asked to run mining methods on real datasets, or to implement algorithms, to complement the theory. There will be roughly one assignment per week, to be submitted via the course wiki site. User accounts will be created after the first day of class.
- Exams (60%): There will be three exams covering the main topics of the course. The tentative exam schedule is posted in the class schedule table. There is no comprehensive final exam.
Attendance: Students are strongly encouraged to participate in class and should try to attend all classes.
Academic Integrity
You may consult other members of the class on the homework assignments, but you must submit your own work. Any time you borrow material from the web or elsewhere, you must acknowledge the source.
The school takes cases of academic dishonesty very seriously; violations will result in an automatic "F" grade for the course. Students should familiarize themselves with the relevant portion of the Rensselaer Handbook of Student Rights and Responsibilities on this topic. (:divend:) (:tableend:)