This course will provide an introduction to the main topics in data mining and knowledge discovery, including: algebraic and statistical foundations, pattern mining, classification, regression, and clustering. Emphasis will be laid on the algorithmic approach.

**For CSCI-4390**:
After taking this course students will be

- able to describe the fundamental data mining tasks like pattern mining, classification, regression and clustering
- able to analyze the key algorithms for the main tasks
- able to implement and apply the techniques to real-world datasets

**For CSCI-6390**:
After taking this course students will be

- able to describe the fundamental data mining tasks like pattern mining, classification, regression and clustering
- able to analyze the key algorithms for the main tasks
- able to implement and apply the techniques to real-world datasets
- able to demonstrate understanding of more advanced topics in data mining
- able to implement more advanced algorithms

The prerequisites for this course include data structures and algorithms, and discrete mathematics. Linear algebra and probability & statistics are also prerequisites, though an attempt will be made to review the basic concepts. Assignments will require the use of the Python language, with the NumPy package for numeric computations. You are expected to learn Python on your own via web tutorials, etc.

The main **required** textbook for the course is:

- Data Mining and Analysis: Fundamental Concepts and Algorithms, Mohammed J. Zaki and Wagner Meira, Jr, Cambridge University Press, 2014.

Readings from the book will be posted on the course schedule, and supplementary material will be provided when necessary.

Your grade will be a combination of the following items.

- Exams (60%): There will be three exams covering the main topics of the course. The tentative exam dates are noted on the class schedule table. There is no comprehensive final exam. All exams are open book.
- Assignments (40%): The assignments are meant to be practically oriented, though they may include other questions. You'll be asked to implement some algorithms and apply them to real datasets, to complement the theory. There will be roughly one assignment every two weeks (5-6 assignments in total). There may be a final project assignment on some real-world data analysis challenge if a suitable public challenge problem is made available in the latter half of the semester.

The grading for CSCI-4390 and CSCI-6390 will be done separately, taking into account the more advanced material required for CSCI-6390; this includes extra or in-depth questions on the exams, and implementation of more advanced algorithms for the assignments. Typically, the CSCI-6390 students will be graded out of 110 on each exam instead of 100, to account for the extra question(s). Likewise, their assignments will be graded out of a higher point total, reflecting the extra programming component. Note that CSCI-6390 students cannot receive D/D+/D- grades.

- Attendance: Students are strongly encouraged to participate in the class, and should try to attend all classes. Students are responsible for any topics and assignments for the missed classes.
- Late Assignments: Most assignments will be due just before midnight on the due date. Students can get an automatic one day extension for a 15% grade penalty. No late assignments will be accepted after the midnight following the due date.
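As a rough illustration of how the weights and the late policy above might combine, here is a minimal Python sketch. It is not the official grading procedure. It assumes that all exams count equally, that all assignments count equally, and that the 15% late penalty is taken off the earned score; none of these details are stated in the syllabus. The function names are hypothetical.

```python
def percent(earned, possible):
    """Convert raw points to a percentage of the available points.

    For example, a CSCI-6390 exam would use possible=110 instead of 100.
    """
    return 100.0 * earned / possible


def apply_late_penalty(score, days_late):
    """Apply the late policy to an assignment score.

    Assumption: the 15% penalty is a fraction of the earned score.
    Anything more than one day late is not accepted and scores zero.
    """
    if days_late <= 0:
        return score
    if days_late == 1:
        return score * 0.85
    return 0.0


def course_grade(exam_percents, assignment_percents):
    """Combine exam and assignment percentages using the 60/40 split.

    Assumption: items within each category are weighted equally.
    """
    exam_avg = sum(exam_percents) / len(exam_percents)
    assignment_avg = sum(assignment_percents) / len(assignment_percents)
    return 0.60 * exam_avg + 0.40 * assignment_avg


if __name__ == "__main__":
    exams = [percent(88, 100), percent(92, 100), percent(79, 100)]
    assignments = [
        percent(95, 100),
        apply_late_penalty(percent(90, 100), days_late=1),
    ]
    print(round(course_grade(exams, assignments), 2))
```

For a CSCI-6390 student, the same functions apply with `possible` set to the higher point total for that exam or assignment.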

Students must work independently on all course assignments. You may consult other members of the class on the assignments, but you must submit your own work. For instance, you may discuss general approaches to solving a problem, but you must implement the solution on your own (similarity detection software may be used). Any time you borrow material from the web or elsewhere, you must acknowledge the source. Copying and pasting from published sources or the internet is considered plagiarism and is not acceptable. Plagiarized work will receive an automatic grade of zero.

Student-teacher relationships are built on trust. Acts which violate this trust undermine the educational process. The Rensselaer Handbook of Student Rights and Responsibilities and The Rensselaer Graduate Student Supplement define various forms of Academic Dishonesty and procedures for responding to them. Submission of any assignment that is in violation of these policies will result in a penalty that the instructor deems appropriate to the infraction, ranging from a grade of zero on the assignment in question to failure of the class as a whole. The student will also be reported to the Dean of Students or the Dean of Graduate Education, as appropriate. Note that academic dishonesty will be dealt with severely. If you have any questions concerning this policy before submitting an assignment, please ask for clarification.

Page last modified on December 30, 2017, at 02:09 PM