CSCI-4390/6390 Data Mining
Fall 2008
Introduction
With the unprecedented rate at which data is being collected today in
almost all fields of human endeavor, there is an emerging economic and
scientific need to extract useful information from it. Data mining is
the process of automatic discovery of patterns, changes, associations
and anomalies in massive databases. This course will provide an
introduction to the main topics in data mining and knowledge
discovery, including statistical foundations, association
discovery, classification, clustering, and database support.
The course emphasizes algorithmic and systems issues, as
well as applications of mining to real-world problems.
Textbook
There is no required text for the course. Notes will be handed out
in class. Class notes from previous years are available by following
the links on the course web page. For example, the notes from last
year are available at:
http://www.cs.rpi.edu/%7Ezaki/dmcourse/fall07/notes/
The following textbooks are also good references:
Grading Policy
The prerequisites for this course include data structures and
algorithms. A basic background in linear algebra and in probability
and statistics will also be very useful.
Your grade will be a combination of the following items:
Academic Integrity
You may consult other members of the class on the homework, but you must submit your own work. Any time you borrow material from the web or elsewhere, you must acknowledge the source.
The school takes cases of academic dishonesty very seriously; any
such case results in an automatic "F" grade for the course. Students
should familiarize themselves with the relevant portion of the
Rensselaer Handbook of Student Rights and Responsibilities on
this topic.