CSCI4390-6390 Assign5
Assign5: Bayes Classifier
Due Date: Oct 31, before midnight (11:59:59PM)
Dataset
Download the Steel Industry Energy Consumption Dataset from the UCI Machine Learning repository. Extract the Steel_industry_data.csv datafile. You should parse and store the data as a data matrix, focusing only on the 6 continuous attributes (see datafile or link above for names/descriptions). Thus, your data matrix will have 35040 points in 6 dimensions. In addition, you should record the last attribute (load type) for each point; we will use this as the class label.
You should randomly shuffle all points, and then use 80% of the data for training and the remaining 20% for testing. You must do this via train_test_split from sklearn.model_selection, using 42 as the random_state.
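As a rough illustration of the split described above, here is a minimal sketch. The arrays X and y are random placeholders standing in for the real data matrix and load-type labels, and the label values "A", "B", "C" are hypothetical; only test_size=0.2 and random_state=42 come from the assignment text.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data (assumption): 100 random points in 6 dimensions,
# standing in for the parsed data matrix and its class labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
y = rng.choice(["A", "B", "C"], size=100)  # hypothetical label names

# train_test_split shuffles by default, so no separate shuffle step is needed.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)
```

Because random_state is fixed, repeated runs on the same data produce the same training and testing partitions.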
Part I: Bayes Classifier
Implement both the full Bayes classifier in Algo 18.1, and the naive Bayes classifier in Algo 18.2.
Students in CSCI6390 should, in addition, implement the K-nearest-neighbor classifier and present the best results obtained by trying different values of K.
Estimate parameters using the training data, and report the accuracy on the testing set. You must report the total accuracy as well as the class-specific accuracy and recall values; see Eq 22.3 and 22.4 for the latter two. Report these values for all methods.
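The reporting step could be sketched as follows. This assumes the common definitions: class-specific accuracy for a class is the fraction of points predicted as that class that truly belong to it (i.e., precision), and recall is the fraction of points truly in that class that were predicted correctly. Check these against Eq 22.3 and 22.4 before relying on them. The helper name class_metrics and the toy labels are hypothetical.

```python
import numpy as np

def class_metrics(y_true, y_pred):
    """Return total accuracy and {class: (class_accuracy, recall)}.

    Assumes class accuracy = n_correct / n_predicted_as_class and
    recall = n_correct / n_truly_in_class (hypothetical helper).
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    total_acc = float(np.mean(y_true == y_pred))
    per_class = {}
    for c in np.unique(y_true):
        n_correct = np.sum((y_true == c) & (y_pred == c))
        n_pred = np.sum(y_pred == c)   # points predicted as class c
        n_true = np.sum(y_true == c)   # points truly in class c (always > 0 here)
        # Class accuracy is undefined if the class was never predicted.
        acc_c = n_correct / n_pred if n_pred > 0 else float("nan")
        rec_c = n_correct / n_true
        per_class[c.item()] = (float(acc_c), float(rec_c))
    return total_acc, per_class

# Toy labels for illustration only.
y_true = ["a", "a", "a", "b", "b", "c"]
y_pred = ["a", "a", "b", "b", "b", "c"]
total_acc, per_class = class_metrics(y_true, y_pred)
print(total_acc, per_class)
```

In the toy example, one point of class "a" is misclassified as "b", which lowers the recall of "a" and the class-specific accuracy of "b" while leaving "c" unaffected.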
You may use multivariate_normal.pdf from scipy.stats to compute the normal probability density function.
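For reference, a minimal sketch of calling the density function on a batch of points. The mean, covariance, and points here are toy values chosen for illustration, not estimates from the dataset. The logpdf variant is shown as an optional alternative, since working with log densities can help avoid numerical underflow.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Toy parameters (assumption): standard normal in 2 dimensions.
mean = np.zeros(2)
cov = np.eye(2)

# Each row is one point; pdf evaluates the density at every row.
points = np.array([[0.0, 0.0], [1.0, 1.0]])
dens = multivariate_normal.pdf(points, mean=mean, cov=cov)

# Log densities of the same points.
log_dens = multivariate_normal.logpdf(points, mean=mean, cov=cov)

print(dens)
print(log_dens)
```

In a classifier, the same call would be made once per class with that class's estimated mean and covariance.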
Part II: Questions
Submit your solutions to the following questions:
- Chapter 18: Q3
What to submit
- Submit your notebook named as assign5.ipynb.
Policy on Academic Honesty
You are free to discuss how to tackle the assignment, but all coding must be your own. Any use of AI tools must be declared. Any student caught violating the academic honesty principle (e.g., code similarity, or failure to disclose AI tools) will get an automatic F grade in the course and will be referred to the dean of students for disciplinary action.