From Data Mining and Analysis

Main: NumericDataAnalysis

Download the magic04.txt data file. More information on this dataset is available from the UCI ML Repository. The dataset has 10 real-valued attributes, followed by a final column that holds the categorical class label; ignore the class label for this assignment. Assume that attributes are numbered starting from \(0\).
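
As a starting point, the numeric part of the file could be loaded along the following lines. This is a minimal sketch, not a prescribed solution: it assumes the file is comma-separated with the class label in the eleventh column, and the helper name `load_numeric` is hypothetical.

```python
import numpy as np


def load_numeric(path, n_attributes=10):
    """Read the first `n_attributes` columns of a comma-separated file.

    Assumption: values are comma-separated and the class label sits in the
    column right after the numeric attributes, so it is skipped via usecols.
    """
    return np.loadtxt(
        path,
        delimiter=",",
        usecols=range(n_attributes),
        ndmin=2,  # always return a 2-D array, even for a single row
    )
```

Calling `load_numeric("magic04.txt")` should then give an array with one row per data point and one column per attribute, so column index 0 corresponds to Attribute 0.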

Write a script to answer the following questions.

  1. Compute the multivariate mean vector.
  2. Compute the sample covariance matrix as inner products between the columns of the centered data matrix (see Eq. (2.30) in chapter 2).
  3. Compute the sample covariance matrix as a sum of outer products of the centered data points (see Eq. (2.31) in chapter 2).
  4. Compute the correlation between Attributes 1 and 2 by computing the cosine of the angle between the centered attribute vectors. Draw a scatter plot of these two attributes.
  5. Assuming that Attribute 1 is normally distributed, plot its probability density function.
  6. Which attribute has the largest variance, and which attribute has the smallest variance? Print these values.
  7. Which pair of attributes has the largest covariance, and which pair of attributes has the smallest covariance? Print these values.
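
The computations in the questions above might be sketched with NumPy as follows. This is an illustrative outline under stated assumptions rather than a reference solution: the data matrix `D` has one row per point and one column per attribute, the covariance is normalized by \(n\) (NumPy's `np.cov` divides by \(n-1\) unless `bias=True` is passed), "smallest covariance" is read as the minimum (most negative) off-diagonal value, and all function names are hypothetical.

```python
import numpy as np


def mean_vector(D):
    """Question 1: mean of each attribute (column)."""
    return D.mean(axis=0)


def cov_inner(D):
    """Question 2: covariance from inner products of centered columns."""
    Z = D - D.mean(axis=0)          # centered data matrix
    return (Z.T @ Z) / D.shape[0]   # entry (i, j) is Z[:, i] . Z[:, j] / n


def cov_outer(D):
    """Question 3: covariance as an average of outer products of centered points."""
    Z = D - D.mean(axis=0)
    n, d = Z.shape
    S = np.zeros((d, d))
    for z in Z:                     # one outer product per data point
        S += np.outer(z, z)
    return S / n


def correlation_cosine(D, i, j):
    """Question 4: correlation as the cosine between centered attribute vectors."""
    Z = D - D.mean(axis=0)
    zi, zj = Z[:, i], Z[:, j]
    return (zi @ zj) / (np.linalg.norm(zi) * np.linalg.norm(zj))


def normal_pdf(x, mu, var):
    """Question 5: normal density with mean mu and variance var, evaluated at x."""
    return np.exp(-((x - mu) ** 2) / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)


def extreme_variances(S):
    """Question 6: indices of the attributes with largest and smallest variance."""
    v = np.diag(S)
    return int(np.argmax(v)), int(np.argmin(v))


def extreme_covariances(S):
    """Question 7: attribute pairs (i < j) with largest and smallest covariance.

    Assumption: "smallest" means the minimum value, not the smallest magnitude.
    """
    rows, cols = np.triu_indices(S.shape[0], k=1)  # off-diagonal pairs only
    vals = S[rows, cols]
    hi, lo = np.argmax(vals), np.argmin(vals)
    return (int(rows[hi]), int(cols[hi])), (int(rows[lo]), int(cols[lo]))
```

For the plots, the columns `D[:, 1]` and `D[:, 2]` can be passed to any plotting library for the scatter plot, and `normal_pdf` can be evaluated on a grid of values spanning the range of Attribute 1, using that attribute's mean and variance, to draw the density curve.
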
Retrieved from http://www.cs.rpi.edu/~zaki/dataminingbook/pmwiki.php/Main/NumericDataAnalysis
Page last modified on September 06, 2014, at 01:03 PM