# Assign5

## Assign 5: Due Date: 19th Nov, 2017, before midnight

Implement Algorithm 21.1 for dual SVM training based on stochastic gradient ascent (SGA). Use the hinge loss. After computing the $$\alpha$$ values, print the support vectors, i.e., each pair $$i, \alpha_i$$ such that $$\alpha_i > 0$$, and compute the accuracy on the test set.
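As a rough starting point, the training loop might be sketched as below. This is one common form of stochastic gradient ascent on the hinge-loss dual, not the reference solution, so check every step against Algorithm 21.1 in the text. The names `train_dual_svm`, `support_vectors` and the `max_iter` safety cap are assumptions made for this sketch, and it expects a precomputed kernel matrix with a strictly positive diagonal.

```python
import numpy as np

def train_dual_svm(K, y, C, eps, max_iter=10000):
    """Stochastic gradient ascent on the hinge-loss SVM dual (sketch).

    K: (n, n) kernel matrix with positive diagonal; y: (n,) labels in {-1, +1}.
    max_iter is a safety cap added for this sketch only.
    """
    n = len(y)
    eta = 1.0 / np.diag(K)            # step size 1 / K(x_k, x_k) for each point
    alpha = np.zeros(n)
    for _ in range(max_iter):
        alpha_prev = alpha.copy()
        for k in range(n):
            # partial derivative of the dual objective with respect to alpha_k
            grad = 1.0 - y[k] * np.dot(alpha * y, K[:, k])
            # ascent step, then clip to the box constraint 0 <= alpha_k <= C
            alpha[k] = min(max(alpha[k] + eta[k] * grad, 0.0), C)
        # stop once a full pass changes alpha by at most eps
        if np.linalg.norm(alpha - alpha_prev) <= eps:
            break
    return alpha

def support_vectors(alpha):
    """Return (i, alpha_i) pairs for all points with alpha_i > 0."""
    return [(int(i), float(alpha[i])) for i in np.flatnonzero(alpha > 0)]
```

Printing the support vectors then amounts to looping over `support_vectors(alpha)` and printing each pair.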

CSCI6390 Only: In addition, print the weight vector $$\mathbf{w}$$ for the linear and quadratic kernels. For this, you will have to map the points $$\mathbf{x}$$ into feature space and then compute the weight vector.
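One way to approach this, sketched under the assumption that the weight vector is $$\mathbf{w} = \sum_i \alpha_i y_i \phi(\mathbf{x}_i)$$ and that the homogeneous quadratic kernel is $$K(\mathbf{x}, \mathbf{z}) = (\mathbf{x}^T \mathbf{z})^2$$: the feature map then consists of the squared attributes followed by the pairwise products scaled by $$\sqrt{2}$$. The function names `phi_quadratic` and `weight_vector` are hypothetical.

```python
import itertools
import numpy as np

def phi_quadratic(x):
    """Feature map whose dot product equals the kernel (x . z)^2."""
    x = np.asarray(x, dtype=float)
    squares = x ** 2
    # cross terms sqrt(2) * x_i * x_j for every pair i < j
    cross = [np.sqrt(2.0) * x[i] * x[j]
             for i, j in itertools.combinations(range(len(x)), 2)]
    return np.concatenate([squares, cross])

def weight_vector(alpha, y, X, phi=lambda x: np.asarray(x, dtype=float)):
    """Compute w = sum_i alpha_i * y_i * phi(x_i).

    The default phi is the identity map, which corresponds to the linear kernel.
    """
    Phi = np.array([phi(x) for x in X])   # one mapped point per row
    return (np.asarray(alpha) * np.asarray(y)) @ Phi
```

For the quadratic kernel, pass `phi=phi_quadratic`; for the linear kernel, the default identity map suffices.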

## What to turn in

Write a script named assign5.py, which will be run as follows:
assign5.py TRAIN TEST C eps [linear OR quadratic OR gaussian ] spread
where TRAIN and TEST are the training and testing data file names; C is the regularization constant (a real number); eps is the $$\epsilon$$ value for convergence; and linear, quadratic or gaussian denotes the kernel to use. Each data file contains one point per line, with comma-separated attributes and with the last attribute denoting the binary class. You can assume that quadratic means the homogeneous quadratic kernel (see chap 5). If the kernel is gaussian, you must also specify the spread parameter as the last argument on the command line.
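The command line and the three kernels could be handled roughly as follows. This is only a sketch: the helper names `parse_args` and `kernel_matrix` are hypothetical, and the gaussian kernel here assumes that spread denotes the variance $$\sigma^2$$ in $$\exp(-\|\mathbf{x} - \mathbf{z}\|^2 / (2\sigma^2))$$. If spread is meant to be $$\sigma$$, square it first.

```python
import argparse
import numpy as np

def parse_args(argv):
    """Parse: TRAIN TEST C eps {linear,quadratic,gaussian} [spread]."""
    p = argparse.ArgumentParser(prog="assign5.py")
    p.add_argument("train")
    p.add_argument("test")
    p.add_argument("C", type=float)
    p.add_argument("eps", type=float)
    p.add_argument("kernel", choices=["linear", "quadratic", "gaussian"])
    p.add_argument("spread", type=float, nargs="?")   # only needed for gaussian
    args = p.parse_args(argv)
    if args.kernel == "gaussian" and args.spread is None:
        p.error("spread is required for the gaussian kernel")
    return args

def kernel_matrix(A, B, kind, spread=None):
    """Return the matrix of K(a, b) for every row a of A and every row b of B."""
    G = A @ B.T                                   # pairwise dot products
    if kind == "linear":
        return G
    if kind == "quadratic":
        return G ** 2                             # homogeneous quadratic kernel
    if kind == "gaussian":
        # squared distances via ||a||^2 + ||b||^2 - 2 a.b
        sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * G
        return np.exp(-sq / (2.0 * spread))       # ASSUMPTION: spread = sigma^2
    raise ValueError("unknown kernel: " + str(kind))
```

In a script, `parse_args(sys.argv[1:])` would supply the arguments; calling `kernel_matrix(X, X, ...)` gives the training kernel matrix, while `kernel_matrix(X_test, X, ...)` gives the values needed for prediction.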

In both the TRAIN and TEST files, you can assume that each line contains one feature vector, with "," as the separator, and that the last attribute denotes the class, which is either -1 or +1.
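Reading a file in this format can be sketched with `numpy.loadtxt`, assuming every attribute is numeric. The function name `load_dataset` and the label check are additions for illustration.

```python
import numpy as np

def load_dataset(path):
    """Load comma-separated points; the last column is the class label."""
    data = np.loadtxt(path, delimiter=",", ndmin=2)
    X, y = data[:, :-1], data[:, -1]
    # the classes are expected to be -1 or +1
    if not np.all(np.isin(y, (-1, 1))):
        raise ValueError("class labels must be -1 or +1")
    return X, y
```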

Save your output to a text file named assign5.txt. It should contain the output of the print statements. Note that the accuracy is measured on the test set: after learning the $$\alpha_i$$ values, it is the number of correctly classified test points divided by the total number of test points.
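The accuracy computation can be sketched as below, assuming a test point $$\mathbf{z}$$ is classified by the sign of $$\sum_i \alpha_i y_i K(\mathbf{x}_i, \mathbf{z})$$, with any bias term already folded into the kernel values. The names `predict` and `accuracy` are hypothetical, and a score of exactly zero is assigned to class +1 by convention in this sketch.

```python
import numpy as np

def predict(alpha, y_train, K_test_train):
    """Classify test points from kernel values against the training points.

    K_test_train[j, i] = K(z_j, x_i) for test point z_j and training point x_i.
    """
    scores = K_test_train @ (alpha * y_train)
    return np.where(scores >= 0, 1, -1)   # ties go to +1 (sketch convention)

def accuracy(y_true, y_pred):
    """Fraction of test points whose predicted class matches the true class."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))
```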

Try your method on the following files:

assign5.py Attach:iris-virginica.txt 1 0.001 linear

assign5.py Attach:iris-versicolor.txt 1 0.001 linear (this dataset will not yield a good accuracy with linear kernel)

assign5.py Attach:iris-versicolor.txt 1 0.001 quadratic (with quadratic or gaussian we get good accuracy)

Finally, show your results on the following training and testing files: Attach:Concrete_Data_RNorm_Class_train.txt for the training set, and Attach:Concrete_Data_RNorm_Class_test.txt for the testing set. These files are based on a dataset from the UCI repository, but I have normalized the values to lie in the range 0 to 1. Try a "few" different parameter values and report your results for the "best" combination of C and kernel (and spread, if using the gaussian kernel).

Submit the script and the output text file via Submitty.