# Assign3 (Due Date: 22nd Oct, before midnight)

This assignment has two parts, the first is for all sections, whereas the second is for CSCI6390 only.

## Logistic Regression (CSCI4390: 50 points, CSCI6390: 25 points)

You will implement the stochastic gradient ascent (SGA) algorithm for logistic regression, given a training set and a testing set. You may assume that the last column is the class or dependent variable. The fields are all comma separated. The training data is thus a set of $$n$$ points $$\{(\mathbf{x}_i, y_i) \}$$ for $$i=1,2,...,n$$. You will implement the LogisticRegression-SGA algorithm (Alg 2.1) in the lecture notes posted on the Piazza resources page.
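The exact algorithm to follow is Alg 2.1 in the lecture notes; as a rough illustration, a minimal SGA sketch might look like the following (the function names, the random visiting order, and the `max_iter` safeguard are my own choices, not part of the spec):

```python
import numpy as np

def logistic(z):
    """Logistic (sigmoid) function theta(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

def sga_train(X, y, eta=0.01, eps=1e-4, max_iter=1000):
    """Stochastic gradient ascent for logistic regression.

    X is the n x d matrix of training points (assumed here to be
    augmented with a leading column of ones for the bias term);
    y holds the 0/1 class labels. eta is the step size and eps the
    convergence threshold on the change in w per epoch.
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(max_iter):
        w_prev = w.copy()
        for i in np.random.permutation(n):           # visit points in random order
            grad = (y[i] - logistic(w @ X[i])) * X[i]  # gradient of log-likelihood at point i
            w = w + eta * grad                        # ascend the log-likelihood
        if np.linalg.norm(w - w_prev) < eps:          # converged
            break
    return w
```

Consult the notes for the precise update rule and stopping condition used in Alg 2.1; details such as whether convergence is checked per point or per epoch may differ.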

Once the model has been trained, predict the class for the points $$\mathbf{z}_j$$ in the testing set, using the following approach:

$$\hat{y}_j = \begin{cases} 1 & \text{if } \theta(\mathbf{w}^T \mathbf{z}_j) \ge 0.5 \\ 0 & \text{if } \theta(\mathbf{w}^T \mathbf{z}_j) < 0.5 \end{cases}$$

where $$\theta$$ is the logistic function.

After this, compute the accuracy of the predictions using the true class $$y_j$$ for each test point $$\mathbf{z}_j$$. Accuracy is given as $$Acc = \frac{\text{number of cases where } y_j = \hat{y}_j}{\text{number of test points}}$$.
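The prediction and accuracy steps can be sketched as follows (the function name and the assumption that test rows are already augmented with a leading 1 are mine):

```python
import numpy as np

def predict_and_score(w, Z, y_true):
    """Predict 0/1 classes for the test points Z (rows assumed to be
    augmented with a leading 1) and return the fraction classified
    correctly against the true labels y_true."""
    probs = 1.0 / (1.0 + np.exp(-(Z @ w)))   # theta(w^T z_j) for every test row
    y_hat = (probs >= 0.5).astype(int)       # threshold at 0.5
    return y_hat, np.mean(y_hat == y_true)   # accuracy over the test set
```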

## Kernel Ridge Regression (CSCI6390 Only: 25 points)

You will implement kernel ridge regression using linear, quadratic (homogeneous), or Gaussian kernels.
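For reference, the three kernels might be written as below; note that the parameterization of the spread in the Gaussian kernel (here treated as $$\sigma^2$$) is my assumption, so check the lecture notes for the exact definition used in class:

```python
import numpy as np

def linear_kernel(x, z):
    """Linear kernel: x^T z."""
    return x @ z

def quadratic_kernel(x, z):
    """Homogeneous quadratic kernel: (x^T z)^2."""
    return (x @ z) ** 2

def gaussian_kernel(x, z, spread):
    """Gaussian kernel with spread treated as sigma^2 (an assumption):
    exp(-||x - z||^2 / (2 * spread))."""
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * spread))
```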

Recall that the weight vector in feature space is given as $$\mathbf{w} = \sum_{i=1}^n c_i \phi(\mathbf{x}_i)$$

Let $$\mathbf{c} = (c_1, c_2, \cdots, c_n)^T$$. There is a closed-form solution for computing the $$c_i$$ values, i.e., for $$\mathbf{c}$$ when considering all $$n$$ points, given as: $$\mathbf{c} = (\mathbf{K} + \alpha \cdot \mathbf{I})^{-1} \mathbf{y}$$. You can use `np.linalg.inv` to compute the above inverse.
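A minimal sketch of the closed-form step, assuming `kernel(x, z)` is one of the kernel functions and `X` holds the training points row-wise (the function name is mine):

```python
import numpy as np

def fit_kernel_ridge(X, y, kernel, alpha=0.01):
    """Compute the mixture weights c = (K + alpha * I)^{-1} y.

    K is the n x n kernel matrix over the training points, and alpha
    is the ridge regularization constant.
    """
    n = X.shape[0]
    # Build the kernel matrix K with K[i, j] = kernel(x_i, x_j)
    K = np.array([[kernel(X[i], X[j]) for j in range(n)] for i in range(n)])
    # Closed-form solution via the matrix inverse, as suggested in the spec
    c = np.linalg.inv(K + alpha * np.eye(n)) @ y
    return c
```

(`np.linalg.solve` would be the numerically preferred alternative, but the assignment explicitly allows `np.linalg.inv`.)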

After computing $$\mathbf{c}$$, use the testing set to predict the $$\hat{y}_j$$ values for each test point $$\mathbf{z}_j$$, as follows: $$\hat{y}_j = \sum_{i=1}^n c_i K(\mathbf{x}_i, \mathbf{z}_j)$$, where $$K(\cdot,\cdot)$$ is the chosen kernel function.

Next, predict the class as follows:

$$a_j = \begin{cases} 1 & \text{if } \hat{y}_j \ge 0.5 \\ 0 & \text{if } \hat{y}_j < 0.5 \end{cases}$$

Finally, compute the "accuracy" by checking whether the predicted class $$a_j$$ matches the true value $$y_j$$ for each test case $$\mathbf{z}_j$$; we are assuming here that the dependent attribute is a class variable with values $$\{1, 0\}$$, i.e., $$Acc = \frac{\text{number of cases where } y_j = a_j}{\text{number of test points}}$$
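The prediction and scoring steps above can be sketched as one function (the name and signature are mine):

```python
import numpy as np

def kernel_predict(X_train, c, Z, y_true, kernel):
    """Predict y_hat_j = sum_i c_i K(x_i, z_j) for every test point z_j,
    threshold at 0.5 to get class a_j, and report accuracy against the
    true 0/1 labels."""
    y_hat = np.array([sum(c[i] * kernel(x, z) for i, x in enumerate(X_train))
                      for z in Z])
    a = (y_hat >= 0.5).astype(int)   # a_j = 1 iff y_hat_j >= 0.5
    return a, np.mean(a == y_true)   # accuracy over the test set
```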

## What to turn in

Write a script named assign3.py (for Part I) and submit it via Submitty. The script will be run as follows:

```
assign3.py TRAIN TEST eps eta
```

Print out the $$\mathbf{w}$$ value for logistic regression, and print out the accuracy value. Try different values of `eps` and `eta`, and report the best results in your output.

CSCI6390: Write and submit another script named assign3-kernel.py. The spread value is used only for the Gaussian kernel. Use $$\alpha=0.01$$ for the ridge value. Print the accuracy value on the test data.

Show your results on the following files: Concrete_Data_RNorm_Class_train.txt for the training set, and Concrete_Data_RNorm_Class_test.txt for the testing set. Save the results to a text file and submit it as output.

For CSCI6390, report the best accuracy for the Gaussian kernel obtained by trying different spread values. Report accuracy values for the linear and quadratic kernels in the output file as well.

You can develop your scripts using a smaller dataset: iris-virginica. On this dataset the accuracy should be very high.

CSCI6390: For kernel ridge regression, you can evaluate your algorithms on the following dataset: iris-versicolor. The accuracy for the linear kernel will not be good, but it will be very good for the quadratic kernel, and even better for the Gaussian kernel.