
Assign3 (Due Date: 19th Oct, before midnight)

This assignment has two parts: the first is for all sections, whereas the second is for CSCI6390 only.

Logistic Regression

You will implement the stochastic gradient descent (SGD) algorithm for logistic regression, given a training set and a testing set. You may assume that the last column is the class or dependent variable, and that the fields are all comma separated. The training data is thus a set of \(n\) points \(\{(\mathbf{x}_i, y_i)\}\) for \(i=1,2,\ldots,n\). Alternatively, we can denote all the points as the \(n \times d\) matrix \(\mathbf{X}\), and the dependent variable as the \(n \times 1\) vector \(\mathbf{y}\).
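Since the files are plain comma-separated text with the class in the last column, loading them is straightforward with NumPy; a minimal sketch (the file name is a placeholder):

    import numpy as np

    data = np.loadtxt('TRAIN', delimiter=',')   # comma-separated fields
    X, y = data[:, :-1], data[:, -1]            # last column is the class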

The gradient of the weight vector \(\mathbf{w}\) at a training point \(\mathbf{x}_k\) is given as $$ \nabla_k = f(-y_k \mathbf{w}^T \mathbf{x}_k)\; y_k\; \mathbf{x}_k $$ where \(f(z)\) is the logistic function, given as $$f(z) = \frac{1}{1+\exp(-z)}$$
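For concreteness, the logistic function and the per-point gradient can be written directly in NumPy; a minimal sketch (the names logistic and gradient are just illustrative):

    import numpy as np

    def logistic(z):
        # f(z) = 1 / (1 + exp(-z))
        return 1.0 / (1.0 + np.exp(-z))

    def gradient(w, x_k, y_k):
        # gradient at point x_k: f(-y_k * w^T x_k) * y_k * x_k
        return logistic(-y_k * np.dot(w, x_k)) * y_k * x_k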

The SGD algorithm is therefore as follows:

Input: \(\mathbf{X}, \mathbf{y}, \epsilon, \eta\)
\(\mathbf{X} = (\mathbf{X}, \mathbf{1})\) // map points to one higher dimension by adding a column of 1s
\(d = d+1\)
\(\mathbf{w} =\) random \(d\)-dimensional vector with entries in the range \([0,1]\)
repeat
  \(\mathbf{w}_{prev} = \mathbf{w}\)
  for \(k=1,2,\ldots,n\) in random order do
    \(\mathbf{w} = \mathbf{w} + \eta \cdot \nabla_k\) // \(\nabla_k\) is computed using the current \(\mathbf{w}\)
until \( \|\mathbf{w} - \mathbf{w}_{prev}\| \le \epsilon\)
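A direct NumPy rendering of this pseudocode might look as follows; this is a sketch, assuming the logistic and gradient helpers from the sketch above, with X and y as NumPy arrays:

    import numpy as np

    def sgd_logistic(X, y, eps, eta):
        n, d = X.shape
        X = np.hstack([X, np.ones((n, 1))])      # add the column of 1s
        w = np.random.uniform(0.0, 1.0, d + 1)   # random start in [0,1]
        while True:
            w_prev = w.copy()
            for k in np.random.permutation(n):   # visit points in random order
                w = w + eta * gradient(w, X[k], y[k])
            if np.linalg.norm(w - w_prev) <= eps:
                return w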

Next, predict the class for each point \(\mathbf{z}_j\) in the testing set, using the following approach (remember to also map each test point one dimension higher by appending a 1):

\(\hat{y}_j = +1\), if \(f(\mathbf{w}^T \mathbf{z}_j) \ge 0.5\) or
\(\hat{y}_j = -1\), if \(f(\mathbf{w}^T \mathbf{z}_j) < 0.5\).

After this, compute the accuracy of the predictions using the true class \(y_j\) for each test point \(\mathbf{z}_j\). Accuracy is given as \( Acc = \frac{\text{number of cases where } y_j = \hat{y}_j}{n} \), where \(n\) here is the number of test points.
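Both steps are easy to vectorize; one possible sketch, assuming Z holds the augmented test points and y_test the true classes:

    y_hat = np.where(logistic(Z @ w) >= 0.5, 1, -1)   # predict +1 or -1
    acc = np.mean(y_test == y_hat)                    # fraction of correct predictions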

Kernel Ridge Regression (CSCI6390 Only)

You will implement kernel ridge regression using linear, quadratic (homogeneous), or Gaussian kernels.

Recall that the weight vector in feature space is given as \(\mathbf{w} = \sum_{i=1}^n c_i \phi(\mathbf{x}_i) \)

Let \(\mathbf{c} = (c_1, c_2, \cdots, c_n)^T \). There is a closed form solution for computing the \(c_i\) values, i.e., for \(\mathbf{c}\) when considering all \(n\) points, given as: $$\mathbf{c} = (\mathbf{K} + \alpha \cdot \mathbf{I})^{-1} \mathbf{y}$$ You can use np.linalg.inv to compute the above inverse.
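A possible sketch of the three kernels and the closed-form solution, assuming X_train and y_train are NumPy arrays, name is one of the three kernel names, and spread denotes \(\sigma^2\) for the Gaussian kernel:

    import numpy as np

    def kernel(A, B, name, spread=1.0):
        if name == 'linear':
            return A @ B.T                    # K(x, z) = x^T z
        if name == 'quadratic':
            return (A @ B.T) ** 2             # homogeneous quadratic: (x^T z)^2
        # Gaussian: K(x, z) = exp(-||x - z||^2 / (2 * spread))
        sq = (np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :]
              - 2.0 * (A @ B.T))
        return np.exp(-sq / (2.0 * spread))

    alpha = 0.01
    K = kernel(X_train, X_train, name, spread)                  # n x n kernel matrix
    c = np.linalg.inv(K + alpha * np.eye(K.shape[0])) @ y_train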

After computing \(\mathbf{c}\), use the testing set to predict the \(\hat{y}_j\) values for each test point \(\mathbf{z}_j\), as follows: $$ \hat{y}_j = \sum_{i=1}^n c_i\, K(\mathbf{x}_i, \mathbf{z}_j) $$

Finally, compute the "accuracy" by checking whether \(\operatorname{sign}(\hat{y}_j)\) matches the true value \(y_j\) for each test case \(\mathbf{z}_j\); we are assuming here that the dependent attribute is a class variable with values \(\{+1, -1\}\), i.e., \( Acc = \frac{\text{number of cases where } y_j = \operatorname{sign}(\hat{y}_j)}{n} \), where \(n\) again denotes the number of test points.
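The prediction and accuracy steps then reduce to a matrix-vector product; a sketch continuing from the code above, with X_test and y_test assumed loaded the same way as the training data:

    K_test = kernel(X_test, X_train, name, spread)   # entry (j, i) is K(x_i, z_j)
    y_hat = K_test @ c                               # y_hat_j = sum_i c_i K(x_i, z_j)
    acc = np.mean(np.sign(y_hat) == y_test)          # fraction with matching sign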

What to turn in

Write a script named assign3.py (for part I) and submit via submitty. The script will be run as follows:
assign3.py TRAIN TEST eps eta

Print out the \(\mathbf{w}\) value for logistic regression, and print out the accuracy value. Try different values of eps and eta and report the best results in your output.

CSCI6390: Write and submit another script named assign3-kernel.py, which will be run as follows:
assign3-kernel.py TRAIN TEST [linear | quadratic | gaussian] [spread]
The spread value is used only for the Gaussian kernel. Use \(\alpha=0.01\) for the ridge value. Print the accuracy value on the test data.
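Reading these arguments takes only a few lines; a minimal sketch using sys.argv (the default spread for the non-Gaussian kernels is an assumption, since it is unused there):

    import sys

    # assign3-kernel.py TRAIN TEST [linear | quadratic | gaussian] [spread]
    train_file, test_file, name = sys.argv[1], sys.argv[2], sys.argv[3]
    spread = float(sys.argv[4]) if name == 'gaussian' else 1.0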

Show your results on the following training and testing files: Concrete_Data_RNorm_Class_train.txt for the training set, and Concrete_Data_RNorm_Class_test.txt for the testing set. Save the results to a text file and submit it as output.

For CSCI6390: for the Gaussian kernel, report the best accuracy obtained by trying different spread values. Report the accuracy values for the linear and quadratic kernels as well in the output file.


You can develop your scripts using a smaller dataset: iris-virginica. On this dataset the accuracy should be very high.

CSCI6390: For kernel ridge regression, you can evaluate your algorithm on the following dataset: iris-versicolor. The accuracy for the linear kernel will not be good, but it will be very good for the quadratic kernel, and even better for the Gaussian kernel.