## Assign 4

**Due Date: 2nd Nov, 2017, before midnight**

The goal of this assignment is to implement a simplified version of the Sequential Minimal Optimization (SMO) algorithm by John Platt to train SVMs in the dual formulation. The SVM maximization problem is as follows: $$\max J(\mathbf{\alpha}) = \sum_{i=1}^n \alpha_i - \frac{1}{2} \sum_{i=1}^n\sum_{j=1}^n \alpha_i\alpha_j y_i y_j K(\mathbf{x}_i, \mathbf{x}_j)$$ subject to the constraints \(\sum_{i=1}^n \alpha_i y_i = 0\) and \(0 \le \alpha_i \le C\) for \(i=1,2,...,n\).

The SMO algorithm solves the optimization problem for two values/points at a time, say \(\alpha_i\) and \(\alpha_j\), keeping the other values \(\alpha_k\) unchanged. This means it must maintain the following constraint before and after the update: $$\alpha_i y_i + \alpha_j y_j = const = \alpha'_i y_i + \alpha'_j y_j$$ where \(\alpha_i\) denotes the new value and \(\alpha'_i\) the old value.

Assume that we update \(\alpha_j\) first. Then, since the above invariant has to be maintained, we get the following bounds on the value of \(\alpha_j\), so that \(L \le \alpha_j \le H\):

**case 1: \(y_i \ne y_j\)**
$$L = \max(0, \alpha'_j - \alpha'_i)$$
$$H = \min(C, C - \alpha'_i + \alpha'_j)$$

**case 2: \(y_i = y_j\)**
$$L = \max(0, \alpha'_i + \alpha'_j - C)$$
$$H = \min(C, \alpha'_i + \alpha'_j)$$

The update rule for \(\alpha_j\) is given as $$\alpha_j = \alpha'_j + \frac{y_j (E_i - E_j)}{\kappa_{ij}}$$ where $$\kappa_{ij} = K(\mathbf{x}_i, \mathbf{x}_i) + K(\mathbf{x}_j, \mathbf{x}_j) - 2K(\mathbf{x}_i, \mathbf{x}_j)$$ is the squared distance between the two points in feature space, and $$E_k = h(\mathbf{x}_k) - y_k = \Bigl(\sum_{j=1}^n \alpha_j y_j K(\mathbf{x}_j, \mathbf{x}_k) + b \Bigr) - y_k$$ is the difference between the predicted value and the true class for point \(\mathbf{x}_k\).

Once we have updated \(\alpha_j\) using the above equation, we have to clip its value to the interval \([L,H]\), and then we can update the value of \(\alpha_i\) as follows: $$ \alpha_i = \alpha'_i + y_i y_j (\alpha'_j - \alpha_j) $$ Note that even though \(E_k\) depends on \(b\), when we compute \(E_i - E_j\) above the \(b\) cancels out, so you do not need \(b\) when updating the \(\alpha\)'s.

So the basic SMO algorithm is to iterate over the points in the dataset for the choice of \(j\), and to then select random points \(i\), to create a possible pair of values \((i,j)\) to update. After one such round, we compare the new value of \(\mathbf{\alpha}\) to the previous set of values \(\mathbf{\alpha}'\), stopping when the distance between the two falls below some threshold \(\epsilon\).

For computing \(b\), once you have found the \(\alpha\)'s, use Eq. (21.33) in the book. For computing the accuracy, use Eq. (22.2) in the book.

Also, when choosing the points, we want to make sure that their \(\alpha\) value is not already close to the limits, i.e., we want to make sure that \(\alpha > 0\) and \(\alpha < C\). To tackle small precision issues, we use a \(tol = 10^{-5}\) value, and we make sure that \(\alpha \ge tol\) and \(\alpha \le C - tol\). Finally, we use a flag tryall, initially set to true, so that in the very first pass all pairs are tried; after that pass tryall is set to false and the above checks on \(\alpha\) are applied. The complete algorithm in pseudo-code is given as follows:
repeat
\(\vec{\alpha}_{prev} = \vec{\alpha}\)
for \(j=1,2,...,n\) do
if tryall = False and ( \(\alpha_j - tol < 0 \text{ or } \alpha_j + tol > C\)) then
skip to next \(j\)
for \(i=1,2,...,n\) in random order such that \(i \ne j\) do
if tryall = False and ( \(\alpha_i - tol < 0 \text{ or } \alpha_i + tol > C\)) then
skip to next \(i\)
compute \(\kappa_{ij}\) based on kernel type (linear or quadratic)
if \(\kappa_{ij} = 0\) skip to next \(i\)
\(\alpha'_j = \alpha[j]\) and \(\alpha'_i = \alpha[i]\)
compute \(L\) and \(H\) based on the two cases
if \(L = H\) skip to next \(i\)
compute \(E_i\) and \(E_j\)
\(\alpha[j] = \alpha'_j + \frac{y_j (E_i - E_j)}{\kappa_{ij}}\)
if \(\alpha[j] < L\) then \(\alpha[j] = L\)
else if \(\alpha[j] > H\) then \(\alpha[j] = H\)
\(\alpha[i] = \alpha'_i + y_i y_j (\alpha'_j - \alpha[j])\)
end for
end for
if tryall then tryall = False
until \(\|\vec{\alpha} - \vec{\alpha}_{prev}\| \le \epsilon\)
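As a concrete reference, the loop above might be sketched in Python as follows. This is a minimal illustration, assuming a precomputed kernel matrix `K` and labels `y` in \(\{-1,+1\}\); the function and argument names (`smo_train`, `max_passes`, etc.) are illustrative, not a required interface:

```python
import numpy as np

def smo_train(K, y, C, eps, tol=1e-5, max_passes=100, rng=None):
    """Simplified SMO loop following the pseudo-code above (a sketch).

    K : precomputed n x n kernel matrix
    y : labels in {-1, +1}
    Stops when the distance between successive alpha vectors is <= eps.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(y)
    alpha = np.zeros(n)
    tryall = True
    for _ in range(max_passes):
        alpha_prev = alpha.copy()
        for j in range(n):
            if not tryall and (alpha[j] < tol or alpha[j] > C - tol):
                continue  # alpha_j already (numerically) at a box limit
            for i in rng.permutation(n):
                if i == j:
                    continue
                if not tryall and (alpha[i] < tol or alpha[i] > C - tol):
                    continue
                kappa = K[i, i] + K[j, j] - 2.0 * K[i, j]
                if kappa == 0:
                    continue
                aj_old, ai_old = alpha[j], alpha[i]
                if y[i] != y[j]:                       # case 1
                    L = max(0.0, aj_old - ai_old)
                    H = min(C, C - ai_old + aj_old)
                else:                                  # case 2
                    L = max(0.0, ai_old + aj_old - C)
                    H = min(C, ai_old + aj_old)
                if L == H:
                    continue
                # E_k = h(x_k) - y_k; b cancels in E_i - E_j, so it is omitted
                Ei = (alpha * y) @ K[:, i] - y[i]
                Ej = (alpha * y) @ K[:, j] - y[j]
                alpha[j] = np.clip(aj_old + y[j] * (Ei - Ej) / kappa, L, H)
                alpha[i] = ai_old + y[i] * y[j] * (aj_old - alpha[j])
        if tryall:
            tryall = False
        if np.linalg.norm(alpha - alpha_prev) <= eps:
            break
    return alpha
```

Note that each pair update preserves the equality constraint \(\sum_i \alpha_i y_i = 0\), since \(\alpha_i y_i + \alpha_j y_j\) is held constant and all other \(\alpha_k\) are untouched.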
## What to turn in

Write a script named assign4.py that takes as input the data FILENAME, the value of \(C\), the convergence threshold \(\epsilon\), and the kernel type. For the FILENAME dataset, you can assume that each line contains one feature vector, with "," as the separator, and that the last feature/attribute denotes the class, which can be -1 or +1. Save your output to a text file.

Try your method on the following files:

assign4.py Attach:iris-virginica.txt 1 0.001 linear

assign4.py Attach:iris-versicolor.txt 1 0.001 linear (this dataset will not yield a good accuracy with the linear kernel)

assign4.py Attach:iris-versicolor.txt 1 0.001 quadratic (with the quadratic or gaussian kernel we get good accuracy)

Finally, show your results on the following training and testing files: Attach:Concrete_Data_RNorm_Class_train.txt for the training set, and Attach:Concrete_Data_RNorm_Class_test.txt for the testing set. This is based on the same dataset as that at the UCI repository, but I have normalized the values to lie in the range 0 to 1. Try a few different values and report your results on the best combination of \(C\) and kernel (and spread, if using gaussian). Submit the script and output txt file via submitty.

Page last modified on October 30, 2017, at 11:34 PM