# Dmcourse: Assign5

## Assignment 5 (Neural Networks): Due Date: Mon 20th Nov, Before Midnight

In this assignment you will implement the backpropagation algorithm to train a three-layer neural network.

Given a comma-separated data file whose last column denotes the class, you have to train a neural network. The size of the input layer $$N_i$$ is determined by the number of attributes (excluding the class), and the size of the output layer $$N_o$$ is determined by the number of classes. The size of the hidden layer $$N_h$$ is read from the command line. After training, compute the accuracy of the classifier on the testing set. Note that if there are $$k$$ classes, they should be coded as 1-of-k vectors, e.g., if there are three classes you should code them as $$\{1,0,0\}, \{0,1,0\}, \{0,0,1\}$$. Thus the classes are converted into binary vectors, i.e., a string/scalar value $$y_i$$ is converted into a vector $$\mathbf{y}_i$$.
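As a minimal sketch of the data format and the 1-of-k coding described above: the helper names `load_points` and `one_hot_encode` are hypothetical, and ordering the classes by sorted label is an arbitrary assumption, not something the assignment prescribes.

```python
import csv
import io


def load_points(handle):
    """Read comma-separated rows; the last column is the class label."""
    X, labels = [], []
    for row in csv.reader(handle):
        if not row:  # skip blank lines
            continue
        X.append([float(v) for v in row[:-1]])
        labels.append(row[-1].strip())
    return X, labels


def one_hot_encode(labels, classes=None):
    """Map each label to a 1-of-k vector (hypothetical helper).

    If `classes` is None, the class list is built from `labels` in sorted
    order (an assumed, reproducible ordering). Pass the training classes
    when encoding the testing set so that both use the same coding.
    """
    if classes is None:
        classes = sorted(set(labels))
    index = {c: j for j, c in enumerate(classes)}
    vectors = []
    for label in labels:
        vec = [0] * len(classes)
        vec[index[label]] = 1
        vectors.append(vec)
    return classes, vectors


# Tiny in-memory example instead of a real TRAIN file.
sample = io.StringIO("5.1,3.5,a\n6.2,2.9,c\n5.9,3.0,b\n")
X, labels = load_points(sample)
classes, Y = one_hot_encode(labels)
print(len(X[0]), len(classes))  # N_i = 2 and N_o = 3 for this sample
print(Y)  # [[1, 0, 0], [0, 0, 1], [0, 1, 0]]
```

Reusing the class list from the training set when encoding the testing set keeps the two codings aligned.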

The pseudo-code for feedforward and backpropagation based training is given below.

Input: Dataset: $$\{\mathbf{x}_i, \mathbf{y}_i\}_{i=1}^n, N_h$$, $$\eta$$, epochs
Output: weights (input to hidden, hidden to output), accuracy

for $$e = 0,..., epochs-1$$

for each point $$\mathbf{x}_i$$ in random order do
$$\hat{\mathbf{y}}_i$$ = feedforward$$(\mathbf{x}_i)$$
$$E = \frac{1}{2} \| \hat{\mathbf{y}}_i - \mathbf{y}_i \|^2$$
if $$E > 0$$ do
backpropagation$$(\mathbf{x}_i, \mathbf{y}_i, \hat{\mathbf{y}}_i)$$

print the weight matrices (input-to-hidden and hidden-to-output)
compute accuracy on testing set
print accuracy
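The pseudo-code does not spell out how a predicted class is read off the output vector. One common choice, assumed here, is to take the output neuron with the largest value as the prediction and compare it with the position of the 1 in the 1-of-k target. The function name `accuracy` and the use of NumPy are assumptions of this sketch.

```python
import numpy as np


def accuracy(outputs, targets):
    """Fraction of points whose largest output matches the true class.

    outputs: (n, N_o) array of network outputs
    targets: (n, N_o) array of 1-of-k target vectors
    """
    predicted = np.argmax(outputs, axis=1)  # assumed decision rule
    actual = np.argmax(targets, axis=1)
    return float(np.mean(predicted == actual))


outputs = np.array([[0.9, 0.2, 0.1],
                    [0.3, 0.4, 0.6],
                    [0.2, 0.7, 0.1],
                    [0.6, 0.5, 0.1]])
targets = np.array([[1, 0, 0],
                    [0, 0, 1],
                    [0, 1, 0],
                    [0, 1, 0]])
print(accuracy(outputs, targets))  # 0.75 (the last point is misclassified)
```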

You must implement the feedforward and backpropagation methods. For initializing the synapse weights, set them randomly between -0.1 and +0.1.
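A possible way to draw the initial weights, sketched with NumPy. The function name `init_weights`, the `seed` parameter, and the layout (one extra last row per matrix for the bias weights) are assumptions of this sketch rather than requirements.

```python
import numpy as np


def init_weights(n_in, n_hidden, n_out, seed=None):
    """Draw weights uniformly from [-0.1, 0.1).

    Each matrix has one extra row (the last one) holding the bias weights,
    an assumed layout that matches a bias neuron fixed at 1.
    """
    rng = np.random.default_rng(seed)
    W_ih = rng.uniform(-0.1, 0.1, size=(n_in + 1, n_hidden))   # input -> hidden
    W_ho = rng.uniform(-0.1, 0.1, size=(n_hidden + 1, n_out))  # hidden -> output
    return W_ih, W_ho


W_ih, W_ho = init_weights(n_in=4, n_hidden=5, n_out=3, seed=0)
print(W_ih.shape, W_ho.shape)  # (5, 5) (6, 3)
```

Fixing the seed makes runs repeatable, which helps when comparing different $$N_h$$ and $$\eta$$ settings.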

Use the feedforward step to compute the $$net_j$$ value at each unit, then apply the logistic sigmoid function to obtain the output value $$o_j$$ of each hidden and output neuron. The output of an input layer neuron is simply its input value $$x_i$$. Make sure that you also add an extra input neuron for the bias terms of the hidden layer, and an extra hidden neuron for the bias terms of the output layer.
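The feedforward step can be sketched as follows for a single point. This is one possible layout, assuming the bias neuron is a constant 1 appended to each layer's activations and its weights are the last row of each matrix; the names `sigmoid`, `feedforward`, `W_ih` and `W_ho` are illustrative.

```python
import numpy as np


def sigmoid(z):
    """Logistic sigmoid, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))


def feedforward(x, W_ih, W_ho):
    """Forward pass for one point x of length N_i.

    W_ih has shape (N_i + 1, N_h) and W_ho has shape (N_h + 1, N_o);
    the last row of each is assumed to hold the bias weights.
    Returns the bias-extended input, the bias-extended hidden outputs,
    and the output-layer values.
    """
    x_b = np.append(x, 1.0)        # input neurons plus bias neuron
    o_h = sigmoid(x_b @ W_ih)      # net_j, then o_j, at the hidden layer
    o_hb = np.append(o_h, 1.0)     # hidden neurons plus bias neuron
    o_out = sigmoid(o_hb @ W_ho)   # net_j, then o_j, at the output layer
    return x_b, o_hb, o_out


# With all-zero weights every net value is 0, so each output is sigmoid(0).
x = np.array([1.0, 2.0])
x_b, o_hb, o_out = feedforward(x, np.zeros((3, 2)), np.zeros((3, 3)))
print(o_out)  # every entry is 0.5
```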

For backpropagation, compute the $$\delta_j$$ value at each output and hidden layer neuron. The weight $$w_{ij}$$ between neuron i and neuron j (for either the input-hidden or the hidden-output layer) can then be updated as follows: $$w_{ij}^{new} = w_{ij} - \eta \cdot \nabla_{w_{ij}}$$, where $$\nabla_{w_{ij}} = o_i \cdot \delta_j$$. Once the new weights have been computed, go back to the feedforward step, as shown in the main algorithm.

To carry out the backpropagation step, first compute all the $$\delta_j$$ values, and only then update the weights $$w_{ij}$$. For the output layer, we have $$\delta_j = (o_j - t_j) \cdot o_j \cdot (1 - o_j)$$, where $$t_j$$ is the true output and $$o_j$$ is the output of the j-th output neuron. For the hidden layer, we have $$\delta_j = o_j \cdot (1 - o_j) \cdot \sum_{k=1}^{N_o} \delta_k w_{jk}$$, where $$o_j$$ is the output of the j-th hidden neuron, $$w_{jk}$$ is the weight from hidden neuron j to output neuron k, and the $$\delta_k$$ are the values computed above for the $$N_o$$ output neurons.
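The two $$\delta$$ formulas and the weight update can be written with matrix operations roughly as below. This is a sketch under assumptions: activations carry a trailing bias value of 1, the last row of each weight matrix holds the bias weights, and the function and argument names are illustrative. Both sets of $$\delta$$ values are computed before either matrix is changed.

```python
import numpy as np


def backpropagation(x_b, o_hb, o_out, t, W_ih, W_ho, eta):
    """One gradient step for a single point; updates W_ih and W_ho in place.

    x_b:   input values with a trailing bias 1, shape (N_i + 1,)
    o_hb:  hidden outputs with a trailing bias 1, shape (N_h + 1,)
    o_out: output-layer values, shape (N_o,)
    t:     1-of-k target vector, shape (N_o,)
    """
    # Output layer: delta_j = (o_j - t_j) * o_j * (1 - o_j)
    delta_out = (o_out - t) * o_out * (1.0 - o_out)

    # Hidden layer: delta_j = o_j * (1 - o_j) * sum_k delta_k * w_jk.
    # The bias neuron has no incoming weights, so its row is excluded.
    o_h = o_hb[:-1]
    delta_h = o_h * (1.0 - o_h) * (W_ho[:-1] @ delta_out)

    # w_ij <- w_ij - eta * o_i * delta_j, for every pair (i, j) at once.
    W_ho -= eta * np.outer(o_hb, delta_out)
    W_ih -= eta * np.outer(x_b, delta_h)


# Hand-checkable example: one input, one hidden neuron, one output.
W_ih = np.zeros((2, 1))
W_ho = np.zeros((2, 1))
backpropagation(
    x_b=np.array([1.0, 1.0]),
    o_hb=np.array([0.5, 1.0]),
    o_out=np.array([0.5]),
    t=np.array([1.0]),
    W_ih=W_ih, W_ho=W_ho, eta=1.0,
)
print(W_ho)  # delta_out = -0.125, so the weights move up by 0.0625 and 0.125
print(W_ih)  # unchanged: the hidden deltas are 0 because W_ho was all zeros
```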

## CSCI6390 Only:

In addition to using sigmoid activation for the hidden layer, you must implement the case where ReLU activation is used at the hidden layer only: $$ReLU(net_j) = \max \{0, net_j\}$$. The output layer will remain sigmoid.

So in the feedforward phase, you should use ReLU for hidden layer and sigmoid for output layer.

During backpropagation, the derivative of ReLU is either 0 or 1, depending on the $$net_j$$ value, so that at the hidden layer we have $$\delta_j = deriv_j \cdot \sum_{k=1}^{N_o} \delta_k w_{jk}$$, where $$deriv_j = 1 \text{ if } net_j > 0, \text{ and } 0 \text{ otherwise}$$.
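A small sketch of the ReLU pieces, assuming the hidden-to-output matrix stores the bias weights in its last row (so that row is skipped when propagating the output deltas back). The names `relu` and `relu_hidden_delta` are hypothetical.

```python
import numpy as np


def relu(net):
    """ReLU(net) = max{0, net}, applied element-wise."""
    return np.maximum(0.0, net)


def relu_hidden_delta(net_h, delta_out, W_ho):
    """Hidden-layer deltas when the hidden activation is ReLU.

    net_h:     net values of the N_h hidden neurons (before activation)
    delta_out: deltas of the N_o sigmoid output neurons
    W_ho:      shape (N_h + 1, N_o); last row assumed to be the bias weights
    """
    deriv = (net_h > 0).astype(float)        # 1 if net_j > 0, else 0
    return deriv * (W_ho[:-1] @ delta_out)   # deriv_j * sum_k delta_k * w_jk


net_h = np.array([-1.0, 2.0])
W_ho = np.array([[1.0], [2.0], [3.0]])       # two hidden neurons plus bias
print(relu(net_h))                           # the negative net value is clipped to 0
print(relu_hidden_delta(net_h, np.array([0.5]), W_ho))
```

Note that the hidden `net` values must be kept from the feedforward pass, since the ReLU derivative depends on them rather than on the outputs alone.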

## What to turn in

Write a script named assign5.py that will be run as follows:
assign5.py TRAIN TEST $$N_h$$ $$\eta$$ epochs [sigmoid|relu]
where TRAIN/TEST are the training and testing files, which contain one point per line, with comma-separated attributes and with the last attribute denoting the class (do NOT assume that there are only 2 classes); $$N_h$$ is the size of the hidden layer, $$\eta$$ is the learning step size, and epochs is the number of passes through the entire dataset. Note that the last option is to be used only by CSCI6390; it is either "sigmoid" or "relu", denoting which activation function to use for the hidden layer.
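One way to read these arguments is with the standard `argparse` module. The attribute names (`n_hidden`, `eta`, and so on) are choices made for this sketch, and treating a missing last argument as "sigmoid" is an assumption.

```python
import argparse


def parse_args(argv=None):
    """Parse: TRAIN TEST N_h eta epochs [sigmoid|relu]."""
    parser = argparse.ArgumentParser(
        description="Train a three-layer neural network with backpropagation."
    )
    parser.add_argument("train", help="training file (comma-separated)")
    parser.add_argument("test", help="testing file (comma-separated)")
    parser.add_argument("n_hidden", type=int, help="hidden layer size, excluding bias")
    parser.add_argument("eta", type=float, help="learning step size")
    parser.add_argument("epochs", type=int, help="passes through the dataset")
    # Optional last argument; defaulting to sigmoid is an assumption.
    parser.add_argument("activation", nargs="?", default="sigmoid",
                        choices=["sigmoid", "relu"],
                        help="hidden-layer activation")
    return parser.parse_args(argv)


# Explicit argument list for illustration; a script would call parse_args().
args = parse_args(["train.txt", "test.txt", "10", "0.1", "100", "relu"])
print(args.n_hidden, args.eta, args.epochs, args.activation)  # 10 0.1 100 relu
```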

Also, $$N_h$$ does not include the bias; you should add an extra bias hidden neuron (for the output layer neurons), and you should also include an extra bias input neuron (for the hidden layer neurons).

Save your output to a text file assign5.txt. It should contain the output of the weights (and biases), and the accuracy value.

Try different $$N_h$$ and $$\eta$$ values, and report the best accuracy obtained on the shuttle dataset, which contains 7 classes: shuttle.trn.txt is the training set and shuttle.tst.txt is the testing set. Note that training can take a while to converge, so you should try to use matrix operations to speed up the computations. You can test your code on the smaller iris-numeric.txt dataset.