Assignment 4 (Neural Networks): Due Date: Thurs 1st Nov, Before Midnight

In this assignment you will implement the MLP training Algorithm 3.1 on pg 63 in the (updated) regression chapters posted on the Piazza class resources. Note that Alg 3.1 uses sigmoid activations by default, which is fine for CSCI4390, but the students in CSCI6390 will have to modify the computation of the net gradient vectors (see below).

Given a comma-separated datafile, whose last column denotes the class, you have to train a neural network. First, the size of the input layer \(d\) is determined by the number of attributes (other than the class), and the size of the output layer \(p\) is determined by the number of classes. You should input the size of the hidden layer \(m\) from the command line. Your goal is to train a three-layer neural network, with a single hidden layer, as described in Alg 3.1.
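As a rough guide to the structure of Alg 3.1, one stochastic gradient step for a single point might look like the following sketch (assumed Python/NumPy, using the book's column conventions with \(W_h\) of shape \(d \times m\) and \(W_o\) of shape \(m \times p\); sigmoid activations and SSE error as in the default algorithm):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(x, y, Wh, bh, Wo, bo, eta):
    """One stochastic gradient step for a single (x, y) pair,
    following the structure of Alg 3.1 (sigmoid + SSE)."""
    # Feed-forward phase
    zh = sigmoid(bh + Wh.T @ x)      # hidden layer output, shape (m,)
    o = sigmoid(bo + Wo.T @ zh)      # output layer output, shape (p,)
    # Backpropagation phase: net gradient vectors
    do = o * (1 - o) * (o - y)       # delta^o for sigmoid activation + SSE
    dh = zh * (1 - zh) * (Wo @ do)   # delta^h backpropagated to hidden layer
    # Gradient descent updates for weights and biases
    Wo -= eta * np.outer(zh, do)
    bo -= eta * do
    Wh -= eta * np.outer(x, dh)
    bh -= eta * dh
    return Wh, bh, Wo, bo
```

The full algorithm wraps this step in a loop over epochs and over the (shuffled) training points; consult Alg 3.1 for the authoritative details.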

After training, you should compute the accuracy of the classifier on the testing set, i.e., the number of correct predictions divided by the total number of points. Note that if there are \(k\) classes, they should be coded as one-hot vectors; e.g., if there are three classes you should code them as \(\{1,0,0\}, \{0,1,0\}, \{0,0,1\} \).
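One-hot coding can be done with a small helper like the following (an assumed-Python sketch; the function name and the choice of sorting the distinct class labels are illustrative, not prescribed by the assignment):

```python
import numpy as np

def one_hot_encode(labels):
    """Map class labels to one-hot row vectors.
    Classes are ordered by sorting the distinct label values."""
    classes = sorted(set(labels))
    index = {c: i for i, c in enumerate(classes)}
    Y = np.zeros((len(labels), len(classes)))
    for i, c in enumerate(labels):
        Y[i, index[c]] = 1.0
    return Y, classes
```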

CSCI4390 Only: For the activation function use sigmoid for both the hidden and output layer. This means you implement Alg 3.1 as given. The assumption is that you are using SSE error.
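For the CSCI4390 variant, the only activation needed is the sigmoid; its derivative \(f'(z) = f(z)(1-f(z))\) is what appears in the net gradient computations of Alg 3.1. A minimal sketch (assumed Python/NumPy):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_deriv(z):
    """f'(z) = f(z) * (1 - f(z)); this factor multiplies the error
    term in the net gradients for both layers under sigmoid + SSE."""
    s = sigmoid(z)
    return s * (1.0 - s)
```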

CSCI6390 Only: Use ReLU activation for the hidden layer, and softmax for the output layer. This means you have to modify how you compute the net gradient at the output and the hidden layers. For the hidden layer, to compute \(\mathbf{\delta}^h\) you have to use the derivative of the ReLU function as given in Sec 3.6.1. For the output layer, to compute \(\mathbf{\delta}^o\) you have to use the equation \(\partial \mathbf{F}^{h+1} \odot \partial {\cal E}\) as given in Eq 3.30 in Sec 3.6.2, subsection on Cross-Entropy Error (K outputs, softmax activation). The assumption is that you are using cross-entropy error.
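The CSCI6390 pieces might be sketched as follows (assumed Python/NumPy). A standard simplification, consistent with the softmax + cross-entropy setting of Eq 3.30, is that the output net gradient collapses to \(\mathbf{\delta}^o = \mathbf{o} - \mathbf{y}\); verify this against Sec 3.6.2 before relying on it:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def relu_deriv(z):
    # Derivative of ReLU (Sec 3.6.1): 1 where z > 0, else 0
    return (z > 0).astype(float)

def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

def output_net_gradient(o, y):
    """With softmax outputs and cross-entropy error, the product in
    Eq 3.30 simplifies to delta^o = o - y (o = prediction, y = one-hot)."""
    return o - y
```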

What to turn in

Write a script that will be run as follows: TRAIN TEST \(m\) \(\eta\) epochs
where TRAIN/TEST are the training and testing files, which will contain each point on a line, with comma separated attributes, and with the last attribute denoting the class (do NOT assume that there are only 2 classes); \(m\) is the size of the hidden layer, \(\eta\) is the learning step size, and epochs is the number of runs through the entire dataset.
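The argument handling described above might look like this (assumed Python; `parse_args` and `load_data` are hypothetical helper names, and `load_data` assumes numeric class labels as in the shuttle data):

```python
import sys
import numpy as np

def parse_args(argv):
    """Parse the required positional arguments: TRAIN TEST m eta epochs."""
    train_file, test_file = argv[1], argv[2]
    m = int(argv[3])        # size of the hidden layer
    eta = float(argv[4])    # learning step size
    epochs = int(argv[5])   # number of passes over the training data
    return train_file, test_file, m, eta, epochs

def load_data(path):
    """Read a comma-separated file; last column is the class label."""
    raw = np.loadtxt(path, delimiter=',')
    return raw[:, :-1], raw[:, -1]
```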

Save your output to a text file assign4.txt. It should contain the weight matrices and bias vectors for both the hidden and output layers, and the accuracy value.
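The accuracy computation and output file might be handled as in this sketch (assumed Python/NumPy; the predicted class is taken as the argmax of the output vector, and the file layout is illustrative since the assignment does not mandate a format):

```python
import numpy as np

def accuracy(O, Y):
    """Fraction of points whose predicted class (argmax of the output
    vector) matches the one-hot encoded true class."""
    return float(np.mean(np.argmax(O, axis=1) == np.argmax(Y, axis=1)))

def save_results(path, Wh, bh, Wo, bo, acc):
    """Write the weight matrices, bias vectors, and accuracy to a file."""
    with open(path, 'w') as f:
        for name, arr in [('Wh', Wh), ('bh', bh), ('Wo', Wo), ('bo', bo)]:
            f.write(f'{name}:\n{arr}\n')
        f.write(f'accuracy: {acc}\n')
```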

Try different \(m\) and \(\eta\) values, and report the best accuracy obtained on the shuttle dataset that contains 7 classes: Attach:shuttle.trn.txt for training set, and Attach:shuttle.tst.txt for testing set. You can test your code on the smaller Attach:iris-numeric.txt dataset.

Page last modified on October 24, 2018, at 04:08 PM