Lecture 1:
Motivating examples. Concrete example - boolean function of 3 boolean variables. Outline of general problem.
(1.1-1.7,[B]), (Chapter 1,[DH]), (Chapter 1,[M]),
(Chapter 1,2,[H]), (Chapter 1,[R])
Lecture 2:
Formal setup. Probability tools; Bayes theorem and
example; Bayes optimal rule/decision function.
(1.8-1.10,[B]), (Chapter 2,[DH]), (Chapter 1,2,3,[N]),
(2.1-2.4,[R])
Lecture 3:
Review of Bayes optimal rule. Minimum error rate loss
matrix for the 2-class/2-action problem. Gaussian class-conditional density and derivation of the nearest-mean (mu) rule. Derivation of the perceptron rule.
(3.1-3.5,[B]), (Chapter 2,[DH]), (Chapter 6.1-6.5,[M]),
(Chapter 1,2,3,[N]), (3.6,[R])
Lecture 4:
Approach to the Bayes optimal rule by starting at the perceptron and determining v, v0 by minimizing R_emp. Expressions for R_emp. Perceptron learning algorithm. Overview of a second approach by smoothing the surface.
(3.1-3.5,[B]), (Chapter 5,[DH]), (6.1-6.5,[M])
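As a rough illustration of the perceptron learning algorithm named in Lecture 4, here is a minimal Python sketch. The names `train_perceptron` and `empirical_risk`, the label convention {-1, +1}, and the toy data are assumptions made for this example and are not taken from the course notes; `v` and `v0` follow the outline's notation for the weights and threshold.

```python
def activation(x, v, v0):
    """Linear score v . x + v0 of a single input x."""
    return sum(vi * xi for vi, xi in zip(v, x)) + v0


def empirical_risk(points, labels, v, v0):
    """Fraction of training points misclassified (a simple form of R_emp)."""
    errors = sum(1 for x, y in zip(points, labels) if y * activation(x, v, v0) <= 0)
    return errors / len(points)


def train_perceptron(points, labels, max_epochs=100):
    """Classic perceptron updates: move (v, v0) toward each misclassified point."""
    v = [0.0] * len(points[0])
    v0 = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            if y * activation(x, v, v0) <= 0:  # wrong side (or on the boundary)
                v = [vi + y * xi for vi, xi in zip(v, x)]
                v0 += y
                mistakes += 1
        if mistakes == 0:  # every point is correctly classified
            break
    return v, v0


# Toy, linearly separable data (boolean AND with labels in {-1, +1}).
toy_points = [(0, 0), (0, 1), (1, 0), (1, 1)]
toy_labels = [-1, -1, -1, 1]
v, v0 = train_perceptron(toy_points, toy_labels)
print("R_emp after training:", empirical_risk(toy_points, toy_labels, v, v0))
```

On data that is not linearly separable the loop simply stops after `max_epochs`, which is one motivation for the smoothed-threshold approach in the following lecture.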
Lecture 5:
Perceptron learning model. Softening the threshold.
Minimizing R_emp. Gradient descent and normalized gradient descent.
Expression for the gradients of the perceptron.
(3.1-3.5,[B]), (Chapter 5,[DH]), (Chapter 4,[M]),
(Chapter 3,[H])
Lecture 6:
Problems with the Perceptron.
Generalization to the multilayer perceptron. Computation of the output,
forward propagation.
(4.1-4.6,[B]), (Chapter 4,[M]),
(Chapter 3,4,[H]), (Chapter 5,[R])
Lecture 7:
Gradients via Backpropagation. Algorithm for
minimization of R_emp(w).
(4.8-4.9,[B]), (Chapter 4,[M]), (Chapter 5,[R]),
(Chapter 4,[H])
Summary of minimizing R_emp with neural networks using gradient descent (postscript) (pdf)
Brief overview of course so far: the learning problem to the neural network / MLP (postscript) (pdf)
Lecture 8:
Universal approximation with neural networks. Summary. Introduction to the generalization question. Coin model for functions.
(9.9,[B])
Lecture 9:
Coin model for functions. Generalization performance of a
learning model with a single function. Generalization performance
of a learning model with a finite number of functions.
(Notes), (Chapter 7,[M])
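To make the finite-model case of Lecture 9 concrete, the sketch below evaluates the standard Hoeffding-plus-union-bound estimate, P[some function's test and training error differ by more than eps] <= 2 M exp(-2 eps^2 N). This textbook form is an assumption here; the course notes may state the bound with different constants. The function names are hypothetical.

```python
import math


def finite_model_bound(num_functions, num_samples, eps):
    """Union bound over M functions of the two-sided Hoeffding bound."""
    return 2.0 * num_functions * math.exp(-2.0 * eps ** 2 * num_samples)


def samples_needed(num_functions, eps, delta):
    """Smallest N for which the bound above is at most delta."""
    return math.ceil(math.log(2.0 * num_functions / delta) / (2.0 * eps ** 2))


# Single function (one "coin") versus a model with 1000 functions.
for m in (1, 1000):
    n = samples_needed(m, eps=0.1, delta=0.05)
    print(m, n, finite_model_bound(m, n, eps=0.1))
```

The required sample size grows only logarithmically in the number of functions, which is why the count of functions (and later the growth function) is the quantity of interest.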
Lecture 10:
Axiom of Non-falsifiability. Definition of
m(N), the growth function for a learning model L.
(3.10-3.11,[V]), (Chapter 7,[M]), (2.8,[R]),
(Chapter 2,[H])
Lecture 11:
Computation of m(N) for various learning
models: 1) positive ray, 2) positive interval, 3) positive rectangle,
4) convex subsets. Bound for m(N) - either exponential growth or at most
polynomial. Definition of the VC dimension d_vc.
(Chapter 4,[V]), (Chapter 7,[M]), (2.8,[R]),
(Chapter 2,[H])
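The first two growth functions listed for Lecture 11 can be checked by brute force. The sketch below counts the distinct labelings (dichotomies) that positive rays and positive intervals produce on a set of distinct points on the line; for these two models the count is the same for every such point set, so it equals m(N). The helper names and the sample points are assumptions for illustration.

```python
from itertools import combinations


def candidate_thresholds(points):
    """One threshold below all points, one between each neighbouring pair, one above."""
    pts = sorted(points)
    cuts = [pts[0] - 1.0]
    cuts += [(a + b) / 2.0 for a, b in zip(pts, pts[1:])]
    cuts.append(pts[-1] + 1.0)
    return cuts


def ray_dichotomies(points):
    """Labelings from h(x) = +1 if x > a (stored as True/False)."""
    return {tuple(x > a for x in points) for a in candidate_thresholds(points)}


def interval_dichotomies(points):
    """Labelings from h(x) = +1 if a < x < b, plus the all-negative labeling."""
    result = {tuple(False for _ in points)}
    for a, b in combinations(candidate_thresholds(points), 2):
        result.add(tuple(a < x < b for x in points))
    return result


sample = [0.5, 1.7, 3.0, 4.2]
n = len(sample)
print(len(ray_dichotomies(sample)), n + 1)                      # positive ray: N + 1
print(len(interval_dichotomies(sample)), n * (n + 1) // 2 + 1)  # positive interval
```

Both counts are polynomial in N, in contrast to the 2^N labelings available to a model such as convex subsets of the plane.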
Lecture 12:
Proof of bound on m(N). Separation of
learning models
into good learning models and bad ones. VC dimension for perceptron. Bound
on VC dimension for neural networks.
(Notes), (Chapter 4,[V]), (Chapter 2.8,[R]),
(Chapter 2,[H])
Lecture 13:
VC theorem: Bound on probability
of generalization error.
Computation of sample complexity. Test error bound. The
complexity-approximability tradeoff.
(Notes), (Chapter 4,[V]),
(Chapter 2,[H])
Notes on Generalization (postscript) (pdf)
Lecture 14:
Relationship between VC bound and NFL (no free lunch). Use of prior
information. Bias and Variance.
(9.1,[B])
Lecture 15:
Bias and Variance continued. Introduction to
regularization - general approaches. Early Stopping.
(9.1,9.2.4,[B]), (Chapter 5,[R])
Lecture 16:
Cross Validation.
(9.8.1,[B]), (2.6,[R]), (4.14,[H])
Lecture 17:
Complexity penalties. Use of noise models to obtain error functions - maximum
likelihood. "Derivation" of weight decay.
(6.0-6.1,9.2,9.4,9.5,[B]), (Chapter 5,[R])
Lecture 18:
Regularization by addition of penalty terms to the error function
including other complexity regularizers.
Enforcing hints and other prior information such as rotation/reflection
symmetry and monotonicity using penalty terms. Choice of error
functions from risk preferences.
Choice of regularization parameters. Bagging and
bootstrap.
(6.0-6.11,9.2,9.4,9.5,[B]), (4.15,[H]), (4.6.5, 4.8.1,[M]),
(2.7,[R])
Lecture 19:
Committees/Voting/Boosting. Road map and where we go from here.
(9.6,9.7,10.7,[B]), (Chapter 7,[H])
Lecture 20:
Weight initialization and input preprocessing.
When to stop training. Approach to optimization algorithms.
(Chapter 8, [B])
Lecture 21:
Optimization: Zeroth order model - exhaustive search. First order model -
Gradient descent with fixed and variable learning rate. Steepest descent.
(Chapter 7, [B]), (5.3, [R]), (4.17,4.18, [R])
Lecture 22:
Steepest descent, momentum and conjugate gradient.
(Chapter 7, [B]), (5.3, [R]), (4.17,4.18, [R])
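A minimal sketch of fixed-learning-rate gradient descent with an optional momentum term, as named in Lectures 21 and 22. The one-dimensional objective (w - 3)^2, the function names, and the parameter values are assumptions chosen only to keep the example small; they are not taken from the course notes.

```python
def gradient_descent(grad, w_start, learning_rate=0.1, momentum=0.0, steps=200):
    """Iterate step = momentum * step - learning_rate * grad(w); w = w + step."""
    w = w_start
    step = 0.0
    for _ in range(steps):
        step = momentum * step - learning_rate * grad(w)
        w = w + step
    return w


def toy_gradient(w):
    """Derivative of the toy error function E(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)


print(gradient_descent(toy_gradient, w_start=0.0))                # plain gradient descent
print(gradient_descent(toy_gradient, w_start=0.0, momentum=0.5))  # with momentum
```

Setting `momentum=0.0` recovers plain gradient descent; a nonzero value reuses a fraction of the previous step, which tends to help on long, narrow error surfaces.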
Lecture 23:
Conjugate gradient. Second order methods: Newton step,
Levenberg-Marquardt methods.
(Chapter 7, [B]), (5.3, [R]), (4.17,4.18, [R])
Notes on Optimization (postscript) (pdf)
Lecture 24:
The Nearest Neighbor Rule.
(6.2, [R]), (4.6, [DH]), (2.5.4, [B]),
(8.2, [M])
Notes on the K-Nearest Neighbor Method (postscript) (pdf)
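For orientation, the k-nearest-neighbor rule of Lecture 24 can be sketched in a few lines. The function name `knn_predict`, the use of Euclidean distance, and the toy data are assumptions for this example; the linked notes may define the method with a different distance or tie-breaking rule. `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=1):
    """Majority vote among the k training points closest to the query."""
    order = sorted(
        range(len(train_points)),
        key=lambda i: math.dist(train_points[i], query),
    )
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


toy_points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
toy_labels = ["a", "a", "b", "b"]
print(knn_predict(toy_points, toy_labels, (0.2, 0.1), k=1))
print(knn_predict(toy_points, toy_labels, (5.5, 5.0), k=3))
```

With `k=1` this is the plain nearest neighbor rule; larger `k` smooths the decision boundary at the cost of a full pass over the training set per query.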
Lecture 25:
Radial Basis Functions
(Chapter 5, [B]), (Chapter 5, [H])
Notes on Radial Basis Functions (postscript) (pdf)
Lecture 26:
Gaussian Processes
Lecture 27:
Support Vector Machines
Chris Burges' Tutorial on Support Vector Machines (postscript) (pdf)