Learning From Data: Lecture-Slides
The first 15 lecture-slides
are a companion to the textbook
Learning From Data,
by Abu-Mostafa, Magdon-Ismail, Lin.
Part Ⅰ: Foundations. (lectures 1-15)
Part Ⅱ: Techniques. (lectures 16-27)
(Part Ⅱ is based on the "dynamic e-Chapters" available
on the book-forum; visit
www.amlbook.com for details.)
The slides can be
used for self study and are also available to
instructors
who wish to teach a course based on the book.
The slides are available as is with no explicit or implied warranties.
The copyright for all material remains with the original copyright
holder (in almost all cases the authors of the
"Learning From Data" book).
Ⅰ. Foundations
Lecture 1. |
Introduction and Motivation
A pattern exists; we don't know it; we have data to learn it. Examples: Netflix, credit approval, ... |
LFD Chapter 1 (The Learning Problem) § 1.1 |
Lecture 2. |
General Setup and the Perceptron
The linear separator; types of learning: supervised, reinforcement, unsupervised; a puzzle. |
LFD Chapter 1 (The Learning Problem) § 1.1, 1.2 |
Lecture 3. |
Is Learning Feasible
Can we reach outside the data? Probability to the rescue - Hoeffding's inequality. |
LFD Chapter 1 (The Learning Problem) § 1.3 |
Lecture 4. |
Real Learning is Feasible
Real learning, the two-step solution. Back to reality: error measures and noisy targets. |
LFD Chapter 1 (The Learning Problem) § 1.3, 1.4 |
Lecture 5. |
Training Versus Testing
Toward an "effective size" for infinite hypothesis sets: the growth function. |
LFD Chapter 2 (Training Versus Testing) § 2.1.1 |
Lecture 6. |
Bounding the Growth Function
A polynomial bound on the growth function and the VC generalization bound. |
LFD Chapter 2 (Training Versus Testing) § 2.1.2 |
Lecture 7. |
The VC Dimension, Bias and Variance
Approximation versus generalization; bias-variance analysis and learning curves. |
LFD Chapter 2 (Training Versus Testing) § 2.1.3, 2.2, 2.3 |
Lecture 8. |
Linear Classification,
Regression
Non-separable data; linear regression using the pseudo-inverse. |
LFD Chapter 3 (The Linear Model) § 3.1, 3.2 |
Lecture 9. |
Logistic Regression and
Gradient Descent
Estimating a probability: the cross entropy error and gradient descent minimization. |
LFD Chapter 3 (The Linear Model) § 3.3 |
Lecture 10. |
Non Linear Transforms
Nonlinear hypotheses using the nonlinear transform. |
LFD Chapter 3 (The Linear Model) § 3.4 |
Lecture 11. |
Overfitting
When are simpler models better than complex ones? Deterministic and stochastic noise. |
LFD Chapter 4 (Overfitting) § 4.1 |
Lecture 12. |
Regularization
Constraining a model toward simpler hypotheses to combat noise. |
LFD Chapter 4 (Overfitting) § 4.2 |
Lecture 13. |
Validation and Model Selection
Estimating out-of-sample error and its use to make high-level choices in learning. |
LFD Chapter 4 (Overfitting) § 4.3 |
Lecture 14. |
Three Learning Principles
Occam's razor (choosing hypotheses); sampling bias (getting data); data snooping (handling data). |
LFD Chapter 5 (Three Learning Principles) § 5 |
Lecture 15. |
Reflecting on Our Path - Epilogue to Part I
What we learned; the ML jungle; the path forward. | LFD |
Ⅱ. Techniques
Lecture 16. |
Similarity and Nearest Neighbor
Similarity and the simplest learning rule of all: k-nearest neighbor. | LFD Dynamic e-Chapter 6 |
Lecture 17. |
Memory and Efficiency in Nearest Neighbor
Data condensing and branch and bound search for the nearest neighbor. | LFD Dynamic e-Chapter 6 |
Lecture 18. |
Radial Basis Functions
A "soft" generalization of the nearest neighbor method and our first truly nonlinear model. | LFD Dynamic e-Chapter 6 |
Lecture 19. |
A Peek at Unsupervised Learning
Two simple but important unsupervised techniques: k-means and GMMs. | LFD Dynamic e-Chapter 6 |
Lecture 20. |
The Multilayer Perceptron
The neural network (a generalization of the perceptron) is a powerful, biologically inspired model. | LFD Dynamic e-Chapter 7 |
Lecture 21. |
The Neural Network
Computing the hypothesis (forward propagation); learning the weights (backpropagation). | LFD Dynamic e-Chapter 7 |
Lecture 22. |
Neural Networks, Overfitting, and Minimizing In-Sample Error
Weight decay regularization and early stopping; improving upon gradient descent (variable learning rate, steepest descent, conjugate gradients). | LFD Dynamic e-Chapter 7 |
Lecture 23. |
Support Vector Machines: Maximizing the Margin
SVMs are perhaps the most popular technique, being robust to noise (automatic regularization). | LFD Dynamic e-Chapter 8 |
Lecture 24. |
The Optimal Hyperplane and Overfitting
Generalization and overfitting can be controlled by parameters not explicitly related to dimension. | LFD Dynamic e-Chapter 8 |
Lecture 25. |
The Kernel Trick
Using SVMs plus nonlinear transforms without physically transforming to the nonlinear Z-space. | LFD Dynamic e-Chapter 8 |
Lecture 26. |
Choosing Your Kernel Machine
Popular kernels, design choices and kernels in different applications. | LFD Dynamic e-Chapter 8 |
Lecture 27. |
Learning Aides
Additional tools that can be used with any learning technique. | LFD Dynamic e-Chapter 9 |