Support Vector Machines, the bias-variance tradeoff and a bit of kernel theory

This class takes a geometrical approach to Machine Learning through the prism of Support Vector Machines. It first covers linear classifiers for separating data and, along the way, introduces the bias-variance tradeoff. It then presents a bit of kernel theory and applies it to linear classifiers to reach non-linear SVMs. Finally, it provides perspectives on multi-class classification, support vector regression, and density estimation with one-class SVMs. Two full practical examples are provided at the end.
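
As a foretaste, here is a minimal scikit-learn sketch (illustrative only, not the course notebook; the toy dataset and parameter values are arbitrary choices) contrasting a linear SVM with RBF-kernel SVMs at two values of C. Comparing train and test accuracies across the three models gives a first reading of the bias-variance tradeoff.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Non-linearly separable toy data (arbitrary choice for illustration).
X, y = make_moons(n_samples=400, noise=0.25, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "linear SVM": SVC(kernel="linear", C=1.0),
    "RBF SVM, C=1": SVC(kernel="rbf", C=1.0, gamma="scale"),
    "RBF SVM, C=1000": SVC(kernel="rbf", C=1000.0, gamma="scale"),
}
for name, clf in models.items():
    clf.fit(X_train, y_train)
    print(f"{name:16s} train={clf.score(X_train, y_train):.2f} "
          f"test={clf.score(X_test, y_test):.2f}")
```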

Notebook (Colab)

Pre-class refresher activities and solution
Summary card
Lecture notes

References

On the general theory of SVMs for classification:
A tutorial on Support Vector Machines for Pattern Recognition.
C. J. C. Burges. Data Mining and Knowledge Discovery, 2, 121-167, (1998).

On support vector regression (and its extension to $\nu$-SVR):
A tutorial on Support Vector Regression.
A. J. Smola and B. Schölkopf. Statistics and Computing, 14(3), 199-222, (2004).
New support vector algorithms.
B. Schölkopf, A. J. Smola, R. C. Williamson, and P. L. Bartlett. Neural Computation, 12(5), 1207-1245, (2000).
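
To get a feel for the two formulations in practice, here is a minimal scikit-learn sketch (illustrative only, not taken from the tutorials above; the toy data and parameter values are arbitrary). In ε-SVR the width of the insensitive tube is fixed by the user, whereas in ν-SVR the parameter ν lower-bounds the fraction of support vectors and upper-bounds the fraction of points left outside the tube.

```python
import numpy as np
from sklearn.svm import SVR, NuSVR

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0.0, 6.0, size=(200, 1)), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(200)

# epsilon-SVR: the tube width epsilon is chosen directly.
eps_svr = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(X, y)
# nu-SVR: nu controls the fraction of support vectors / tube violations instead.
nu_svr = NuSVR(kernel="rbf", C=10.0, nu=0.3).fit(X, y)

print("epsilon-SVR support vectors:", len(eps_svr.support_))
print("nu-SVR support vectors:     ", len(nu_svr.support_))
```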

On One-Class SVMs:
Support vector method for novelty detection.
B. Schölkopf, R. C. Williamson, A. J. Smola, J. Shawe-Taylor, and J. C. Platt. Advances in Neural Information Processing Systems, 12, 582-588, (1999).
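
In practice, one-class SVM novelty detection amounts to fitting on "normal" data only and then flagging new points as inliers or outliers. A minimal scikit-learn sketch (illustrative only, with arbitrary toy data):

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = 0.5 * rng.standard_normal((200, 2))   # "normal" data only
X_new = np.array([[0.1, -0.2], [4.0, 4.0]])     # an inlier and a novelty

# nu upper-bounds the fraction of training points treated as outliers.
oc_svm = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(X_train)
print(oc_svm.predict(X_new))  # +1 for inliers, -1 for novelties
```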

On multi-class SVMs:
On the algorithmic implementation of multiclass kernel-based vector machines.
K. Crammer and Y. Singer. Journal of Machine Learning Research, 2, 265-292, (2001).
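
For a quick hands-on comparison, a minimal scikit-learn sketch (illustrative only, on an arbitrary dataset): SVC reduces a multi-class problem to binary one-vs-one subproblems, while LinearSVC exposes the direct Crammer-Singer formulation for the linear case.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC, LinearSVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# SVC handles more than two classes by combining binary one-vs-one classifiers.
ovo = SVC(kernel="rbf", C=1.0).fit(X_train, y_train)
# LinearSVC can instead solve a single joint multi-class problem
# (Crammer-Singer formulation), for a linear kernel only.
cs = LinearSVC(multi_class="crammer_singer", C=1.0, max_iter=10000).fit(X_train, y_train)

print("one-vs-one SVC:          ", ovo.score(X_test, y_test))
print("Crammer-Singer LinearSVC:", cs.score(X_test, y_test))
```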