Finding the Optimal Boundary
Support Vector Machines (SVMs) are classification algorithms that find the optimal boundary (hyperplane) between classes. The key insight: among all boundaries that separate the classes, an SVM picks the one with the maximum margin, that is, the widest possible gap between the boundary and the nearest points of each class. A wide margin makes the classifier more robust to noise and tends to generalize better to unseen data.
The Maximum Margin Principle
Many possible boundaries:
[Figure: two classes of points that could be split by many different boundaries. Which one is best?]
SVM finds the one with the WIDEST MARGIN:
[Figure: the two classes with the margin drawn between them; the points on each edge of the margin are labeled as support vectors.]
The margin is maximized, which leads to better generalization.
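To make "widest margin" concrete, here is a minimal sketch in plain Python. The toy data, the three candidate boundaries, and the helper name geometric_margin are all assumptions made up for illustration. Every candidate separates the two classes; the sketch only measures how far each one sits from its closest point.

```python
import math

# Hypothetical toy data: label -1 on the lower left, +1 on the upper right.
X = [(0, 0), (0, 1), (1, 0), (-1, -1), (3, 3), (3, 4), (4, 3), (5, 5)]
y = [-1, -1, -1, -1, 1, 1, 1, 1]

def geometric_margin(w, b, X, y):
    """Distance from the boundary w.x + b = 0 to the closest point.

    A negative value would mean the boundary misclassifies some point.
    """
    norm = math.hypot(w[0], w[1])
    return min(label * (w[0] * p[0] + w[1] * p[1] + b) / norm
               for p, label in zip(X, y))

# Three candidate boundaries (w, b) that all separate the two classes.
candidates = {
    "vertical line x = 2": ((1.0, 0.0), -2.0),
    "horizontal line y = 2": ((0.0, 1.0), -2.0),
    "diagonal x + y = 3.5": ((0.4, 0.4), -1.4),
}

margins = {name: geometric_margin(w, b, X, y)
           for name, (w, b) in candidates.items()}
for name, m in margins.items():
    print(f"{name}: distance to closest point = {m:.3f}")

# The boundary an SVM prefers is the one with the largest such distance.
best = max(margins, key=margins.get)
print("widest margin:", best)
```

Of these three candidates, the diagonal leaves the most room on both sides. A real SVM searches over all possible boundaries rather than a hand-picked list, but the quantity it maximizes is the same.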
Support Vectors
The data points closest to the decision boundary are called support vectors. These are the critical points that define where the boundary sits. Only these points matter: you could move or remove any of the other points, as long as they stay outside the margin, without changing the boundary. Because a trained model only needs to keep the support vectors, SVM is memory-efficient at prediction time.
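The claim that only the support vectors matter can be checked with a small experiment. The sketch below assumes scikit-learn and NumPy are installed and uses a made-up, linearly separable toy dataset. It fits a linear SVM, then refits on the support vectors alone and compares the two boundaries.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: class 0 on the lower left, class 1 on the upper right.
X = np.array([[0, 0], [0, 1], [1, 0], [-1, -1],
              [3, 3], [3, 4], [4, 3], [5, 5]], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Linear SVM; a large C approximates a hard margin on separable data.
full = SVC(kernel="linear", C=1000.0).fit(X, y)
print("support vectors:")
print(full.support_vectors_)

# Refit using only the points the first model marked as support vectors.
sv_idx = full.support_
reduced = SVC(kernel="linear", C=1000.0).fit(X[sv_idx], y[sv_idx])

# The boundary is w.x + b = 0; compare w and b of the two models.
print("full    w, b:", full.coef_[0], full.intercept_[0])
print("reduced w, b:", reduced.coef_[0], reduced.intercept_[0])
```

The two fits should agree up to the solver's numerical tolerance, even though the second model never saw the points far from the boundary.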
The Kernel Trick
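What if the data isn't linearly separable? SVMs use the kernel trick: a kernel function computes inner products as if the data had been mapped into a higher-dimensional space, where a linear boundary becomes possible, without ever computing that mapping explicitly.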
[Figure: two panels joined by a "kernel trick" arrow. Original Space (not separable): can't draw a line to separate the two classes. Higher Dimension: now a plane can separate them!]
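A small worked example may help here. The sketch below uses the degree-2 polynomial kernel k(p, q) = (p . q)^2 in two dimensions, chosen only because its feature map is short enough to write out. The function names and sample points are illustrative assumptions. It shows that the kernel value equals an inner product in a 3-D space, and that points inside and outside a circle become separable by a plane there.

```python
import math

def phi(p):
    """Explicit 3-D feature map matching the degree-2 polynomial kernel."""
    x1, x2 = p
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def poly2_kernel(p, q):
    """k(p, q) = (p . q)^2, computed entirely in the original 2-D space."""
    return dot(p, q) ** 2

p, q = (1.0, 2.0), (3.0, 4.0)
explicit = dot(phi(p), phi(q))  # map to 3-D first, then take the inner product
implicit = poly2_kernel(p, q)   # same value, without ever building phi
print(explicit, implicit)

# Hypothetical points: one class near the origin, the other farther out.
inner = [(0.5, 0.0), (0.0, -0.5), (-0.3, 0.4)]
outer = [(2.0, 0.0), (0.0, -2.0), (-1.5, 1.5)]

def side(point):
    """Which side of the plane z1 + z3 = 1 the mapped point falls on."""
    z = phi(point)
    return z[0] + z[2] - 1.0

print([side(pt) < 0 for pt in inner])  # all on one side of the plane
print([side(pt) > 0 for pt in outer])  # all on the other side
```

No straight line in the original plane separates the inner points from the outer ones, but in the mapped space a flat plane does. The kernel lets an SVM work with that space through inner products alone.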
Common kernels: Linear, Polynomial, RBF (Radial Basis Function), Sigmoid. RBF is the most common default choice (it is, for example, the default kernel of scikit-learn's SVC).
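As a rough illustration of how the kernel choice plays out, the sketch below tries the four kernels on a synthetic dataset in which one class forms a ring around the other. It assumes scikit-learn is installed. The dataset parameters are arbitrary, and the scores are training accuracy, which is fine for a demonstration but not a substitute for proper validation.

```python
from sklearn.datasets import make_circles
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic data: an inner cluster surrounded by a ring (not linearly separable).
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

scores = {}
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    # Standardize features first, since SVMs are sensitive to feature scaling.
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    model.fit(X, y)
    scores[kernel] = model.score(X, y)  # training accuracy, illustration only
    print(f"{kernel:8s} training accuracy: {scores[kernel]:.2f}")
```

On ring-shaped data like this, the linear kernel has no straight line to find, while RBF can wrap a boundary around the inner cluster. Which kernel works best on real data depends on the problem, so it is usually selected by cross-validation.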
Pros and Cons
Pros: Effective in high-dimensional spaces, memory-efficient (uses only support vectors), versatile through different kernels, strong theoretical foundation.
Cons: Doesn't scale well to very large datasets (training time for kernel SVMs grows faster than linearly with the number of samples), sensitive to feature scaling, doesn't directly provide probability estimates, less interpretable than tree-based models.