Support Vector Machines

Finding the optimal boundary between classes.

Finding the Optimal Boundary

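Support Vector Machines (SVMs) are classification algorithms that find a separating boundary (a hyperplane) between classes. The key insight: among all boundaries that separate the classes, an SVM picks the one with the maximum margin, meaning the widest possible gap between the boundary and the nearest points of each class. A wide margin leaves room for new points to differ slightly from the training data and still land on the correct side, which tends to improve generalization.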
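For linearly separable data, the widest-gap idea is usually written as the hard-margin optimization problem below. The notation is an assumption introduced here for illustration: x_i are the training points, y_i in {-1, +1} are their labels, w is the normal vector of the hyperplane, and b is its offset.

```latex
\min_{w,\,b}\ \tfrac{1}{2}\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i \,(w^{\top} x_i + b) \ge 1 \quad \text{for all } i,
\qquad \text{margin width} = \frac{2}{\lVert w \rVert}
```

Minimizing the norm of w is equivalent to maximizing the gap 2/||w||, and the constraints keep every training point on its own side of that gap.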

The Maximum Margin Principle


  Many possible boundaries:
        │
  ○     │     ●
    ○   │   ●
  ○  ○  │  ●  ●        Which one is best?
     ○  │ ●
  ○     │     ●
        │

  SVM finds the one with the WIDEST MARGIN:

  ○  ○  ○  │          │  ●  ●  ●
     ○  ○  │←margin→│  ●  ●
  ○  ○  ○  │          │  ●  ●  ●
             │          │
        Support     Support
        Vectors     Vectors

  The margin is maximized → better generalization
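To make "widest margin" concrete, here is a minimal sketch that measures the margin of two candidate boundaries on a made-up 2-D dataset. The points, the two candidate lines, and the helper name `geometric_margin` are all illustrative assumptions, and no SVM is trained here; the code only compares boundaries that were picked by hand.

```python
import math

# Toy 2-D points (made up for illustration): label -1 for ○, +1 for ●.
points = [((0.0, 0.0), -1), ((0.0, 1.0), -1), ((-1.0, 0.5), -1),
          ((2.0, 0.0), 1), ((2.0, 1.0), 1), ((3.0, 0.5), 1)]

def geometric_margin(w, b, data):
    """Smallest signed distance from any point to the line w·x + b = 0.

    A positive result means every point lies on its correct side.
    """
    norm = math.hypot(w[0], w[1])
    return min(label * (w[0] * x[0] + w[1] * x[1] + b) / norm
               for x, label in data)

# Two hand-picked boundaries that both separate the classes.
vertical = ((1.0, 0.0), -1.0)   # the line x1 = 1
tilted = ((1.0, 0.5), -1.25)    # a slanted line through (1, 0.5)

for name, (w, b) in [("vertical", vertical), ("tilted", tilted)]:
    print(name, round(geometric_margin(w, b, points), 3))
# prints:
# vertical 1.0
# tilted 0.671
```

Both lines classify every point correctly, but the vertical one keeps the nearest points farther away, so it is the one a maximum-margin classifier would prefer of the two.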

Support Vectors

The data points closest to the decision boundary are called support vectors. These are the critical points that define where the boundary sits. Only these points matter: you could move or remove any other point, as long as it stays outside the margin, without changing the boundary. Because a trained model only needs to store its support vectors, prediction is memory-efficient.
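A minimal sketch of this property, assuming scikit-learn and NumPy are installed. The toy data and the very large `C` (chosen to approximate a hard margin on separable data) are illustrative assumptions. The model is fitted once on all points and once on its support vectors alone, and the two boundaries are compared.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable data (made up for illustration).
X = np.array([
    [0.0, 0.0], [0.0, 1.0], [-1.0, 0.5], [-2.0, 2.0],   # class -1
    [2.0, 0.0], [2.0, 1.0], [3.0, 0.5], [4.0, 2.0],     # class +1
])
y = np.array([-1, -1, -1, -1, 1, 1, 1, 1])

# A very large C approximates a hard-margin SVM on separable data.
full = SVC(kernel="linear", C=1e6).fit(X, y)
sv_idx = full.support_   # indices (into X) of the support vectors

# Refit using only the support vectors and compare the boundaries.
reduced = SVC(kernel="linear", C=1e6).fit(X[sv_idx], y[sv_idx])

print("support vectors:\n", full.support_vectors_)
print("full fit:    w =", full.coef_[0], " b =", full.intercept_[0])
print("reduced fit: w =", reduced.coef_[0], " b =", reduced.intercept_[0])
```

If the two fits agree up to solver tolerance, the discarded points were not influencing the boundary.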

The Kernel Trick

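What if the data isn't linearly separable? SVMs use the kernel trick: a kernel function computes inner products as if the data had been mapped into a higher-dimensional space, where a linear boundary becomes possible, without ever computing the mapped coordinates explicitly.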


  Original Space (not separable):    Higher Dimension:

  ○ ● ○ ● ○ ●                       ○   ○
  ○ ● ○ ● ○ ●                       ○     ○
  ● ○ ● ○ ● ○        ──kernel──►        ●●●●
  ● ○ ● ○ ● ○         trick           ●     ●
  ○ ● ○ ○ ● ●                        ●   ●

  Can't draw a line to              Now a plane can
  separate ○ and ●                  separate them!
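The "trick" is that the higher-dimensional coordinates never have to be built. Here is a minimal sketch using the degree-2 polynomial kernel, chosen because its feature map is small enough to write out. The function names `phi`, `dot`, and `poly2_kernel` are illustrative, not from any library.

```python
import math

def phi(x):
    """Explicit map from 2-D to 3-D matching the kernel (x·z)^2."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def dot(a, b):
    """Plain inner product of two equal-length tuples."""
    return sum(ai * bi for ai, bi in zip(a, b))

def poly2_kernel(x, z):
    """K(x, z) = (x·z)^2, computed entirely in the original 2-D space."""
    return dot(x, z) ** 2

x, z = (1.0, 2.0), (3.0, -1.0)
explicit = dot(phi(x), phi(z))   # map to 3-D first, then take the inner product
implicit = poly2_kernel(x, z)    # same quantity, no 3-D vectors built
print(explicit, implicit)        # the two agree up to floating-point rounding
```

For this small map the saving is negligible, but for kernels such as RBF the implied space is infinite-dimensional, so evaluating the kernel directly is the only practical route.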

Common kernels: Linear, Polynomial, RBF (Radial Basis Function), Sigmoid. RBF is the most common general-purpose choice; it is, for example, the default kernel of scikit-learn's SVC.
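For reference, one common parameterization of these four kernels (the one used by libsvm and scikit-learn) is sketched below. The symbols are assumptions introduced here: x and z are two input vectors, and gamma, r, and d are hyperparameters chosen by the user.

```latex
\begin{aligned}
\text{Linear:}     \quad & K(x, z) = x^{\top} z \\
\text{Polynomial:} \quad & K(x, z) = (\gamma\, x^{\top} z + r)^{d} \\
\text{RBF:}        \quad & K(x, z) = \exp\!\left(-\gamma \lVert x - z \rVert^{2}\right) \\
\text{Sigmoid:}    \quad & K(x, z) = \tanh(\gamma\, x^{\top} z + r)
\end{aligned}
```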

Pros and Cons

Pros: Effective in high-dimensional spaces, memory-efficient (uses only support vectors), versatile through different kernels, strong theoretical foundation.

Cons: Doesn't scale well to very large datasets (training time for kernel SVMs grows faster than linearly with the number of samples), sensitive to feature scaling, doesn't directly provide probability estimates, less interpretable than tree-based models.
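The sensitivity to feature scaling can be seen directly in the RBF kernel, which depends on squared distances. Here is a minimal sketch with hypothetical (age, income) rows. The data, the `gamma=1.0` setting, and the helper names `rbf` and `standardize` are illustrative assumptions; in practice a library scaler such as scikit-learn's `StandardScaler` would typically do this step.

```python
import math

def rbf(x, z, gamma=1.0):
    """RBF kernel exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def standardize(rows):
    """Rescale each column to zero mean and unit (population) standard deviation."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
            for c, m in zip(cols, means)]
    return [tuple((v - m) / s for v, m, s in zip(row, means, stds))
            for row in rows]

# Hypothetical rows: (age in years, income in dollars).
raw = [(25.0, 40_000.0), (60.0, 41_000.0), (26.0, 90_000.0)]
scaled = standardize(raw)

# Raw units: income differences are so large that both similarities underflow to 0.0.
print(rbf(raw[0], raw[1]), rbf(raw[0], raw[2]))
# Standardized: age and income contribute on a comparable footing, values are nonzero.
print(rbf(scaled[0], scaled[1]), rbf(scaled[0], scaled[2]))
```

With raw units the kernel treats every pair of rows as completely dissimilar, so the model has nothing to work with; after standardizing, both features influence the similarity.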
