Naive Bayes Classifier

Probabilistic classification using Bayes' theorem.

Probabilistic Classification

The Naive Bayes classifier applies Bayes' theorem under a strong (naive) assumption: all features are conditionally independent of each other given the class label. Despite this simplification, it works surprisingly well in practice, especially for text classification problems like spam filtering and sentiment analysis.

Bayes' Theorem


  P(Class | Features) = P(Features | Class) × P(Class)
                        ─────────────────────────────
                               P(Features)

  In plain English:
  ┌──────────────────────────────────────────────────────┐
  │  Probability of class given features                 │
  │       =                                              │
  │  Probability of features given class                 │
  │       × probability of the class                     │
  │       ÷ probability of the features                  │
  └──────────────────────────────────────────────────────┘

  "Naive" assumption: features are conditionally independent given the class
  P(f1, f2, f3 | Class) = P(f1|Class) × P(f2|Class) × P(f3|Class)
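
The denominator P(Features) is the same for every class, so it can be obtained by summing the numerator over all classes, or dropped entirely when only the most probable class is needed. A sketch of that step in LaTeX notation, where C ranges over the classes and f_1, ..., f_n are the features (the symbols are chosen here for illustration):

```latex
P(C \mid f_1, \dots, f_n)
  = \frac{P(C) \prod_{i=1}^{n} P(f_i \mid C)}
         {\sum_{c'} P(c') \prod_{i=1}^{n} P(f_i \mid c')}
\qquad
\hat{C} = \arg\max_{C} \; P(C) \prod_{i=1}^{n} P(f_i \mid C)
```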

Example: Spam Detection


  Email contains: "free money winner"

  P(Spam | "free money winner")
    = P("free"|Spam) × P("money"|Spam) × P("winner"|Spam) × P(Spam)
    ────────────────────────────────────────────────────────────────
                              P("free money winner")

  Calculate for both Spam and Not Spam (illustrative values):
  P(Spam | words) = 0.92
  P(Not Spam | words) = 0.08

  Prediction: SPAM ✓
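
The calculation above can be sketched in a few lines of Python. The priors and word likelihoods below are hypothetical, picked only so that the result lands near the 0.92 / 0.08 split in the example; they are not estimated from real emails. The function name `posterior` is likewise just an illustration.

```python
from math import prod

# Hypothetical priors and per-class word likelihoods, made up for illustration.
priors = {"spam": 0.5, "not_spam": 0.5}
likelihoods = {
    "spam":     {"free": 0.20, "money": 0.10, "winner": 0.050},
    "not_spam": {"free": 0.05, "money": 0.05, "winner": 0.035},
}


def posterior(words, priors, likelihoods):
    """Return P(class | words) for every class under the naive assumption."""
    # Numerator of Bayes' theorem: P(class) * product of P(word | class).
    scores = {
        label: priors[label] * prod(likelihoods[label][w] for w in words)
        for label in priors
    }
    # Denominator P(words): the sum of the numerators over all classes.
    evidence = sum(scores.values())
    return {label: score / evidence for label, score in scores.items()}


result = posterior(["free", "money", "winner"], priors, likelihoods)
prediction = max(result, key=result.get)

print({label: round(p, 2) for label, p in result.items()})
print("Prediction:", prediction)
```

With these made-up numbers the spam posterior rounds to 0.92 and the predicted class is `spam`. A word that never appears in the `likelihoods` table would raise a `KeyError` in this sketch; a real implementation would need a rule for unseen words.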

Types of Naive Bayes

Gaussian NB — Assumes features follow a normal distribution. Works with continuous data.
Multinomial NB — Works with discrete counts (like word frequencies). Great for text classification.
Bernoulli NB — Works with binary features (word present or absent). Also good for text.
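
The three variants differ only in how they model P(feature | class). Here is a rough sketch of each likelihood using only the standard library. The function names and parameters (`mean`, `var`, `theta`, `p`) are illustrative and not taken from any particular library, and the sketch assumes every probability is strictly between 0 and 1.

```python
import math


def gaussian_likelihood(x, mean, var):
    """Gaussian NB: density of a continuous value x under N(mean, var)."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def multinomial_log_likelihood(counts, theta):
    """Multinomial NB: log-likelihood of word counts, up to a constant.

    counts[i] is how often word i occurs; theta[i] is P(word i | class).
    The multinomial coefficient is omitted because it is the same for
    every class and does not affect which class scores highest.
    """
    return sum(n * math.log(t) for n, t in zip(counts, theta))


def bernoulli_likelihood(present, p):
    """Bernoulli NB: likelihood of binary presence flags.

    present[i] is 1 if word i appears, else 0; p[i] is P(word i present | class).
    Absent words contribute (1 - p[i]), unlike in the multinomial model.
    """
    return math.prod(pi if x else 1 - pi for x, pi in zip(present, p))


print(gaussian_likelihood(0.0, mean=0.0, var=1.0))
print(multinomial_log_likelihood([2, 0, 1], [0.5, 0.3, 0.2]))
print(bernoulli_likelihood([1, 0, 1], [0.5, 0.3, 0.2]))
```

Note how the same document is scored differently: the multinomial model ignores the word with count 0, while the Bernoulli model explicitly multiplies in the probability that it is absent.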

Why "Naive" Still Works

The independence assumption is almost never true in real data. But the predicted label is simply the class with the highest probability, so Naive Bayes doesn't need perfectly calibrated probabilities; it only needs to get the ranking of the classes right. For classification, that is often enough.
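
A small experiment makes this concrete. The sketch below uses hypothetical numbers and a hypothetical helper, `naive_posterior`, to score one feature once and then three times, as if three perfectly correlated copies were independent evidence. The probability becomes overconfident, but in this example the winning class does not change.

```python
from math import prod


def naive_posterior(prior_a, likelihoods_a, likelihoods_b):
    """P(class A | features) for two classes, treating features as independent."""
    score_a = prior_a * prod(likelihoods_a)
    score_b = (1 - prior_a) * prod(likelihoods_b)
    return score_a / (score_a + score_b)


# Hypothetical numbers: one feature with P(f | A) = 0.6 and P(f | B) = 0.3,
# and equal priors for the two classes.
once = naive_posterior(0.5, [0.6], [0.3])

# The same feature duplicated three times (perfectly correlated copies).
# Naive Bayes multiplies the evidence as if the copies were independent.
thrice = naive_posterior(0.5, [0.6] * 3, [0.3] * 3)

print(round(once, 3), round(thrice, 3))  # 0.667 0.889
```

The duplicated evidence inflates the estimate from about 0.67 to about 0.89, which is a calibration error, yet both values are above 0.5, so class A is predicted either way.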

Pros and Cons

Pros: Extremely fast to train and predict, works well with small training data, handles high-dimensional data, great for text classification, no complex optimization needed.

Cons: Independence assumption is unrealistic, can be outperformed by more sophisticated models, probability estimates can be poorly calibrated, struggles with correlated features.
