Decision Trees & Random Forests

Tree-based models for classification and regression.

Tree-Based Models

Decision Trees are intuitive, interpretable models that split data into branches based on feature values. Think of a tree as a flowchart: at each node, you ask a question about one feature and follow the branch that matches the answer. Random Forests take this further by combining many trees into a single ensemble.

How Decision Trees Work


  Should I play tennis today?

                    ┌──────────┐
                    │ Outlook? │
                    └────┬─────┘
            ┌────────────┼────────────┐
            ▼            ▼            ▼
        ┌──────┐    ┌──────┐    ┌──────┐
        │Sunny │    │Overc.│    │ Rain │
        └──┬───┘    └──┬───┘    └──┬───┘
           │           │           │
      ┌────┴────┐      │      ┌────┴────┐
      │Humidity?│      │      │  Wind?  │
      └────┬────┘      │      └────┬────┘
     ┌─────┴─────┐    Yes    ┌─────┴─────┐
     ▼           ▼           ▼           ▼
  ┌──────┐  ┌──────┐   ┌──────┐   ┌──────┐
  │High  │  │Normal│   │ Weak │   │Strong│
  │  No  │  │ Yes  │   │ Yes  │   │  No  │
  └──────┘  └──────┘   └──────┘   └──────┘

Each internal node represents a test on a feature, each branch represents an outcome of that test, and each leaf represents a class label (or, in regression, a predicted value). During training, the tree greedily picks at each node the split that best separates the classes, either by maximizing information gain or by minimizing Gini impurity.
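To make the two split criteria concrete, here is a minimal sketch in plain Python. The Yes/No counts per Outlook value are an assumption borrowed from the classic 14-day play-tennis dataset (they are not given in the diagram above), and the helper names are illustrative only.

```python
from collections import Counter
from math import log2

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions (0 means pure)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits (0 means pure)."""
    n = len(labels)
    return -sum((count / n) * log2(count / n) for count in Counter(labels).values())

def weighted_impurity(groups, impurity):
    """Average impurity of the child nodes, weighted by their size."""
    total = sum(len(group) for group in groups)
    return sum(len(group) / total * impurity(group) for group in groups)

def information_gain(parent, groups):
    """Entropy of the parent minus the weighted entropy of its children."""
    return entropy(parent) - weighted_impurity(groups, entropy)

# Assumed counts from the classic 14-day dataset (not stated in the text above).
sunny = ["Yes"] * 2 + ["No"] * 3
overcast = ["Yes"] * 4
rain = ["Yes"] * 3 + ["No"] * 2
parent = sunny + overcast + rain
groups = [sunny, overcast, rain]

print(round(gini(parent), 3))                      # 0.459
print(round(weighted_impurity(groups, gini), 3))   # 0.343
print(round(information_gain(parent, groups), 3))  # 0.247
```

Splitting on Outlook lowers the weighted Gini impurity and yields a positive information gain, which is why a tree would consider it a useful first question. A real tree learner repeats this comparison for every candidate feature and keeps the best one.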

The Problem with Single Trees

A single decision tree, grown without limits on its depth, tends to overfit: it memorizes the training data, including its noise. The result is an overly complex tree that doesn't generalize well to new data. That's where Random Forests come in.
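The overfitting pattern is easy to observe. The sketch below assumes scikit-learn is installed and uses a synthetic dataset with deliberately noisy labels; the dataset parameters and the depth limit of 3 are arbitrary choices for illustration, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data; flip_y=0.2 randomly reassigns about 20% of labels as noise.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=4, flip_y=0.2, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# An unrestricted tree keeps splitting until every training leaf is pure.
deep = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
# A depth-limited tree cannot memorize individual noisy points.
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

deep_train = deep.score(X_train, y_train)
deep_test = deep.score(X_test, y_test)
shallow_train = shallow.score(X_train, y_train)
shallow_test = shallow.score(X_test, y_test)

print(f"deep tree:    train={deep_train:.2f}  test={deep_test:.2f}")
print(f"shallow tree: train={shallow_train:.2f}  test={shallow_test:.2f}")
```

The unrestricted tree scores perfectly on the training set because it has memorized the noisy labels, while its test accuracy is noticeably lower. That gap between training and test accuracy is the signature of overfitting.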

Random Forests: Strength in Numbers

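A Random Forest builds hundreds or even thousands of decision trees. Each tree is trained on a bootstrap sample of the data (rows drawn at random with replacement) and considers only a random subset of the features at each split. For classification, each tree votes on the prediction and the majority wins; for regression, the trees' predictions are averaged. This approach builds on bagging (bootstrap aggregating), and it reduces overfitting and improves generalization.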
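The three ingredients (bootstrap sampling, random feature subsets, majority voting) can be sketched in plain Python. As a loud simplification, each "tree" here is a one-split stump rather than a full decision tree, and the toy spam features (link count, exclamation-mark count) are hypothetical.

```python
import random
from collections import Counter

def majority_vote(labels):
    """Return the most common label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(X, y, feature_ids):
    """Fit a one-split tree: pick the (feature, threshold) with fewest errors."""
    best = None
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            if not left or not right:
                continue  # not a real split
            l_lab, r_lab = majority_vote(left), majority_vote(right)
            errors = sum(l != l_lab for l in left) + sum(r != r_lab for r in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # no valid split: always predict the majority label
        label = majority_vote(y)
        return (feature_ids[0], float("inf"), label, label)
    return best[1:]

def predict_stump(stump, row):
    f, t, l_lab, r_lab = stump
    return l_lab if row[f] <= t else r_lab

def fit_forest(X, y, n_trees=25, n_sub_features=1, seed=0):
    rng = random.Random(seed)
    n = len(X)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]            # bootstrap sample
        feats = rng.sample(range(len(X[0])), n_sub_features)  # random features
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest

def predict_forest(forest, row):
    """Every tree votes; the majority label wins."""
    return majority_vote([predict_stump(stump, row) for stump in forest])

# Hypothetical toy data: [number of links, number of exclamation marks].
X = [[0, 0], [1, 0], [0, 1], [1, 1], [5, 4], [6, 5], [7, 3], [8, 6]]
y = ["Not Spam"] * 4 + ["Spam"] * 4

forest = fit_forest(X, y)
print(predict_forest(forest, [9, 7]))
print(predict_forest(forest, [0, 0]))
```

Because a stump has only one split, choosing the feature subset once per tree is the same as choosing it per split. A full implementation would grow deeper trees and redraw the feature subset at every node.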


  Random Forest:

  Tree 1:  🌳 → Predicts: Spam
  Tree 2:  🌳 → Predicts: Not Spam
  Tree 3:  🌳 → Predicts: Spam
  Tree 4:  🌳 → Predicts: Spam
  Tree 5:  🌳 → Predicts: Not Spam
  ... (hundreds more)
  Tree N:  🌳 → Predicts: Spam

  Majority Vote → Final: SPAM ✓

Feature Importance

Both Decision Trees and Random Forests can report which features matter most for their predictions, typically by measuring how much each feature's splits reduce impurity. This is valuable for understanding your data and explaining model decisions, a big advantage over "black box" models like neural networks.
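As an illustration, scikit-learn exposes impurity-based importances through the `feature_importances_` attribute of a fitted model. The sketch below assumes scikit-learn and NumPy are installed. The data is synthetic and the feature names are made up: only the first feature determines the label, so it should rank first.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = (X[:, 0] > 0).astype(int)  # the label depends only on feature 0
feature_names = ["signal", "noise_a", "noise_b"]  # hypothetical names

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances, normalized so that they sum to 1.
importances = forest.feature_importances_
ranked = sorted(zip(feature_names, importances), key=lambda pair: -pair[1])
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

Impurity-based importances are computed from the training data and can favor features with many distinct values, so it is worth treating them as a first look rather than a final answer.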

When to Use Them

Tree-based models are strong defaults for tabular (structured) data. They handle mixed feature types (numerical and categorical), require minimal preprocessing (feature scaling, for example, is not needed), are robust to outliers in the input features, and provide feature importance. Random Forests are often a good starting point for classification or regression on tabular data.
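A quick baseline might look like the following sketch, which assumes scikit-learn is installed and uses its bundled wine dataset purely as a stand-in for your own table. The features are on very different scales and are passed in unscaled; `n_estimators=200` is an arbitrary choice.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# 13 numeric features on very different scales; no scaling step is applied.
X, y = load_wine(return_X_y=True)

model = RandomForestClassifier(n_estimators=200, random_state=0)

# 5-fold cross-validation gives a more honest estimate than a single split.
scores = cross_val_score(model, X, y, cv=5)
mean_accuracy = scores.mean()
print(f"mean accuracy over {len(scores)} folds: {mean_accuracy:.3f}")
```

For a regression task, `RandomForestRegressor` from the same module follows the same fit-and-score pattern.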
