Probability Basics

Random events, distributions, and Bayes' theorem.

Probability is the math of uncertainty. In data science, you're constantly making predictions with incomplete information — probability gives you the framework to do that rigorously. Don't worry, we'll keep it practical.

What is Probability?

Probability measures how likely an event is to occur, ranging from 0 (impossible) to 1 (certain). A fair coin has a 0.5 probability of landing heads: over many flips, about half come up heads.


import random

# Simulate 10,000 fair coin flips and report the fraction that land heads
outcomes = [random.choice(["H", "T"]) for _ in range(10000)]
heads_count = outcomes.count("H")
print(f"Heads: {heads_count/len(outcomes):.4f}")

This is the frequentist interpretation — probability is the long-run frequency of an event. Run the experiment enough times and the ratio stabilizes.
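
To see the ratio stabilize, here is a minimal sketch that repeats the coin-flip experiment at increasing sample sizes. The helper name `heads_ratio` and the fixed seed are illustrative choices, not part of the example above.

```python
import random

random.seed(42)  # fixed seed so the run is repeatable; any seed works


def heads_ratio(n_flips):
    """Return the fraction of heads in n_flips simulated fair-coin tosses."""
    heads = sum(random.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips


# Small samples can land far from 0.5; large samples tend to sit close to it.
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"{n:>7} flips: {heads_ratio(n):.4f}")
```

The exact values depend on the seed, but the spread around 0.5 should shrink as the number of flips grows.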

Key Rules

There are a few rules that govern all probability calculations. Master these and you can solve almost any basic probability problem.


# Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
p_rain = 0.3
p_cloudy = 0.5
p_both = 0.2
p_rain_or_cloudy = p_rain + p_cloudy - p_both
# Format to two decimals: the raw float prints as 0.6000000000000001
print(f"P(rain or cloudy): {p_rain_or_cloudy:.2f}")

# Multiplication rule for independent events
p_heads = 0.5
p_two_heads = p_heads * p_heads
print(f"P(two heads): {p_two_heads}")

The addition rule handles "or" situations. The multiplication rule handles "and" situations — but only when events are independent.
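
One way to check whether the multiplication shortcut applies is to compare P(A) * P(B) with the actual P(A and B). The sketch below does this with the rain and cloudy numbers from above. Using `math.isclose` for the comparison is a choice made here to avoid floating-point surprises.

```python
import math

p_rain = 0.3
p_cloudy = 0.5
p_both = 0.2  # P(rain and cloudy), as given above

# If rain and cloudy were independent, P(both) would equal this product.
product = p_rain * p_cloudy
independent = math.isclose(product, p_both)

print(f"P(rain) * P(cloudy) = {product:.2f}")
print(f"P(rain and cloudy)  = {p_both:.2f}")
print(f"Independent? {independent}")
```

The product is 0.15, not 0.2, so with these numbers rain and cloudy are not independent, and multiplying their probabilities would understate how often they occur together.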

Conditional Probability

Sometimes the probability of something depends on another event having already happened. That's conditional probability — the probability of A given B.


# P(rain | cloudy) = P(rain and cloudy) / P(cloudy)
p_rain_and_cloudy = 0.2
p_cloudy = 0.5
p_rain_given_cloudy = p_rain_and_cloudy / p_cloudy
print(f"P(rain | cloudy): {p_rain_given_cloudy}")

This is the foundation of Bayes' theorem, which lets you update beliefs as new evidence arrives. It's used everywhere from spam filters to medical diagnoses.
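
Conditioning on B amounts to looking only at the cases where B happened. The sketch below simulates days and estimates P(rain | cloudy) by counting. The joint probabilities for "cloudy only", "rain only", and "neither" are assumptions, derived so the totals match P(rain) = 0.3, P(cloudy) = 0.5, and P(rain and cloudy) = 0.2.

```python
import random

random.seed(0)  # fixed seed for a repeatable run

# Each outcome is (rain, cloudy). Weights are assumed, chosen to be
# consistent with P(rain)=0.3, P(cloudy)=0.5, P(rain and cloudy)=0.2.
outcomes = [(True, True), (False, True), (True, False), (False, False)]
weights = [0.2, 0.3, 0.1, 0.4]

days = random.choices(outcomes, weights=weights, k=100_000)

# Keep only the cloudy days, then count how many of those were rainy.
cloudy_days = [day for day in days if day[1]]
rainy_cloudy_days = [day for day in cloudy_days if day[0]]

estimate = len(rainy_cloudy_days) / len(cloudy_days)
print(f"Estimated P(rain | cloudy): {estimate:.3f}")
```

The estimate should land near the 0.4 computed from the formula, since both divide the "rain and cloudy" cases by the "cloudy" cases.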

Bayes' Theorem

Bayes' theorem is how you flip conditional probabilities. Given P(B|A), it tells you P(A|B): P(A|B) = P(B|A) * P(A) / P(B). It's one of the most important equations in statistics.


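# Disease test: 1% have the disease, the test catches 99% of true cases,
# and 5% of healthy people still test positive
p_disease = 0.01                  # prior (base rate)
p_positive_given_disease = 0.99   # sensitivity
p_positive_given_healthy = 0.05   # false-positive rate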

p_positive = (p_positive_given_disease * p_disease +
              p_positive_given_healthy * (1 - p_disease))

p_disease_given_positive = (p_positive_given_disease * p_disease) / p_positive
print(f"P(disease | positive): {p_disease_given_positive:.4f}")

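Surprise: even with a positive test, there's only about a 17% chance you actually have the disease. Picture 10,000 people: 100 have the disease and 99 of them test positive, while 495 of the 9,900 healthy people also test positive. Only 99 of the 594 positives are real. Base rates matter enormously.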
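
To see how much the base rate drives the answer, the sketch below wraps the same calculation in a function and reruns it for several priors. The function name `posterior` and the extra prior values are illustrative assumptions. The sensitivity (0.99) and false-positive rate (0.05) are the ones used above.

```python
def posterior(prior, p_pos_given_disease, p_pos_given_healthy):
    """Return P(disease | positive test) using Bayes' theorem."""
    # Total probability of a positive test, from true and false positives
    p_pos = p_pos_given_disease * prior + p_pos_given_healthy * (1 - prior)
    return p_pos_given_disease * prior / p_pos


sensitivity = 0.99
false_positive_rate = 0.05

# Same test, different base rates (the 0.01 row matches the example above)
for prior in (0.001, 0.01, 0.1, 0.5):
    result = posterior(prior, sensitivity, false_positive_rate)
    print(f"prior {prior:>5}: P(disease | positive) = {result:.4f}")
```

The test itself never changes here, only the prior does, yet the posterior moves from under 2% to over 95%.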

Key Takeaways

Probability ranges from 0 (impossible) to 1 (certain)
The addition rule handles "or" events; the multiplication rule handles "and" events when they are independent
Conditional probability is the probability of one event given that another has occurred
Bayes' theorem lets you update probabilities with evidence
