Probability Distributions

Normal, binomial, Poisson — understanding data patterns.

A probability distribution describes how probabilities are spread across possible values. Different types of data follow different distributions. Understanding distributions helps you choose the right statistical tests and models.

Normal Distribution

The normal distribution, or bell curve, is one of the most widely used distributions in statistics. Heights, test scores, and measurement errors are often approximately normal. It is defined by two parameters: the mean, which sets the center, and the standard deviation, which sets the spread.


import numpy as np
import matplotlib.pyplot as plt

# Evaluate the standard normal density (mean 0, standard deviation 1) on a grid
x = np.linspace(-4, 4, 1000)
y = (1 / np.sqrt(2 * np.pi)) * np.exp(-x**2 / 2)

plt.plot(x, y, linewidth=2)
plt.fill_between(x, y, alpha=0.3)
plt.title("Standard Normal Distribution")
plt.axvline(0, color="red", linestyle="--", label="Mean")
plt.legend()
plt.show()

For normally distributed data, about 68% of values fall within 1 standard deviation of the mean, about 95% within 2, and about 99.7% within 3. This is the 68-95-99.7 rule, and it is worth memorizing.
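
The rule can be checked numerically from the normal cumulative distribution function. The sketch below uses scipy.stats.norm; the helper name prob_within is made up for this illustration and is not part of SciPy.

```python
from scipy.stats import norm

def prob_within(k, mean=0.0, std=1.0):
    """Probability that a normal variable lands within k standard deviations of its mean."""
    # prob_within is a hypothetical helper for this example, not a SciPy function
    upper = norm.cdf(mean + k * std, loc=mean, scale=std)
    lower = norm.cdf(mean - k * std, loc=mean, scale=std)
    return upper - lower

for k in (1, 2, 3):
    print(f"Within {k} SD: {prob_within(k):.4f}")
# Within 1 SD: 0.6827
# Within 2 SD: 0.9545
# Within 3 SD: 0.9973
```

The result does not depend on the particular mean and standard deviation: prob_within(2, mean=50, std=10) gives the same value as prob_within(2).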

Binomial Distribution

The binomial distribution counts successes in a fixed number of independent trials, each with the same probability of success. Repeated coin flips, yes/no decisions, and pass/fail outcomes can all be modeled this way.


from scipy.stats import binom
import matplotlib.pyplot as plt

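n, p = 10, 0.5  # 10 fair coin flips
x = range(n + 1)  # possible numbers of heads: 0 through 10
probs = binom.pmf(x, n, p)  # probability of each count

plt.bar(x, probs, color="#3498db")
plt.title(f"Binomial(n={n}, p={p})")
plt.xlabel("Number of heads")
plt.ylabel("Probability")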
plt.show()

The pmf function gives you the probability of exactly k successes. Use cdf for cumulative probability (k or fewer successes).
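
To make the difference between pmf and cdf concrete, here is a minimal sketch for 10 fair coin flips. The variable names are illustrative only; sf is SciPy's survival function, which equals 1 minus the cdf.

```python
from scipy.stats import binom

n, p = 10, 0.5  # 10 fair coin flips, as in the plot above

p_exactly_5 = binom.pmf(5, n, p)    # P(X = 5)
p_at_most_5 = binom.cdf(5, n, p)    # P(X <= 5)
p_more_than_5 = binom.sf(5, n, p)   # P(X > 5), the same as 1 - cdf

print(f"P(exactly 5 heads)   = {p_exactly_5:.4f}")    # 0.2461
print(f"P(5 or fewer heads)  = {p_at_most_5:.4f}")    # 0.6230
print(f"P(more than 5 heads) = {p_more_than_5:.4f}")  # 0.3770
```

The last two probabilities add up to 1 because every outcome is either 5 or fewer heads or more than 5 heads.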

Poisson Distribution

The Poisson distribution counts events in a fixed interval when events occur independently at a constant average rate. Customer arrivals per hour, emails per day, and calls per minute are typical examples.


from scipy.stats import poisson
import matplotlib.pyplot as plt

x = range(15)  # event counts 0 through 14
probs = poisson.pmf(x, mu=4)  # SciPy names the rate parameter λ "mu"

plt.bar(x, probs, color="#2ecc71")
plt.title("Poisson Distribution (λ=4)")
plt.xlabel("Events")
plt.ylabel("Probability")
plt.show()

The single parameter λ (lambda) is both the mean and the variance; SciPy calls it mu. If you expect 4 customers per hour on average, the Poisson distribution tells you the probability of getting exactly 0, 1, 2, 3, and so on.
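
The customer example can be worked through directly. The sketch below assumes an average of 4 arrivals per hour, confirms that the mean and variance both equal λ, and computes two probabilities. The variable names are chosen for this illustration.

```python
from scipy.stats import poisson

lam = 4  # assumed average number of customers per hour

# Ask SciPy for the mean ("m") and variance ("v") of the distribution
mean, var = poisson.stats(mu=lam, moments="mv")
print(f"mean = {float(mean):.1f}, variance = {float(var):.1f}")  # mean = 4.0, variance = 4.0

p_exactly_2 = poisson.pmf(2, mu=lam)  # P(exactly 2 customers in an hour)
p_at_most_2 = poisson.cdf(2, mu=lam)  # P(2 or fewer customers in an hour)
print(f"P(exactly 2) = {p_exactly_2:.4f}")  # 0.1465
print(f"P(at most 2) = {p_at_most_2:.4f}")  # 0.2381
```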

Checking Normality

Many statistical tests assume normality. Here's how to check if your data is approximately normal.


from scipy import stats
import numpy as np

data = np.random.normal(50, 10, 200)

stat, p_value = stats.shapiro(data)
print(f"Shapiro-Wilk test p-value: {p_value:.4f}")

if p_value > 0.05:
    print("No significant evidence against normality")
else:
    print("Data does NOT appear normally distributed")

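If the p-value is greater than 0.05, you fail to reject the null hypothesis that the data come from a normal distribution. This does not prove normality; it means the test found no strong evidence against it, so normality is a reasonable working assumption.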
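
A single p-value is easier to trust alongside a second check. One common companion is a Q-Q (probability) plot; the sketch below skips the plot itself and uses only the correlation that scipy.stats.probplot reports between the sorted data and the theoretical normal quantiles. The helper name qq_correlation, the seed, and the skewed comparison sample are assumptions made for this illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily for repeatability
normal_data = rng.normal(50, 10, 200)   # same shape of data as the example above
skewed_data = rng.exponential(10, 200)  # deliberately non-normal comparison sample

def qq_correlation(sample):
    """Correlation between sorted data and normal quantiles (hypothetical helper)."""
    # probplot returns ((theoretical, ordered), (slope, intercept, r))
    (_, _), (_, _, r) = stats.probplot(sample, dist="norm")
    return r

r_normal = qq_correlation(normal_data)
r_skewed = qq_correlation(skewed_data)
print(f"Q-Q correlation, normal sample: {r_normal:.3f}")
print(f"Q-Q correlation, skewed sample: {r_skewed:.3f}")
```

A correlation very close to 1 means the points would lie near a straight line on a Q-Q plot, which is what approximately normal data looks like. The skewed sample should score noticeably lower.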

Key Takeaways

Normal distribution: bell curve, 68-95-99.7 rule
Binomial: counts successes in fixed trials
Poisson: counts random events over a time interval
Always check normality before using tests that assume it
