Probability Distributions
A probability distribution describes how probabilities are spread across possible values. Different types of data follow different distributions. Understanding distributions helps you choose the right statistical tests and models.
Normal Distribution
The bell curve is the most important distribution in statistics. Heights, test scores, measurement errors and many other natural phenomena are approximately normally distributed. It is defined by two parameters: the mean (μ), which sets the center, and the standard deviation (σ), which sets the spread.
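For reference, the density of a normal distribution with mean μ and standard deviation σ is given below. Setting μ = 0 and σ = 1 gives the standard normal density, which is the curve computed in the plotting code that follows.

```latex
f(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
```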
import numpy as np
import matplotlib.pyplot as plt
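# 1,000 evenly spaced points from -4 to 4
x = np.linspace(-4, 4, 1000)
# Density of the standard normal distribution (mean 0, standard deviation 1)
y = (1 / np.sqrt(2 * np.pi)) * np.exp(-x**2 / 2)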
plt.plot(x, y, linewidth=2)
plt.fill_between(x, y, alpha=0.3)
plt.title("Standard Normal Distribution")
plt.axvline(0, color="red", linestyle="--", label="Mean")
plt.legend()
plt.show()
For a normal distribution, about 68% of values fall within 1 standard deviation of the mean, 95% within 2, and 99.7% within 3. This is the 68-95-99.7 rule; memorize it.
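You can check the rule numerically with SciPy's `norm.cdf`. The sketch below uses a mean of 50 and a standard deviation of 10 purely as example values. Because the rule is stated in units of standard deviations, any mean and standard deviation should give the same percentages.

```python
from scipy.stats import norm

# Example parameters (assumed for illustration; any mean/sd works the same way)
mean, sd = 50, 10

coverage = {}
for k in (1, 2, 3):
    lower, upper = mean - k * sd, mean + k * sd
    # P(lower <= X <= upper) = CDF(upper) - CDF(lower)
    coverage[k] = norm.cdf(upper, loc=mean, scale=sd) - norm.cdf(lower, loc=mean, scale=sd)
    print(f"Within {k} SD: {coverage[k]:.4f}")
```

The printed values come out at roughly 0.6827, 0.9545 and 0.9973. The familiar 68%, 95% and 99.7% are these values rounded.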
Binomial Distribution
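The binomial distribution counts the number of successes in a fixed number of independent trials, each with the same probability of success p. Counting heads in coin flips, yes answers to yes/no decisions, or passes among pass/fail outcomes all produce binomial counts.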
from scipy.stats import binom
import matplotlib.pyplot as plt
n, p = 10, 0.5
x = range(n + 1)
probs = binom.pmf(x, n, p)
plt.bar(x, probs, color="#3498db")
plt.title(f"Binomial(n={n}, p={p})")
plt.xlabel("Heads")
plt.ylabel("Probability")
plt.show()
binom.pmf(k, n, p) gives the probability of exactly k successes. Use binom.cdf(k, n, p) for the cumulative probability of k or fewer successes.
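Here is a short sketch for the same n = 10 fair coin flips. It computes the probability of exactly 5 heads and of at most 5 heads. It also recomputes the exact probability from the binomial formula C(n, k) · p^k · (1 − p)^(n − k) using Python's `math.comb`, as a cross-check on what `pmf` returns.

```python
from math import comb
from scipy.stats import binom

n, p = 10, 0.5

# P(X = 5): exactly 5 heads in 10 fair coin flips
exact = binom.pmf(5, n, p)
# P(X <= 5): 5 or fewer heads
at_most = binom.cdf(5, n, p)

# Cross-checks: the binomial formula, and the CDF as a running sum of the PMF
manual = comb(n, 5) * p**5 * (1 - p) ** (n - 5)
summed = sum(binom.pmf(k, n, p) for k in range(6))

print(f"P(X = 5)  = {exact:.4f}")    # 0.2461
print(f"P(X <= 5) = {at_most:.4f}")  # 0.6230
print(f"Formula   = {manual:.4f}")   # 0.2461
print(f"Sum of PMF for k = 0..5: {summed:.4f}")  # 0.6230
```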
Poisson Distribution
The Poisson distribution counts how many events occur in a fixed interval of time (or space) when events happen independently at a constant average rate. Customer arrivals per hour, emails per day and calls per minute are typical examples.
from scipy.stats import poisson
import matplotlib.pyplot as plt
x = range(15)
probs = poisson.pmf(x, mu=4)
plt.bar(x, probs, color="#2ecc71")
plt.title("Poisson Distribution (λ=4)")
plt.xlabel("Events")
plt.ylabel("Probability")
plt.show()
The single parameter λ (lambda) is both the mean and the variance. In SciPy it is passed as mu. If you expect 4 customers per hour, the Poisson distribution gives the probability of getting exactly 0, 1, 2, 3 (and so on) customers in a given hour.
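To make this concrete, the sketch below uses λ = 4 (the customers-per-hour example). It compares `poisson.pmf` with the formula λ^k · e^(−λ) / k! for the first few counts. It then asks SciPy for the mean and variance, both of which should equal λ.

```python
from math import exp, factorial
from scipy.stats import poisson

lam = 4  # expected customers per hour; SciPy's name for this parameter is mu

probs = {}
for k in range(4):
    scipy_prob = poisson.pmf(k, mu=lam)
    # Poisson formula: lambda^k * e^(-lambda) / k!
    formula_prob = lam**k * exp(-lam) / factorial(k)
    probs[k] = (scipy_prob, formula_prob)
    print(f"P(X = {k}) = {scipy_prob:.4f} (formula: {formula_prob:.4f})")

# Mean and variance of a Poisson distribution are both lambda
mean, var = poisson.stats(lam, moments="mv")
print(f"mean = {float(mean)}, variance = {float(var)}")
```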
Checking Normality
Many statistical tests, such as the t-test and ANOVA, assume normally distributed data. The Shapiro-Wilk test checks whether a sample is consistent with a normal distribution.
from scipy import stats
import numpy as np
# Simulated sample: 200 values from a normal distribution with mean 50, SD 10
data = np.random.normal(50, 10, 200)
stat, p_value = stats.shapiro(data)
print(f"Shapiro-Wilk test p-value: {p_value:.4f}")
if p_value > 0.05:
    print("Data appears normally distributed")
else:
    print("Data does NOT appear normally distributed")
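If the p-value is greater than 0.05, you fail to reject the null hypothesis that the data come from a normal distribution. This does not prove the data are normal. It means the test found no significant evidence against normality, so normality is a reasonable working assumption. Keep in mind that with very large samples the test flags even tiny, harmless deviations, while with small samples it can miss real ones.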
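To see the test reject, run it on data that is clearly not normal. The sketch below compares a normal sample with a strongly right-skewed exponential sample. It assumes a fixed random seed for reproducibility and uses `shapiro_check`, an illustrative helper that is not part of SciPy. The skewed sample should produce a p-value far below 0.05.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible

def shapiro_check(sample, alpha=0.05):
    """Hypothetical helper: return the Shapiro-Wilk p-value and whether
    normality remains a plausible assumption at significance level alpha."""
    _, p_value = stats.shapiro(sample)
    return p_value, p_value > alpha

normal_data = rng.normal(50, 10, 200)   # drawn from a normal distribution
skewed_data = rng.exponential(10, 200)  # right-skewed, clearly not normal

for name, sample in [("normal", normal_data), ("skewed", skewed_data)]:
    p_value, plausible = shapiro_check(sample)
    print(f"{name}: p = {p_value:.4g}, normality plausible: {bool(plausible)}")
```

Even genuinely normal data will be rejected about 5% of the time at alpha = 0.05. A single p-value is one piece of evidence, not a verdict.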
Key Takeaways
- Normal distribution: bell curve, 68-95-99.7 rule
- Binomial: counts successes in fixed trials
- Poisson: counts random events over a time interval
- Always check normality before using tests that assume it