Hypothesis Testing

A/B testing, p-values, and statistical significance.

Hypothesis testing is how you make decisions with data. Instead of guessing, you formalize your assumptions and let the data tell you if they're reasonable. It's the backbone of scientific analysis and A/B testing.

The Framework

Every hypothesis test starts with two competing statements: the null hypothesis (H0), which says there is no effect or difference, and the alternative hypothesis (H1), which says there is one. You assume the null is true and check whether the data provide enough evidence to reject it.


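import numpy as np
from scipy import stats

# Fix the seed so the simulated samples are reproducible
np.random.seed(42)

# Simulate 50 observations per group; the treatment mean is 5 units higher
control = np.random.normal(100, 15, 50)
treatment = np.random.normal(105, 15, 50)

# Independent two-sample t-test; H0: both groups have the same mean
t_stat, p_value = stats.ttest_ind(control, treatment)
print(f"t-statistic: {t_stat:.4f}")
print(f"p-value: {p_value:.4f}")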

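The p-value is the probability of seeing data at least as extreme as what you observed, assuming the null hypothesis is true. It is not the probability that the null hypothesis is true. A small p-value (by convention, below 0.05) means data like yours would be unlikely under the null, so you reject it.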
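
To make that definition concrete, here is a minimal sketch that estimates a p-value by brute force with a permutation test. This is a swapped-in technique for illustration, not what `ttest_ind` does internally. The seed, the number of shuffles (`n_perm`), and the variable names are assumptions chosen for the example.

```python
import numpy as np
from scipy import stats

# Assumption: seeded generator so the sketch is reproducible
rng = np.random.default_rng(0)
control = rng.normal(100, 15, 50)
treatment = rng.normal(105, 15, 50)

# Observed difference in group means
observed = treatment.mean() - control.mean()

# Under H0 the group labels are interchangeable, so shuffle them
# many times and record the difference in means each time
pooled = np.concatenate([control, treatment])
n_perm = 10_000  # assumption: enough shuffles for a stable estimate
perm_diffs = np.empty(n_perm)
for i in range(n_perm):
    shuffled = rng.permutation(pooled)
    perm_diffs[i] = shuffled[50:].mean() - shuffled[:50].mean()

# Two-sided p-value: share of shuffles at least as extreme as the observed gap
perm_p = np.mean(np.abs(perm_diffs) >= abs(observed))

# Compare with the t-test on the same samples
t_stat, t_p = stats.ttest_ind(control, treatment)
print(f"Permutation p-value: {perm_p:.4f}")
print(f"t-test p-value:      {t_p:.4f}")
```

The two p-values should land close to each other, because both answer the same question: how often would a gap this large appear if the group labels did not matter?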

Types of Tests

Different scenarios call for different tests. The t-test is the most common, but there are others you should know about.


from scipy import stats
import numpy as np

# One-sample t-test: is the sample mean consistent with a reference value of 52?
data = np.random.normal(50, 10, 100)
stat, p_value = stats.ttest_1samp(data, popmean=52)
print(f"One-sample t-test p-value: {p_value:.4f}")

# Paired t-test: the same 30 subjects measured before and after
before = np.random.normal(45, 8, 30)
after = before + np.random.normal(5, 3, 30)
stat, p_value = stats.ttest_rel(before, after)
print(f"Paired t-test p-value: {p_value:.4f}")

Use the one-sample test to compare a sample to a known value. Use the paired test when you have before-and-after measurements on the same subjects.
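
To see why the choice of test matters, the sketch below runs both a paired and an independent t-test on the same before-and-after data. The seed and variable names are assumptions for illustration. The independent test is deliberately the wrong choice here, since it ignores that each "after" value belongs to the same subject as a "before" value.

```python
import numpy as np
from scipy import stats

# Assumption: seeded generator so the sketch is reproducible
rng = np.random.default_rng(1)

# Same 30 subjects measured twice; each improves by about 5 units
before = rng.normal(45, 8, 30)
after = before + rng.normal(5, 3, 30)

# Paired test: works on the per-subject differences
_, p_paired = stats.ttest_rel(before, after)

# Independent test: treats the two columns as unrelated groups
_, p_indep = stats.ttest_ind(before, after)

print(f"Paired t-test p-value:      {p_paired:.6f}")
print(f"Independent t-test p-value: {p_indep:.6f}")
```

The paired test removes the variation between subjects, so it typically gives a much smaller p-value for the same data. Using the independent test on paired measurements throws that information away.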

Common Mistakes

Hypothesis testing is powerful but easy to misuse. Here are the most common pitfalls.


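from scipy import stats
import numpy as np

# Fix the seed so the result is reproducible
np.random.seed(42)
group_a = np.random.normal(50, 10, 30)
group_b = np.random.normal(52, 10, 30)

# Independent two-sample t-test; H0: both groups have the same mean
stat, p_value = stats.ttest_ind(group_a, group_b)
alpha = 0.05  # significance level, chosen before looking at the data

if p_value < alpha:
    print("Reject null hypothesis")
else:
    print("Fail to reject null hypothesis")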

Remember: failing to reject the null does NOT prove it's true. It just means you don't have enough evidence against it. Also, a significant p-value does not mean the effect is practically important.
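
The gap between statistical and practical significance is easy to demonstrate. The sketch below uses a very large sample with a tiny true difference and reports Cohen's d alongside the p-value. Cohen's d is one common effect-size measure, introduced here as an assumption for illustration. The sample size, means, and seed are also illustrative choices.

```python
import numpy as np
from scipy import stats

# Assumption: seeded generator so the sketch is reproducible
rng = np.random.default_rng(2)

# Very large groups whose true means differ by only 0.3 units
n = 100_000
group_a = rng.normal(50.0, 10, n)
group_b = rng.normal(50.3, 10, n)

_, p_value = stats.ttest_ind(group_a, group_b)

# Cohen's d: difference in means divided by the pooled standard deviation
# (this simple pooling assumes equal group sizes)
pooled_sd = np.sqrt((group_a.var(ddof=1) + group_b.var(ddof=1)) / 2)
cohens_d = (group_b.mean() - group_a.mean()) / pooled_sd

print(f"p-value:   {p_value:.2e}")
print(f"Cohen's d: {cohens_d:.3f}")
```

With this much data the p-value comes out far below 0.05, yet the effect size stays tiny. Whether a difference that small matters is a question about your application, not about the test.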

Key Takeaways

The null hypothesis assumes no effect or difference
A p-value below your chosen significance level (commonly 0.05) is conventionally treated as statistically significant
Choose the right test for your data type and experimental design
Statistical significance does not always mean practical significance
