Linear Regression

Predicting continuous values with a straight line.

The Simplest ML Algorithm

Linear Regression is the bread and butter of machine learning. It models the relationship between variables by fitting a straight line (or, with several features, a hyperplane) to the data. Despite its simplicity, it is widely used in fields ranging from economics to engineering to healthcare.

The Math Behind It

The equation for simple linear regression is:


  y = mx + b

  Where:
    y = predicted value (dependent variable)
    x = input feature (independent variable)
    m = slope (weight) — how much y changes for each unit of x
    b = intercept (bias) — the value of y when x = 0

  Multiple Linear Regression (more features):
  y = m₁x₁ + m₂x₂ + m₃x₃ + ... + b
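Once the weights and intercept are known, making a prediction is just the weighted sum above. Here is a minimal Python sketch, assuming inputs arrive as plain lists of floats. The function names `predict_simple` and `predict_multiple` are hypothetical and chosen only for illustration.

```python
def predict_simple(x: float, m: float, b: float) -> float:
    """Simple linear regression: y = m*x + b."""
    return m * x + b


def predict_multiple(xs: list[float], ms: list[float], b: float) -> float:
    """Multiple linear regression: y = m1*x1 + m2*x2 + ... + b."""
    if len(xs) != len(ms):
        raise ValueError("need exactly one weight per feature")
    # Weighted sum of the features, then add the intercept (bias).
    return sum(m_i * x_i for m_i, x_i in zip(ms, xs)) + b


# Slope 2, intercept 1, input 3: 2*3 + 1
print(predict_simple(3.0, m=2.0, b=1.0))  # prints 7.0
# Two features: 2*1 + 0.5*4 + 1
print(predict_multiple([1.0, 4.0], [2.0, 0.5], b=1.0))  # prints 5.0
```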

Finding the Best Line

The algorithm finds the line that minimizes the sum of squared errors (SSE). This is the sum, over all data points, of the squared vertical distance between each point and the line. The approach is called Ordinary Least Squares (OLS).


  Data points and best-fit line:

  Price ($)
    │         ×
    │       ×   ×
    │     ×  ╱──×
    │   ×  ╱×
    │  ×╱×
    │ ×╱
    │╱×
    └──────────────── Square Footage

  The line minimizes the sum of squared
  vertical distances (residuals) from
  each point to the line.
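For a single feature, OLS has a well-known closed-form solution. The slope is m = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)², which is the covariance of x and y divided by the variance of x. The intercept is b = ȳ − m·x̄. Below is a minimal pure-Python sketch of that formula, together with the SSE it minimizes. The helper names `fit_ols` and `sse` are hypothetical, and the square-footage and price numbers are made up for illustration.

```python
def fit_ols(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Return (m, b) minimizing the sum of squared vertical residuals."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Numerator: how x and y vary together; denominator: how x varies alone.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    m = sxy / sxx
    b = y_mean - m * x_mean
    return m, b


def sse(xs: list[float], ys: list[float], m: float, b: float) -> float:
    """Sum of squared errors: sum of (actual - predicted) ** 2."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


# Made-up data: square footage (hundreds) vs. price (thousands of $).
sqft = [10.0, 15.0, 20.0, 25.0, 30.0]
price = [200.0, 270.0, 310.0, 390.0, 430.0]
m, b = fit_ols(sqft, price)
print(f"slope={m:.2f}, intercept={b:.2f}, SSE={sse(sqft, price, m, b):.2f}")
# prints slope=11.60, intercept=88.00, SSE=360.00
```

The `sxx == 0` check guards against the case where every point has the same x, so no unique slope exists.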

Assumptions of Linear Regression

Linearity — The relationship between features and target is linear
Independence — Observations are independent of each other
Homoscedasticity — Constant variance of errors
Normality — Errors are normally distributed
No multicollinearity — Features aren't highly correlated with each other
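Some of these assumptions can be checked numerically. As one rough illustration, the sketch below computes the Pearson correlation between two features, which is a common first look at multicollinearity. Values close to +1 or −1 suggest the two features carry largely overlapping information. The function name `pearson_r`, the feature values, and the 0.9 cut-off are assumptions for illustration, not a fixed rule.

```python
import math


def pearson_r(a: list[float], b: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length feature columns."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need two equal-length columns with at least two values")
    a_mean = sum(a) / n
    b_mean = sum(b) / n
    cov = sum((x - a_mean) * (y - b_mean) for x, y in zip(a, b))
    var_a = sum((x - a_mean) ** 2 for x in a)
    var_b = sum((y - b_mean) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a constant column has no defined correlation")
    return cov / math.sqrt(var_a * var_b)


# Hypothetical features: square footage and number of rooms tend to move together.
sqft = [800.0, 1200.0, 1500.0, 2000.0, 2600.0]
rooms = [2.0, 3.0, 3.0, 4.0, 5.0]
r = pearson_r(sqft, rooms)
# Illustrative threshold only; what counts as "too correlated" depends on the problem.
if abs(r) > 0.9:
    print(f"r = {r:.3f}: these features may be highly collinear")
```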

Pros and Cons

Pros: Simple, interpretable, fast to train, works well when the relationship is actually linear.

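Cons: Can't capture non-linear relationships on its own. It is also sensitive to outliers, because squaring the residuals gives large errors extra weight. If your data has curves, you can transform the features (polynomial regression, which is still linear in its weights) or switch to a different algorithm entirely.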
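Polynomial regression is still linear regression: the model stays linear in its weights, and only the input is transformed. The following minimal sketch assumes a single feature and a purely quadratic trend. It applies the same simple OLS formula to z = x² instead of x. The helper name `fit_ols` and the data are hypothetical.

```python
def fit_ols(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Closed-form simple OLS: returns (slope, intercept)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    m = sxy / sxx
    return m, y_mean - m * x_mean


# Curved (made-up) data following y = 3x^2 + 2.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [3 * x**2 + 2 for x in xs]

# A straight line on raw x misses the curve completely (flat line at the mean).
m_raw, b_raw = fit_ols(xs, ys)
print(f"raw x:   y = {m_raw:.2f}x + {b_raw:.2f}")  # prints raw x:   y = 0.00x + 8.00

# The same OLS on the transformed feature z = x^2 recovers the true curve.
zs = [x**2 for x in xs]
m_sq, b_sq = fit_ols(zs, ys)
print(f"z = x^2: y = {m_sq:.2f}z + {b_sq:.2f}")  # prints z = x^2: y = 3.00z + 2.00
```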

🧪 Quick Quiz

What type of problem does Linear Regression solve?
