The Simplest ML Algorithm
Linear regression is the bread and butter of machine learning. It models the relationship between variables by fitting a straight line (or hyperplane) to the data. Despite its simplicity, it's used everywhere, from economics to engineering to healthcare.
The Math Behind It
The equation for simple linear regression is:
y = mx + b
Where:
y = predicted value (dependent variable)
x = input feature (independent variable)
m = slope (weight): how much y changes for each unit increase in x
b = intercept (bias): the value of y when x = 0
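As a small illustration of the equation, the sketch below computes predictions for two inputs. The slope and intercept values are made up for the example, and `predict` is a hypothetical helper name, not part of any library.

```python
def predict(x, m, b):
    """Return the predicted value y = m*x + b for a single input x."""
    return m * x + b

# Hypothetical parameters: each unit of x adds 2.0 to y, and y is 1.0 when x = 0.
m, b = 2.0, 1.0

y_at_zero = predict(0.0, m, b)   # equals the intercept b
y_at_three = predict(3.0, m, b)  # 2.0 * 3.0 + 1.0
```

Moving x from 3 to 4 changes the prediction by exactly m, which is what the slope means.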
Multiple Linear Regression (more features):
y = m₁x₁ + m₂x₂ + m₃x₃ + ... + b
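With several features, the same prediction can be written as a dot product between a weight vector and a feature vector. The following is a minimal sketch using NumPy; the weights, bias, and feature values are invented for illustration.

```python
import numpy as np

# Hypothetical weights for three features, plus a bias term.
weights = np.array([0.5, -1.0, 2.0])   # m1, m2, m3
bias = 4.0                             # b

# Each row is one observation with features x1, x2, x3.
X = np.array([
    [1.0, 2.0, 3.0],
    [0.0, 0.0, 0.0],
])

# The matrix-vector product computes m1*x1 + m2*x2 + m3*x3 for every row at once.
y_pred = X @ weights + bias
```

The second row is all zeros, so its prediction is just the bias, matching the role of b in the single-feature case.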
Finding the Best Line
The algorithm finds the line that minimizes the sum of squared errors (SSE), where each error is the vertical distance between a data point and the line. This method is called Ordinary Least Squares (OLS).
[Figure: data points and best-fit line, with Price ($) on the vertical axis and Square Footage on the horizontal axis. The line minimizes the sum of squared vertical distances (residuals) from each point to the line.]
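For a single feature, the least-squares line has a well-known closed-form solution. The sketch below applies it to made-up square-footage and price data, then cross-checks the result against NumPy's general solver `np.linalg.lstsq`. The data values are assumptions chosen only to make the example concrete.

```python
import numpy as np

# Made-up data: square footage (x) and price in thousands of dollars (y).
x = np.array([800.0, 1000.0, 1200.0, 1500.0, 1800.0])
y = np.array([150.0, 180.0, 210.0, 250.0, 300.0])

# Closed-form OLS for one feature:
#   m = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
#   b = mean(y) - m * mean(x)
x_mean, y_mean = x.mean(), y.mean()
m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
b = y_mean - m * x_mean

# Residuals are the vertical distances from each point to the fitted line.
residuals = y - (m * x + b)
sse = np.sum(residuals ** 2)

# Cross-check against NumPy's general least-squares solver.
A = np.column_stack([x, np.ones_like(x)])
(m_ls, b_ls), *_ = np.linalg.lstsq(A, y, rcond=None)
```

Both routes should agree on the slope and intercept up to floating-point error, and nudging either parameter away from the fitted value increases the SSE.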
Assumptions of Linear Regression
- Linearity: The relationship between features and target is linear
- Independence: Observations are independent of each other
- Homoscedasticity: Constant variance of errors
- Normality: Errors are normally distributed
- No multicollinearity: Features aren't highly correlated with each other
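One of these assumptions, no multicollinearity, is easy to screen for with a correlation matrix. The sketch below is a rough check on synthetic data, not a full diagnostic: the 0.9 cutoff is an arbitrary choice for illustration, and pairwise correlation can miss collinearity that involves three or more features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic features: x2 is nearly a copy of x1, x3 is unrelated.
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.05, size=200)
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

# Pairwise Pearson correlations between feature columns.
corr = np.corrcoef(X, rowvar=False)

# Flag feature pairs whose absolute correlation exceeds the (arbitrary) cutoff.
threshold = 0.9
flagged = [
    (i, j)
    for i in range(corr.shape[0])
    for j in range(i + 1, corr.shape[1])
    if abs(corr[i, j]) > threshold
]
```

Here only the pair of column indices (0, 1) is flagged, since x2 was built from x1.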
Pros and Cons
Pros: Simple, interpretable, fast to train, works well when the relationship is actually linear.
Cons: Can't capture complex non-linear relationships and is sensitive to outliers. If your data has curves, you'll need polynomial regression or a different algorithm entirely.
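To see the limitation on curved data, the sketch below fits both a straight line and a degree-2 polynomial to synthetic quadratic data using `np.polyfit`, then compares their sums of squared errors. The data is noise-free and invented for the example; `sse` is a hypothetical helper.

```python
import numpy as np

# Synthetic curved data: y follows a quadratic in x, with no noise.
x = np.linspace(-3.0, 3.0, 50)
y = 1.0 + 2.0 * x + 0.5 * x ** 2

def sse(coeffs):
    """Sum of squared errors for a polynomial with the given coefficients."""
    return np.sum((y - np.polyval(coeffs, x)) ** 2)

line_coeffs = np.polyfit(x, y, deg=1)   # straight line
quad_coeffs = np.polyfit(x, y, deg=2)   # degree-2 polynomial

sse_line = sse(line_coeffs)
sse_quad = sse(quad_coeffs)
```

The straight line leaves a large error because it cannot bend, while the degree-2 fit recovers the underlying coefficients almost exactly. Polynomial regression is still linear in its coefficients, so it is solved with the same least-squares machinery.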