Logistic Regression

Binary classification with probability.

Despite its name, logistic regression is a classification algorithm. It predicts the probability that a sample belongs to a particular category. Think of it as linear regression whose output is passed through a function that maps it to a probability between 0 and 1.

How It Works

Logistic regression applies the sigmoid function to a linear combination of the features (a weighted sum plus a bias term). This squashes the output to a probability between 0 and 1. By default, if the probability is above 0.5, the sample is classified as positive; otherwise, negative.
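To make the sigmoid step concrete, here is a minimal sketch of the computation for a single sample. The weights, bias, and feature values are hypothetical numbers chosen only for illustration; a trained model would learn its own.

```python
import numpy as np

def sigmoid(z):
    """Map any real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights and bias, chosen only for illustration
weights = np.array([1.5, -0.5])
bias = 0.25

x = np.array([2.0, 1.0])        # one sample with two features
z = np.dot(weights, x) + bias   # linear combination: 3.0 - 0.5 + 0.25 = 2.75
p = sigmoid(z)                  # probability of the positive class
label = int(p > 0.5)            # apply the 0.5 threshold

print(f"z = {z:.2f}, probability = {p:.3f}, predicted class = {label}")
```

A linear output of exactly 0 maps to a probability of 0.5, which is why the 0.5 threshold corresponds to the sign of the linear combination.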


from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import numpy as np

# Synthetic data: 200 points with 2 features; class 1 when the features sum to more than 0
np.random.seed(42)
X = np.random.randn(200, 2)
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Hold out 25% of the data for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
model = LogisticRegression()
model.fit(X_train, y_train)

# Predict class labels and measure the fraction that are correct
preds = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, preds):.4f}")

The model learns a decision boundary that separates the classes. For two features, this boundary is a straight line. The accuracy score is the fraction of predictions that were correct.
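The straight-line boundary can be read off a fitted model. The sketch below assumes the same kind of synthetic data as the example above, fits on all 200 points for simplicity, and uses scikit-learn's `coef_` and `intercept_` attributes; the names `w`, `b`, and `manual_preds` are just illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data: class 1 when the two features sum to more than 0
rng = np.random.RandomState(42)
X = rng.randn(200, 2)
y = (X[:, 0] + X[:, 1] > 0).astype(int)

model = LogisticRegression()
model.fit(X, y)

w = model.coef_[0]        # one learned weight per feature
b = model.intercept_[0]   # learned bias term

# The boundary is the set of points where w[0]*x0 + w[1]*x1 + b = 0
print(f"Boundary: {w[0]:.3f}*x0 + {w[1]:.3f}*x1 + {b:.3f} = 0")

# Points on the positive side of the line are predicted as class 1
manual_preds = (X @ w + b > 0).astype(int)
print("Matches model.predict:", np.array_equal(manual_preds, model.predict(X)))
```

Because the labels here depend on the sum of the two features, both learned weights should come out positive and of similar size.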

Probability Predictions

One of logistic regression's strengths is that it outputs probabilities, not just class labels. These probabilities are often reasonably well calibrated, although that is not guaranteed for every dataset.


from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
import numpy as np

# Synthetic data: class 1 when the two features sum to more than 0
np.random.seed(42)
X = np.random.randn(200, 2)
y = (X[:, 0] + X[:, 1] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
model = LogisticRegression()
model.fit(X_train, y_train)

# Each row holds the probabilities for class 0 and class 1
probs = model.predict_proba(X_test)
print("First 5 probabilities:")
for p in probs[:5]:
    print(f"  Class 0: {p[0]:.3f}, Class 1: {p[1]:.3f}")

The predict_proba method returns an array with one row per sample and one column per class, and each row sums to 1. This is useful when you need to rank predictions or set custom thresholds.
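As a rough sketch of both uses, the snippet below applies a stricter cutoff and ranks the test samples by confidence. The threshold of 0.8 is an arbitrary value picked for illustration, and the variable names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

np.random.seed(42)
X = np.random.randn(200, 2)
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

model = LogisticRegression().fit(X_train, y_train)
probs = model.predict_proba(X_test)   # columns follow the order of model.classes_
pos_probs = probs[:, 1]               # probability of class 1

# Custom threshold: only call a sample positive when the model is fairly sure
threshold = 0.8  # hypothetical cutoff, not a recommended value
strict_preds = (pos_probs >= threshold).astype(int)
default_preds = model.predict(X_test)

print("Positives at the default threshold:", default_preds.sum())
print(f"Positives at threshold {threshold}:", strict_preds.sum())

# Ranking: test-set indices sorted from most to least likely positive
ranked = np.argsort(-pos_probs)
print("Three most confident positives (test indices):", ranked[:3])
```

Raising the threshold can only reduce the number of positive predictions, which typically trades recall for precision.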

Evaluating Classification

Accuracy alone can be misleading, especially with imbalanced classes. You need to look at precision, recall, and the confusion matrix.


from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report
import numpy as np

# Synthetic data: class 1 when the two features sum to more than 0
np.random.seed(42)
X = np.random.randn(200, 2)
y = (X[:, 0] + X[:, 1] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
model = LogisticRegression()
model.fit(X_train, y_train)
preds = model.predict(X_test)

# Rows are true classes, columns are predicted classes
print(confusion_matrix(y_test, preds))
# Per-class precision, recall, F1-score, and support
print(classification_report(y_test, preds))

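The confusion matrix shows true positives, false positives, true negatives, and false negatives. In scikit-learn, rows are true classes and columns are predicted classes, so for labels 0 and 1 the matrix reads [[TN, FP], [FN, TP]]. The classification report adds precision (of the samples predicted positive, the fraction that really are positive) and recall (of the samples that really are positive, the fraction the model found).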
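To connect the two outputs, precision and recall can be computed by hand from the four confusion-matrix counts. This is a small sketch on the same synthetic data; it cross-checks the manual values against scikit-learn's `precision_score` and `recall_score`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, precision_score, recall_score

np.random.seed(42)
X = np.random.randn(200, 2)
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

model = LogisticRegression().fit(X_train, y_train)
preds = model.predict(X_test)

# For labels [0, 1] the flattened matrix is ordered TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_test, preds, labels=[0, 1]).ravel()

precision = tp / (tp + fp)   # correct positives among predicted positives
recall = tp / (tp + fn)      # correct positives among actual positives

print(f"Manual precision:  {precision:.3f}, recall: {recall:.3f}")
print(f"sklearn precision: {precision_score(y_test, preds):.3f}, "
      f"recall: {recall_score(y_test, preds):.3f}")
```

The manual formulas divide by zero if the model predicts no positives or the test set contains none, so real code would need to guard against those cases.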

Key Takeaways

Logistic regression predicts probabilities for binary classification
The sigmoid function squashes linear output to 0-1 range
Use predict_proba for probability-based decisions
Evaluate with precision, recall, and confusion matrix — not just accuracy
