Introduction to Machine Learning
Machine learning is where data science gets really exciting. Instead of writing explicit rules, you let algorithms learn patterns from data. It's the technology behind recommendation engines, self-driving cars, and voice assistants.
What is Machine Learning?
Machine learning is software that improves with experience. You feed it data, it finds patterns, and then it makes predictions on new data it has never seen before. Think of it like teaching a child: show examples, let them practice, and they learn the pattern.
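from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
import numpy as np
# Five toy samples: one feature per row in X, one target value per row in y
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 5, 4, 5])
# Hold out two samples for testing: R^2 is not well-defined for a single test sample
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=42)
model = LinearRegression()
model.fit(X_train, y_train)
# score() returns R^2 computed on the held-out samples
print(f"Score: {model.score(X_test, y_test):.4f}")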
This is the basic ML workflow: split data into training and test sets, fit a model on training data, and evaluate on test data. Simple but powerful.
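To see what "learning a pattern" means here, it can help to look inside the fitted model. The sketch below reuses the same five points, fits on all of them purely for inspection (so there is no test set in this snippet), and then predicts for x = 6, an illustrative input that is not in the data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same toy data as the example above
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 5, 4, 5])

model = LinearRegression()
model.fit(X, y)  # fit on all five points, only to inspect the learned line

slope = model.coef_[0]
intercept = model.intercept_
print(f"Learned line: y = {slope:.2f} * x + {intercept:.2f}")  # y = 0.60 * x + 2.20

# Predict for an input the model has never seen (x = 6 is a made-up value)
X_new = np.array([[6]])
prediction = model.predict(X_new)[0]
print(f"Prediction for x=6: {prediction:.2f}")  # 5.80
```

The "pattern" a linear regression learns is just a slope and an intercept. Every prediction comes from plugging a new x into that line.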
Types of Learning
Machine learning has three main flavors, each suited for different problems.
from sklearn.cluster import KMeans
import numpy as np
X = np.array([[1, 2], [1.5, 1.8], [5, 8], [8, 8], [1, 0.6], [9, 11]])
kmeans = KMeans(n_clusters=2, random_state=42, n_init=10)
kmeans.fit(X)
print(f"Labels: {kmeans.labels_}")
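Supervised learning trains on labeled data, where the correct answer is known for every example. Unsupervised learning finds hidden patterns in data that has no labels; the KMeans example above groups six points into two clusters without being told which point belongs where. Reinforcement learning learns through trial and error, guided by rewards.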
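Reinforcement learning is the hardest of the three to picture, so here is a minimal sketch of the trial-and-error idea. It uses an epsilon-greedy agent on a made-up three-action problem, which is one of the simplest reinforcement learning setups and not a full algorithm. The reward values, the `epsilon` setting, and all variable names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical environment: three actions, each with a hidden average reward
true_means = np.array([0.2, 0.5, 0.8])
n_actions = len(true_means)

estimates = np.zeros(n_actions)          # agent's running estimate per action
counts = np.zeros(n_actions, dtype=int)  # how often each action was tried
epsilon = 0.1                            # share of steps spent exploring at random

for step in range(2000):
    if rng.random() < epsilon:
        action = int(rng.integers(n_actions))  # explore: pick any action
    else:
        action = int(np.argmax(estimates))     # exploit: pick the best guess so far
    # Noisy reward drawn around the action's hidden average
    reward = rng.normal(true_means[action], 0.1)
    counts[action] += 1
    # Incremental update of the running mean reward for this action
    estimates[action] += (reward - estimates[action]) / counts[action]

print("Estimated rewards:", np.round(estimates, 2))
print("Most-chosen action:", int(np.argmax(counts)))
```

The agent is never told which action is best. It only sees the rewards from its own choices, and its estimates improve as it tries more actions.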
The ML Workflow
Every ML project follows a similar pattern. Understanding this workflow helps you approach any problem systematically.
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
import numpy as np
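np.random.seed(42)  # fix the seed so the synthetic data is reproducible
X = np.random.randn(200, 4)
# Label is 1 when the first two features sum to a positive number
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)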
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
preds = clf.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, preds):.4f}")
The key steps: collect data, clean it, engineer features, choose a model, train it, evaluate it, and tune it. Repeat until satisfied.
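The steps listed above can be sketched in a single scikit-learn pipeline. This is one possible arrangement, not the only one. The data is synthetic, the missing values are injected artificially to give the cleaning step something to do, and the choice of `LogisticRegression` and the `C` values to search are assumptions for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Collect: synthetic stand-in data (200 samples, 4 features)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Simulate messy data: blank out roughly 5% of the values
X[rng.random(X.shape) < 0.05] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # clean: fill missing values
    ("scale", StandardScaler()),                   # engineer: put features on one scale
    ("model", LogisticRegression()),               # choose a model
])

# Tune: try several regularization strengths with 5-fold cross-validation
search = GridSearchCV(pipeline, param_grid={"model__C": [0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)  # train

# Evaluate on data the search never saw
test_accuracy = search.score(X_test, y_test)
print(f"Best C: {search.best_params_['model__C']}")
print(f"Test accuracy: {test_accuracy:.4f}")
```

Putting the cleaning and scaling steps inside the pipeline means they are fitted on training data only, so nothing from the test set leaks into training.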
Key Takeaways
- Machine learning finds patterns in data to make predictions
- Supervised learning uses labeled data for training
- Unsupervised learning discovers hidden structure in data
- The train-test split does not prevent overfitting, but it reveals it by measuring performance on data the model has never seen
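The last takeaway is easy to check in code. In this sketch, a decision tree with no depth limit memorizes noisy training data, and the held-out test set exposes the gap. The data is synthetic, and the 20% label noise is an assumption chosen to make the effect visible.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic data: the label depends on the first two features
X = rng.normal(size=(300, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Flip roughly 20% of the labels so there is noise to memorize
flip = rng.random(300) < 0.2
y[flip] = 1 - y[flip]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# No depth limit: the tree keeps splitting until it fits every training point
tree = DecisionTreeClassifier(random_state=0)
tree.fit(X_train, y_train)

train_accuracy = tree.score(X_train, y_train)
test_accuracy = tree.score(X_test, y_test)
print(f"Train accuracy: {train_accuracy:.4f}")
print(f"Test accuracy:  {test_accuracy:.4f}")
```

A training score far above the test score is the usual sign of overfitting. Without the held-out set, the perfect training score would look like success.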