Merge & Join
Let me give you the tools to combine multiple DataFrames. This is like SQL joins but in Python — and honestly, once you get the hang of it, it's way easier.
Basic Merge
Merge two DataFrames on a common column:
import pandas as pd
employees = pd.DataFrame({
'Name': ['Alice', 'Bob', 'Charlie'],
'Dept_ID': [1, 2, 1]
})
departments = pd.DataFrame({
'Dept_ID': [1, 2, 3],
'Dept_Name': ['Sales', 'HR', 'Engineering']
})
result = pd.merge(employees, departments, on='Dept_ID')
print(result)
This matches rows where Dept_ID is the same. It's like VLOOKUP in Excel, but more powerful.
Different Join Types
Here's where it gets interesting. You can control which rows to keep:
result_left = pd.merge(employees, departments, on='Dept_ID', how='left')
print(result_left)
`how='left'` keeps all rows from the left DataFrame (employees). `how='right'` keeps all from the right. `how='outer'` keeps everything. `how='inner'` (default) keeps only matches.
Merge on Different Column Names
When column names don't match:
employees = pd.DataFrame({
'Name': ['Alice', 'Bob'],
'department_id': [1, 2]
})
result = pd.merge(employees, departments,
left_on='department_id',
right_on='Dept_ID')
print(result)
One thing that confused me at first was the `left_on` and `right_on` parameters. Think of it as specifying which column from each DataFrame to match on.
Try it Yourself →Key Takeaways
- `pd.merge()` combines DataFrames on common columns
- `on` specifies the column to merge on
- `how` controls join type: left, right, inner, outer
- Use `left_on` and `right_on` when column names differ