Labs ICT
Pro Login

Merge & Join

Combining multiple DataFrames together.

Merge & Join

Let me give you the tools to combine multiple DataFrames. This is like SQL joins but in Python — and honestly, once you get the hang of it, it's way easier.

Basic Merge

Merge two DataFrames on a common column:


import pandas as pd

employees = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Dept_ID': [1, 2, 1]
})

departments = pd.DataFrame({
    'Dept_ID': [1, 2, 3],
    'Dept_Name': ['Sales', 'HR', 'Engineering']
})

result = pd.merge(employees, departments, on='Dept_ID')
print(result)
    

This matches rows where Dept_ID is the same. It's like VLOOKUP in Excel, but more powerful.

Different Join Types

Here's where it gets interesting. You can control which rows to keep:


result_left = pd.merge(employees, departments, on='Dept_ID', how='left')
print(result_left)
    

`how='left'` keeps all rows from the left DataFrame (employees). `how='right'` keeps all from the right. `how='outer'` keeps everything. `how='inner'` (default) keeps only matches.

Merge on Different Column Names

When column names don't match:


employees = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'department_id': [1, 2]
})

result = pd.merge(employees, departments,
                  left_on='department_id',
                  right_on='Dept_ID')
print(result)
    

One thing that confused me at first was the `left_on` and `right_on` parameters. Think of it as specifying which column from each DataFrame to match on.

Try it Yourself →

Key Takeaways

  • `pd.merge()` combines DataFrames on common columns
  • `on` specifies the column to merge on
  • `how` controls join type: left, right, inner, outer
  • Use `left_on` and `right_on` when column names differ