Labs ICT
Pro Login

Data Transformation

Merging, grouping, pivoting — reshaping your data.

Data Transformation

Raw data rarely comes in the format you need. Data transformation is about reshaping, aggregating, and combining datasets to prepare them for analysis. Think of it as sculpting — you start with a rough block and carve it into something useful.

Merging Datasets

In the real world, data is spread across multiple tables. Merging (or joining) combines them based on common columns. It's just like SQL joins, but in Python.


import pandas as pd

customers = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Alice", "Bob", "Charlie"]
})

orders = pd.DataFrame({
    "customer_id": [1, 2, 2, 3],
    "product": ["A", "B", "A", "C"]
})

merged = pd.merge(customers, orders, left_on="id", right_on="customer_id")
print(merged)
    

Use left_on and right_on when column names differ. Use how to control the join type: "inner", "left", "right", or "outer".

Try it Yourself →

Grouping and Aggregating

Grouping is one of the most powerful operations in pandas. It splits your data into groups, applies a function to each, and combines the results.


import pandas as pd

sales = pd.DataFrame({
    "region": ["East", "West", "East", "West", "East"],
    "product": ["A", "A", "B", "B", "A"],
    "revenue": [100, 150, 200, 120, 180]
})

result = sales.groupby("region")["revenue"].sum()
print(result)

result2 = sales.groupby(["region", "product"])["revenue"].mean()
print(result2)
    

You can group by multiple columns and apply different aggregation functions. Common ones: .sum(), .mean(), .count(), .max().

Try it Yourself →

Pivoting Data

Pivoting reshapes data from long format to wide format. Think of it like rotating a table — rows become columns and vice versa.


import pandas as pd

long_data = pd.DataFrame({
    "date": ["2024-01", "2024-01", "2024-02", "2024-02"],
    "product": ["A", "B", "A", "B"],
    "sales": [100, 200, 150, 250]
})

pivoted = long_data.pivot(index="date", columns="product", values="sales")
print(pivoted)
    

Pivoting is essential when you need to create crosstabs or compare categories side by side. The opposite operation is .melt(), which goes from wide to long format.

Try it Yourself →

Key Takeaways

  • Merging combines datasets based on common columns
  • GroupBy splits data into groups and applies aggregations
  • Pivoting reshapes data from long to wide format
  • Always inspect the result after transformation — mistakes are common