Labs ICT
Pro Login

Pandas GroupBy

Split-apply-combine for grouped analysis.

Pandas GroupBy

So you have a messy dataset and want to split it into groups, do something with each group, and combine the results. That is exactly what groupby does. Think of it like a split-apply-combine workflow.

Basic GroupBy

The simplest form of groupby groups rows by a column and applies an aggregation function. Here is how it works:


import pandas as pd

sales = pd.DataFrame({
    'region': ['North', 'South', 'North', 'South', 'North'],
    'product': ['A', 'B', 'A', 'B', 'A'],
    'revenue': [100, 200, 150, 250, 120]
})

print(sales.groupby('region')['revenue'].sum())
print(sales.groupby('region')['revenue'].mean())
    

Multiple Aggregations

You can apply multiple aggregation functions at once using the agg method. This gives you a rich summary of your grouped data.


print(sales.groupby('region')['revenue'].agg(['sum', 'mean', 'count']))

print(sales.groupby(['region', 'product']).agg(
    total_revenue=('revenue', 'sum'),
    avg_revenue=('revenue', 'mean'),
    transaction_count=('revenue', 'count')
))
    

Transform

Transform returns a result the same size as the input. This is useful for adding group-level calculations as new columns without collapsing the data.


sales['region_avg'] = sales.groupby('region')['revenue'].transform('mean')
sales['deviation'] = sales['revenue'] - sales['region_avg']
print(sales)
    

Filter

Filter keeps or drops entire groups based on a condition. This is useful for focusing on specific subsets of your data.


high_revenue = sales.groupby('region').filter(lambda x: x['revenue'].sum() > 300)
print(high_revenue)
    
Try it Yourself →

Key Takeaways

  • GroupBy follows a split-apply-combine pattern
  • Use agg for multiple summary statistics at once
  • Use transform to add group-level calculations without collapsing rows
  • Use filter to keep only groups that meet a condition