Aggregation
Let me show you what aggregation functions can really do. GroupBy is great on its own, but it gets far more useful once you know how to aggregate properly.
agg() with Multiple Functions
Apply different functions to different columns:
import pandas as pd

data = {'Department': ['Sales', 'Sales', 'HR', 'HR'],
        'Name': ['Alice', 'Bob', 'Charlie', 'Diana'],
        'Salary': [50000, 60000, 55000, 65000],
        'Age': [25, 30, 35, 28]}
df = pd.DataFrame(data)

print(df.groupby('Department').agg({
    'Salary': ['mean', 'max'],
    'Age': ['min', 'mean']
}))
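This calculates a different set of statistics for each column: the mean and maximum of `Salary`, and the minimum and mean of `Age`, per department. Think of it like asking "what's the average salary and youngest age in each department?" The result has two-level column labels, such as `('Salary', 'mean')`.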
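Since the dictionary form produces two-level column labels, it can help to see how to read one value back out. Here is a minimal sketch using the same sample data; the variable names `stats`, `sales_mean`, and `flat` are my own, and joining the levels with an underscore is just one possible convention.

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['Sales', 'Sales', 'HR', 'HR'],
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'Salary': [50000, 60000, 55000, 65000],
    'Age': [25, 30, 35, 28],
})

stats = df.groupby('Department').agg({
    'Salary': ['mean', 'max'],
    'Age': ['min', 'mean'],
})

# The columns are a two-level MultiIndex, so one statistic is addressed by a tuple.
sales_mean = stats.loc['Sales', ('Salary', 'mean')]
print(sales_mean)  # 55000.0

# Optional (illustrative convention): join the two levels into flat names.
flat = stats.copy()
flat.columns = ['_'.join(col) for col in stats.columns]
print(flat.columns.tolist())
# ['Salary_mean', 'Salary_max', 'Age_min', 'Age_mean']
```

Flattening is optional. If you only need a couple of values, tuple indexing is usually enough.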
Named Aggregation
A cleaner way to do this is named aggregation:
print(df.groupby('Department').agg(
    avg_salary=('Salary', 'mean'),
    max_salary=('Salary', 'max'),
    min_age=('Age', 'min')
))
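This is much more readable. Each keyword argument is the new column name, and its value is a tuple of the original column and the function to apply. The result comes back with flat, descriptive column names instead of two-level labels. Trust me, once you use named aggregation, you won't go back.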
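To make the "new name, then column and function" pattern concrete, here is a small sketch that also shows `pd.NamedAgg`, the explicit spelling of the same tuple. The `headcount` column is a name I made up for illustration; it is not part of the example above.

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['Sales', 'Sales', 'HR', 'HR'],
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'Salary': [50000, 60000, 55000, 65000],
    'Age': [25, 30, 35, 28],
})

# Tuple form: output_name=(input_column, function)
summary = df.groupby('Department').agg(
    avg_salary=('Salary', 'mean'),
    headcount=('Name', 'count'),  # hypothetical extra column for illustration
)

# pd.NamedAgg spells out the same pair with keyword arguments.
same = df.groupby('Department').agg(
    avg_salary=pd.NamedAgg(column='Salary', aggfunc='mean'),
    headcount=pd.NamedAgg(column='Name', aggfunc='count'),
)

print(summary.columns.tolist())  # ['avg_salary', 'headcount']
print(summary.equals(same))      # True
```

Both forms should give the same result, so which one you pick is mostly a matter of taste. The tuple form is shorter, while `pd.NamedAgg` makes the roles of the two values explicit.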
Custom Aggregation Functions
Need something custom? Pass a function:
def salary_range(group):
    return group.max() - group.min()

print(df.groupby('Department')['Salary'].agg(salary_range))
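One thing that confused me at first was the difference between `apply()` and `agg()`. Use `agg()` when you want to reduce columns to summary values: your function gets a column's values for one group and returns a single result. Use `apply()` when you need to work with entire groups as DataFrames, for example when the calculation needs more than one column at once.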
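The difference is easier to see side by side. In this sketch, the helper `top_earner` and the variable names are my own inventions for illustration. Selecting `['Name', 'Salary']` before calling `apply()` is a choice I made so the function only sees the columns it needs.

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['Sales', 'Sales', 'HR', 'HR'],
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'Salary': [50000, 60000, 55000, 65000],
    'Age': [25, 30, 35, 28],
})

grouped = df.groupby('Department')

# agg(): the function receives the Salary values of one group (a Series)
# and returns a single number.
salary_spread = grouped['Salary'].agg(lambda s: s.max() - s.min())

# apply(): the function receives each group as a DataFrame,
# so it can look at Name and Salary together.
def top_earner(group):
    # Name on the row with the highest salary in this group
    return group.loc[group['Salary'].idxmax(), 'Name']

top_names = grouped[['Name', 'Salary']].apply(top_earner)

print(salary_spread)
print(top_names)
```

The `top_earner` calculation cannot be written as a single-column `agg()` on `Name`, because it needs `Salary` to decide which name to return. That is the kind of case where `apply()` earns its place.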
Key Takeaways
- `agg()` can apply several functions to grouped data at once, with a different set for each column
- Named aggregation creates readable column names
- You can pass custom functions to `agg()`
- Use `agg()` for column-by-column summaries, `apply()` for operations that need the whole group