Aggregation Functions

sum, mean, count, and custom aggregations.

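GroupBy splits your data into groups, and aggregation functions turn each group into a summary value such as a sum, mean, or count. This section covers `agg()` with multiple functions, named aggregation, and custom aggregation functions.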

agg() with Multiple Functions

Apply different functions to different columns:


import pandas as pd

data = {'Department': ['Sales', 'Sales', 'HR', 'HR'],
        'Name': ['Alice', 'Bob', 'Charlie', 'Diana'],
        'Salary': [50000, 60000, 55000, 65000],
        'Age': [25, 30, 35, 28]}
df = pd.DataFrame(data)

print(df.groupby('Department').agg({
    'Salary': ['mean', 'max'],
    'Age': ['min', 'mean']
}))

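This calculates different stats for different columns. Think of it like asking "what are the average and highest salaries, and the youngest and average ages, in each department?" The result has two-level column labels such as `('Salary', 'mean')`, one for each column and function pair.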
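Because the result uses two-level column labels, it helps to see how to pull one statistic out of it. The sketch below rebuilds the same sample data so it runs on its own. The variable names `stats`, `avg_salary`, and `flat` are illustrative, and joining the levels with an underscore is just one possible convention.

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['Sales', 'Sales', 'HR', 'HR'],
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'Salary': [50000, 60000, 55000, 65000],
    'Age': [25, 30, 35, 28],
})

stats = df.groupby('Department').agg({
    'Salary': ['mean', 'max'],
    'Age': ['min', 'mean'],
})

# Each column label is a (column, function) pair, so select with a tuple.
avg_salary = stats[('Salary', 'mean')]
print(avg_salary)

# Optional: flatten the labels into single strings such as 'Salary_mean'.
flat = stats.copy()
flat.columns = ['_'.join(label) for label in flat.columns]
print(flat)
```

Flattening is only a convenience. If you mostly want flat names, named aggregation gives you them directly.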

Named Aggregation

Named aggregation is a cleaner way to get the same kind of summary:


print(df.groupby('Department').agg(
    avg_salary=('Salary', 'mean'),
    max_salary=('Salary', 'max'),
    min_age=('Age', 'min')
))

This is much more readable. Each keyword is the name of an output column, and its value is a tuple of the source column and the function to apply. The result has flat column names like `avg_salary`. Once you use named aggregation, you probably won't go back.
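As a small extension, here is a sketch showing that the tuple form is shorthand for `pd.NamedAgg`, which spells out the `column` and `aggfunc` parts by name. Named aggregation has been available since pandas 0.25. The `headcount` column and the `summary` variable are illustrative additions, not part of the example above.

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['Sales', 'Sales', 'HR', 'HR'],
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'Salary': [50000, 60000, 55000, 65000],
    'Age': [25, 30, 35, 28],
})

summary = df.groupby('Department').agg(
    # Explicit form: same meaning as ('Salary', 'mean').
    avg_salary=pd.NamedAgg(column='Salary', aggfunc='mean'),
    # Tuple shorthand, as used in the example above.
    min_age=('Age', 'min'),
    # Count the non-null names in each department.
    headcount=('Name', 'count'),
)

print(summary)
print(list(summary.columns))
```

The output columns appear in the order the keywords are written, which makes the result easy to read and to pass on to later steps.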

Custom Aggregation Functions

Need something custom? Pass a function:


def salary_range(values):
    # values is a Series holding the salaries of one department
    return values.max() - values.min()

print(df.groupby('Department')['Salary'].agg(salary_range))
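A custom function can also sit next to built-in names in a list. The sketch below assumes the same sample data and passes both `'mean'` and `salary_range` to `agg()`. The output column for the custom function is labeled with the function's own name. The `result` variable is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['Sales', 'Sales', 'HR', 'HR'],
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'Salary': [50000, 60000, 55000, 65000],
    'Age': [25, 30, 35, 28],
})

def salary_range(values):
    # Spread between the highest and lowest value in one group.
    return values.max() - values.min()

# Mix a built-in aggregation (by name) with a custom function.
result = df.groupby('Department')['Salary'].agg(['mean', salary_range])
print(result)
```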

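One thing that confused me at first was the difference between `apply()` and `agg()`. With `agg()`, your function receives one column of a group at a time as a Series and should reduce it to a single value. With `apply()`, your function receives the whole group as a DataFrame, so it can use several columns at once. Use `agg()` for per-column summaries and `apply()` when the logic needs the entire group.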
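To make the difference concrete, here is a minimal sketch on the same sample data. The `top_earner` helper and the variable names are hypothetical and only for illustration. The columns are selected explicitly before `apply()` so the function sees just `Name` and `Salary`, which also keeps the grouping column out of the group.

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['Sales', 'Sales', 'HR', 'HR'],
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'Salary': [50000, 60000, 55000, 65000],
    'Age': [25, 30, 35, 28],
})

grouped = df.groupby('Department')

# agg(): the function sees one column of one group (a Series) and returns a scalar.
age_spread = grouped['Age'].agg(lambda ages: ages.max() - ages.min())
print(age_spread)

# apply(): the function sees the whole group as a DataFrame,
# so it can look up one column based on another.
def top_earner(group):
    # Name of the person with the highest salary in this group.
    return group.loc[group['Salary'].idxmax(), 'Name']

top = grouped[['Name', 'Salary']].apply(top_earner)
print(top)
```

The `top_earner` logic needs both `Name` and `Salary` together, which is why it fits `apply()` rather than `agg()`.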

Key Takeaways

`agg()` applies multiple functions to grouped data
Named aggregation creates readable column names
You can pass custom functions to `agg()`
Use `agg()` for column operations, `apply()` for group-level operations
