GroupBy Basics

Split-apply-combine — the most powerful pattern.

GroupBy is one of the most powerful features in pandas. Think of it as "split, apply, combine": you split the data into groups, do something to each group, then combine the results into a single object.
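To make the three steps concrete, here is a minimal sketch that performs them by hand and compares the outcome with the one-line GroupBy call. The sample data and the variable names (`means`, `manual`, `builtin`) are made up for illustration only.

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration.
df = pd.DataFrame({
    "Department": ["Sales", "Sales", "HR"],
    "Salary": [50000, 60000, 55000],
})

# Split: iterating over a GroupBy yields (group key, sub-DataFrame) pairs.
# Apply: compute the mean salary of each sub-DataFrame.
means = {}
for department, group in df.groupby("Department"):
    means[department] = group["Salary"].mean()

# Combine: collect the per-group results into a single Series.
manual = pd.Series(means, name="Salary")

# The one-line GroupBy call produces the same values.
builtin = df.groupby("Department")["Salary"].mean()

print(manual)
print(builtin)
```

In practice you would write only the last form; the loop is there to show what happens under the hood.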

GroupBy One Column

The simplest GroupBy operation:


import pandas as pd

data = {'Department': ['Sales', 'Sales', 'HR', 'HR', 'Engineering'],
        'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
        'Salary': [50000, 60000, 55000, 65000, 80000]}
df = pd.DataFrame(data)

print(df.groupby('Department')['Salary'].mean())

This groups the rows by Department, selects the Salary column, and calculates the mean for each group. The result is a Series indexed by department name. It's like a pivot table in Excel, but way more flexible.

Multiple Aggregations

Want more than just the mean?


print(df.groupby('Department')['Salary'].agg(['mean', 'sum', 'count']))

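The `agg()` method lets you apply multiple functions at once and returns a DataFrame with one column per function. One function that confused me at first was `count()`: it counts non-null values, not total rows. If you want the total number of rows in each group, use `size()` instead.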
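The gap between non-null values and total rows only shows up when a group contains missing data. A small sketch, assuming a hypothetical dataset with one missing salary:

```python
import numpy as np
import pandas as pd

# Hypothetical data: Bob's salary in Sales is missing (NaN).
df = pd.DataFrame({
    "Department": ["Sales", "Sales", "HR"],
    "Salary": [50000, np.nan, 55000],
})

grouped = df.groupby("Department")["Salary"]

non_null = grouped.count()   # counts only non-null salaries
total_rows = grouped.size()  # counts every row, including NaN

print(non_null)
print(total_rows)
```

Here Sales has two rows but only one non-null salary, so the two results differ for that group and match for HR.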

GroupBy Multiple Columns

Need to group by more than one column? Just pass a list:


data = {'Department': ['Sales', 'Sales', 'HR', 'HR'],
        'Level': ['Senior', 'Junior', 'Senior', 'Junior'],
        'Salary': [60000, 50000, 65000, 55000]}
df = pd.DataFrame(data)

print(df.groupby(['Department', 'Level'])['Salary'].mean())

The result is a Series with a multi-level (hierarchical) index, which pandas calls a `MultiIndex`. Think of it as grouping by Department first, then by Level within each department.
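A multi-level index can feel awkward at first, so here is a short sketch of three common ways to work with it, using the same data as above. The variable names (`result`, `sales_senior`, `flat`, `wide`) are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Department": ["Sales", "Sales", "HR", "HR"],
    "Level": ["Senior", "Junior", "Senior", "Junior"],
    "Salary": [60000, 50000, 65000, 55000],
})

result = df.groupby(["Department", "Level"])["Salary"].mean()

# Look up a single group with a (Department, Level) tuple.
sales_senior = result.loc[("Sales", "Senior")]

# reset_index() turns the index levels back into ordinary columns.
flat = result.reset_index()

# unstack() pivots the inner level (Level) into columns,
# giving one row per Department.
wide = result.unstack()

print(sales_senior)
print(flat)
print(wide)
```

Which form you pick depends on what comes next: `reset_index()` is handy for further filtering or merging, while `unstack()` gives a compact table for reading.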

Key Takeaways

`groupby()` splits data into groups based on column values
Chain with aggregation functions like `mean()`, `sum()`, `count()`
`agg()` applies multiple functions at once
Group by multiple columns using a list
