describe() — Statistical Summary
Let me give you the fastest way to understand your numeric data. The `describe()` method calculates key statistics for you in one shot.
The Basics
Call `describe()` on any DataFrame with numeric columns and you get a summary table, with one column of statistics per numeric column:
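import pandas as pd
# Small example dataset with two numeric columns
data = {'Age': [25, 30, 35, 28, 32],
        'Salary': [50000, 60000, 75000, 55000, 65000]}
df = pd.DataFrame(data)
# Print the summary statistics for every numeric column
print(df.describe())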
This gives you one row per statistic: count, mean, std (standard deviation), min, the 25th/50th/75th percentiles (labeled 25%, 50%, 75%), and max. It's like having a statistics professor on demand.
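The result of `describe()` is itself a DataFrame, so you can pull out single values by row label. Here is a minimal sketch using the same example data; the variable name `summary` is just an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({
    'Age': [25, 30, 35, 28, 32],
    'Salary': [50000, 60000, 75000, 55000, 65000],
})

summary = df.describe()

# Row labels of the summary table, in the order pandas prints them
print(list(summary.index))

# Each cell can be looked up by (row label, column name)
print(summary.loc['mean', 'Age'])

# The values agree with the single-statistic methods on the column
print(summary.loc['std', 'Salary'], df['Salary'].std())
```

Looking values up this way is handy when you need one number for a later calculation rather than the whole printed table.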
Why Percentiles Matter
One thing that confused me at first was the percentiles. The 50th percentile is the median, the middle value when the data is sorted. The 25th and 75th percentiles mark the edges of the middle 50% of your data. The gap between them is called the interquartile range (IQR); if it is large relative to the values themselves, your data is spread out.
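To make the percentiles concrete, here is a small sketch that reads the median and the 25th-to-75th percentile gap straight from the `describe()` output. It reuses the example data from earlier; the names `summary`, `median_age`, and `gap` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Age': [25, 30, 35, 28, 32],
    'Salary': [50000, 60000, 75000, 55000, 65000],
})

summary = df.describe()

# The 50% row holds the median of each column
median_age = summary.loc['50%', 'Age']

# Gap between the 75th and 25th percentiles, computed per column
gap = summary.loc['75%'] - summary.loc['25%']

print(median_age)  # 30.0
print(gap['Age'])  # 4.0
print(gap['Salary'])  # 10000.0
```

For the ages, the middle 50% of values sits between 28 and 32, so the gap is only 4 years.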
Including Non-Numeric Data
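Want stats on string columns too? By default, `describe()` skips them. If your DataFrame has any (the example above has none), pass `include='all'`:
print(df.describe(include='all'))
For string columns this reports count, unique (the number of distinct values), top (the most frequent value), and freq (how many times the top value appears). Statistics that don't apply to a column, such as the mean of a string column, show up as NaN. Super useful when you want the complete picture.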
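Here is a short sketch of `include='all'` on a DataFrame that actually has a string column. The `City` column and its values are made up for illustration and are not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({
    'Age': [25, 30, 35, 28, 32],
    # Hypothetical string column added for this example
    'City': ['Paris', 'London', 'Paris', 'Berlin', 'Paris'],
})

full = df.describe(include='all')
print(full)

# String-column statistics
print(full.loc['unique', 'City'])  # number of distinct cities
print(full.loc['top', 'City'])     # most frequent city
print(full.loc['freq', 'City'])    # how often the top city appears

# Numeric-only statistics are NaN for the string column
print(pd.isna(full.loc['mean', 'City']))
```

The reverse also holds: rows such as unique, top, and freq are NaN for the numeric `Age` column.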
Key Takeaways
- `describe()` gives a statistical summary of numeric columns
- Shows count, mean, std, min, max, and percentiles
- The 50th percentile is the median
- Use `include='all'` to include non-numeric columns