Data Exploration
Once your data is loaded, the fun begins. Data exploration is about understanding what you have — the shape, the types, the distributions, the weird anomalies. Think of it like meeting someone new: you ask questions, observe, and form initial impressions.
First Look at Your Data
Always start with .head(), .info(), and .shape. These three give you a comprehensive first impression in seconds.
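import pandas as pd
df = pd.read_csv("customers.csv")
print(df.shape)   # (number of rows, number of columns)
print(df.head())  # first 5 rows by default
df.info()         # prints its report directly and returns None, so don't wrap it in print()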
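.shape tells you the number of rows and columns. .head() shows the first five rows (pass a number, such as .head(10), to see more). .info() lists each column's type and non-null count, so any column whose count is lower than the total number of rows has missing values. These are your first three moves, every single time.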
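Since customers.csv isn't included here, the sketch below builds a small stand-in file inline. The column names and values are invented for illustration. It runs the same three first moves and adds .isna().sum(), which counts missing values per column directly so you don't have to subtract non-null counts from the row total yourself.

```python
import io

import pandas as pd

# Hypothetical stand-in for customers.csv; columns and values are invented
csv_text = """customer_id,name,age,city
1,Ana,34,Lisbon
2,Ben,,Leeds
3,Chloe,29,
4,Dev,41,Pune
"""
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # (rows, columns) -> (4, 4)
print(df.head())  # at most 5 rows; all 4 here
df.info()         # prints dtypes and non-null counts itself

# Missing values per column, counted directly
missing = df.isna().sum()
print(missing)
```

Notice that age becomes a float column: pandas stores the missing entry as NaN, which forces the integers to float64.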
Statistical Summary
The .describe() method gives you a statistical summary of all numeric columns. It's like a cheat sheet for your data's distribution.
import pandas as pd
df = pd.read_csv("sales.csv")
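print(df.describe())           # summary statistics for every numeric column
print(df["revenue"].mean())    # average revenue
print(df["revenue"].median())  # middle value, less sensitive to outliers than the mean
print(df["revenue"].std())     # how spread out revenue is around the mean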
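You get the count of non-null values, the mean, standard deviation, min, max, and the 25th, 50th (median), and 75th percentiles. This tells you a lot about whether your data makes sense or needs cleaning: a negative minimum revenue, a maximum far above the 75th percentile, or a mean well away from the median are all worth a closer look.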
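To see how the summary exposes problems, here is a sketch with a small invented revenue column containing one suspicious negative entry and one very large value. The numbers are made up purely for illustration. The min flags the negative entry, and the gap between mean and median shows a single large value pulling the average up.

```python
import pandas as pd

# Invented sample data: one negative entry and one large outlier
df = pd.DataFrame({"revenue": [120.0, 95.0, 130.0, -40.0, 110.0, 5000.0]})

summary = df.describe()
print(summary)

# The rows describe() produces for a numeric column
print(list(summary.index))
# ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']

rev = df["revenue"]
# A negative revenue may be a data-entry error or a refund; worth checking
print(rev.min())
# Mean (902.5) far above median (115.0) suggests skew or outliers
print(rev.mean(), rev.median())
```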
Value Counts and Unique Values
For categorical columns, you want to know how many unique values exist and how often each appears. This helps you spot imbalanced data or unexpected categories.
import pandas as pd
df = pd.read_csv("users.csv")
print(df["country"].nunique())                          # number of distinct countries
print(df["country"].value_counts())                     # how often each country appears
print(df["subscription"].value_counts(normalize=True))  # share of each subscription type
The normalize=True parameter returns proportions (fractions that sum to 1) instead of raw counts; multiply by 100 if you want percentages. Super useful for seeing at a glance how common each category is.
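Here is a sketch with invented data showing raw counts next to normalized proportions, and how multiplying by 100 turns proportions into percentages. It also passes dropna=False, because both nunique() and value_counts() ignore missing values by default, which can hide gaps in a categorical column.

```python
import pandas as pd

# Invented sample data for illustration
df = pd.DataFrame({
    "country": ["US", "UK", "US", "DE", "US", None],
    "subscription": ["free", "free", "pro", "free", "pro", "free"],
})

print(df["country"].nunique())              # 3: the missing value is ignored
print(df["country"].nunique(dropna=False))  # 4: missing counted as its own value

counts = df["subscription"].value_counts()
shares = df["subscription"].value_counts(normalize=True)
print(counts)        # free 4, pro 2
print(shares)        # proportions that sum to 1
print(shares * 100)  # the same values as percentages

# Show missing values as their own category
print(df["country"].value_counts(dropna=False))
```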
Key Takeaways
- Always start with .head(), .info(), and .shape
- .describe() gives a quick statistical summary of numeric columns
- .value_counts() reveals the distribution of categorical data
- Exploration helps you form hypotheses before deeper analysis