Data Exploration

head(), describe(), info() — understanding your data.

Once your data is loaded, the fun begins. Data exploration is about understanding what you have — the shape, the types, the distributions, the weird anomalies. Think of it like meeting someone new: you ask questions, observe, and form initial impressions.

First Look at Your Data

Always start with .head(), .info(), and .shape. Together they give you a quick first impression in seconds. Note that .shape is an attribute, not a method, so it takes no parentheses.

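```python
import pandas as pd

df = pd.read_csv("customers.csv")
print(df.shape)   # (rows, columns) tuple; an attribute, so no parentheses
print(df.head())  # first 5 rows by default
df.info()         # prints dtypes and non-null counts itself and returns None
```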

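.shape returns a (rows, columns) tuple. .head() shows the first five rows by default; pass a number, as in .head(10), to see more. .info() lists each column's dtype and non-null count, so any column whose count is below the row total has missing values. It prints its report directly and returns None, so wrapping it in print() only adds a stray "None" line. These are your first three moves, every single time.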
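Since customers.csv may not be on hand, here is a minimal sketch that builds a small DataFrame in memory instead (the column names and values are invented for illustration) and runs the same three checks. It also uses the buf parameter of .info() to capture the report as a string rather than printing it.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for customers.csv: four rows, one missing age
df = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Chloe", "Dev"],
        "age": [34, np.nan, 29, 41],
        "city": ["Lisbon", "Austin", "Lyon", "Pune"],
    }
)

rows, cols = df.shape  # .shape is a plain (rows, columns) tuple
print(f"{rows} rows x {cols} columns")

print(df.head(2))  # first two rows instead of the default five

# .info() writes its report and returns None; buf= redirects the report
buffer = io.StringIO()
result = df.info(buf=buffer)
report = buffer.getvalue()
print(report)  # the age column shows 3 non-null values out of 4 rows
print(result)  # None, which is why print(df.info()) ends with "None"
```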

Statistical Summary

The .describe() method gives you a statistical summary of all numeric columns. It's like a cheat sheet for your data's distribution.

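```python
import pandas as pd

df = pd.read_csv("sales.csv")
print(df.describe())  # summary statistics for numeric columns

# Individual statistics for a single column
print(df["revenue"].mean())
print(df["revenue"].median())
print(df["revenue"].std())  # sample standard deviation
```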

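You get the count of non-null values, mean, standard deviation, minimum, maximum, and the 25th, 50th, and 75th percentiles (the 50% row is the median). Comparing the mean with the median hints at skew, and an implausible min or max often flags data that needs cleaning.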
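The sketch below uses a small invented revenue column (not the real sales.csv) to check that the 50% row of .describe() matches the median, and to show how one large value pulls the mean away from it. It also shows the percentiles parameter for requesting other cut points, and include="all" for summarizing non-numeric columns, which are skipped by default.

```python
import pandas as pd

# Invented sample data; one large order skews the revenue column
df = pd.DataFrame(
    {
        "region": ["north", "south", "north", "east", "west"],
        "revenue": [120.0, 95.0, 130.0, 110.0, 900.0],
    }
)

summary = df.describe()  # only the numeric "revenue" column by default
print(summary)

# The 50% row is the median
print(summary.loc["50%", "revenue"], df["revenue"].median())

# Mean far above median: a sign of right skew or an outlier worth checking
print(df["revenue"].mean())

# Request the 10th, 50th, and 90th percentiles instead of the defaults
custom = df.describe(percentiles=[0.1, 0.5, 0.9])
print(custom)

# include="all" also summarizes text columns (count, unique, top, freq)
full = df.describe(include="all")
print(full)
```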

Value Counts and Unique Values

For categorical columns, you want to know how many unique values exist and how often each appears. This helps you spot imbalanced data or unexpected categories.

```python
import pandas as pd

df = pd.read_csv("users.csv")
print(df["country"].nunique())       # how many distinct countries
print(df["country"].value_counts())  # how often each country appears
print(df["subscription"].value_counts(normalize=True))  # share of each plan
```

The normalize=True parameter returns relative frequencies (fractions between 0 and 1 that sum to 1) instead of raw counts; multiply by 100 if you want percentages. Super useful for understanding proportions.
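Here is a sketch with invented data standing in for users.csv, comparing raw counts, proportions, and percentages side by side. One detail worth knowing: value_counts() drops missing values unless you pass dropna=False, so the normalized shares are shares of the non-missing rows only.

```python
import pandas as pd

# Invented sample: five users, one with no subscription recorded
df = pd.DataFrame(
    {
        "country": ["US", "DE", "US", "FR", "US"],
        "subscription": ["free", "pro", "free", "free", None],
    }
)

print(df["country"].nunique())  # number of distinct countries
print(df["country"].unique())   # the distinct values themselves

counts = df["subscription"].value_counts()                # raw counts
shares = df["subscription"].value_counts(normalize=True)  # fractions summing to 1
percent = (shares * 100).round(1)                         # the same shares as percentages
print(counts, shares, percent, sep="\n\n")

# Keep missing values as their own category instead of dropping them
with_missing = df["subscription"].value_counts(dropna=False)
print(with_missing)
```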

Key Takeaways

Always start with .head(), .info(), and .shape
.describe() gives a quick statistical summary of numeric columns
.value_counts() reveals the distribution of categorical data
Exploration helps you form hypotheses before deeper analysis
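To make the routine stick, the takeaways can be bundled into a single call. first_look below is a hypothetical helper name, not part of pandas; this minimal sketch returns each piece in a dict instead of printing it, so you can inspect whichever part you need, and it assumes every non-numeric column is worth a value count.

```python
import io

import pandas as pd


def first_look(df: pd.DataFrame, n: int = 5) -> dict:
    """Hypothetical helper: gather the first-look checks in one call."""
    buffer = io.StringIO()
    df.info(buf=buffer)  # capture the dtype / non-null report as text
    # Treat every non-numeric column as categorical for value counts
    categorical = df.select_dtypes(exclude="number").columns
    return {
        "shape": df.shape,
        "head": df.head(n),
        "info": buffer.getvalue(),
        "numeric_summary": df.describe(),  # numeric columns by default
        "category_counts": {col: df[col].value_counts() for col in categorical},
    }


# Usage on a tiny invented DataFrame
sample = pd.DataFrame({"plan": ["free", "pro", "free"], "revenue": [0.0, 49.0, 0.0]})
report = first_look(sample)
print(report["shape"])  # (3, 2)
print(report["category_counts"]["plan"])
```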