Missing Values
The first step in handling missing data is finding it. Before you can fix missing values, you need to know where they are, and this is where most data cleaning projects start.
Finding Missing Values
Use `isnull()` to detect missing values:
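import pandas as pd
import numpy as np

# Small sample with one missing value in each column
data = {'Name': ['Alice', 'Bob', None],
        'Age': [25, np.nan, 35],
        'Score': [88, 92, np.nan]}
df = pd.DataFrame(data)

# True wherever a value is missing (None or NaN)
print(df.isnull())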
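This returns a DataFrame of True/False values with the same shape as the original. True means the value is missing, and both `None` and `np.nan` count as missing. It's like having a highlighter for empty cells. Note that `isnull()` is an alias of `isna()`, so either name works.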
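Because the result is a boolean mask, you can also use it to pull out the rows that have a problem. Here is a minimal sketch that reuses the same sample data; the names `rows_with_missing` and `incomplete` are only illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', None],
                   'Age': [25, np.nan, 35],
                   'Score': [88, 92, np.nan]})

# True for every row that has at least one missing value
rows_with_missing = df.isnull().any(axis=1)

# Keep only the rows flagged above
incomplete = df[rows_with_missing]
print(incomplete)
```

With this sample, the rows for Bob (missing Age) and the unnamed third person (missing Name and Score) are selected, while Alice's complete row is left out.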
Counting Missing Values
How many missing values do you have? Add `sum()`:
print(df.isnull().sum())
This works because `sum()` treats True as 1 and False as 0, so you get a count of missing values per column. Super useful for seeing where the problems are.
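The same idea extends to other views of the data. As a small sketch on the same sample data, summing twice gives one total for the whole DataFrame, and summing with `axis=1` gives a count per row; the variable names here are just for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', None],
                   'Age': [25, np.nan, 35],
                   'Score': [88, 92, np.nan]})

# Per-column counts, then summed again for a single grand total
total_missing = int(df.isnull().sum().sum())

# Count missing values across each row instead of down each column
missing_per_row = df.isnull().sum(axis=1)

print(total_missing)
print(missing_per_row)
```

The per-row view is handy when you want to spot records that are mostly empty rather than columns that are mostly empty.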
Percentage Calculation
Want to know the percentage of missing data?
missing_pct = df.isnull().sum() / len(df) * 100
print(missing_pct)
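Here is the thing: as a rough rule of thumb, if a column has more than 50% missing values, you might want to drop it entirely. Less than 10%? You can probably fill those in without much trouble. These cutoffs are guidelines, not hard rules, and the right choice depends on how important the column is to your analysis.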
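To see how that rule of thumb might look in code, here is a minimal sketch. The DataFrame, its column names, and the choice of the median as a fill value are all assumptions made up for this illustration; the 50% and 10% cutoffs come from the guideline above. It uses `isnull().mean() * 100`, which gives the same percentages as `isnull().sum() / len(df) * 100`.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 20 rows, with 'age' 5% missing and 'notes' 60% missing
n = 20
df = pd.DataFrame({
    'id': list(range(n)),
    'age': [30.0] * (n - 1) + [np.nan],
    'notes': ['ok'] * 8 + [None] * 12,
})

# Percentage of missing values per column
missing_pct = df.isnull().mean() * 100

# Drop columns that are more than 50% missing
to_drop = missing_pct[missing_pct > 50].index
cleaned = df.drop(columns=to_drop)

# Fill lightly affected columns (under 10% missing) with the column median.
# The median is one possible choice and only makes sense for numeric columns.
to_fill = [col for col in cleaned.columns if 0 < missing_pct[col] < 10]
for col in to_fill:
    cleaned[col] = cleaned[col].fillna(cleaned[col].median())

print(missing_pct)
print(cleaned.isnull().sum())
```

Columns that fall between the two cutoffs are left alone here on purpose, since those usually need a closer look before you decide what to do with them.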
Key Takeaways
- `isnull()` creates a boolean mask of missing values
- `isnull().sum()` counts missing values per column
- Calculate percentages with `df.isnull().sum() / len(df) * 100`
- A high missing percentage (roughly above 50%) may warrant dropping the column