
Handling Missing Values

NaN is everywhere. Here's how to deal with it.

Missing Values

The first step in handling missing data is simple: before you can fix missing values, you need to find them. This is where most data cleaning projects start.

Finding Missing Values

Use `isnull()` to detect missing values:


import pandas as pd
import numpy as np

data = {'Name': ['Alice', 'Bob', None],
        'Age': [25, np.nan, 35],
        'Score': [88, 92, np.nan]}
df = pd.DataFrame(data)

print(df.isnull())
    

This returns a DataFrame of True/False values with the same shape as `df`. True means the value is missing, and both `None` and `np.nan` count as missing. It's like having a highlighter for empty cells.
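To see exactly what the highlighter picks up, it can help to test a few values directly. The sketch below uses a made-up `samples` Series (not part of the example above) and assumes default pandas behavior, where an empty string, `0`, and the text `'N/A'` are ordinary values rather than missing ones.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values, chosen to show what pandas treats as missing.
# dtype=object keeps each value exactly as written.
samples = pd.Series([None, np.nan, pd.NaT, '', 0, 'N/A'], dtype=object)

mask = samples.isnull()
print(mask.tolist())

# isna() is the same check under a different name.
print(samples.isna().equals(mask))
```

Only the first three entries are flagged. If your data uses placeholders like `''` or `'N/A'` for gaps, `isnull()` will not catch them until you convert them to a real missing value.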

Counting Missing Values

How many missing values do you have? Chain `sum()` onto `isnull()`. Each True counts as 1, so the sum is the number of missing values:


print(df.isnull().sum())
    

This gives you a count per column. Super useful for seeing where the problems are.
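The same mask can be summed in other directions too. This is a small sketch that rebuilds the `df` from the first example; the variable names `per_row`, `total`, and `rows_with_gaps` are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', None],
                   'Age': [25, np.nan, 35],
                   'Score': [88, 92, np.nan]})

mask = df.isnull()

per_row = mask.sum(axis=1)             # missing values in each row
total = int(mask.sum().sum())          # missing values in the whole DataFrame
rows_with_gaps = df[mask.any(axis=1)]  # rows with at least one missing value

print(per_row)
print(total)
print(rows_with_gaps)
```

Per-column counts tell you which fields are unreliable, while per-row counts point at individual records that may need a closer look.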

Percentage Calculation

Want to know the percentage of missing data?


missing_pct = df.isnull().sum() / len(df) * 100
print(missing_pct)
    

Here is the thing: these percentages tell you what to do next. As a rule of thumb, if a column has more than 50% missing values, you might want to drop it entirely. Less than 10%? You can probably fill those in without much trouble.
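The thresholds above can be turned into a quick triage step. This is only a sketch: `triage_missing` is a hypothetical helper, the `demo` data is made up, and the 'review' label for columns between 10% and 50% is an assumption, since the guideline does not say what to do in that range.

```python
import numpy as np
import pandas as pd

def triage_missing(df, drop_above=50.0, fill_below=10.0):
    """Label each column by its share of missing values (hypothetical helper)."""
    # The mean of a boolean mask is the fraction of True values.
    pct = df.isnull().mean() * 100
    labels = {}
    for col, p in pct.items():
        if p == 0:
            labels[col] = 'complete'
        elif p > drop_above:
            labels[col] = 'drop'
        elif p < fill_below:
            labels[col] = 'fill'
        else:
            labels[col] = 'review'  # assumption: middle range needs a human decision
    return labels

# Made-up data: 20 rows with 0, 1, 5, and 12 missing values per column.
demo = pd.DataFrame({
    'complete': np.arange(20, dtype=float),
    'few_gaps': [np.nan] * 1 + [1.0] * 19,
    'some_gaps': [np.nan] * 5 + [1.0] * 15,
    'mostly_empty': [np.nan] * 12 + [1.0] * 8,
})

labels = triage_missing(demo)
print(labels)

# Drop only the columns labeled 'drop'.
trimmed = demo.drop(columns=[c for c, v in labels.items() if v == 'drop'])
print(trimmed.columns.tolist())
```

Treat the labels as suggestions, not rules. A mostly empty column can still be worth keeping if the few values it has are important.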

Key Takeaways

  • `isnull()` creates a boolean mask of missing values
  • `isnull().sum()` counts missing values per column
  • Calculate percentages with `df.isnull().sum() / len(df) * 100`
  • A high missing percentage may warrant dropping the column