
fillna() & dropna()

The two main strategies for missing data.

Let me give you the two main strategies for handling missing data. You can either fill in the gaps or remove the incomplete rows. Which one you choose depends on your situation.

dropna() — Just Remove Them

The simplest approach — drop any row with missing values:


import pandas as pd
import numpy as np

data = {'Name': ['Alice', 'Bob', None, 'Diana'],
        'Age': [25, np.nan, 35, 28]}
df = pd.DataFrame(data)

df_clean = df.dropna()
print(df_clean)
    

This removes any row with even one missing value. Here that drops both Bob (missing age) and the row with no name, leaving only Alice and Diana. It's clean, but you might lose a lot of data.
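Before dropping anything, it can help to count the gaps and see how many rows each option would cost. The sketch below reuses the toy data from above; `subset` and `how` are standard `dropna()` parameters, while the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', None, 'Diana'],
                   'Age': [25, np.nan, 35, 28]})

# Count missing values per column before dropping anything
missing_per_column = df.isna().sum()
print(missing_per_column)

# Default: drop a row if ANY column is missing (keeps Alice and Diana)
dropped_any = df.dropna()

# Drop only the rows where 'Age' is missing (the row with no name survives)
dropped_age = df.dropna(subset=['Age'])

# Drop a row only if ALL of its columns are missing (no row qualifies here)
dropped_all = df.dropna(how='all')

print(len(df), len(dropped_any), len(dropped_age), len(dropped_all))
```

Restricting the drop to the columns you actually need is often enough to keep most of the data.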

fillna() — Fill with a Value

Instead of removing, fill in the gaps:


df['Name'] = df['Name'].fillna('Unknown')
df['Age'] = df['Age'].fillna(df['Age'].mean())
print(df)
    

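Fill strings with a default value, or fill numbers with the mean or median. Here Bob's missing age becomes about 29.3, the mean of 25, 35, and 28. This keeps every row, but remember that the filled values are estimates, not real observations.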
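If you would rather fill several columns in one call, `fillna()` also accepts a dictionary that maps column names to fill values. A minimal sketch on the same toy data, using the median instead of the mean because the median is less affected by outliers; `fill_values` and `df_filled` are just illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', None, 'Diana'],
                   'Age': [25, np.nan, 35, 28]})

# One fill value per column; columns not listed in the dict are left untouched
fill_values = {'Name': 'Unknown', 'Age': df['Age'].median()}

# fillna returns a new DataFrame, so the original df keeps its gaps
df_filled = df.fillna(fill_values)
print(df_filled)
```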

Different Strategies

Here is the thing: there's no one-size-fits-all answer. For ordered data, another option is to copy a neighboring value into the gap:


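# Restore the gap first: the mean fill above already replaced the missing age
df['Age'] = [25, np.nan, 35, 28]
df['Age'] = df['Age'].ffill()  # fillna(method='ffill') is deprecated since pandas 2.1
print(df)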
    

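Forward fill (`ffill`) uses the previous value to fill the gap. Backward fill (`bfill`) uses the next value. These suit time series data, where a reading tends to stay close to the one before or after it. Note that `ffill` cannot fill a gap in the first row and `bfill` cannot fill one in the last row, because there is no neighbor to copy from.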
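To compare the two directions side by side, here is a small sketch on a made-up daily series. The dates and temperature values are invented for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical daily temperature readings with gaps
dates = pd.date_range('2024-01-01', periods=5, freq='D')
temps = pd.Series([np.nan, 20.0, np.nan, np.nan, 23.0], index=dates)

forward = temps.ffill()   # carry the last known value forward
backward = temps.bfill()  # pull the next known value backward

print(pd.DataFrame({'raw': temps, 'ffill': forward, 'bfill': backward}))
```

The first row stays empty after `ffill` because nothing comes before it, while `bfill` fills it from the next day's reading.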

One thing that confused me at first was when to use which. My rule of thumb: if less than 5% of the rows are missing a value, drop them. If it's a numeric column, fill with the mean (or the median when there are outliers). If it's categorical, fill with the mode. If it's time series, use ffill.
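The rule of thumb can be sketched as a small helper. `fill_by_rule` and its `drop_threshold` parameter are hypothetical names, not part of pandas, and the sketch leaves out the time series case because a function cannot tell from the values alone whether row order matters. It also assumes every column has at least one non-missing value.

```python
import numpy as np
import pandas as pd

def fill_by_rule(df, drop_threshold=0.05):
    """Hypothetical helper: apply the rule of thumb column by column."""
    result = df.copy()
    for column in result.columns:
        # Share of rows where this column is missing (0.0 to 1.0)
        missing_share = result[column].isna().mean()
        if missing_share == 0:
            continue
        if missing_share < drop_threshold:
            # Only a few gaps: drop the affected rows
            result = result.dropna(subset=[column])
        elif pd.api.types.is_numeric_dtype(result[column]):
            # Numeric column: fill with the mean
            result[column] = result[column].fillna(result[column].mean())
        else:
            # Categorical or text column: fill with the most frequent value
            result[column] = result[column].fillna(result[column].mode().iloc[0])
    return result

df = pd.DataFrame({'City': ['Oslo', 'Oslo', None, 'Bergen'],
                   'Age': [25, np.nan, 35, 28]})
filled = fill_by_rule(df)
print(filled)
```

Treat the 5% cutoff as a starting point rather than a fixed rule; the right threshold depends on how much data you can afford to lose.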

Key Takeaways

  • `dropna()` removes rows with missing values
  • `fillna()` fills gaps with a specific value
  • Fill numeric columns with mean/median, categorical columns with the mode or a default value
  • `ffill` and `bfill` are great for time series data