Changing Data Types
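Let me give you the tools to fix data types. Wrong data types cause more bugs than you'd think. Numbers stored as strings concatenate instead of adding, and dates stored as plain strings (the `object` dtype) can't be broken into year, month, or day. This section shows how to fix both.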
astype() — The General Converter
The most flexible way to change types:
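import pandas as pd

# Every value starts out as a string, including the numbers
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': ['25', '30', '35'],
        'Score': ['88.5', '92.3', '79.8']}
df = pd.DataFrame(data)

# astype() returns a new Series, so assign it back to the column
df['Age'] = df['Age'].astype(int)
df['Score'] = df['Score'].astype(float)
print(df.dtypes)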
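`astype()` converts strings to integers, floats, or whatever you need, as long as every value in the column can be converted. A single bad value such as 'N/A' raises a ValueError. Think of it like changing the format of cells in Excel.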
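If several columns need converting, `astype()` can also be called on the whole DataFrame with a dictionary that maps column names to types. Here is a minimal sketch using the same sample data as above:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': ['25', '30', '35'],
    'Score': ['88.5', '92.3', '79.8'],
})

# Pass a {column: dtype} mapping to convert several columns in one call.
# Columns not listed in the mapping (here, Name) are left unchanged.
# astype() returns a new DataFrame, so assign the result back.
df = df.astype({'Age': int, 'Score': float})

print(df.dtypes)
```

This does the same job as converting the columns one at a time, with the conversions listed in one place.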
to_datetime() — For Dates
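Dates are special: a date string looks like a date, but pandas treats it as plain text until you convert it. Use `pd.to_datetime()` to parse it into a real datetime column: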
data = {'Date': ['2024-01-15', '2024-02-20', '2024-03-25']}
df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])
print(df.dtypes)
Once converted, you can access the year, month, day, etc. using the `.dt` accessor. It's like unlocking superpowers for date manipulation.
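To make the `.dt` accessor concrete, here is a short sketch that pulls a few parts out of the converted column. The new column names (`Year`, `Month`, `Weekday`) are just illustrative choices:

```python
import pandas as pd

df = pd.DataFrame({'Date': ['2024-01-15', '2024-02-20', '2024-03-25']})
df['Date'] = pd.to_datetime(df['Date'])

# .dt exposes datetime parts for the whole column at once
df['Year'] = df['Date'].dt.year            # calendar year as an integer
df['Month'] = df['Date'].dt.month          # month number, 1 to 12
df['Weekday'] = df['Date'].dt.day_name()   # weekday name, e.g. 'Monday'

print(df)
```

None of these work while the column still holds strings, which is why the `to_datetime()` step comes first.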
to_numeric() — Handling Messy Numbers
When a column mixes numbers with non-numeric entries, `to_numeric()` saves the day:
data = {'Price': ['10.5', 'N/A', '15.3', 'missing']}
df = pd.DataFrame(data)
df['Price'] = pd.to_numeric(df['Price'], errors='coerce')
print(df)
The `errors='coerce'` parameter turns values that can't be parsed into NaN instead of raising an error. One thing that confused me at first was why this mattered, but trust me: with the default `errors='raise'`, a single bad value stops the whole conversion with a ValueError.
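To see the difference, the sketch below runs the same messy values through `to_numeric()` twice, first with the default behavior and then with `errors='coerce'`. The variable names (`prices`, `cleaned`, `raised`) are only for illustration:

```python
import pandas as pd

prices = pd.Series(['10.5', 'N/A', '15.3', 'missing'])

# Default behavior (errors='raise'): an unparseable value raises ValueError
try:
    pd.to_numeric(prices)
    raised = False
except ValueError as exc:
    raised = True
    print(f"Conversion failed: {exc}")

# errors='coerce': unparseable values become NaN and the rest convert normally
cleaned = pd.to_numeric(prices, errors='coerce')
print(cleaned)

# Counting the NaN values shows how many entries could not be parsed
print("Invalid entries:", cleaned.isna().sum())
```

Checking the NaN count after coercing is a useful habit, since coercion otherwise hides how much of the data was invalid.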
Key Takeaways
- `astype()` converts a column to the type you specify, provided every value is convertible
- `pd.to_datetime()` properly converts date strings
- `pd.to_numeric()` handles messy numeric data
- `errors='coerce'` turns invalid values into NaN