Time Series
Let me give you the basics of working with dates and times in Pandas. Time series data is everywhere — stock prices, weather data, website traffic. Pandas makes handling it surprisingly easy.
Converting to Datetime
First things first — convert strings to proper datetime objects:
```python
import pandas as pd

data = {
    'Date': ['2024-01-15', '2024-02-20', '2024-03-25'],
    'Sales': [100, 150, 200]
}
df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])  # strings -> datetime64 values
print(df.dtypes)
```
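Real data rarely arrives as clean ISO strings. Here is a minimal sketch, assuming made-up day-first strings such as '15/01/2024' plus one junk value. Passing an explicit `format` removes any doubt about whether the day or the month comes first, and `errors='coerce'` turns anything unparseable into `NaT` instead of raising an error.

```python
import pandas as pd

# Hypothetical raw input: day-first strings plus one value that isn't a date
raw = pd.Series(['15/01/2024', '20/02/2024', 'not a date'])

# format='%d/%m/%Y' says the day comes first; errors='coerce' maps failures to NaT
dates = pd.to_datetime(raw, format='%d/%m/%Y', errors='coerce')
print(dates)
print('Unparseable values:', dates.isna().sum())
```

Coercing is handy for messy inputs, but it hides bad rows silently, so it is worth counting the `NaT` values as the last line does.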
Once you have proper datetime objects, you unlock the `.dt` accessor. It's like having a Swiss Army knife for dates.
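To see why the conversion matters, here is a small sketch: on a column of plain strings the `.dt` accessor is not available and pandas raises an `AttributeError`, while the same access works after `pd.to_datetime()`.

```python
import pandas as pd

as_strings = pd.Series(['2024-01-15', '2024-02-20'])

# String columns don't get the .dt accessor
try:
    as_strings.dt.year
    dt_on_strings_worked = True
except AttributeError as err:
    dt_on_strings_worked = False
    print('AttributeError:', err)

# After conversion, .dt is available
as_dates = pd.to_datetime(as_strings)
years = as_dates.dt.year.tolist()
print(years)
```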
The dt Accessor
Extract date components easily:
```python
df['Year'] = df['Date'].dt.year
df['Month'] = df['Date'].dt.month
df['Weekday'] = df['Date'].dt.day_name()  # a method call; returns strings
print(df)
```
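This creates new columns holding the year, the month number, and the weekday name. One thing that confused me at first was `day_name()`: unlike `year` and `month`, it is a method (note the parentheses), and it returns strings like "Monday" instead of numbers. If you want the day of the month as a number, use `.dt.day`.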
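A few more components sit alongside `year`, `month` and `day_name()`. This sketch reuses the three dates from the example above and puts numeric and named forms side by side; 2024-01-15 and 2024-03-25 are Mondays and 2024-02-20 is a Tuesday.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(['2024-01-15', '2024-02-20', '2024-03-25']))

parts = pd.DataFrame({
    'day_number': dates.dt.day,            # day of the month: 15, 20, 25
    'weekday_num': dates.dt.dayofweek,     # Monday=0 ... Sunday=6
    'weekday_name': dates.dt.day_name(),   # method call, returns strings
    'month_name': dates.dt.month_name(),   # method call, returns strings
    'quarter': dates.dt.quarter,           # 1 to 4
})
print(parts)
```

The numeric forms are what you usually want for sorting and filtering; the name methods are mostly for display.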
Resampling
Need to aggregate time series data by time periods?
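```python
df = df.set_index('Date')  # resample needs a DatetimeIndex
monthly = df['Sales'].resample('ME').sum()
print(monthly)
```
`resample('ME')` groups the data by calendar month and labels each group with the month's last day. Other common aliases are 'D' for daily, 'W' for weekly, and 'YE' for yearly. (Pandas 2.2 renamed the month-end and year-end aliases from 'M' and 'Y' to 'ME' and 'YE'; on older versions, use 'M' and 'Y'.) Think of it like groupby, but for time periods.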
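With one row per month, the monthly sum above just echoes the original values. To see aggregation actually combine rows, here is a sketch on made-up daily data, resampled weekly. It also uses the `on=` parameter, which resamples by a column so the `set_index` step is not needed. By default, 'W' weeks end on Sunday, and each bin is labelled with that Sunday's date.

```python
import pandas as pd

# Hypothetical daily sales: 14 days starting Monday 2024-01-01
daily = pd.DataFrame({
    'Date': pd.date_range('2024-01-01', periods=14, freq='D'),
    'Sales': range(1, 15),
})

# on='Date' resamples by a column; agg computes several statistics per week
weekly = daily.resample('W', on='Date')['Sales'].agg(['sum', 'mean'])
print(weekly)
```

The first week (1 to 7 January) sums to 28 and the second (8 to 14 January) to 77.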
Key Takeaways
- `pd.to_datetime()` converts strings to datetime objects
- The `.dt` accessor provides year, month, day components
- `resample()` aggregates data by time periods
- Set the Date column as the index before resampling, or pass it with `resample(..., on='Date')`