String Operations
Let me show you how to work with text data in Pandas. String operations are essential because real-world data is messy: names come with inconsistent capitalization and stray leading or trailing spaces.
The .str Accessor
Pandas gives you access to string methods through `.str`:
import pandas as pd
data = {'Name': [' alice ', 'BOB', 'charlie'],
        'Email': ['alice@test.com', 'BOB@TEST.COM', 'charlie@test.com']}
df = pd.DataFrame(data)
df['Name'] = df['Name'].str.strip().str.title()
print(df)
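`strip()` removes leading and trailing whitespace. `title()` capitalizes the first letter of each word and lowercases the rest. Each `.str` method returns a new Series, so you can chain them together, which keeps the cleanup short and readable.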
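To see the chain in isolation, here is a minimal sketch on a standalone Series. The `names` data and the missing entry are made up for illustration; the point is that `.str` has to be repeated before each method, and that a missing value should pass through as missing rather than raise an error.

```python
import pandas as pd

# Hypothetical sample data: messy names plus one missing value
names = pd.Series([' alice ', 'BOB', 'charlie', None])

# Each .str call returns a new Series, so .str is needed again
# before the next string method in the chain
cleaned = names.str.strip().str.title()

# The first three entries are cleaned; the missing one stays missing
print(cleaned.tolist())
print(cleaned.isna().tolist())
```

Writing `names.str.strip().title()` without the second `.str` would fail, because a Series has no `title` method of its own.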
Common String Methods
Here are the ones you'll use constantly:
df['Name_lower'] = df['Name'].str.lower()
df['Name_upper'] = df['Name'].str.upper()
df['Name_len'] = df['Name'].str.len()
print(df)
`lower()` and `upper()` change case. `len()` counts the characters in each string, spaces included. These are your bread and butter for text cleaning.
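Here is a small sketch of why these matter in practice, using made-up data rather than the DataFrame above. It shows that `len()` counts stray spaces too, and that lowercasing before comparing lets you spot values that differ only by case.

```python
import pandas as pd

# Hypothetical raw names, before any cleaning
raw = pd.Series([' alice ', 'BOB', 'charlie'])

# len() counts every character, so the padded name looks longer than it is
raw_len = raw.str.len()                  # 7, 3, 7
clean_len = raw.str.strip().str.len()    # 5, 3, 7

# Hypothetical emails where two entries differ only by case
emails = pd.Series(['alice@test.com', 'ALICE@TEST.COM', 'bob@test.com'])

# Normalizing case first lets duplicated() flag the repeat
is_repeat = emails.str.lower().duplicated()

print(raw_len.tolist(), clean_len.tolist())
print(is_repeat.tolist())
```

Comparing lengths before and after `strip()` is a quick way to find rows that carry hidden whitespace.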
Checking for Patterns
Need to check if text contains something?
df['Has_test'] = df['Email'].str.contains('test', case=False)
print(df)
`contains()` returns True/False for each row, so the result works as a boolean mask for filtering rows by text pattern. By default the pattern is treated as a regular expression. One thing that confused me at first was `case=False`: it makes the search case-insensitive.
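The sketch below uses the result of `contains()` as a row filter. The data is made up, including a missing email, and the `na=False` and `regex=False` arguments are additions that the example above does not use: `na=False` treats missing values as non-matches, and `regex=False` makes the pattern a literal substring instead of a regular expression.

```python
import pandas as pd

# Hypothetical data with one non-matching and one missing email
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Dana'],
    'Email': ['alice@test.com', 'BOB@TEST.COM', 'charlie@example.org', None],
})

# Boolean mask: True where the email contains 'test', ignoring case;
# na=False counts the missing email as "no match"
mask = df['Email'].str.contains('test', case=False, na=False)

# Keep only the rows where the mask is True
matches = df[mask]
print(matches)

# regex=False searches for the literal text '.com', so the dot
# is not interpreted as "any character"
literal = df['Email'].str.contains('.com', case=False, regex=False, na=False)
print(literal.tolist())
```

Without `na=False`, the missing email would give a missing result instead of False, and using that mask to index the DataFrame would likely fail.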
Key Takeaways
- Use the `.str` accessor to call string methods on columns
- `strip()`, `lower()`, `upper()` are essential for cleaning
- `len()` counts characters in each string
- `contains()` checks for patterns and returns booleans