String Operations

Cleaning text data with the `.str` accessor.

Let me show you how to work with text data in Pandas. String operations are essential because real-world data is messy: names have inconsistent capitalization and stray spaces at either end.

The .str Accessor

Pandas gives you access to string methods through `.str`:


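import pandas as pd

# Sample data with stray spaces and inconsistent capitalization
data = {'Name': ['  alice ', 'BOB', 'charlie'],
        'Email': ['alice@test.com', 'BOB@TEST.COM', 'charlie@test.com']}
df = pd.DataFrame(data)

# Strip surrounding whitespace, then title-case each name
df['Name'] = df['Name'].str.strip().str.title()
print(df)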

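`strip()` removes leading and trailing whitespace. `title()` capitalizes the first letter of each word and lowercases the rest. Each `.str` method returns a new Series, so you can chain them in a single expression.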
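
To make the chain easier to follow, here is a minimal sketch that runs the two steps separately on the same sample names. The variable names `names`, `stripped`, and `titled` are only for illustration and do not appear in the lesson's DataFrame.

```python
import pandas as pd

# Same sample names as in the lesson, held in a standalone Series
names = pd.Series(['  alice ', 'BOB', 'charlie'])

# Step 1: remove leading and trailing whitespace
stripped = names.str.strip()

# Step 2: capitalize the first letter of each word, lowercase the rest
titled = stripped.str.title()

print(stripped.tolist())  # ['alice', 'BOB', 'charlie']
print(titled.tolist())    # ['Alice', 'Bob', 'Charlie']
```

Writing `names.str.strip().str.title()` produces the same result as `titled`. Note that `.str` has to be repeated before each method in the chain.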

Common String Methods

Here are the ones you'll use constantly:


df['Name_lower'] = df['Name'].str.lower()
df['Name_upper'] = df['Name'].str.upper()
df['Name_len'] = df['Name'].str.len()
print(df)

`lower()` and `upper()` change case. `len()` counts characters. These are your bread and butter for text cleaning.
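
One practical use of `len()` during cleaning is spotting hidden whitespace. The sketch below compares lengths before and after `strip()`, using the raw (uncleaned) sample names. The names `raw`, `raw_len`, `clean_len`, and `has_padding` are illustrative, not part of the lesson's code.

```python
import pandas as pd

# Raw names, before any cleaning
raw = pd.Series(['  alice ', 'BOB', 'charlie'])

# Character counts before and after stripping whitespace
raw_len = raw.str.len()
clean_len = raw.str.strip().str.len()

# A row has padding if stripping changed its length
has_padding = raw_len != clean_len

print(raw_len.tolist())      # [8, 3, 7]
print(clean_len.tolist())    # [5, 3, 7]
print(has_padding.tolist())  # [True, False, False]
```

Only the first name changes length, which shows that it carried extra spaces.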

Checking for Patterns

Need to check if text contains something?


df['Has_test'] = df['Email'].str.contains('test', case=False)
print(df)

`contains()` returns True/False for each row, so you can use the result as a filter based on text patterns. One thing that confused me at first was `case=False`: it makes the search case-insensitive.
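
Because `contains()` returns one boolean per row, the result can be used directly as a mask to select rows. The sketch below assumes a small made-up table (the `carol@example.org` row and the missing value are hypothetical additions) and uses two optional parameters of `str.contains`. By default the pattern is treated as a regular expression, so `regex=False` is passed to match the text literally. `na=False` makes missing values count as "no match" instead of producing a missing result.

```python
import pandas as pd

# Hypothetical sample data, including one non-matching and one missing email
emails = pd.DataFrame({
    'Email': ['alice@test.com', 'BOB@TEST.COM', 'carol@example.org', None]
})

# Literal, case-insensitive match; missing values are treated as False
mask = emails['Email'].str.contains('test.com', case=False, regex=False, na=False)

# Keep only the rows where the mask is True
matches = emails[mask]

print(matches['Email'].tolist())  # ['alice@test.com', 'BOB@TEST.COM']
```

Without `regex=False`, the `.` in `'test.com'` would match any character, which is usually harmless here but can give surprising matches with other patterns.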

Key Takeaways

Use the `.str` accessor to call string methods on columns
`strip()`, `title()`, `lower()`, and `upper()` are essential for cleaning
`len()` counts characters in each string
`contains()` checks for patterns and returns booleans
