String Operations

Cleaning text data with the `.str` accessor.

Let me show you how to work with text data in Pandas. String operations are essential because real-world data is messy: names come with inconsistent capitalization and stray spaces at the start or end.

The .str Accessor

Pandas gives you access to string methods through `.str`:


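import pandas as pd

# Sample data with messy spacing and capitalization
data = {'Name': ['  alice ', 'BOB', 'charlie'],
        'Email': ['alice@test.com', 'BOB@TEST.COM', 'charlie@test.com']}
df = pd.DataFrame(data)

# Trim surrounding whitespace, then title-case each name
df['Name'] = df['Name'].str.strip().str.title()
print(df)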
    

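`strip()` removes leading and trailing whitespace (spaces inside the string are left alone). `title()` capitalizes the first letter of each word and lowercases the rest. You can chain these together, because each `.str` method returns a new Series that has its own `.str` accessor.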
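
To see what each step in the chain contributes, here is a small sketch that runs them one at a time. It reuses the names from the example above; the variable names `stripped` and `cleaned` are just illustrative.

```python
import pandas as pd

names = pd.Series(['  alice ', 'BOB', 'charlie'])

# Step 1: strip() drops leading and trailing whitespace only.
stripped = names.str.strip()

# Step 2: title() uppercases the first letter of each word
# and lowercases the remaining letters.
cleaned = stripped.str.title()

print(stripped.tolist())  # ['alice', 'BOB', 'charlie']
print(cleaned.tolist())   # ['Alice', 'Bob', 'Charlie']
```

Splitting the chain like this is handy when a cleaning step does not give the result you expected and you want to inspect the intermediate values.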

Common String Methods

Here are the ones you'll use constantly:


# Continues with the df built above
df['Name_lower'] = df['Name'].str.lower()   # all lowercase
df['Name_upper'] = df['Name'].str.upper()   # all uppercase
df['Name_len'] = df['Name'].str.len()       # characters per name
print(df)
    

`lower()` and `upper()` change case. `len()` counts the characters in each string, spaces included. These are your bread and butter for text cleaning.
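
One way these methods might be combined in practice is sketched below: comparing lengths before and after `strip()` flags values with hidden padding, and lowercasing emails makes them comparable. The column name `Has_padding` is made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['  alice ', 'BOB', 'charlie'],
    'Email': ['alice@test.com', 'BOB@TEST.COM', 'charlie@test.com'],
})

# A length that shrinks after strip() reveals hidden padding.
raw_len = df['Name'].str.len()
clean_len = df['Name'].str.strip().str.len()
df['Has_padding'] = raw_len > clean_len

# Lowercasing emails so that later comparisons ignore case.
df['Email'] = df['Email'].str.lower()

print(df[['Name', 'Has_padding', 'Email']])
```

Only the first row is flagged, since `'  alice '` is 8 characters long before stripping and 5 after.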

Checking for Patterns

Need to check if text contains something?


df['Has_test'] = df['Email'].str.contains('test', case=False)
print(df)
    

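`contains()` returns True/False for each row, so you can use the result as a mask to filter the DataFrame. One thing that confused me at first was `case=False`: it makes the search case-insensitive, so 'test' also matches 'TEST.COM'. Keep in mind that the pattern is treated as a regular expression by default.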
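
Here is a rough sketch of using the result of `contains()` as a row filter. The missing value and the `dave@example.org` entry are hypothetical additions to the sample data, included to show two optional parameters: `na=False`, which counts missing values as non-matches, and `regex=False`, which matches the text literally.

```python
import pandas as pd

emails = pd.Series(['alice@test.com', 'BOB@TEST.COM', None, 'dave@example.org'])

# na=False treats missing values as "no match", so the mask is purely boolean.
# regex=False matches 'test.com' literally instead of as a regular expression
# (in a regex, the dot would match any character).
mask = emails.str.contains('test.com', case=False, regex=False, na=False)

# Boolean indexing keeps only the rows where the mask is True.
print(emails[mask].tolist())  # ['alice@test.com', 'BOB@TEST.COM']
```

Without `na=False`, the missing entry would produce a missing value in the mask, which cannot be used directly for filtering.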

Key Takeaways

  • Use the `.str` accessor to call string methods on columns
  • `strip()`, `title()`, `lower()`, and `upper()` are essential for cleaning
  • `len()` counts characters in each string
  • `contains()` checks for patterns and returns booleans