Pandas DataFrames In Depth
We introduced DataFrames in the pandas intro, but there's so much more to them. Let's go deeper — indexing, selecting, filtering, and modifying data like a pro. This is where you build real fluency.
Indexing with loc and iloc
Pandas has two primary indexing methods: .loc (label-based) and .iloc (position-based). Mixing them up is a common source of bugs for beginners.
import pandas as pd
df = pd.DataFrame({
"name": ["Alice", "Bob", "Charlie"],
"score": [85, 92, 78]
}, index=["x", "y", "z"])
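print(df.loc["x", "name"])       # row labeled "x", column "name" -> Alice
print(df.iloc[0, 1])             # first row, second column -> 85
print(df.loc["x":"y", "score"])  # label slice, includes both "x" and "y"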
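Use .loc when you know the labels and .iloc when you want a position. For example, .iloc[0] always gives you the first row, regardless of what the index labels are. Note that a .loc slice such as "x":"y" includes both endpoints, whereas an .iloc slice excludes the end position, like ordinary Python slicing.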
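The difference matters most when the index labels are themselves integers. The following is a minimal sketch, assuming a hypothetical index of 1, 2, 3 (not part of the example above), in which the same number selects different rows depending on the accessor:

```python
import pandas as pd

# Hypothetical data: integer labels that start at 1, so labels and positions differ.
df = pd.DataFrame(
    {"name": ["Alice", "Bob", "Charlie"], "score": [85, 92, 78]},
    index=[1, 2, 3],
)

by_label = df.loc[1, "name"]     # row whose label is 1 -> "Alice"
by_position = df.iloc[1, 0]      # row at position 1 (the second row) -> "Bob"

# Label slices include the end label; position slices exclude the end position.
label_slice = df.loc[1:2, "score"]    # labels 1 and 2 -> two rows
position_slice = df.iloc[1:2, 1]      # position 1 only -> one row

print(by_label, by_position)
print(list(label_slice), list(position_slice))
```

With a default 0-based index the two accessors happen to agree on single rows, which is why the mix-up often goes unnoticed until the index changes.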
Adding and Removing Columns
DataFrames are mutable — you can add, remove, and modify columns freely. This is essential for feature engineering later on.
import pandas as pd
df = pd.DataFrame({
"price": [10, 20, 30],
"quantity": [5, 3, 8]
})
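df["total"] = df["price"] * df["quantity"]  # element-wise product, one value per row
df["discount"] = 0.1                        # a scalar is broadcast to every row
df = df.drop("discount", axis=1)            # axis=1 drops a column; result is a new DataFrame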
print(df)
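You can create new columns with simple arithmetic. The axis=1 parameter in .drop() tells pandas to remove a column (not a row). By default .drop() returns a new DataFrame instead of modifying the original, which is why the result is assigned back to df.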
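To make the role of the axis concrete, here is a short sketch that reuses the price and quantity data. It drops a column with the columns= keyword (an equivalent spelling of axis=1) and drops a row with the default axis=0. The variable names without_total and without_first_row are chosen purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({"price": [10, 20, 30], "quantity": [5, 3, 8]})
df["total"] = df["price"] * df["quantity"]

# columns= is an equivalent, more explicit spelling of axis=1.
without_total = df.drop(columns="total")

# With the default axis=0, .drop() removes rows by index label.
without_first_row = df.drop(0)

# .drop() returned new DataFrames; df itself still has all three columns.
print(list(df.columns))
print(list(without_total.columns))
print(list(without_first_row.index))
```

Because neither call was assigned back to df, the original frame is unchanged.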
Sorting Data
Sorting helps you find extremes — highest sales, oldest records, most frequent categories. Pandas makes this straightforward.
import pandas as pd
df = pd.DataFrame({
"employee": ["Alice", "Bob", "Charlie", "Diana"],
"salary": [75000, 62000, 95000, 88000]
})
print(df.sort_values("salary", ascending=False))  # highest salary first
print(df.sort_values(["salary"], ascending=[False]).head(2))  # list form; keep the top 2 earners
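Use ascending=False for descending order. To sort by multiple columns, pass a list of column names, plus a matching list of booleans for ascending if the directions differ; the second call above uses the list form with a single column. The .head(2) grabs the top 2 rows after sorting. Like .drop(), .sort_values() returns a new DataFrame and leaves the original order untouched.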
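A multi-column sort only shows its effect when there is a second column to sort on. The sketch below adds a hypothetical "department" column (not part of the example above), then sorts by department alphabetically and by salary from high to low within each department:

```python
import pandas as pd

# Hypothetical data: the "department" column is added for illustration only.
df = pd.DataFrame({
    "employee": ["Alice", "Bob", "Charlie", "Diana"],
    "department": ["Sales", "Engineering", "Engineering", "Sales"],
    "salary": [75000, 62000, 95000, 88000],
})

# First key: department A-Z. Second key: salary high-to-low within each department.
ranked = df.sort_values(["department", "salary"], ascending=[True, False])
print(ranked)
```

The order of the list matters: the first column is the primary sort key, and later columns only order rows that have equal values in the earlier ones.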
Key Takeaways
- .loc uses labels, .iloc uses positions — never confuse them
- New columns are created with simple assignment
- .drop() removes rows or columns depending on the axis parameter
- .sort_values() orders data by any column or combination of columns