Pandas DataFrames

Deep dive into DataFrames — rows, columns, indexing.

Pandas DataFrames In Depth

We introduced DataFrames in the pandas intro, but there's so much more to them. Let's go deeper — indexing, selecting, filtering, and modifying data like a pro. This is where you build real fluency.

Indexing with loc and iloc

Pandas has two primary indexing accessors: .loc (label-based) and .iloc (position-based). Mixing them up is a common source of bugs for beginners.


import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "score": [85, 92, 78]
}, index=["x", "y", "z"])

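print(df.loc["x", "name"])       # Alice: row labelled "x", column "name"
print(df.iloc[0, 1])             # 85: first row, second column
print(df.loc["x":"y", "score"])  # rows "x" and "y": label slices include the end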

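Use .loc when you know the labels. Use .iloc when you want to select by integer position. For example, .iloc[0] always gives you the first row, regardless of what the index labels are. Note that a .loc slice includes its end label, while an .iloc slice excludes its stop position, just like a regular Python slice.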
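The difference between labels and positions is easiest to see when the index holds integers that do not match the row positions. The following is a minimal sketch under that assumption; the index values 10, 20, 30 and the variable names are made up for illustration.

```python
import pandas as pd

# Hypothetical data: integer labels that do not match row positions.
df = pd.DataFrame(
    {"name": ["Alice", "Bob", "Charlie"], "score": [85, 92, 78]},
    index=[10, 20, 30],
)

by_label = df.loc[10, "name"]    # row labelled 10
by_position = df.iloc[0, 0]      # first row, first column

# Label slices include both endpoints; position slices exclude the stop.
label_slice = df.loc[10:20, "score"]   # rows labelled 10 and 20
position_slice = df.iloc[0:1, 1]       # only the row at position 0

print(by_label, by_position)
print(len(label_slice), len(position_slice))
```

Here df.loc[0] would raise a KeyError because no row is labelled 0, while df.iloc[0] still returns the first row.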


Adding and Removing Columns

DataFrames are mutable — you can add, remove, and modify columns freely. This is essential for feature engineering later on.


import pandas as pd

df = pd.DataFrame({
    "price": [10, 20, 30],
    "quantity": [5, 3, 8]
})

df["total"] = df["price"] * df["quantity"]  # element-wise arithmetic
df["discount"] = 0.1                        # a scalar fills every row
df = df.drop("discount", axis=1)            # drop returns a new DataFrame
print(df)

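You can create new columns with simple arithmetic, and assigning a single value such as 0.1 fills every row with it. The axis=1 parameter in .drop() tells pandas to remove a column (not a row). By default .drop() returns a new DataFrame instead of changing the original, which is why the result is assigned back to df.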
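As a small sketch of the axis behavior described above, the example below drops a column using the columns= keyword (an equivalent spelling of axis=1) and drops a row using the default axis=0. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"price": [10, 20, 30], "quantity": [5, 3, 8]})
df["total"] = df["price"] * df["quantity"]

# columns= is equivalent to passing the name with axis=1.
without_total = df.drop(columns=["total"])

# With the default axis=0, drop removes rows by index label.
without_first_row = df.drop(0)

# drop returns new DataFrames; df itself is unchanged.
print(list(without_total.columns))
print(list(without_first_row.index))
print(list(df.columns))
```

Many people find columns= easier to read than axis=1, since the intent is spelled out in the call.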


Sorting Data

Sorting helps you find extremes: highest sales, oldest records, most frequent categories (once you have counted them). Pandas makes this straightforward.


import pandas as pd

df = pd.DataFrame({
    "employee": ["Alice", "Bob", "Charlie", "Diana"],
    "salary": [75000, 62000, 95000, 88000]
})

print(df.sort_values("salary", ascending=False))  # highest salary first
print(df.sort_values(["salary"], ascending=[False]).head(2))  # list form, top 2 earners

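Use ascending=False for descending order. To sort by several columns, pass a list of column names along with a matching list of ascending flags; the second call above uses this list form with a single column. The .head(2) keeps the first 2 rows of the sorted result, here the two highest salaries.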
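To illustrate sorting by more than one column, here is a minimal sketch that assumes an extra "department" column, which is not part of the example above and is added only for demonstration.

```python
import pandas as pd

# Hypothetical data: the "department" column is added for illustration.
df = pd.DataFrame({
    "employee": ["Alice", "Bob", "Charlie", "Diana"],
    "department": ["Sales", "Engineering", "Sales", "Engineering"],
    "salary": [75000, 62000, 95000, 88000],
})

# Sort by department A to Z, then by salary high to low within each department.
ranked = df.sort_values(["department", "salary"], ascending=[True, False])
print(ranked)
```

Each entry in the ascending list applies to the column at the same position, so the two lists must have the same length.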


Key Takeaways

.loc uses labels, .iloc uses positions — never confuse them
New columns are created with simple assignment
.drop() removes rows or columns depending on the axis parameter
.sort_values() orders data by any column or combination of columns
