Data Frame Operations

Creating a data frame is only the first step. Most of the work is selecting rows and columns, adding new columns, and cleaning up the missing values that show up in most real datasets. R gives you several ways to do each.

Subsetting with [] and $

Use $ to pull out a single column as a vector. Use square brackets in the form [rows, columns], just as with matrices. Leave rows or columns empty to select all of them. You can filter rows with a logical condition and pick columns by name or position.

students <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age = c(22, 19, 24),
  grade = c("A", "B", "A")
)

# Access a column with $
students$name

# Rows where age > 20
students[students$age > 20, ]

# Specific rows and columns
students[1:2, c("name", "grade")]
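A few details of [rows, columns] are easy to miss. The sketch below reuses the same students data frame; the variable names (ages, ages_df, older_a, no_grade) are only illustrative.

```r
students <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age = c(22, 19, 24),
  grade = c("A", "B", "A")
)

# One column via [rows, columns] is simplified to a vector by default
ages <- students[, "age"]

# drop = FALSE keeps the result as a one-column data frame
ages_df <- students[, "age", drop = FALSE]

# Combine conditions with & (and) or | (or)
older_a <- students[students$age > 20 & students$grade == "A", ]

# Negative positions drop columns instead of selecting them
no_grade <- students[, -3]
```

If later code expects a data frame, drop = FALSE is the safer choice when a selection might end up with a single column.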

Using subset() for Cleaner Filtering

Writing students[students$age > 20, ] works, but it's repetitive. The subset() function evaluates the condition inside the data frame, so you can use column names directly without repeating the data frame name. Its select argument picks which columns to keep, also by unquoted name.

students <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age = c(22, 19, 24),
  grade = c("A", "B", "A")
)

# Cleaner filtering
subset(students, age > 20)

# Keep only specific columns
subset(students, grade == "A", select = c(name, age))
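subset() and [ ] also differ in how they treat a condition that evaluates to NA. The sketch below uses a hypothetical scores data frame (not part of the examples above) in which one score is missing.

```r
# Hypothetical data: Bob's score is missing
scores <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  score = c(85, NA, 72)
)

# subset() treats an NA condition as FALSE and drops the row
with_subset <- subset(scores, score > 80)

# [ ] keeps a row filled with NA where the condition is NA
with_brackets <- scores[scores$score > 80, ]

# which() turns the condition into row positions, skipping NA
with_which <- scores[which(scores$score > 80), ]
```

Here with_subset and with_which contain only Alice's row, while with_brackets also contains an all-NA row. R's help page for subset() describes it as a convenience function for interactive use, so inside your own functions the [ ] form is usually the safer choice.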

Adding Columns and Handling NAs

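Add a new column by assigning to a name that doesn't exist yet, using $ or []. The new values should have one entry per row; a single value is repeated for every row. Real data has missing values, which R represents as NA. Use na.omit() to drop every row that contains at least one NA.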

students <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age = c(22, 19, 24),
  grade = c("A", "B", "A")
)

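# Add a column with $
students$passed <- c(TRUE, TRUE, TRUE)

# Add a column with [], computed from an existing column
students[, "age_next_year"] <- students$age + 1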

# A dataset with missing values
data <- data.frame(
  x = c(1, 2, NA, 4),
  y = c(NA, 5, 6, 7)
)

# Remove rows with any NA
na.omit(data)
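Before dropping rows, it often helps to see where the missing values are, since na.omit() can discard more data than you expect. The following is a minimal sketch using the same data frame; the variable names are only illustrative.

```r
data <- data.frame(
  x = c(1, 2, NA, 4),
  y = c(NA, 5, 6, 7)
)

# Count missing values in each column
na_counts <- colSums(is.na(data))

# TRUE for rows that have no missing values
complete <- complete.cases(data)

# Keeps the same rows that na.omit(data) keeps
cleaned <- data[complete, ]

# Drop rows only where x is missing, keeping rows with NA in y
x_known <- data[!is.na(data$x), ]

# Many summary functions can skip NAs without removing any rows
mean_x <- mean(data$x, na.rm = TRUE)
```

In this example na.omit() keeps only two of the four rows, while filtering on x alone keeps three. Which approach fits depends on which columns your analysis actually needs.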
