Creating a data frame is only step one. The real work is selecting, adding, and cleaning data. R gives you several ways to pull out the rows and columns you need, add new columns, and handle the missing values that most real datasets contain.
Subsetting with [] and $
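Use $ to pull out a single column as a vector. Use square brackets in the form [rows, columns], just like with matrices. Leaving the rows or columns position empty selects all of them, which is why the examples below keep the trailing comma. You can filter rows with a logical condition and pick columns by name or position.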
students <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age = c(22, 19, 24),
  grade = c("A", "B", "A")
)
# Access a column with $
students$name
# Rows where age > 20
students[students$age > 20, ]
# Specific rows and columns
students[1:2, c("name", "grade")]
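The selection forms above differ in what they return, which matters once you pass the result to another function. The following is a minimal sketch of that difference; the variable names (ages, same_ages, age_df, adult_as) are only illustrative. It also shows drop = FALSE, which keeps a one-column data frame, and combining two conditions with &.

```r
students <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age = c(22, 19, 24),
  grade = c("A", "B", "A")
)

# $ and [[ ]] both return the column as a plain vector
ages <- students$age
same_ages <- students[["age"]]

# [ , "age"] also drops to a vector by default;
# drop = FALSE keeps a one-column data frame instead
age_df <- students[, "age", drop = FALSE]

# Combine conditions with & (and) or | (or)
adult_as <- students[students$age > 20 & students$grade == "A", ]

is.numeric(ages)       # TRUE
is.data.frame(age_df)  # TRUE
nrow(adult_as)         # 2 (Alice and Charlie)
```

Note the trailing comma in the last filter: without it, R would treat the logical vector as a column selector rather than a row selector.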
Using subset() for Cleaner Filtering
Writing students[students$age > 20, ] works, but it repeats the data frame name. The subset() function lets you write the condition using the column names directly. You can also choose which columns to keep with the select argument.
students <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age = c(22, 19, 24),
  grade = c("A", "B", "A")
)
# Cleaner filtering
subset(students, age > 20)
# Keep only specific columns
subset(students, grade == "A", select = c(name, age))
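One practical difference between the two approaches shows up when the filtered column contains NA. The sketch below uses a hypothetical people data frame (not part of the original example) to compare them: bracket filtering returns an all-NA row for each NA in the condition, while subset() treats NA as FALSE and drops the row. Wrapping the condition in which() is one way to get the same result with brackets.

```r
people <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age = c(22, NA, 24)
)

# Bracket filtering: NA > 20 is NA, which yields a row filled with NAs
by_bracket <- people[people$age > 20, ]

# subset() treats NA in the condition as FALSE and drops that row
by_subset <- subset(people, age > 20)

# which() keeps only the positions where the condition is TRUE
by_which <- people[which(people$age > 20), ]

nrow(by_bracket)  # 3 (includes one all-NA row)
nrow(by_subset)   # 2
nrow(by_which)    # 2
```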
Adding Columns and Handling NAs
Add a new column by assigning to a name that doesn't exist yet, using either $ or []. Supply one value per row, or a single value that R repeats for every row. Real data often has missing values, which R represents as NA. Use na.omit() to drop every row that contains at least one NA.
students <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age = c(22, 19, 24),
  grade = c("A", "B", "A")
)
# Add a column
students$passed <- c(TRUE, TRUE, TRUE)
# A dataset with missing values
data <- data.frame(
  x = c(1, 2, NA, 4),
  y = c(NA, 5, 6, 7)
)
# Remove rows with any NA (only rows 2 and 4 remain)
na.omit(data)
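In practice, new columns are usually computed from existing ones, and dropping every incomplete row is not always what you want. The sketch below shows a few alternatives under illustrative assumptions: the column names is_adult and age_next_year, the age cutoff of 21, and the scores data frame are made up for this example.

```r
students <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age = c(22, 19, 24),
  grade = c("A", "B", "A")
)

# Derive columns from existing ones instead of typing values by hand
students$is_adult <- students$age >= 21           # assignment via $
students["age_next_year"] <- students$age + 1     # assignment via []

scores <- data.frame(
  x = c(1, 2, NA, 4),
  y = c(NA, 5, 6, 7)
)

# Count missing values per column before deciding what to drop
na_counts <- colSums(is.na(scores))

# Drop rows only when one specific column is missing
has_x <- scores[!is.na(scores$x), ]

# Or keep every row and skip NAs inside the calculation
mean_x <- mean(scores$x, na.rm = TRUE)

na_counts    # x = 1, y = 1
nrow(has_x)  # 3
```

Filtering on a single column keeps more data than na.omit(), which here would discard two of the four rows.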