Loading Data
The first step in any data project is getting data into your environment. Data lives in many formats — CSV files, Excel spreadsheets, JSON APIs, SQL databases, and more. Let's learn how to load them all with pandas.
CSV Files
CSV (Comma-Separated Values) is the most common data format. It's simple, human-readable, and every tool supports it. Pandas makes loading CSVs a one-liner.
import pandas as pd
df = pd.read_csv("sales_data.csv")
print(df.head())
df_custom = pd.read_csv("data.csv", sep=";", encoding="utf-8")
print(df_custom.shape)
The separator isn't always a comma — some files use semicolons or tabs. The sep parameter handles that. Always check your data's actual delimiter before loading.
Excel Files
Many business users prefer Excel. Pandas can read Excel files directly, including specific sheets within a workbook.
import pandas as pd
df = pd.read_excel("report.xlsx")
print(df.columns.tolist())
df_sheet = pd.read_excel("report.xlsx", sheet_name="Q4")
print(df_sheet.head())
If your Excel file has multiple sheets, you need to specify which one to read. Use sheet_name with the sheet's name or index.
JSON Files
JSON is the standard format for APIs and web data. It's hierarchical, which makes it trickier to load into a flat table, but pandas handles it well.
import pandas as pd
df = pd.read_json("users.json")
print(df.head())
df_api = pd.json_normalize(data, record_path="items")
print(df_api.head())
When JSON has nested structures, json_normalize flattens them into a table. This is essential when working with API responses.
SQL Databases
Most companies store their data in SQL databases. You can query them directly from pandas and load the results into a DataFrame.
import pandas as pd
import sqlite3
conn = sqlite3.connect("company.db")
query = "SELECT * FROM employees WHERE salary > 50000"
df = pd.read_sql(query, conn)
print(df.head())
conn.close()
This is incredibly powerful — you can combine SQL's querying power with pandas' analysis capabilities in one workflow.
Try it Yourself →Key Takeaways
- read_csv() is the most commonly used data loading function
- Always check the separator and encoding when loading CSVs
- Excel files may have multiple sheets — specify which one to read
- JSON normalization flattens nested data into table format
- pandas can query SQL databases directly