Labs ICT
Pro Login

Loading Data

CSV, Excel, JSON, databases — reading data from everywhere.

Loading Data

The first step in any data project is getting data into your environment. Data lives in many formats — CSV files, Excel spreadsheets, JSON APIs, SQL databases, and more. Let's learn how to load them all with pandas.

CSV Files

CSV (Comma-Separated Values) is the most common data format. It's simple, human-readable, and every tool supports it. Pandas makes loading CSVs a one-liner.


import pandas as pd

df = pd.read_csv("sales_data.csv")
print(df.head())

df_custom = pd.read_csv("data.csv", sep=";", encoding="utf-8")
print(df_custom.shape)
    

The separator isn't always a comma — some files use semicolons or tabs. The sep parameter handles that. Always check your data's actual delimiter before loading.

Try it Yourself →

Excel Files

Many business users prefer Excel. Pandas can read Excel files directly, including specific sheets within a workbook.


import pandas as pd

df = pd.read_excel("report.xlsx")
print(df.columns.tolist())

df_sheet = pd.read_excel("report.xlsx", sheet_name="Q4")
print(df_sheet.head())
    

If your Excel file has multiple sheets, you need to specify which one to read. Use sheet_name with the sheet's name or index.

Try it Yourself →

JSON Files

JSON is the standard format for APIs and web data. It's hierarchical, which makes it trickier to load into a flat table, but pandas handles it well.


import pandas as pd

df = pd.read_json("users.json")
print(df.head())

df_api = pd.json_normalize(data, record_path="items")
print(df_api.head())
    

When JSON has nested structures, json_normalize flattens them into a table. This is essential when working with API responses.

Try it Yourself →

SQL Databases

Most companies store their data in SQL databases. You can query them directly from pandas and load the results into a DataFrame.


import pandas as pd
import sqlite3

conn = sqlite3.connect("company.db")
query = "SELECT * FROM employees WHERE salary > 50000"
df = pd.read_sql(query, conn)
print(df.head())
conn.close()
    

This is incredibly powerful — you can combine SQL's querying power with pandas' analysis capabilities in one workflow.

Try it Yourself →

Key Takeaways

  • read_csv() is the most commonly used data loading function
  • Always check the separator and encoding when loading CSVs
  • Excel files may have multiple sheets — specify which one to read
  • JSON normalization flattens nested data into table format
  • pandas can query SQL databases directly