AI/ML/NLP Research Track · Lesson 5 of 16
Pandas Detailed Tutorial
Pandas Detailed Tutorial (Beginner to Advanced)
Pandas helps you load, clean, transform, and analyze tabular data efficiently. Most real-world Python data tasks use Pandas daily.
1) Setup
Bash
pip install pandasPython
import pandas as pd2) Series and DataFrame Basics
Python
data = {
"name": ["Ali", "Sara", "John"],
"age": [22, 25, 28],
"city": ["Lahore", "Karachi", "Berlin"]
}
df = pd.DataFrame(data)
print(df.head())Key objects:
Series: single columnDataFrame: 2D table
3) Load and Save Data
Python
df = pd.read_csv("sales.csv")
df.to_csv("sales_clean.csv", index=False)Also common:
read_excel,read_json,read_parquet
4) Select, Filter, Sort
Python
df["revenue"] # single column
df[["name", "revenue"]] # multiple columns
df[df["revenue"] > 1000] # row filter
df.sort_values("revenue", ascending=False)Use .loc for label-based access:
Python
df.loc[df["city"] == "Lahore", ["name", "age"]]5) Missing Data Handling
Python
df.isna().sum()
df = df.dropna(subset=["customer_id"])
df["discount"] = df["discount"].fillna(0)Always inspect missing values before dropping rows blindly.
6) Feature Engineering
Python
df["profit"] = df["revenue"] - df["cost"]
df["profit_margin"] = (df["profit"] / df["revenue"]).round(3)Use assign for chained transformations:
Python
df = df.assign(net=lambda x: x["revenue"] - x["cost"])7) GroupBy and Aggregation
Python
summary = (
df.groupby("category")
.agg(
total_revenue=("revenue", "sum"),
avg_profit=("profit", "mean"),
orders=("order_id", "count")
)
.reset_index()
)This is one of the most important Pandas skills for analytics jobs.
8) Merge and Join
Python
orders = pd.read_csv("orders.csv")
customers = pd.read_csv("customers.csv")
merged = orders.merge(customers, on="customer_id", how="left")Join types:
left: keep all left rowsinner: matching rows onlyouter: union of both sides
9) Date and Time Analysis
Python
df["order_date"] = pd.to_datetime(df["order_date"])
df["month"] = df["order_date"].dt.to_period("M")
monthly = df.groupby("month")["revenue"].sum()Time-based grouping is essential for trend reports.
10) Performance Tips
- Use vectorized operations, avoid
forloops - Prefer
categorydtype for low-cardinality text fields - Read only needed columns (
usecols=...) - Use Parquet for large datasets when possible
11) End-to-End Mini Project
Build a sales analysis report:
- Load raw order data
- Clean nulls and invalid rows
- Add
profit,margin,month - Create KPI table by category and city
- Export summary to CSV
12) Common Mistakes
SettingWithCopyWarningfrom chained assignments- Incorrect join keys producing duplicates
- Ignoring dtypes before aggregations
- Converting datetime too late
Next Step
Visualize your analysis using: