Learnixo

AI/ML/NLP Research Track · Lesson 5 of 16

Pandas Detailed Tutorial

Pandas Detailed Tutorial (Beginner to Advanced)

Pandas helps you load, clean, transform, and analyze tabular data efficiently. Most real-world Python data tasks use Pandas daily.


1) Setup

Bash
pip install pandas
Python
import pandas as pd

2) Series and DataFrame Basics

Python
data = {
    "name": ["Ali", "Sara", "John"],
    "age": [22, 25, 28],
    "city": ["Lahore", "Karachi", "Berlin"]
}
df = pd.DataFrame(data)
print(df.head())

Key objects:

  • Series: single column
  • DataFrame: 2D table

3) Load and Save Data

Python
df = pd.read_csv("sales.csv")
df.to_csv("sales_clean.csv", index=False)

Also common:

  • read_excel, read_json, read_parquet

4) Select, Filter, Sort

Python
df["revenue"]                  # single column
df[["name", "revenue"]]        # multiple columns
df[df["revenue"] > 1000]       # row filter
df.sort_values("revenue", ascending=False)

Use .loc for label-based access:

Python
df.loc[df["city"] == "Lahore", ["name", "age"]]

5) Missing Data Handling

Python
df.isna().sum()
df = df.dropna(subset=["customer_id"])
df["discount"] = df["discount"].fillna(0)

Always inspect missing values before dropping rows blindly.


6) Feature Engineering

Python
df["profit"] = df["revenue"] - df["cost"]
df["profit_margin"] = (df["profit"] / df["revenue"]).round(3)

Use assign for chained transformations:

Python
df = df.assign(net=lambda x: x["revenue"] - x["cost"])

7) GroupBy and Aggregation

Python
summary = (
    df.groupby("category")
      .agg(
          total_revenue=("revenue", "sum"),
          avg_profit=("profit", "mean"),
          orders=("order_id", "count")
      )
      .reset_index()
)

This is one of the most important Pandas skills for analytics jobs.


8) Merge and Join

Python
orders = pd.read_csv("orders.csv")
customers = pd.read_csv("customers.csv")

merged = orders.merge(customers, on="customer_id", how="left")

Join types:

  • left: keep all left rows
  • inner: matching rows only
  • outer: union of both sides

9) Date and Time Analysis

Python
df["order_date"] = pd.to_datetime(df["order_date"])
df["month"] = df["order_date"].dt.to_period("M")

monthly = df.groupby("month")["revenue"].sum()

Time-based grouping is essential for trend reports.


10) Performance Tips

  • Use vectorized operations, avoid for loops
  • Prefer category dtype for low-cardinality text fields
  • Read only needed columns (usecols=...)
  • Use Parquet for large datasets when possible

11) End-to-End Mini Project

Build a sales analysis report:

  1. Load raw order data
  2. Clean nulls and invalid rows
  3. Add profit, margin, month
  4. Create KPI table by category and city
  5. Export summary to CSV

12) Common Mistakes

  • SettingWithCopyWarning from chained assignments
  • Incorrect join keys producing duplicates
  • Ignoring dtypes before aggregations
  • Converting datetime too late

Next Step

Visualize your analysis using: