Back to blog
AI Systemsbeginner

Pandas Detailed Tutorial: Data Cleaning, Analysis, and Real Workflows

Learn Pandas from beginner to advanced with DataFrame fundamentals, cleaning, joins, groupby, time series, and practical end-to-end analysis workflows.

Asma HafeezMay 6, 20263 min read
PandasPythonData AnalysisData CleaningDataFrameGroupByTime Series
Share:𝕏

Pandas Detailed Tutorial (Beginner to Advanced)

Pandas helps you load, clean, transform, and analyze tabular data efficiently. Most real-world Python data tasks use Pandas daily.


1) Setup

Bash
pip install pandas
Python
import pandas as pd

2) Series and DataFrame Basics

Python
data = {
    "name": ["Ali", "Sara", "John"],
    "age": [22, 25, 28],
    "city": ["Lahore", "Karachi", "Berlin"]
}
df = pd.DataFrame(data)
print(df.head())

Key objects:

  • Series: single column
  • DataFrame: 2D table

3) Load and Save Data

Python
df = pd.read_csv("sales.csv")
df.to_csv("sales_clean.csv", index=False)

Also common:

  • read_excel, read_json, read_parquet

4) Select, Filter, Sort

Python
df["revenue"]                  # single column
df[["name", "revenue"]]        # multiple columns
df[df["revenue"] > 1000]       # row filter
df.sort_values("revenue", ascending=False)

Use .loc for label-based access:

Python
df.loc[df["city"] == "Lahore", ["name", "age"]]

5) Missing Data Handling

Python
df.isna().sum()
df = df.dropna(subset=["customer_id"])
df["discount"] = df["discount"].fillna(0)

Always inspect missing values before dropping rows blindly.


6) Feature Engineering

Python
df["profit"] = df["revenue"] - df["cost"]
df["profit_margin"] = (df["profit"] / df["revenue"]).round(3)

Use assign for chained transformations:

Python
df = df.assign(net=lambda x: x["revenue"] - x["cost"])

7) GroupBy and Aggregation

Python
summary = (
    df.groupby("category")
      .agg(
          total_revenue=("revenue", "sum"),
          avg_profit=("profit", "mean"),
          orders=("order_id", "count")
      )
      .reset_index()
)

This is one of the most important Pandas skills for analytics jobs.


8) Merge and Join

Python
orders = pd.read_csv("orders.csv")
customers = pd.read_csv("customers.csv")

merged = orders.merge(customers, on="customer_id", how="left")

Join types:

  • left: keep all left rows
  • inner: matching rows only
  • outer: union of both sides

9) Date and Time Analysis

Python
df["order_date"] = pd.to_datetime(df["order_date"])
df["month"] = df["order_date"].dt.to_period("M")

monthly = df.groupby("month")["revenue"].sum()

Time-based grouping is essential for trend reports.


10) Performance Tips

  • Use vectorized operations, avoid for loops
  • Prefer category dtype for low-cardinality text fields
  • Read only needed columns (usecols=...)
  • Use Parquet for large datasets when possible

11) End-to-End Mini Project

Build a sales analysis report:

  1. Load raw order data
  2. Clean nulls and invalid rows
  3. Add profit, margin, month
  4. Create KPI table by category and city
  5. Export summary to CSV

12) Common Mistakes

  • SettingWithCopyWarning from chained assignments
  • Incorrect join keys producing duplicates
  • Ignoring dtypes before aggregations
  • Converting datetime too late

Next Step

Visualize your analysis using:

Enjoyed this article?

Explore the AI Systems learning path for more.

Found this helpful?

Share:𝕏

Leave a comment

Have a question, correction, or just found this helpful? Leave a note below.