AI Systemsbeginner
Pandas Detailed Tutorial: Data Cleaning, Analysis, and Real Workflows
Learn Pandas from beginner to advanced with DataFrame fundamentals, cleaning, joins, groupby, time series, and practical end-to-end analysis workflows.
Asma HafeezMay 6, 20263 min read
PandasPythonData AnalysisData CleaningDataFrameGroupByTime Series
Pandas Detailed Tutorial (Beginner to Advanced)
Pandas helps you load, clean, transform, and analyze tabular data efficiently. Most real-world Python data tasks use Pandas daily.
1) Setup
Bash
pip install pandasPython
import pandas as pd2) Series and DataFrame Basics
Python
data = {
"name": ["Ali", "Sara", "John"],
"age": [22, 25, 28],
"city": ["Lahore", "Karachi", "Berlin"]
}
df = pd.DataFrame(data)
print(df.head())Key objects:
Series: single columnDataFrame: 2D table
3) Load and Save Data
Python
df = pd.read_csv("sales.csv")
df.to_csv("sales_clean.csv", index=False)Also common:
read_excel,read_json,read_parquet
4) Select, Filter, Sort
Python
df["revenue"] # single column
df[["name", "revenue"]] # multiple columns
df[df["revenue"] > 1000] # row filter
df.sort_values("revenue", ascending=False)Use .loc for label-based access:
Python
df.loc[df["city"] == "Lahore", ["name", "age"]]5) Missing Data Handling
Python
df.isna().sum()
df = df.dropna(subset=["customer_id"])
df["discount"] = df["discount"].fillna(0)Always inspect missing values before dropping rows blindly.
6) Feature Engineering
Python
df["profit"] = df["revenue"] - df["cost"]
df["profit_margin"] = (df["profit"] / df["revenue"]).round(3)Use assign for chained transformations:
Python
df = df.assign(net=lambda x: x["revenue"] - x["cost"])7) GroupBy and Aggregation
Python
summary = (
df.groupby("category")
.agg(
total_revenue=("revenue", "sum"),
avg_profit=("profit", "mean"),
orders=("order_id", "count")
)
.reset_index()
)This is one of the most important Pandas skills for analytics jobs.
8) Merge and Join
Python
orders = pd.read_csv("orders.csv")
customers = pd.read_csv("customers.csv")
merged = orders.merge(customers, on="customer_id", how="left")Join types:
left: keep all left rowsinner: matching rows onlyouter: union of both sides
9) Date and Time Analysis
Python
df["order_date"] = pd.to_datetime(df["order_date"])
df["month"] = df["order_date"].dt.to_period("M")
monthly = df.groupby("month")["revenue"].sum()Time-based grouping is essential for trend reports.
10) Performance Tips
- Use vectorized operations, avoid
forloops - Prefer
categorydtype for low-cardinality text fields - Read only needed columns (
usecols=...) - Use Parquet for large datasets when possible
11) End-to-End Mini Project
Build a sales analysis report:
- Load raw order data
- Clean nulls and invalid rows
- Add
profit,margin,month - Create KPI table by category and city
- Export summary to CSV
12) Common Mistakes
SettingWithCopyWarningfrom chained assignments- Incorrect join keys producing duplicates
- Ignoring dtypes before aggregations
- Converting datetime too late
Next Step
Visualize your analysis using:
Found this helpful?
Leave a comment
Have a question, correction, or just found this helpful? Leave a note below.