Data Engineering · Intermediate → Senior · NEW

Databricks — Delta Lake & PySpark

Build production data pipelines on Databricks — Delta Lake ACID transactions and MERGE upserts, Medallion Architecture, Delta Live Tables, Auto Loader, Structured Streaming, advanced PySpark optimizations, MLflow model lifecycle, Unity Catalog governance, and Feature Store.

4.9 rating · 1,020 students · 6h 30m total · 4 lessons

What you'll learn

Create Delta tables with ACID transactions, time travel, schema enforcement, and MERGE upserts

Build Medallion Architecture pipelines: Bronze append → Silver MERGE → Gold aggregation

Use Delta Live Tables with @dlt.table and @dlt.expect for declarative quality pipelines

Ingest files incrementally with Auto Loader and stream Kafka into Delta tables

Use dbutils for secrets, file operations, and parameterized notebook workflows

Optimize PySpark with Photon, Liquid Clustering, AQE, broadcast joins, and salting

Track ML experiments with MLflow and register models with Unity Catalog

Govern data with Unity Catalog: column masks, row filters, Delta Sharing, and lineage

Final Project

Build a full Databricks data platform: Auto Loader → DLT Bronze/Silver/Gold pipeline → MLflow model tracking → Unity Catalog governance

Curriculum

4 lessons · 6h 30m

Databricks Platform: Architecture & First Pipeline

60 min

Delta Lake: MERGE, Time Travel & Medallion Pipeline

55 min

Advanced PySpark: DLT, Auto Loader & Optimization

55 min

MLflow & Unity Catalog: Models, Governance & Sharing

60 min

Course Info

Lessons: 4

Total time: 6h 30m

Level: Intermediate → Senior

Students: 1,020

Rating: 4.9 / 5.0
