Big Data Engineering · Intermediate → Senior
PySpark & Apache Spark
Master PySpark for large-scale data engineering — Spark architecture, DataFrames, SQL, window functions, Delta Lake MERGE, Structured Streaming from Kafka, and production performance optimization.
Rating 4.9 · 0 students · 3h 45m total · 4 lessons
What you'll learn
Understand Spark architecture: driver, executors, DAG scheduler, and lazy evaluation
Read and write CSV, Parquet, JSON, and Delta Lake with explicit schemas
Transform DataFrames with select, filter, window functions, and complex joins
Write Python UDFs and high-performance pandas UDFs
Stream events from Kafka into Delta Lake with watermarking and fault tolerance
Debug performance with the Spark UI and optimize with broadcast joins and Adaptive Query Execution (AQE)
Implement SCD Type 2 and MERGE upserts with Delta Lake
Final Project
Build a streaming Medallion Architecture pipeline: Kafka → PySpark Structured Streaming → Bronze/Silver/Gold Delta tables with data quality checks
Curriculum
4 lessons · 3h 45m
Course Info
Lessons: 4
Total time: 3h 45m
Level: Intermediate → Senior
Students: 0
Rating: 4.9 / 5.0