Learnixo
Big Data Engineering · Intermediate → Senior · NEW

PySpark & Apache Spark

Master PySpark for large-scale data engineering: Spark architecture, DataFrames, SQL, window functions, Delta Lake MERGE, Structured Streaming from Kafka, and production performance optimization.

4.9 rating · 0 students · 3h 45m total · 4 lessons

What you'll learn

Understand Spark architecture: driver, executors, DAG scheduler, and lazy evaluation
Read and write CSV, Parquet, JSON, and Delta Lake with explicit schemas
Transform DataFrames with select, filter, window functions, and complex joins
Write Python UDFs and high-performance pandas UDFs
Stream events from Kafka into Delta Lake with watermarking and fault tolerance
Debug performance with the Spark UI and optimize with broadcast joins and AQE
Implement SCD Type 2 and MERGE upserts with Delta Lake

Final Project

Build a streaming Medallion Architecture pipeline: Kafka → PySpark Structured Streaming → Bronze/Silver/Gold Delta tables with data quality checks

Curriculum

4 lessons · 3h 45m

Course Info

Lessons: 4
Total time: 3h 45m
Level: Intermediate → Senior
Students: 0
Rating: 4.9 / 5.0