Data Quality & Monitoring · Intermediate · NEW
Statistics for Data Engineers
Apply statistics to build smarter data pipelines — distributions, outlier detection, drift testing with KS and chi-squared tests, time series decomposition, CUSUM, and anomaly detection with Isolation Forest and Prophet.
4.8 rating · 0 students · 2h 30m total · 3 lessons
What you'll learn
Use mean, median, IQR, and percentiles to understand pipeline metric distributions
Detect outliers with z-scores and IQR-based Tukey fences
Test for data drift between pipeline runs with Kolmogorov-Smirnov and chi-squared
Build a DataQualityChecker class with statistical validation in Python
Decompose time series metrics into trend, seasonality, and residuals
Detect anomalies with CUSUM for gradual drift and Isolation Forest for multivariate data
Use Prophet to forecast pipeline data volumes and alert on SLA breaches
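As a taste of the outlier-detection objective above, here is a minimal sketch of Tukey fences built from the IQR. It is not taken from the course material: the function names `iqr_fences` and `find_outliers`, the sample `row_counts`, and the conventional multiplier `k=1.5` are all assumptions chosen for illustration.

```python
import statistics

def iqr_fences(values, k=1.5):
    """Return (lower, upper) Tukey fences: Q1 - k*IQR and Q3 + k*IQR."""
    # quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(values, k=1.5):
    """Return the values that fall outside the Tukey fences."""
    lower, upper = iqr_fences(values, k)
    return [v for v in values if v < lower or v > upper]

# Hypothetical daily row counts from a pipeline; the last run is suspicious.
row_counts = [1000, 1020, 980, 1010, 990, 1005, 995, 5000]
print(find_outliers(row_counts))
```

Because the fences rest on quartiles rather than the mean and standard deviation, a single extreme run does not widen them the way it would widen a z-score threshold.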
Final Project
Build a production pipeline monitoring system: statistical outlier detection, drift testing, time series anomaly alerts
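For the time series alerting part of such a project, a one-sided CUSUM can be sketched in a few lines. This is an illustrative sketch, not the course's implementation: the function name `cusum_alarm_index`, the parameters `target`, `slack`, and `threshold`, and the sample latencies are all hypothetical.

```python
def cusum_alarm_index(values, target, slack, threshold):
    """Return the index of the first upward CUSUM alarm, or None.

    The statistic accumulates deviations above target + slack and
    resets to zero whenever the running sum would go negative.
    """
    s = 0.0
    for i, x in enumerate(values):
        s = max(0.0, s + (x - target - slack))
        if s > threshold:
            return i
    return None

# Hypothetical job latencies (seconds) that creep upward gradually.
latencies = [100, 101, 99, 100, 103, 104, 105, 106, 107]
print(cusum_alarm_index(latencies, target=100, slack=1, threshold=10))
```

No single point here is far from the target, which is why a per-point threshold would likely stay silent; the accumulated sum is what surfaces the gradual drift.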
Curriculum
3 lessons · 2h 30m
Course Info
Lessons: 3 lessons
Total time: 2h 30m
Level: Intermediate
Students: 0
Rating: 4.8 / 5.0