Learnixo
Data Quality & Monitoring · Intermediate · New

Statistics for Data Engineers

Apply statistics to build smarter data pipelines: distributions, outlier detection, drift testing with Kolmogorov-Smirnov (KS) and chi-squared tests, time series decomposition, CUSUM, and anomaly detection with Isolation Forest and Prophet.

4.8 rating · 0 students · 2h 30m total · 3 lessons

What you'll learn

Use mean, median, IQR, and percentiles to understand pipeline metric distributions
Detect outliers with z-scores and IQR-based Tukey fences
Test for data drift between pipeline runs with Kolmogorov-Smirnov and chi-squared
Build a DataQualityChecker class with statistical validation in Python
Decompose time series metrics into trend, seasonality, and residuals
Detect anomalies with CUSUM for gradual drift and Isolation Forest for multivariate data
Use Prophet to forecast pipeline data volumes and alert on SLA breaches
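
As a taste of the outlier detection topic listed above, here is a minimal sketch using only the Python standard library. The function names, the sample `row_counts` data, and the default thresholds (k = 1.5 for the fences, 3.0 for z-scores) are illustrative assumptions, not material taken from the course.

```python
import statistics

def iqr_fences(values, k=1.5):
    """Return (lower, upper) Tukey fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def iqr_outliers(values, k=1.5):
    """Return the values that fall outside the Tukey fences."""
    lower, upper = iqr_fences(values, k)
    return [v for v in values if v < lower or v > upper]

def zscore_outliers(values, threshold=3.0):
    """Return the values whose absolute z-score exceeds the threshold."""
    mean = statistics.fmean(values)
    stdev = statistics.stdev(values)  # sample standard deviation
    if stdev == 0:
        return []  # all values identical, nothing can be an outlier
    return [v for v in values if abs(v - mean) / stdev > threshold]

# Hypothetical daily row counts for one pipeline, with one obvious spike.
row_counts = [1020, 998, 1005, 1011, 987, 1003, 995, 1008, 4200, 1001]

print(iqr_outliers(row_counts))     # the fences flag the spike
print(zscore_outliers(row_counts))  # empty at the default threshold of 3.0
```

On this small sample the IQR fences flag 4200, while the z-score check does not: the spike inflates the standard deviation, so its own z-score stays below 3. This is one reason robust statistics such as the median and IQR are often preferred for short metric histories.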

Final Project

Build a production pipeline monitoring system: statistical outlier detection, drift testing, time series anomaly alerts
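
One building block such a monitoring system might use is a one-sided CUSUM, which accumulates small deviations above a target until they cross an alert threshold. The sketch below is an assumption about how that could look; the function name, parameters (`target`, `slack`, `threshold`), and the `latency` data are hypothetical and not taken from the course.

```python
def cusum_upper(values, target, slack, threshold):
    """Return the index of the first upper-CUSUM alarm, or None if none fires.

    The running sum grows by (x - target - slack) at each step and is
    clipped at zero, so only sustained upward shifts accumulate.
    """
    s = 0.0
    for i, x in enumerate(values):
        s = max(0.0, s + (x - target - slack))
        if s > threshold:
            return i
    return None

# Hypothetical latency metric that starts drifting upward halfway through.
latency = [100, 101, 99, 100, 102, 103, 104, 105, 106, 107]

alarm_index = cusum_upper(latency, target=100, slack=1, threshold=10)
print(alarm_index)
```

No single point in `latency` would stand out as an outlier on its own, but the accumulated drift eventually crosses the threshold. Smaller `slack` and `threshold` values make the detector more sensitive at the cost of more false alarms.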

Curriculum

3 lessons · 2h 30m

Course Info

Lessons: 3
Total time: 2h 30m
Level: Intermediate
Students: 0
Rating: 4.8 / 5.0