Machine Learning Foundations · Lesson 5 of 70

ML Terminology Quick-Reference for Interviews

Core Learning Concepts

| Term | Definition |
|---|---|
| Training | Process of adjusting model weights to minimize loss on the training data |
| Inference | Using a trained model to make predictions on new, unseen data |
| Epoch | One complete pass through the entire training dataset |
| Batch | A subset of training data processed in one forward/backward pass |
| Mini-batch gradient descent | Update weights after each small batch (default in practice) |
| Convergence | State where the loss plateaus and further updates barely change it; this does not by itself mean the model is good |
| Loss function | Measures how wrong predictions are; the quantity being minimized |
| Gradient descent | Algorithm for minimizing loss by moving weights opposite to the gradient |
| Learning rate | Step size for weight updates; too high = unstable, too low = slow |
| Generalization | Ability to perform well on unseen data, not just training data |
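
To make epoch, batch, learning rate, and gradient descent concrete, here is a minimal pure-Python sketch of mini-batch gradient descent fitting a line y ≈ w·x + b with MSE loss. The toy data, the function names, and the defaults (learning rate 0.05, batch size 4, 200 epochs) are illustrative assumptions, not recommended settings.

```python
import random

def minibatch_gd(xs, ys, lr=0.05, epochs=200, batch_size=4, seed=0):
    """Fit y ≈ w*x + b by minimizing mean squared error with mini-batch gradient descent."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0                      # parameters (weights) start at zero
    order = list(range(len(xs)))
    for _ in range(epochs):              # one epoch = one full pass over the training data
        rng.shuffle(order)               # new batch composition every epoch
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            # Gradient of the batch MSE, mean((w*x + b - y)^2), with respect to w and b
            errors = [(w * xs[i] + b - ys[i], xs[i]) for i in batch]
            grad_w = sum(2 * e * x for e, x in errors) / len(batch)
            grad_b = sum(2 * e for e, _ in errors) / len(batch)
            # Move opposite the gradient, scaled by the learning rate
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

def mse(xs, ys, w, b):
    """Mean squared error of the line w*x + b on the given data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [i / 10 for i in range(20)]   # toy inputs 0.0 .. 1.9
ys = [3 * x + 1 for x in xs]       # noiseless targets from y = 3x + 1
w, b = minibatch_gd(xs, ys)
print(f"w={w:.3f}  b={b:.3f}  training MSE={mse(xs, ys, w, b):.2e}")
```

Because the toy targets are noiseless, w and b end up very close to 3 and 1; on real, noisy data the loss would plateau above zero at convergence.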

Data and Splits

| Term | Definition |
|---|---|
| Feature (X) | Input variable used by the model |
| Label (y) | The target output the model learns to predict |
| Training set | Data the model learns from (weights are updated on this) |
| Validation set | Held-out data used during development to tune hyperparameters and compare models |
| Test set | Held-out data used once at the end to report final, unbiased performance |
| Data leakage | When information from the future or from the test set contaminates training |
| Stratified split | Split that preserves class distribution across all subsets |
| Cross-validation | Evaluation that uses multiple train/validation splits for a more reliable estimate |
| k-fold CV | Split the data into k folds; train on k-1 folds, validate on the remaining one, and repeat k times so each fold is validated once |
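
Stratified splitting and k-fold CV combine naturally. This minimal sketch deals each class round-robin across k folds so every fold keeps the overall class ratio, then runs the k-fold loop. The function name `stratified_k_fold` and the toy labels are hypothetical; libraries provide tested versions of this.

```python
import random
from collections import defaultdict

def stratified_k_fold(labels, k=5, seed=0):
    """Split sample indices into k folds whose class proportions mirror the full dataset."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    position = 0
    for y in sorted(by_class):
        members = by_class[y]
        rng.shuffle(members)
        for i in members:
            folds[position % k].append(i)   # deal each class round-robin across the folds
            position += 1
    return folds

labels = [0] * 8 + [1] * 4   # imbalanced toy labels: two-thirds class 0, one-third class 1
folds = stratified_k_fold(labels, k=4)
for f, val_idx in enumerate(folds):
    # k-fold CV: validate on fold f, train on the union of the other k-1 folds
    train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
    print(f"fold {f}: train={len(train_idx)} val={len(val_idx)} "
          f"val labels={sorted(labels[i] for i in val_idx)}")
```

Every validation fold here gets two class-0 examples and one class-1 example, matching the 2:1 ratio of the full dataset.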

Model Behavior

| Term | Definition |
|---|---|
| Overfitting | Model performs well on training data but poorly on unseen data; it memorized noise instead of learning general patterns |
| Underfitting | Model performs poorly on both training and validation data; usually too simple |
| Bias | Error from wrong assumptions; an underfitting model has high bias |
| Variance | Sensitivity to noise in the training data; an overfitting model has high variance |
| Bias-variance tradeoff | Reducing one typically increases the other; the goal is finding the sweet spot |
| Regularization | Technique to reduce overfitting by penalizing model complexity |
| L1 (Lasso) | Regularization that drives some weights to exactly zero (feature selection) |
| L2 (Ridge) | Regularization that shrinks all weights toward zero but rarely to exactly zero |
| Dropout | Neural network regularization: randomly zero out neuron activations during training |
| Early stopping | Stop training when validation loss stops improving, typically after a set number of epochs without improvement |
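
The L1 and L2 rows differ in how each penalty updates a weight. This sketch applies only the penalty part of the update to a hypothetical weight vector. It ignores the data loss, which a real training step would also include. L2 weight decay multiplies every weight by the same factor, while the L1 proximal (soft-thresholding) step subtracts a fixed amount and clips at zero. The values of `lam` and `lr` are arbitrary.

```python
import math

def l2_decay_step(weights, lam, lr):
    """Gradient step on the L2 penalty (lam/2) * sum(w^2): every weight shrinks by the same factor."""
    return [w * (1 - lr * lam) for w in weights]

def l1_prox_step(weights, lam, lr):
    """Proximal (soft-thresholding) step on the L1 penalty lam * sum(|w|):
    each magnitude shrinks by lr*lam and is clipped at exactly zero."""
    out = []
    for w in weights:
        shrunk = max(abs(w) - lr * lam, 0.0)
        out.append(math.copysign(shrunk, w) if shrunk > 0 else 0.0)
    return out

start = [2.0, 0.3, -0.05, 0.01]   # hypothetical weights: one large, three small
w_l1, w_l2 = start, start
for _ in range(50):
    w_l1 = l1_prox_step(w_l1, lam=0.1, lr=0.1)
    w_l2 = l2_decay_step(w_l2, lam=0.1, lr=0.1)
print("L1:", [round(w, 4) for w in w_l1])   # small weights end at exactly 0.0
print("L2:", [round(w, 4) for w in w_l2])   # all weights smaller, none exactly 0.0
```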

Model Types

| Term | Definition |
|---|---|
| Supervised learning | Training with labeled data (X, y pairs) |
| Unsupervised learning | Finding structure in unlabeled data |
| Reinforcement learning | Learning which actions to take by trial and error, guided by rewards from an environment |
| Classification | Predicting a discrete category label |
| Regression | Predicting a continuous numeric value |
| Binary classification | Two classes: yes/no, spam/not spam |
| Multi-class classification | More than two mutually exclusive classes |
| Multi-label classification | Multiple labels can be true simultaneously |
| Ensemble | Combining multiple models for better performance |
| Bagging | Train models on bootstrap samples (random subsets drawn with replacement), then average or vote on their predictions (e.g., Random Forest) |
| Boosting | Train models sequentially, each correcting the errors of the previous ones (e.g., XGBoost, LightGBM) |
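
Bagging fits in a few lines: draw bootstrap samples, fit one weak model per sample, and take a majority vote. The decision-stump learner `fit_stump`, the toy data, and the ensemble size of 25 are hypothetical choices for illustration. Random Forest additionally randomizes the features considered at each split, which this sketch omits.

```python
import random
from collections import Counter

def fit_stump(xs, ys):
    """Hypothetical weak learner: choose threshold t minimizing errors of 'predict 1 if x > t'."""
    best_t, best_err = None, float("inf")
    for t in sorted(set(xs)):
        err = sum(int(x > t) != y for x, y in zip(xs, ys))
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def bagging_fit(xs, ys, n_models=25, seed=0):
    """Fit one stump per bootstrap sample (n draws with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        sample = rng.choices(range(n), k=n)
        models.append(fit_stump([xs[i] for i in sample], [ys[i] for i in sample]))
    return models

def bagging_predict(models, x):
    """Majority vote over the ensemble's 0/1 predictions."""
    votes = Counter(int(x > t) for t in models)
    return votes.most_common(1)[0][0]

xs = list(range(1, 11))
ys = [0] * 5 + [1] * 5       # toy labels: class 1 when x > 5
models = bagging_fit(xs, ys)
print([bagging_predict(models, x) for x in (2, 9)])
```

Each stump sees a slightly different resample, so individual thresholds vary; the vote smooths out that variance, which is the point of bagging.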

Evaluation Metrics

| Term | Definition |
|---|---|
| Accuracy | Fraction of correct predictions; misleading on imbalanced datasets |
| Precision | Of all positive predictions, what fraction are correct? TP / (TP + FP) |
| Recall (Sensitivity) | Of all actual positives, what fraction were caught? TP / (TP + FN) |
| F1 score | Harmonic mean of precision and recall, 2PR / (P + R); useful when both matter |
| AUC-ROC | Area under the ROC curve; measures discrimination ability across all thresholds |
| Confusion matrix | Table showing TP, TN, FP, FN counts for classification |
| MSE | Mean Squared Error: average squared prediction error (regression) |
| MAE | Mean Absolute Error: average absolute prediction error (regression) |
| R² | Fraction of variance in y explained by the model |
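
The formulas above can be checked by hand with a small sketch that builds the confusion-matrix counts and derives each metric from them. The toy labels are made up, and returning 0.0 when a denominator is zero is an assumed convention; libraries let you choose this behavior.

```python
def classification_metrics(y_true, y_pred):
    """Confusion-matrix counts plus accuracy, precision, recall, and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    # Convention assumed here: a metric with a zero denominator is reported as 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn,
            "accuracy": (tp + tn) / len(pairs),
            "precision": precision, "recall": recall, "f1": f1}

def regression_metrics(y_true, y_pred):
    """MSE, MAE, and R² (assumes y_true is not constant, so the variance term is nonzero)."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_y = sum(y_true) / n
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return {"mse": ss_res / n,
            "mae": sum(abs(r) for r in residuals) / n,
            "r2": 1 - ss_res / ss_tot}

y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]                       # imbalanced: 2 positives out of 10
model_scores = classification_metrics(y_true, [1, 0, 0, 0, 0, 0, 0, 0, 0, 1])
baseline_scores = classification_metrics(y_true, [0] * 10)   # always predicts "negative"
reg_scores = regression_metrics([3, 5, 7], [2, 5, 9])
print(model_scores)
print(baseline_scores)
print(reg_scores)
```

The always-negative baseline matches the model's 0.8 accuracy while catching none of the positives (recall and F1 of 0.0), which is exactly why accuracy is misleading on imbalanced data.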

Hyperparameters vs Parameters

| Term | Definition | Set By |
|---|---|---|
| Parameters / Weights | Values the model learns during training | Training algorithm |
| Hyperparameters | Settings that control the training process (e.g., learning rate, batch size) | You, before training |
| Grid search | Try all hyperparameter combinations exhaustively | |
| Random search | Sample hyperparameter combinations randomly; often more efficient than grid search when only a few hyperparameters matter | |
| Bayesian optimization | Use past results to guide where to search next | |
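
A minimal sketch of grid search versus random search with the same budget of nine trials. `validation_score` is a hypothetical stand-in for "train a model, score it on the validation set". Here it is a made-up function that depends strongly on the learning rate and only slightly on regularization strength. The search ranges are also assumptions.

```python
import itertools
import math
import random

def validation_score(lr, reg):
    """Hypothetical stand-in for 'train with (lr, reg), return validation score' (higher is better).
    Peaks at lr = 0.01 and barely depends on reg."""
    return -(math.log10(lr) + 2) ** 2 - 0.01 * reg

def grid_search(lrs, regs):
    trials = list(itertools.product(lrs, regs))            # every combination
    best = max(trials, key=lambda hp: validation_score(*hp))
    return best, trials

def random_search(n_trials, seed=0):
    rng = random.Random(seed)
    # Learning rate sampled log-uniformly in [1e-4, 1]; reg sampled uniformly in [0, 1]
    trials = [(10 ** rng.uniform(-4, 0), rng.uniform(0, 1)) for _ in range(n_trials)]
    best = max(trials, key=lambda hp: validation_score(*hp))
    return best, trials

best_grid, grid_trials = grid_search([0.001, 0.01, 0.1], [0.0, 0.5, 1.0])
best_rand, rand_trials = random_search(n_trials=9)
print("grid best:", best_grid, "distinct lrs tried:", len({lr for lr, _ in grid_trials}))
print("random best:", best_rand, "distinct lrs tried:", len({lr for lr, _ in rand_trials}))
```

With the same nine trials, the grid tests only three distinct learning rates while random search tests nine. That extra coverage of the dimension that matters is the usual argument for random search. This toy grid happens to contain the optimum, so it is not a benchmark.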

Production and MLOps

| Term | Definition |
|---|---|
| Data drift | Input feature distribution shifts away from the training distribution |
| Concept drift | The relationship between X and y changes over time |
| Model monitoring | Tracking model performance metrics in production |
| Feature store | Centralized repository for features used across ML models |
| Pipeline | End-to-end automated sequence: data → features → training → serving |
| A/B testing | Splitting traffic between two model versions to compare performance |
| Inference latency | Time to generate a prediction at serving time |
| Throughput | Number of predictions the model can handle per second |
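
One way to put a number on data drift is to compare binned feature distributions from training and production. This sketch uses the Population Stability Index (PSI), a common drift statistic that the table does not name. The choice of PSI, the bin cut points, and the simulated data are all assumptions. A commonly cited rule of thumb treats PSI above roughly 0.25 as a significant shift, but that threshold is a convention, not a law.

```python
import math
import random
from bisect import bisect_right

def bin_proportions(values, cut_points, eps=1e-6):
    """Fraction of values in each bin defined by sorted interior cut points (len(cut_points) + 1 bins)."""
    counts = [0] * (len(cut_points) + 1)
    for v in values:
        counts[bisect_right(cut_points, v)] += 1   # bin index = number of cut points <= v
    return [max(c / len(values), eps) for c in counts]  # eps avoids log(0) for empty bins

def psi(reference, live, cut_points):
    """Population Stability Index: sum over bins of (live - ref) * ln(live / ref)."""
    ref = bin_proportions(reference, cut_points)
    cur = bin_proportions(live, cut_points)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))

rng = random.Random(0)
train_feature = [rng.gauss(0, 1) for _ in range(5000)]   # distribution seen at training time
live_same = [rng.gauss(0, 1) for _ in range(5000)]       # production data, no drift
live_shifted = [rng.gauss(1, 1) for _ in range(5000)]    # production data, mean shifted by 1
cut_points = [-1.5, -0.5, 0.5, 1.5]                       # hypothetical bin boundaries
print(f"no drift: {psi(train_feature, live_same, cut_points):.3f}")
print(f"shifted:  {psi(train_feature, live_shifted, cut_points):.3f}")
```

PSI only detects data drift in the inputs. Concept drift can occur with unchanged inputs, so it still needs monitoring of prediction quality against fresh labels.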

Embeddings and Representations

| Term | Definition |
|---|---|
| Embedding | Dense vector representation of a data point (text, image, etc.) |
| Dimensionality reduction | Compressing high-dimensional features into fewer dimensions |
| PCA | Principal Component Analysis: linear dimensionality reduction |
| t-SNE | Non-linear dimensionality reduction used for visualization |
| Cosine similarity | Angle-based similarity between two vectors; common for embeddings |
| Euclidean distance | Straight-line distance between two points in feature space |
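
Cosine similarity and Euclidean distance can disagree about which vector is "closest". In this sketch, `b` points in the same direction as `a` but is twice as long, while `c` points elsewhere yet sits nearer to `a` in space. The vectors are made up for illustration (`math.dist` and multi-argument `math.hypot` need Python 3.8+).

```python
import math

def cosine_similarity(u, v):
    """cos(angle) = (u · v) / (|u| |v|); ignores vector length."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def euclidean_distance(u, v):
    """Straight-line distance; sensitive to vector length."""
    return math.dist(u, v)

a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]   # same direction as a, twice as long
c = [3.0, 2.0, 1.0]   # different direction, but closer to a in space
for name, other in (("b", b), ("c", c)):
    print(name, round(cosine_similarity(a, other), 3), round(euclidean_distance(a, other), 3))
```

Here `b` has a cosine similarity of 1.0 with `a` but is farther away (about 3.74 versus 2.83 for `c`). This length-invariance is why cosine similarity is popular for comparing embeddings.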

Interview One-Liners

Overfitting: "The model memorized training noise — low training loss, high validation loss."

Bias-variance tradeoff: "Simple models underfit (high bias), complex models overfit (high variance). Regularization, more data, or model selection balances them."

Precision vs recall: "Precision: when you say positive, how often are you right? Recall: of all actual positives, how many did you find?"

Data leakage: "Using information during training that wouldn't be available at prediction time — makes training metrics misleadingly optimistic."
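
Preprocessing is one common way leakage sneaks in. Computing normalization statistics on the full dataset before splitting lets the test rows influence the training features. The following minimal sketch uses a hypothetical single-feature dataset to contrast the leaky and the correct order of operations.

```python
import math

def fit_standardizer(values):
    """Learn mean and standard deviation (population std; falls back to 1.0 if zero)."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values)) or 1.0
    return mean, std

def standardize(values, mean, std):
    return [(v - mean) / std for v in values]

data = [1, 2, 3, 4, 5, 6, 7, 8, 20, 22]   # hypothetical single feature, already in time order
train, test = data[:8], data[8:]

# Leaky: statistics include the test rows, so test information shapes the training features
leaky_mean, leaky_std = fit_standardizer(data)

# Correct: fit on the training rows only, then apply the same statistics to the test rows
mean, std = fit_standardizer(train)
train_scaled = standardize(train, mean, std)
test_scaled = standardize(test, mean, std)
print(f"leaky mean={leaky_mean:.2f}  train-only mean={mean:.2f}")
```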

F1 score: "Use it when both false positives and false negatives are costly and the dataset is imbalanced."
