ML Terminology Quick-Reference for Interviews

A comprehensive ML vocabulary reference for AI engineering interviews: every term from features and loss to regularization, ensembles, and production concepts — with concise, interview-ready definitions.

Asma Hafeez Khan · May 16, 2026 · 6 min read
Tags: Machine Learning, Terminology, Interview, Definitions, Reference

Core Learning Concepts

| Term | Definition |
|---|---|
| Training | Process of adjusting model weights to minimize loss on the training data |
| Inference | Using a trained model to make predictions on new, unseen data |
| Epoch | One complete pass through the entire training dataset |
| Batch | A subset of training data processed in one forward/backward pass |
| Mini-batch gradient descent | Update weights after each small batch (default in practice) |
| Convergence | State where loss stops decreasing meaningfully; further training yields little improvement |
| Loss function | Measures how wrong predictions are; the quantity being minimized |
| Gradient descent | Algorithm for minimizing loss by moving weights in the direction opposite to the gradient |
| Learning rate | Step size for weight updates; too high = unstable, too low = slow |
| Generalization | Ability to perform well on unseen data, not just training data |
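
To make epoch, batch, learning rate, and gradient descent concrete, here is a minimal sketch of mini-batch gradient descent on a one-feature linear model. The function name `train_linear`, the toy data, and the default hyperparameter values are illustrative assumptions, not a recommended setup.

```python
import random

def train_linear(xs, ys, lr=0.05, epochs=500, batch_size=4, seed=0):
    """Fit y ~ w*x + b with mini-batch gradient descent on MSE loss."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    data = list(zip(xs, ys))
    for _ in range(epochs):                      # one epoch = one full pass over the data
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Gradients of the batch's mean squared error with respect to w and b
            grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
            grad_b = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
            # Step opposite to the gradient, scaled by the learning rate
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
ys = [2 * x + 1 for x in xs]   # noiseless toy data generated with w=2, b=1
w, b = train_linear(xs, ys)
print(round(w, 3), round(b, 3))
```

On this noiseless data the learned `w` and `b` should approach the generating values 2 and 1, which is what convergence looks like in the simplest case.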


Data and Splits

| Term | Definition |
|---|---|
| Feature (X) | Input variable used by the model |
| Label (y) | The correct output value the model is trained to predict |
| Training set | Data the model learns from (weights are updated on this) |
| Validation set | Used during development to tune hyperparameters |
| Test set | Used once at the end to report final, unbiased performance |
| Data leakage | When information from the future or the test set contaminates training |
| Stratified split | Split that preserves class distribution across all subsets |
| Cross-validation | Technique where multiple train/val splits are used for more reliable evaluation |
| k-fold CV | Split data into k folds; train on k-1, validate on the remaining one, repeat k times |
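
The k-fold procedure can be sketched as an index generator. The helper name `k_fold_indices` is hypothetical, and the sketch assumes the data has already been shuffled, which real implementations usually handle for you.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    # Spread the remainder so fold sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]                 # the held-out fold
        train_idx = indices[:start] + indices[start + size:]  # the other k-1 folds
        yield train_idx, val_idx
        start += size

folds = list(k_fold_indices(10, 3))
for train_idx, val_idx in folds:
    print("train:", train_idx, "val:", val_idx)
```

Every sample lands in exactly one validation fold, so each data point is used for validation once and for training k-1 times.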


Model Behavior

| Term | Definition |
|---|---|
| Overfitting | Model performs well on training data but poorly on unseen data; it memorized instead of learned |
| Underfitting | Model performs poorly on both training and validation data; it is too simple |
| Bias | Error from wrong assumptions; an underfitting model has high bias |
| Variance | Sensitivity to noise in training data; an overfitting model has high variance |
| Bias-variance tradeoff | Reducing one typically increases the other; the goal is finding the sweet spot |
| Regularization | Technique to reduce overfitting by penalizing model complexity |
| L1 (Lasso) | Regularization that drives some weights to exactly zero (feature selection) |
| L2 (Ridge) | Regularization that shrinks all weights toward zero but rarely to exactly zero |
| Dropout | Neural network regularization: randomly zero out neurons during training |
| Early stopping | Stop training when validation loss starts increasing |
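
The difference between L1 and L2 is easiest to see by applying only the penalty step to a weight vector. This is a simplified sketch that ignores the data loss entirely; the function names, `lam`, and `lr` are illustrative assumptions. The L1 update shown is the soft-thresholding (proximal) step, one common way to optimize an L1 penalty.

```python
def l2_shrink(weights, lam, lr):
    """One gradient step on the penalty lam * sum(w**2): scales every weight down."""
    return [w * (1 - 2 * lr * lam) for w in weights]

def l1_soft_threshold(weights, lam, lr):
    """One proximal step on the penalty lam * sum(|w|): small weights become exactly 0."""
    t = lr * lam
    return [max(abs(w) - t, 0.0) * (1 if w > 0 else -1) for w in weights]

weights = [0.8, -0.03, 0.02, -1.2]
after_l2 = l2_shrink(weights, lam=0.5, lr=0.1)
after_l1 = l1_soft_threshold(weights, lam=0.5, lr=0.1)

print("L2:", after_l2)   # every weight is smaller, none is zero
print("L1:", after_l1)   # the two small weights are zeroed out
```

L2 multiplies each weight by a constant factor, so nonzero weights stay nonzero; L1 subtracts a fixed amount, which is why weights below the threshold vanish.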


Model Types

| Term | Definition |
|---|---|
| Supervised learning | Training with labeled data (X, y pairs) |
| Unsupervised learning | Finding structure in unlabeled data |
| Reinforcement learning | Learning through trial and error, guided by rewards from an environment |
| Classification | Predicting a discrete category label |
| Regression | Predicting a continuous numeric value |
| Binary classification | Two classes: yes/no, spam/not spam |
| Multi-class classification | More than two mutually exclusive classes |
| Multi-label classification | Multiple labels can be true simultaneously |
| Ensemble | Combining multiple models for better performance |
| Bagging | Train models on bootstrap samples of the data, then average or vote on their predictions (e.g., Random Forest) |
| Boosting | Train models sequentially, each correcting the errors of the previous ones (e.g., XGBoost, LightGBM) |
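
The mechanics of bagging (bootstrap, fit, aggregate) can be shown with a deliberately trivial base model. Everything here is a toy assumption: the "model" just predicts the mean of its training targets, whereas a real bagged ensemble such as a Random Forest uses decision trees.

```python
import random

def fit_mean_model(ys):
    """Toy base model: always predicts the mean of the targets it was trained on."""
    mean = sum(ys) / len(ys)
    return lambda: mean

def bagged_prediction(ys, n_models=25, seed=0):
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap sample: draw len(ys) targets with replacement
        sample = [rng.choice(ys) for _ in ys]
        models.append(fit_mean_model(sample))
    # Aggregate by averaging the individual predictions
    return sum(m() for m in models) / n_models

ys = [3.0, 5.0, 4.0, 6.0, 20.0, 5.0, 4.0]
pred = bagged_prediction(ys)
print(pred)
```

Boosting differs in that the models are not independent: each one is fit to the errors left by the models before it.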


Evaluation Metrics

| Term | Definition |
|---|---|
| Accuracy | Fraction of correct predictions; misleading on imbalanced datasets |
| Precision | Of all positive predictions, what fraction are correct? TP / (TP + FP) |
| Recall (Sensitivity) | Of all actual positives, what fraction were caught? TP / (TP + FN) |
| F1 score | Harmonic mean of precision and recall; good when both matter |
| AUC-ROC | Area under the ROC curve; measures discrimination ability across all thresholds |
| Confusion matrix | Table showing TP, TN, FP, FN for classification |
| MSE | Mean Squared Error: average squared prediction error (regression) |
| MAE | Mean Absolute Error: average absolute prediction error (regression) |
| R² | Fraction of variance in y explained by the model |
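
The formulas in the table translate directly into code. This is a plain-Python sketch for binary labels (1 = positive); the function names and sample data are illustrative, and in practice a library such as scikit-learn would normally be used instead.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 from binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}

def regression_metrics(y_true, y_pred):
    """MSE, MAE, and R^2 for numeric predictions."""
    n = len(y_true)
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    mean_y = sum(y_true) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)   # total variance in y (times n)
    r2 = 1 - (mse * n) / ss_tot                       # 1 - SS_res / SS_tot
    return {"mse": mse, "mae": mae, "r2": r2}

clf = classification_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0])
reg = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
print(clf)
print(reg)
```

In the classification example there are 2 true positives, 1 false positive, 1 false negative, and 4 true negatives, so precision, recall, and F1 all come out to 2/3 while accuracy is 0.75.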


Hyperparameters vs Parameters

| Term | Definition | Set By |
|---|---|---|
| Parameters / Weights | Values the model learns during training | Training algorithm |
| Hyperparameters | Settings that control the training process | You, before training |
| Grid search | Try all hyperparameter combinations exhaustively | |
| Random search | Sample hyperparameter combinations randomly; often more efficient than grid search for the same budget | |
| Bayesian optimization | Use past results to guide where to search next | |
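
Grid search and random search differ only in how candidate settings are generated. In this sketch, `validation_loss` is a hypothetical stand-in for "train a model with these hyperparameters and return its validation loss"; the search ranges are also assumptions.

```python
import itertools
import random

def validation_loss(lr, reg):
    """Hypothetical stand-in for training a model and measuring validation loss."""
    return (lr - 0.1) ** 2 + (reg - 0.01) ** 2

grid = {"lr": [0.001, 0.01, 0.1, 1.0], "reg": [0.0, 0.01, 0.1]}

# Grid search: evaluate every combination (4 * 3 = 12 trials)
grid_trials = [dict(zip(grid, combo)) for combo in itertools.product(*grid.values())]
best_grid = min(grid_trials, key=lambda p: validation_loss(**p))

# Random search: spend the same budget on points sampled from continuous ranges
rng = random.Random(0)
random_trials = [
    {"lr": 10 ** rng.uniform(-3, 0), "reg": rng.uniform(0.0, 0.1)}
    for _ in range(12)
]
best_random = min(random_trials, key=lambda p: validation_loss(**p))

print("grid best:", best_grid)
print("random best:", best_random)
```

Grid search tests only 4 distinct learning rates in 12 trials, while random search tests 12. That is the usual argument for random search when only a few hyperparameters really matter.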


Production and MLOps

| Term | Definition |
|---|---|
| Data drift | Input feature distribution changes from training distribution |
| Concept drift | The relationship between X and y changes over time |
| Model monitoring | Tracking model performance metrics in production |
| Feature store | Centralized repository for features used across ML models |
| Pipeline | End-to-end automated sequence: data → features → training → serving |
| A/B testing | Splitting traffic between two model versions to compare performance |
| Inference latency | Time to generate a prediction at serving time |
| Throughput | Number of predictions the model can handle per second |
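
As a rough illustration of data drift monitoring, the sketch below flags a feature whose live mean has moved far from its training mean. This is a deliberately simple heuristic, not a standard drift test; the function name, sample values, and `THRESHOLD` are all assumptions, and production systems typically use richer statistics.

```python
import statistics

def mean_shift_in_std(train_values, live_values):
    """Distance of the live mean from the training mean, in training standard deviations."""
    mu = statistics.mean(train_values)
    sigma = statistics.stdev(train_values)
    return abs(statistics.mean(live_values) - mu) / sigma

train_ages = [25, 30, 35, 40, 45, 50, 55]
live_ages_ok = [28, 33, 38, 41, 47, 52]
live_ages_shifted = [60, 65, 70, 72, 75, 80]

THRESHOLD = 1.0  # hypothetical alert threshold, in standard deviations

shift_ok = mean_shift_in_std(train_ages, live_ages_ok)
shift_bad = mean_shift_in_std(train_ages, live_ages_shifted)
print("ok:", shift_ok > THRESHOLD, "shifted:", shift_bad > THRESHOLD)
```

A check like this detects data drift only. Concept drift changes the relationship between X and y, so it usually shows up in performance metrics once true labels arrive.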


Embeddings and Representations

| Term | Definition |
|---|---|
| Embedding | Dense vector representation of a data point (text, image, etc.) |
| Dimensionality reduction | Compressing high-dimensional features into fewer dimensions |
| PCA | Principal Component Analysis: linear dimensionality reduction |
| t-SNE | Non-linear dimensionality reduction used for visualization |
| Cosine similarity | Angle-based similarity between two vectors; common for embeddings |
| Euclidean distance | Straight-line distance between two points in feature space |
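
Cosine similarity and Euclidean distance can disagree, which is a common interview follow-up. A minimal sketch with two made-up vectors that point in the same direction but differ in length:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between a and b; ignores vector length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Straight-line distance between a and b; sensitive to vector length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]   # same direction as a, twice the length

cos = cosine_similarity(a, b)
dist = euclidean_distance(a, b)
print(cos, dist)
```

The cosine similarity is 1 (identical direction) even though the Euclidean distance is clearly nonzero, which is why cosine similarity is preferred when magnitude carries no meaning.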


Interview One-Liners

Overfitting: "The model memorized training noise — low training loss, high validation loss."

Bias-variance tradeoff: "Simple models underfit (high bias), complex models overfit (high variance). Regularization, more data, or model selection balances them."

Precision vs recall: "Precision: when you say positive, how often are you right? Recall: of all actual positives, how many did you find?"

Data leakage: "Using information during training that wouldn't be available at prediction time — makes training metrics misleadingly optimistic."
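
A frequent source of leakage is preprocessing fitted on the full dataset. The sketch below contrasts that with fitting on the training split only; the values and the split point are made up for illustration.

```python
import statistics

values = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0, 50.0, 55.0]
train, test = values[:6], values[6:]

# Leaky: the statistic is computed on all data, so test values influence preprocessing
leaky_mean = statistics.mean(values)

# Leak-free: fit the scaler on the training split only, then reuse it on the test split
train_mean = statistics.mean(train)
train_std = statistics.stdev(train)
train_scaled = [(v - train_mean) / train_std for v in train]
test_scaled = [(v - train_mean) / train_std for v in test]

print("leaky mean:", leaky_mean, "train-only mean:", train_mean)
```

The two means differ because the test values pull the full-data statistic upward, which is exactly the information that would not be available at prediction time.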

F1 score: "Use it when both false positives and false negatives are costly and the dataset is imbalanced."
