Deep Learning for AI Interviews · Lesson 56 of 56

Deep Learning Interview Strategy & Common Traps

The PREP Framework for DL Questions

P — Principle: What is the core concept?
R — Reason: Why does it work / what problem does it solve?
E — Example: Concrete code or calculation
P — Pitfall: What goes wrong if you misuse it?

Example: "Explain Batch Normalisation"
  P: Normalises layer inputs to zero mean, unit variance per mini-batch
  R: Reduces internal covariate shift → enables higher learning rates
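  E: model.train() vs model.eval(): batch statistics in train mode, running averages in eval mode
  P: Leaving the model in train mode during validation → BatchNorm uses batch statistics and Dropout stays active, so validation metrics are noisy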
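
As a possible "E" for this answer, here is a minimal sketch of how a BatchNorm layer behaves in the two modes. The tensor sizes, the seed, and the shifted input distribution are arbitrary choices made for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(4)               # running_mean starts at 0, running_var at 1
x = torch.randn(16, 4) * 3.0 + 5.0   # batch with mean around 5 and std around 3

bn.train()
out_train = bn(x)                    # normalised with this batch's own statistics
running_mean_after_train = bn.running_mean.clone()

bn.eval()
out_eval = bn(x)                     # normalised with the running estimates
running_mean_after_eval = bn.running_mean.clone()

print(out_train.mean(dim=0))         # close to 0 for every feature
print(out_eval.mean(dim=0))          # far from 0: running stats are still near their initial values
print(torch.equal(running_mean_after_train, running_mean_after_eval))  # eval leaves the buffers untouched
```

After a single training step the running estimates have barely moved from their initial values, which is why the eval-mode output is not yet centred. The point to make in an interview is that the two modes compute different things from the same input.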

Answering "Explain X to a Non-Expert"

Interviewers often ask you to explain technical concepts to a clinical audience.
Structure: intuition → benefit → limitation → when I'd use it

Example: "Explain transfer learning to a cardiologist"

  Intuition: "Imagine a radiologist who has read 100,000 chest X-rays.
  When they look at a cardiac MRI for the first time, they don't start from scratch —
  they use their existing knowledge of anatomy, normal vs abnormal, and what
  to look for. Transfer learning does the same: we start with a model
  that has already 'seen' millions of natural images and adapt it to your data."
  
  Benefit: "We need far less labelled imaging data to achieve good performance,
  because the model already knows how to detect edges, textures, and patterns."
  
  Limitation: "The original model was trained on photos of cats and cars,
  not medical scans. Some domain-specific patterns need to be learned from scratch."
  
  When I'd use it: "Any time we have fewer than 50,000 labelled clinical examples —
  which is almost always — transfer learning should be the first thing we try."
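
If the interviewer then asks how this looks in code, one common pattern is to freeze the pretrained layers and train only a new output layer. The sketch below assumes a tiny hypothetical `backbone` with random weights standing in for a real pretrained network, a two-class task, and dummy image tensors. In practice the backbone would be loaded from a model zoo.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a pretrained feature extractor (weights here are random).
backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)

# Freeze the backbone so that only the new head is updated.
for param in backbone.parameters():
    param.requires_grad = False

head = nn.Linear(8, 2)               # new task-specific classifier (2 classes assumed)
model = nn.Sequential(backbone, head)

trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-3)

images = torch.randn(4, 3, 32, 32)   # dummy batch standing in for clinical images
labels = torch.randint(0, 2, (4,))
loss = nn.CrossEntropyLoss()(model(images), labels)
loss.backward()
optimizer.step()

n_trainable = sum(p.numel() for p in trainable)
n_total = sum(p.numel() for p in model.parameters())
print(f"Trainable parameters: {n_trainable} of {n_total}")
```

Freezing everything except the head is only one option. Unfreezing the last few backbone layers with a smaller learning rate is a common next step when more labelled data is available.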

The Clinical AI System Design Template

When asked to design a clinical AI system, structure around 5 concerns:

1. Data
   - Source: EHR, DICOM, waveforms, text?
   - Volume: how many labelled examples?
   - Quality: noise level, missing values, class imbalance?
   - Splits: train/val/test, split by patient to avoid leakage; hold out sites and later time periods for external validation; stratify by outcome and demographics

2. Model
   - Architecture: MLP (tabular), CNN (imaging), LSTM (time-series), Transformer (text)?
   - Pre-training: ImageNet, ClinicalBERT, domain-specific?
   - Baseline: always compare to logistic regression / XGBoost first

3. Training
   - Loss: BCE + pos_weight for imbalance; Focal loss for extreme imbalance
   - Regularisation: AdamW + weight decay, Dropout, early stopping
   - Monitoring: train/val loss, AUC, gradient norms

4. Evaluation
   - Metrics: AUC-ROC, AUPRC (for imbalanced), calibration (ECE)
   - Subgroups: age, sex, ethnicity, site — detect bias
   - Clinical utility: net benefit analysis, decision curve analysis

5. Deployment
   - Integration: REST API, HL7/FHIR, EHR plugin?
   - Monitoring: prediction distribution drift, feedback loop
   - Governance: HIPAA/GDPR compliance, audit logs, model versioning
   - Explainability: SHAP values, Grad-CAM for clinician trust

Common Interview Traps and Correct Answers

Python

# TRAP 1: "Just use accuracy for the classification metric"
# CORRECT: Accuracy is misleading for imbalanced clinical data

import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

n_negative = 950   # not readmitted
n_positive = 50    # readmitted

y_true = np.array([0] * n_negative + [1] * n_positive)
y_pred_trivial = np.zeros(len(y_true))   # always predict negative

# A model that always predicts "negative" gets 95% accuracy
accuracy = (y_pred_trivial == y_true).mean()
print(f"Trivial accuracy: {accuracy:.1%}")   # 95.0%, misleading!

# AUC and AUPRC reveal the truth: a constant score carries no ranking information
# AUPRC is especially important for rare events
print(f"AUC-ROC: {roc_auc_score(y_true, y_pred_trivial):.2f}")           # 0.50 (chance level)
print(f"AUPRC:   {average_precision_score(y_true, y_pred_trivial):.2f}")  # 0.05 (the prevalence)

Python

# TRAP 2: "Apply softmax before CrossEntropyLoss"
import torch
import torch.nn as nn

logits = torch.randn(8, 5)
labels = torch.randint(0, 5, (8,))

# WRONG: double softmax (CE applies log_softmax internally)
wrong_probs = torch.softmax(logits, dim=-1)
# wrong_loss = nn.CrossEntropyLoss()(wrong_probs, labels)  # BAD

# CORRECT: raw logits
correct_loss = nn.CrossEntropyLoss()(logits, labels)
print(f"Correct CE loss: {correct_loss.item():.4f}")

# Same trap with binary: don't apply sigmoid before BCEWithLogitsLoss
logit = torch.randn(8)
label = torch.randint(0, 2, (8,)).float()
correct_bce = nn.BCEWithLogitsLoss()(logit, label)   # internal sigmoid
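
The cost of the double-softmax bug is easiest to see on a prediction that is already confident and correct. The sketch below uses a hand-picked three-class logit vector as an illustrative assumption. The lower bound in the comments follows from the fact that probabilities lie in [0, 1].

```python
import math
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
logits = torch.tensor([[10.0, 0.0, 0.0]])   # confident and correct prediction
label = torch.tensor([0])

correct_loss = criterion(logits, label)                        # raw logits
wrong_loss = criterion(torch.softmax(logits, dim=-1), label)   # probabilities passed as logits

# Probabilities lie in [0, 1], so the second softmax can never become sharp.
# For C classes the buggy loss is bounded below by log(1 + (C - 1) / e).
floor = math.log(1 + (3 - 1) / math.e)

print(f"Correct loss: {correct_loss.item():.4f}")
print(f"Buggy loss:   {wrong_loss.item():.4f}")
print(f"Lower bound for buggy loss: {floor:.4f}")
```

The buggy version still trains, which is what makes it a trap: the loss decreases but plateaus well above zero, and gradients are weaker than they should be.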

Python

# TRAP 3: "Set model to train mode always"
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.BatchNorm1d(32), nn.Dropout(0.5), nn.Linear(32, 1))
X = torch.randn(32, 10)

model.train()
out_train_1 = model(X)
out_train_2 = model(X)
# These are DIFFERENT because Dropout is stochastic
print(f"Train mode outputs match: {torch.allclose(out_train_1, out_train_2)}")  # False

model.eval()
out_eval_1 = model(X)
out_eval_2 = model(X)
# These are IDENTICAL — deterministic inference
print(f"Eval mode outputs match: {torch.allclose(out_eval_1, out_eval_2)}")   # True
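
One way to avoid this trap in practice is to wrap validation in a helper that switches modes itself. The `evaluate` function below is a hypothetical helper written for illustration, not part of PyTorch. It assumes a binary task with a single-logit output.

```python
import torch
import torch.nn as nn

def evaluate(model: nn.Module, X: torch.Tensor, y: torch.Tensor) -> float:
    """Return mean BCE loss on (X, y), leaving the model's mode as it was."""
    was_training = model.training
    model.eval()                      # Dropout off, BatchNorm uses running stats
    with torch.no_grad():             # no autograd graph needed for validation
        loss = nn.BCEWithLogitsLoss()(model(X).squeeze(-1), y)
    model.train(was_training)         # restore whatever mode the caller was in
    return loss.item()

model = nn.Sequential(nn.Linear(10, 32), nn.BatchNorm1d(32), nn.Dropout(0.5), nn.Linear(32, 1))
X_val = torch.randn(32, 10)
y_val = torch.randint(0, 2, (32,)).float()

model.train()
loss_a = evaluate(model, X_val, y_val)
loss_b = evaluate(model, X_val, y_val)
print(loss_a == loss_b)    # repeated evaluation gives the same value
print(model.training)      # the model is back in train mode afterwards
```

Restoring the previous mode means the helper can be called in the middle of a training loop without silently leaving the model in eval mode.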

Handling Unknown Questions

Four-step approach for questions you don't know:

1. Define what you DO know
   "So attention in transformers — let me start with what attention fundamentally is:
   a mechanism for weighting the importance of different input tokens..."

2. Apply first principles
   "The gradient for this operation would need to flow back through the softmax.
   Softmax is differentiable, so gradients can flow, although they shrink when it
   saturates, which is why the scores are scaled by 1/√d_k..."

3. Give a related example
   "I haven't implemented cross-attention specifically, but self-attention is similar —
   the difference is just that Q comes from one sequence and K, V from another."

4. Acknowledge the gap and connect to what matters
   "I'm not certain about the exact implementation detail, but I know it's used in
   the decoder of T5 to attend to encoder outputs, which is how the generated
   text stays conditioned on the input."

Never say "I don't know" as a complete answer.
Show thinking, not just final answers — interviewers want to see your process.
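
The self-attention versus cross-attention distinction from step 3 can be sketched in a few lines of NumPy. This is a simplified illustration: the learned query, key, and value projections and multi-head splitting are omitted, and the sequence lengths and feature size are arbitrary assumptions.

```python
import numpy as np

def attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray):
    """Scaled dot-product attention; returns (output, weights)."""
    scores = Q @ K.T / np.sqrt(K.shape[-1])                   # (n_queries, n_keys)
    scores = scores - scores.max(axis=-1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ V, weights

rng = np.random.default_rng(0)
encoder_states = rng.normal(size=(6, 16))   # 6 source tokens
decoder_states = rng.normal(size=(3, 16))   # 3 target tokens

# Self-attention: Q, K, and V all come from the same sequence.
self_out, self_w = attention(decoder_states, decoder_states, decoder_states)

# Cross-attention: Q from the decoder, K and V from the encoder.
cross_out, cross_w = attention(decoder_states, encoder_states, encoder_states)

print(self_w.shape)     # (3, 3)
print(cross_w.shape)    # (3, 6)
print(cross_out.shape)  # (3, 16)
```

The same function serves both cases. Only the source of the keys and values changes, which is the point the spoken answer is making.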

Interview Practice Script

Python

# Practice this for any deep learning concept:

def explain_concept(concept: str) -> dict:
    """Structure any concept as: what, why, how, when, pitfall."""
    templates = {
        "dropout": {
            "what":    "Randomly zeroes a fraction p of activations during training",
            "why":     "Prevents co-adaptation: each neuron must be useful without relying on others",
            "how":     "inverted dropout: scale remaining activations by 1/(1-p) so expected output unchanged",
            "when":    "MLPs and transformer heads; not between BN layers in CNNs",
            "pitfall": "Forgetting model.eval() in validation — Dropout is still active!",
        },
        "batch_norm": {
            "what":    "Normalise activations to zero mean, unit variance per mini-batch",
            "why":     "Reduces internal covariate shift → enables higher lr, faster convergence",
            "how":     "μ_B, σ²_B → x̂ = (x-μ_B)/√(σ²_B+ε) → γx̂ + β (learnable γ, β)",
            "when":    "After linear/conv, before activation; CNN body (not after GAP)",
            "pitfall": "batch_size=1 fails in train mode; model.train() during validation = BN uses batch stats",
        },
        "residual_connection": {
            "what":    "Add input to output: H(x) = F(x) + x",
            "why":     "Provides gradient highway; learning F(x)=0 is easy (identity)",
            "how":     "out = relu(layer(x) + shortcut(x)) where shortcut adjusts dimensions",
            "when":    "Any deep network (> 10 layers); standard in ResNet, Transformers, modern MLPs",
            "pitfall": "Dimensions must match: use 1×1 conv or linear shortcut when channels change",
        },
    }
    return templates.get(concept, {"note": "Build your own template for this concept"})

for concept in ["dropout", "batch_norm", "residual_connection"]:
    info = explain_concept(concept)
    print(f"\n=== {concept.upper()} ===")
    for key, val in info.items():
        print(f"  {key:10s}: {val}")
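
The "how" entry for dropout can be checked numerically. The sketch below is a plain NumPy illustration of inverted dropout, not the PyTorch implementation. The array size, seed, and p = 0.5 are arbitrary choices.

```python
import numpy as np

def inverted_dropout(x: np.ndarray, p: float, rng: np.random.Generator) -> np.ndarray:
    """Zero each activation with probability p and rescale survivors by 1 / (1 - p)."""
    keep_mask = rng.random(x.shape) >= p
    return x * keep_mask / (1.0 - p)

rng = np.random.default_rng(0)
activations = np.ones(1_000_000)
dropped = inverted_dropout(activations, p=0.5, rng=rng)

print(f"Fraction zeroed: {(dropped == 0).mean():.3f}")   # close to p
print(f"Mean before: {activations.mean():.3f}, mean after: {dropped.mean():.3f}")
print(f"Surviving value: {dropped.max():.1f}")           # 1 / (1 - p) = 2.0
```

Because the rescaling happens at training time, nothing needs to change at inference: the layer simply becomes the identity in eval mode.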

Final Answer Checklist

Before finalising any interview answer:

✓ Did I define the core concept precisely?
✓ Did I explain WHY it works (mechanism, not just definition)?
✓ Did I give a concrete example or code sketch?
✓ Did I mention a common pitfall or edge case?
✓ Did I connect to clinical/real-world implications if applicable?
✓ Did I stay concise? (2–3 minutes spoken, not an essay)

Clinical AI bonus points:
✓ Mention class imbalance and how it affects the choice of loss/metric
✓ Acknowledge the need for external validation, not just dev-set AUC
✓ Address calibration — clinicians use probabilities for decisions
✓ Mention HIPAA/GDPR compliance, PHI, audit logging
✓ Show awareness of bias: subgroup analysis, demographic parity
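
For the calibration point, it helps to be able to sketch how expected calibration error (ECE) is computed. The function below is one simple binary variant, assuming equal-width bins and comparing the mean predicted probability with the observed positive rate in each bin. Other definitions exist, and the two toy inputs are made up for illustration.

```python
import numpy as np

def expected_calibration_error(y_true, y_prob, n_bins: int = 10) -> float:
    """Binary ECE: weighted mean gap between predicted and observed positive rate per bin."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    # Equal-width bins on [0, 1]; a probability of exactly 1.0 goes in the last bin.
    bin_ids = np.minimum((y_prob * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        in_bin = bin_ids == b
        if in_bin.any():
            gap = abs(y_prob[in_bin].mean() - y_true[in_bin].mean())
            ece += in_bin.mean() * gap      # weight by the share of samples in the bin
    return float(ece)

# Predicts 0.8 and 80% of those patients are positive: well calibrated.
calibrated = expected_calibration_error([1] * 80 + [0] * 20, [0.8] * 100)
# Predicts 0.9 but only half are positive: over-confident.
overconfident = expected_calibration_error([1] * 50 + [0] * 50, [0.9] * 100)

print(f"Calibrated ECE:     {calibrated:.2f}")
print(f"Over-confident ECE: {overconfident:.2f}")
```

Both toy models would have the same AUC, since each outputs a single constant score, yet their ECE differs sharply. That is the argument for reporting calibration alongside discrimination.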

Interview Answer

"Successful deep learning interviews require demonstrating both technical depth and practical judgement. Use the PREP framework: Principle (what), Reason (why it works), Example (code/calculation), Pitfall (what breaks it). Common traps: using accuracy for imbalanced clinical data (use AUC/AUPRC), applying sigmoid/softmax before the loss function (double activation bug), and forgetting model.eval() during validation (Dropout and BatchNorm behave differently). For system design questions, frame your answer around the five concerns: data quality and splits, model architecture and pre-training, training with appropriate loss and regularisation, evaluation on subgroups beyond aggregate AUC, and deployment with monitoring and governance. For unknown questions: start with what you know, reason from first principles, give a related example, and acknowledge the gap — interviewers value reasoning under uncertainty. In clinical AI roles: always connect technical choices to patient outcomes, data constraints, and regulatory requirements."
