Deep Learning for AI Interviews · Lesson 5 of 56
Interview: DL vs ML — When Would You Choose Each?
Q1: You have a clinical dataset with 2000 labelled patient records and 50 engineered features. Should you use deep learning?
"No, not as the first choice. 2000 labelled examples with 50 features is a regime where gradient boosted trees (XGBoost or LightGBM) typically outperform neural networks. XGBoost handles mixed feature types well, is robust to unscaled features, doesn't require a GPU, trains in seconds, and is more interpretable — which matters for clinical model validation. I'd establish XGBoost as the baseline, then possibly try a small neural network (2–3 layers, with dropout) to see if there's any gain. If the gain is less than 1–2% AUC, the added complexity of DL is not justified."
Q2: When would you use deep learning on tabular clinical data?
"Three scenarios: large scale (> 100K examples, where neural networks start to benefit from their representational capacity); high cardinality categorical features (like free-text ICD codes, drug codes — embeddings handle these better than one-hot encoding in trees); and multimodal fusion (combining tabular features with images, text, or time series signals, where a shared neural architecture is cleaner than separate models). Architectures like TabNet, FT-Transformer, and simple MLP with embedding layers can compete with gradient boosting at scale, but they require more tuning."
Q3: What's the trade-off between training a model from scratch vs fine-tuning a pre-trained model?
"Training from scratch: need large labelled datasets (100K+ for images, billions of tokens for LLMs), high compute, weeks of training. The result is a model maximally adapted to your domain. Fine-tuning a pre-trained model: much faster and cheaper (hours instead of weeks), works with moderate labelled data (thousands), but the model's initial representations are shaped by its pre-training distribution. For clinical NLP, I'd use a domain-specific pre-trained model (BiomedBERT, ClinicalBERT) and fine-tune on clinical notes rather than starting from scratch — the biomedical pre-training gives the model clinical vocabulary and domain knowledge for free."
Q4: A team member says "just use a bigger neural network — it always wins." How do you respond?
"That's partially true at scale, but in practice the answer is more nuanced. Bigger models win when: data is abundant (the model can use the extra capacity); the problem is genuinely complex (simple problems are already solved by small models); and compute is available. Bigger models lose when: you have limited data (larger model overfits, smaller model generalises better); the task is structurally simple (logistic regression gives interpretable results that are equally good); or when latency/cost is constrained at inference time. In clinical AI, I'd also add: a model that's 2% more accurate but a black box may be rejected for deployment in favour of a less accurate but interpretable model that clinicians can validate."
Q5: How do you explain to a non-technical clinician why deep learning models need so much data?
"Traditional models work like a scoring form — you tell them which measurements matter (age, BP, INR), and they learn the weights from a few hundred examples. Deep learning models learn what measurements to look at from scratch. Imagine teaching someone to read an ECG: instead of telling them 'look at the QRS complex', you show them 100,000 ECG pairs with known diagnoses and let them figure out which patterns matter. It takes many more examples because they're learning the features, not just the weights on pre-specified features. This is both the power (they can discover patterns humans didn't think to look for) and the limitation (they need more teaching examples)."
Q6: What's the difference between ML model accuracy and clinical utility?
"A critical distinction. ML accuracy (AUC, F1) measures statistical performance on a test set. Clinical utility measures whether the model actually improves patient outcomes. A model can have high AUC and zero clinical utility. Classic example: a sepsis alert model with AUC=0.85 might generate 50 alerts per day, of which 40 are false alarms. Clinicians alert-fatigue and start ignoring all alerts — worse than no model at all. Conversely, a model with AUC=0.72 that alerts on only the 5 highest-risk patients per shift, with 70% positive predictive value, might change 3 patient outcomes per month. Clinical utility requires: high positive predictive value, actionable alerts, workflow integration, and prospective validation. I always ask 'what decision will a clinician make differently because of this prediction?' before choosing to build a model."
Interview Answer Summary
"The DL vs ML choice is always empirical: start with the simplest model that could work (XGBoost for tabular, TF-IDF for text), measure performance, then justify any increase in complexity. Deep learning wins on raw unstructured inputs (images, text, signals) and at scale (hundreds of thousands of examples). Traditional ML wins on small-to-medium tabular data, when interpretability is required, or when compute is constrained. Pre-trained models change the equation: fine-tuning BERT on 2000 clinical notes often outperforms training a neural network from scratch on the same data. And always distinguish between ML performance metrics and clinical utility — the latter is what actually justifies deployment."