Deep Learning for AI Interviews · Lesson 3 of 56

Why Deep Learning Reduces Feature Engineering

What Feature Engineering Is

Feature engineering is the process of using domain knowledge to transform raw data into informative inputs for a model.

Raw data: timestamps of patient events (admissions, labs, medications)

Manually engineered features:
  - days_since_last_admission
  - number_of_medications
  - INR_trend (last 3 readings: increasing/stable/decreasing)
  - age_category (under_40, 40–65, over_65)
  - has_AF (boolean from diagnosis codes)
  - creatinine_to_baseline_ratio

The model (logistic regression, XGBoost) then learns
which combination of these features predicts the outcome.
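
As a minimal sketch of what this looks like in code, the snippet below derives the features listed above from one hypothetical patient record. The record layout, the field names, the `engineer_features` helper, and the 0.2 INR tolerance are all assumptions made for illustration, not part of any particular EHR schema.

```python
from datetime import date

def engineer_features(record: dict, today: date) -> dict:
    """Turn one raw patient record into model-ready features (illustrative only)."""
    admissions = sorted(record["admission_dates"])
    inr = record["inr_readings"][-3:]  # last 3 readings, oldest first

    # Trend over the last three INR readings; the 0.2 tolerance is an assumed threshold.
    delta = inr[-1] - inr[0]
    if delta > 0.2:
        inr_trend = "increasing"
    elif delta < -0.2:
        inr_trend = "decreasing"
    else:
        inr_trend = "stable"

    age = record["age"]
    if age < 40:
        age_category = "under_40"
    elif age <= 65:
        age_category = "40-65"
    else:
        age_category = "over_65"

    return {
        "days_since_last_admission": (today - admissions[-1]).days,
        "number_of_medications": len(set(record["medications"])),
        "INR_trend": inr_trend,
        "age_category": age_category,
        # I48 is the ICD-10 code family for atrial fibrillation and flutter.
        "has_AF": any(code.startswith("I48") for code in record["diagnosis_codes"]),
        "creatinine_to_baseline_ratio": record["creatinine"] / record["creatinine_baseline"],
    }

# Hypothetical record, made up for this example.
example = {
    "admission_dates": [date(2024, 1, 10), date(2024, 3, 1)],
    "medications": ["warfarin", "metoprolol", "warfarin"],
    "inr_readings": [2.1, 2.4, 2.9, 3.3],
    "age": 71,
    "diagnosis_codes": ["I48.0", "N18.3"],
    "creatinine": 1.5,
    "creatinine_baseline": 1.0,
}
features = engineer_features(example, today=date(2024, 3, 31))
```

Every rule in this function (the age cut-offs, the trend tolerance, the diagnosis-code prefix) is a decision a person had to make, which is where both the interpretability and the cost of feature engineering come from.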

What Deep Learning Does Instead

Instead of asking humans to define informative features,
deep learning learns to extract them from raw data.

Raw ECG signal → CNN → learns filters that can respond to:
  - Local wave shapes (P wave, QRS complex, T wave)
  - Rhythm patterns (regular vs irregular)
  - Amplitude features
  - Temporal correlations

The model didn't need a human to say "measure QRS duration."
It discovered that QRS-related patterns matter from training data.

This is called representation learning or feature learning.
The early layers of a deep network tend to learn low-level features (such as local wave shapes);
the later layers combine them into more abstract features and, finally, the prediction.
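
To make "learning a feature" concrete, the sketch below slides a single 1D filter over a toy signal, which is the operation a `Conv1d` layer repeats with many filters. Here the filter weights are set by hand to respond to a sharp spike; in a CNN those weights would instead be adjusted by gradient descent. The signal values and the `correlate1d` helper are assumptions for illustration.

```python
def correlate1d(signal: list[float], kernel: list[float]) -> list[float]:
    """Valid-mode cross-correlation: one output per full overlap of the kernel."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

# Toy trace: noisy flat baseline with one sharp, QRS-like spike at index 5.
signal = [0.0, 0.1, 0.0, 0.1, 0.0, 1.0, 0.0, 0.1, 0.0, 0.1]

# Hand-set "spike detector": high centre weight, negative neighbours.
kernel = [-1.0, 2.0, -1.0]

response = correlate1d(signal, kernel)

# The strongest response marks where the signal best matches the filter shape.
peak_index = max(range(len(response)), key=lambda i: response[i])
spike_position = peak_index + len(kernel) // 2  # map back to a signal index
```

The filter output is largest where the signal matches the filter's shape, so `spike_position` lands on the spike. Training a CNN amounts to finding many such filters whose responses are useful for the prediction.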

Pros and Cons of Each

Feature Engineering:
  Pros:
    ✓ Works with small data (features reduce dimensionality)
    ✓ Faster to iterate (model training is cheap)
    ✓ Interpretable features — you know what each one means
    ✓ Domain knowledge enforced (impossible values prevented)
    ✓ Models train on CPU
  
  Cons:
    ✗ Time-consuming — requires domain expert involvement
    ✗ May miss non-obvious patterns that data could reveal
    ✗ Features may be brittle (generalise poorly to new sites)
    ✗ Hard to scale to very high-dimensional inputs (e.g., images)

Deep Learning:
  Pros:
    ✓ Scales to raw inputs (images, text, signals)
    ✓ Can discover unexpected patterns in data
    ✓ Transfer learning reuses features learned on large datasets
    ✓ End-to-end optimisation (features and prediction jointly optimised)
  
  Cons:
    ✗ Usually requires large labelled datasets (or pre-training)
    ✗ GPU typically needed for practical training
    ✗ Learned features are hard for humans to interpret
    ✗ Harder to debug and validate

Hybrid Approaches

Feature engineering and deep learning are not mutually exclusive:

Python

import torch
import torch.nn as nn

class HybridClinicalModel(nn.Module):
    """
    Combines:
      - Hand-crafted clinical features (tabular)
      - Automatic feature learning from ECG signal (1D CNN)
    """
    def __init__(self, n_tabular_features: int, ecg_length: int):
        # ecg_length is not used: the global average pool accepts any signal length.
        super().__init__()
        
        # Branch 1: deep learning on raw ECG
        self.ecg_cnn = nn.Sequential(
            nn.Conv1d(12, 32, kernel_size=7, padding=3),  # 12-lead ECG
            nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(32, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),                       # global average pool
        )  # output: (batch, 64, 1); squeezed to (batch, 64) in forward()
        
        # Branch 2: clinical features (engineered)
        self.clinical_branch = nn.Sequential(
            nn.Linear(n_tabular_features, 64),
            nn.ReLU(),
            nn.Dropout(0.3),
        )  # output: (batch, 64)
        
        # Fusion: combine both representations
        self.head = nn.Sequential(
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
            nn.Sigmoid(),
        )
    
    def forward(self, ecg: torch.Tensor, clinical: torch.Tensor) -> torch.Tensor:
        ecg_features = self.ecg_cnn(ecg).squeeze(-1)       # (batch, 64)
        clinical_features = self.clinical_branch(clinical)  # (batch, 64)
        combined = torch.cat([ecg_features, clinical_features], dim=1)  # (batch, 128)
        return self.head(combined)
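
For a rough sense of scale, the arithmetic below counts the trainable parameters in the model above, using the standard formulas for `Conv1d` and `Linear` layers with bias. The choice of 20 tabular features is an assumption for illustration, and the comparison with logistic regression is only meant to show the order-of-magnitude gap.

```python
def conv1d_params(in_ch: int, out_ch: int, kernel_size: int) -> int:
    # One weight per (input channel, output channel, kernel tap), plus one bias per output channel.
    return in_ch * out_ch * kernel_size + out_ch

def linear_params(in_features: int, out_features: int) -> int:
    # One weight per (input, output) pair, plus one bias per output.
    return in_features * out_features + out_features

n_tabular_features = 20  # assumed for illustration

ecg_branch = conv1d_params(12, 32, 7) + conv1d_params(32, 64, 5)
clinical_branch = linear_params(n_tabular_features, 64)
head = linear_params(128, 64) + linear_params(64, 1)

hybrid_total = ecg_branch + clinical_branch + head
logistic_regression_total = n_tabular_features + 1  # weights + intercept

print(f"Hybrid model:        {hybrid_total:,} parameters")
print(f"Logistic regression: {logistic_regression_total} parameters")
```

Because the ECG branch ends in a global average pool, none of these counts depend on the length of the ECG signal.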

When Feature Engineering Still Wins

1. Small labelled dataset:
   With 500 examples, a logistic regression on 20 engineered features
   is far less likely to overfit than a neural network with thousands of parameters
   
2. Features encode hard constraints:
   INR_out_of_range (boolean) encodes clinical knowledge that a DL model
   might not learn from data if the relationship is rare in training

3. Regulatory approval:
   Regulators such as the FDA can more readily review a model built on interpretable features
   "INR > 3.5 → high bleeding risk" is defensible
   "Layer 7 activation 0.87" is not

4. Distribution shift detection:
   Feature values can be monitored directly for drift
   "Average age in deployment = 70 vs 65 in training" is actionable
   Embedding drift is harder to detect and interpret
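
Monitoring an engineered feature for drift can be as simple as comparing summary statistics between training and deployment. The sketch below uses a standardised mean difference; the sample ages, the `standardised_mean_shift` helper, and the 0.5 alert threshold are assumptions chosen for illustration rather than an established standard.

```python
from statistics import mean, stdev

def standardised_mean_shift(train: list[float], deploy: list[float]) -> float:
    """Shift in the deployment mean, measured in training standard deviations."""
    return (mean(deploy) - mean(train)) / stdev(train)

# Made-up samples: training mean age 65, deployment mean age 70.
train_ages = [55, 60, 62, 65, 66, 68, 70, 74]
deploy_ages = [64, 66, 69, 70, 71, 72, 73, 75]

shift = standardised_mean_shift(train_ages, deploy_ages)
ALERT_THRESHOLD = 0.5  # assumed; would be tuned per feature in practice

drift_detected = abs(shift) > ALERT_THRESHOLD
if drift_detected:
    print(f"Age drift: mean {mean(train_ages):.1f} -> {mean(deploy_ages):.1f} "
          f"({shift:+.2f} training SDs)")
```

The same check is much harder to act on for a learned embedding, where an individual dimension has no clinical meaning to report.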

Transfer Learning: Best of Both Worlds

Pre-trained deep learning models have already performed feature learning
on millions of examples.

Fine-tuning on your smaller dataset:
  Early layers: often frozen (keep learned general features)
  Later layers: updated (adapt to your specific task)

Effectively: DL feature learning from massive data
             + small dataset adaptation
             
Examples:
  BERT pre-trained on BooksCorpus and English Wikipedia → fine-tune on clinical notes (n=5000)
  ResNet pre-trained on ImageNet → fine-tune on chest X-rays (n=2000)
  BiomedBERT pre-trained on PubMed → fine-tune on discharge summaries
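
The freeze-then-fine-tune pattern can be sketched in PyTorch as follows. The small `backbone` here is a randomly initialised stand-in for a real pre-trained network (loading actual weights is omitted), and the layer sizes, batch shape, and learning rate are assumptions for illustration.

```python
import torch
import torch.nn as nn

# Stand-in for a pre-trained feature extractor; in practice its weights would be loaded.
backbone = nn.Sequential(
    nn.Conv1d(12, 32, kernel_size=7, padding=3),
    nn.ReLU(),
    nn.AdaptiveAvgPool1d(1),
    nn.Flatten(),           # (batch, 32, 1) -> (batch, 32)
)
head = nn.Linear(32, 1)     # new task-specific layer, trained from scratch

# Freeze the backbone so its learned features are kept as-is.
for param in backbone.parameters():
    param.requires_grad = False

model = nn.Sequential(backbone, head)

# Only parameters that still require gradients are given to the optimiser.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-3)

# One illustrative training step on random data.
ecg = torch.randn(4, 12, 500)
labels = torch.randint(0, 2, (4, 1)).float()
loss = nn.BCEWithLogitsLoss()(model(ecg), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

If the frozen part contained batch-norm or dropout layers, calling `.eval()` on it would also be needed to keep its behaviour fixed, since `requires_grad = False` only stops weight updates.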

Interview Answer

"Feature engineering uses domain expertise to transform raw data into informative inputs. It is explicit and interpretable, but time-consuming, and it can miss non-obvious patterns. Deep learning learns features automatically from raw inputs. It scales to images, text, and signals, but it needs more data and compute and produces opaque representations. For clinical tabular data (structured EHR features), feature engineering plus XGBoost often wins because clinical domain knowledge is valuable, data is limited, and interpretability matters. For raw inputs (ECG, X-ray, clinical notes), deep learning is usually the better choice because manual feature engineering does not scale to them. Hybrid models combine both: hand-crafted tabular features fused with DL-extracted signal features, keeping some interpretability while still modelling the raw signal."

