Feature Engineering vs Deep Learning
How manual feature engineering differs from deep learning's automatic feature learning, when each approach is better, and how they can be combined.
What Feature Engineering Is
Feature engineering is the process of using domain knowledge to transform raw data into informative inputs for a model.
Raw data: timestamps of patient events (admissions, labs, medications)
Manually engineered features:
- days_since_last_admission
- number_of_medications
- INR_trend (last 3 readings: increasing/stable/decreasing)
- age_category (under_40, 40–65, over_65)
- has_AF (boolean from diagnosis codes)
- creatinine_to_baseline_ratio
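As a minimal sketch of how a few of the features above might be computed, assume a hypothetical patient record held in plain Python structures. The field names (`admissions`, `medications`, `inr_readings`, `age`), the helper functions, and the trend rule (compare the first and last of the last three readings, with a 0.2 tolerance) are illustrative assumptions, not a clinical standard.

```python
from datetime import date


def inr_trend(readings, tolerance=0.2):
    """Classify the last three INR readings as increasing/stable/decreasing."""
    last = readings[-3:]
    if len(last) < 2:
        return "stable"
    delta = last[-1] - last[0]
    if delta > tolerance:
        return "increasing"
    if delta < -tolerance:
        return "decreasing"
    return "stable"


def age_category(age):
    """Bucket age into the three categories listed above."""
    if age < 40:
        return "under_40"
    if age <= 65:
        return "40-65"
    return "over_65"


def engineer_features(patient, today):
    """Turn one raw patient record into a flat dict of model inputs."""
    last_admission = max(patient["admissions"])
    return {
        "days_since_last_admission": (today - last_admission).days,
        # Count distinct drugs, so repeated prescriptions are not double-counted.
        "number_of_medications": len(set(patient["medications"])),
        "INR_trend": inr_trend(patient["inr_readings"]),
        "age_category": age_category(patient["age"]),
    }


# Hypothetical record, for illustration only.
patient = {
    "admissions": [date(2024, 1, 10), date(2024, 3, 1)],
    "medications": ["warfarin", "metoprolol", "warfarin"],
    "inr_readings": [2.1, 2.4, 2.9, 3.3],
    "age": 71,
}
features = engineer_features(patient, today=date(2024, 3, 31))
print(features)
```

Each output value has a direct clinical reading, which is what makes this style of input easy to inspect and validate.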
The model (logistic regression, XGBoost) then learns
which combination of these features predicts the outcome.
What Deep Learning Does Instead
Instead of asking humans to define informative features,
deep learning learns to extract them from raw data.
Raw ECG signal → CNN → automatically discovers:
- Local wave shapes (P wave, QRS complex, T wave)
- Rhythm patterns (regular vs irregular)
- Amplitude features
- Temporal correlations
The model didn't need a human to say "measure QRS duration."
It discovered that QRS-related patterns matter from training data.
This is called representation learning or feature learning.
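To make "learning a feature" more concrete, the sketch below shows what a single 1D convolutional filter computes: it slides a small template over the signal and responds most strongly where the signal matches the template. The kernel weights here are set by hand purely for illustration; in a trained CNN they would be learned from data. The `cross_correlate` helper and the toy signal are assumptions made for this example.

```python
def cross_correlate(signal, kernel):
    """Valid-mode sliding dot product, the core operation of a 1D conv layer."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


# Toy signal: near-flat baseline with one sharp spike
# (a crude stand-in for a QRS complex).
signal = [0.0, 0.0, 0.1, 0.0, 1.0, 3.0, 1.0, 0.0, 0.1, 0.0]

# Hand-set "spike detector" template; a CNN would learn such weights.
kernel = [-1.0, 2.0, -1.0]

response = cross_correlate(signal, kernel)

# The strongest response marks where the template fits best.
peak_index = max(range(len(response)), key=lambda i: response[i])
spike_position = peak_index + len(kernel) // 2  # centre of the matching window
print(spike_position)
```

A real network stacks many such filters and learns their weights jointly with the prediction, rather than having them specified in advance.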
The early layers of a deep network learn features;
the later layers combine them into the prediction.
Pros and Cons of Each
Feature Engineering:
Pros:
✓ Works with small data (features reduce dimensionality)
✓ Faster to iterate (model training is cheap)
✓ Interpretable features — you know what each one means
✓ Domain knowledge enforced (impossible values prevented)
✓ Models train on CPU
Cons:
✗ Time-consuming — requires domain expert involvement
✗ May miss non-obvious patterns that data could reveal
✗ Features may be brittle (generalise poorly to new sites)
✗ Hard to scale to very high-dimensional inputs (e.g., images)
Deep Learning:
Pros:
✓ Scales to raw inputs (images, text, signals)
✓ Can discover unexpected patterns in data
✓ Transfer learning reuses features learned on large datasets
✓ End-to-end optimisation (features and prediction jointly optimised)
Cons:
✗ Requires large labelled datasets (or pre-training)
✗ GPU usually needed for practical training
✗ Learned features are not directly human-interpretable
✗ Harder to debug and validate
Hybrid Approaches
Feature engineering and deep learning are not mutually exclusive:
import torch
import torch.nn as nn


class HybridClinicalModel(nn.Module):
    """
    Combines:
    - Hand-crafted clinical features (tabular)
    - Automatic feature learning from ECG signal (1D CNN)
    """

    def __init__(self, n_tabular_features: int, ecg_length: int):
        super().__init__()
        # Note: ecg_length is not used below; the adaptive pool lets the
        # CNN branch accept any signal length.

        # Branch 1: deep learning on raw ECG
        self.ecg_cnn = nn.Sequential(
            nn.Conv1d(12, 32, kernel_size=7, padding=3),  # 12-lead ECG
            nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(32, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # global average pool
        )  # output: (batch, 64, 1); squeezed to (batch, 64) in forward()

        # Branch 2: clinical features (engineered)
        self.clinical_branch = nn.Sequential(
            nn.Linear(n_tabular_features, 64),
            nn.ReLU(),
            nn.Dropout(0.3),
        )  # output: (batch, 64)

        # Fusion: combine both representations
        self.head = nn.Sequential(
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
            nn.Sigmoid(),  # output is a probability in (0, 1)
        )

    def forward(self, ecg: torch.Tensor, clinical: torch.Tensor) -> torch.Tensor:
        # ecg: (batch, 12, length); clinical: (batch, n_tabular_features)
        ecg_features = self.ecg_cnn(ecg).squeeze(-1)  # (batch, 64)
        clinical_features = self.clinical_branch(clinical)  # (batch, 64)
        combined = torch.cat([ecg_features, clinical_features], dim=1)  # (batch, 128)
        return self.head(combined)
When Feature Engineering Still Wins
1. Small labelled dataset:
With 500 examples, a logistic regression on 20 engineered features
has far fewer parameters to estimate and is less likely to overfit than a neural network
2. Features encode hard constraints:
INR_out_of_range (boolean) encodes clinical knowledge that a DL model
might not learn from data if the relationship is rare in training
3. Regulatory approval:
Models built on interpretable features are easier to justify to regulators such as the FDA
"INR > 3.5 → high bleeding risk" is defensible
"Layer 7 activation 0.87" is not
4. Distribution shift detection:
Feature values can be monitored directly for drift
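A minimal sketch of such monitoring, assuming a single numeric feature and using a standardised mean difference as the drift score. The `mean_shift` helper, the sample ages, and the 0.5 alert threshold are illustrative assumptions; real monitoring would typically use larger samples and more robust statistical tests.

```python
from statistics import mean, stdev


def mean_shift(train_values, deploy_values):
    """Difference in means, in units of the training standard deviation."""
    return (mean(deploy_values) - mean(train_values)) / stdev(train_values)


# Hypothetical age samples: training mean 65, deployment mean 70.
train_age = [55, 60, 65, 70, 75]
deploy_age = [60, 65, 70, 75, 80]

shift = mean_shift(train_age, deploy_age)
drifted = abs(shift) > 0.5  # hypothetical alert threshold
print(f"shift={shift:.2f} SD, drifted={drifted}")
```

Because the score is tied to a named feature, an alert points directly at what changed in the population.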
"Average age in deployment = 70 vs 65 in training" is actionable
Embedding drift is harder to detect and interpret
Transfer Learning: Best of Both Worlds
Pre-trained deep learning models have already performed feature learning
on millions of examples.
Fine-tuning on your smaller dataset:
Early layers: frozen (keep learned general features)
Later layers: updated (adapt to your specific task)
Effectively: DL feature learning from massive data
+ small dataset adaptation
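The freeze-then-adapt pattern can be sketched in PyTorch as follows. The small `nn.Sequential` network is only a stand-in for a pre-trained model (its weights here are random, not actually pre-trained), and the layer sizes and learning rate are arbitrary assumptions. The point is how `requires_grad` separates frozen parameters from trainable ones.

```python
import torch
import torch.nn as nn

# Stand-in for a pre-trained network (randomly initialised here).
model = nn.Sequential(
    nn.Linear(8, 16),  # "early" layer: general features
    nn.ReLU(),
    nn.Linear(16, 4),  # "later" layer
    nn.ReLU(),
    nn.Linear(4, 1),   # task-specific output layer
)

# Freeze the early layer so fine-tuning leaves its weights unchanged.
for param in model[0].parameters():
    param.requires_grad = False

# Hand only the trainable parameters to the optimiser.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-4)

n_frozen = sum(p.numel() for p in model.parameters() if not p.requires_grad)
n_trainable = sum(p.numel() for p in trainable)
print(n_frozen, n_trainable)
```

With a real pre-trained model the same loop would run over whichever blocks are to be kept fixed; how many layers to freeze is a tuning decision that depends on dataset size and how close the new task is to the original one.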
Examples:
BERT pre-trained on BooksCorpus and English Wikipedia → fine-tune on clinical notes (n=5000)
ResNet pre-trained on ImageNet → fine-tune on chest X-rays (n=2000)
BiomedBERT pre-trained on PubMed → fine-tune on discharge summaries
Interview Answer
"Feature engineering uses domain expertise to transform raw data into informative inputs: explicit and interpretable, but time-consuming and liable to miss non-obvious patterns. Deep learning learns features automatically from raw inputs: it scales to images and text, but needs more data and compute and produces opaque representations. For clinical tabular data (structured EHR features), feature engineering plus XGBoost often wins because clinical domain knowledge is valuable, data is limited, and interpretability matters. For raw, high-dimensional inputs (ECG, X-ray, clinical notes), deep learning is usually the better choice because manual feature engineering does not scale. Hybrid models combine both: hand-crafted tabular features fused with DL-extracted signal features, pairing interpretable inputs with raw-signal modelling."