Deep Learning for AI Interviews · Lesson 7 of 56
Input, Hidden, and Output Layers Explained
What a Layer Is
A layer is a collection of neurons that receive the same inputs and produce a set of outputs in parallel:
Input layer:
Not a layer of neurons — just the raw input features
Passes data to the first hidden layer
Hidden layers:
Intermediate layers between input and output
Each neuron computes: output = activation(w · x + b)
"Hidden" because their outputs are not directly observed
Output layer:
Produces the final prediction
Activation depends on task:
Regression: linear (no activation)
Binary classification: sigmoid
Multi-class: softmax
Multi-label: sigmoid per output

Layer Shapes
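A layer is many neurons side by side. Here is a plain-Python sketch of one neuron computing output = activation(w · x + b), the formula from the hidden-layer description above. The weights, bias, and inputs are made-up illustration values, and ReLU is assumed as the activation:

```python
from __future__ import annotations


def relu(z: float) -> float:
    """ReLU activation: max(0, z)."""
    return max(0.0, z)


def neuron(w: list[float], x: list[float], b: float) -> float:
    """One neuron: activation(w · x + b), here with ReLU."""
    if len(w) != len(x):
        raise ValueError("w and x must have the same length")
    z = sum(wi * xi for wi, xi in zip(w, x)) + b  # weighted sum plus bias
    return relu(z)


# Hypothetical 3-feature input and weights (illustrative values only)
x = [1.0, 2.0, 3.0]
w = [0.5, -1.0, 0.25]
print(neuron(w, x, 1.0))   # 0.25  (z = 0.5 - 2.0 + 0.75 + 1.0)
print(neuron(w, x, -1.0))  # 0.0   (z = -1.75, clipped by ReLU)
```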
A layer with n_in inputs and n_out outputs:
Weight matrix W: shape (n_out, n_in) — one weight per input per neuron
Bias vector b: shape (n_out,) — one bias per neuron
Forward pass for a batch of m examples:
Input X: shape (m, n_in)
Output Z = X @ W.T + b: shape (m, n_out) [linear]
Output A = activation(Z): shape (m, n_out) [after activation]
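The next sketch runs a batched forward pass in NumPy. It assumes a ReLU activation and small made-up sizes (m=4, n_in=3, n_out=2). It checks the shapes listed above and confirms that each row of Z matches the per-example computation W·x + b:

```python
import numpy as np

rng = np.random.default_rng(0)

m, n_in, n_out = 4, 3, 2             # hypothetical batch size and layer sizes
X = rng.normal(size=(m, n_in))       # input batch: (m, n_in)
W = rng.normal(size=(n_out, n_in))   # weights: (n_out, n_in), same layout as nn.Linear.weight
b = rng.normal(size=(n_out,))        # biases: (n_out,)

Z = X @ W.T + b                      # linear part: (m, n_out); b broadcasts over the rows
A = np.maximum(Z, 0.0)               # ReLU activation; shape unchanged

print(Z.shape, A.shape)              # (4, 2) (4, 2)

# Each row of Z is the same layer applied to one example: W @ x_i + b
for i in range(m):
    assert np.allclose(Z[i], W @ X[i] + b)
```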
Parameter count for this layer:
weights: n_out × n_in
biases: n_out
total: n_out × (n_in + 1)

Building a Network in PyTorch
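Before writing any PyTorch, you can apply the per-layer formula to the 50 → 128 → 64 → 1 stack used in ClinicalMLP below. This plain-Python sketch uses hypothetical helper names (linear_params, mlp_params). It counts the two learnable vectors of the input BatchNorm1d (one scale and one shift per feature) separately:

```python
from __future__ import annotations


def linear_params(n_in: int, n_out: int) -> int:
    """Parameters of one fully connected layer: n_out × (n_in + 1)."""
    return n_out * (n_in + 1)


def mlp_params(sizes: list[int]) -> int:
    """Sum linear_params over consecutive layer sizes, e.g. [50, 128, 64, 1]."""
    return sum(linear_params(a, b) for a, b in zip(sizes, sizes[1:]))


linear_only = mlp_params([50, 128, 64, 1])  # 6,528 + 8,256 + 65
batchnorm = 2 * 50                          # BatchNorm1d(50): scale + shift per feature
print(linear_only)                          # 14849
print(linear_only + batchnorm)              # 14949
```

The second figure is the trainable-parameter total for ClinicalMLP(n_features=50, n_classes=1). BatchNorm's running mean and variance are buffers, not parameters, so they are not counted.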
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Option 1: Sequential (for simple stacks)
model_simple = nn.Sequential(
    nn.Linear(10, 64),   # input → hidden1
    nn.ReLU(),
    nn.Linear(64, 32),   # hidden1 → hidden2
    nn.ReLU(),
    nn.Dropout(0.3),
    nn.Linear(32, 1),    # hidden2 → output
    nn.Sigmoid(),        # probability output for binary classification
)

# Option 2: Module subclass (for complex architectures)
class ClinicalMLP(nn.Module):
    def __init__(self, n_features: int, n_classes: int = 1):
        super().__init__()
        self.input_norm = nn.BatchNorm1d(n_features)
        self.fc1 = nn.Linear(n_features, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, n_classes)
        self.dropout = nn.Dropout(0.3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.input_norm(x)       # normalise inputs
        x = F.relu(self.fc1(x))      # hidden layer 1
        x = self.dropout(x)
        x = F.relu(self.fc2(x))      # hidden layer 2
        x = self.fc3(x)              # output (logit)
        # Sigmoid turns each logit into a probability. If training with
        # nn.BCEWithLogitsLoss, return the logit x instead (more numerically stable).
        return torch.sigmoid(x)

# Count parameters
model = ClinicalMLP(n_features=50, n_classes=1)
n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {n_params:,}")
# BatchNorm1d(50): 2×50 = 100 (scale and shift; running stats are buffers, not parameters)
# fc1: 50×128 + 128 = 6,528 | fc2: 128×64 + 64 = 8,256 | fc3: 64×1 + 1 = 65
# Total: 100 + 6,528 + 8,256 + 65 = 14,949

# Forward pass (the model starts in training mode: dropout is active and
# BatchNorm uses batch statistics; call model.eval() before inference)
x_batch = torch.randn(32, 50)   # batch of 32 examples with 50 features
output = model(x_batch)
print(f"Output shape: {output.shape}")   # torch.Size([32, 1])
```

Common Layer Types
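Much of the difference between fully connected and convolutional layers in the list below comes from weight sharing. A Conv2d kernel's weights are reused at every spatial position, so its parameter count does not grow with image size. The plain-Python sketch below assumes a hypothetical 3×32×32 input, a 3×3 kernel, 16 output channels, groups=1, and bias enabled. The helper names are illustrative:

```python
def linear_params(n_in: int, n_out: int) -> int:
    """Fully connected layer: n_out × (n_in + 1)."""
    return n_out * (n_in + 1)


def conv2d_params(c_in: int, c_out: int, k: int) -> int:
    """Conv2d with a k×k kernel, groups=1, bias=True: c_out × (c_in × k × k + 1)."""
    return c_out * (c_in * k * k + 1)


# Hypothetical example: 3×32×32 RGB image mapped to 16 feature maps of 32×32
# (a 3×3 convolution with padding=1 keeps the 32×32 spatial size).
h = w = 32
conv = conv2d_params(3, 16, 3)
dense = linear_params(3 * h * w, 16 * h * w)  # same input/output sizes, fully connected
print(conv)    # 448
print(dense)   # 50348032
```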
nn.Linear (fully connected):
Every input connected to every neuron
Parameter count: n_out × (n_in + 1)
Use for: tabular data, hidden layers in MLP
nn.Conv2d (2D convolution):
Each neuron sees a local patch of the input
Far fewer parameters than a fully connected layer: the same kernel weights are shared across every position
Use for: images, 2D spatial data
nn.Conv1d (1D convolution):
Sliding window over a sequence
Use for: time series, audio, ECG signals
nn.LSTM / nn.GRU:
Recurrent layers with memory
Use for: sequence data where order matters
nn.MultiheadAttention:
The Transformer building block
Use for: NLP, long-range dependencies
nn.BatchNorm1d / nn.LayerNorm:
Normalisation layers: not neurons, but they rescale activations (with a learnable scale and shift)
Stabilise training
nn.Dropout:
Randomly zeroes each element with probability p during training and scales the rest by 1/(1 - p); inactive after model.eval()
Regularisation: reduces overfitting

Information Flow Example
Clinical prediction: "Will this patient be readmitted within 30 days?"
Features: [age=65, INR=2.8, n_meds=8, systolic_BP=140, ...] (50 features)
Layer 0 (input): 50 values
↓ nn.BatchNorm1d(50) + nn.Linear(50, 128) + ReLU
Layer 1 (hidden): 128 activations
↓ nn.Dropout(0.3)
↓ nn.Linear(128, 64) + ReLU
Layer 2 (hidden): 64 activations
↓ nn.Linear(64, 1)
Layer 3 (output): 1 logit
↓ nn.Sigmoid()
Output: probability ∈ (0, 1), e.g. 0.23 (23% readmission risk)
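The flow above can be traced with a NumPy sketch, built on three assumptions. First, the weights are random (He-style scaling) rather than trained, so the probabilities carry no clinical meaning. Second, the input BatchNorm1d is simplified to per-feature standardisation over the batch, with no learnable scale or shift. Third, dropout is written out as inverted dropout that is active only when training=True, mirroring nn.Dropout's train/eval behaviour:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical untrained weights for the 50 → 128 → 64 → 1 stack (He-style scaling)
sizes = [50, 128, 64, 1]
params = [
    (rng.normal(scale=np.sqrt(2.0 / n_in), size=(n_out, n_in)), np.zeros(n_out))
    for n_in, n_out in zip(sizes, sizes[1:])
]


def relu(z):
    return np.maximum(z, 0.0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def dropout(a, p, training):
    """Inverted dropout: zero each value with probability p, scale survivors by 1/(1 - p)."""
    if not training or p == 0.0:
        return a  # identity at inference time, like nn.Dropout after model.eval()
    keep = rng.random(a.shape) >= p
    return a * keep / (1.0 - p)


def forward(x, training=True):
    # Simplified input BatchNorm1d: standardise each feature over the batch (no scale/shift)
    x = (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + 1e-5)
    (W1, b1), (W2, b2), (W3, b3) = params
    h1 = relu(x @ W1.T + b1)           # layer 1: (m, 128)
    h1 = dropout(h1, 0.3, training)    # dropout between layers 1 and 2
    h2 = relu(h1 @ W2.T + b2)          # layer 2: (m, 64)
    logit = h2 @ W3.T + b3             # layer 3: (m, 1)
    return h1, h2, logit, sigmoid(logit)


X = rng.normal(size=(32, 50))  # hypothetical batch: 32 patients × 50 features
h1, h2, logit, prob = forward(X)
print(h1.shape, h2.shape, logit.shape, prob.shape)
# (32, 128) (32, 64) (32, 1) (32, 1)
```

In eval mode (training=False) the same input always gives the same probabilities. In training mode, dropout makes each pass slightly different.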
What each layer learns (conceptually):
Layer 1: combinations of raw features (e.g., "elderly + high INR + many meds")
Layer 2: higher-order patterns (e.g., "high-risk patient profile")
Layer 3: a weighting of those patterns into the final risk score

Interview Answer
"A neural network is a stack of layers, where each layer applies a linear transformation (W·x + b) followed by an activation function. The linear part lets each neuron compute a weighted combination of its inputs; the non-linear activations in the hidden layers let the network represent non-linear relationships (without them, the whole stack would collapse into a single linear map). Hidden layers transform the representation step by step: early layers detect simple patterns, and deeper layers combine them into more complex features. The output layer's activation depends on the task: sigmoid for binary classification, softmax for multi-class, linear for regression. In PyTorch, layers are nn.Module subclasses; composing them with nn.Sequential or in a custom forward() method defines the computation, and autograd records the graph as the forward pass runs so that loss.backward() can compute the gradients."
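The task-dependent output activations in this answer (and in the layer overview at the start of the lesson) are sketched below in NumPy on a single made-up row of logits. The function names are illustrative. In PyTorch the counterparts are torch.sigmoid and torch.softmax. Losses such as nn.BCEWithLogitsLoss and nn.CrossEntropyLoss expect raw logits and apply the sigmoid or log-softmax internally.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


logits = np.array([[2.0, -1.0, 0.5]])  # hypothetical output-layer logits for one example

regression = logits                    # identity: raw values are the predictions
binary = sigmoid(logits[:, :1])        # one logit → P(positive class)
multiclass = softmax(logits)           # probabilities over 3 mutually exclusive classes
multilabel = sigmoid(logits)           # an independent probability per label

print(multiclass.sum(axis=-1))         # [1.]
print(multilabel.sum(axis=-1))         # not constrained to 1 (exceeds 1 here)
```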