Neural Network Layers Explained

What a Layer Is

A layer is a group of neurons that all receive the same inputs and compute their outputs in parallel. A feed-forward network has three kinds:

Input layer:
  Not a layer of neurons — just the raw input features
  Passes data to the first hidden layer

Hidden layers:
  Intermediate layers between input and output
  Each neuron computes: output = activation(w · x + b)
  "Hidden" because their outputs are not directly observed

Output layer:
  Produces the final prediction
  Activation depends on task:
    Regression:           linear (no activation)
    Binary classification: sigmoid
    Multi-class:          softmax
    Multi-label:          sigmoid per output
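
The output activations listed above can be compared on a single set of raw outputs. This is a minimal sketch; the logit values are made up for illustration and do not come from a trained model.

```python
import torch

# Hypothetical raw output-layer values (logits) for one example with 3 outputs.
logits = torch.tensor([[2.0, 0.5, -1.0]])

# Regression: no activation, the raw value is the prediction.
regression_out = logits[:, :1]

# Binary classification: sigmoid squashes one logit into (0, 1).
binary_prob = torch.sigmoid(logits[:, :1])

# Multi-class: softmax turns all logits into one distribution that sums to 1.
multiclass_probs = torch.softmax(logits, dim=1)

# Multi-label: an independent sigmoid per output, so the values need not sum to 1.
multilabel_probs = torch.sigmoid(logits)

print("regression: ", regression_out)
print("binary:     ", binary_prob)
print("multi-class:", multiclass_probs, "sum =", multiclass_probs.sum().item())
print("multi-label:", multilabel_probs, "sum =", multilabel_probs.sum().item())
```

Softmax makes the classes compete for probability mass, while per-output sigmoids let several labels be likely at once.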

Layer Shapes

A layer with n_in inputs and n_out outputs:
  Weight matrix W: shape (n_out, n_in)   — one weight per input per neuron
  Bias vector b:   shape (n_out,)         — one bias per neuron

Forward pass for a batch of m examples:
  Input X: shape (m, n_in)
  Output Z = X @ W.T + b:  shape (m, n_out)   [linear]
  Output A = activation(Z): shape (m, n_out)   [after activation]
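
As a quick check of these shapes, the sketch below reproduces the forward pass by hand and compares it with nn.Linear, which stores its weight as (n_out, n_in). The sizes m=4, n_in=3, n_out=2 and the choice of ReLU are arbitrary assumptions for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
m, n_in, n_out = 4, 3, 2           # illustrative sizes

layer = nn.Linear(n_in, n_out)     # weight: (n_out, n_in), bias: (n_out,)
W, b = layer.weight, layer.bias

X = torch.randn(m, n_in)           # batch of m examples
Z = X @ W.T + b                    # linear step; b broadcasts across the batch
A = torch.relu(Z)                  # activation is applied element-wise

print("W:", tuple(W.shape), "b:", tuple(b.shape))
print("Z:", tuple(Z.shape), "A:", tuple(A.shape))

# The manual computation agrees with the layer's own forward pass.
matches = torch.allclose(Z, layer(X), atol=1e-6)
print("matches nn.Linear:", matches)
```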

Parameter count for this layer:
  weights: n_out × n_in
  biases:  n_out
  total:   n_out × (n_in + 1)
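
The formula can be checked against PyTorch directly. In this sketch, linear_param_count is a hypothetical helper written for illustration, not a PyTorch function; the layer sizes are the ones used in the Sequential example later in the article.

```python
import torch.nn as nn

def linear_param_count(n_in: int, n_out: int) -> int:
    """Weights (n_out x n_in) plus biases (n_out)."""
    return n_out * (n_in + 1)

results = []
for n_in, n_out in [(10, 64), (64, 32), (32, 1)]:
    layer = nn.Linear(n_in, n_out)
    actual = sum(p.numel() for p in layer.parameters())
    expected = linear_param_count(n_in, n_out)
    results.append((expected, actual))
    print(f"Linear({n_in}, {n_out}): formula={expected}, actual={actual}")
```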

Building a Network in PyTorch

Python

import torch
import torch.nn as nn
import torch.nn.functional as F

# Option 1: Sequential (for simple stacks)
model_simple = nn.Sequential(
    nn.Linear(10, 64),    # input → hidden1
    nn.ReLU(),
    nn.Linear(64, 32),    # hidden1 → hidden2
    nn.ReLU(),
    nn.Dropout(0.3),
    nn.Linear(32, 1),     # hidden2 → output
    nn.Sigmoid(),         # for binary classification
)

# Option 2: Module class (for complex architectures)
class ClinicalMLP(nn.Module):
    def __init__(self, n_features: int, n_classes: int = 1):
        super().__init__()
        self.input_norm = nn.BatchNorm1d(n_features)
        self.fc1 = nn.Linear(n_features, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, n_classes)
        self.dropout = nn.Dropout(0.3)
    
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.input_norm(x)         # normalise inputs
        x = F.relu(self.fc1(x))        # hidden layer 1
        x = self.dropout(x)
        x = F.relu(self.fc2(x))        # hidden layer 2
        x = self.fc3(x)                 # output (logit)
        return torch.sigmoid(x)        # probability

# Count parameters
model = ClinicalMLP(n_features=50, n_classes=1)
n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {n_params:,}")
# BatchNorm1d: 2×50 (learnable scale and shift) = 100
# Linear layers: 50×128 + 128 + 128×64 + 64 + 64×1 + 1 = 14,849
# Total: 14,949

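# Forward pass (eval mode disables dropout and makes BatchNorm use its running statistics)
model.eval()
x_batch = torch.randn(32, 50)   # batch of 32 examples with 50 features
with torch.no_grad():
    output = model(x_batch)
print(f"Output shape: {output.shape}")   # torch.Size([32, 1])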
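
To see where the trainable parameters come from, it can help to list them tensor by tensor. The sketch below uses an nn.Sequential stand-in that is assumed to have the same parameterised layers as ClinicalMLP(n_features=50); it is not the class itself.

```python
import torch.nn as nn

# Stand-in with the same parameterised layers as ClinicalMLP(n_features=50).
stand_in = nn.Sequential(
    nn.BatchNorm1d(50),   # 50 scale + 50 shift parameters
    nn.Linear(50, 128),
    nn.ReLU(),
    nn.Dropout(0.3),
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, 1),
)

total = 0
for name, param in stand_in.named_parameters():
    print(f"{name:<10} {tuple(param.shape)!s:<12} {param.numel():>6,}")
    total += param.numel()
print(f"total: {total:,}")
```

BatchNorm1d's running mean and variance are buffers rather than parameters, so they do not appear in this count. ReLU and Dropout have no parameters at all.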

Common Layer Types

nn.Linear (fully connected):
  Every input connected to every neuron
  Parameter count: n_out × (n_in + 1)
  Use for: tabular data, hidden layers in MLP

nn.Conv2d (2D convolution):
  Each neuron sees a local patch of the input
  Fewer parameters than fully connected (shared weights)
  Use for: images, 2D spatial data

nn.Conv1d (1D convolution):
  Sliding window over a sequence
  Use for: time series, audio, ECG signals

nn.LSTM / nn.GRU:
  Recurrent layers with memory
  Use for: sequence data where order matters

nn.MultiheadAttention:
  The Transformer building block
  Use for: NLP, long-range dependencies

nn.BatchNorm1d / nn.LayerNorm:
  Normalisation layers — not neurons, but transform activations
  Stabilise training

nn.Dropout:
  Randomly zeroes outputs during training (rescaling the rest by 1/(1 − p)); a no-op in eval mode
  Regularisation — reduces overfitting
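
The layer types above mostly differ in the input shape they expect. The following sketch pushes random tensors through several of them to show the shape conventions; every size here (image size, 12 channels, sequence length 30, and so on) is an arbitrary assumption chosen for illustration.

```python
import torch
import torch.nn as nn

batch = 8

# Conv2d: (batch, channels, height, width); padding=1 keeps 32x32 with a 3x3 kernel.
conv2d = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
conv2d_out = conv2d(torch.randn(batch, 3, 32, 32))
print("Conv2d:", tuple(conv2d_out.shape))

# Shared weights: 16 x (3*3*3 + 1) parameters, independent of the image size.
conv_params = sum(p.numel() for p in conv2d.parameters())
print("Conv2d parameters:", conv_params)

# Conv1d: (batch, channels, length), e.g. a hypothetical 12-channel signal of 500 samples.
conv1d = nn.Conv1d(in_channels=12, out_channels=32, kernel_size=5)
conv1d_out = conv1d(torch.randn(batch, 12, 500))
print("Conv1d:", tuple(conv1d_out.shape))   # length shrinks to 500 - 5 + 1

# LSTM: (batch, seq_len, features) when batch_first=True.
lstm = nn.LSTM(input_size=20, hidden_size=64, batch_first=True)
lstm_out, (h_n, c_n) = lstm(torch.randn(batch, 30, 20))
print("LSTM:", tuple(lstm_out.shape))

# MultiheadAttention: self-attention passes the same tensor as query, key and value.
attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
attn_out, attn_weights = attn(lstm_out, lstm_out, lstm_out)
print("Attention:", tuple(attn_out.shape))

# LayerNorm normalises over the last dimension and leaves the shape unchanged.
norm_out = nn.LayerNorm(64)(attn_out)
print("LayerNorm:", tuple(norm_out.shape))
```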

Information Flow Example

Clinical prediction: "Will this patient be readmitted within 30 days?"
Features: [age=65, INR=2.8, n_meds=8, systolic_BP=140, ...]  (50 features)

Layer 0 (input):      50 values
         ↓  nn.BatchNorm1d(50)
         ↓  nn.Linear(50, 128) + ReLU
Layer 1 (hidden):     128 activations
         ↓  nn.Dropout(0.3)
         ↓  nn.Linear(128, 64) + ReLU
Layer 2 (hidden):     64 activations
         ↓  nn.Linear(64, 1)
Layer 3 (output):     1 logit
         ↓  nn.Sigmoid()
Output:               probability ∈ [0, 1]  →  0.23 (23% readmission risk)

What each layer learns (conceptually):
  Layer 1: combinations of raw features (e.g., "elderly + high INR + many meds")
  Layer 2: higher-order patterns (e.g., "high-risk patient profile")
  Layer 3: final risk score weighting
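
The flow can be traced step by step in code. This sketch assumes the layer order used in ClinicalMLP's forward() (BatchNorm on the raw inputs first), uses untrained weights, and feeds random numbers in place of real patient features, so the resulting probabilities are meaningless; only the shapes matter.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

steps = [
    ("BatchNorm1d(50)", nn.BatchNorm1d(50)),
    ("Linear(50, 128)", nn.Linear(50, 128)),
    ("ReLU", nn.ReLU()),
    ("Dropout(0.3)", nn.Dropout(0.3)),
    ("Linear(128, 64)", nn.Linear(128, 64)),
    ("ReLU", nn.ReLU()),
    ("Linear(64, 1)", nn.Linear(64, 1)),
    ("Sigmoid", nn.Sigmoid()),
]

x = torch.randn(4, 50)   # 4 hypothetical patients, 50 random stand-in features
print(f"{'input':<16} -> {tuple(x.shape)}")

shapes = []
with torch.no_grad():
    for name, layer in steps:
        layer.eval()     # inference behaviour: no dropout, BatchNorm uses running stats
        x = layer(x)
        shapes.append(tuple(x.shape))
        print(f"{name:<16} -> {tuple(x.shape)}")
```

Only the Linear steps change the feature dimension; normalisation, activation and dropout keep the shape they receive.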

Interview Answer

"A neural network is a stack of layers, where each layer applies a linear transformation (W·x + b) followed by a non-linear activation. The linear part lets each neuron compute a weighted combination of its inputs; the activation lets the network represent non-linear relationships, because without it any stack of linear layers collapses into a single linear map. Hidden layers transform the representation at each step: early layers detect simple patterns, and deeper layers combine them into more complex features. The output layer's activation depends on the task: sigmoid for binary classification, softmax for multi-class, linear for regression. In PyTorch, layers are nn.Module subclasses; you compose them with nn.Sequential or in a custom forward() method, and autograd records the computation graph each time the forward pass runs."

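The role of the activation in that answer can be demonstrated numerically. This is a minimal sketch with arbitrary layer sizes: two stacked linear layers with no activation are rebuilt as one linear layer, and the same comparison is then repeated with a ReLU in between.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two linear layers (float64 so the comparison is not blurred by rounding).
fc1 = nn.Linear(5, 8).double()
fc2 = nn.Linear(8, 3).double()
x = torch.randn(10, 5, dtype=torch.float64)

with torch.no_grad():
    stacked = fc2(fc1(x))                     # no activation in between

    # Equivalent single layer: W = W2 @ W1, b = W2 @ b1 + b2.
    W = fc2.weight @ fc1.weight
    b = fc2.weight @ fc1.bias + fc2.bias
    single = x @ W.T + b

    with_relu = fc2(torch.relu(fc1(x)))       # same layers, ReLU in between

collapses = torch.allclose(stacked, single)
relu_collapses = torch.allclose(with_relu, single)
print("no activation == one linear layer:", collapses)
print("with ReLU     == one linear layer:", relu_collapses)
```

Without the activation, extra depth adds parameters but no expressive power; the non-linearity is what stops the layers from folding into one.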