System Design · Lesson 5 of 26
Load Balancing — Distribute Traffic Without Thinking About It
A load balancer is the front door of any scaled system. It sits between clients and servers and distributes incoming requests so no single server bears the entire load. Simple concept — complex reality.
What a Load Balancer Does
Without a load balancer:

```
Client → Server   (single point of failure, capacity ceiling)
```

With a load balancer:

```
                   ┌─────────────────────┐
Client ──────────→ │    Load Balancer    │
                   └──────────┬──────────┘
                      ┌───────┼───────┐
                  ┌───▼───┐ ┌─▼────┐ ┌─▼─────┐
                  │ App 1 │ │App 2 │ │ App 3 │
                  └───────┘ └──────┘ └───────┘
```

A load balancer does more than just distribute traffic:
- Routes requests to healthy instances
- Health checks — removes unhealthy instances from the pool
- TLS termination — decrypts HTTPS at the edge so backend servers handle plain HTTP
- Connection management — maintains persistent connections to backends
- Observability — request logging, metrics, request tracing headers
L4 vs L7 Load Balancing
This is one of the most important distinctions. It's about which layer of the network stack the load balancer operates at.
L4 — Transport Layer (TCP/UDP)
The load balancer sees IP addresses and port numbers. It does not look at the content of the request.
L4 sees:

```
Source IP: 192.168.1.50
Dest IP:   10.0.0.1
Dest Port: 443

Routing decision: based only on IP/port
```

Characteristics:
- Very fast (minimal processing per packet)
- No understanding of HTTP, WebSocket, gRPC
- Cannot route based on URL paths or headers
- Lower CPU overhead
- Works with any TCP/UDP protocol
Use when: Raw throughput is critical, or when you're using non-HTTP protocols (database connections, custom TCP protocols).
L7 — Application Layer (HTTP)
The load balancer understands HTTP. It can inspect URLs, headers, cookies, and request bodies.
L7 sees:

```
GET /api/users/123 HTTP/2
Host: api.example.com
Authorization: Bearer eyJ...
Cookie: session=abc123

Routing decision: based on URL, headers, cookies
```

Characteristics:
- Slower per request (parses HTTP)
- Content-based routing (route /api/* to API servers, /static/* to a CDN)
- Can inject/rewrite headers
- Can terminate TLS and re-encrypt
- Sticky sessions via cookies
- More observability (knows request paths, status codes)
Use when: HTTP/HTTPS traffic (which is most web applications).
Load Balancing Algorithms
Round Robin
Requests are distributed in order: Server 1, Server 2, Server 3, Server 1, Server 2...
```
Request 1 → Server 1
Request 2 → Server 2
Request 3 → Server 3
Request 4 → Server 1 (wraps around)
```

Pros: Simple. Uniform distribution (assuming equal server capacity).
Cons: Doesn't account for server load. A slow server may accumulate requests.
Use when: All servers have equal capacity and requests are roughly equal in processing time.
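Round robin is just a wrapping iterator. A minimal sketch in Python (server names are placeholders, not from any real deployment):

```python
from itertools import cycle

servers = ["server-1", "server-2", "server-3"]
rotation = cycle(servers)  # endless iterator that wraps around

# First four requests hit server-1, server-2, server-3, then wrap to server-1
picks = [next(rotation) for _ in range(4)]
print(picks)  # ['server-1', 'server-2', 'server-3', 'server-1']
```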
Weighted Round Robin
Servers with higher capacity get proportionally more requests.
```
Server 1 weight=3, Server 2 weight=1

Request 1 → Server 1
Request 2 → Server 1
Request 3 → Server 1
Request 4 → Server 2
Request 5 → Server 1
...
```

Use when: Servers have different hardware specs, or you're doing a gradual canary rollout (the new version starts at weight=1 and is gradually increased).
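The simplest way to implement weighting is to repeat each server in the rotation by its weight — a sketch only; production balancers like NGINX use a "smooth" weighted variant that interleaves picks instead of sending bursts to the heavy server:

```python
from itertools import cycle

weights = {"server-1": 3, "server-2": 1}

# Naive expansion: server-1 appears 3 times, server-2 once
expanded = [name for name, w in weights.items() for _ in range(w)]
rotation = cycle(expanded)

picks = [next(rotation) for _ in range(8)]
# Over 8 requests: server-1 gets 6, server-2 gets 2 — the 3:1 ratio holds
print(picks.count("server-1"), picks.count("server-2"))  # 6 2
```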
Least Connections
New request goes to the server with the fewest active connections.
```
Server 1: 50 active connections
Server 2: 10 active connections ← next request goes here
Server 3: 35 active connections
```

Pros: Better for long-running requests. Naturally handles slow servers.
Use when: Long-lived connections (WebSockets, file uploads), or mixed workloads where some requests are much slower.
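The selection logic is a minimum over a per-server counter that the balancer maintains. A sketch using the connection counts from the example above:

```python
active = {"server-1": 50, "server-2": 10, "server-3": 35}

def pick_least_connections(active: dict) -> str:
    # Choose the backend with the fewest in-flight requests
    return min(active, key=active.get)

chosen = pick_least_connections(active)
print(chosen)  # server-2

# The balancer increments on dispatch and decrements when the response finishes
active[chosen] += 1
```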
IP Hash (Session Stickiness via Hashing)
The client's IP is hashed to always map to the same server.
```
hash(client_ip) % num_servers = server_index

192.168.1.50 → hash → mod 3 = 1 → always Server 2
```

Pros: Session affinity without cookies. Consistent routing for the same client.
Cons: Uneven distribution if many clients come from the same IP (e.g., corporate NAT). If a server dies, all hashed clients re-route and lose state.
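Both properties — stable affinity and the mass remapping on pool changes — fall out of the modulo formula. A sketch (SHA-256 is used because Python's built-in hash() is salted per process and would not be stable):

```python
import hashlib

def pick_server(client_ip: str, servers: list) -> str:
    digest = hashlib.sha256(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

servers = ["server-1", "server-2", "server-3"]

# Affinity: the same client always maps to the same server
assert pick_server("192.168.1.50", servers) == pick_server("192.168.1.50", servers)

# Failure mode: removing one server changes num_servers, so roughly a
# third of clients remap and lose state — this is why consistent hashing exists
moved = sum(
    pick_server(f"10.0.0.{i}", servers) != pick_server(f"10.0.0.{i}", servers[:2])
    for i in range(100)
)
print(moved, "of 100 clients remapped")
```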
Random
Randomly select a server for each request.
Use when: Servers are truly identical and you want simplicity without round-robin's state management. Converges to even distribution at scale due to the law of large numbers.
Sticky Sessions
Sticky sessions ensure that a client always hits the same backend server.
First request:

```
Client → LB → Server A
LB sets cookie: SERVERID=server-a
```

Subsequent requests:

```
Client → LB (reads SERVERID cookie) → Server A (always)
```

Why sticky sessions exist: Stateful applications that store session data in memory need the same server to handle all requests from the same user.
Why sticky sessions hurt scalability:
- If Server A crashes, all its sticky clients lose their sessions
- Load becomes uneven as users "pile up" on certain servers
- Adding new servers doesn't immediately rebalance existing clients
- Prevents true horizontal scaling
The fix: eliminate the need for sticky sessions by externalizing session state to Redis. Then any server can handle any request.
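The request path under that design can be sketched in a few lines of Python (a plain dict stands in for Redis here; in production it would be a shared Redis instance reachable from every app server):

```python
# Shared store: any server can read any session
session_store = {"session:abc": {"cart": ["item1", "item2"]}}

def handle_request(server_name: str, session_id: str):
    # State lives in the shared store, not in server memory,
    # so the LB can send this request to any server
    session = session_store.get(f"session:{session_id}", {})
    return server_name, session.get("cart", [])

# The same session works regardless of which server the LB picks
assert handle_request("server-a", "abc") == ("server-a", ["item1", "item2"])
assert handle_request("server-b", "abc") == ("server-b", ["item1", "item2"])
```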
Without Redis (needs sticky sessions):

```
Server A: {session:abc → {cart: [item1, item2]}}
Server B: {session:def → {cart: [item3]}}
```

With Redis (no sticky sessions needed):

```
Redis: {
  session:abc → {cart: [item1, item2]},
  session:def → {cart: [item3]}
}

Any server reads from Redis on every request
```

Health Checks
Load balancers must detect unhealthy instances and stop routing to them.
Active Health Checks
The load balancer periodically sends test requests to each backend.
```
Every 10 seconds:
LB → Server 1: GET /health → 200 OK  ✓
LB → Server 2: GET /health → 200 OK  ✓
LB → Server 3: GET /health → timeout ✗ (remove from pool)
```

Your /health endpoint should check:
- Application is running
- Database connection is alive
- Key dependencies are reachable
- It should not check things that could trigger a cascade (don't call external APIs from a health check)
```csharp
// ASP.NET Core; UIResponseWriter comes from the AspNetCore.HealthChecks.UI.Client package
app.MapHealthChecks("/health", new HealthCheckOptions
{
    ResponseWriter = UIResponseWriter.WriteHealthCheckUIResponse
}).AllowAnonymous();
```

Passive Health Checks
The load balancer monitors real traffic responses. If a server returns 5xx errors, it's marked unhealthy.
More responsive (no polling interval), but requires real user traffic to detect failures.
Most production LBs use both: active checks for basic liveness, passive checks for quality degradation.
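The passive side can be sketched as an error-rate tracker over a rolling window — the idea only; real balancers add time windows, ejection periods, and gradual re-introduction of recovered hosts:

```python
from collections import deque

class PassiveHealthTracker:
    """Marks a backend unhealthy when too many recent responses are 5xx."""

    def __init__(self, window: int = 10, max_error_rate: float = 0.5):
        self.recent = deque(maxlen=window)  # rolling window of error flags
        self.max_error_rate = max_error_rate

    def record(self, status_code: int) -> None:
        self.recent.append(status_code >= 500)

    def healthy(self) -> bool:
        if not self.recent:
            return True  # no traffic yet — active checks cover this case
        return sum(self.recent) / len(self.recent) <= self.max_error_rate

tracker = PassiveHealthTracker()
for code in [200, 200, 503, 500, 502, 500]:
    tracker.record(code)
print(tracker.healthy())  # 4 errors out of 6 > 50% → False
```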
Software Load Balancers
NGINX
The most widely used. Can be both a web server and a reverse proxy/load balancer.
```nginx
upstream api_servers {
    least_conn;                          # algorithm
    server app1.internal:8080 weight=3;
    server app2.internal:8080 weight=1;
    server app3.internal:8080 backup;    # only used if others fail
    keepalive 32;                        # connection pool to backends
}

server {
    listen 443 ssl;
    location /api/ {
        proxy_pass http://api_servers;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```

HAProxy
High Availability Proxy — often preferred for TCP load balancing and high-performance HTTP.
```haproxy
frontend http-in
    bind *:80
    default_backend web-servers

backend web-servers
    balance roundrobin
    option httpchk GET /health
    server web1 10.0.0.1:8080 check
    server web2 10.0.0.2:8080 check
    server web3 10.0.0.3:8080 check
```

Cloud Load Balancers
- AWS Application Load Balancer (ALB) — L7, HTTP/HTTPS/WebSocket/gRPC
- AWS Network Load Balancer (NLB) — L4, ultra-high throughput, static IPs
- Azure Application Gateway — L7 with WAF, SSL offload
- Azure Load Balancer — L4, internal and external
DNS Load Balancing
DNS returns multiple IP addresses for the same hostname. Clients connect to one of the returned IPs.
```
DNS query: api.example.com
DNS response:
  api.example.com → 10.0.0.1
  api.example.com → 10.0.0.2
  api.example.com → 10.0.0.3

Client picks one (usually the first, sometimes round-robin)
```

Pros: Simple. No single load balancer. Can route to different geographies.
Cons: DNS TTL means clients cache the IP — you can't quickly remove a failed server. No health checks (DNS doesn't know if an IP is alive). Client-side load balancing is unpredictable.
Use for: Geographic routing (different DNS responses for US vs EU clients), or as a first hop before regional load balancers.
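The TTL problem can be made concrete with a toy resolver cache — hypothetical names, not a real DNS client; the point is that a dead server keeps receiving traffic until the cached record expires:

```python
import time

class CachedDnsClient:
    def __init__(self, resolve, ttl: float = 300, clock=time.monotonic):
        self.resolve = resolve  # function: hostname -> list of IPs
        self.ttl = ttl
        self.clock = clock
        self.cache = {}         # hostname -> (ips, fetched_at)

    def lookup(self, hostname: str):
        entry = self.cache.get(hostname)
        if entry and self.clock() - entry[1] < self.ttl:
            return entry[0]     # cached — even if a server died meanwhile
        ips = self.resolve(hostname)
        self.cache[hostname] = (ips, self.clock())
        return ips

# Simulated clock and authoritative answer
now = [0.0]
answers = [["10.0.0.1", "10.0.0.2"]]
client = CachedDnsClient(lambda host: list(answers[0]), ttl=300, clock=lambda: now[0])

first = client.lookup("api.example.com")
answers[0] = ["10.0.0.2"]   # 10.0.0.1 fails and is pulled from DNS
now[0] = 100.0              # 100s later: still within TTL
assert client.lookup("api.example.com") == first   # stale — dead IP still served
now[0] = 301.0              # TTL expired — cache refreshes
assert client.lookup("api.example.com") == ["10.0.0.2"]
```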
Global Load Balancing with Anycast
Anycast assigns the same IP address to servers in multiple geographic locations. BGP routing directs clients to the nearest server.
```
IP: 1.2.3.4 exists at:
  - New York datacenter
  - London datacenter
  - Tokyo datacenter

Client in London   → BGP routes to London (shortest AS path)
Client in New York → BGP routes to New York
Client in Seoul    → BGP routes to Tokyo
```

Cloudflare, AWS CloudFront, and Google Cloud CDN use anycast for their edge networks. This is how CDNs route you to the nearest edge with a single IP.
Database Load Balancing
Application-level load balancers don't work well for databases. Databases have state.
PgBouncer — PostgreSQL Connection Pooling
PostgreSQL connections are expensive (each consumes ~5-10MB RAM). PgBouncer acts as a proxy that manages a pool of connections.
Without PgBouncer:

```
1000 app instances × 10 DB connections each = 10,000 PG connections
PostgreSQL struggles above ~500-1000 connections
```

With PgBouncer:

```
1000 app instances → PgBouncer (pool of 100 PG connections) → PostgreSQL
PgBouncer multiplexes app requests over the pool
```

```ini
# pgbouncer.ini
[pgbouncer]
pool_mode = transaction    # connection released after each transaction
max_client_conn = 1000     # app connections PgBouncer accepts
default_pool_size = 100    # connections to PostgreSQL
```

Load Balancer vs API Gateway
These are frequently confused. They're different tools:
| | Load Balancer | API Gateway |
|---|---|---|
| Primary purpose | Distribute traffic | Manage API access |
| Routing | By IP, URL prefix | By route, method, version |
| Authentication | No | Yes (JWT, API keys, OAuth) |
| Rate limiting | No | Yes |
| Request transformation | No | Yes |
| Service aggregation | No | Yes (BFF pattern) |
| Protocol translation | Limited | Yes (REST → gRPC) |
| Examples | NGINX, HAProxy, AWS ALB | Kong, YARP, AWS API Gateway |
In practice: Production systems often have both — a load balancer for raw traffic distribution and a layer 7 API gateway for authentication, routing, and rate limiting.
Key Takeaways
- L4 load balancing is fast and protocol-agnostic. L7 understands HTTP and enables content-based routing.
- Round robin for equal servers. Least connections for long-running workloads. IP hash for session affinity without cookies.
- Sticky sessions are a smell — externalize session state to Redis and go stateless instead.
- Health checks: active (LB polls /health) + passive (monitor real traffic errors).
- NGINX and HAProxy dominate software LBs. Cloud LBs (ALB, Azure App Gateway) are managed equivalents.
- DNS load balancing works for geographic routing. Anycast routes by network proximity.
- PgBouncer solves database connection exhaustion.
- A load balancer distributes traffic. An API gateway manages API access, auth, and rate limiting. Use both.