Publish/Subscribe
Master the Pub/Sub pattern: topics, subscriptions, fan-out, message filtering, durable subscriptions, competing consumers, at-least-once delivery, and how Pub/Sub compares to direct messaging and point-to-point queues.
Publish/Subscribe (Pub/Sub) is the pattern that makes event-driven architecture possible at scale. It is simple in concept — publishers send messages to a topic; subscribers receive them — but its power comes from the decoupling it creates and the topology flexibility it enables. This lesson covers how Pub/Sub works, the delivery guarantees it can offer, the filter and routing capabilities that make it practical, and the operational realities of running Pub/Sub systems in production.
What Pub/Sub Is
Publish/Subscribe is a messaging pattern where:
- Publishers send messages to a named topic (a logical channel) without knowing who will receive them
- Subscribers register interest in a topic and receive all messages published to it
- The broker manages the topic, stores messages, and delivers them to all subscribers
```
Publisher A ──►
Publisher B ──► [ order.placed topic ] ──► Subscriber 1 (Inventory)
                                       ──► Subscriber 2 (Billing)
                                       ──► Subscriber 3 (Analytics)
                                       ──► Subscriber 4 (Notification)
```

The defining property: each message is delivered to every subscriber (fan-out). This is different from a queue, where each message is delivered to exactly one consumer.
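To make fan-out concrete, here is a minimal in-memory sketch (the `Topic` class and its names are illustrative, not any real broker's API) of a topic that delivers each published message to every subscriber:

```python
from collections import defaultdict

class Topic:
    """Minimal in-memory topic: every subscription gets its own copy."""
    def __init__(self, name):
        self.name = name
        self.handlers = {}  # subscription name -> callback

    def subscribe(self, subscription, handler):
        self.handlers[subscription] = handler

    def publish(self, message):
        # Fan-out: one write by the publisher, one delivery per subscription.
        for subscription, handler in self.handlers.items():
            handler(dict(message))  # each subscriber receives its own copy

received = defaultdict(list)
orders = Topic("order.placed")
orders.subscribe("inventory", lambda m: received["inventory"].append(m))
orders.subscribe("billing",   lambda m: received["billing"].append(m))
orders.publish({"orderId": 1234})

print(received["inventory"])  # [{'orderId': 1234}]
print(received["billing"])    # [{'orderId': 1234}]
```

Note the contrast with a queue: here the publisher makes one `publish` call, and every subscription receives the full message independently.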
Pub/Sub vs. Point-to-Point Queue
These two patterns are often confused because they both use a broker and both involve async message delivery. They solve different problems:
| Property | Point-to-Point Queue | Pub/Sub Topic |
|----------|----------------------|---------------|
| Delivery | One consumer per message | All subscribers per message |
| Purpose | Work distribution / load balancing | Event broadcasting / fan-out |
| Consumer awareness | Consumers share a queue | Subscribers are independent |
| Adding a consumer | Shares the load | Receives a full copy of all messages |
| Typical use | Task queues, job processing | Event notifications, data pipelines |
```
Queue: 1 message → 1 consumer  (work is shared)
Topic: 1 message → N consumers (each gets their own copy)
```
Topics and Subscriptions
Topic
A topic is a named logical channel. Publishers do not know who subscribes to a topic — they simply write to it.
Topic naming conventions matter for discoverability and routing:
```
com.organisation.domain.entity.event-type

com.systemforge.orders.order.placed
com.systemforge.payments.payment.failed
com.systemforge.inventory.stock.low
```

Subscription
A subscription is a consumer's named interest in a topic. Each subscription maintains its own independent cursor (position) in the message stream — one subscriber falling behind does not affect others.
```
topic: order.placed
├── subscription: inventory-service  (cursor at message 1205)
├── subscription: billing-service    (cursor at message 1198)
└── subscription: analytics-pipeline (cursor at message 900)
```

Each subscription processes the message stream at its own pace. The analytics pipeline is behind — that is fine; it does not affect inventory or billing.
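The independent-cursor idea can be sketched as an append-only log with one offset per subscription (a simplification — a real broker persists these offsets durably):

```python
class TopicLog:
    """Append-only message log with an independent cursor per subscription."""
    def __init__(self):
        self.log = []
        self.cursors = {}  # subscription name -> next offset to read

    def publish(self, message):
        self.log.append(message)

    def poll(self, subscription, max_count=10):
        # Each subscription reads from its own position; one slow
        # subscription never blocks or affects the others.
        start = self.cursors.get(subscription, 0)
        batch = self.log[start:start + max_count]
        self.cursors[subscription] = start + len(batch)
        return batch

topic = TopicLog()
for i in range(5):
    topic.publish({"seq": i})

fast = topic.poll("inventory-service", max_count=5)   # reads all 5
slow = topic.poll("analytics-pipeline", max_count=2)  # lags at offset 2
print(len(fast), len(slow))  # 5 2
```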
Durable vs. Non-Durable Subscriptions
Durable subscriptions persist even when the subscriber is offline. Messages that arrive while the subscriber is down are stored and delivered when it reconnects.
Non-durable subscriptions exist only while the subscriber is connected. Messages that arrive when the subscriber is offline are lost.
Use durable subscriptions for any production integration where message loss is not acceptable.
Fan-Out
Fan-out is the mechanism by which a single published message is delivered to multiple subscriptions independently:
```
message: OrderPlaced { orderId: 1234 }
   │
   ├──► inventory subscription → Inventory Service processes the message
   ├──► billing subscription   → Billing Service processes the message
   └──► notify subscription    → Notification Service processes the message
```

Fan-out is the primary value proposition of Pub/Sub. It means:
- The publisher needs to make one write to publish to all consumers
- Adding a new consumer requires zero changes to the publisher or any existing consumer
- Each consumer's failure is isolated — if Billing fails, Inventory continues processing
Message Filtering
Without filtering, every subscriber receives every message on the topic — including messages it does not care about. Filtering reduces unnecessary processing and network transfer.
Subscription-Level Filters
Most Pub/Sub brokers support filter expressions evaluated at the broker before delivery:
Azure Service Bus Topics — SQL filter:
```
-- Only deliver messages where region = 'EU'
region = 'EU'

-- Only deliver high-value orders
amount > 1000 AND currency = 'GBP'
```

AWS SNS — filter policy:

```
{
  "eventType": ["order.placed", "order.updated"],
  "region": ["EU", "UK"]
}
```

Google Cloud Pub/Sub — message filtering:

```
attributes.region = "EU" AND attributes.priority = "high"
```

Filters are evaluated against message attributes (metadata in the message header), not the message body. Keep filter-relevant data in attributes so the broker does not have to deserialize the payload for every message.
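A filter policy like the SNS example above can be modelled as a predicate over message attributes. This is a sketch of the basic matching semantics only (real SNS policies also support prefix, numeric, and anything-but matchers):

```python
def matches(filter_policy, attributes):
    """True if every policy key has an attribute value in its allowed list."""
    return all(
        attributes.get(key) in allowed
        for key, allowed in filter_policy.items()
    )

policy = {
    "eventType": ["order.placed", "order.updated"],
    "region": ["EU", "UK"],
}

# The broker evaluates this before delivery, so non-matching
# messages never reach the subscription at all.
print(matches(policy, {"eventType": "order.placed", "region": "EU"}))    # True
print(matches(policy, {"eventType": "order.cancelled", "region": "EU"})) # False
```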
Topic-Per-Event-Type vs. Filtering
Two strategies for managing multiple event types:
Separate topics per event type:
```
order.placed    ── subscribers for placed events
order.cancelled ── subscribers for cancelled events
order.shipped   ── subscribers for shipped events
```

Single topic with filtering:

```
orders topic ── inventory subscription (filter: type = 'placed')
             ── billing subscription   (filter: type = 'placed' OR 'cancelled')
             ── audit subscription     (no filter — receives all events)
```

Guidance:
- Separate topics when event volume is high and routing would be expensive
- Filtering when the events are naturally grouped and most consumers need multiple types
Competing Consumers (Queue + Pub/Sub Combined)
Pub/Sub fan-out and queue-based competing consumers are complementary:
```
topic: order.placed
├── subscription: inventory-workers
│   ├── consumer instance 1 ─┐
│   ├── consumer instance 2 ─┼── competing consumers sharing the subscription
│   └── consumer instance 3 ─┘
└── subscription: billing-workers
    ├── consumer instance 1
    └── consumer instance 2
```

Within each subscription, competing consumers share the workload — each message is delivered to exactly one consumer instance. This gives you both fan-out (across subscriptions) and horizontal scaling (within a subscription).
This pattern is native to:
- Azure Service Bus Topics — subscription = queue for competing consumers
- Apache Kafka — consumer group = competing consumers within the same partition
- AWS SNS + SQS — SNS fan-out to multiple SQS queues; each SQS queue has competing consumers
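The competing-consumers half of this topology can be sketched with a shared work queue: instances inside one subscription each take distinct messages, so the workload is divided rather than duplicated (worker names here are illustrative):

```python
import queue
import threading

subscription = queue.Queue()  # one subscription's message buffer
processed = {"worker-1": [], "worker-2": []}
lock = threading.Lock()

def worker(name):
    while True:
        try:
            msg = subscription.get(timeout=0.2)
        except queue.Empty:
            return  # no more work: shut this instance down
        with lock:
            # Each message is taken by exactly one worker instance.
            processed[name].append(msg)
        subscription.task_done()

for i in range(10):
    subscription.put({"orderId": i})

threads = [threading.Thread(target=worker, args=(n,)) for n in processed]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = sum(len(msgs) for msgs in processed.values())
print(total)  # 10 — workload shared, no message processed twice
```

Fan-out would sit one level above this: each subscription gets its own copy of every message, and each subscription then runs its own pool of competing workers like the one above.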
Delivery Guarantees
At-Least-Once Delivery
The standard guarantee in most Pub/Sub systems: the broker will deliver the message to each subscription at least once. In failure scenarios (consumer crashes after processing but before acknowledging), the message may be redelivered.
Consequence: every consumer must be idempotent — processing the same message twice must have the same effect as processing it once.
Idempotency techniques:
- Store processed message IDs in a deduplication table with a TTL; check before processing
- Use conditional database writes (upsert / optimistic concurrency)
- Design state transitions to be re-entrant
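The deduplication-table technique can be sketched as follows (an in-memory illustration — a production version would persist the table and expire entries with a TTL):

```python
class IdempotentConsumer:
    """Skips messages whose ID has already been processed."""
    def __init__(self):
        self.processed_ids = set()  # stand-in for a dedup table with TTL
        self.balance = 0

    def handle(self, message):
        if message["messageId"] in self.processed_ids:
            return  # duplicate redelivery: safe to ignore
        self.balance += message["amount"]
        self.processed_ids.add(message["messageId"])

consumer = IdempotentConsumer()
msg = {"messageId": "m-1", "amount": 100}
consumer.handle(msg)
consumer.handle(msg)  # at-least-once redelivery of the same message
print(consumer.balance)  # 100, not 200
```

In a real system the ID check and the state change should commit atomically (same database transaction), otherwise a crash between the two reintroduces the duplicate-processing window.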
At-Most-Once Delivery
The broker delivers the message once and does not redeliver if delivery fails. Possible message loss. Appropriate only for high-volume, low-value events where occasional loss is acceptable (telemetry, metrics).
Exactly-Once Delivery
The hardest guarantee: each message delivered exactly once, with no duplicates and no loss. True exactly-once delivery requires coordination between the broker and the consumer's storage system (a distributed transaction). Kafka supports this with idempotent producers and its transactional API, but at a performance cost.
Practical advice: Build idempotent consumers rather than relying on exactly-once broker guarantees. Idempotency is more portable and more resilient than broker-level deduplication.
Message Ordering
Ordering Within a Topic
Most Pub/Sub systems do not guarantee global message ordering across all publishers. Within a single publisher, messages are usually ordered, but across multiple publishers they interleave unpredictably.
Kafka: ordering guaranteed within a partition. Messages for the same entity (same order ID, same customer ID) routed to the same partition via the partition key will arrive in order to any consumer in the same consumer group.
Azure Service Bus: ordering within a session (messages with the same SessionId are delivered in order to one consumer at a time).
Google Cloud Pub/Sub: ordering guaranteed when using an ordering key.
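Per-entity ordering in all three systems rests on the same idea: deterministic routing of a key to a partition (or session, or ordering key), so all messages for one entity travel through one ordered channel. A sketch of the routing step (Kafka's actual default partitioner uses murmur2 hashing; CRC32 here is just for illustration):

```python
import zlib

NUM_PARTITIONS = 4

def partition_for(key: str) -> int:
    # Deterministic hash: the same key always maps to the same partition,
    # so messages for one entity stay in order within that partition.
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS

p1 = partition_for("order-1234")
p2 = partition_for("order-1234")
print(p1 == p2)  # True — same entity, same partition, ordering preserved
```

The trade-off: ordering is only guaranteed per key, and a hot key concentrates all of its traffic on one partition.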
Designing for Out-of-Order Delivery
When ordering cannot be guaranteed, design consumers to handle it:
- Include a sequence number or event timestamp in the message
- Use a sequence buffer: hold messages until you have received all preceding messages before processing
- Apply the last-write-wins rule for state updates when re-ordering is acceptable
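The sequence-buffer technique above can be sketched like this (assuming each message carries a `seq` number starting at 0):

```python
class SequenceBuffer:
    """Holds out-of-order messages until all predecessors have arrived."""
    def __init__(self):
        self.next_seq = 0
        self.pending = {}    # seq -> message held back
        self.delivered = []

    def receive(self, message):
        self.pending[message["seq"]] = message
        # Release every message whose predecessors have all arrived.
        while self.next_seq in self.pending:
            self.delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1

buf = SequenceBuffer()
for seq in [1, 0, 3, 2]:  # messages arrive out of order
    buf.receive({"seq": seq})
print([m["seq"] for m in buf.delivered])  # [0, 1, 2, 3]
```

A production version also needs a timeout or gap-detection policy, since an unbounded buffer waiting for a lost message will stall forever.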
Push vs. Pull Delivery
Push Delivery
The broker delivers messages to consumers by calling a consumer endpoint (webhook/HTTP). Consumers do not poll.
```
Broker ──► POST /webhook → Consumer Service
           (broker drives the delivery rate)
```

Advantages: Simple consumer — just expose an HTTP endpoint. The broker handles timing and retries.
Disadvantages: Consumer must be publicly reachable. Broker controls delivery rate — consumer must handle backpressure.
Pull Delivery
Consumers poll the broker to retrieve messages at their own pace.
```
Consumer Service ──► GET /messages?maxCount=10 → Broker
                 ◄── [batch of messages]       ◄──
```

Advantages: The consumer controls the rate. Works behind firewalls. Natural backpressure — the consumer fetches when ready.
Disadvantages: Consumer must actively poll. Latency depends on poll frequency.
Most enterprise messaging systems support pull. Many also support push for webhook-style integrations (AWS SNS → HTTP endpoint, Azure Event Grid, Google Cloud Pub/Sub push subscriptions).
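A pull consumer's core loop can be sketched as follows. The `FakeBroker` stand-in and its `pull` method are hypothetical, purely so the loop is runnable — real SDK calls differ per platform:

```python
import time

class FakeBroker:
    """Stand-in broker so the pull loop below is runnable."""
    def __init__(self, messages):
        self.messages = list(messages)

    def pull(self, max_count):
        batch = self.messages[:max_count]
        self.messages = self.messages[max_count:]
        return batch

def consume(broker, handle, idle_sleep=0.01, max_idle_polls=3):
    idle = 0
    while idle < max_idle_polls:
        batch = broker.pull(max_count=10)  # consumer sets the pace (backpressure)
        if not batch:
            idle += 1
            time.sleep(idle_sleep)  # back off when there is nothing to fetch
            continue
        idle = 0
        for msg in batch:
            handle(msg)

seen = []
consume(FakeBroker([{"id": i} for i in range(25)]), seen.append)
print(len(seen))  # 25
```

The backoff-on-empty step is where the latency/poll-frequency trade-off mentioned above lives: longer sleeps mean fewer wasted polls but slower pickup of new messages.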
Retention and Replay
Message Retention
How long does the broker keep messages after delivery?
| System | Default Retention | Maximum Retention |
|--------|-------------------|-------------------|
| Apache Kafka | Configurable (time or size) | Indefinite (with tiered storage) |
| Azure Service Bus | 1 day | 14 days |
| Azure Event Hubs | 1 day | 90 days (with Capture) |
| AWS SQS | 4 days | 14 days |
| Google Cloud Pub/Sub | 7 days | 7 days |
Kafka's indefinite retention is its killer feature for integration: consumers can replay the entire event history at any time. This enables:
- Rebuilding a failed consumer's state from scratch
- Onboarding a new consumer that needs historical data
- Replaying events after a bug fix to re-process affected records
Replay from a Subscription
For brokers without indefinite retention, implement your own replay capability:
- Archive all messages to cold storage (Azure Blob Storage, S3, GCS) as they are published
- Build a replay service that reads from cold storage and republishes to the topic
Dead Letter Queue in Pub/Sub
Each subscription should have a Dead Letter Queue (DLQ) — a separate channel for messages that failed to process after exhausting retries.
```
subscription: inventory-service
└── On message processing failure × N:
        Route to: inventory-service.DLQ
```

DLQ configuration per subscription is critical — a poison message in one subscription must not affect any other subscription's processing.
DLQ operations:
- Alert immediately when any message lands in a DLQ
- Investigate root cause before resubmitting
- Fix the root cause (data, code, or configuration)
- Resubmit in controlled batches, monitoring for re-failure
Platform Comparison
| Platform | Model | Retention | Ordering | Filtering | Best For |
|----------|-------|-----------|----------|-----------|----------|
| Apache Kafka | Partitioned log | Indefinite | Per-partition | Consumer-side | High-throughput event streaming, replay |
| Azure Service Bus Topics | Subscriptions | Up to 14 days | Sessions | SQL filter | Enterprise messaging, ordered workflows |
| Azure Event Grid | Push-based | 24 hours | No | Event type, subject | Serverless triggers, cloud event routing |
| AWS SNS + SQS | Fan-out to queues | 14 days (SQS) | FIFO queues | Filter policy | AWS-native fan-out |
| Google Cloud Pub/Sub | Subscriptions | 7 days | Ordering keys | Attribute filter | GCP-native, push and pull |
| RabbitMQ | Exchanges + queues | Configurable | Per-queue | Routing keys, headers | Flexible routing, on-premises |
Pub/Sub Anti-Patterns to Avoid
Chatty topics — publishing every tiny state change (field updates, cursor movements) as an event. The topic becomes high-volume noise, and consumers struggle to filter signal from it.
Overloaded topic — putting unrelated event types on the same topic to "keep it simple." Consumers receive irrelevant events; filtering becomes complex.
Consumer groups that do too much — one consumer group handling dozens of different event types. Split into focused consumers per domain.
Missing idempotency — trusting at-least-once delivery to mean "exactly once." It does not. Always build idempotent consumers.
No DLQ configuration — without a DLQ, a consumer that fails continuously will either hold up message processing or silently drop messages.
Unbounded subscriptions — creating subscriptions for testing or experiments and never cleaning them up. Orphaned subscriptions consume broker resources and retention space indefinitely.
Lesson Summary
- Pub/Sub delivers each published message to every subscriber independently. Fan-out is the defining property that makes it different from a point-to-point queue.
- Durable subscriptions maintain a per-subscriber cursor so each consumer progresses at its own pace without affecting others.
- Filtering at the subscription level keeps consumers from processing irrelevant messages — use message attributes, not payload parsing, for filters.
- Competing consumers (multiple instances sharing a subscription) provide horizontal scaling within the fan-out topology.
- At-least-once delivery is the standard — build idempotent consumers. Do not rely on broker-level deduplication.
- Kafka's event log with indefinite retention enables replay and late-joining consumers — the most powerful Pub/Sub capability for integration use cases.
Next: Messaging Systems