Distributed Systems Patterns: Saga, Outbox, Circuit Breaker & More
Deep-dive into essential distributed systems patterns — Saga (choreography & orchestration), Transactional Outbox, Circuit Breaker, Retry with exponential backoff, Idempotency, and Bulkhead. With .NET and MassTransit examples.
The Distributed Systems Problem
When you split a monolith into services, you trade one set of problems for another:
| Monolith Problem | Distributed Equivalent | |-----------------|------------------------| | Single point of failure | Partial failure (some services down) | | Shared transaction | Distributed consistency (no global ACID) | | In-process calls | Network calls that can fail, timeout, or duplicate | | Single deployment | Coordinated deployments across many services |
Distributed systems patterns exist to manage these tradeoffs — not eliminate them.
Pattern 1: Saga — Distributed Transactions
In a microservices architecture, you cannot have a single database transaction spanning multiple services. A Saga is a sequence of local transactions where each step publishes an event (or message) to trigger the next step.
If any step fails, the saga executes compensating transactions to undo the previous steps.
Choreography vs Orchestration
Choreography — services react to events, no central coordinator:
OrderService InventoryService PaymentService
│ │ │
│── OrderCreated ──────────> │ │
│ StockReserved ──────────────> │
│ │ PaymentProcessed ──>
│<──────────────────────────────────── OrderConfirmed │Pros: simple, loosely coupled
Cons: hard to track overall state, hidden logic spread across services
Orchestration — a central coordinator (saga orchestrator) tells each service what to do:
┌─────────────────────┐
│ OrderSagaOrchestrator │
└─────────────────────┘
│ │ │
ReserveStock ChargePayment SendEmail
│ │ │
InventoryService PaymentService EmailServicePros: workflow is visible in one place, easy to add steps
Cons: orchestrator can become a God service
Orchestration with MassTransit
// Define the saga state machine
public class OrderSaga : MassTransitStateMachine<OrderSagaState>
{
public State Submitted { get; private set; } = null!;
public State StockReserved { get; private set; } = null!;
public State PaymentTaken { get; private set; } = null!;
public State Completed { get; private set; } = null!;
public State Failed { get; private set; } = null!;
public Event<OrderCreatedEvent> OrderCreated { get; private set; } = null!;
public Event<StockReservedEvent> StockReserved_ { get; private set; } = null!;
public Event<StockReservationFailed> StockFailed { get; private set; } = null!;
public Event<PaymentProcessedEvent> PaymentProcessed { get; private set; } = null!;
public Event<PaymentFailedEvent> PaymentFailed { get; private set; } = null!;
public OrderSaga()
{
InstanceState(x => x.CurrentState);
Event(() => OrderCreated, x => x.CorrelateById(m => m.Message.OrderId));
Event(() => StockReserved_, x => x.CorrelateById(m => m.Message.OrderId));
Event(() => StockFailed, x => x.CorrelateById(m => m.Message.OrderId));
Event(() => PaymentProcessed, x => x.CorrelateById(m => m.Message.OrderId));
Event(() => PaymentFailed, x => x.CorrelateById(m => m.Message.OrderId));
Initially(
When(OrderCreated)
.Then(ctx => {
ctx.Saga.OrderId = ctx.Message.OrderId;
ctx.Saga.CustomerId = ctx.Message.CustomerId;
ctx.Saga.Total = ctx.Message.Total;
})
.PublishAsync(ctx => ctx.Init<ReserveStockCommand>(new
{
ctx.Message.OrderId,
ctx.Message.Items,
}))
.TransitionTo(Submitted)
);
During(Submitted,
When(StockReserved_)
.PublishAsync(ctx => ctx.Init<ChargePaymentCommand>(new
{
ctx.Saga.OrderId,
ctx.Saga.CustomerId,
ctx.Saga.Total,
}))
.TransitionTo(StockReserved),
When(StockFailed)
.Then(ctx => ctx.Saga.FailureReason = ctx.Message.Reason)
.PublishAsync(ctx => ctx.Init<OrderFailedEvent>(new
{
ctx.Saga.OrderId,
ctx.Message.Reason,
}))
.TransitionTo(Failed)
);
During(StockReserved,
When(PaymentProcessed)
.PublishAsync(ctx => ctx.Init<OrderConfirmedEvent>(new
{
ctx.Saga.OrderId,
}))
.TransitionTo(Completed)
.Finalize(),
When(PaymentFailed)
// Compensate: release the reserved stock
.PublishAsync(ctx => ctx.Init<ReleaseStockCommand>(new
{
ctx.Saga.OrderId,
}))
.TransitionTo(Failed)
);
SetCompletedWhenFinalized();
}
}
public class OrderSagaState : SagaStateMachineInstance
{
public Guid CorrelationId { get; set; }
public string CurrentState { get; set; } = null!;
public Guid OrderId { get; set; }
public Guid CustomerId { get; set; }
public decimal Total { get; set; }
public string? FailureReason { get; set; }
}// Registration
builder.Services.AddMassTransit(x =>
{
x.AddSagaStateMachine<OrderSaga, OrderSagaState>()
.EntityFrameworkRepository(r =>
{
r.ExistingDbContext<AppDbContext>();
r.UsePostgres();
});
x.UsingAzureServiceBus((ctx, cfg) =>
{
cfg.Host(connectionString);
cfg.ConfigureEndpoints(ctx);
});
});Pattern 2: Transactional Outbox
Problem: After saving an order to the database, you publish an event to the message bus. What if the app crashes between the DB write and the publish? The order is saved but the event is never sent — inventory is never reserved.
Solution: Write the event to an OutboxMessages table in the same transaction as the domain change. A background worker then reads and publishes these messages reliably.
┌─────────────────────────────────┐
│ Same DB Transaction │
│ ───────────────────────── │
│ INSERT INTO Orders │
│ INSERT INTO OutboxMessages │
└─────────────────────────────────┘
↓ (background worker polls)
┌─────────────────────────────────┐
│ OutboxProcessor (Background) │
│ ───────────────────────── │
│ SELECT unprocessed messages │
│ Publish to message bus │
│ UPDATE OutboxMessages.Sent=NOW │
└─────────────────────────────────┘Implementation
// Outbox table entity
public class OutboxMessage
{
public Guid Id { get; set; } = Guid.NewGuid();
public string Type { get; set; } = null!; // full type name
public string Content { get; set; } = null!; // JSON payload
public DateTimeOffset OccurredAt { get; set; } = DateTimeOffset.UtcNow;
public DateTimeOffset? ProcessedAt { get; set; }
public string? Error { get; set; }
}// Write to outbox in the same transaction as domain change
public class ConfirmOrderCommandHandler : IRequestHandler<ConfirmOrderCommand>
{
private readonly IOrderRepository _orders;
private readonly AppDbContext _db;
public async Task Handle(ConfirmOrderCommand cmd, CancellationToken ct)
{
var order = await _orders.GetByIdAsync(cmd.OrderId, ct)
?? throw new NotFoundException(cmd.OrderId);
order.Confirm(); // raises domain event
// Convert domain events → outbox messages in the same transaction
foreach (var domainEvent in order.PopDomainEvents())
{
_db.OutboxMessages.Add(new OutboxMessage
{
Type = domainEvent.GetType().FullName!,
Content = JsonSerializer.Serialize(domainEvent,
domainEvent.GetType(),
JsonSerializerOptions.Default),
});
}
await _db.SaveChangesAsync(ct);
}
}// Background worker that processes the outbox
public class OutboxProcessorWorker : BackgroundService
{
private readonly IServiceScopeFactory _scopeFactory;
private readonly ILogger<OutboxProcessorWorker> _logger;
protected override async Task ExecuteAsync(CancellationToken stoppingToken)
{
while (!stoppingToken.IsCancellationRequested)
{
await ProcessBatchAsync(stoppingToken);
await Task.Delay(TimeSpan.FromSeconds(5), stoppingToken);
}
}
private async Task ProcessBatchAsync(CancellationToken ct)
{
using var scope = _scopeFactory.CreateScope();
var db = scope.ServiceProvider.GetRequiredService<AppDbContext>();
var bus = scope.ServiceProvider.GetRequiredService<IPublishEndpoint>();
var messages = await db.OutboxMessages
.Where(m => m.ProcessedAt == null && m.Error == null)
.OrderBy(m => m.OccurredAt)
.Take(50)
.ToListAsync(ct);
foreach (var message in messages)
{
try
{
var type = Type.GetType(message.Type)!;
var payload = JsonSerializer.Deserialize(message.Content, type)!;
await bus.Publish(payload, type, ct);
message.ProcessedAt = DateTimeOffset.UtcNow;
}
catch (Exception ex)
{
_logger.LogError(ex, "Failed to process outbox message {Id}", message.Id);
message.Error = ex.Message;
}
}
await db.SaveChangesAsync(ct);
}
}MassTransit ships a built-in Outbox — use
cfg.UseMessageRetry()+cfg.UseInMemoryOutbox()(or Entity Framework Outbox) instead of rolling your own in production.
Pattern 3: Circuit Breaker
When Service A calls Service B and B is slow or down, A's threads accumulate waiting for responses. Eventually A runs out of threads and collapses too — cascading failure.
A Circuit Breaker tracks call failures. When failures exceed a threshold, it opens the circuit and immediately rejects calls (instead of waiting for timeouts), giving B time to recover.
States:
Closed → calls pass through, failure count tracked
Open → calls rejected immediately, timer running
Half-Open → one probe call allowed; if it succeeds → Closed; if fails → OpenWith Polly (Microsoft.Extensions.Resilience)
// Program.cs — using .NET 8 resilience pipeline
builder.Services.AddHttpClient<IInventoryClient, InventoryHttpClient>(client =>
{
client.BaseAddress = new Uri(builder.Configuration["Services:Inventory"]);
})
.AddResilienceHandler("inventory-pipeline", pipeline =>
{
// Retry: 3 attempts, exponential back-off + jitter
pipeline.AddRetry(new HttpRetryStrategyOptions
{
MaxRetryAttempts = 3,
BackoffType = DelayBackoffType.Exponential,
UseJitter = true,
Delay = TimeSpan.FromMilliseconds(500),
ShouldHandle = args => ValueTask.FromResult(
args.Outcome.Exception is HttpRequestException ||
args.Outcome.Result?.StatusCode is
>= System.Net.HttpStatusCode.InternalServerError or
System.Net.HttpStatusCode.RequestTimeout),
});
// Circuit breaker: open after 5 failures in 30s, stay open 30s
pipeline.AddCircuitBreaker(new HttpCircuitBreakerStrategyOptions
{
FailureRatio = 0.5, // 50% failure rate
SamplingDuration = TimeSpan.FromSeconds(30),
MinimumThroughput = 5,
BreakDuration = TimeSpan.FromSeconds(30),
OnOpened = args =>
{
// Log / emit metric when circuit opens
Console.WriteLine($"Circuit opened: {args.BreakDuration}");
return ValueTask.CompletedTask;
},
});
// Timeout: individual call timeout
pipeline.AddTimeout(TimeSpan.FromSeconds(5));
});// The client — just uses HttpClient normally; resilience is transparent
public class InventoryHttpClient : IInventoryClient
{
private readonly HttpClient _http;
public InventoryHttpClient(HttpClient http) => _http = http;
public async Task<bool> CheckStockAsync(Guid productId, int quantity, CancellationToken ct)
{
try
{
var response = await _http.GetFromJsonAsync<StockResponse>(
$"/api/stock/{productId}?quantity={quantity}", ct);
return response?.Available ?? false;
}
catch (BrokenCircuitException)
{
// Circuit is open — return degraded response
return false;
}
}
}Pattern 4: Retry with Exponential Back-off and Jitter
Never retry immediately — you'll hammer a struggling service. Use exponential back-off with jitter (random delay component):
Attempt 1: wait 0.5s
Attempt 2: wait 1s ± jitter
Attempt 3: wait 2s ± jitter
Attempt 4: wait 4s ± jitterThe jitter prevents all instances retrying at the exact same time (the thundering herd problem).
Polly handles this automatically with BackoffType = Exponential and UseJitter = true (shown in the Circuit Breaker example above).
What to retry:
503 Service Unavailable504 Gateway Timeout408 Request Timeout- Network timeouts (
HttpRequestException)
What NOT to retry:
400 Bad Request— your payload is wrong, retry won't fix it401/403— auth problem, retry won't fix it404 Not Found— resource doesn't exist
Pattern 5: Bulkhead
Isolate critical paths so a slow downstream doesn't exhaust all threads.
// Isolate inventory calls to their own semaphore pool
pipeline.AddConcurrencyLimiter(new ConcurrencyLimiterOptions
{
PermitLimit = 10, // max 10 concurrent inventory calls
QueueLimit = 5, // queue up to 5 more
});The bulkhead ensures that if Inventory is slow and fills its 10+5 slots, other services (Payment, Notification) still have their own thread pools and aren't affected.
Pattern 6: Idempotency at the Consumer
In event-driven systems, message brokers guarantee at-least-once delivery. Your consumer will process the same message more than once. Design consumers to be idempotent:
public class OrderConfirmedConsumer : IConsumer<OrderConfirmedIntegrationEvent>
{
private readonly AppDbContext _db;
public async Task Consume(ConsumeContext<OrderConfirmedIntegrationEvent> context)
{
var @event = context.Message;
// Deduplication: check if already processed
var alreadyProcessed = await _db.ProcessedEvents
.AnyAsync(e => e.EventId == @event.EventId);
if (alreadyProcessed)
{
// Idempotent — ignore the duplicate
return;
}
// Process the event
await ReserveStockAsync(@event.OrderId, @event.Lines);
// Record that we processed it
_db.ProcessedEvents.Add(new ProcessedEvent { EventId = @event.EventId });
await _db.SaveChangesAsync();
}
}Patterns Together: Order Placement Flow
Here's how these patterns compose in a real flow:
Client
│
▼ POST /orders
OrderService
├── Write Order + OutboxMessage (same transaction) ← Outbox Pattern
└── Return 202 Accepted immediately
OutboxProcessor (background)
├── Read OutboxMessage
└── Publish OrderCreated event → Service Bus
OrderSagaOrchestrator (MassTransit)
├── Receive OrderCreated
├── Send ReserveStock → InventoryService
│ ├── HTTP call with Circuit Breaker + Retry ← Resilience Pattern
│ └── Idempotent consumer in InventoryService ← Idempotency
├── On StockReserved → Send ChargePayment → PaymentService
├── On PaymentProcessed → Publish OrderConfirmed ← Saga Pattern
└── On any failure → publish compensating commandsKey Takeaways
- Saga — the distributed transaction replacement: each service does its local work and publishes events; compensating transactions handle failures
- Outbox — the only reliable way to publish events atomically with a DB write; prevents lost messages on crash
- Circuit Breaker — stops cascading failures by failing fast when a dependency is down
- Retry + jitter — retry transient failures with exponential back-off; jitter prevents thundering herd
- Bulkhead — isolate resource pools so one slow dependency doesn't take down the whole service
- Idempotency — at-least-once delivery is the norm; design consumers to handle duplicate messages
- These patterns are not optional in production distributed systems — each one addresses a real failure mode you will encounter
Enjoyed this article?
Explore the System Design learning path for more.
Found this helpful?
Leave a comment
Have a question, correction, or just found this helpful? Leave a note below.