.NET & C# Development · Lesson 72 of 92
Polly — Retry, Circuit Breaker & Hedging for Flaky APIs
Why Resilience Matters
Your API depends on a payment gateway, an auth service, and a notification provider. Any of them can:
- Return 503 for 30 seconds during a deploy
- Time out under load
- Fail 20% of requests due to a flaky network
Without resilience, one slow downstream causes your thread pool to fill with waiting requests, your response times spike, and your app falls over — a cascading failure.
.NET 8 Resilience Stack
.NET 8 ships Microsoft.Extensions.Resilience — a first-party wrapper around Polly v8.
dotnet add package Microsoft.Extensions.Http.Resilience

This is the package for HTTP clients. It pulls in Microsoft.Extensions.Resilience automatically.
The Standard Pipeline — One Line
builder.Services.AddHttpClient<PaymentApiClient>(c =>
c.BaseAddress = new Uri("https://payments.example.com"))
.AddStandardResilienceHandler();

AddStandardResilienceHandler() wires up a layered pipeline with sensible defaults:
| Strategy | Default Config |
|---|---|
| Total timeout | 30 seconds for the entire attempt chain |
| Retry | 3 retries, exponential backoff + jitter, handles 408/429/5xx |
| Circuit breaker | Opens at 10% failure rate over a 30-second sampling window, minimum 100 requests |
| Attempt timeout | 10 seconds per individual attempt |
Good enough for most internal service calls. For external third-party APIs, configure explicitly.
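When the defaults don't fit, the standard handler accepts a configuration callback. A sketch of tightening it for an external API follows; the option names come from HttpStandardResilienceOptions, and the values are illustrative, not recommendations:

```csharp
builder.Services.AddHttpClient<PaymentApiClient>(c =>
        c.BaseAddress = new Uri("https://payments.example.com"))
    .AddStandardResilienceHandler(options =>
    {
        // More patient retries for a slow third party
        options.Retry.MaxRetryAttempts = 5;
        options.Retry.Delay = TimeSpan.FromSeconds(1);

        // Trip earlier when the downstream is degraded
        options.CircuitBreaker.FailureRatio = 0.3;

        // Attempt timeout must stay below the total request timeout
        options.AttemptTimeout.Timeout = TimeSpan.FromSeconds(5);
        options.TotalRequestTimeout.Timeout = TimeSpan.FromSeconds(45);
    });
```

The handler validates the options at startup, so an attempt timeout longer than the total timeout fails fast rather than silently misbehaving.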
Building a Custom Pipeline
builder.Services.AddHttpClient<PaymentApiClient>(c =>
c.BaseAddress = new Uri("https://payments.example.com"))
.AddResilienceHandler("payment-pipeline", pipeline =>
{
// Strategies execute top-to-bottom on request, bottom-to-top on response
pipeline
.AddTimeout(TimeSpan.FromSeconds(30)) // total timeout (outermost)
.AddRetry(new HttpRetryStrategyOptions
{
MaxRetryAttempts = 3,
Delay = TimeSpan.FromMilliseconds(500),
BackoffType = DelayBackoffType.Exponential,
UseJitter = true
})
.AddCircuitBreaker(new HttpCircuitBreakerStrategyOptions
{
SamplingDuration = TimeSpan.FromSeconds(60),
MinimumThroughput = 10,
FailureRatio = 0.5, // open when 50% of requests fail
BreakDuration = TimeSpan.FromSeconds(30)
})
.AddTimeout(TimeSpan.FromSeconds(8)); // per-attempt timeout (innermost)
});

Order matters: strategies are applied like middleware, and the first strategy added becomes the outermost layer. The per-attempt timeout wraps the actual HTTP call. The circuit breaker wraps the per-attempt timeout, so every failed attempt counts toward its failure ratio. The retry wraps the circuit breaker. The total timeout wraps everything.
Retry Strategy Deep Dive
var retryOptions = new HttpRetryStrategyOptions
{
MaxRetryAttempts = 4,
Delay = TimeSpan.FromMilliseconds(200),
BackoffType = DelayBackoffType.Exponential, // 200ms, 400ms, 800ms, 1600ms
UseJitter = true, // adds randomness to each delay so clients don't retry in lockstep
// Custom predicate — only retry on transient errors
ShouldHandle = args => args.Outcome switch
{
{ Exception: HttpRequestException or TimeoutRejectedException } => PredicateResult.True(),
{ Result: { StatusCode: HttpStatusCode.TooManyRequests } } => PredicateResult.True(),
{ Result: { StatusCode: >= HttpStatusCode.InternalServerError } } => PredicateResult.True(),
_ => PredicateResult.False()
},
// Honour Retry-After from 429 responses. HttpRetryStrategyOptions
// already does this by default; shown explicitly here for clarity.
DelayGenerator = args =>
{
    var retryAfter = args.Outcome.Result?.Headers.RetryAfter?.Delta;
    // Returning null falls back to the computed exponential backoff
    return ValueTask.FromResult(retryAfter);
}
};

Circuit Breaker — Closed / Open / Half-Open
CLOSED ──(failure ratio > threshold)──> OPEN
OPEN ──(break duration elapsed)────> HALF-OPEN
HALF-OPEN ──(probe succeeds)─────────> CLOSED
HALF-OPEN ──(probe fails)────────────> OPEN

var cbOptions = new HttpCircuitBreakerStrategyOptions
{
// Sampling window
SamplingDuration = TimeSpan.FromSeconds(30),
// Minimum requests before the ratio is checked
MinimumThroughput = 5,
// Open when 40% of requests in the window fail
FailureRatio = 0.4,
// Stay open for 20 seconds, then probe with one request
BreakDuration = TimeSpan.FromSeconds(20),
// Fired when the circuit opens
OnOpened = args =>
{
logger.LogWarning(
"Circuit opened for {Duration}s — {Reason}",
args.BreakDuration.TotalSeconds,
args.Outcome.Exception?.Message ?? args.Outcome.Result?.StatusCode.ToString());
return default;
},
OnClosed = args =>
{
logger.LogInformation("Circuit closed — downstream recovered");
return default;
}
};

When the circuit is open, BrokenCircuitException is thrown immediately — no HTTP call is made. Your catch block should return a cached response or a graceful fallback.
Timeout Strategy
// Per-attempt timeout — cancels one HTTP call
pipeline.AddTimeout(new HttpTimeoutStrategyOptions
{
Timeout = TimeSpan.FromSeconds(5),
OnTimeout = args =>
{
logger.LogWarning("HTTP call timed out after {Timeout}s", args.Timeout.TotalSeconds);
return default;
}
});

A TimeoutRejectedException is thrown on timeout. The retry strategy can catch it and retry with backoff.
Hedging — Parallel Speculative Requests
Hedging fires a second request if the first hasn't returned within a threshold. Whichever responds first wins.
pipeline.AddHedging(new HttpHedgingStrategyOptions
{
MaxHedgedAttempts = 2,
Delay = TimeSpan.FromMilliseconds(500), // fire second request after 500ms
ShouldHandle = args => args.Outcome switch
{
{ Exception: HttpRequestException } => PredicateResult.True(),
{ Result: { StatusCode: HttpStatusCode.ServiceUnavailable } } => PredicateResult.True(),
_ => PredicateResult.False()
}
});

Use hedging for read-only calls where latency matters more than load. Never hedge write operations: a non-idempotent POST could execute twice.
Using ResiliencePipeline Directly (Non-HTTP)
For database calls, queue sends, or any async operation:
// Register
builder.Services.AddResiliencePipeline("db-pipeline", pipeline =>
{
pipeline
.AddRetry(new RetryStrategyOptions
{
MaxRetryAttempts = 3,
Delay = TimeSpan.FromMilliseconds(100),
BackoffType = DelayBackoffType.Exponential,
ShouldHandle = new PredicateBuilder()
.Handle<SqlException>(ex => ex.IsTransient)
.Handle<TimeoutException>()
})
.AddTimeout(TimeSpan.FromSeconds(5));
});
// Use
public class OrderRepository(ResiliencePipelineProvider<string> pipelineProvider, DbContext db)
{
private readonly ResiliencePipeline _pipeline = pipelineProvider.GetPipeline("db-pipeline");
public async Task<Order?> GetOrderAsync(Guid id, CancellationToken ct)
{
return await _pipeline.ExecuteAsync(
async token => await db.Orders.FindAsync([id], token),
ct);
}
}

Telemetry
Polly v8 emits metrics and events via System.Diagnostics.Metrics and ILogger automatically when you call AddStandardResilienceHandler or AddResilienceHandler.
// Apply the standard handler (and its built-in telemetry) to every HttpClient in the app
builder.Services.ConfigureHttpClientDefaults(http =>
{
http.AddStandardResilienceHandler();
});

Metrics emitted under the Polly meter include:
- resilience.polly.strategy.events — counter of strategy events (retry attempts, circuit state changes, timeouts)
- resilience.polly.pipeline.execution.duration — histogram of end-to-end pipeline execution time
- resilience.polly.pipeline.attempt.duration — histogram of individual attempt duration
These flow into Prometheus / OpenTelemetry automatically if you have the OTEL SDK set up.
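A sketch of wiring the meter into the OpenTelemetry SDK. This assumes the OpenTelemetry.Extensions.Hosting package (and a Prometheus exporter package) are installed; swap the exporter for whatever backend you use:

```csharp
builder.Services.AddOpenTelemetry()
    .WithMetrics(metrics => metrics
        .AddMeter("Polly")          // subscribe to Polly's resilience metrics
        .AddPrometheusExporter());  // or OTLP, console, etc.
```

Without the AddMeter("Polly") call, the SDK ignores Polly's instruments entirely, so this one line is the piece people most often miss.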
Key Takeaways
- AddStandardResilienceHandler() is a solid production default — use it unless you need custom thresholds
- Strategy order: total timeout → retry → circuit breaker → per-attempt timeout
- Circuit breakers protect downstream services from being hammered when they're struggling
- Hedging reduces p99 latency for reads at the cost of extra load
- Polly v8 emits metrics natively — pair with OpenTelemetry for full observability