.NET & C# Development · Lesson 72 of 92

Polly — Retry, Circuit Breaker & Hedging for Flaky APIs

Why Resilience Matters

Your API depends on a payment gateway, an auth service, and a notification provider. Any of them can:

  • Return 503 for 30 seconds during a deploy
  • Time out under load
  • Fail 20% of requests due to a flaky network

Without resilience, one slow downstream causes your thread pool to fill with waiting requests, your response times spike, and your app falls over — a cascading failure.


.NET 8 Resilience Stack

Microsoft.Extensions.Resilience, released alongside .NET 8, is a first-party wrapper around Polly v8. It is distributed as a NuGet package rather than as part of the framework itself.

Bash
dotnet add package Microsoft.Extensions.Http.Resilience

This is the package for HTTP clients. It pulls in Microsoft.Extensions.Resilience automatically.


The Standard Pipeline — One Line

C#
builder.Services.AddHttpClient<PaymentApiClient>(c =>
    c.BaseAddress = new Uri("https://payments.example.com"))
    .AddStandardResilienceHandler();

AddStandardResilienceHandler() wires up a layered pipeline with sensible defaults:

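| Strategy | Default Config |
|---|---|
| Rate limiter | 1,000 concurrent requests, no queue |
| Total timeout | 30 seconds for the entire attempt chain |
| Retry | 3 retries, exponential backoff + jitter (2-second base delay), handles 408/429/5xx, HttpRequestException and TimeoutRejectedException |
| Circuit breaker | Opens at a 10% failure ratio over a 30s window with at least 100 requests; stays open for 5s |
| Attempt timeout | 10 seconds per individual attempt |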

Good enough for most internal service calls. For external third-party APIs, configure explicitly.
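
If only a few thresholds need changing, the standard handler can be tuned in place rather than replaced. The following is a minimal sketch, assuming the same `builder` and `PaymentApiClient` as above; the specific values are illustrative, not recommendations.

```csharp
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Http.Resilience;

builder.Services.AddHttpClient<PaymentApiClient>(c =>
    c.BaseAddress = new Uri("https://payments.example.com"))
    .AddStandardResilienceHandler(options =>
    {
        // Illustrative values: tighter budgets for a third-party API
        options.TotalRequestTimeout.Timeout = TimeSpan.FromSeconds(20);
        options.AttemptTimeout.Timeout = TimeSpan.FromSeconds(5);
        options.Retry.MaxRetryAttempts = 2;

        // Trip on a higher failure ratio, over the same 30-second window
        options.CircuitBreaker.FailureRatio = 0.25;
        options.CircuitBreaker.SamplingDuration = TimeSpan.FromSeconds(30);
    });
```

The handler validates the combined options, so it is worth keeping the total timeout larger than the attempt timeout and the circuit breaker's sampling window at least twice the attempt timeout.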


Building a Custom Pipeline

C#
builder.Services.AddHttpClient<PaymentApiClient>(c =>
    c.BaseAddress = new Uri("https://payments.example.com"))
    .AddResilienceHandler("payment-pipeline", pipeline =>
    {
        // Strategies execute top-to-bottom on request, bottom-to-top on response
        pipeline
            .AddTimeout(TimeSpan.FromSeconds(30))   // total timeout (outermost)
            .AddRetry(new HttpRetryStrategyOptions
            {
                MaxRetryAttempts = 3,
                Delay = TimeSpan.FromMilliseconds(500),
                BackoffType = DelayBackoffType.Exponential,
                UseJitter = true
            })
            .AddCircuitBreaker(new HttpCircuitBreakerStrategyOptions
            {
                SamplingDuration = TimeSpan.FromSeconds(60),
                MinimumThroughput = 10,
                FailureRatio = 0.5,          // open when 50% of requests fail
                BreakDuration = TimeSpan.FromSeconds(30)
            })
            .AddTimeout(TimeSpan.FromSeconds(8)); // per-attempt timeout (innermost)
    });

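Order matters: strategies are applied like middleware, and the first one added is the outermost. The total timeout wraps everything. The retry wraps the circuit breaker, so every attempt (including each retry) passes through the breaker and counts toward its failure ratio. The circuit breaker wraps the per-attempt timeout, which in turn wraps the actual HTTP call.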
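
The nesting can be illustrated without Polly at all. The sketch below is not how Polly is implemented; it only wraps a fake call in four named layers, in the order the strategies were added, and records the order in which each layer is entered and exited. The names `PipelineOrderDemo`, `Wrap`, and `Run` are made up for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class PipelineOrderDemo
{
    // Wraps 'inner' so the named layer logs on the way in and on the way out.
    static Func<string> Wrap(string name, Func<string> inner, List<string> log) => () =>
    {
        log.Add($"enter {name}");
        var result = inner();
        log.Add($"exit {name}");
        return result;
    };

    public static List<string> Run()
    {
        var log = new List<string>();
        Func<string> call = () => { log.Add("HTTP call"); return "200 OK"; };

        // Same order as the builder calls: first added = outermost.
        string[] addedOrder = { "total timeout", "retry", "circuit breaker", "attempt timeout" };

        // Build inside-out so the last strategy added sits closest to the call.
        for (int i = addedOrder.Length - 1; i >= 0; i--)
            call = Wrap(addedOrder[i], call, log);

        call();
        return log;
    }
}
```

`Run()` records "enter total timeout" first and "exit total timeout" last, with "HTTP call" in the middle, directly inside the attempt timeout.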


Retry Strategy Deep Dive

C#
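// Requires: using System.Net; using Polly; using Polly.Timeout;
//           using Microsoft.Extensions.Http.Resilience;
var retryOptions = new HttpRetryStrategyOptions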
{
    MaxRetryAttempts = 4,
    Delay = TimeSpan.FromMilliseconds(200),
    BackoffType = DelayBackoffType.Exponential,  // 200ms, 400ms, 800ms, 1600ms before jitter
    UseJitter = true,                             // randomises each delay so clients don't retry in lockstep

    // Custom predicate — only retry on transient errors
    ShouldHandle = args => args.Outcome switch
    {
        { Exception: HttpRequestException or TimeoutRejectedException } => PredicateResult.True(),
        { Result: { StatusCode: HttpStatusCode.TooManyRequests } } => PredicateResult.True(),
        { Result: { StatusCode: >= HttpStatusCode.InternalServerError } } => PredicateResult.True(),
        _ => PredicateResult.False()
    },

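    // Honour the Retry-After header (e.g. on 429 responses).
    // OnRetry is notification-only and cannot change the delay, so override it in
    // DelayGenerator; returning null falls back to the exponential backoff above.
    // (HttpRetryStrategyOptions already does this by default via ShouldRetryAfterHeader.)
    DelayGenerator = args =>
    {
        TimeSpan? retryAfter = args.Outcome.Result?.Headers.RetryAfter?.Delta;
        return new ValueTask<TimeSpan?>(retryAfter);
    }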
};
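
A quick budget check helps when choosing these numbers. The sketch below assumes no jitter and uses the plain exponential formula (base delay × 2^attempt, with a zero-based attempt index); `RetryBudget` and its methods are hypothetical helpers, not part of Polly.

```csharp
using System;
using System.Linq;

public static class RetryBudget
{
    // Delay before retry number 'attempt' (0-based), ignoring jitter.
    public static TimeSpan ExponentialDelay(TimeSpan baseDelay, int attempt) =>
        TimeSpan.FromMilliseconds(baseDelay.TotalMilliseconds * Math.Pow(2, attempt));

    // Worst case: every attempt runs into its timeout and every backoff is waited in full.
    public static TimeSpan WorstCase(TimeSpan baseDelay, int maxRetries, TimeSpan attemptTimeout)
    {
        double delaysMs = Enumerable.Range(0, maxRetries)
            .Sum(i => ExponentialDelay(baseDelay, i).TotalMilliseconds);
        double attemptsMs = (maxRetries + 1) * attemptTimeout.TotalMilliseconds;
        return TimeSpan.FromMilliseconds(delaysMs + attemptsMs);
    }
}
```

With the options above (200 ms base delay, 4 retries) and an assumed 5-second attempt timeout, `WorstCase` comes to 28 seconds. Plugging in the custom pipeline from the previous section (500 ms, 3 retries, 8-second attempt timeout) gives 35.5 seconds, which is more than its 30-second total timeout, so the final attempt could be cut short.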

Circuit Breaker — Closed / Open / Half-Open

CLOSED ──(failure ratio >= threshold)─> OPEN
OPEN   ──(break duration elapsed)────> HALF-OPEN
HALF-OPEN ──(probe succeeds)─────────> CLOSED
HALF-OPEN ──(probe fails)────────────> OPEN

C#
var cbOptions = new HttpCircuitBreakerStrategyOptions
{
    // Sampling window
    SamplingDuration = TimeSpan.FromSeconds(30),

    // Minimum requests before the ratio is checked
    MinimumThroughput = 5,

    // Open when 40% of requests in the window fail
    FailureRatio = 0.4,

    // Stay open for 20 seconds, then probe with one request
    BreakDuration = TimeSpan.FromSeconds(20),

    // Fired when the circuit opens
    OnOpened = args =>
    {
        logger.LogWarning(
            "Circuit opened for {Duration}s — {Reason}",
            args.BreakDuration.TotalSeconds,
            args.Outcome.Exception?.Message ?? args.Outcome.Result?.StatusCode.ToString());
        return default;
    },

    OnClosed = args =>
    {
        logger.LogInformation("Circuit closed — downstream recovered");
        return default;
    }
};

When the circuit is open, BrokenCircuitException is thrown immediately — no HTTP call is made. Your catch block should return a cached response or a graceful fallback.
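
One way to structure that catch block is sketched below. `CachedStatusReader` and its `fetch` delegate are hypothetical; `fetch` stands in for any call that goes through the resilient HttpClient, and the "cache" is just an in-memory dictionary of the last good answer per payment.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;
using Polly.CircuitBreaker;

// Hypothetical wrapper around a call protected by the circuit breaker.
public sealed class CachedStatusReader(Func<string, CancellationToken, Task<string>> fetch)
{
    private readonly ConcurrentDictionary<string, string> _lastKnown = new();

    public async Task<string?> GetStatusAsync(string paymentId, CancellationToken ct = default)
    {
        try
        {
            var status = await fetch(paymentId, ct);
            _lastKnown[paymentId] = status;   // remember the last good answer
            return status;
        }
        catch (BrokenCircuitException)
        {
            // Circuit is open: no HTTP call was made.
            // Serve stale data if we have it, otherwise signal "unknown" with null.
            return _lastKnown.TryGetValue(paymentId, out var cached) ? cached : null;
        }
    }
}
```

Whether stale data is acceptable depends on the call; for a payment status it may be safer to return an explicit "temporarily unavailable" result than an old value.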


Timeout Strategy

C#
// Per-attempt timeout — cancels one HTTP call
pipeline.AddTimeout(new HttpTimeoutStrategyOptions
{
    Timeout = TimeSpan.FromSeconds(5),
    OnTimeout = args =>
    {
        logger.LogWarning("HTTP call timed out after {Timeout}s", args.Timeout.TotalSeconds);
        return default;
    }
});

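A TimeoutRejectedException is thrown on timeout. When this timeout sits inside a retry strategy, the retry can handle the exception and try again with backoff; HttpRetryStrategyOptions handles it by default.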


Hedging — Parallel Speculative Requests

Hedging fires a second request if the first hasn't returned within a threshold. Whichever responds first wins.

C#
pipeline.AddHedging(new HttpHedgingStrategyOptions
{
    MaxHedgedAttempts = 2,                  // up to 2 extra requests on top of the original
    Delay = TimeSpan.FromMilliseconds(500), // fire second request after 500ms

    ShouldHandle = args => args.Outcome switch
    {
        { Exception: HttpRequestException } => PredicateResult.True(),
        { Result: { StatusCode: HttpStatusCode.ServiceUnavailable } } => PredicateResult.True(),
        _ => PredicateResult.False()
    }
});

Use hedging for read-only calls where latency matters more than load. Never hedge non-idempotent writes: a duplicated request could, for example, charge a customer twice.
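
The core idea can be sketched in plain C# with Task.WhenAny. This is a deliberately simplified illustration, not Polly's implementation: it sends at most one hedge, and a failed winner is rethrown rather than waiting for the other attempt. `HedgingSketch` and `HedgeAsync` are made-up names.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class HedgingSketch
{
    public static async Task<T> HedgeAsync<T>(
        Func<CancellationToken, Task<T>> action,
        TimeSpan hedgeDelay,
        CancellationToken ct = default)
    {
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(ct);
        var primary = action(cts.Token);

        // Give the primary a head start; if it finishes in time, no hedge is sent.
        var delay = Task.Delay(hedgeDelay, cts.Token);
        if (await Task.WhenAny(primary, delay) == primary)
        {
            cts.Cancel();              // stop the pending delay
            return await primary;
        }

        // Primary is slow: fire a second, identical request and race the two.
        var secondary = action(cts.Token);
        var winner = await Task.WhenAny(primary, secondary);
        cts.Cancel();                  // ask the slower request to stop
        return await winner;
    }
}
```

Both requests may reach the server before the slower one is cancelled, which is why the operation being hedged needs to be safe to run twice.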


Using ResiliencePipeline Directly (Non-HTTP)

For database calls, queue sends, or any async operation:

C#
// Register
builder.Services.AddResiliencePipeline("db-pipeline", pipeline =>
{
    pipeline
        .AddRetry(new RetryStrategyOptions
        {
            MaxRetryAttempts = 3,
            Delay = TimeSpan.FromMilliseconds(100),
            BackoffType = DelayBackoffType.Exponential,
            ShouldHandle = new PredicateBuilder()
                .Handle<SqlException>(ex => ex.IsTransient)
                .Handle<TimeoutException>()
                .Handle<TimeoutRejectedException>() // thrown by the inner AddTimeout below
        })
        .AddTimeout(TimeSpan.FromSeconds(5));
});

// Use
// AppDbContext stands in for your own DbContext subclass that exposes DbSet<Order> Orders
public class OrderRepository(ResiliencePipelineProvider<string> pipelineProvider, AppDbContext db)
{
    private readonly ResiliencePipeline _pipeline = pipelineProvider.GetPipeline("db-pipeline");

    public async Task<Order?> GetOrderAsync(Guid id, CancellationToken ct)
    {
        return await _pipeline.ExecuteAsync(
            async token => await db.Orders.FindAsync([id], token),
            ct);
    }
}
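
A pipeline can also be built directly, without dependency injection, which is convenient for trying out retry settings in a test. The sketch below assumes the Polly.Core package is referenced; the failing operation and the use of InvalidOperationException as the "transient" error are invented for the example.

```csharp
using System;
using System.Threading.Tasks;
using Polly;
using Polly.Retry;

var pipeline = new ResiliencePipelineBuilder()
    .AddRetry(new RetryStrategyOptions
    {
        MaxRetryAttempts = 3,
        Delay = TimeSpan.FromMilliseconds(10),
        // Only retry the exception type we treat as transient in this example
        ShouldHandle = new PredicateBuilder().Handle<InvalidOperationException>()
    })
    .Build();

int calls = 0;

// Fails twice, then succeeds on the third call (the second retry).
int result = await pipeline.ExecuteAsync(async token =>
{
    calls++;
    if (calls < 3)
        throw new InvalidOperationException("simulated transient failure");

    await Task.Yield();
    return 42;
});

Console.WriteLine($"result={result}, calls={calls}");
```

The operation runs three times in total: the original attempt plus two of the three permitted retries.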

Telemetry

Polly v8 emits metrics and events via System.Diagnostics.Metrics and ILogger automatically when you call AddStandardResilienceHandler or AddResilienceHandler.

C#
// Apply the standard resilience handler (and the telemetry it emits) to every HttpClient in the app
builder.Services.ConfigureHttpClientDefaults(http =>
{
    http.AddStandardResilienceHandler();
});

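Instruments emitted under the Polly meter:

  • resilience.polly.strategy.events: counter of strategy events (retries, circuit state changes, timeouts), tagged with event.name
  • resilience.polly.strategy.attempt.duration: histogram of individual attempt durations
  • resilience.polly.pipeline.duration: histogram of whole-pipeline execution durations

These flow into Prometheus / OpenTelemetry once the OTEL SDK is set up and subscribed to the Polly meter.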
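
Subscribing the OpenTelemetry SDK to the meter is a one-line addition to the metrics setup. A minimal sketch, assuming the OpenTelemetry.Extensions.Hosting and OTLP exporter packages are installed and that `builder` is the application's WebApplicationBuilder; the choice of OTLP as exporter is only an example.

```csharp
using Microsoft.Extensions.DependencyInjection;
using OpenTelemetry.Metrics;

builder.Services.AddOpenTelemetry()
    .WithMetrics(metrics => metrics
        .AddMeter("Polly")      // the meter name Polly v8 publishes its instruments under
        .AddOtlpExporter());    // swap for whichever exporter your backend expects
```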


Key Takeaways

  • AddStandardResilienceHandler() is a solid production default — use it unless you need custom thresholds
  • Strategy order (outermost to innermost): total timeout → retry → circuit breaker → per-attempt timeout
  • Circuit breakers protect downstream services from being hammered when they're struggling
  • Hedging reduces p99 latency for reads at the cost of extra load
  • Polly v8 emits metrics natively — pair with OpenTelemetry for full observability