Rate Limit Your API Before It Gets Hammered

Why Rate Limiting Belongs in Your API

Without it:

A single bad actor (or a bug in a client) can exhaust your database connections
Scrapers can pull your entire catalogue in seconds
A thundering herd during a traffic spike can take the service down for every user

.NET 7 added Microsoft.AspNetCore.RateLimiting — no third-party package needed.

The Four Built-In Limiters

Fixed Window

A fixed quota resets at the end of each window. It is simple, but it allows a burst at the boundary: a client can spend the full quota at the end of window N and again at the start of window N+1, so up to twice the limit can arrive in a short span.

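// Program.cs
using System.Threading.RateLimiting;        // limiter options, QueueProcessingOrder
using Microsoft.AspNetCore.RateLimiting;    // AddFixedWindowLimiter and friends
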
builder.Services.AddRateLimiter(options =>
{
    options.AddFixedWindowLimiter("fixed", limiter =>
    {
        limiter.Window           = TimeSpan.FromMinutes(1);
        limiter.PermitLimit      = 60;   // 60 requests per minute
        limiter.QueueLimit       = 0;    // reject excess immediately
        limiter.QueueProcessingOrder = QueueProcessingOrder.OldestFirst;
    });
});
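
To see the quota in isolation, the limiter type behind this policy can be driven directly from System.Threading.RateLimiting. This is a minimal sketch rather than production code: the limit of 2 is an illustrative value chosen to keep the output short, and AutoReplenishment is switched off so no background timer is involved. In a non-web project the types come from the System.Threading.RateLimiting NuGet package.

```csharp
using System;
using System.Threading.RateLimiting;

// Illustrative values: 2 permits per window, no queueing.
// AutoReplenishment = false means no timer refills the window during the run.
using var limiter = new FixedWindowRateLimiter(new FixedWindowRateLimiterOptions
{
    PermitLimit = 2,
    Window = TimeSpan.FromMinutes(1),
    QueueLimit = 0,
    AutoReplenishment = false
});

for (var i = 1; i <= 3; i++)
{
    // AttemptAcquire never waits: the lease either holds a permit or it doesn't.
    using RateLimitLease lease = limiter.AttemptAcquire();
    Console.WriteLine($"request {i}: {(lease.IsAcquired ? "allowed" : "rejected")}");
}
// request 1: allowed
// request 2: allowed
// request 3: rejected
```

Disposing a fixed window lease does not hand the permit back; the quota only recovers when the window rolls over.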

Sliding Window

Divides the window into segments and slides forward one segment at a time. Permits consumed in the segment that just expired are returned to the quota, which smooths out boundary bursts.

// Goes inside the AddRateLimiter(options => { ... }) callback
options.AddSlidingWindowLimiter("sliding", limiter =>
{
    limiter.Window              = TimeSpan.FromMinutes(1);
    limiter.SegmentsPerWindow   = 6;     // 6 x 10-second segments
    limiter.PermitLimit         = 60;
    limiter.QueueLimit          = 5;
});
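
The segment bookkeeping is easier to follow in a toy model. The sketch below is hand-rolled for illustration and is not the library's implementation; the 3 segments, the limit of 4, and the TryAcquire and NextSegment helpers are all made up for the example.

```csharp
using System;
using System.Linq;

// Toy model of a sliding window: 3 segments per window, 4 permits per window.
const int permitLimit = 4;
var used = new int[3];   // permits consumed in each segment of the current window
var current = 0;         // index of the segment that is currently filling

bool TryAcquire()
{
    if (used.Sum() >= permitLimit) return false; // the window as a whole is full
    used[current]++;
    return true;
}

void NextSegment()
{
    // Sliding forward recycles the oldest segment, returning its permits.
    current = (current + 1) % used.Length;
    used[current] = 0;
}

Console.WriteLine($"{TryAcquire()} {TryAcquire()} {TryAcquire()}"); // True True True
NextSegment();
Console.WriteLine($"{TryAcquire()} {TryAcquire()}");                // True False
NextSegment();
Console.WriteLine(TryAcquire()); // False: the 3 permits from the first segment still count
NextSegment();
Console.WriteLine(TryAcquire()); // True: the first segment expired and freed its 3 permits
```

Permits come back segment by segment as the window slides, so a client that spent its quota early regains capacity gradually rather than all at once at a window boundary.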

Token Bucket

Tokens accumulate at a steady rate up to a maximum. Good for bursty workloads where short spikes are acceptable but sustained throughput is capped.

// Goes inside the AddRateLimiter(options => { ... }) callback
options.AddTokenBucketLimiter("token-bucket", limiter =>
{
    limiter.TokenLimit          = 100;   // max burst
    limiter.ReplenishmentPeriod = TimeSpan.FromSeconds(10);
    limiter.TokensPerPeriod     = 20;    // refill 20 tokens every 10s
    limiter.QueueLimit          = 0;
});
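
The three settings are easier to reason about once converted into a burst size and a sustained rate. A small worked calculation using the values from the policy above (the variable names are only for this example):

```csharp
using System;

// Values copied from the "token-bucket" policy.
const int tokenLimit = 100;
const int tokensPerPeriod = 20;
var replenishmentPeriod = TimeSpan.FromSeconds(10);

// Long-run ceiling: tokens cannot be spent faster than they are refilled.
double sustainedPerSecond = tokensPerPeriod / replenishmentPeriod.TotalSeconds;

// Time for an empty bucket to fill completely if nothing is spent meanwhile.
// Integer math rounds up to whole replenishment periods.
int periodsToFill = (tokenLimit + tokensPerPeriod - 1) / tokensPerPeriod;
TimeSpan timeToFill = replenishmentPeriod * periodsToFill;

Console.WriteLine($"burst: {tokenLimit} requests");                              // burst: 100 requests
Console.WriteLine($"sustained: {sustainedPerSecond} requests/second");           // sustained: 2 requests/second
Console.WriteLine($"refill from empty: {timeToFill.TotalSeconds} seconds");      // refill from empty: 50 seconds
```

So this policy tolerates a spike of 100 requests, then settles at roughly 120 requests per minute until the bucket has had time to refill.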

Concurrency Limiter

Caps the number of simultaneous requests in flight — not the rate, but the parallelism. Useful for CPU-bound or database-bound endpoints.

// Goes inside the AddRateLimiter(options => { ... }) callback
options.AddConcurrencyLimiter("concurrency", limiter =>
{
    limiter.PermitLimit  = 10;  // max 10 concurrent requests
    limiter.QueueLimit   = 5;   // queue up to 5 more
});
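
Unlike the time-based limiters, a concurrency permit is returned when the request finishes, which corresponds to disposing the lease. A minimal sketch that drives ConcurrencyLimiter directly, assuming an illustrative limit of 1 and no queue:

```csharp
using System;
using System.Threading.RateLimiting;

// Illustrative values: one request in flight at a time, nothing queued.
using var limiter = new ConcurrencyLimiter(new ConcurrencyLimiterOptions
{
    PermitLimit = 1,
    QueueLimit = 0
});

RateLimitLease first = limiter.AttemptAcquire();
Console.WriteLine(first.IsAcquired);   // True: the only permit is now held

RateLimitLease second = limiter.AttemptAcquire();
Console.WriteLine(second.IsAcquired);  // False: the first request is still in flight
second.Dispose();

first.Dispose();                       // the first request completes and returns its permit

RateLimitLease third = limiter.AttemptAcquire();
Console.WriteLine(third.IsAcquired);   // True
third.Dispose();
```

In the middleware the lease is held for the duration of the request and released for you when the request completes.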

Returning 429 With Retry-After

The default rejection returns 503 Service Unavailable. Change it globally:

builder.Services.AddRateLimiter(options =>
{
    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;

    options.OnRejected = async (context, cancellationToken) =>
    {
        context.HttpContext.Response.StatusCode = StatusCodes.Status429TooManyRequests;

        if (context.Lease.TryGetMetadata(MetadataName.RetryAfter, out var retryAfter))
        {
            // Round up so a client that honors the header never retries early
            context.HttpContext.Response.Headers.RetryAfter =
                ((int)Math.Ceiling(retryAfter.TotalSeconds)).ToString();
        }

        await context.HttpContext.Response.WriteAsync(
            "Too many requests. Please slow down.", cancellationToken);
    };
});
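
The header only helps if clients honor it. Below is a sketch of what a well-behaved .NET client might do. RetryAfterClient, its method names, the limit of three attempts, and the one-second fallback are all assumptions made for this example, not part of any library.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public static class RetryAfterClient
{
    // Hypothetical helper: resends a request after a 429, waiting for the
    // period the server advertised in Retry-After.
    public static async Task<HttpResponseMessage> SendAsync(
        HttpClient client,
        Func<HttpRequestMessage> createRequest, // a request message can only be sent once
        int maxAttempts = 3,
        CancellationToken ct = default)
    {
        for (var attempt = 1; ; attempt++)
        {
            using var request = createRequest();
            var response = await client.SendAsync(request, ct);

            // Hand back anything that isn't a 429, or the last 429 once attempts run out.
            if (response.StatusCode != HttpStatusCode.TooManyRequests || attempt >= maxAttempts)
                return response;

            var delay = GetDelay(response);
            response.Dispose();
            await Task.Delay(delay, ct);
        }
    }

    // Retry-After can be a number of seconds (Delta) or an absolute date (Date).
    public static TimeSpan GetDelay(HttpResponseMessage response)
    {
        var retryAfter = response.Headers.RetryAfter;
        if (retryAfter?.Delta is TimeSpan delta)
            return delta;
        if (retryAfter?.Date is DateTimeOffset date)
        {
            var wait = date - DateTimeOffset.UtcNow;
            return wait > TimeSpan.Zero ? wait : TimeSpan.Zero;
        }
        return TimeSpan.FromSeconds(1); // assumed fallback when the header is absent
    }
}
```

A call might look like `await RetryAfterClient.SendAsync(http, () => new HttpRequestMessage(HttpMethod.Post, "/search"))`, where `http` is an HttpClient with its BaseAddress set to your API.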

Applying Limiters

Global — Every Endpoint

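// Adds the middleware to the pipeline. On its own it only enforces options.GlobalLimiter.
app.UseRateLimiter();

// Applies the named policy "fixed" to every controller action that doesn't override it
app.MapControllers().RequireRateLimiting("fixed");

To limit every request regardless of endpoint, set options.GlobalLimiter to a partition-based limiter (shown below).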

Per Endpoint With the Attribute

app.UseRateLimiter(); // must be registered in the pipeline

// Controller action
[EnableRateLimiting("sliding")]
[HttpPost("search")]
public IActionResult Search([FromBody] SearchRequest req) { ... }

// Opt a specific action out of a global limiter
[DisableRateLimiting]
[HttpGet("health")]
public IActionResult Health() => Ok();

Minimal API:

app.MapPost("/search", SearchHandler)
   .RequireRateLimiting("sliding");

app.MapGet("/health", () => Results.Ok())
   .DisableRateLimiting();

Rate Limiting by User ID

Partition a limiter so each authenticated user gets their own quota:

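using System.Security.Claims;           // ClaimTypes, FindFirstValue
using System.Threading.RateLimiting;    // RateLimitPartition, FixedWindowRateLimiterOptions

builder.Services.AddRateLimiter(options =>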
{
    options.AddPolicy("per-user", httpContext =>
        RateLimitPartition.GetFixedWindowLimiter(
            partitionKey: httpContext.User.FindFirstValue(ClaimTypes.NameIdentifier)
                          ?? httpContext.Connection.RemoteIpAddress?.ToString()
                          ?? "anonymous",
            factory: _ => new FixedWindowRateLimiterOptions
            {
                Window      = TimeSpan.FromMinutes(1),
                PermitLimit = 100,
                QueueLimit  = 0
            }));
});

Authenticated users get 100 req/min each. Unauthenticated requests share a quota per IP.
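
The fallback chain in the partition key is worth checking on its own. The sketch below pulls it into a hypothetical ResolvePartitionKey helper (not part of ASP.NET Core) so it can run without a web host. It uses FindFirst(...)?.Value instead of FindFirstValue only to stay within the base class library; the user ID and IP address are made-up sample values.

```csharp
using System;
using System.Net;
using System.Security.Claims;

// Hypothetical helper mirroring the key selection in the "per-user" policy.
static string ResolvePartitionKey(ClaimsPrincipal user, IPAddress? remoteIp) =>
    user.FindFirst(ClaimTypes.NameIdentifier)?.Value
    ?? remoteIp?.ToString()
    ?? "anonymous";

var signedIn = new ClaimsPrincipal(new ClaimsIdentity(
    new[] { new Claim(ClaimTypes.NameIdentifier, "user-42") }, authenticationType: "test"));
var anonymous = new ClaimsPrincipal(new ClaimsIdentity());
var ip = IPAddress.Parse("203.0.113.7");

Console.WriteLine(ResolvePartitionKey(signedIn, ip));    // user-42
Console.WriteLine(ResolvePartitionKey(anonymous, ip));   // 203.0.113.7
Console.WriteLine(ResolvePartitionKey(anonymous, null)); // anonymous
```

Note the last case: every request with neither a user ID nor a remote address lands in the single "anonymous" partition and shares one quota.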

Rate Limiting by IP Address

// Goes inside the AddRateLimiter(options => { ... }) callback
options.AddPolicy("per-ip", httpContext =>
    RateLimitPartition.GetSlidingWindowLimiter(
        partitionKey: httpContext.Connection.RemoteIpAddress?.ToString() ?? "unknown",
        factory: _ => new SlidingWindowRateLimiterOptions
        {
            Window            = TimeSpan.FromSeconds(30),
            SegmentsPerWindow = 3,
            PermitLimit       = 30,
            QueueLimit        = 0
        }));

Chaining Limiters (Per-IP + Global)

Use PartitionedRateLimiter.CreateChained when you want multiple independent limits (e.g., per-IP AND global):

using System.Threading.RateLimiting;

var perIpLimiter = PartitionedRateLimiter.Create<HttpContext, string>(ctx =>
    RateLimitPartition.GetFixedWindowLimiter(
        partitionKey: ctx.Connection.RemoteIpAddress?.ToString() ?? "unknown",
        factory: _ => new FixedWindowRateLimiterOptions
        {
            Window = TimeSpan.FromSeconds(10), PermitLimit = 10
        }));

var globalLimiter = PartitionedRateLimiter.Create<HttpContext, string>(_ =>
    RateLimitPartition.GetTokenBucketLimiter(
        partitionKey: "global",
        factory: _ => new TokenBucketRateLimiterOptions
        {
            TokenLimit = 1000, ReplenishmentPeriod = TimeSpan.FromSeconds(1), TokensPerPeriod = 100
        }));

builder.Services.AddRateLimiter(options =>
{
    options.GlobalLimiter = PartitionedRateLimiter.CreateChained(perIpLimiter, globalLimiter);
    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;
});

A request must satisfy both limiters to proceed.
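
The interaction can be seen without a web host by partitioning on a plain string instead of HttpContext. This is a sketch under assumed limits (2 per client, 3 overall); the Limit helper and the client names are made up, and the one-minute window is far longer than the run, so nothing refills mid-demo.

```csharp
using System;
using System.Threading.RateLimiting;

// Hypothetical helper to keep the options short.
static FixedWindowRateLimiterOptions Limit(int permits) => new()
{
    PermitLimit = permits,
    Window = TimeSpan.FromMinutes(1)
};

// Stand-in for the per-IP limiter: one partition per client name.
var perClient = PartitionedRateLimiter.Create<string, string>(name =>
    RateLimitPartition.GetFixedWindowLimiter(name, key => Limit(2)));

// Stand-in for the global limiter: every resource maps to the same partition.
var overall = PartitionedRateLimiter.Create<string, string>(name =>
    RateLimitPartition.GetFixedWindowLimiter("global", key => Limit(3)));

using var chained = PartitionedRateLimiter.CreateChained(perClient, overall);

foreach (var client in new[] { "a", "a", "a", "b", "b" })
{
    using RateLimitLease lease = chained.AttemptAcquire(client);
    Console.WriteLine($"{client}: {(lease.IsAcquired ? "allowed" : "rejected")}");
}
// a: allowed
// a: allowed
// a: rejected   (per-client limit of 2 reached)
// b: allowed
// b: rejected   (overall limit of 3 reached)
```

The limiters are consulted in the order they are passed in. In the last line the per-client limiter grants a permit before the overall limiter refuses, and a fixed window does not refund that permit, so a rejection by a later limiter can still consume quota in an earlier one.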

Full Setup Example

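// Program.cs
using System.Security.Claims;
using System.Threading.RateLimiting;
using Microsoft.AspNetCore.RateLimiting;

var builder = WebApplication.CreateBuilder(args);
builder.Services.AddControllers();
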
builder.Services.AddRateLimiter(options =>
{
    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;

    options.OnRejected = async (ctx, ct) =>
    {
        // Round up so a client that honors the header never retries early
        if (ctx.Lease.TryGetMetadata(MetadataName.RetryAfter, out var retry))
            ctx.HttpContext.Response.Headers.RetryAfter = ((int)Math.Ceiling(retry.TotalSeconds)).ToString();

        ctx.HttpContext.Response.ContentType = "application/json";
        await ctx.HttpContext.Response.WriteAsync(
            """{"error":"rate_limit_exceeded","message":"Slow down, friend."}""", ct);
    };

    // Authenticated: 200/min per user. Anonymous: 20/min per IP.
    options.AddPolicy("adaptive", httpContext =>
    {
        var userId = httpContext.User.FindFirstValue(ClaimTypes.NameIdentifier);
        if (userId is not null)
            return RateLimitPartition.GetFixedWindowLimiter(userId,
                _ => new FixedWindowRateLimiterOptions { Window = TimeSpan.FromMinutes(1), PermitLimit = 200 });

        var ip = httpContext.Connection.RemoteIpAddress?.ToString() ?? "unknown";
        return RateLimitPartition.GetFixedWindowLimiter($"anon:{ip}",
            _ => new FixedWindowRateLimiterOptions { Window = TimeSpan.FromMinutes(1), PermitLimit = 20 });
    });
});

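var app = builder.Build();

app.UseRouting();
// If the app authenticates users, call app.UseAuthentication() here so the policy can read HttpContext.User.
app.UseRateLimiter(); // must come after UseRouting when endpoint-specific policies are used

// Apply "adaptive" to every controller action; [EnableRateLimiting("adaptive")] on individual actions also works.
app.MapControllers().RequireRateLimiting("adaptive");

app.Run();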

Key Takeaways

Fixed window is simplest; sliding window is smoother; token bucket handles bursts gracefully
Concurrency limiter caps parallelism, not throughput — great for downstream bottlenecks
Return 429 rather than the default 503, so clients can distinguish "slow down" from "server broken"
Retry-After tells clients exactly how long to wait — implement it
Partition by user ID when authenticated, fall back to IP for anonymous traffic