.NET & C# Development · Lesson 196 of 229
Case Study: The N+1 Query That Took Down Production
Case Study: The N+1 Query That Took Down Production
System: B2B order management API — 50 customers, ~5,000 orders/day
Stack: ASP.NET Core 8, EF Core 9, PostgreSQL 16, Azure
Incident duration: 47 minutes
Root cause: N+1 query triggered by a new report endpoint
Impact: All API endpoints degraded; database CPU at 100%
Timeline
08:42 New "monthly report" endpoint deployed to production
(included in a larger release, not flagged as high-risk)
09:15 First alert: API P95 latency > 5 seconds (threshold: 500ms)
On-call engineer paged
09:18 Second alert: database CPU > 90%
Database auto-scaling triggered — max instance size already reached
09:23 On-call checks dashboards: all endpoints degraded, not just the report
Hypothesis 1: database overloaded — find the hot query
09:31 pg_stat_activity shows 847 active connections
Maximum pool size: 100 — connection pool exhausted
All API endpoints queuing for a database connection
09:41 Suspect the new report endpoint — it was the only change
Roll back the deployment
09:47 Database CPU drops from 98% to 12%
API latency returns to 40ms P95
Incident closed
09:48 Post-mortem startedThe Code That Caused It
C#
// The new monthly report endpoint — looked innocent in code review
app.MapGet("/api/reports/monthly", async (
int month,
int year,
OrderFlowDbContext db) =>
{
var orders = await db.Orders
.Where(o => o.CreatedAt.Month == month && o.CreatedAt.Year == year)
.ToListAsync(); // fetches all orders for the month
// ← N+1 starts here
var report = orders.Select(o => new OrderReportLine(
OrderId: o.Id,
CustomerName: db.Customers.Find(o.CustomerId)!.Name, // ← 1 query PER ORDER
Total: o.Total,
ItemCount: o.Lines.Count // ← navigation property — lazy loading
)).ToList();
return Results.Ok(report);
});For a month with 200 orders:
1 query: SELECT * FROM orders WHERE month=... AND year=... → 200 rows
200 queries: SELECT * FROM customers WHERE id=1
SELECT * FROM customers WHERE id=2
... (one per order)
200 queries: SELECT * FROM order_items WHERE order_id=1
SELECT * FROM order_items WHERE order_id=2
... (lazy loading Lines)
Total: 401 database queries per report request
During the month-end reporting window:
15 finance users opened the report simultaneously
401 queries × 15 users = 6,015 queries per second
Normal baseline: ~50 queries per second
→ 120x the normal database loadWhy It Passed Code Review
The code review checklist didn't include:
□ Does this endpoint load an unbounded collection?
□ Does it access navigation properties in a loop?
□ Was it tested with production-sized data?
The reviewer saw:
- Clean LINQ
- No obvious bugs
- Correct results on the dev database (12 orders that month)
What they missed:
- db.Customers.Find() inside Select() runs synchronously,
blocking the thread and issuing a synchronous DB call
- o.Lines.Count triggers lazy loading (Lines was not eagerly loaded)
- The endpoint was never load testedRoot Cause Analysis
Contributing factors:
1. N+1 query pattern
Primary cause. EF Core lazy loading + synchronous Find() inside LINQ Select().
2. Unbounded result set
No pagination — returns ALL orders for the month.
200 orders × 401 queries = linear scaling to disaster.
3. No connection pool limit per tenant
All 15 users shared the same pool, starving other endpoints.
4. No query timeout
The long-running report queries held connections for 30+ seconds.
5. No load testing
The endpoint was only tested on a dev DB with 12 orders.
6. Lazy loading enabled globally
db.UseLazyLoadingProxies() was in the DbContext configuration —
a team decision from 2 years ago that nobody questioned.The Fix
C#
// Fixed endpoint — one query, no N+1
app.MapGet("/api/reports/monthly", async (
int month,
int year,
int page,
int pageSize,
OrderFlowDbContext db) =>
{
var report = await db.Orders
.AsNoTracking() // no change tracking needed for reports
.Where(o => o.CreatedAt.Month == month && o.CreatedAt.Year == year)
.OrderByDescending(o => o.CreatedAt)
.Skip((page - 1) * pageSize)
.Take(pageSize)
.Select(o => new OrderReportLine(
OrderId: o.Id,
CustomerName: o.Customer.Name, // JOIN — 0 extra queries
Total: o.Total,
ItemCount: o.Lines.Count)) // COUNT in SQL — 0 extra queries
.ToListAsync();
return Results.Ok(report);
});SQL
-- EF Core now generates ONE query:
SELECT o.id, c.name, o.total, COUNT(l.id)
FROM orders o
INNER JOIN customers c ON c.id = o.customer_id
LEFT JOIN order_items l ON l.order_id = o.id
WHERE EXTRACT(MONTH FROM o.created_at) = 1
AND EXTRACT(YEAR FROM o.created_at) = 2026
GROUP BY o.id, c.name, o.total
ORDER BY o.created_at DESC
LIMIT 50 OFFSET 0Process Changes After the Incident
1. Disabled lazy loading globally
C#
// Before
builder.Services.AddDbContext<AppDbContext>(opts =>
opts.UseLazyLoadingProxies() // removed
.UseNpgsql(connectionString));
// After — explicit eager loading only
builder.Services.AddDbContext<AppDbContext>(opts =>
opts.UseNpgsql(connectionString));2. Added query tag tracing
C#
// Tag every query with its handler name — visible in pg_stat_activity
public override async Task<List<T>> ToListAsync(...)
{
return await base.TagWith(typeof(T).Name).ToListAsync(...);
}3. Added a N+1 detection integration test
C#
[Fact]
public async Task MonthlyReport_ExecutesExactlyOneQuery()
{
var queryCount = 0;
var listener = new QueryCountListener(ref queryCount);
// Intercept EF Core commands
await using var scope = factory.Services.CreateAsyncScope();
var db = scope.ServiceProvider.GetRequiredService<AppDbContext>();
db.Database.GetDbConnection().StateChange += (_, _) => queryCount++;
var response = await client.GetAsync("/api/reports/monthly?month=1&year=2026&page=1&pageSize=50");
queryCount.Should().Be(1, "report endpoint must execute exactly one query");
}4. Added slow query alerting
SQL
-- Postgres slow query logging
ALTER SYSTEM SET log_min_duration_statement = '500'; -- log queries > 500ms
SELECT pg_reload_conf();5. Added connection pool monitoring
C#
// Alert when pool utilisation > 80%
builder.Services.AddHealthChecks()
.AddCheck<DatabaseConnectionPoolHealthCheck>("db-pool");Lessons
1. Lazy loading is a footgun at scale.
Disable it globally. Use .Include() explicitly where needed.
2. "It works on dev" is not "it works in production."
Load test every endpoint that touches an unbounded collection.
3. Pagination is not optional.
Any endpoint returning a list needs LIMIT/OFFSET or keyset pagination.
4. Find() inside LINQ Select() is always N+1.
If you're calling db.Entity.Find() in a loop or in Select(), rewrite as a JOIN.
5. Monitor what matters: query count per request, P99 latency, connection pool usage.
The outage was detectable 30 minutes before it became critical — we just weren't looking.Metrics Before and After
Before fix After fix
Queries/request 401 1
P95 latency 28,000ms 120ms
DB CPU (peak) 98% 14%
Report load time 30+ sec 0.8 sec
Concurrent users 5 (limit) 100+