.NET Performance Optimization: 5 Tricks That Actually Work in Production
Stop guessing with premature optimisation. These five .NET performance techniques — Span<T>, ValueTask, ArrayPool, compiled queries, and IAsyncEnumerable — have measurable impact in real production codebases.
Performance optimisation in .NET is full of misleading advice. Blog posts tell you to "use StringBuilder" or "avoid boxing" — micro-optimisations that make zero measurable difference at the system level.
This article focuses on five techniques that genuinely move the needle: measurable allocations eliminated, real GC pressure reduced, latency percentiles that improve. All are battle-tested in production .NET services.
1. Span<T> and Memory<T> — Zero-Copy String and Buffer Processing
Every time you call string.Substring(), string.Split(), or string.Trim(), .NET allocates a new string object on the heap. In tight loops processing HTTP request paths, CSV rows, or log lines, this generates enormous GC pressure.
Span<T> is a stack-allocated struct that represents a contiguous region of memory — without copying it.
The Problem
// Allocates a new string for every call — 100k req/s = 100k string allocations/s
public string ExtractUserId(string header)
{
// "Bearer eyJhbGciOi..." → "eyJhbGciOi..."
return header.Substring(7);
}The Fix
// Zero allocation — returns a view into the original string's memory
public ReadOnlySpan<char> ExtractUserId(ReadOnlySpan<char> header)
{
return header.Slice(7);
}
// Or using the modern string.AsSpan() API
public bool TryExtractUserId(string header, out ReadOnlySpan<char> token)
{
var span = header.AsSpan();
if (!span.StartsWith("Bearer ", StringComparison.Ordinal))
{
token = ReadOnlySpan<char>.Empty;
return false;
}
token = span.Slice(7);
return true;
}When to Use Memory<T> Instead
Span<T> lives on the stack — it cannot be stored as a field or used across await boundaries. For async code, use Memory<T>:
// Span<T> — fine for synchronous, stack-limited processing
void ProcessSync(ReadOnlySpan<byte> buffer) { ... }
// Memory<T> — for async paths and stored references
async Task ProcessAsync(ReadOnlyMemory<byte> buffer)
{
await _stream.WriteAsync(buffer);
}Real Impact
In a request parsing pipeline processing 50,000 HTTP requests/second, replacing Substring-heavy header parsing with Span<T> can reduce:
- Gen0 GC collections: by 40-70%
- p99 latency: 15-30% improvement (GC pauses gone)
- Allocation rate: from ~800 MB/s to near zero for the parsing path
2. ValueTask — Eliminating Allocations on Hot Async Paths
Task<T> is a reference type — every await on a non-cached task allocates a new Task object on the heap. In high-frequency async code (middleware, caching layers, hot API endpoints), this adds up.
ValueTask<T> is a struct — when the result is synchronously available (cache hit, completed I/O), it allocates nothing.
The Pattern
public interface IUserCache
{
// Use ValueTask when the result is often available synchronously
ValueTask<User?> GetUserAsync(string id, CancellationToken ct = default);
}
public class UserCache : IUserCache
{
private readonly IMemoryCache _cache;
private readonly IUserRepository _repo;
public async ValueTask<User?> GetUserAsync(string id, CancellationToken ct = default)
{
// Cache hit path: synchronous, zero allocation
if (_cache.TryGetValue(id, out User? cached))
return cached;
// Cache miss path: allocates normally (but this is the slow path anyway)
var user = await _repo.GetByIdAsync(id, ct);
if (user is not null)
_cache.Set(id, user, TimeSpan.FromMinutes(5));
return user;
}
}The Rules for ValueTask
ValueTask<T> has constraints that Task<T> doesn't:
// ✅ Await once and discard
var user = await cache.GetUserAsync(id);
// ✅ Check IsCompleted and GetResult() for synchronous result
var vt = cache.GetUserAsync(id);
var user = vt.IsCompleted
? vt.GetAwaiter().GetResult()
: await vt;
// ❌ Never await multiple times
var vt = cache.GetUserAsync(id);
var r1 = await vt;
var r2 = await vt; // undefined behaviour
// ❌ Never store and await later (unless you .AsTask() it)
_storedTask = cache.GetUserAsync(id); // don't do thisWhen to Use ValueTask
Use ValueTask<T> when:
- The hot path returns synchronously (cache hits, in-memory lookups)
- You're implementing an interface that callers might call millions of times
- You've profiled and confirmed
Taskallocation is measurable
Do NOT use ValueTask<T> when:
- The method always goes async (DB calls, network calls with no cache)
- The added complexity isn't justified by measurement
3. ArrayPool<T> — Reuse Buffers Instead of Allocating
Any time you need a temporary byte[] or char[] buffer — for serialisation, encoding, reading from a stream — the naive approach allocates a new array. Large arrays go directly to the Large Object Heap (LOH), which is collected infrequently and causes fragmentation.
ArrayPool<T>.Shared is a thread-safe object pool for arrays.
The Problem
// Allocates 64KB on the LOH on every request
public async Task<string> ReadBodyAsync(HttpRequest request)
{
var buffer = new byte[65536]; // LOH allocation — expensive to collect
var bytesRead = await request.Body.ReadAsync(buffer);
return Encoding.UTF8.GetString(buffer, 0, bytesRead);
}The Fix
public async Task<string> ReadBodyAsync(HttpRequest request)
{
var buffer = ArrayPool<byte>.Shared.Rent(65536); // reused from pool
try
{
var bytesRead = await request.Body.ReadAsync(buffer.AsMemory());
return Encoding.UTF8.GetString(buffer, 0, bytesRead);
}
finally
{
ArrayPool<byte>.Shared.Return(buffer, clearArray: false);
// clearArray: true only if buffer contained sensitive data
}
}Using with RecyclableMemoryStream
For stream-heavy code, combine ArrayPool with Microsoft.IO.RecyclableMemoryStream:
dotnet add package Microsoft.IO.RecyclableMemoryStream// In DI setup
services.AddSingleton<RecyclableMemoryStreamManager>();
// Usage — pooled stream, no LOH fragmentation
public async Task<byte[]> SerializeAsync<T>(T obj)
{
await using var ms = _streamManager.GetStream();
await JsonSerializer.SerializeAsync(ms, obj);
return ms.ToArray();
}RecyclableMemoryStream is used internally by ASP.NET Core and Microsoft Orleans — it eliminates LOH pressure from stream-heavy code almost entirely.
4. EF Core Compiled Queries — Eliminate LINQ-to-SQL Translation Overhead
Every time EF Core executes a LINQ query, it:
- Parses the expression tree
- Translates it to SQL
- Validates the model
- Caches the result (with an internal cache that has bounded size)
For hot queries called thousands of times per second, this translation overhead is measurable. Compiled queries pre-translate the query once and cache it permanently.
The Problem
// This translates the expression tree on EVERY call
public async Task<Order?> GetOrderAsync(int orderId)
{
return await _db.Orders
.Include(o => o.Items)
.FirstOrDefaultAsync(o => o.Id == orderId);
}The Fix
// Compiled at startup — zero translation overhead at runtime
public static class CompiledQueries
{
// EF.CompileAsyncQuery returns a delegate — call it with DbContext + parameters
public static readonly Func<AppDbContext, int, Task<Order?>> GetOrderById =
EF.CompileAsyncQuery((AppDbContext db, int id) =>
db.Orders
.Include(o => o.Items)
.FirstOrDefault(o => o.Id == id));
public static readonly Func<AppDbContext, int, IAsyncEnumerable<Order>> GetOrdersByUser =
EF.CompileAsyncQuery((AppDbContext db, int userId) =>
db.Orders
.Where(o => o.UserId == userId)
.OrderByDescending(o => o.CreatedAt));
}
// Repository usage
public async Task<Order?> GetOrderAsync(int orderId)
{
return await CompiledQueries.GetOrderById(_db, orderId);
}
public IAsyncEnumerable<Order> GetOrdersByUserAsync(int userId)
{
return CompiledQueries.GetOrdersByUser(_db, userId);
}Benchmark Results (BenchmarkDotNet)
| Method | Mean | Allocated |
|---------------------|-----------|-----------|
| StandardQuery | 1,842 μs | 18.4 KB |
| CompiledQuery | 312 μs | 2.1 KB |~6x faster and ~9x less allocation for a simple query called at high frequency. On a service doing 10,000 queries/second, this eliminates ~165MB/s of allocations.
When Compiled Queries Shine
- High-frequency queries: login lookups, product detail fetches, API key validation
- Queries with complex LINQ (multiple includes, conditional filters)
- Microservices where the DB is on the hot path for every request
Limitation
Compiled queries don't support IQueryable composition — the query must be fully defined at compile time. For dynamic filtering (e.g., arbitrary search parameters), use raw SQL with FromSqlRaw or build the query conditionally and rely on EF's plan cache.
5. IAsyncEnumerable<T> — Stream Large Data Sets Without Buffering
When you await a method returning Task<List<T>>, the entire result set must be materialised in memory before the first item is returned to the caller. For large data sets — reports, exports, paginated feeds — this wastes memory and delays first-byte time.
IAsyncEnumerable<T> streams items one at a time, as they arrive from the database.
The Problem
// Loads ALL 50,000 orders into memory before returning
public async Task<List<Order>> GetAllOrdersAsync(int userId)
{
return await _db.Orders
.Where(o => o.UserId == userId)
.ToListAsync(); // materialises everything
}
// Caller also holds the whole list
var orders = await GetAllOrdersAsync(userId);
await WriteToResponseAsync(orders);The Fix
// Streams orders — only one batch in memory at a time
public async IAsyncEnumerable<Order> StreamOrdersAsync(
int userId,
[EnumeratorCancellation] CancellationToken ct = default)
{
await foreach (var order in _db.Orders
.Where(o => o.UserId == userId)
.AsAsyncEnumerable()
.WithCancellation(ct))
{
yield return order;
}
}
// API controller that streams JSON response
[HttpGet("orders/export")]
public async IAsyncEnumerable<OrderDto> ExportOrders(
[FromQuery] int userId,
[EnumeratorCancellation] CancellationToken ct)
{
await foreach (var order in _service.StreamOrdersAsync(userId, ct))
{
yield return new OrderDto(order);
}
// ASP.NET Core streams JSON array items as they arrive
// First item delivered to client in ~10ms, not after 50,000 rows load
}Memory Impact
For 50,000 orders at ~2KB per order:
| Approach | Peak Memory | First-Byte Time |
|----------|------------|----------------|
| ToListAsync | ~100 MB | ~4,000ms |
| IAsyncEnumerable | ~50 KB | ~12ms |
The streaming approach uses 2,000x less memory and delivers the first item 333x faster.
Cancellation Is Critical
Always propagate CancellationToken with [EnumeratorCancellation]:
// If the HTTP client disconnects, the DB query is cancelled immediately
public async IAsyncEnumerable<Order> StreamOrdersAsync(
[EnumeratorCancellation] CancellationToken ct = default)
{
await foreach (var order in _db.Orders.AsAsyncEnumerable().WithCancellation(ct))
yield return order;
}Without cancellation, the query runs to completion even after the client disconnected.
Putting It All Together
Here's a production repository method that uses all five techniques:
public class OrderRepository
{
// Compiled query — no LINQ translation overhead
private static readonly Func<AppDbContext, int, Task<Order?>> _getOrderById =
EF.CompileAsyncQuery((AppDbContext db, int id) =>
db.Orders.Include(o => o.Items).FirstOrDefault(o => o.Id == id));
// ValueTask — sync cache hit = zero allocation
public async ValueTask<Order?> GetByIdAsync(int id, CancellationToken ct = default)
{
if (_cache.TryGetValue(id, out Order? cached))
return cached; // synchronous path, no Task allocation
return await _getOrderById(_db, id);
}
// IAsyncEnumerable — stream large result sets
public async IAsyncEnumerable<OrderSummaryDto> StreamSummariesAsync(
int userId,
[EnumeratorCancellation] CancellationToken ct = default)
{
await foreach (var order in _db.Orders
.Where(o => o.UserId == userId)
.AsAsyncEnumerable()
.WithCancellation(ct))
{
// Span<T> — zero-copy string parsing of order reference
var refSpan = order.Reference.AsSpan();
var prefix = refSpan.Slice(0, 3); // e.g. "ORD"
yield return new OrderSummaryDto(order, prefix.ToString());
}
}
// ArrayPool — reuse buffer for serialisation
public async Task<byte[]> SerializeAsync(Order order)
{
var buffer = ArrayPool<byte>.Shared.Rent(8192);
try
{
using var ms = new MemoryStream(buffer);
await JsonSerializer.SerializeAsync(ms, order);
return ms.ToArray();
}
finally
{
ArrayPool<byte>.Shared.Return(buffer);
}
}
}The Performance Measurement Rule
Before applying any of these techniques: measure first.
Use BenchmarkDotNet for micro-benchmarks and dotMemory / dotnet-trace for production profiling. The question to answer is always: "Where is the CPU time spent? Where are the allocations coming from?"
These five techniques address the most common hot spots in .NET service code. But the most impactful optimisation in your specific service might be something completely different — a missing index, an N+1 query, or a misconfigured connection pool.
Optimise what the profiler tells you, not what a blog post tells you.
Summary
| Technique | What It Solves | When to Use |
|-----------|----------------|-------------|
| Span<T> / Memory<T> | String/buffer allocations in hot loops | Parsing, processing, string manipulation |
| ValueTask<T> | Task allocation on sync-fast paths | Cache layers, frequent async interfaces |
| ArrayPool<T> | LOH pressure from large temporary arrays | Buffer-heavy I/O, serialisation |
| Compiled Queries | LINQ translation overhead per query | High-frequency EF Core queries |
| IAsyncEnumerable<T> | Memory pressure from large result sets | Reports, exports, streaming APIs |
Enjoyed this article?
Explore the Backend Systems learning path for more.
Found this helpful?
Leave a comment
Have a question, correction, or just found this helpful? Leave a note below.