.NET Performance Optimization: 5 Tricks That Actually Work in Production

Performance optimisation in .NET is full of misleading advice. Blog posts tell you to "use StringBuilder" or "avoid boxing" — micro-optimisations that make zero measurable difference at the system level.

This article focuses on five techniques that genuinely move the needle: measurable allocations eliminated, real GC pressure reduced, latency percentiles that improve. All are battle-tested in production .NET services.

1. `Span<T>` and `Memory<T>` — Zero-Copy String and Buffer Processing

Every time you call string.Substring(), string.Split(), or string.Trim(), .NET allocates a new string object on the heap. In tight loops processing HTTP request paths, CSV rows, or log lines, this generates enormous GC pressure.

Span<T> is a stack-allocated struct that represents a contiguous region of memory — without copying it.

The Problem

// Allocates a new string for every call — 100k req/s = 100k string allocations/s
public string ExtractUserId(string header)
{
    // "Bearer eyJhbGciOi..." → "eyJhbGciOi..."
    return header.Substring(7);
}

The Fix

// Zero allocation — returns a view into the original string's memory
public ReadOnlySpan<char> ExtractUserId(ReadOnlySpan<char> header)
{
    return header.Slice(7);
}

// Or using the modern string.AsSpan() API
public bool TryExtractUserId(string header, out ReadOnlySpan<char> token)
{
    var span = header.AsSpan();
    if (!span.StartsWith("Bearer ", StringComparison.Ordinal))
    {
        token = ReadOnlySpan<char>.Empty;
        return false;
    }
    token = span.Slice(7);
    return true;
}

When to Use `Memory<T>` Instead

Span<T> lives on the stack — it cannot be stored as a field or used across await boundaries. For async code, use Memory<T>:

// Span<T> — fine for synchronous, stack-limited processing
void ProcessSync(ReadOnlySpan<byte> buffer) { ... }

// Memory<T> — for async paths and stored references
async Task ProcessAsync(ReadOnlyMemory<byte> buffer)
{
    await _stream.WriteAsync(buffer);
}

Real Impact

In a request parsing pipeline processing 50,000 HTTP requests/second, replacing Substring-heavy header parsing with Span<T> can reduce:

Gen0 GC collections: by 40-70%
p99 latency: 15-30% improvement (GC pauses gone)
Allocation rate: from ~800 MB/s to near zero for the parsing path

2. `ValueTask` — Eliminating Allocations on Hot Async Paths

Task<T> is a reference type — every await on a non-cached task allocates a new Task object on the heap. In high-frequency async code (middleware, caching layers, hot API endpoints), this adds up.

ValueTask<T> is a struct — when the result is synchronously available (cache hit, completed I/O), it allocates nothing.

The Pattern

public interface IUserCache
{
    // Use ValueTask when the result is often available synchronously
    ValueTask<User?> GetUserAsync(string id, CancellationToken ct = default);
}

public class UserCache : IUserCache
{
    private readonly IMemoryCache _cache;
    private readonly IUserRepository _repo;

    public async ValueTask<User?> GetUserAsync(string id, CancellationToken ct = default)
    {
        // Cache hit path: synchronous, zero allocation
        if (_cache.TryGetValue(id, out User? cached))
            return cached;

        // Cache miss path: allocates normally (but this is the slow path anyway)
        var user = await _repo.GetByIdAsync(id, ct);
        if (user is not null)
            _cache.Set(id, user, TimeSpan.FromMinutes(5));
        return user;
    }
}

The Rules for ValueTask

ValueTask<T> has constraints that Task<T> doesn't:

// ✅ Await once and discard
var user = await cache.GetUserAsync(id);

// ✅ Check IsCompleted and GetResult() for synchronous result
var vt = cache.GetUserAsync(id);
var user = vt.IsCompleted
    ? vt.GetAwaiter().GetResult()
    : await vt;

// ❌ Never await multiple times
var vt = cache.GetUserAsync(id);
var r1 = await vt;
var r2 = await vt; // undefined behaviour

// ❌ Never store and await later (unless you .AsTask() it)
_storedTask = cache.GetUserAsync(id); // don't do this

When to Use ValueTask

Use ValueTask<T> when:

The hot path returns synchronously (cache hits, in-memory lookups)
You're implementing an interface that callers might call millions of times
You've profiled and confirmed Task allocation is measurable

Do NOT use ValueTask<T> when:

The method always goes async (DB calls, network calls with no cache)
The added complexity isn't justified by measurement

3. `ArrayPool<T>` — Reuse Buffers Instead of Allocating

Any time you need a temporary byte[] or char[] buffer — for serialisation, encoding, reading from a stream — the naive approach allocates a new array. Large arrays go directly to the Large Object Heap (LOH), which is collected infrequently and causes fragmentation.

ArrayPool<T>.Shared is a thread-safe object pool for arrays.

The Problem

// Allocates 64KB on the LOH on every request
public async Task<string> ReadBodyAsync(HttpRequest request)
{
    var buffer = new byte[65536]; // LOH allocation — expensive to collect
    var bytesRead = await request.Body.ReadAsync(buffer);
    return Encoding.UTF8.GetString(buffer, 0, bytesRead);
}

The Fix

public async Task<string> ReadBodyAsync(HttpRequest request)
{
    var buffer = ArrayPool<byte>.Shared.Rent(65536); // reused from pool
    try
    {
        var bytesRead = await request.Body.ReadAsync(buffer.AsMemory());
        return Encoding.UTF8.GetString(buffer, 0, bytesRead);
    }
    finally
    {
        ArrayPool<byte>.Shared.Return(buffer, clearArray: false);
        // clearArray: true only if buffer contained sensitive data
    }
}

Using with `RecyclableMemoryStream`

For stream-heavy code, combine ArrayPool with Microsoft.IO.RecyclableMemoryStream:

Bash

dotnet add package Microsoft.IO.RecyclableMemoryStream

// In DI setup
services.AddSingleton<RecyclableMemoryStreamManager>();

// Usage — pooled stream, no LOH fragmentation
public async Task<byte[]> SerializeAsync<T>(T obj)
{
    await using var ms = _streamManager.GetStream();
    await JsonSerializer.SerializeAsync(ms, obj);
    return ms.ToArray();
}

RecyclableMemoryStream is used internally by ASP.NET Core and Microsoft Orleans — it eliminates LOH pressure from stream-heavy code almost entirely.

4. EF Core Compiled Queries — Eliminate LINQ-to-SQL Translation Overhead

Every time EF Core executes a LINQ query, it:

Parses the expression tree
Translates it to SQL
Validates the model
Caches the result (with an internal cache that has bounded size)

For hot queries called thousands of times per second, this translation overhead is measurable. Compiled queries pre-translate the query once and cache it permanently.

The Problem

// This translates the expression tree on EVERY call
public async Task<Order?> GetOrderAsync(int orderId)
{
    return await _db.Orders
        .Include(o => o.Items)
        .FirstOrDefaultAsync(o => o.Id == orderId);
}

The Fix

// Compiled at startup — zero translation overhead at runtime
public static class CompiledQueries
{
    // EF.CompileAsyncQuery returns a delegate — call it with DbContext + parameters
    public static readonly Func<AppDbContext, int, Task<Order?>> GetOrderById =
        EF.CompileAsyncQuery((AppDbContext db, int id) =>
            db.Orders
              .Include(o => o.Items)
              .FirstOrDefault(o => o.Id == id));

    public static readonly Func<AppDbContext, int, IAsyncEnumerable<Order>> GetOrdersByUser =
        EF.CompileAsyncQuery((AppDbContext db, int userId) =>
            db.Orders
              .Where(o => o.UserId == userId)
              .OrderByDescending(o => o.CreatedAt));
}

// Repository usage
public async Task<Order?> GetOrderAsync(int orderId)
{
    return await CompiledQueries.GetOrderById(_db, orderId);
}

public IAsyncEnumerable<Order> GetOrdersByUserAsync(int userId)
{
    return CompiledQueries.GetOrdersByUser(_db, userId);
}

Benchmark Results (BenchmarkDotNet)

| Method              | Mean       | Allocated |
|---------------------|-----------|-----------|
| StandardQuery       | 1,842 μs  | 18.4 KB   |
| CompiledQuery       |   312 μs  |  2.1 KB   |

~6x faster and ~9x less allocation for a simple query called at high frequency. On a service doing 10,000 queries/second, this eliminates ~165MB/s of allocations.

When Compiled Queries Shine

High-frequency queries: login lookups, product detail fetches, API key validation
Queries with complex LINQ (multiple includes, conditional filters)
Microservices where the DB is on the hot path for every request

Limitation

Compiled queries don't support IQueryable composition — the query must be fully defined at compile time. For dynamic filtering (e.g., arbitrary search parameters), use raw SQL with FromSqlRaw or build the query conditionally and rely on EF's plan cache.

5. `IAsyncEnumerable<T>` — Stream Large Data Sets Without Buffering

When you await a method returning Task<List<T>>, the entire result set must be materialised in memory before the first item is returned to the caller. For large data sets — reports, exports, paginated feeds — this wastes memory and delays first-byte time.

IAsyncEnumerable<T> streams items one at a time, as they arrive from the database.

The Problem

// Loads ALL 50,000 orders into memory before returning
public async Task<List<Order>> GetAllOrdersAsync(int userId)
{
    return await _db.Orders
        .Where(o => o.UserId == userId)
        .ToListAsync(); // materialises everything
}

// Caller also holds the whole list
var orders = await GetAllOrdersAsync(userId);
await WriteToResponseAsync(orders);

The Fix

// Streams orders — only one batch in memory at a time
public async IAsyncEnumerable<Order> StreamOrdersAsync(
    int userId,
    [EnumeratorCancellation] CancellationToken ct = default)
{
    await foreach (var order in _db.Orders
        .Where(o => o.UserId == userId)
        .AsAsyncEnumerable()
        .WithCancellation(ct))
    {
        yield return order;
    }
}

// API controller that streams JSON response
[HttpGet("orders/export")]
public async IAsyncEnumerable<OrderDto> ExportOrders(
    [FromQuery] int userId,
    [EnumeratorCancellation] CancellationToken ct)
{
    await foreach (var order in _service.StreamOrdersAsync(userId, ct))
    {
        yield return new OrderDto(order);
    }
    // ASP.NET Core streams JSON array items as they arrive
    // First item delivered to client in ~10ms, not after 50,000 rows load
}

Memory Impact

For 50,000 orders at ~2KB per order:

| Approach | Peak Memory | First-Byte Time | |----------|------------|----------------| | ToListAsync | ~100 MB | ~4,000ms | | IAsyncEnumerable | ~50 KB | ~12ms |

The streaming approach uses 2,000x less memory and delivers the first item 333x faster.

Cancellation Is Critical

Always propagate CancellationToken with [EnumeratorCancellation]:

// If the HTTP client disconnects, the DB query is cancelled immediately
public async IAsyncEnumerable<Order> StreamOrdersAsync(
    [EnumeratorCancellation] CancellationToken ct = default)
{
    await foreach (var order in _db.Orders.AsAsyncEnumerable().WithCancellation(ct))
        yield return order;
}

Without cancellation, the query runs to completion even after the client disconnected.

Putting It All Together

Here's a production repository method that uses all five techniques:

public class OrderRepository
{
    // Compiled query — no LINQ translation overhead
    private static readonly Func<AppDbContext, int, Task<Order?>> _getOrderById =
        EF.CompileAsyncQuery((AppDbContext db, int id) =>
            db.Orders.Include(o => o.Items).FirstOrDefault(o => o.Id == id));

    // ValueTask — sync cache hit = zero allocation
    public async ValueTask<Order?> GetByIdAsync(int id, CancellationToken ct = default)
    {
        if (_cache.TryGetValue(id, out Order? cached))
            return cached; // synchronous path, no Task allocation

        return await _getOrderById(_db, id);
    }

    // IAsyncEnumerable — stream large result sets
    public async IAsyncEnumerable<OrderSummaryDto> StreamSummariesAsync(
        int userId,
        [EnumeratorCancellation] CancellationToken ct = default)
    {
        await foreach (var order in _db.Orders
            .Where(o => o.UserId == userId)
            .AsAsyncEnumerable()
            .WithCancellation(ct))
        {
            // Span<T> — zero-copy string parsing of order reference
            var refSpan = order.Reference.AsSpan();
            var prefix  = refSpan.Slice(0, 3);   // e.g. "ORD"

            yield return new OrderSummaryDto(order, prefix.ToString());
        }
    }

    // ArrayPool — reuse buffer for serialisation
    public async Task<byte[]> SerializeAsync(Order order)
    {
        var buffer = ArrayPool<byte>.Shared.Rent(8192);
        try
        {
            using var ms = new MemoryStream(buffer);
            await JsonSerializer.SerializeAsync(ms, order);
            return ms.ToArray();
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(buffer);
        }
    }
}

The Performance Measurement Rule

Before applying any of these techniques: measure first.

Use BenchmarkDotNet for micro-benchmarks and dotMemory / dotnet-trace for production profiling. The question to answer is always: "Where is the CPU time spent? Where are the allocations coming from?"

These five techniques address the most common hot spots in .NET service code. But the most impactful optimisation in your specific service might be something completely different — a missing index, an N+1 query, or a misconfigured connection pool.

Optimise what the profiler tells you, not what a blog post tells you.

Summary

| Technique | What It Solves | When to Use | |-----------|----------------|-------------| | Span<T> / Memory<T> | String/buffer allocations in hot loops | Parsing, processing, string manipulation | | ValueTask<T> | Task allocation on sync-fast paths | Cache layers, frequent async interfaces | | ArrayPool<T> | LOH pressure from large temporary arrays | Buffer-heavy I/O, serialisation | | Compiled Queries | LINQ translation overhead per query | High-frequency EF Core queries | | IAsyncEnumerable<T> | Memory pressure from large result sets | Reports, exports, streaming APIs |

.NET Performance Optimization: 5 Tricks That Actually Work in Production

1. `Span<T>` and `Memory<T>` — Zero-Copy String and Buffer Processing

The Problem

The Fix

When to Use `Memory<T>` Instead

Real Impact

2. `ValueTask` — Eliminating Allocations on Hot Async Paths

The Pattern

The Rules for ValueTask

When to Use ValueTask

3. `ArrayPool<T>` — Reuse Buffers Instead of Allocating

The Problem

The Fix

Using with `RecyclableMemoryStream`

4. EF Core Compiled Queries — Eliminate LINQ-to-SQL Translation Overhead

The Problem

The Fix

Benchmark Results (BenchmarkDotNet)

When Compiled Queries Shine

Limitation

5. `IAsyncEnumerable<T>` — Stream Large Data Sets Without Buffering

The Problem

The Fix

Memory Impact

Cancellation Is Critical

Putting It All Together

The Performance Measurement Rule

Summary

Enjoyed this article?

Leave a comment

1. Span<T> and Memory<T> — Zero-Copy String and Buffer Processing

The Problem

The Fix

When to Use Memory<T> Instead

Real Impact

2. ValueTask — Eliminating Allocations on Hot Async Paths

The Pattern

The Rules for ValueTask

When to Use ValueTask

3. ArrayPool<T> — Reuse Buffers Instead of Allocating

The Problem

The Fix

Using with RecyclableMemoryStream

4. EF Core Compiled Queries — Eliminate LINQ-to-SQL Translation Overhead

The Problem

The Fix

Benchmark Results (BenchmarkDotNet)

When Compiled Queries Shine

Limitation

5. IAsyncEnumerable<T> — Stream Large Data Sets Without Buffering

The Problem

The Fix

Memory Impact

Cancellation Is Critical

Putting It All Together

The Performance Measurement Rule

Summary

Enjoyed this article?

Leave a comment

1. `Span<T>` and `Memory<T>` — Zero-Copy String and Buffer Processing

When to Use `Memory<T>` Instead

2. `ValueTask` — Eliminating Allocations on Hot Async Paths

3. `ArrayPool<T>` — Reuse Buffers Instead of Allocating

Using with `RecyclableMemoryStream`

5. `IAsyncEnumerable<T>` — Stream Large Data Sets Without Buffering