BenchmarkDotNet — Measure Performance and Allocations in .NET

The rule of performance work is: measure first, optimise second, measure again. BenchmarkDotNet is the standard tool for measuring .NET code. It handles JIT warm-up, statistical analysis, memory allocation tracking, and comparison across implementations, all things a hand-rolled Stopwatch loop tends to get wrong.

What you'll learn:

Writing your first benchmark
Memory allocation diagnostics
Comparing multiple implementations
Parameterised benchmarks
Interpreting results correctly
Common benchmarking mistakes to avoid
Running benchmarks in CI

Setup

Bash

dotnet new console -n MyApp.Benchmarks
cd MyApp.Benchmarks
dotnet add package BenchmarkDotNet

BenchmarkDotNet benchmarks must run in Release mode — it enforces this. Debug mode gives meaningless numbers.

1. Your First Benchmark

// Benchmarks/StringBenchmarks.cs
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[MemoryDiagnoser]           // track allocations
[RankColumn]                // rank results best→worst
public class StringBenchmarks
{
    private readonly string[] _words = Enumerable
        .Range(0, 1000)
        .Select(i => $"word{i}")
        .ToArray();

    [Benchmark(Baseline = true)]
    public string StringConcat()
    {
        string result = "";
        foreach (var word in _words)
            result += word + " ";
        return result;
    }

    [Benchmark]
    public string StringBuilder()
    {
        var sb = new System.Text.StringBuilder();
        foreach (var word in _words)
            sb.Append(word).Append(' ');
        return sb.ToString();
    }

    [Benchmark]
    public string StringJoin() => string.Join(" ", _words);

    [Benchmark]
    public string StringCreate()
    {
        int totalLength = _words.Sum(w => w.Length + 1);
        return string.Create(totalLength, _words, (span, words) =>
        {
            int pos = 0;
            foreach (var word in words)
            {
                word.AsSpan().CopyTo(span[pos..]);
                pos += word.Length;
                span[pos++] = ' ';
            }
        });
    }
}

// Program.cs
using BenchmarkDotNet.Running;

BenchmarkRunner.Run<StringBenchmarks>();

Bash

dotnet run -c Release
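
One detail worth knowing: BenchmarkRunner.Run<T>() called without arguments does not read the command line, so switches passed after `--` (such as --filter or --exporters) have no effect. A minimal sketch of an alternative entry point that forwards them, assuming Program.cs uses top-level statements:

```csharp
// Program.cs (alternative entry point, a sketch)
using BenchmarkDotNet.Running;

// Discovers every benchmark class in this assembly and lets command-line
// switches (for example --filter *StringBenchmarks* or --exporters json)
// decide what runs and how results are exported.
BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly).Run(args);
```

With this in place, `dotnet run -c Release -- --filter '*StringBenchmarks*'` runs only the matching benchmarks.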

Typical output

| Method        | Mean       | Error    | StdDev   | Ratio | Rank | Allocated |
|-------------- |-----------:|---------:|---------:|------:|-----:|----------:|
| StringConcat  | 2,341.3 us | 21.45 us | 20.06 us |  1.00 |    4 | 3,906 KB  |
| StringBuilder |    52.1 us |  0.41 us |  0.38 us |  0.02 |    3 |    88 KB  |
| StringJoin    |    48.7 us |  0.32 us |  0.30 us |  0.02 |    1 |    88 KB  |
| StringCreate  |    49.2 us |  0.29 us |  0.27 us |  0.02 |    2 |    16 KB  |

StringConcat allocates 44x more memory than StringJoin and takes 48x longer. Without the benchmark you'd have to trust intuition — here you have numbers.

2. MemoryDiagnoser — Allocation Tracking

[MemoryDiagnoser] adds the Allocated column. It reports the total bytes allocated per operation (GC-collected objects included). This is the most important column for hot-path optimisation.

[MemoryDiagnoser]
public class ParseBenchmarks
{
    private const string CsvLine = "2026-05-26,ORDER-123,450.00,GBP";

    [Benchmark(Baseline = true)]
    public (DateTime, string, decimal, string) ParseWithSubstring()
    {
        var parts = CsvLine.Split(',');  // allocates a string[] plus one string per field
        return (
            DateTime.Parse(parts[0]),   // parses the already-allocated substring
            parts[1],
            decimal.Parse(parts[2]),
            parts[3]
        );
    }

    [Benchmark]
    public (DateTime, string, decimal, string) ParseWithSpan()
    {
        var span = CsvLine.AsSpan();
        // No string allocations during parsing
        int i1 = span.IndexOf(',');
        var datePart = span[..i1];

        span = span[(i1 + 1)..];
        int i2 = span.IndexOf(',');
        var orderPart = span[..i2];

        span = span[(i2 + 1)..];
        int i3 = span.IndexOf(',');
        var amountPart = span[..i3];
        var currencyPart = span[(i3 + 1)..];

        return (
            DateTime.Parse(datePart),
            orderPart.ToString(),       // allocate string only when storing
            decimal.Parse(amountPart),
            currencyPart.ToString()
        );
    }
}

| Method             | Mean     | Allocated |
|------------------- |---------:|----------:|
| ParseWithSubstring | 312.4 ns |   472 B   |
| ParseWithSpan      |  98.3 ns |    96 B   |

About 5x less memory allocated and 3x faster, and the only change was avoiding Split and its intermediate substrings.
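
A speed comparison only means something if both methods compute the same answer, so it is worth asserting that before benchmarking. The sketch below is one way to do it. The CsvParsers class and its method names are hypothetical stand-ins for the two benchmark bodies above, and CultureInfo.InvariantCulture is passed explicitly (an addition, not in the original listing) so the result does not depend on the machine's locale.

```csharp
using System;
using System.Globalization;

public static class CsvParsers
{
    public const string CsvLine = "2026-05-26,ORDER-123,450.00,GBP";

    // Split-based version: allocates a string[] plus one string per field.
    public static (DateTime, string, decimal, string) ParseWithSplit(string line)
    {
        var parts = line.Split(',');
        return (
            DateTime.Parse(parts[0], CultureInfo.InvariantCulture),
            parts[1],
            decimal.Parse(parts[2], CultureInfo.InvariantCulture),
            parts[3]);
    }

    // Span-based version: slices the input and allocates only the two strings it returns.
    public static (DateTime, string, decimal, string) ParseWithSpan(string line)
    {
        ReadOnlySpan<char> span = line.AsSpan();

        int i1 = span.IndexOf(',');
        var datePart = span[..i1];

        span = span[(i1 + 1)..];
        int i2 = span.IndexOf(',');
        var orderPart = span[..i2];

        span = span[(i2 + 1)..];
        int i3 = span.IndexOf(',');
        var amountPart = span[..i3];
        var currencyPart = span[(i3 + 1)..];

        return (
            DateTime.Parse(datePart, CultureInfo.InvariantCulture),
            orderPart.ToString(),
            decimal.Parse(amountPart, NumberStyles.Number, CultureInfo.InvariantCulture),
            currencyPart.ToString());
    }

    // True when both implementations agree on the sample line.
    public static bool ResultsMatch() =>
        ParseWithSplit(CsvLine) == ParseWithSpan(CsvLine);
}
```

Calling CsvParsers.ResultsMatch() from a unit test, or once from [GlobalSetup], catches a mismatch before any time is spent measuring.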

3. Parameters — Testing Multiple Inputs

[MemoryDiagnoser]
public class CollectionBenchmarks
{
    [Params(10, 100, 1000, 10_000)]
    public int N;

    private int[] _data = null!;

    [GlobalSetup]
    public void Setup()
    {
        _data = Enumerable.Range(0, N).ToArray();
    }

    [Benchmark(Baseline = true)]
    public List<int> LinqSelect() =>
        _data.Select(x => x * 2).ToList();

    [Benchmark]
    public List<int> ForLoop()
    {
        var result = new List<int>(_data.Length);
        for (int i = 0; i < _data.Length; i++)
            result.Add(_data[i] * 2);
        return result;
    }

    [Benchmark]
    public int[] ArrayFor()
    {
        var result = new int[_data.Length];
        for (int i = 0; i < _data.Length; i++)
            result[i] = _data[i] * 2;
        return result;
    }
}

BenchmarkDotNet runs every method with every parameter combination, giving you a matrix of results. At small N the differences are negligible; at large N the allocations of List vs Array become visible.

4. GlobalSetup and IterationSetup

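using System;
using System.Text.Json;
using BenchmarkDotNet.Attributes;

// Hypothetical DTO shape, assumed here so the example compiles
public record OrderDto(Guid Id, string CustomerId, int Quantity, decimal Total, string Currency);

public class JsonBenchmarks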
{
    private string _json = null!;
    private OrderDto _order = null!;

    [GlobalSetup]               // runs once per benchmark method, before measurement starts
    public void Setup()
    {
        _order = new OrderDto(Guid.NewGuid(), "CUST-1", 10, 450.00m, "GBP");
        _json = JsonSerializer.Serialize(_order);
    }

    [Benchmark]
    public string Serialize() => JsonSerializer.Serialize(_order);

    [Benchmark]
    public OrderDto? Deserialize() => JsonSerializer.Deserialize<OrderDto>(_json);

    [Benchmark]
    public string SerializeWithOptions() =>
        JsonSerializer.Serialize(_order, CachedOptions.Default);

    // Cache the options — creating JsonSerializerOptions is expensive
    private static class CachedOptions
    {
        public static readonly JsonSerializerOptions Default = new()
        {
            PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
            WriteIndented = false,
        };
    }
}

[GlobalSetup] keeps setup work out of the measurement. [IterationSetup] runs before each iteration; use it sparingly, because it forces a single invocation per iteration, which hurts statistical accuracy for very fast benchmarks.
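
The listing above only uses [GlobalSetup]. A case where [IterationSetup] is hard to avoid is a benchmark that mutates its input, such as an in-place sort: after the first call the array is already sorted, so later calls would measure a different workload. A minimal sketch, assuming a hypothetical SortBenchmarks class, an array size of 100,000 and a fixed seed chosen purely for illustration:

```csharp
using System;
using BenchmarkDotNet.Attributes;

public class SortBenchmarks
{
    private int[] _source = null!;
    private int[] _scratch = null!;

    [GlobalSetup]               // runs once: build the unsorted input
    public void BuildInput()
    {
        var random = new Random(42);   // fixed seed so every run sorts the same data
        _source = new int[100_000];
        for (int i = 0; i < _source.Length; i++)
            _source[i] = random.Next();
        _scratch = new int[_source.Length];
    }

    [IterationSetup]            // runs before each iteration: restore the unsorted order
    public void ResetScratch() => Array.Copy(_source, _scratch, _source.Length);

    [Benchmark]
    public int SortInPlace()
    {
        Array.Sort(_scratch);
        return _scratch[0];     // return a value so the work is observably used
    }
}
```

Because the copy happens in [IterationSetup], it is excluded from the timing. The trade-off is that each iteration is a single call, so the benchmark body should do enough work per call for timer overhead to be negligible.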

5. Comparing EF Core Query Strategies

[MemoryDiagnoser]
public class EfCoreBenchmarks
{
    private AppDbContext _context = null!;

    [GlobalSetup]
    public void Setup()
    {
        var options = new DbContextOptionsBuilder<AppDbContext>()
            .UseNpgsql("Host=localhost;Database=bench;Username=app;Password=secret")
            .Options;
        _context = new AppDbContext(options);
    }

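    [Benchmark(Baseline = true)]
    public async Task<List<OrderSummaryDto>> NormalQuery()
    {
        // No Include here: the projection already reads o.Lines.Count, and
        // EF Core ignores Include when the query ends in a Select projection.
        return await _context.Orders
            .Select(o => new OrderSummaryDto(o.Id, o.Total, o.Lines.Count))
            .ToListAsync();
    }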

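    [Benchmark]
    public async Task<List<OrderSummaryDto>> CompiledQuery()
    {
        // The compiled query returns IAsyncEnumerable<T>. EF Core's ToListAsync
        // extension targets IQueryable<T>, so drain the stream manually.
        var results = new List<OrderSummaryDto>();
        await foreach (var summary in _compiledQuery(_context))
            results.Add(summary);
        return results;
    }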

    private static readonly Func<AppDbContext, IAsyncEnumerable<OrderSummaryDto>>
        _compiledQuery = EF.CompileAsyncQuery((AppDbContext ctx) =>
            ctx.Orders
                .Select(o => new OrderSummaryDto(o.Id, o.Total, o.Lines.Count)));

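    [Benchmark]
    public async Task<List<OrderSummaryDto>> RawSql()
    {
        // Mapping SqlQuery<T> to an unmapped type such as a DTO needs EF Core 8 or later,
        // and the result columns must map onto OrderSummaryDto's properties.
        return await _context.Database
            .SqlQuery<OrderSummaryDto>(
                $"SELECT id, total, (SELECT COUNT(*) FROM order_lines WHERE order_id = o.id) AS line_count FROM orders o")
            .ToListAsync();
    }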

    [GlobalCleanup]
    public void Cleanup() => _context.Dispose();
}

6. Interpreting Results Correctly

Mean vs Median

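BenchmarkDotNet reports Mean by default. For latency measurements, outliers (GC pauses, OS interrupts) can inflate the mean. Add [MedianColumn] to report the median alongside it, and use exporters and statistical columns to inspect the distribution:

[MedianColumn]           // always show the median next to the mean
[RPlotExporter]          // generates distribution charts (requires R to be installed)
[HtmlExporter]           // HTML report
[StatisticalTestColumn]  // statistical significance
public class MyBenchmarks { }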

Error and StdDev

| Method | Mean    | Error   | StdDev  |
|------- |--------:|--------:|--------:|
| Fast   | 10.1 ns | 0.05 ns | 0.04 ns |
| Slow   | 12.3 ns | 2.41 ns | 2.25 ns |

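Error is half of the 99.9% confidence interval for the mean; StdDev is the standard deviation of the measurements. A StdDev of 2.25 ns on a mean of 12.3 ns (18% relative deviation) means the measurement is unstable. Likely causes: GC interference, memory pressure, non-deterministic branching. Run longer or investigate what's causing the variance.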
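
The 18% figure is the relative standard deviation: StdDev divided by Mean. A small sketch of that arithmetic follows; the ResultStability helper and its 5% default threshold are illustrative assumptions, not BenchmarkDotNet settings.

```csharp
using System;

public static class ResultStability
{
    // Relative standard deviation as a percentage of the mean.
    public static double RelativeStdDevPercent(double mean, double stdDev) =>
        stdDev / mean * 100.0;

    // Illustrative rule of thumb: flag results whose spread exceeds the threshold.
    public static bool IsNoisy(double mean, double stdDev, double thresholdPercent = 5.0) =>
        RelativeStdDevPercent(mean, stdDev) > thresholdPercent;
}
```

For the table above this gives roughly 0.4% for Fast (0.04 / 10.1) and 18.3% for Slow (2.25 / 12.3), so only Slow would be flagged.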

Ratio

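The Ratio column (enabled by [Benchmark(Baseline = true)]) shows performance relative to the baseline: 0.02 means the method takes 2% of the baseline's time, i.e. it is 50x faster. Ratios usually transfer between machines better than absolute times, although they are not fully machine-independent.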

7. Common Mistakes

Running without -c Release: Debug builds are compiled without JIT optimisations and are typically several times slower, so numbers from debug mode are meaningless. BenchmarkDotNet validates this and, by default, refuses to run a non-optimised build.

Benchmarking setup work inside the benchmark method:

// Wrong — measures object creation, not the method under test
[Benchmark]
public string Wrong()
{
    var data = GenerateTestData();  // this is being measured too
    return Process(data);
}

// Correct
private string _data = null!;

[GlobalSetup]
public void Setup() => _data = GenerateTestData();

[Benchmark]
public string Correct() => Process(_data);

Dead code elimination: The JIT can eliminate code that has no observable side effects. BenchmarkDotNet's harness guards against this by consuming each benchmark's return value, but it cannot help a micro-benchmark that computes a result and never returns it:

// Wrong — JIT may eliminate the computation
[Benchmark]
public void Wrong()
{
    int sum = 0;
    for (int i = 0; i < 1000; i++)
        sum += i;
    // sum is never used, so the JIT may eliminate the loop
}

// Correct — return the result
[Benchmark]
public int Correct()
{
    int sum = 0;
    for (int i = 0; i < 1000; i++)
        sum += i;
    return sum;
}

Benchmarking too small a unit: A benchmark that runs in 1–2 nanoseconds is measuring noise, not your code. Keep benchmarks in the 10ns–10ms range. For sub-nanosecond operations, benchmark larger batches.
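
One way to batch a very small operation is to repeat it inside the benchmark method and tell BenchmarkDotNet how many operations each call performs through OperationsPerInvoke, so the reported time is still per operation. A minimal sketch; the BatchedBenchmarks class, the batch size of 1,000 and the xorshift step used as the "tiny operation" are illustrative assumptions:

```csharp
using BenchmarkDotNet.Attributes;

public class BatchedBenchmarks
{
    private const int Batch = 1_000;
    private uint _seed = 12345;

    // Each call performs Batch operations; BenchmarkDotNet divides the
    // measured time by OperationsPerInvoke when reporting results.
    [Benchmark(OperationsPerInvoke = Batch)]
    public uint XorShiftBatch()
    {
        uint x = _seed;
        for (int i = 0; i < Batch; i++)
        {
            // One xorshift32 step. Each iteration depends on the previous one
            // and the seed comes from a field, so the loop is not a constant.
            x ^= x << 13;
            x ^= x >> 17;
            x ^= x << 5;
        }
        return x;   // returned so the JIT cannot discard the loop
    }
}
```

The per-operation figure still includes a share of the loop overhead, so treat it as an upper bound rather than an exact cost.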

8. Running in CI

Benchmarks are slow (minutes, not seconds) — don't run them on every commit. Instead:

YAML

# Run on a schedule, or when performance-related files change
on:
  schedule:
    - cron: '0 2 * * 1'  # weekly, Monday 02:00 UTC
  push:
    paths:
      - 'src/MyApp.Core/**'

jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-dotnet@v4
        with: { dotnet-version: '9.0.x' }
      - run: dotnet run --project benchmarks/MyApp.Benchmarks -c Release -- --exporters json
      - uses: actions/upload-artifact@v4
        with:
          name: benchmark-results
          path: BenchmarkDotNet.Artifacts/

For regression detection, compare the JSON output between runs using github-action-benchmark or a custom comparison script.
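
A custom comparison can be quite small. The sketch below assumes each JSON report contains a Benchmarks array whose entries carry a FullName string and a Statistics.Mean value in nanoseconds; check that assumption against a real report before relying on it. The RegressionCheck class and the 10% threshold are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class RegressionCheck
{
    // Reads FullName -> Statistics.Mean (assumed to be nanoseconds) from a report.
    // Assumes every entry has a Statistics object; a failed benchmark may not.
    public static Dictionary<string, double> ReadMeans(string json)
    {
        var means = new Dictionary<string, double>();
        using JsonDocument doc = JsonDocument.Parse(json);
        foreach (JsonElement b in doc.RootElement.GetProperty("Benchmarks").EnumerateArray())
        {
            string name = b.GetProperty("FullName").GetString() ?? "";
            double mean = b.GetProperty("Statistics").GetProperty("Mean").GetDouble();
            means[name] = mean;
        }
        return means;
    }

    // Returns the benchmarks whose mean grew by more than maxSlowdown (0.10 = 10%).
    public static List<string> FindRegressions(
        string baselineJson, string currentJson, double maxSlowdown = 0.10)
    {
        var baseline = ReadMeans(baselineJson);
        var regressions = new List<string>();
        foreach (var (name, mean) in ReadMeans(currentJson))
        {
            if (baseline.TryGetValue(name, out double before) &&
                mean > before * (1.0 + maxSlowdown))
            {
                regressions.Add(name);
            }
        }
        return regressions;
    }
}
```

In CI you would load the previous and current reports with File.ReadAllText and fail the job when the returned list is not empty. Pick the threshold with the StdDev of your benchmarks in mind, otherwise noise on shared runners will trigger false alarms.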
