.NET & C# Development · Lesson 68 of 92
Measure Before You Optimize — BenchmarkDotNet in Practice
Why You Need Benchmarks
Stopwatch in a console app is not a benchmark. It doesn't control for JIT warm-up, CPU frequency scaling, GC interference, or branch prediction. It will mislead you.
BenchmarkDotNet handles all of that:
- Multiple iterations with statistical analysis (mean, stddev, confidence intervals)
- Separate warmup and measurement phases
- GC collection counts per operation
- Memory allocation tracking per operation
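For contrast, here is the kind of naive Stopwatch harness this section warns against, as a minimal sketch (the `NaiveTiming` class and its `Concat` workload are illustrative names, not part of any library). The first timed call includes JIT compilation, there is no warmup or repetition, and GC pauses land wherever they land:

```csharp
using System;
using System.Diagnostics;

public static class NaiveTiming
{
    // Toy workload: repeated string concatenation.
    static string Concat(int n)
    {
        string s = "";
        for (int i = 0; i < n; i++) s += i;
        return s;
    }

    public static (double ColdMs, double WarmMs) Measure()
    {
        // Cold: this single sample also pays the JIT-compilation cost.
        var sw = Stopwatch.StartNew();
        Concat(1000);
        sw.Stop();
        double cold = sw.Elapsed.TotalMilliseconds;

        // Warm: the same code on its second call, typically much faster.
        sw.Restart();
        Concat(1000);
        sw.Stop();
        return (cold, sw.Elapsed.TotalMilliseconds);
    }

    public static void Main()
    {
        var (cold, warm) = Measure();
        Console.WriteLine($"cold: {cold:F3} ms, warm: {warm:F3} ms");
    }
}
```

Run this twice and the two numbers disagree for identical code; neither one is "the" cost of `Concat`, which is exactly the problem BenchmarkDotNet's warmup phases and statistical analysis solve.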
Install:
```shell
dotnet add package BenchmarkDotNet
```

Your First Benchmark

```csharp
using System.Linq;
using System.Text;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

// Entry point (top-level statements must precede the type declaration)
BenchmarkRunner.Run<StringConcatBenchmarks>();

[MemoryDiagnoser]
public class StringConcatBenchmarks
{
    private const int Iterations = 1000;

    [Benchmark(Baseline = true)]
    public string PlusOperator()
    {
        string result = "";
        for (int i = 0; i < Iterations; i++)
            result += i.ToString();
        return result;
    }

    [Benchmark]
    public string StringBuilderConcat()
    {
        var sb = new StringBuilder();
        for (int i = 0; i < Iterations; i++)
            sb.Append(i);
        return sb.ToString();
    }

    [Benchmark]
    public string StringJoin() =>
        string.Join("", Enumerable.Range(0, Iterations));
}
```

Run in Release mode only. Debug mode disables optimisations and makes the results meaningless:
```shell
dotnet run -c Release
```

GlobalSetup and IterationSetup
Use [GlobalSetup] for one-time initialisation (runs once before all iterations). Use [IterationSetup] when each iteration needs a fresh state (runs before every measurement iteration).
```csharp
[MemoryDiagnoser]
public class SearchBenchmarks
{
    private int[] _data = null!;
    private HashSet<int> _hashSet = null!;
    private const int Target = 750_000;

    [GlobalSetup]
    public void Setup()
    {
        _data = Enumerable.Range(0, 1_000_000).ToArray();
        _hashSet = new HashSet<int>(_data);
    }

    [Benchmark(Baseline = true)]
    public bool LinearSearch() => Array.IndexOf(_data, Target) >= 0;

    [Benchmark]
    public bool BinarySearch() => Array.BinarySearch(_data, Target) >= 0;

    [Benchmark]
    public bool HashSetLookup() => _hashSet.Contains(Target);
}
```

An [IterationSetup] example: sorting mutates its input, so each measurement needs a fresh unsorted array:

```csharp
private int[] _unsorted = null!;

[IterationSetup]
public void ResetData() =>
    _unsorted = Enumerable.Range(0, 10_000).Reverse().ToArray();

[Benchmark]
public void Sort() => Array.Sort(_unsorted);
```

Because every invocation needs fresh state, BenchmarkDotNet cannot batch invocations here; expect noisier numbers for very fast methods. [IterationSetup] suits macro-benchmarks better than nanosecond-scale ones.

MemoryDiagnoser — Allocation Tracking
[MemoryDiagnoser] adds allocation columns to the output:
- Allocated — bytes allocated on the managed heap per operation
- Gen0 / Gen1 / Gen2 — GC collections per 1000 operations (columns for generations that never collect are hidden)
```csharp
[MemoryDiagnoser]
public class MappingBenchmarks
{
    private readonly IMapper _autoMapper = new MapperConfiguration(cfg =>
        cfg.CreateMap<Order, OrderDto>() /* plus maps for the nested item types */)
        .CreateMapper();

    private readonly Order _order = new Order
    {
        Id = 1, CustomerId = "C001",
        Items = Enumerable.Range(1, 10)
            .Select(i => new OrderItem { ProductId = i, Qty = 2, Price = 9.99m })
            .ToList()
    };

    [Benchmark(Baseline = true)]
    public OrderDto AutoMapperMap() => _autoMapper.Map<OrderDto>(_order);

    [Benchmark]
    public OrderDto MapsterMap() => _order.Adapt<OrderDto>();

    [Benchmark]
    public OrderDto ManualMap() => _order.ToDto();
}
```

Typical output:

| Method        | Mean     | Allocated |
|---------------|----------|-----------|
| AutoMapperMap | 1,840 ns | 312 B     |
| MapsterMap    | 620 ns   | 184 B     |
| ManualMap     | 210 ns   | 96 B      |

Comparing Multiple Implementations
Use [Params] to drive the same benchmark with different input sizes:
```csharp
[MemoryDiagnoser]
public class FilterBenchmarks
{
    [Params(100, 10_000, 1_000_000)]
    public int Size { get; set; }

    private int[] _data = null!;

    [GlobalSetup]
    public void Setup() => _data = Enumerable.Range(0, Size).ToArray();

    [Benchmark(Baseline = true)]
    public int[] LinqWhere() => _data.Where(x => x % 2 == 0).ToArray();

    [Benchmark]
    public int[] ForLoop()
    {
        var result = new List<int>(Size / 2);
        for (int i = 0; i < _data.Length; i++)
            if (_data[i] % 2 == 0) result.Add(_data[i]);
        return result.ToArray();
    }

    [Benchmark]
    public int[] SpanLoop()
    {
        var span = _data.AsSpan();
        var result = new List<int>(Size / 2);
        for (int i = 0; i < span.Length; i++)
            if (span[i] % 2 == 0) result.Add(span[i]);
        return result.ToArray();
    }
}
```

BenchmarkDotNet runs every combination of [Params] × [Benchmark] and presents a single table.
Reading the Results
| Method    | Size    | Mean        | Error   | StdDev  | Gen0   | Allocated |
|-----------|---------|-------------|---------|---------|--------|-----------|
| LinqWhere | 100     | 1.82 μs     | 0.02 μs | 0.02 μs | 0.0572 | 480 B     |
| ForLoop   | 100     | 0.64 μs     | 0.01 μs | 0.01 μs | 0.0334 | 280 B     |
| SpanLoop  | 100     | 0.61 μs     | 0.01 μs | 0.01 μs | 0.0334 | 280 B     |
| LinqWhere | 1000000 | 4,810.00 μs | 38.4 μs | 35.9 μs | 62.5   | 4,000 KB  |
| ForLoop   | 1000000 | 1,940.00 μs | 14.2 μs | 13.3 μs | 31.2   | 2,000 KB  |

- Mean — average time per single operation call.
- Error — half the 99.9% confidence interval. If Error > 10% of Mean, your machine was too noisy.
- Gen0 / Gen1 / Gen2 — GC collections per 1000 operations. High Gen0 is usually fine; Gen2 is expensive.
- Allocated — bytes per operation. Zero is ideal for hot paths.
Integrating Into CI as a Regression Check
Use BenchmarkDotNet's JSON exporter to write results to disk, then fail the build if performance regresses beyond a threshold:

```csharp
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Diagnosers;
using BenchmarkDotNet.Exporters.Json;

[Config(typeof(BenchmarkConfig))]
public class MyBenchmarks { /* ... */ }

public class BenchmarkConfig : ManualConfig
{
    public BenchmarkConfig()
    {
        AddExporter(JsonExporter.Full);
        AddDiagnoser(MemoryDiagnoser.Default);
        WithArtifactsPath("BenchmarkResults");
    }
}
```

In CI (GitHub Actions example):
```yaml
- name: Run benchmarks
  run: dotnet run -c Release --project tests/Benchmarks

- name: Check regression
  run: |
    python scripts/check_regression.py \
      --baseline benchmarks/baseline.json \
      --current BenchmarkResults/results.json \
      --threshold 1.15   # fail if 15% slower than baseline
```

A simpler approach: commit BenchmarkResults/ to a branch and diff each run against the last one with a results-comparison tool (for example, ResultsComparer from the dotnet/performance repository). Any regression blocks the PR.
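The check_regression.py invoked above is not part of BenchmarkDotNet; you write it yourself. Here is a minimal sketch (flag parsing elided in favour of positional arguments; it assumes the full JSON exporter's layout, a top-level "Benchmarks" array whose entries carry "FullName" and "Statistics" with "Mean" in nanoseconds):

```python
"""Sketch of a regression gate for BenchmarkDotNet JSON exports."""
import json
import sys


def load_means(path):
    """Map each benchmark's FullName to its mean time in nanoseconds."""
    with open(path) as f:
        report = json.load(f)
    return {b["FullName"]: b["Statistics"]["Mean"] for b in report["Benchmarks"]}


def find_regressions(baseline, current, threshold):
    """Return (name, ratio) pairs where the current mean exceeds baseline * threshold."""
    slow = []
    for name, base_mean in baseline.items():
        cur_mean = current.get(name)
        if cur_mean is None:
            continue  # benchmark removed or renamed; not a timing regression
        ratio = cur_mean / base_mean
        if ratio > threshold:
            slow.append((name, ratio))
    return slow


if __name__ == "__main__" and len(sys.argv) == 4:
    # usage: check_regression.py baseline.json current.json 1.15
    base, cur, limit = sys.argv[1], sys.argv[2], float(sys.argv[3])
    regressions = find_regressions(load_means(base), load_means(cur), limit)
    for name, ratio in regressions:
        print(f"REGRESSION {name}: {ratio:.2f}x baseline")
    sys.exit(1 if regressions else 0)
```

A non-zero exit code is what makes the CI step (and therefore the build) fail.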
Rules
- Always run in Release mode. BenchmarkDotNet refuses to run a non-optimised (Debug) assembly and tells you why.
- Avoid I/O in benchmarks — you're measuring disk or network latency, not your code.
- Make sure each benchmark returns a value or otherwise consumes its result — the JIT will optimise away dead code.
- Run on a quiet machine. Cloud CI agents with shared CPUs produce noisy numbers. An attribute such as [SimpleJob(RuntimeMoniker.Net90)] lets you pin the runtime and, if needed, raise the warmup and iteration counts for stability.
- Benchmark the thing you're about to optimise, not a synthetic toy. If your real code sorts a list of 50 items, don't benchmark a list of 1 million.