.NET & C# Development · Lesson 68 of 92
Measure Before You Optimize — BenchmarkDotNet in Practice
Why You Need Benchmarks
Stopwatch in a console app is not a benchmark. It doesn't control for JIT warm-up, CPU frequency scaling, GC interference, or branch prediction. It will mislead you.
BenchmarkDotNet handles all of that:
- Multiple iterations with statistical analysis (mean, stddev, confidence intervals)
- Separate warmup and measurement phases
- GC collection counts per operation
- Memory allocation tracking per operation
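For contrast, here is the kind of naive Stopwatch harness this section warns against, as a minimal sketch (the `NaiveTiming` class and its `Concat` workload are illustrative names, not part of any library). The first timed call includes JIT compilation, there is no warmup or repetition, and GC pauses land wherever they land:

```csharp
using System;
using System.Diagnostics;

public static class NaiveTiming
{
    // Toy workload: repeated string concatenation.
    static string Concat(int n)
    {
        string s = "";
        for (int i = 0; i < n; i++) s += i;
        return s;
    }

    public static (double ColdMs, double WarmMs) Measure()
    {
        // Cold: this single sample also pays the JIT-compilation cost.
        var sw = Stopwatch.StartNew();
        Concat(1000);
        sw.Stop();
        double cold = sw.Elapsed.TotalMilliseconds;

        // Warm: the same code on its second call, typically much faster.
        sw.Restart();
        Concat(1000);
        sw.Stop();
        return (cold, sw.Elapsed.TotalMilliseconds);
    }

    public static void Main()
    {
        var (cold, warm) = Measure();
        Console.WriteLine($"cold: {cold:F3} ms, warm: {warm:F3} ms");
    }
}
```

Run this twice and the two numbers disagree for identical code; neither one is "the" cost of `Concat`, which is exactly the problem BenchmarkDotNet's warmup phases and statistical analysis solve.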
Install:
```shell
dotnet add package BenchmarkDotNet
```

Your First Benchmark

```csharp
using System.Linq;
using System.Text;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

// Entry point (top-level statements must precede the type declaration)
BenchmarkRunner.Run<StringConcatBenchmarks>();

[MemoryDiagnoser]
public class StringConcatBenchmarks
{
    private const int Iterations = 1000;

    [Benchmark(Baseline = true)]
    public string PlusOperator()
    {
        string result = "";
        for (int i = 0; i < Iterations; i++)
            result += i.ToString();
        return result;
    }

    [Benchmark]
    public string StringBuilderConcat()
    {
        var sb = new StringBuilder();
        for (int i = 0; i < Iterations; i++)
            sb.Append(i);
        return sb.ToString();
    }

    [Benchmark]
    public string StringJoin() =>
        string.Join("", Enumerable.Range(0, Iterations));
}
```

Run in Release mode only. Debug mode disables optimisations and makes the results meaningless:
```shell
dotnet run -c Release
```

GlobalSetup and IterationSetup
Use [GlobalSetup] for one-time initialisation (runs once before all iterations). Use [IterationSetup] when each iteration needs a fresh state (runs before every measurement iteration).
```csharp
[MemoryDiagnoser]
public class SearchBenchmarks
{
    private int[] _data = null!;
    private HashSet<int> _hashSet = null!;
    private const int Target = 750_000;

    [GlobalSetup]
    public void Setup()
    {
        _data = Enumerable.Range(0, 1_000_000).ToArray();
        _hashSet = new HashSet<int>(_data);
    }

    [Benchmark(Baseline = true)]
    public bool LinearSearch() => Array.IndexOf(_data, Target) >= 0;

    [Benchmark]
    public bool BinarySearch() => Array.BinarySearch(_data, Target) >= 0;

    [Benchmark]
    public bool HashSetLookup() => _hashSet.Contains(Target);
}
```

An [IterationSetup] example: sorting mutates its input, so each measurement needs a fresh unsorted array:

```csharp
private int[] _unsorted = null!;

[IterationSetup]
public void ResetData() =>
    _unsorted = Enumerable.Range(0, 10_000).Reverse().ToArray();

[Benchmark]
public void Sort() => Array.Sort(_unsorted);
```

Because every invocation needs fresh state, BenchmarkDotNet cannot batch invocations here; expect noisier numbers for very fast methods. [IterationSetup] suits macro-benchmarks better than nanosecond-scale ones.

MemoryDiagnoser — Allocation Tracking
[MemoryDiagnoser] adds allocation columns to the output:
- Allocated — bytes allocated on the managed heap per operation
- Gen0 / Gen1 / Gen2 — GC collections per 1000 operations (columns for generations that never collect are hidden)
```csharp
[MemoryDiagnoser]
public class MappingBenchmarks
{
    private readonly IMapper _autoMapper = new MapperConfiguration(cfg =>
        cfg.CreateMap<Order, OrderDto>() /* plus maps for the nested item types */)
        .CreateMapper();

    private readonly Order _order = new Order
    {
        Id = 1, CustomerId = "C001",
        Items = Enumerable.Range(1, 10)
            .Select(i => new OrderItem { ProductId = i, Qty = 2, Price = 9.99m })
            .ToList()
    };

    [Benchmark(Baseline = true)]
    public OrderDto AutoMapperMap() => _autoMapper.Map<OrderDto>(_order);

    [Benchmark]
    public OrderDto MapsterMap() => _order.Adapt<OrderDto>();

    [Benchmark]
    public OrderDto ManualMap() => _order.ToDto();
}
```

Typical output:

| Method        | Mean     | Allocated |
|---------------|----------|-----------|
| AutoMapperMap | 1,840 ns | 312 B     |
| MapsterMap    | 620 ns   | 184 B     |
| ManualMap     | 210 ns   | 96 B      |

Comparing Multiple Implementations
Use [Params] to drive the same benchmark with different input sizes:
```csharp
[MemoryDiagnoser]
public class FilterBenchmarks
{
    [Params(100, 10_000, 1_000_000)]
    public int Size { get; set; }

    private int[] _data = null!;

    [GlobalSetup]
    public void Setup() => _data = Enumerable.Range(0, Size).ToArray();

    [Benchmark(Baseline = true)]
    public int[] LinqWhere() => _data.Where(x => x % 2 == 0).ToArray();

    [Benchmark]
    public int[] ForLoop()
    {
        var result = new List<int>(Size / 2);
        for (int i = 0; i < _data.Length; i++)
            if (_data[i] % 2 == 0) result.Add(_data[i]);
        return result.ToArray();
    }

    [Benchmark]
    public int[] SpanLoop()
    {
        var span = _data.AsSpan();
        var result = new List<int>(Size / 2);
        for (int i = 0; i < span.Length; i++)
            if (span[i] % 2 == 0) result.Add(span[i]);
        return result.ToArray();
    }
}
```

BenchmarkDotNet runs every combination of [Params] × [Benchmark] and presents a single table.
Reading the Results
| Method    | Size    | Mean        | Error   | StdDev  | Gen0   | Allocated |
|-----------|---------|-------------|---------|---------|--------|-----------|
| LinqWhere | 100     | 1.82 μs     | 0.02 μs | 0.02 μs | 0.0572 | 480 B     |
| ForLoop   | 100     | 0.64 μs     | 0.01 μs | 0.01 μs | 0.0334 | 280 B     |
| SpanLoop  | 100     | 0.61 μs     | 0.01 μs | 0.01 μs | 0.0334 | 280 B     |
| LinqWhere | 1000000 | 4,810.00 μs | 38.4 μs | 35.9 μs | 62.5   | 4,000 KB  |
| ForLoop   | 1000000 | 1,940.00 μs | 14.2 μs | 13.3 μs | 31.2   | 2,000 KB  |

- Mean — average time per single operation call.
- Error — half the 99.9% confidence interval. If Error > 10% of Mean, your machine was too noisy.
- Gen0 / Gen1 / Gen2 — GC collections per 1000 operations. High Gen0 is usually fine; Gen2 is expensive.
- Allocated — bytes per operation. Zero is ideal for hot paths.
Integrating Into CI as a Regression Check
Use BenchmarkDotNet's JSON exporter to write results to disk, then fail the build if performance regresses beyond a threshold:

```csharp
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Diagnosers;
using BenchmarkDotNet.Exporters.Json;

[Config(typeof(BenchmarkConfig))]
public class MyBenchmarks { /* ... */ }

public class BenchmarkConfig : ManualConfig
{
    public BenchmarkConfig()
    {
        AddExporter(JsonExporter.Full);
        AddDiagnoser(MemoryDiagnoser.Default);
        WithArtifactsPath("BenchmarkResults");
    }
}
```

In CI (GitHub Actions example):
```yaml
- name: Run benchmarks
  run: dotnet run -c Release --project tests/Benchmarks

- name: Check regression
  run: |
    python scripts/check_regression.py \
      --baseline benchmarks/baseline.json \
      --current BenchmarkResults/results.json \
      --threshold 1.15   # fail if 15% slower than baseline
```

A simpler approach: commit BenchmarkResults/ to a branch and diff each run against the last one with a results-comparison tool (for example, ResultsComparer from the dotnet/performance repository). Any regression blocks the PR.
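The check_regression.py invoked above is not part of BenchmarkDotNet; you write it yourself. Here is a minimal sketch (flag parsing elided in favour of positional arguments; it assumes the full JSON exporter's layout, a top-level "Benchmarks" array whose entries carry "FullName" and "Statistics" with "Mean" in nanoseconds):

```python
"""Sketch of a regression gate for BenchmarkDotNet JSON exports."""
import json
import sys


def load_means(path):
    """Map each benchmark's FullName to its mean time in nanoseconds."""
    with open(path) as f:
        report = json.load(f)
    return {b["FullName"]: b["Statistics"]["Mean"] for b in report["Benchmarks"]}


def find_regressions(baseline, current, threshold):
    """Return (name, ratio) pairs where the current mean exceeds baseline * threshold."""
    slow = []
    for name, base_mean in baseline.items():
        cur_mean = current.get(name)
        if cur_mean is None:
            continue  # benchmark removed or renamed; not a timing regression
        ratio = cur_mean / base_mean
        if ratio > threshold:
            slow.append((name, ratio))
    return slow


if __name__ == "__main__" and len(sys.argv) == 4:
    # usage: check_regression.py baseline.json current.json 1.15
    base, cur, limit = sys.argv[1], sys.argv[2], float(sys.argv[3])
    regressions = find_regressions(load_means(base), load_means(cur), limit)
    for name, ratio in regressions:
        print(f"REGRESSION {name}: {ratio:.2f}x baseline")
    sys.exit(1 if regressions else 0)
```

A non-zero exit code is what makes the CI step (and therefore the build) fail.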
Rules
- Always run in Release mode. BenchmarkDotNet refuses to run a non-optimised (Debug) assembly and tells you why.
- Avoid I/O in benchmarks — you're measuring disk or network latency, not your code.
- Make sure each benchmark returns a value or otherwise consumes its result — the JIT will optimise away dead code.
- Run on a quiet machine. Cloud CI agents with shared CPUs produce noisy numbers. An attribute such as [SimpleJob(RuntimeMoniker.Net90)] lets you pin the runtime and, if needed, raise the warmup and iteration counts for stability.
- Benchmark the thing you're about to optimise, not a synthetic toy. If your real code sorts a list of 50 items, don't benchmark a list of 1 million.