BenchmarkDotNet: Measure Performance and Allocations in .NET

Use BenchmarkDotNet to write reliable microbenchmarks in C#: measure throughput, latency, and memory allocations, compare implementations, interpret results correctly, and avoid common benchmarking mistakes.

Asma Hafeez Khan · May 26, 2026 · 8 min read
Tags: .NET, C#, Performance, BenchmarkDotNet, Profiling, Allocations, Benchmarking

The rule of performance work is: measure first, optimise second, measure again. BenchmarkDotNet is the standard tool for measuring .NET code. It handles JIT warm-up, statistical analysis, memory allocation tracking, and comparison across implementations, all of which a hand-written Stopwatch loop tends to get wrong.

What you'll learn:

  • Writing your first benchmark
  • Memory allocation diagnostics
  • Comparing multiple implementations
  • Parameterised benchmarks
  • Interpreting results correctly
  • Common benchmarking mistakes to avoid
  • Running benchmarks in CI

Setup

Bash
dotnet new console -n MyApp.Benchmarks
cd MyApp.Benchmarks
dotnet add package BenchmarkDotNet

BenchmarkDotNet benchmarks must run in Release mode. By default the library validates this and reports an error for a non-optimised build, because Debug mode gives meaningless numbers.


1. Your First Benchmark

C#
// Benchmarks/StringBenchmarks.cs
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[MemoryDiagnoser]           // track allocations
[RankColumn]                // rank results best→worst
public class StringBenchmarks
{
    private readonly string[] _words = Enumerable
        .Range(0, 1000)
        .Select(i => $"word{i}")
        .ToArray();

    [Benchmark(Baseline = true)]
    public string StringConcat()
    {
        string result = "";
        foreach (var word in _words)
            result += word + " ";
        return result;
    }

    [Benchmark]
    public string StringBuilder()
    {
        var sb = new System.Text.StringBuilder();
        foreach (var word in _words)
            sb.Append(word).Append(' ');
        return sb.ToString();
    }

    [Benchmark]
    public string StringJoin() => string.Join(" ", _words);

    [Benchmark]
    public string StringCreate()
    {
        int totalLength = _words.Sum(w => w.Length + 1);
        return string.Create(totalLength, _words, (span, words) =>
        {
            int pos = 0;
            foreach (var word in words)
            {
                word.AsSpan().CopyTo(span[pos..]);
                pos += word.Length;
                span[pos++] = ' ';
            }
        });
    }
}

// Program.cs (a separate file, so it needs its own using directive)
using BenchmarkDotNet.Running;

BenchmarkRunner.Run<StringBenchmarks>();

Bash
dotnet run -c Release

Typical output

| Method        | Mean       | Error    | StdDev   | Ratio | Rank | Allocated |
|-------------- |-----------:|---------:|---------:|------:|-----:|----------:|
| StringConcat  | 2,341.3 us | 21.45 us | 20.06 us |  1.00 |    4 | 3,906 KB  |
| StringBuilder |    52.1 us |  0.41 us |  0.38 us |  0.02 |    3 |    88 KB  |
| StringJoin    |    48.7 us |  0.32 us |  0.30 us |  0.02 |    1 |    88 KB  |
| StringCreate  |    49.2 us |  0.29 us |  0.27 us |  0.02 |    2 |    16 KB  |

StringConcat allocates about 44x more memory than StringJoin and takes about 48x longer. Without the benchmark you would have to trust intuition; here you have numbers.
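
Before trusting a comparison like this, it is worth checking that the candidates actually compute the same result. The sketch below is a hypothetical helper (`OutputCheck` is not part of BenchmarkDotNet) that reproduces two of the implementations above on a small input. It suggests that `string.Join` omits the trailing space that the loop-based versions append, which is harmless here but worth knowing.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of BenchmarkDotNet: re-implements two of the
// benchmarked methods so their outputs can be compared on a small input.
public static class OutputCheck
{
    public static string WithStringBuilder(string[] words)
    {
        var sb = new StringBuilder();
        foreach (var word in words)
            sb.Append(word).Append(' ');   // appends a separator after every word
        return sb.ToString();
    }

    public static string WithJoin(string[] words) => string.Join(" ", words);

    // True when the two outputs differ only by the trailing separator.
    public static bool DifferOnlyByTrailingSpace(string[] words) =>
        WithStringBuilder(words) == WithJoin(words) + " ";
}
```

Calling `OutputCheck.DifferOnlyByTrailingSpace(new[] { "a", "b" })` compares `"a b "` with `"a b" + " "`. A one-character difference does not change the conclusions above, but larger behavioural differences would make the timings incomparable.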


2. MemoryDiagnoser: Allocation Tracking

[MemoryDiagnoser] adds the Allocated column, which reports the managed memory allocated per operation, including objects the GC has already collected. It also adds Gen0/Gen1/Gen2 collection columns when collections occur. Allocated is usually the most important column for hot-path optimisation.

C#
[MemoryDiagnoser]
public class ParseBenchmarks
{
    private const string CsvLine = "2026-05-26,ORDER-123,450.00,GBP";

    [Benchmark(Baseline = true)]
    public (DateTime, string, decimal, string) ParseWithSubstring()
    {
        var parts = CsvLine.Split(',');  // allocates a string[] plus one string per field
        return (
            DateTime.Parse(parts[0]),   // parses a string that Split already allocated
            parts[1],
            decimal.Parse(parts[2]),
            parts[3]
        );
    }

    [Benchmark]
    public (DateTime, string, decimal, string) ParseWithSpan()
    {
        var span = CsvLine.AsSpan();
        // No string allocations during parsing
        int i1 = span.IndexOf(',');
        var datePart = span[..i1];

        span = span[(i1 + 1)..];
        int i2 = span.IndexOf(',');
        var orderPart = span[..i2];

        span = span[(i2 + 1)..];
        int i3 = span.IndexOf(',');
        var amountPart = span[..i3];
        var currencyPart = span[(i3 + 1)..];

        return (
            DateTime.Parse(datePart),
            orderPart.ToString(),       // allocate string only when storing
            decimal.Parse(amountPart),
            currencyPart.ToString()
        );
    }
}

| Method             | Mean     | Allocated |
|------------------- |---------:|----------:|
| ParseWithSubstring | 312.4 ns |     472 B |
| ParseWithSpan      |  98.3 ns |      96 B |

About 5x less memory allocated and about 3x faster, and the only change was avoiding Split and its intermediate strings.
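
To build some intuition for what the Allocated column represents, the following rough sketch measures managed allocations around a delegate using `GC.GetAllocatedBytesForCurrentThread`. `AllocationProbe` is a hypothetical helper for experimentation only; BenchmarkDotNet's own measurement is more careful, so treat this as an approximation of the idea rather than a replacement.

```csharp
using System;

// Hypothetical helper for experimentation; not how BenchmarkDotNet is implemented.
public static class AllocationProbe
{
    // Returns the managed bytes allocated on the current thread while action runs.
    public static long Measure(Action action)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        action();
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }
}
```

For example, `AllocationProbe.Measure(() => "2026-05-26,ORDER-123,450.00,GBP".Split(','))` gives a rough byte count for the array and field strings that Split creates. A single call like this is noisy, which is one reason to let BenchmarkDotNet average over many invocations.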


3. Parameters: Testing Multiple Inputs

C#
[MemoryDiagnoser]
public class CollectionBenchmarks
{
    [Params(10, 100, 1000, 10_000)]
    public int N;

    private int[] _data = null!;

    [GlobalSetup]
    public void Setup()
    {
        _data = Enumerable.Range(0, N).ToArray();
    }

    [Benchmark(Baseline = true)]
    public List<int> LinqSelect() =>
        _data.Select(x => x * 2).ToList();

    [Benchmark]
    public List<int> ForLoop()
    {
        var result = new List<int>(_data.Length);
        for (int i = 0; i < _data.Length; i++)
            result.Add(_data[i] * 2);
        return result;
    }

    [Benchmark]
    public int[] ArrayFor()
    {
        var result = new int[_data.Length];
        for (int i = 0; i < _data.Length; i++)
            result[i] = _data[i] * 2;
        return result;
    }
}

BenchmarkDotNet runs every method with every parameter combination, giving you a matrix of results. At small N the differences are negligible; at large N the allocations of List vs Array become visible.
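
When the parameter values are easier to compute than to list, `[ParamsSource]` can point at a public property or method instead. The sketch below is a minimal illustration under that assumption; the class name, the `Sizes` property, and the chosen sizes are all made up for the example.

```csharp
using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;

[MemoryDiagnoser]
public class ParamsSourceBenchmarks
{
    // Values come from the Sizes property instead of a literal list.
    [ParamsSource(nameof(Sizes))]
    public int N;

    // Hypothetical sizes for illustration: powers of ten from 10 to 10,000.
    public IEnumerable<int> Sizes => new[] { 10, 100, 1_000, 10_000 };

    private int[] _data = null!;

    [GlobalSetup]
    public void Setup() => _data = Enumerable.Range(0, N).ToArray();

    [Benchmark]
    public int Sum()
    {
        int total = 0;
        for (int i = 0; i < _data.Length; i++)
            total += _data[i];
        return total;   // returned so the loop cannot be discarded
    }
}
```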


4. GlobalSetup and IterationSetup

C#
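using System;
using System.Text.Json;
using BenchmarkDotNet.Attributes;

// OrderDto is not shown in the article; this positional record is an assumed shape.
public record OrderDto(Guid Id, string CustomerId, int Quantity, decimal Total, string Currency);

public class JsonBenchmarks
{
    private string _json = null!;
    private OrderDto _order = null!;

    [GlobalSetup]               // runs once per benchmark method, before its iterations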
    public void Setup()
    {
        _order = new OrderDto(Guid.NewGuid(), "CUST-1", 10, 450.00m, "GBP");
        _json = JsonSerializer.Serialize(_order);
    }

    [Benchmark]
    public string Serialize() => JsonSerializer.Serialize(_order);

    [Benchmark]
    public OrderDto? Deserialize() => JsonSerializer.Deserialize<OrderDto>(_json);

    [Benchmark]
    public string SerializeWithOptions() =>
        JsonSerializer.Serialize(_order, CachedOptions.Default);

    // Cache the options: creating JsonSerializerOptions is expensive
    private static class CachedOptions
    {
        public static readonly JsonSerializerOptions Default = new()
        {
            PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
            WriteIndented = false,
        };
    }
}

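[GlobalSetup] keeps setup work out of the measurement: it runs once per benchmark method (and per parameter combination), before any measured invocation. [IterationSetup] runs before each iteration; use it sparingly, because it disrupts statistical accuracy for very fast benchmarks.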
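
One case where `[IterationSetup]` is hard to avoid is a benchmark that mutates its input, such as an in-place sort: after the first run the array is already sorted, so later runs would measure a different workload. The sketch below is a hypothetical example (class and method names are invented) that restores a fresh copy before each iteration. The input is deliberately large so that each invocation takes long enough for the per-iteration setup not to dominate.

```csharp
using System;
using BenchmarkDotNet.Attributes;

public class SortBenchmarks
{
    private int[] _source = null!;   // pristine, unsorted data
    private int[] _work = null!;     // copy that the benchmark is allowed to mutate

    [GlobalSetup]
    public void CreateSource()
    {
        var random = new Random(42);             // fixed seed for repeatable input
        _source = new int[100_000];
        for (int i = 0; i < _source.Length; i++)
            _source[i] = random.Next();
    }

    [IterationSetup]
    public void ResetWorkingCopy() => _work = (int[])_source.Clone();

    [Benchmark]
    public int[] SortInPlace()
    {
        Array.Sort(_work);
        return _work;
    }
}
```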


5. Comparing EF Core Query Strategies

C#
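// AppDbContext, its Orders set, and OrderSummaryDto are application types not shown here.
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using BenchmarkDotNet.Attributes;
using Microsoft.EntityFrameworkCore;

[MemoryDiagnoser]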
public class EfCoreBenchmarks
{
    private AppDbContext _context = null!;

    [GlobalSetup]
    public void Setup()
    {
        var options = new DbContextOptionsBuilder<AppDbContext>()
            .UseNpgsql("Host=localhost;Database=bench;Username=app;Password=secret")
            .Options;
        _context = new AppDbContext(options);
    }

    [Benchmark(Baseline = true)]
    public async Task<List<OrderSummaryDto>> NormalQuery()
    {
        // No Include here: EF Core ignores Include when the query ends in a projection.
        return await _context.Orders
            .Select(o => new OrderSummaryDto(o.Id, o.Total, o.Lines.Count))
            .ToListAsync();
    }

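    [Benchmark]
    public async Task<List<OrderSummaryDto>> CompiledQuery()
    {
        // The compiled query returns IAsyncEnumerable<T>, which has no built-in
        // ToListAsync in EF Core, so the results are collected with await foreach.
        var results = new List<OrderSummaryDto>();
        await foreach (var summary in _compiledQuery(_context))
            results.Add(summary);
        return results;
    }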

    private static readonly Func<AppDbContext, IAsyncEnumerable<OrderSummaryDto>>
        _compiledQuery = EF.CompileAsyncQuery((AppDbContext ctx) =>
            ctx.Orders
                .Select(o => new OrderSummaryDto(o.Id, o.Total, o.Lines.Count)));

    [Benchmark]
    public async Task<List<OrderSummaryDto>> RawSql()
    {
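        // SqlQuery<T> with an unmapped DTO needs EF Core 8 or later, and the result
        // column names must match the DTO's property names; adjust the aliases to your model.
        return await _context.Database
            .SqlQuery<OrderSummaryDto>(
                $"SELECT id, total, (SELECT COUNT(*) FROM order_lines WHERE order_id = o.id) AS line_count FROM orders o")
            .ToListAsync();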
    }

    [GlobalCleanup]
    public void Cleanup() => _context.Dispose();
}

6. Interpreting Results Correctly

Mean vs Median

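BenchmarkDotNet reports Mean by default. For latency measurements, outliers (GC pauses, OS interrupts) inflate the mean, so it helps to look at the whole distribution. Exporters and extra columns make it visible, and [MedianColumn] adds the median to the summary table: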

C#
[RPlotExporter]          // generates distribution charts (requires R to be installed)
[HtmlExporter]           // HTML report
[StatisticalTestColumn]  // statistical significance against the baseline
public class MyBenchmarks { }
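
For a view of the distribution directly in the summary table, the following sketch adds median, minimum, and maximum columns and keeps outliers in the statistics instead of trimming them. The class and its `Sum` method are placeholders; the `OutlierMode` namespace shown here is the one used by recent BenchmarkDotNet versions and may differ in older ones.

```csharp
using System.Linq;
using BenchmarkDotNet.Attributes;
using Perfolizer.Mathematics.OutlierDetection;

[MedianColumn]                       // adds a Median column next to Mean
[MinColumn, MaxColumn]               // adds the fastest and slowest measurements
[Outliers(OutlierMode.DontRemove)]   // keep outliers instead of trimming them
public class LatencyBenchmarks
{
    // Placeholder workload for illustration.
    private readonly int[] _data = Enumerable.Range(0, 1_000).ToArray();

    [Benchmark]
    public long Sum()
    {
        long total = 0;
        foreach (var value in _data)
            total += value;
        return total;
    }
}
```

A large gap between Mean and Median, or between Mean and Max, is a hint that a few slow measurements are pulling the mean upwards.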

Error and StdDev

| Method | Mean    | Error   | StdDev  |
|------- |--------:|--------:|--------:|
| Fast   | 10.1 ns | 0.05 ns | 0.04 ns |
| Slow   | 12.3 ns | 2.41 ns | 2.25 ns |

Error is half of the 99.9% confidence interval for the mean. A StdDev of 2.25 ns on a mean of 12.3 ns (about 18% relative deviation) means the measurement is unstable. Likely causes: GC interference, memory pressure, non-deterministic branching. Run longer or investigate what is causing the variance.
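
The relative deviation quoted above is simply StdDev divided by Mean. As a small worked example, the hypothetical helper below (not part of BenchmarkDotNet) applies that formula to both rows of the table.

```csharp
using System;

// Hypothetical helper: expresses StdDev as a percentage of the Mean.
public static class Stability
{
    public static double RelativeStdDevPercent(double mean, double stdDev) =>
        stdDev / mean * 100.0;
}
```

For the Slow row, `Stability.RelativeStdDevPercent(12.3, 2.25)` is roughly 18.3, while for the Fast row `Stability.RelativeStdDevPercent(10.1, 0.04)` is roughly 0.4. What counts as acceptable depends on the size of the difference you are trying to detect.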

Ratio

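The Ratio column (enabled by [Benchmark(Baseline = true)]) shows each method's mean relative to the baseline. A ratio of 0.02 means roughly 50x faster. Ratios are usually more meaningful than absolute times because they carry over between machines better, although they are not fully machine-independent.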
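
Because the column is rounded to two decimals, the speedup implied by a small ratio is only approximate. The sketch below uses a hypothetical helper and the means from the string benchmark table earlier in the article to show the arithmetic.

```csharp
using System;

// Hypothetical helper: ratio relative to the baseline and the implied speedup.
public static class Speedup
{
    public static double Ratio(double mean, double baselineMean) => mean / baselineMean;

    public static double Factor(double mean, double baselineMean) => baselineMean / mean;
}
```

With StringJoin at 48.7 us and the StringConcat baseline at 2,341.3 us, `Speedup.Ratio(48.7, 2341.3)` is about 0.0208, which is displayed as 0.02, and `Speedup.Factor(48.7, 2341.3)` is about 48. When the exact factor matters, compute it from the Mean column.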


7. Common Mistakes

Running without -c Release: Debug builds disable JIT optimisations and can be several times slower, so numbers from Debug mode are meaningless. BenchmarkDotNet validates this and, by default, refuses to run a non-optimised build.

Benchmarking setup work inside the benchmark method:

C#
// Wrong: measures test-data generation as well as the method under test
[Benchmark]
public string Wrong()
{
    var data = GenerateTestData();  // this is being measured too
    return Process(data);
}

// Correct
private string _data = null!;

[GlobalSetup]
public void Setup() => _data = GenerateTestData();

[Benchmark]
public string Correct() => Process(_data);

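Dead code elimination: The JIT can eliminate code that has no observable side effects. BenchmarkDotNet consumes the value a benchmark method returns, which keeps that computation alive, but it cannot protect work whose result is discarded inside the method. Be careful with micro-benchmarks that compute but don't return: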

C#
// Wrong: the JIT may eliminate the computation
[Benchmark]
public void Wrong()
{
    int sum = 0;
    for (int i = 0; i < 1000; i++)
        sum += i;
    // sum is never used, so the JIT is free to remove the loop
}

// Correct: return the result
[Benchmark]
public int Correct()
{
    int sum = 0;
    for (int i = 0; i < 1000; i++)
        sum += i;
    return sum;
}
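
When a benchmark produces many values and there is no natural single result to return, BenchmarkDotNet's `Consumer` class offers another way to keep them alive. The sketch below is a minimal illustration; the class name and data are invented for the example.

```csharp
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Engines;

public class ConsumeBenchmarks
{
    private readonly Consumer _consumer = new Consumer();
    private readonly int[] _data = { 1, 2, 3, 4, 5, 6, 7, 8 };

    [Benchmark]
    public void SquareEach()
    {
        // Each result is handed to the consumer, so the JIT cannot treat it as unused.
        foreach (var x in _data)
            _consumer.Consume(x * x);
    }
}
```

Returning a value is still the simpler option when it is available; the consumer is mainly useful for loops and lazily evaluated sequences.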

Benchmarking too small a unit: A benchmark that runs in 1–2 nanoseconds is measuring noise, not your code. Keep benchmarks in the 10ns–10ms range. For sub-nanosecond operations, benchmark larger batches.
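
One way to benchmark a larger batch while still reporting a per-operation time is the `OperationsPerInvoke` property of `[Benchmark]`. The sketch below assumes a hypothetical tiny operation (one xorshift step) and repeats it 1,000 times per invocation; BenchmarkDotNet then divides the measured time by that count.

```csharp
using BenchmarkDotNet.Attributes;

public class TinyOperationBenchmarks
{
    private const int Batch = 1_000;
    private readonly uint _seed = 12345;

    // The reported time is per operation: the invocation time divided by Batch.
    [Benchmark(OperationsPerInvoke = Batch)]
    public uint XorShiftBatch()
    {
        uint x = _seed;
        for (int i = 0; i < Batch; i++)
        {
            // Each step depends on the previous one, so the loop cannot be skipped.
            x ^= x << 13;
            x ^= x >> 17;
            x ^= x << 5;
        }
        return x;
    }
}
```

The loop overhead is included in the measurement, so the per-operation figure is an upper bound rather than an exact cost.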


8. Running in CI

Benchmarks are slow (minutes, not seconds), so don't run them on every commit. Instead:

YAML
# Run on a schedule, or on pushes that touch performance-sensitive files
on:
  schedule:
    - cron: '0 2 * * 1'  # weekly, Monday 02:00 UTC
  push:
    paths:
      - 'src/MyApp.Core/**'

jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-dotnet@v4
        with: { dotnet-version: '9.0.x' }
      # --exporters json only takes effect if Program.cs passes args to BenchmarkDotNet
      - run: dotnet run --project benchmarks/MyApp.Benchmarks -c Release -- --exporters json
      - uses: actions/upload-artifact@v4
        with:
          name: benchmark-results
          path: BenchmarkDotNet.Artifacts/
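
Command-line options such as `--exporters json` are only honoured when the benchmark project forwards its arguments to BenchmarkDotNet. A minimal sketch of a Program.cs that does so, assuming top-level statements and that the benchmark classes live in the same assembly:

```csharp
// Program.cs
using BenchmarkDotNet.Running;

// Forwards command-line arguments (for example --filter or --exporters)
// to BenchmarkDotNet and discovers every benchmark class in this assembly.
BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly).Run(args);
```

With this entry point, `dotnet run -c Release -- --filter "*StringBenchmarks*" --exporters json` would run one class and write a JSON report alongside the default ones.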

For regression detection, compare the JSON output between runs using github-action-benchmark or a custom comparison script.
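
The core of a custom comparison script can be quite small. The sketch below is a hypothetical helper that assumes the mean times have already been loaded from two JSON reports into dictionaries keyed by benchmark name (the loading step is left out because it depends on the report layout). It flags any benchmark whose current mean exceeds the baseline by more than a tolerance.

```csharp
using System.Collections.Generic;

// Hypothetical helper for a custom comparison script; not part of BenchmarkDotNet.
public static class RegressionCheck
{
    // tolerance is a fraction: 0.10 flags benchmarks that became more than 10% slower.
    public static List<string> FindRegressions(
        IReadOnlyDictionary<string, double> baselineMeans,
        IReadOnlyDictionary<string, double> currentMeans,
        double tolerance)
    {
        var regressions = new List<string>();
        foreach (var (name, current) in currentMeans)
        {
            // Benchmarks without a baseline entry are new, so they are skipped.
            if (baselineMeans.TryGetValue(name, out var baseline) &&
                current > baseline * (1 + tolerance))
            {
                regressions.Add(name);
            }
        }
        return regressions;
    }
}
```

Shared CI runners are noisy, so a generous tolerance (or a dedicated machine) helps avoid false alarms.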
