OpenTelemetry in .NET: Traces, Metrics, and Logs in One Stack

The Three Pillars of Observability

Traces — follow a request across services. Answer: where did the time go?
Metrics — aggregated measurements over time. Answer: how is the system performing?
Logs — timestamped events with context. Answer: what happened?

OpenTelemetry (OTel) is the CNCF standard for collecting all three: one SDK in the application, one optional Collector acting as the agent, and any OTLP-compatible backend.

Setup

dotnet add package OpenTelemetry.Extensions.Hosting
dotnet add package OpenTelemetry.Instrumentation.AspNetCore
dotnet add package OpenTelemetry.Instrumentation.Http
dotnet add package OpenTelemetry.Instrumentation.EntityFrameworkCore --prerelease  # no stable release at the time of writing
dotnet add package OpenTelemetry.Instrumentation.Runtime
dotnet add package OpenTelemetry.Exporter.OpenTelemetryProtocol  # OTLP (Jaeger, Grafana)
dotnet add package OpenTelemetry.Exporter.Prometheus.AspNetCore --prerelease  # Prometheus scrape endpoint (no stable release at the time of writing)

Full Configuration

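// Program.cs
using OpenTelemetry.Logs;
using OpenTelemetry.Metrics;
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;

var builder = WebApplication.CreateBuilder(args);

var serviceName    = builder.Environment.ApplicationName;
var serviceVersion = "1.0.0";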

var resourceBuilder = ResourceBuilder.CreateDefault()
    .AddService(serviceName, serviceVersion: serviceVersion)
    .AddTelemetrySdk()
    .AddEnvironmentVariableDetector();

builder.Services.AddOpenTelemetry()

    // ── Traces ──────────────────────────────────────────────────────────────
    .WithTracing(tracing => tracing
        .SetResourceBuilder(resourceBuilder)
        .AddAspNetCoreInstrumentation(options =>
        {
            options.RecordException = true;
            options.Filter = ctx => !ctx.Request.Path.StartsWithSegments("/health");
        })
        .AddHttpClientInstrumentation(options =>
        {
            options.RecordException = true;
        })
        .AddEntityFrameworkCoreInstrumentation(options =>
        {
            options.SetDbStatementForText = builder.Environment.IsDevelopment();  // include SQL in spans (dev only)
        })
        .AddSource("OrderFlow.*")      // pick up custom ActivitySource
        .AddOtlpExporter(otlp =>
        {
            otlp.Endpoint = new Uri(builder.Configuration["Otlp:Endpoint"]!);
        }))

    // ── Metrics ─────────────────────────────────────────────────────────────
    .WithMetrics(metrics => metrics
        .SetResourceBuilder(resourceBuilder)
        .AddAspNetCoreInstrumentation()
        .AddHttpClientInstrumentation()
        .AddRuntimeInstrumentation()   // GC, threadpool, memory
        .AddMeter("OrderFlow.*")       // pick up custom Meter
        .AddOtlpExporter()             // no endpoint set: uses OTEL_EXPORTER_OTLP_ENDPOINT or http://localhost:4317
        .AddPrometheusExporter())      // /metrics endpoint for Prometheus

    // ── Logs ────────────────────────────────────────────────────────────────
    .WithLogging(logging => logging
        .SetResourceBuilder(resourceBuilder)
        .AddOtlpExporter());           // no endpoint set: uses OTEL_EXPORTER_OTLP_ENDPOINT or http://localhost:4317

var app = builder.Build();

// Wire up Prometheus scrape endpoint
app.MapPrometheusScrapingEndpoint(); // /metrics

app.Run();

Custom Traces (ActivitySource)

using System.Diagnostics;
using OpenTelemetry.Trace; // RecordException extension method

// Shared ActivitySource: one per service
public static class Telemetry
{
    public static readonly ActivitySource ActivitySource =
        new("OrderFlow.Orders", "1.0.0");
}

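// In your service
public class OrderService
{
    private readonly IOrderRepository _repo; // hypothetical repository abstraction, not shown here

    public OrderService(IOrderRepository repo) => _repo = repo;
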
    public async Task<Order> CreateOrderAsync(CreateOrderCommand cmd, CancellationToken ct)
    {
        using var activity = Telemetry.ActivitySource.StartActivity("CreateOrder");
        activity?.SetTag("order.customerId", cmd.CustomerId);
        activity?.SetTag("order.lineCount",  cmd.Lines.Count);

        try
        {
            var order = await _repo.CreateAsync(cmd, ct);

            activity?.SetTag("order.id",    order.Id);
            activity?.SetTag("order.total", order.Total);
            activity?.SetStatus(ActivityStatusCode.Ok);

            return order;
        }
        catch (Exception ex)
        {
            activity?.SetStatus(ActivityStatusCode.Error, ex.Message);
            activity?.RecordException(ex);
            throw;
        }
    }
}
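
The activity?. calls above are needed because StartActivity returns null when nothing is listening to the source. The following is a minimal sketch of that behaviour using only System.Diagnostics; the source name "Demo.Orders" is a hypothetical stand-in, and the ActivityListener plays the role the OTel SDK normally plays once .AddSource(...) is configured.

```csharp
using System;
using System.Diagnostics;

// "Demo.Orders" is a hypothetical source name used only for this sketch.
var source = new ActivitySource("Demo.Orders");

// Nothing is listening yet, so no Activity is created.
var unobserved = source.StartActivity("CreateOrder");
Console.WriteLine($"without listener, activity is null: {unobserved is null}");

// Stand-in for the OTel SDK: listen to this source and record everything.
using var listener = new ActivityListener
{
    ShouldListenTo = s => s.Name == "Demo.Orders",
    Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
        ActivitySamplingResult.AllDataAndRecorded,
};
ActivitySource.AddActivityListener(listener);

// With a listener in place the same call produces a real Activity.
using var observed = source.StartActivity("CreateOrder");
observed?.SetTag("order.lineCount", 3);
Console.WriteLine($"with listener, activity is null: {observed is null}");
```

The same mechanism explains why a custom span silently disappears when its source name does not match any .AddSource(...) pattern.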

Custom Metrics

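using System.Diagnostics;         // TagList
using System.Diagnostics.Metrics; // Meter, Counter<T>, Histogram<T>, UpDownCounter<T>, IMeterFactory (.NET 8+)

// Shared Meter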
public class OrderMetrics
{
    private readonly Counter<long> _ordersCreated;
    private readonly Histogram<double> _orderProcessingMs;
    private readonly UpDownCounter<int> _pendingOrders;

    public OrderMetrics(IMeterFactory meterFactory)
    {
        var meter = meterFactory.Create("OrderFlow.Orders");

        _ordersCreated = meter.CreateCounter<long>(
            "orders.created",
            unit: "{orders}",
            description: "Total orders created");

        _orderProcessingMs = meter.CreateHistogram<double>(
            "orders.processing_duration",
            unit: "ms",
            description: "Order processing time in milliseconds");

        _pendingOrders = meter.CreateUpDownCounter<int>(
            "orders.pending",
            unit: "{orders}",
            description: "Current number of pending orders");
    }

    public void RecordOrderCreated(string region, string channel)
        => _ordersCreated.Add(1, new TagList
        {
            { "region",  region  },
            { "channel", channel }
        });

    public void RecordProcessingTime(double ms, string status)
        => _orderProcessingMs.Record(ms, new TagList { { "status", status } });

    public void RecordPendingChange(int delta) => _pendingOrders.Add(delta);
}

// Register
builder.Services.AddSingleton<OrderMetrics>();
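
To check that instruments like these emit the values and tags you expect, without running an exporter, a MeterListener can observe measurements in-process. This is a sketch under assumptions: the meter name "Demo.Orders" and the tag values are made up for illustration, and the meter is created directly rather than through IMeterFactory to keep the example self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Diagnostics.Metrics;

// Hypothetical meter and counter mirroring the "orders.created" instrument.
using var meter = new Meter("Demo.Orders");
var ordersCreated = meter.CreateCounter<long>("orders.created", unit: "{orders}");

long total = 0;
var regions = new List<string>();

using var listener = new MeterListener();

// Subscribe only to instruments from our meter.
listener.InstrumentPublished = (instrument, l) =>
{
    if (instrument.Meter.Name == "Demo.Orders")
        l.EnableMeasurementEvents(instrument);
};

// Called synchronously for every Add() on an enabled long instrument.
listener.SetMeasurementEventCallback<long>((instrument, value, tags, state) =>
{
    total += value;
    foreach (var tag in tags)
    {
        if (tag.Key == "region")
            regions.Add((string)tag.Value!);
    }
});
listener.Start();

ordersCreated.Add(1, new TagList { { "region", "eu" }, { "channel", "web" } });
ordersCreated.Add(2, new TagList { { "region", "us" }, { "channel", "app" } });

Console.WriteLine($"{total} orders from {string.Join(",", regions)}"); // 3 orders from eu,us
```

Each distinct tag combination becomes its own time series in the backend, so tags such as region and channel should have a small, bounded set of values.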

Structured Logging with OTel

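// OTel log bridge: log records automatically carry the current TraceId and SpanId.
// No additional code needed; just use ILogger<T>.

public class OrderService
{
    private readonly ILogger<OrderService> _logger;
    private readonly ICurrentUser _currentUser; // hypothetical abstraction over the signed-in user

    public OrderService(ILogger<OrderService> logger, ICurrentUser currentUser)
    {
        _logger = logger;
        _currentUser = currentUser;
    }

    public Task SubmitOrderAsync(Guid id, CancellationToken ct)
    {
        // This log will automatically include:
        // - TraceId from the current activity
        // - SpanId from the current span
        // - Resource attributes such as service name and version
        _logger.LogInformation(
            "Order {OrderId} submitted by {CustomerId}",
            id, _currentUser.Id);

        return Task.CompletedTask;
    }
}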
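
The correlation works because the bridge reads Activity.Current at the moment each log record is written, and Activity.Current flows across await points. A minimal sketch of that idea, using only System.Diagnostics: the Stamp helper and the "Demo.Logging" source name are hypothetical and merely imitate what a log bridge does, they are not part of any OTel API.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// "Demo.Logging" is a hypothetical source name; the listener stands in for the OTel SDK.
var source = new ActivitySource("Demo.Logging");
using var listener = new ActivityListener
{
    ShouldListenTo = s => s.Name == "Demo.Logging",
    Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
        ActivitySamplingResult.AllDataAndRecorded,
};
ActivitySource.AddActivityListener(listener);

// Hypothetical helper: prefix a message with the ambient trace and span IDs.
static string Stamp(string message) =>
    $"trace={Activity.Current?.TraceId} span={Activity.Current?.SpanId} {message}";

using var activity = source.StartActivity("SubmitOrder");

var first = Stamp("order received");
await Task.Delay(10);                  // Activity.Current survives the await
var second = Stamp("order submitted");

// Both lines carry the same trace and span IDs.
Console.WriteLine(first);
Console.WriteLine(second);
```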

Correlation IDs

OpenTelemetry propagates the W3C traceparent header automatically: it is injected on outbound HttpClient calls and read on inbound ASP.NET Core requests. The shared trace ID ties all spans across services together in Jaeger/Grafana.

// Middleware to expose TraceId in responses (useful for support)
app.Use(async (ctx, next) =>
{
    var traceId = Activity.Current?.TraceId.ToString();
    if (traceId is not null)
        ctx.Response.Headers["X-Trace-Id"] = traceId;
    await next(ctx);
});
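
For reference, a traceparent value has four dash-separated fields: version, trace ID, parent span ID, and flags. The sketch below parses a sample value with ActivityContext.TryParse from System.Diagnostics; the header value itself is just an example in the W3C format, not taken from a real request.

```csharp
using System;
using System.Diagnostics;

// Example value in the W3C format: version-traceid-parentid-flags.
var traceparent = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01";

// Second argument is the optional tracestate header, omitted here.
bool parsed = ActivityContext.TryParse(traceparent, null, out ActivityContext context);

var traceId = context.TraceId.ToString();
var parentSpanId = context.SpanId.ToString();
bool sampled = context.TraceFlags.HasFlag(ActivityTraceFlags.Recorded);

Console.WriteLine($"parsed:  {parsed}");        // True
Console.WriteLine($"trace:   {traceId}");       // 4bf92f3577b34da6a3ce929d0e0e4736
Console.WriteLine($"parent:  {parentSpanId}");  // 00f067aa0ba902b7
Console.WriteLine($"sampled: {sampled}");       // True (flags = 01)
```

The trace ID printed here is the same value the middleware above exposes as X-Trace-Id, which is what makes it useful for looking up a request in the tracing backend.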

Sampling

Sampling reduces trace volume in high-traffic services:

.WithTracing(tracing => tracing
    .SetSampler(new ParentBasedSampler(    // respect upstream sampling decision
        new TraceIdRatioBasedSampler(0.1)  // sample 10% of root spans
    )))
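
Ratio-based sampling is deterministic per trace: the decision is derived from the trace ID, so every service that sees the same ID reaches the same verdict. The sketch below illustrates the idea with a simplified, hypothetical KeepTrace helper; it is not the SDK's exact algorithm, only an instance of the same technique (compare part of the ID against ratio times the maximum value).

```csharp
using System;
using System.Buffers.Binary;
using System.Diagnostics;

// Hypothetical helper: keep a trace when the low 8 bytes of its ID, read as an
// unsigned integer, fall below ratio * 2^64. A simplification, not the SDK's code.
static bool KeepTrace(ActivityTraceId traceId, double ratio)
{
    Span<byte> bytes = stackalloc byte[16];
    traceId.CopyTo(bytes);
    ulong value = BinaryPrimitives.ReadUInt64BigEndian(bytes.Slice(8));
    return value < ratio * ulong.MaxValue;
}

// Roughly 10% of random trace IDs should be kept at ratio 0.1.
int kept = 0;
const int total = 10_000;
for (int i = 0; i < total; i++)
{
    if (KeepTrace(ActivityTraceId.CreateRandom(), 0.1))
        kept++;
}
Console.WriteLine($"kept {kept} of {total} traces");

// The same ID always yields the same decision, whichever service evaluates it.
var id = ActivityTraceId.CreateRandom();
bool decisionA = KeepTrace(id, 0.1);
bool decisionB = KeepTrace(id, 0.1);
Console.WriteLine($"decisions agree: {decisionA == decisionB}");
```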

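Keeping every error needs care. A head sampler such as the one below runs when the span starts, so it only sees tags supplied at creation time; errors that occur later in the request are invisible to it. Use it when callers pass an error tag to StartActivity, and rely on tail-based sampling (for example in the Collector) to keep every failed trace:

using System.Linq;
using OpenTelemetry.Trace;

public class AlwaysSampleErrorsSampler : Sampler
{
    private readonly Sampler _inner;

    public AlwaysSampleErrorsSampler(Sampler inner) => _inner = inner;

    public override SamplingResult ShouldSample(in SamplingParameters parameters)
    {
        // Only tags passed to StartActivity are visible here; tags set later are not.
        // Accept a boolean true as well as the string "true" (true.ToString() is "True").
        if (parameters.Tags?.Any(t => t.Key == "error" && t.Value is (true or "true")) == true)
            return new SamplingResult(SamplingDecision.RecordAndSample);

        return _inner.ShouldSample(parameters);
    }
}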
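
A sampler decides at span creation, so it can only see tags that are passed into StartActivity, not tags set afterwards. The sketch below makes that visible using an ActivityListener as a stand-in for the SDK's sampler; the "Demo.Sampling" source name and the span names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

var source = new ActivitySource("Demo.Sampling"); // hypothetical source name
var tagsSeenAtSampling = new List<int>();         // tag count visible per sampling decision

using var listener = new ActivityListener
{
    ShouldListenTo = s => s.Name == "Demo.Sampling",
    Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
    {
        // Record how many tags the sampling callback can see.
        tagsSeenAtSampling.Add(options.Tags?.Count() ?? 0);
        return ActivitySamplingResult.AllDataAndRecorded;
    },
};
ActivitySource.AddActivityListener(listener);

// Tag set after start: the sampling decision has already been made.
using (var late = source.StartActivity("LateTag"))
{
    late?.SetTag("error", true);
}

// Tag supplied at creation: visible to the sampling callback.
var creationTags = new[] { new KeyValuePair<string, object?>("error", true) };
using (var early = source.StartActivity(
    "EarlyTag", ActivityKind.Internal, default(ActivityContext), creationTags))
{
}

Console.WriteLine(string.Join(",", tagsSeenAtSampling)); // 0,1
```

This is why "always keep errors" is usually implemented with tail-based sampling, where the decision is made after the whole trace has been collected.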

Local Observability Stack (Docker Compose)

# docker-compose.observability.yml
services:
  # Collect all OTel signals
  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    volumes:
      - ./otel-collector-config.yaml:/etc/otelcol-contrib/config.yaml
    ports:
      - "4317:4317"   # OTLP gRPC
      - "4318:4318"   # OTLP HTTP

  # Distributed tracing
  jaeger:
    image: jaegertracing/all-in-one:latest
    ports:
      - "16686:16686"  # Jaeger UI

  # Metrics
  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"

  # Dashboards (traces + metrics + logs in one)
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"

  # Logs
  loki:
    image: grafana/loki:latest
    ports:
      - "3100:3100"

Grafana Dashboards

Key queries for .NET API dashboards:

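# Request rate (requests per second, summed over all series)
sum(rate(http_server_request_duration_seconds_count[5m]))

# P95 latency (aggregate buckets by le before taking the quantile)
histogram_quantile(0.95, sum by (le) (rate(http_server_request_duration_seconds_bucket[5m])))

# Error rate (fraction of requests with a 5xx status; sum both sides so the
# division is not matched series-by-series, which would always yield 1)
sum(rate(http_server_request_duration_seconds_count{http_response_status_code=~"5.."}[5m]))
  /
sum(rate(http_server_request_duration_seconds_count[5m]))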

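# GC Gen2 collections per minute
# (metric and label names vary by .NET and instrumentation version; confirm against your /metrics output)
rate(dotnet_gc_collections_total{generation="gen2"}[1m]) * 60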

Interview Questions

Q: What is the difference between traces, metrics, and logs?
A: Traces follow a request through multiple services, showing timing for each step. Metrics are aggregated numerical measurements over time (request rate, P95 latency, error rate). Logs are timestamped text events with context. You need all three: a spike in P95 latency (metric) points you to a trace, and the trace's spans point you to the relevant logs.

Q: What is the W3C traceparent header?
A: The standard HTTP header for trace context propagation. It carries a version, the trace ID, the parent span ID, and trace flags (including the sampled flag). OpenTelemetry injects it automatically on outbound HTTP requests and extracts it on inbound ones, linking all spans for a request across services into one trace.

Q: What is trace sampling and why is it needed?
A: High-traffic services can generate millions of traces per minute, and storing all of them is expensive. Sampling selects a fraction (e.g., 10%) of traces to record. Strategies: head-based (decide at the root span, before the outcome is known), tail-based (decide after seeing the full trace, which allows always keeping errors), and parent-based (respect the upstream decision).

Q: What is a custom ActivitySource?
A: ActivitySource is the .NET API for creating custom spans (activities). You create one per library/service with new ActivitySource("MyApp.Orders") and register it with OTel using .AddSource("MyApp.*"). Inside operations, call StartActivity("OperationName") on that instance to create a child span under the current trace. The call can return null when nothing is listening or the sampler drops the span, hence the activity?. calls.
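
The "child span under the current trace" behaviour can be seen directly: an activity started while another is current inherits its trace ID and records it as parent. A minimal sketch using only System.Diagnostics, with a hypothetical source name "Demo.Orders" and an ActivityListener standing in for the OTel SDK:

```csharp
using System;
using System.Diagnostics;

var source = new ActivitySource("Demo.Orders"); // hypothetical source name
using var listener = new ActivityListener
{
    ShouldListenTo = s => s.Name == "Demo.Orders",
    Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
        ActivitySamplingResult.AllDataAndRecorded,
};
ActivitySource.AddActivityListener(listener);

// The first activity becomes Activity.Current; the second is created under it.
using var parent = source.StartActivity("HandleRequest");
using var child = source.StartActivity("CreateOrder");

if (parent is null || child is null)
    throw new InvalidOperationException("No listener is registered for the source.");

bool sameTrace = child.TraceId == parent.TraceId;
bool linkedToParent = child.ParentSpanId == parent.SpanId;

Console.WriteLine($"same trace: {sameTrace}");            // True
Console.WriteLine($"linked to parent: {linkedToParent}"); // True
```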