OpenTelemetry in .NET: Traces, Metrics, and Logs in One Stack
Implement complete observability in .NET with OpenTelemetry. Covers traces, metrics, logs, OTLP export, Grafana/Jaeger/Prometheus setup, custom instrumentation, sampling, and production patterns.
The Three Pillars of Observability
- Traces — follow a request across services. Answer: where did the time go?
- Metrics — aggregated measurements over time. Answer: how is the system performing?
- Logs — timestamped events with context. Answer: what happened?
OpenTelemetry (OTel) is the CNCF standard for collecting all three. One SDK, one agent, any backend.
Setup
dotnet add package OpenTelemetry.Extensions.Hosting
dotnet add package OpenTelemetry.Instrumentation.AspNetCore
dotnet add package OpenTelemetry.Instrumentation.Http
dotnet add package OpenTelemetry.Instrumentation.EntityFrameworkCore --prerelease # prerelease-only at the time of writing
dotnet add package OpenTelemetry.Instrumentation.Runtime
dotnet add package OpenTelemetry.Exporter.OpenTelemetryProtocol # OTLP (Jaeger, Grafana)
dotnet add package OpenTelemetry.Exporter.Prometheus.AspNetCore --prerelease # Prometheus scrape endpoint (prerelease at the time of writing)

Full Configuration
// Program.cs
using OpenTelemetry.Logs;
using OpenTelemetry.Metrics;
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;

var builder = WebApplication.CreateBuilder(args);

var serviceName = builder.Environment.ApplicationName;
var serviceVersion = "1.0.0";
// Read the OTLP endpoint once so traces, metrics, and logs all go to the same collector
var otlpEndpoint = new Uri(builder.Configuration["Otlp:Endpoint"]!);

var resourceBuilder = ResourceBuilder.CreateDefault()
    .AddService(serviceName, serviceVersion: serviceVersion)
    .AddTelemetrySdk()
    .AddEnvironmentVariableDetector();

builder.Services.AddOpenTelemetry()
    // ── Traces ──────────────────────────────────────────────────────────────
    .WithTracing(tracing => tracing
        .SetResourceBuilder(resourceBuilder)
        .AddAspNetCoreInstrumentation(options =>
        {
            options.RecordException = true;
            // Skip health checks so they do not flood the trace backend
            options.Filter = ctx => !ctx.Request.Path.StartsWithSegments("/health");
        })
        .AddHttpClientInstrumentation(options =>
        {
            options.RecordException = true;
        })
        .AddEntityFrameworkCoreInstrumentation(options =>
        {
            options.SetDbStatementForText = true; // include SQL in spans (dev only)
        })
        .AddSource("OrderFlow.*") // pick up custom ActivitySource
        .AddOtlpExporter(otlp => otlp.Endpoint = otlpEndpoint))
    // ── Metrics ─────────────────────────────────────────────────────────────
    .WithMetrics(metrics => metrics
        .SetResourceBuilder(resourceBuilder)
        .AddAspNetCoreInstrumentation()
        .AddHttpClientInstrumentation()
        .AddRuntimeInstrumentation() // GC, threadpool, memory
        .AddMeter("OrderFlow.*") // pick up custom Meter
        .AddOtlpExporter(otlp => otlp.Endpoint = otlpEndpoint)
        .AddPrometheusExporter()) // /metrics endpoint for Prometheus
    // ── Logs ────────────────────────────────────────────────────────────────
    .WithLogging(logging => logging
        .SetResourceBuilder(resourceBuilder)
        .AddOtlpExporter(otlp => otlp.Endpoint = otlpEndpoint));

var app = builder.Build();

// Wire up Prometheus scrape endpoint
app.MapPrometheusScrapingEndpoint(); // /metrics

app.Run();

Custom Traces (ActivitySource)
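// Shared ActivitySource: one per service
using System.Diagnostics;
using OpenTelemetry.Trace; // RecordException extension method

public static class Telemetry
{
    public static readonly ActivitySource ActivitySource =
        new("OrderFlow.Orders", "1.0.0");
}

// In your service
public class OrderService
{
    private readonly IOrderRepository _repo; // hypothetical repository abstraction

    public OrderService(IOrderRepository repo) => _repo = repo;

    public async Task<Order> CreateOrderAsync(CreateOrderCommand cmd, CancellationToken ct)
    {
        // StartActivity returns null when no listener is subscribed to this source,
        // which is why every call below uses activity?.
        using var activity = Telemetry.ActivitySource.StartActivity("CreateOrder");
        activity?.SetTag("order.customerId", cmd.CustomerId);
        activity?.SetTag("order.lineCount", cmd.Lines.Count);
        try
        {
            var order = await _repo.CreateAsync(cmd, ct);
            activity?.SetTag("order.id", order.Id);
            activity?.SetTag("order.total", order.Total);
            activity?.SetStatus(ActivityStatusCode.Ok);
            return order;
        }
        catch (Exception ex)
        {
            // Mark the span as failed and attach the exception as a span event
            activity?.SetStatus(ActivityStatusCode.Error, ex.Message);
            activity?.RecordException(ex);
            throw;
        }
    }
}

Custom Metrics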
// Shared Meter
using System.Diagnostics; // TagList
using System.Diagnostics.Metrics;

public class OrderMetrics
{
    private readonly Counter<long> _ordersCreated;
    private readonly Histogram<double> _orderProcessingMs;
    private readonly UpDownCounter<int> _pendingOrders;

    public OrderMetrics(IMeterFactory meterFactory)
    {
        // The name must match the pattern passed to AddMeter("OrderFlow.*")
        var meter = meterFactory.Create("OrderFlow.Orders");
        _ordersCreated = meter.CreateCounter<long>(
            "orders.created",
            unit: "{orders}",
            description: "Total orders created");
        _orderProcessingMs = meter.CreateHistogram<double>(
            "orders.processing_duration",
            unit: "ms",
            description: "Order processing time in milliseconds");
        _pendingOrders = meter.CreateUpDownCounter<int>(
            "orders.pending",
            unit: "{orders}",
            description: "Current number of pending orders");
    }

    public void RecordOrderCreated(string region, string channel)
        => _ordersCreated.Add(1, new TagList
        {
            { "region", region },
            { "channel", channel }
        });

    public void RecordProcessingTime(double ms, string status)
        => _orderProcessingMs.Record(ms, new TagList { { "status", status } });

    public void RecordPendingChange(int delta) => _pendingOrders.Add(delta);
}

// Register
builder.Services.AddSingleton<OrderMetrics>();

Structured Logging with OTel
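// OTel log bridge: log records written inside an active span carry its TraceId and SpanId
// No additional code needed, just use ILogger<T>
public class OrderService
{
    private readonly ILogger<OrderService> _logger;
    private readonly ICurrentUser _currentUser; // hypothetical accessor for the signed-in user

    public OrderService(ILogger<OrderService> logger, ICurrentUser currentUser)
    {
        _logger = logger;
        _currentUser = currentUser;
    }

    public Task SubmitOrderAsync(Guid id, CancellationToken ct)
    {
        // This log will automatically include:
        // - TraceId from the current activity
        // - SpanId from the current span
        // - Resource attributes such as service name and version
        _logger.LogInformation(
            "Order {OrderId} submitted by {CustomerId}",
            id, _currentUser.Id);
        return Task.CompletedTask; // the actual submit logic is omitted here
    }
}

Correlation IDs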
OpenTelemetry propagates the traceparent header automatically between services. The trace ID ties all spans across services together in Jaeger/Grafana.
// Middleware to expose TraceId in responses (useful for support)
app.Use(async (ctx, next) =>
{
    var traceId = Activity.Current?.TraceId.ToString();
    if (traceId is not null)
        ctx.Response.Headers["X-Trace-Id"] = traceId;
    await next(ctx);
});

Sampling
Sampling reduces trace volume in high-traffic services:
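.WithTracing(tracing => tracing
    .SetSampler(new ParentBasedSampler( // respect upstream sampling decision
        new TraceIdRatioBasedSampler(0.1) // sample 10% of root spans
    )))

A custom sampler can force sampling for spans that are tagged as errors. Samplers run when a span starts, so this wrapper only sees tags supplied at creation time; errors that surface later in the span require tail-based sampling, for example in the OpenTelemetry Collector:
using System.Linq;
using OpenTelemetry.Trace;

public class AlwaysSampleErrorsSampler : Sampler
{
    private readonly Sampler _inner;

    public AlwaysSampleErrorsSampler(Sampler inner) => _inner = inner;

    public override SamplingResult ShouldSample(in SamplingParameters parameters)
    {
        // Always sample if an error tag is already present when the span is created
        if (parameters.Tags?.Any(t => t.Key == "error" && t.Value?.ToString() == "true") == true)
            return new SamplingResult(SamplingDecision.RecordAndSample);
        // Otherwise defer to the wrapped sampler (for example the ratio sampler above)
        return _inner.ShouldSample(parameters);
    }
}

Local Observability Stack (Docker Compose)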
# docker-compose.observability.yml
services:
  # Collect all OTel signals
  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    volumes:
      - ./otel-collector-config.yaml:/etc/otelcol-contrib/config.yaml
    ports:
      - "4317:4317" # OTLP gRPC
      - "4318:4318" # OTLP HTTP
  # Distributed tracing
  jaeger:
    image: jaegertracing/all-in-one:latest
    ports:
      - "16686:16686" # Jaeger UI
  # Metrics
  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"
  # Dashboards (traces + metrics + logs in one)
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
  # Logs
  loki:
    image: grafana/loki:latest
    ports:
      - "3100:3100"

Grafana Dashboards
Key queries for .NET API dashboards:
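# Request rate (requests per second)
rate(http_server_request_duration_seconds_count[5m])
# P95 latency (aggregate buckets across series before computing the quantile)
histogram_quantile(0.95, sum by (le) (rate(http_server_request_duration_seconds_bucket[5m])))
# Error rate (share of requests answered with 5xx)
# Both sides are summed; without sum() each 5xx series is divided by itself and the result is always 1
sum(rate(http_server_request_duration_seconds_count{http_response_status_code=~"5.."}[5m]))
/
sum(rate(http_server_request_duration_seconds_count[5m]))
# GC Gen2 collections per minute
# (runtime metric and label names vary with the .NET and instrumentation version; check your /metrics output)
rate(dotnet_gc_collections_total{generation="gen2"}[1m]) * 60

Interview Questions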
Q: What is the difference between traces, metrics, and logs?
Traces follow a request through multiple services, showing timing for each step. Metrics are aggregated numerical measurements over time (request rate, P95 latency, error rate). Logs are timestamped text events with context. You need all three: a spike in P95 latency (metric) points you to a trace, and the trace's spans point you to the relevant logs.
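To make the "aggregated" part of metrics concrete, here is a minimal sketch using only the built-in System.Diagnostics.Metrics API. The meter name Demo.Orders and the class name are assumptions made for illustration; in a real service the OpenTelemetry SDK plays the role of the listener and does the aggregation for you.

```csharp
using System;
using System.Diagnostics.Metrics;

public static class MetricsAggregationDemo
{
    // Records one measurement per order and returns the aggregated total
    // that a listener (standing in for an exporter) observed.
    public static long CountOrders(int orders)
    {
        using var meter = new Meter("Demo.Orders"); // hypothetical meter name
        Counter<long> created = meter.CreateCounter<long>("orders.created");

        long total = 0;
        using var listener = new MeterListener();
        listener.InstrumentPublished = (instrument, l) =>
        {
            // Subscribe only to instruments that belong to our meter
            if (ReferenceEquals(instrument.Meter, meter))
                l.EnableMeasurementEvents(instrument);
        };
        listener.SetMeasurementEventCallback<long>(
            (instrument, measurement, tags, state) => total += measurement);
        listener.Start();

        for (int i = 0; i < orders; i++)
            created.Add(1); // each call is one raw measurement

        return total; // the individual events are gone, only the sum remains
    }
}
```

Calling MetricsAggregationDemo.CountOrders(3) returns 3: the backend never sees three separate events, only the running total, which is what keeps metrics cheap compared with traces and logs.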
Q: What is the W3C traceparent header?
The standard HTTP header for trace context propagation. It contains the trace ID, parent span ID, and sampling flag. OpenTelemetry injects it automatically on outbound HTTP requests and extracts it on inbound — linking all spans for a request across services into one trace.
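The fields of the header can be inspected with the built-in ActivityContext.TryParse method. The sketch below is illustrative only: the class name is made up, and the sample value in the usage note is the example from the W3C Trace Context specification.

```csharp
using System;
using System.Diagnostics;

public static class TraceParentDemo
{
    // A traceparent value has the shape: version-traceid-parentid-flags
    public static string Describe(string traceParent)
    {
        // The second argument is the optional tracestate header
        if (!ActivityContext.TryParse(traceParent, null, out ActivityContext ctx))
            return "invalid traceparent";

        // The lowest flag bit tells downstream services whether the trace is sampled
        bool sampled = (ctx.TraceFlags & ActivityTraceFlags.Recorded) != 0;
        return $"trace={ctx.TraceId} parent={ctx.SpanId} sampled={sampled}";
    }
}
```

For the input 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01, Describe returns trace=0af7651916cd43dd8448eb211c80319c parent=b7ad6b7169203331 sampled=True. You rarely need to parse the header yourself, since the ASP.NET Core and HttpClient instrumentation handle it, but it helps when debugging broken propagation.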
Q: What is trace sampling and why is it needed?
High-traffic services can generate millions of traces per minute, and storing all of them is expensive. Sampling selects a fraction (e.g., 10%) of traces to record. Strategies: head-based (decide at root span), tail-based (decide after seeing the full trace, which allows always sampling errors), parent-based (respect upstream decision).
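A head-based ratio decision is usually derived from the trace ID itself, so every service reaches the same verdict for the same trace without coordination. The following is a simplified sketch of that idea, not the exact algorithm used by TraceIdRatioBasedSampler; the class and method names are hypothetical.

```csharp
using System;
using System.Diagnostics;

public static class RatioSamplingSketch
{
    // Maps the first 8 bytes of the trace ID to a position in [0, 1]
    // and keeps the trace when that position falls below the ratio.
    public static bool ShouldSample(ActivityTraceId traceId, double ratio)
    {
        if (ratio >= 1.0) return true;  // keep everything
        if (ratio <= 0.0) return false; // drop everything

        Span<byte> bytes = stackalloc byte[16];
        traceId.CopyTo(bytes);
        ulong prefix = BitConverter.ToUInt64(bytes[..8]);
        double position = prefix / (double)ulong.MaxValue;
        return position < ratio;
    }
}
```

Because the decision is a pure function of the trace ID, calling ShouldSample twice with the same ID and ratio always gives the same answer, and randomly generated IDs are kept at roughly the configured rate.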
Q: What is a custom ActivitySource?
ActivitySource is the .NET API for creating custom spans (activities). You create one per library/service with new ActivitySource("MyApp.Orders"). Register it with OTel using .AddSource("MyApp.*"). Inside operations, call ActivitySource.StartActivity("OperationName") to create a child span under the current trace.
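One detail worth knowing for interviews is that StartActivity returns null when nothing is listening to the source. The sketch below demonstrates this with a plain ActivityListener from System.Diagnostics standing in for the OpenTelemetry SDK; the class name and span names are assumptions for illustration.

```csharp
using System;
using System.Diagnostics;

public static class ActivitySourceDemo
{
    // Hypothetical source name, matching the "MyApp.*" pattern used above
    private static readonly ActivitySource Source = new("MyApp.Orders");

    public static (bool NullWithoutListener, bool CreatedWithListener, bool ChildSharesTrace) Run()
    {
        // No listener yet: no span is created and the result is null
        using Activity? before = Source.StartActivity("NoListener");

        // A listener plays the role that AddSource("MyApp.*") plays in the SDK setup
        using var listener = new ActivityListener
        {
            ShouldListenTo = s => s.Name.StartsWith("MyApp.", StringComparison.Ordinal),
            Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
                ActivitySamplingResult.AllDataAndRecorded
        };
        ActivitySource.AddActivityListener(listener);

        // The second span starts while the first is current, so it becomes its child
        using Activity? parent = Source.StartActivity("CreateOrder");
        using Activity? child = Source.StartActivity("ValidateOrder");

        bool childSharesTrace = parent is not null && child is not null
            && child.TraceId == parent.TraceId
            && child.ParentSpanId == parent.SpanId;

        return (before is null, parent is not null, childSharesTrace);
    }
}
```

Run() is expected to return three true values, which is why application code should always use the activity?. pattern instead of assuming a span exists.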