
SignalR Production Patterns — Scale, Reliability, and Monitoring

Production SignalR: connection lifecycle management, heartbeats, fallback transports, monitoring connection counts, graceful shutdown, and the operational patterns for real-time systems at hospital scale.

Asma Hafeez Khan · May 16, 2026 · 5 min read
Tags: SignalR, Production, Scaling, ASP.NET Core, .NET, Monitoring

Transport Fallback

SignalR negotiates the best available transport:

Transport priority:
  1. WebSockets (full duplex, most efficient)
  2. Server-Sent Events (server-to-client only)
  3. Long Polling (polling with held connection)

Negotiation happens automatically:
  Client: POST /hubs/clinical/negotiate
  Server: "I offer WebSockets and LongPolling"
  Client: picks the best offered transport it also supports
  Result: WebSockets selected

Skip negotiation to save a round trip (WebSockets only):
  .withUrl("/hubs/clinical", {
    transport: signalR.HttpTransportType.WebSockets,
    skipNegotiation: true  // requires WebSockets, no fallback
  })
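
The priority order above can be sketched as a small selection function. This only illustrates the fallback logic and is not the SignalR client's actual negotiation code; `pickTransport` and the `Transport` type are hypothetical names.

```typescript
// Illustrative only: mirrors the priority order described above,
// not the real negotiation code inside the SignalR client.
type Transport = "WebSockets" | "ServerSentEvents" | "LongPolling";

// Highest priority first.
const PRIORITY: readonly Transport[] = ["WebSockets", "ServerSentEvents", "LongPolling"];

function pickTransport(
  clientSupports: readonly Transport[],
  serverOffers: readonly Transport[],
): Transport | undefined {
  // Take the first transport in priority order that both sides can use.
  return PRIORITY.find(t => clientSupports.includes(t) && serverOffers.includes(t));
}

const chosen = pickTransport(
  ["WebSockets", "ServerSentEvents", "LongPolling"],
  ["WebSockets", "LongPolling"],
);
console.log(chosen); // "WebSockets"
```

If no transport is shared, the function returns `undefined`, which corresponds to a connection that cannot be established.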

Heartbeats and Timeouts

C#
// Configure hub timeouts (server-side)
builder.Services.AddSignalR(options =>
{
    // How often the server pings connected clients (default: 15s)
    options.KeepAliveInterval = TimeSpan.FromSeconds(15);

    // How long the server waits without hearing from a client before treating it as disconnected
    // (default: 30s; recommended to be at least twice KeepAliveInterval)
    options.ClientTimeoutInterval = TimeSpan.FromSeconds(30);

    // Max message size (default: 32KB)
    options.MaximumReceiveMessageSize = 64 * 1024;  // 64KB

    // Max parallel hub invocations per connection (default: 1)
    options.MaximumParallelInvocationsPerClient = 1;
});
TYPESCRIPT
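// Client-side: mirror the server's heartbeat settings
import * as signalR from "@microsoft/signalr";

// getAccessToken is assumed to be defined elsewhere and to return the current JWT
const connection = new signalR.HubConnectionBuilder()
    .withUrl("/hubs/clinical", { accessTokenFactory: getAccessToken })
    .withAutomaticReconnect()
    .build();

// How long the client waits for any server message (including pings)
// before treating the connection as lost (default: 30s)
connection.serverTimeoutInMilliseconds = 30000;
// How often the client pings the server (default: 15s)
connection.keepAliveIntervalInMilliseconds = 15000;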
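
The server and client values have to agree with each other: the client's server timeout should leave room for at least one missed server ping. A minimal sketch of a sanity check you could run at startup or in a unit test follows. `checkHeartbeat` and `HeartbeatSettings` are hypothetical names, not part of `@microsoft/signalr`.

```typescript
// Illustrative helper; the names are hypothetical and not part of @microsoft/signalr.
interface HeartbeatSettings {
  serverKeepAliveMs: number;     // server-side KeepAliveInterval
  clientServerTimeoutMs: number; // client-side serverTimeoutInMilliseconds
}

// Returns a list of human-readable problems; an empty list means the settings look sane.
function checkHeartbeat(s: HeartbeatSettings): string[] {
  const problems: string[] = [];
  if (s.serverKeepAliveMs <= 0 || s.clientServerTimeoutMs <= 0) {
    problems.push("intervals must be positive");
  }
  // Allow at least one missed ping before giving up on the server.
  if (s.clientServerTimeoutMs < 2 * s.serverKeepAliveMs) {
    problems.push("serverTimeout should be at least 2x the server KeepAliveInterval");
  }
  return problems;
}

console.log(checkHeartbeat({ serverKeepAliveMs: 15000, clientServerTimeoutMs: 30000 }));
```

The same rule applies in the other direction: the server's ClientTimeoutInterval should be at least twice the client's keep-alive interval.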

Graceful Shutdown

C#
// Give open connections and in-flight hub calls time to finish before the host forces shutdown
builder.Services.AddSignalR();
builder.Services.Configure<HostOptions>(options =>
{
    // Upper bound on how long the host waits for a clean stop
    options.ShutdownTimeout = TimeSpan.FromSeconds(30);
});

// When the host stops, open SignalR connections are closed.
// Clients built with withAutomaticReconnect() see onreconnecting() first and onclose() only
// after retries run out; either is a reasonable place to show "server restarting, please wait"
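
On the client, a server shutdown shows up through the connection's lifecycle callbacks. Here is a minimal sketch of wiring those callbacks to a status banner. `wireConnectionBanner`, `BannerState`, and the `show` callback are hypothetical; the `LifecycleEvents` interface only lists the three `HubConnection` callbacks used, so a real connection can be passed in directly.

```typescript
// Minimal structural view of the HubConnection lifecycle callbacks used here.
interface LifecycleEvents {
  onreconnecting(cb: (error?: Error) => void): void;
  onreconnected(cb: (connectionId?: string) => void): void;
  onclose(cb: (error?: Error) => void): void;
}

type BannerState = "connected" | "reconnecting" | "disconnected";

// `show` is a hypothetical UI hook, e.g. something that toggles a banner in the dashboard.
function wireConnectionBanner(
  conn: LifecycleEvents,
  show: (state: BannerState) => void,
): void {
  // Fired when the connection drops and automatic reconnect starts retrying.
  conn.onreconnecting(() => show("reconnecting"));
  // Fired when a retry succeeds (the connection now has a new connection ID).
  conn.onreconnected(() => show("connected"));
  // Fired when retries are exhausted, or immediately if automatic reconnect is off.
  conn.onclose(() => show("disconnected"));
}
```

With the `connection` built earlier, usage would look like `wireConnectionBanner(connection, state => console.log(state))`.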

Connection Count Monitoring

C#
// Expose the total connection count as a metric.
// IConnectionTracker is an application-defined service, not part of SignalR.
// Requires: using System.Diagnostics.Metrics; using Microsoft.Extensions.Hosting;
public sealed class ConnectionCountMetrics : IHostedService
{
    private readonly IConnectionTracker _tracker;

    public ConnectionCountMetrics(
        IConnectionTracker tracker, IMeterFactory meters)
    {
        _tracker = tracker;
        // The factory owns the meter's lifetime, so it is not disposed here
        var meter = meters.Create("SystemForge.SignalR");
        // An observable gauge is read on demand whenever a listener or exporter
        // collects metrics, so no timer is needed to push values
        meter.CreateObservableGauge(
            "signalr.connections.active",
            () => _tracker.TotalConnections,
            description: "Active SignalR connections");
    }

    // Registering as a hosted service ensures the gauge is created at startup
    public Task StartAsync(CancellationToken ct) => Task.CompletedTask;

    public Task StopAsync(CancellationToken ct) => Task.CompletedTask;
}

Error Handling in Hub Methods

C#
public sealed class ClinicalDashboardHub : Hub<IClinicalDashboardClient>
{
    // _service and _logger are constructor-injected fields, not shown here
    [Authorize]
    public async Task UpdateDrugOrder(Guid orderId, string newStatus)
    {
        try
        {
            var result = await _service.UpdateStatusAsync(orderId, newStatus);
            if (result.IsFailure)
            {
                // HubException message is sent to the client
                throw new HubException($"Order update failed: {result.Error.Description}");
            }

            // Notify ward on success
            await Clients.Group($"ward:{result.Value.WardId}")
                .DrugOrderStatusChanged(result.Value.ToDto());
        }
        catch (HubException)
        {
            throw;  // re-throw HubException — it goes to the client
        }
        catch (Exception ex)
        {
            // Log but don't expose internal error to client
            _logger.LogError(ex, "Unhandled error updating drug order {OrderId}", orderId);
            throw new HubException("An unexpected error occurred. Please try again.");
        }
    }
}
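
On the client, a `HubException` thrown by the hub surfaces as a rejected promise from `invoke`. One way to handle that without scattering try/catch blocks is a thin wrapper that turns the rejection into a result value. This is a sketch: `safeInvoke`, `Invoker`, and `InvokeResult` are hypothetical names, and the exact wording of the error message (it may carry a client-added prefix) is not guaranteed.

```typescript
// Structural subset of HubConnection, so a real connection can be passed in.
interface Invoker {
  invoke<T>(methodName: string, ...args: unknown[]): Promise<T>;
}

type InvokeResult<T> =
  | { ok: true; value: T }
  | { ok: false; message: string };

// Calls a hub method and converts a rejection (e.g. a HubException
// thrown by the hub) into a value the UI can display.
async function safeInvoke<T>(
  conn: Invoker,
  method: string,
  ...args: unknown[]
): Promise<InvokeResult<T>> {
  try {
    return { ok: true, value: await conn.invoke<T>(method, ...args) };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { ok: false, message };
  }
}
```

A call such as `await safeInvoke<void>(connection, "UpdateDrugOrder", orderId, newStatus)` then yields either `{ ok: true }` or a message suitable for a toast, where `orderId` and `newStatus` stand for whatever the UI has collected.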

Message Size and Throttling

C#
// Prevent large messages from overwhelming the hub
builder.Services.AddSignalR(options =>
{
    // Incoming messages over 32KB (the default) are rejected and the connection is closed with an error
    options.MaximumReceiveMessageSize = 32 * 1024;  // 32KB
});

// For large payloads (e.g., PDF reports), send the URL not the content
// Client fetches via HTTP — SignalR is for notifications, not file transfer
await Clients.Caller.LargeReportReady(new { ReportUrl = "/reports/abc123" });
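
MaximumReceiveMessageSize limits what the server accepts from clients, so a client can check a payload before sending it rather than finding out through a dropped connection. The following is a rough sketch. `jsonByteLength` and `fitsInMessage` are hypothetical helpers, and the 1KB headroom for protocol framing (method name, invocation ID) is an assumption, not a documented figure.

```typescript
const DEFAULT_LIMIT_BYTES = 32 * 1024; // matches the 32KB server setting above

// Approximate UTF-8 size of the JSON a payload serializes to.
function jsonByteLength(payload: unknown): number {
  return new TextEncoder().encode(JSON.stringify(payload)).length;
}

// True if the payload plausibly fits in one hub message.
// headroomBytes is an assumed allowance for the protocol envelope.
function fitsInMessage(
  payload: unknown,
  limitBytes: number = DEFAULT_LIMIT_BYTES,
  headroomBytes: number = 1024,
): boolean {
  return jsonByteLength(payload) + headroomBytes <= limitBytes;
}

console.log(fitsInMessage({ note: "Patient moved to bay 4" })); // true
console.log(fitsInMessage("x".repeat(40000)));                  // false
```

When the check fails, upload the content over plain HTTP and send only a reference through the hub, mirroring the server-to-client pattern above.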

Per-Connection Rate Limiting

C#
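// Custom filter to rate-limit hub method calls per connection.
// RateLimitState is an application-defined type (not shown).
public sealed class HubRateLimitFilter : IHubFilter
{
    private readonly ConcurrentDictionary<string, RateLimitState> _state = new();

    public async ValueTask<object?> InvokeMethodAsync(
        HubInvocationContext ctx,
        Func<HubInvocationContext, ValueTask<object?>> next)
    {
        var connectionId = ctx.Context.ConnectionId;
        var state = _state.GetOrAdd(connectionId, _ => new RateLimitState());

        if (state.IsRateLimited())
            throw new HubException("Rate limit exceeded. Please slow down.");

        state.RecordCall();
        return await next(ctx);
    }

    // Drop per-connection state on disconnect so the dictionary does not grow forever
    public Task OnDisconnectedAsync(
        HubLifetimeContext ctx, Exception? exception,
        Func<HubLifetimeContext, Exception?, Task> next)
    {
        _state.TryRemove(ctx.Context.ConnectionId, out _);
        return next(ctx, exception);
    }
}

// Register. The filter must be a singleton: without the DI registration SignalR
// creates a new filter per invocation and the per-connection state is lost
builder.Services.AddSingleton<HubRateLimitFilter>();
builder.Services.AddSignalR(options =>
    options.AddFilter<HubRateLimitFilter>());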
Health Checks for SignalR

C#
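// Report the active connection count and backplane health.
// Requires the AspNetCore.HealthChecks.Redis and AspNetCore.HealthChecks.UI.Client packages.
// Declare the check class in its own file; IConnectionTracker is resolved from DI.
public sealed class SignalRHealthCheck(IConnectionTracker tracker) : IHealthCheck
{
    public Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context, CancellationToken ct = default)
    {
        // Informational only: always healthy, with the count in the description
        var count = tracker.TotalConnections;
        return Task.FromResult(HealthCheckResult.Healthy($"{count} active connections"));
    }
}

builder.Services.AddHealthChecks()
    .AddCheck<SignalRHealthCheck>("signalr")
    .AddRedis(redisConnectionString, name: "signalr-backplane");

app.MapHealthChecks("/health/signalr", new HealthCheckOptions
{
    Predicate      = check => check.Name.StartsWith("signalr"),
    ResponseWriter = UIResponseWriter.WriteHealthCheckUIResponse
});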

Production issue I've seen: A hospital's ward monitoring system had no heartbeat timeout configured. When a nurse's laptop went to sleep, the WebSocket connection stayed "open" from the server's perspective for 4 hours (until the OS forcibly closed it). With 200 nurses, the server accumulated thousands of zombie connections. Setting ClientTimeoutInterval to 30 seconds cleaned up stale connections promptly, reducing memory usage by 40%.


Deployment Checklist

Pre-deployment SignalR checklist:
  ☐ Redis backplane configured for 2+ instances
  ☐ Load balancer supports WebSocket upgrade headers and sticky sessions (not needed if clients use WebSockets only with skipNegotiation)
  ☐ KeepAliveInterval and ClientTimeoutInterval configured
  ☐ MaximumReceiveMessageSize appropriate for your payloads
  ☐ Hub methods handle errors with HubException
  ☐ Clients re-join groups on reconnect
  ☐ Connection count monitoring in place
  ☐ Redis health check configured
  ☐ Graceful shutdown timeout set
  ☐ CORS configured for hub paths
  ☐ JWT extracted from query string for hub paths
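
The "re-join groups on reconnect" item is the one most often missed on the client. Here is a minimal sketch, assuming the hub exposes a `JoinWard(wardId)` method; that method name, `rejoinWardsOnReconnect`, and `getWardIds` are hypothetical and should be swapped for whatever your hub and UI actually use.

```typescript
// Structural subset of HubConnection, so a real connection can be passed in.
interface ReconnectableConnection {
  onreconnected(cb: (connectionId?: string) => void): void;
  invoke<T>(methodName: string, ...args: unknown[]): Promise<T>;
}

// After every successful reconnect, ask the hub to add this (new) connection
// back into each ward group the user is currently watching.
function rejoinWardsOnReconnect(
  conn: ReconnectableConnection,
  getWardIds: () => string[],
  onError: (e: unknown) => void = console.error,
): void {
  conn.onreconnected(() => {
    for (const wardId of getWardIds()) {
      // "JoinWard" is an assumed hub method name.
      conn.invoke<void>("JoinWard", wardId).catch(onError);
    }
  });
}
```

Reading the ward list through a callback rather than capturing it once means the handler re-joins whatever the user is watching at the moment of reconnect.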

Key Takeaway

Production SignalR requires more than just AddSignalR(). Configure heartbeat timeouts to clean up zombie connections, a Redis backplane for multi-instance consistency, graceful shutdown for clean client disconnects, and monitoring for connection count and backplane health. In hub methods, throw HubException for errors the client should see; log any other exception server-side and return a generic message. Clients must re-join groups after a reconnect, because group membership is tied to the connection ID and a reconnect creates a new connection.
