System Design: Real-Time Notification Platform in .NET — Fan-Out, Multi-Channel Delivery, and Deduplication at Scale
System: B2C platform, 2M active users, 50K concurrent connections at peak, events from 15 upstream microservices, four delivery channels (in-app, email, SMS, push notification).
The core problem: A single user action — a post going viral, a flash sale starting, a system alert — can trigger notifications to hundreds of thousands of recipients in seconds. The naive approach (synchronous fan-out per event) saturates the database, blows through SMS rate limits, and sends duplicate notifications when a service retries.
System Overview
Upstream Services → EventBus (RabbitMQ)
                         ↓
                 NotificationWorker
                /        |         \
    FanOutEngine    DedupeCache    PreferenceEngine
         ↓                                ↓
   DeliveryRouter ←───────────────  UserPreferences
   /      |       \         \
SignalR   Email    SMS       Push
(in-app)  (SES)    (Twilio)  (FCM/APNS)
Data Model
public class NotificationTemplate
{
public Guid Id { get; private set; }
public string EventType { get; private set; } = ""; // "order.shipped", "mention", etc.
public string TitleTemplate { get; private set; } = ""; // "Your order {{OrderId}} has shipped"
public string BodyTemplate { get; private set; } = "";
public NotificationChannel[] EnabledChannels { get; private set; } = [];
}
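public class UserNotificationPreferences
{
// [JsonInclude] (System.Text.Json.Serialization) lets the serializer populate the
// private setters; without it a cached copy deserializes to an empty object.
[JsonInclude]
public string UserId { get; private set; } = "";
// Per-channel, per-event-type opt-in/out
[JsonInclude]
public Dictionary<string, ChannelPreference> Channels { get; private set; } = new();
// Used when a user has no stored preferences: everything falls back to opt-in.
public static UserNotificationPreferences Default(string userId) => new() { UserId = userId };
public bool IsEnabled(string eventType, NotificationChannel channel)
{
var key = $"{eventType}:{channel}";
if (Channels.TryGetValue(key, out var pref)) return pref.Enabled;
// Fall back to channel-level default
if (Channels.TryGetValue(channel.ToString(), out pref)) return pref.Enabled;
return true; // default opt-in
}
}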
public class OutboundNotification
{
// Constructor used by InAppNotifier; the channel defaults to in-app.
public OutboundNotification(
string userId,
RenderedNotification rendered,
NotificationChannel channel = NotificationChannel.InApp)
{
UserId = userId;
Title = rendered.Title;
Body = rendered.Body;
Channel = channel;
}
public Guid Id { get; private set; } = Guid.NewGuid();
public string UserId { get; private set; } = "";
public string EventType { get; private set; } = "";
public string Title { get; private set; } = "";
public string Body { get; private set; } = "";
public NotificationChannel Channel { get; private set; }
public NotificationStatus Status { get; private set; } = NotificationStatus.Pending;
public DateTime CreatedAt { get; private set; } = DateTime.UtcNow;
public DateTime? SentAt { get; private set; }
public string? FailureReason { get; private set; }
public enum NotificationStatus { Pending, Sent, Failed, Suppressed }
}
public enum NotificationChannel { InApp, Email, Sms, Push }
Design Decision 1: Fan-Out Strategy
The naive approach — fan out to all subscribers synchronously when an event arrives — fails at scale. A user with 50K followers triggering a new_post event creates 50K database writes synchronously.
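Two strategies, chosen by follower count: direct fan-out at or below a threshold, and deferred fan-out (a single job expanded by a background worker) above it: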
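To make the threshold concrete, here is a small sketch of the arithmetic behind the choice. `FanOutPlanner`, `FanOutPlan`, and `FanOutStrategy` are hypothetical names introduced only for illustration; the default threshold (1000) and batch size (500) are the constants used in this lesson.

```csharp
using System;

public enum FanOutStrategy { Direct, Deferred }

public readonly record struct FanOutPlan(FanOutStrategy Strategy, int BatchCount);

public static class FanOutPlanner
{
    // Hypothetical helper: picks a strategy and counts the batch enqueues it implies.
    public static FanOutPlan Plan(int followerCount, int threshold = 1000, int batchSize = 500)
    {
        if (followerCount < 0) throw new ArgumentOutOfRangeException(nameof(followerCount));
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var strategy = followerCount <= threshold
            ? FanOutStrategy.Direct
            : FanOutStrategy.Deferred;

        // Ceiling division: a partial final batch still costs one enqueue call.
        var batchCount = (followerCount + batchSize - 1) / batchSize;
        return new FanOutPlan(strategy, batchCount);
    }
}
```

Under these assumptions, an actor with 800 followers is fanned out directly in 2 batch enqueues, while an actor with 50,000 followers costs one `FanOutJob` on the hot path and 100 batch enqueues in the background worker.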
public class FanOutEngine
{
private const int DirectFanOutThreshold = 1000;
private const int BatchSize = 500;
private readonly IUserGraphRepository _graph;
private readonly INotificationQueue _queue;
public async Task FanOutAsync(NotificationEvent evt, CancellationToken ct)
{
var followerCount = await _graph.GetFollowerCountAsync(evt.ActorId, ct);
if (followerCount <= DirectFanOutThreshold)
{
await DirectFanOutAsync(evt, ct);
}
else
{
await EnqueueDeferredFanOutAsync(evt, ct);
}
}
// Small actor: fan out immediately, enqueue one delivery job per follower
private async Task DirectFanOutAsync(NotificationEvent evt, CancellationToken ct)
{
var followerIds = await _graph.GetFollowerIdsAsync(evt.ActorId, ct);
var batches = followerIds.Chunk(BatchSize);
foreach (var batch in batches)
{
var jobs = batch.Select(userId => new DeliveryJob(userId, evt));
await _queue.EnqueueBatchAsync(jobs, ct);
}
}
// Large actor: enqueue a single fan-out job that the worker expands
private async Task EnqueueDeferredFanOutAsync(NotificationEvent evt, CancellationToken ct)
{
await _queue.EnqueueAsync(new FanOutJob(evt.ActorId, evt), ct);
}
}
// The deferred fan-out worker pages through followers in the background
public class DeferredFanOutWorker : IConsumer<FanOutJob>
{
private readonly IUserGraphRepository _graph;
private readonly INotificationQueue _queue;
public async Task Consume(ConsumeContext<FanOutJob> context)
{
var job = context.Message;
string? cursor = null;
do
{
var page = await _graph.GetFollowerPageAsync(
job.ActorId, pageSize: 500, cursor, context.CancellationToken);
var deliveryJobs = page.Items.Select(userId => new DeliveryJob(userId, job.Event));
await _queue.EnqueueBatchAsync(deliveryJobs, context.CancellationToken);
cursor = page.NextCursor;
}
while (cursor is not null);
}
}
Design Decision 2: Deduplication
Events are retried on failure. Without deduplication, a transient RabbitMQ issue causes duplicate emails. The fix: idempotency keys with Redis TTL.
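The idea can be sketched without Redis. The class below is an in-memory stand-in (`InMemoryDedupe` and its members are hypothetical names, not part of the pipeline) that mimics the "set if not exists, with expiry" semantics. It also shows one possible refinement: because the key is claimed before the send, a delivery that throws after the claim would look like a duplicate on retry, so the sketch releases the key on failure. The trade-off is a possible duplicate if the provider actually accepted the message before the exception.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public sealed class InMemoryDedupe
{
    // key -> expiry time; a stand-in for Redis keys with a TTL.
    private readonly ConcurrentDictionary<string, DateTimeOffset> _claims = new();
    private readonly TimeSpan _ttl;

    public InMemoryDedupe(TimeSpan ttl) => _ttl = ttl;

    public static string Key(string channel, string userId, string eventId) =>
        $"dedup:{channel}:{userId}:{eventId}";

    // Succeeds only for the first caller within the TTL window.
    public bool TryClaim(string key, DateTimeOffset now)
    {
        // Remove an expired claim first so the key behaves as if its TTL had fired.
        if (_claims.TryGetValue(key, out var expiresAt) && expiresAt <= now)
            _claims.TryRemove(new KeyValuePair<string, DateTimeOffset>(key, expiresAt));
        return _claims.TryAdd(key, now + _ttl);
    }

    public void Release(string key) => _claims.TryRemove(key, out _);

    // Returns false for a duplicate; releases the claim if the delivery throws.
    public async Task<bool> DeliverOnceAsync(string key, DateTimeOffset now, Func<Task> deliver)
    {
        if (!TryClaim(key, now)) return false;
        try
        {
            await deliver();
            return true;
        }
        catch
        {
            Release(key); // let the broker retry reach the user
            throw;
        }
    }
}
```

The Redis-backed service used by the pipeline follows.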
public class DeduplicationService
{
private readonly IDatabase _redis;
private static readonly TimeSpan DedupeTtl = TimeSpan.FromHours(24);
// Returns true if this notification should be delivered (first time)
// Returns false if already delivered (duplicate)
public async Task<bool> TryMarkDeliveredAsync(
string userId,
string eventId,
NotificationChannel channel)
{
// Key: dedup:{channel}:{userId}:{eventId}
var key = $"dedup:{channel}:{userId}:{eventId}";
// SET NX (only set if not exists) with expiry
var wasSet = await _redis.StringSetAsync(
key,
DateTime.UtcNow.ToString("O"),
DedupeTtl,
When.NotExists);
return wasSet; // true = first delivery, false = duplicate
}
}
public class NotificationDeliveryWorker : IConsumer<DeliveryJob>
{
private readonly DeduplicationService _dedupe;
private readonly DeliveryRouter _router;
private readonly PreferenceEngine _preferences;
public async Task Consume(ConsumeContext<DeliveryJob> context)
{
var job = context.Message;
// Check preferences first — no point deduplicating a suppressed notification
var prefs = await _preferences.GetAsync(job.UserId, context.CancellationToken);
var channels = await _router.GetChannelsForUserAsync(job.UserId, job.EventType, prefs);
foreach (var channel in channels)
{
// Idempotency check per channel
if (!await _dedupe.TryMarkDeliveredAsync(job.UserId, job.EventId, channel))
continue; // already delivered on this channel
await _router.DeliverAsync(job, channel, context.CancellationToken);
}
}
}
Design Decision 3: Multi-Channel Routing
Different events warrant different channels. A new follower gets an in-app notification. A security alert gets SMS. A weekly digest gets email. Routing is event-type driven.
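The delivery worker calls `GetChannelsForUserAsync`, which is not shown in this lesson. One plausible shape for that resolution step is sketched below: start from the channels an event type is allowed to use, then keep only those the user has not opted out of. Everything here is an assumption for illustration: the `Channel` enum stands in for `NotificationChannel`, the event-type names and routing table are made up, and the `isEnabled` delegate stands in for `UserNotificationPreferences.IsEnabled`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum Channel { InApp, Email, Sms, Push }

public static class ChannelResolver
{
    // Hypothetical routing table: event type -> channels the event may use.
    private static readonly Dictionary<string, Channel[]> Routes = new()
    {
        ["follower.new"]   = new[] { Channel.InApp },
        ["security.alert"] = new[] { Channel.Sms, Channel.Email, Channel.InApp },
        ["digest.weekly"]  = new[] { Channel.Email },
    };

    // Returns the allowed channels for the event, filtered by the user's preferences.
    public static IReadOnlyList<Channel> Resolve(
        string eventType,
        Func<string, Channel, bool> isEnabled)
    {
        if (!Routes.TryGetValue(eventType, out var allowed))
            return Array.Empty<Channel>(); // unknown event type: deliver nowhere

        return allowed.Where(channel => isEnabled(eventType, channel)).ToArray();
    }
}
```

In the real pipeline the table would more likely come from `NotificationTemplate.EnabledChannels` than from a hard-coded dictionary. The router listing that follows covers the send step only.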
public class DeliveryRouter
{
private readonly IInAppNotifier _inApp;
private readonly IEmailSender _email;
private readonly ISmsSender _sms;
private readonly IPushNotifier _push;
private readonly INotificationTemplateRepository _templates;
public async Task DeliverAsync(
DeliveryJob job,
NotificationChannel channel,
CancellationToken ct)
{
var template = await _templates.GetAsync(job.EventType, ct);
if (template is null) return;
var rendered = RenderTemplate(template, job.Payload);
await (channel switch
{
NotificationChannel.InApp => _inApp.SendAsync(job.UserId, rendered, ct),
NotificationChannel.Email => _email.SendAsync(job.UserId, rendered, ct),
NotificationChannel.Sms => _sms.SendAsync(job.UserId, rendered.ShortBody, ct),
NotificationChannel.Push => _push.SendAsync(job.UserId, rendered, ct),
_ => Task.CompletedTask,
});
}
private RenderedNotification RenderTemplate(
NotificationTemplate template,
Dictionary<string, string> payload)
{
var title = Regex.Replace(
template.TitleTemplate,
@"\{\{(\w+)\}\}",
m => payload.GetValueOrDefault(m.Groups[1].Value, m.Value));
var body = Regex.Replace(
template.BodyTemplate,
@"\{\{(\w+)\}\}",
m => payload.GetValueOrDefault(m.Groups[1].Value, m.Value));
return new RenderedNotification(title, body, ShortBody: body[..Math.Min(160, body.Length)]);
}
}
Design Decision 4: In-App Real-Time with SignalR at Scale
In-app notifications must arrive in real time. With 50K concurrent connections spread across multiple API instances, the instance that processes a delivery job is often not the one holding the recipient's connection, so a Redis backplane relays messages between instances.
// Hub
public interface INotificationClient
{
Task NotificationReceived(NotificationPayload notification);
Task NotificationCountUpdated(int unreadCount);
}
public class NotificationHub : Hub<INotificationClient>
{
private readonly IUnreadCountService _unreadCount;
public override async Task OnConnectedAsync()
{
// Each user joins their personal group so any server instance can reach them
await Groups.AddToGroupAsync(Context.ConnectionId, $"user:{Context.UserIdentifier}");
await base.OnConnectedAsync();
}
public async Task MarkRead(Guid notificationId)
{
var userId = Context.UserIdentifier!;
await _unreadCount.MarkReadAsync(userId, notificationId);
var count = await _unreadCount.GetUnreadCountAsync(userId);
await Clients.Caller.NotificationCountUpdated(count);
}
}
// In-app delivery via IHubContext; works across server instances via the Redis backplane
public class InAppNotifier : IInAppNotifier
{
private readonly IHubContext<NotificationHub, INotificationClient> _hub;
private readonly IOutboundNotificationRepository _repo;
private readonly IUnreadCountService _unreadCount;
public async Task SendAsync(string userId, RenderedNotification rendered, CancellationToken ct)
{
// Persist so user sees notification on next login even if offline
var notification = new OutboundNotification(userId, rendered);
await _repo.AddAsync(notification, ct);
var unreadCount = await _unreadCount.IncrementAsync(userId);
// Send to all connected devices — Redis backplane routes to the right server
await _hub.Clients
.Group($"user:{userId}")
.NotificationReceived(new NotificationPayload(notification.Id, rendered.Title, rendered.Body));
await _hub.Clients
.Group($"user:{userId}")
.NotificationCountUpdated(unreadCount);
}
}
// Program.cs: Redis backplane for multi-instance SignalR
builder.Services.AddSignalR()
.AddStackExchangeRedis(connectionString, options =>
{
options.Configuration.ChannelPrefix = RedisChannel.Literal("notifications");
});
builder.Services.AddAuthentication(JwtBearerDefaults.AuthenticationScheme)
.AddJwtBearer(options =>
{
options.Events = new JwtBearerEvents
{
// SignalR passes the token as a query string parameter
OnMessageReceived = ctx =>
{
var token = ctx.Request.Query["access_token"];
if (!string.IsNullOrEmpty(token) &&
ctx.Request.Path.StartsWithSegments("/hubs/notifications"))
{
ctx.Token = token;
}
return Task.CompletedTask;
}
};
});
Challenge 1: Rate Limiting Per Channel
SMS costs money. Email providers have hourly limits. Without rate limiting, a viral event sends 50K SMS messages in 30 seconds.
public class RateLimitedSmsSender : ISmsSender
{
private readonly ISmsSender _inner;
private readonly IDatabase _redis;
// Per-user: max 3 SMS per hour
private const int UserHourlyLimit = 3;
// Global: max 5,000 SMS per minute (provider limit)
private const int GlobalMinuteLimit = 5000;
public async Task SendAsync(string userId, string body, CancellationToken ct)
{
// Check per-user limit
var userKey = $"sms:user:{userId}:{DateTime.UtcNow:yyyyMMddHH}";
var userCount = await _redis.StringIncrementAsync(userKey);
if (userCount == 1) await _redis.KeyExpireAsync(userKey, TimeSpan.FromHours(2));
if (userCount > UserHourlyLimit)
{
// Log suppression — still track it for analytics
return;
}
// Check global rate
var globalKey = $"sms:global:{DateTime.UtcNow:yyyyMMddHHmm}";
var globalCount = await _redis.StringIncrementAsync(globalKey);
if (globalCount == 1) await _redis.KeyExpireAsync(globalKey, TimeSpan.FromMinutes(2));
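if (globalCount > GlobalMinuteLimit)
{
// Backoff: push to a delayed queue instead of dropping.
// Refund the per-user slot first, because the retry will count it again.
await _redis.StringDecrementAsync(userKey);
await EnqueueDelayedAsync(userId, body, delay: TimeSpan.FromMinutes(1));
return;
}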
await _inner.SendAsync(userId, body, ct);
}
}
Challenge 2: Notification Fatigue — Digest Batching
A user who gets 100 notifications in an hour will unsubscribe. The fix: batch low-priority notifications into hourly or daily digests.
public class DigestBatchingService
{
private readonly IDatabase _redis;
private readonly IEmailSender _email;
public async Task StageForDigestAsync(
string userId,
RenderedNotification notification,
DigestFrequency frequency)
{
var bucketKey = frequency switch
{
DigestFrequency.Hourly => $"digest:hourly:{userId}:{DateTime.UtcNow:yyyyMMddHH}",
DigestFrequency.Daily => $"digest:daily:{userId}:{DateTime.UtcNow:yyyyMMdd}",
_ => throw new ArgumentOutOfRangeException()
};
// Store notification in sorted set (score = timestamp for ordering)
await _redis.SortedSetAddAsync(
bucketKey,
JsonSerializer.Serialize(notification),
DateTimeOffset.UtcNow.ToUnixTimeSeconds());
// Expire after 2x the digest window to auto-clean
var ttl = frequency == DigestFrequency.Hourly
? TimeSpan.FromHours(2)
: TimeSpan.FromDays(2);
await _redis.KeyExpireAsync(bucketKey, ttl);
}
// Called by a scheduled worker (every hour for hourly digests)
public async Task FlushHourlyDigestsAsync(CancellationToken ct)
{
// Scan for all hourly digest buckets from the previous hour
var previousHour = DateTime.UtcNow.AddHours(-1).ToString("yyyyMMddHH");
var pattern = $"digest:hourly:*:{previousHour}";
await foreach (var key in ScanKeysAsync(pattern, ct))
{
var userId = ExtractUserId(key);
var items = await _redis.SortedSetRangeByRankAsync(key);
if (items.Length == 0) continue;
var notifications = items
.Select(i => JsonSerializer.Deserialize<RenderedNotification>(i!)!)
.ToList();
await _email.SendDigestAsync(userId, notifications, ct);
await _redis.KeyDeleteAsync(key);
}
}
}
Challenge 3: Offline Users — Push Notification Fallback
A user who hasn't opened the app in 3 days shouldn't receive 3 days of in-app SignalR pushes on reconnect. The fallback path: push notification for users offline longer than a threshold.
public class PresenceAwareNotifier
{
private readonly IDatabase _redis;
private readonly IInAppNotifier _inApp;
private readonly IPushNotifier _push;
private static readonly TimeSpan OfflineThreshold = TimeSpan.FromMinutes(5);
public async Task SendAsync(string userId, RenderedNotification rendered, CancellationToken ct)
{
var lastSeenKey = $"presence:{userId}";
var lastSeenValue = await _redis.StringGetAsync(lastSeenKey);
bool isOnline = lastSeenValue.HasValue &&
DateTimeOffset.Parse(lastSeenValue!) > DateTimeOffset.UtcNow - OfflineThreshold;
if (isOnline)
{
await _inApp.SendAsync(userId, rendered, ct);
}
else
{
// User is offline: persist in-app notification + send push
await _inApp.PersistAsync(userId, rendered, ct);
await _push.SendAsync(userId, rendered, ct);
}
}
}
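// Refresh presence on each authenticated HTTP request. Middleware runs once when a
// WebSocket connection is established, not per message, so long-lived SignalR
// connections also need a periodic refresh from the hub (for example a client heartbeat).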
public class PresenceMiddleware
{
private readonly RequestDelegate _next;
private readonly IDatabase _redis;
public async Task InvokeAsync(HttpContext context)
{
if (context.User.Identity?.IsAuthenticated == true)
{
var userId = context.User.FindFirst(ClaimTypes.NameIdentifier)?.Value;
if (userId is not null)
{
await _redis.StringSetAsync(
$"presence:{userId}",
DateTimeOffset.UtcNow.ToString("O"),
TimeSpan.FromMinutes(10));
}
}
await _next(context);
}
}
Challenge 4: Notification Preferences at Scale
Fetching preferences from the database on every notification delivery is a hot read. A fan-out to 50K recipients issues 50K preference queries in a burst, one per delivery job.
public class CachedPreferenceEngine : IPreferenceEngine
{
private readonly IPreferenceRepository _repo;
private readonly IDatabase _redis;
private static readonly TimeSpan CacheTtl = TimeSpan.FromMinutes(15);
public async Task<UserNotificationPreferences> GetAsync(string userId, CancellationToken ct)
{
var cacheKey = $"prefs:{userId}";
var cached = await _redis.StringGetAsync(cacheKey);
if (cached.HasValue)
return JsonSerializer.Deserialize<UserNotificationPreferences>(cached!)!;
var prefs = await _repo.GetAsync(userId, ct)
?? UserNotificationPreferences.Default(userId);
await _redis.StringSetAsync(
cacheKey,
JsonSerializer.Serialize(prefs),
CacheTtl);
return prefs;
}
public async Task UpdateAsync(string userId, UserNotificationPreferences prefs, CancellationToken ct)
{
await _repo.UpsertAsync(prefs, ct);
// Invalidate cache immediately — next read will repopulate
await _redis.KeyDeleteAsync($"prefs:{userId}");
}
}
Observability
The notification pipeline must be fully observable — silent failures (notification never sent, duplicate sent) are worse than visible errors.
// System.Diagnostics.Metrics instruments for every stage (exportable to Prometheus via OpenTelemetry)
public class NotificationMetrics
{
private readonly Counter<long> _enqueued;
private readonly Counter<long> _delivered;
private readonly Counter<long> _deduplicated;
private readonly Counter<long> _suppressed;
private readonly Counter<long> _failed;
private readonly Histogram<double> _deliveryLatency;
public NotificationMetrics(IMeterFactory factory)
{
var meter = factory.Create("Notifications");
_enqueued = meter.CreateCounter<long>(
"notifications.enqueued",
description: "Total notifications entering the fan-out queue");
_delivered = meter.CreateCounter<long>(
"notifications.delivered",
description: "Notifications successfully delivered per channel");
_deduplicated = meter.CreateCounter<long>(
"notifications.deduplicated",
description: "Notifications suppressed as duplicates");
_suppressed = meter.CreateCounter<long>(
"notifications.suppressed",
description: "Notifications suppressed by user preference or rate limit");
_failed = meter.CreateCounter<long>(
"notifications.failed",
description: "Delivery failures per channel");
_deliveryLatency = meter.CreateHistogram<double>(
"notifications.delivery_latency_ms",
description: "Time from event receipt to delivery in ms");
}
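// Tags are spelled out as KeyValuePair<string, object?>; enum values are recorded as strings.
public void RecordEnqueued(string eventType) =>
_enqueued.Add(1, new KeyValuePair<string, object?>("event_type", eventType));
public void RecordDelivered(NotificationChannel channel, string eventType) =>
_delivered.Add(1,
new KeyValuePair<string, object?>("channel", channel.ToString()),
new KeyValuePair<string, object?>("event_type", eventType));
public void RecordLatency(double ms, NotificationChannel channel) =>
_deliveryLatency.Record(ms, new KeyValuePair<string, object?>("channel", channel.ToString()));
}
What We'd Do Differently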
Use a dedicated notification platform for SMS/Push. Services like Knock, Courier, or Novu handle template management, preference UI, analytics, and multi-provider fallback. Building all this in-house is months of work that doesn't differentiate your product.
Separate read path from write path. The notification inbox (unread count, notification list) is read-heavy. A dedicated read model (Redis sorted set per user, pre-serialised) serves the inbox without touching the main database on every page load.
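As a rough illustration of that read model, the sketch below keeps a capped, score-ordered inbox per user in memory. It is only a stand-in for the Redis sorted set: `InboxReadModel` and its members are hypothetical names, and the comments note which Redis command each step would correspond to.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class InboxReadModel
{
    private readonly int _maxEntries;

    // userId -> entries ordered oldest to newest (stand-in for one sorted set per user).
    private readonly Dictionary<string, List<(long Score, string Json)>> _inboxes = new();

    public InboxReadModel(int maxEntries = 100) => _maxEntries = maxEntries;

    // Write path: called once per persisted in-app notification.
    public void Append(string userId, string preSerialisedJson, DateTimeOffset createdAt)
    {
        if (!_inboxes.TryGetValue(userId, out var inbox))
        {
            inbox = new List<(long Score, string Json)>();
            _inboxes[userId] = inbox;
        }

        // ZADD equivalent: score is the creation time in milliseconds.
        inbox.Add((createdAt.ToUnixTimeMilliseconds(), preSerialisedJson));
        inbox.Sort((a, b) => a.Score.CompareTo(b.Score));

        // ZREMRANGEBYRANK equivalent: drop the oldest entries beyond the cap.
        if (inbox.Count > _maxEntries)
            inbox.RemoveRange(0, inbox.Count - _maxEntries);
    }

    // Read path: newest first, already serialised, no database access.
    public IReadOnlyList<string> Latest(string userId, int count)
    {
        if (!_inboxes.TryGetValue(userId, out var inbox))
            return Array.Empty<string>();

        return inbox.AsEnumerable().Reverse().Take(count).Select(e => e.Json).ToList();
    }
}
```

Capping the inbox keeps memory per user bounded; older notifications would still be served from the main database when a user pages back far enough.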
Per-tenant rate limits in multi-tenant systems. One tenant triggering a flood should not exhaust the SMS budget for all tenants. Namespace rate limit keys by tenant, not just by user.
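A minimal sketch of tenant-namespaced limiting, assuming a fixed one-minute window like the global SMS limit earlier in the lesson. The `sms:tenant:{tenantId}:{minute}` key layout, the `TenantSmsLimiter` class, and the in-memory dictionary (standing in for Redis counters) are all assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public sealed class TenantSmsLimiter
{
    // key -> count for that tenant and minute; a stand-in for Redis INCR counters.
    private readonly Dictionary<string, int> _counters = new();
    private readonly int _tenantMinuteLimit;

    public TenantSmsLimiter(int tenantMinuteLimit) => _tenantMinuteLimit = tenantMinuteLimit;

    // The tenant id is part of the key, so tenants never share a counter.
    public static string TenantKey(string tenantId, DateTime utcNow) =>
        $"sms:tenant:{tenantId}:{utcNow.ToString("yyyyMMddHHmm", CultureInfo.InvariantCulture)}";

    // Returns true and consumes one slot if the tenant is under its limit for this minute.
    public bool TryAcquire(string tenantId, DateTime utcNow)
    {
        var key = TenantKey(tenantId, utcNow);
        _counters.TryGetValue(key, out var count);
        if (count >= _tenantMinuteLimit) return false;
        _counters[key] = count + 1;
        return true;
    }
}
```

With this layout a flood from one tenant exhausts only that tenant's counter. A global provider limit would still be checked separately, since the provider does not know about tenants.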
Time-zone aware digest scheduling. Hourly digests at 3am are useless. Schedule digests relative to each user's local time — store the user's timezone and use TimeZoneInfo.ConvertTimeFromUtc to find the right flush window.
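One way to express that check, as a sketch: convert the current UTC time into the user's zone and compare the local hour with the hour the user prefers. `DigestSchedule.IsDue` and the `preferredLocalHour` parameter are hypothetical; `TimeZoneInfo.ConvertTimeFromUtc` is the framework call named above.

```csharp
using System;

public static class DigestSchedule
{
    // True when the user's local wall-clock hour equals their preferred digest hour.
    public static bool IsDue(DateTime utcNow, TimeZoneInfo userZone, int preferredLocalHour)
    {
        if (utcNow.Kind != DateTimeKind.Utc)
            throw new ArgumentException("Expected a UTC timestamp.", nameof(utcNow));
        if (preferredLocalHour is < 0 or > 23)
            throw new ArgumentOutOfRangeException(nameof(preferredLocalHour));

        var local = TimeZoneInfo.ConvertTimeFromUtc(utcNow, userZone);
        return local.Hour == preferredLocalHour;
    }
}
```

An hourly worker could call `IsDue` for each user with staged items and flush only the matching ones. Daylight saving transitions need extra care with this approach: a local hour that is skipped never matches on that day, and a repeated hour matches twice.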