Fleet Management System Design: 8 Real-World Case Studies with Full Solutions
Complete senior engineer case study guide for fleet management platforms — real-time tracking, route optimisation, driver compliance, fuel anomaly detection, maintenance scheduling, multi-tenancy, offline mobile, and live ETA. Full architecture, C#/.NET and React code, Azure integration, and interview answers.
Why Fleet Management Is a Hard Engineering Problem
Fleet management sounds straightforward on the surface — track vehicles, schedule maintenance, report compliance. But every requirement hides a distributed systems problem underneath:
- 500 vehicles sending GPS pings every 5 seconds is 100 writes per second, roughly 8.6 million per day — plain Postgres rows are the wrong tool
- Driver hours compliance requires a state machine, not a counter
- Firmware updates to 5,000 devices worldwide require staged rollouts with rollback
- Multi-tenant isolation for 50 fleet companies means one data leak is a business-ending event
This guide walks through eight case studies that senior engineers are given in technical interviews at fleet and IoT platform companies. Each one covers the problem, the wrong approaches, the correct architecture, and production-ready code.
Case Study 1: Real-Time Vehicle Tracking Dashboard
The Problem
A depot operates 500 trucks. Each truck sends its GPS coordinates every 5 seconds via a telematics unit. Build a dashboard that shows all 500 vehicles on a live map — their position, speed, and status — updating in real time without page refresh.
Scale: 500 vehicles × 12 pings/minute = 6,000 events/minute = 100 events/second at steady state.
Wrong Approaches
Polling every second from the frontend. At 500 concurrent browser sessions polling every second, you get 500 HTTP requests per second just for position data. This ignores the network overhead and server fan-out problem entirely.
Writing every ping as a Postgres row. At 100 events/second that is 8.6 million rows per day for the fleet. Query performance degrades, and you are paying storage costs for data you will aggregate anyway.
Re-rendering all 500 map pins on every update. React will re-render the entire component tree if state changes at the top level. 500 marker updates per second will freeze the browser.
Correct Architecture
Telematics Device
│
▼
Azure IoT Hub ← ingestion, device auth, routing
│
▼
Azure Event Hub ← fan-out to multiple consumers
│
┌────┴────────┐
▼ ▼
TimescaleDB Redis ← time-series history + live state cache
│
▼
.NET API + SignalR ← push to connected browsers
│
▼
React Dashboard ← virtualised map, grouped updates
Why TimescaleDB instead of plain Postgres?
TimescaleDB is a Postgres extension that partitions tables automatically by time. Queries like "give me all positions for vehicle X in the last 6 hours" run against a single partition instead of scanning millions of rows. It also compresses old data automatically.
Why Redis for live state?
The dashboard only needs the current position of each vehicle, not the history. Redis holds a hash per vehicle — key format vehicle:[id]:state — with position, speed, status. The .NET API reads from Redis (sub-millisecond) rather than querying TimescaleDB on every SignalR push.
Why SignalR instead of polling? SignalR maintains a persistent connection per browser. The server pushes updates only when state changes. At 100 events/second across 500 vehicles, most vehicles do not move significantly between pings — delta filtering means the frontend receives roughly 20-30 meaningful updates per second across the whole fleet, not 100.
Backend: Event Processor (.NET)
public class VehiclePositionProcessor : IEventProcessor
{
private readonly IDatabase _redis;
private readonly IHubContext<FleetHub> _hub;
private readonly TimescaleRepository _timescale;
public async Task ProcessAsync(VehiclePositionEvent evt)
{
var key = $"vehicle:{evt.VehicleId}:state";
var previous = await _redis.HashGetAllAsync(key);
// Only push to clients if position changed meaningfully (>10 metres)
if (HasMovedSignificantly(previous, evt))
{
var state = new VehicleState
{
VehicleId = evt.VehicleId,
Lat = evt.Lat,
Lng = evt.Lng,
SpeedKmh = evt.SpeedKmh,
Status = evt.Status,
Timestamp = evt.Timestamp
};
await _redis.HashSetAsync(key, state.ToHashEntries());
await _hub.Clients.Group($"fleet:{evt.FleetId}")
.SendAsync("VehicleUpdated", state);
}
// Always persist to time-series — history is separate concern
await _timescale.InsertPositionAsync(evt);
}
private static bool HasMovedSignificantly(
HashEntry[] previous, VehiclePositionEvent current)
{
if (previous.Length == 0) return true;
var prevLat = (double)previous.First(e => e.Name == "lat").Value;
var prevLng = (double)previous.First(e => e.Name == "lng").Value;
return HaversineDistance(prevLat, prevLng, current.Lat, current.Lng) > 0.01; // distance in km; 0.01 km ≈ 10 m
}
}
Frontend: React with Virtualised Map
// Group updates by vehicleId — batched every 500ms to avoid thrashing
function useFleetPositions(fleetId: string) {
const [positions, setPositions] = useState<Map<string, VehicleState>>(
new Map()
);
useEffect(() => {
const connection = new HubConnectionBuilder()
.withUrl("/hubs/fleet")
.withAutomaticReconnect()
.build();
const buffer = new Map<string, VehicleState>();
connection.on("VehicleUpdated", (state: VehicleState) => {
buffer.set(state.vehicleId, state);
});
// Flush buffer to state every 500ms — smooth UI, not 100 renders/sec
const flush = setInterval(() => {
if (buffer.size > 0) {
setPositions((prev) => new Map([...prev, ...buffer]));
buffer.clear();
}
}, 500);
// NB: the client must also join the server-side group "fleet:{fleetId}" that the hub pushes to
connection.start().catch(console.error);
return () => {
clearInterval(flush);
connection.stop();
};
}, [fleetId]);
return positions;
}
Key insight for the interview: The 500ms buffer is not a workaround — it is a deliberate UX decision. A map pin jumping 10 times per second is worse for the operator than one smooth update per second. Batching is the right default.
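The processor above calls a HaversineDistance helper that is not shown. A minimal sketch of the formula (here in TypeScript; the name haversineKm is illustrative) returning kilometres, so the 0.01 threshold corresponds to roughly 10 metres:

```typescript
// Haversine great-circle distance between two lat/lng points, in kilometres.
function haversineKm(lat1: number, lng1: number, lat2: number, lng2: number): number {
  const R = 6371; // mean Earth radius, km
  const toRad = (deg: number) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLng = toRad(lng2 - lng1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLng / 2) ** 2;
  return 2 * R * Math.asin(Math.sqrt(a));
}
```

Haversine is accurate to well under 1% for distances at this scale, which is more than enough for a 10-metre movement filter.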
Case Study 2: Driver Hours Compliance (Tachograph Rules)
The Problem
EU regulations require drivers to take a 45-minute break after 4.5 hours of continuous driving. A driver working shift cannot exceed 9 hours driving per day (extendable to 10 hours twice per week). Build a system that tracks this in real time and alerts dispatchers before a driver violates a rule.
Why This Requires a State Machine
The naive approach is to sum driving time from GPS speed data. This fails immediately:
- GPS speed is unreliable at low speeds and in tunnels
- A truck stopped at a red light for 3 minutes is not "taking a break"
- The regulations distinguish between driving, other work, availability, and rest — four distinct states, each with its own rules
The correct model is a finite state machine with explicit transitions:
States: DRIVING → REST → DRIVING (break resets counter)
DRIVING → OTHER_WORK
OTHER_WORK → AVAILABILITY
AVAILABILITY → REST
* → OFF_DUTY (end of shift)
Transitions triggered by:
- Speed > 5 km/h for 30 seconds → enter DRIVING
- Speed = 0 for 3+ minutes → enter REST candidate (not confirmed yet)
- Speed = 0 for 45+ minutes → confirmed REST (break counts)
- Manual input from driver terminal → OTHER_WORK, AVAILABILITY
Data Model
CREATE TABLE driver_duty_events (
id BIGSERIAL PRIMARY KEY,
driver_id UUID NOT NULL,
vehicle_id UUID NOT NULL,
state TEXT NOT NULL CHECK (state IN (
'driving','rest','other_work','availability','off_duty')),
started_at TIMESTAMPTZ NOT NULL,
ended_at TIMESTAMPTZ -- NULL = current state
-- Note: duration is derived at query time. It cannot be a GENERATED
-- column because Postgres requires generation expressions to be
-- immutable, and NOW() is not.
);
-- Current driving window: sum of DRIVING since last qualifying REST
CREATE VIEW driver_current_window AS
SELECT
driver_id,
SUM(EXTRACT(EPOCH FROM (COALESCE(ended_at, NOW()) - started_at)))
FILTER (WHERE state = 'driving') AS driving_seconds,
MAX(started_at) FILTER (WHERE state = 'driving') AS last_drive_start
FROM driver_duty_events
WHERE started_at > (
SELECT COALESCE(MAX(e2.ended_at), NOW() - INTERVAL '24 hours')
FROM driver_duty_events e2
WHERE e2.driver_id = driver_duty_events.driver_id
AND e2.state = 'rest'
AND EXTRACT(EPOCH FROM (e2.ended_at - e2.started_at)) >= 2700 -- 45 minutes qualifies as break
)
GROUP BY driver_id;
Compliance Check Service (.NET)
public class DriverComplianceService
{
private const int MaxContinuousDrivingSeconds = 4 * 3600 + 30 * 60; // 4.5h
private const int DailyMaxDrivingSeconds = 9 * 3600;
private const int AlertThresholdPercent = 80;
public async Task<ComplianceStatus> GetStatusAsync(Guid driverId)
{
var window = await _repo.GetCurrentWindowAsync(driverId);
var dailyTotal = await _repo.GetDailyDrivingTotalAsync(driverId);
var continuousPercent =
(double)window.DrivingSeconds / MaxContinuousDrivingSeconds * 100;
var dailyPercent =
(double)dailyTotal / DailyMaxDrivingSeconds * 100;
return new ComplianceStatus
{
DriverId = driverId,
ContinuousDrivingSeconds = window.DrivingSeconds,
DailyDrivingSeconds = dailyTotal,
ContinuousPercent = continuousPercent,
DailyPercent = dailyPercent,
Alert = continuousPercent >= AlertThresholdPercent
|| dailyPercent >= AlertThresholdPercent,
Severity = GetSeverity(continuousPercent, dailyPercent)
};
}
private static AlertSeverity GetSeverity(double continuous, double daily)
{
var max = Math.Max(continuous, daily);
return max >= 100 ? AlertSeverity.Violation
: max >= 90 ? AlertSeverity.Critical
: max >= 80 ? AlertSeverity.Warning
: AlertSeverity.None;
}
}
Interview point: Never store "remaining drive time" as a column. It is a derived value that changes every second while the driver is moving. Store events, derive metrics. Stored derived values go stale.
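To make "store events, derive metrics" concrete, here is a sketch of deriving the remaining continuous-driving allowance from duty events rather than reading a stored column (TypeScript for brevity; the interface and function names are illustrative, with timestamps in milliseconds):

```typescript
interface DutyEvent {
  state: "driving" | "rest" | "other_work" | "availability" | "off_duty";
  startedAt: number;          // epoch ms
  endedAt: number | null;     // null = current state
}

// Remaining continuous-driving seconds, derived fresh from the event log.
function remainingContinuousSeconds(events: DutyEvent[], nowMs: number): number {
  const MAX = 4.5 * 3600;            // 4.5 h continuous driving limit
  const QUALIFYING_BREAK = 45 * 60;  // 45 min rest resets the window

  // The window starts at the end of the last qualifying rest (or shift start).
  let windowStart = 0;
  for (const e of events) {
    const durationSec = ((e.endedAt ?? nowMs) - e.startedAt) / 1000;
    if (e.state === "rest" && e.endedAt !== null && durationSec >= QUALIFYING_BREAK)
      windowStart = Math.max(windowStart, e.endedAt);
  }

  // Sum driving time inside the window, including the ongoing stint.
  const drivenSec = events
    .filter((e) => e.state === "driving" && (e.endedAt ?? nowMs) > windowStart)
    .reduce((sum, e) =>
      sum + ((e.endedAt ?? nowMs) - Math.max(e.startedAt, windowStart)) / 1000, 0);

  return Math.max(0, MAX - drivenSec);
}
```

Because nothing is stored, the value is always correct at read time, at the cost of a small computation per query.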
Case Study 3: Staged Firmware Rollout to 5,000 Devices
The Problem
A new firmware version fixes a critical bug in device telemetry reporting. It needs to be deployed to 5,000 devices globally. Some devices are currently in active use — interrupting them is not acceptable. The rollout must be pausable if problems emerge.
Wrong Approach
Pushing the update to all devices simultaneously. If the firmware has an issue (it will), you brick 5,000 devices at once. Recovery requires physical access to every device.
Correct Architecture: Ring-Based Rollout
Ring 0 (Canary): 1% → 50 devices — internal/test devices
Ring 1: 10% → 500 devices — low-risk locations
Ring 2: 40% → 2,000 devices
Ring 3: 100% → remaining
Each ring:
1. Schedule update window (off-hours, device must be idle)
2. Push update notification to devices in ring
3. Device downloads firmware, validates checksum
4. Device waits for idle state, applies update, reboots
5. Reports success/failure back to platform
6. Health check: if failure rate > 2%, PAUSE rollout automatically
Device State Machine
public enum DeviceUpdateState
{
Idle,
UpdateQueued,
Downloading,
DownloadComplete,
WaitingForIdle,
Applying,
Rebooting,
Updated,
Failed,
RolledBack
}
// Assumes "using static DeviceUpdateState;" so members can be referenced without the enum prefix
public class DeviceUpdateStateMachine
{
private static readonly Dictionary<DeviceUpdateState,
HashSet<DeviceUpdateState>> ValidTransitions = new()
{
[Idle] = new() { UpdateQueued },
[UpdateQueued] = new() { Downloading, Idle }, // Idle = cancelled
[Downloading] = new() { DownloadComplete, Failed },
[DownloadComplete] = new() { WaitingForIdle, Failed },
[WaitingForIdle] = new() { Applying },
[Applying] = new() { Rebooting, Failed },
[Rebooting] = new() { Updated, Failed },
[Failed] = new() { RolledBack, Idle },
[RolledBack] = new() { Idle },
[Updated] = new() { Idle }
};
public DeviceUpdateState Transition(
DeviceUpdateState current, DeviceUpdateState next)
{
if (!ValidTransitions[current].Contains(next))
throw new InvalidStateTransitionException(current, next);
return next;
}
}
Rollout Controller (.NET)
public class FirmwareRolloutService
{
public async Task AdvanceRingAsync(Guid rolloutId)
{
var rollout = await _repo.GetRolloutAsync(rolloutId);
var currentRing = rollout.CurrentRing;
// Check health of current ring before advancing
var stats = await _repo.GetRingStatsAsync(rolloutId, currentRing);
var failureRate = stats.Total == 0 ? 0 : (double)stats.Failed / stats.Total;
if (failureRate > 0.02) // 2% threshold
{
await _repo.PauseRolloutAsync(rolloutId,
$"Ring {currentRing} failure rate {failureRate:P0} exceeds threshold");
await _alertService.NotifyAsync(rollout.OwnerId,
AlertType.RolloutPaused, rolloutId);
return;
}
// Queue next ring
var nextDevices = await _repo.GetDevicesForRingAsync(
rolloutId, currentRing + 1);
foreach (var device in nextDevices)
{
await _updateQueue.EnqueueAsync(new DeviceUpdateJob
{
DeviceId = device.Id,
FirmwareVersion = rollout.TargetVersion,
ScheduledWindow = rollout.UpdateWindowUtc
});
}
await _repo.AdvanceRingAsync(rolloutId);
}
}
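The pause/advance gate can be isolated from the persistence and queueing code. A sketch of the decision alone, using the 2% threshold from the text (function and type names are illustrative):

```typescript
type RingStats = { total: number; failed: number };

// Decide whether a ring is healthy enough to advance the rollout.
// "wait" means no devices have reported yet — too early to judge.
function ringDecision(
  stats: RingStats,
  maxFailureRate = 0.02
): "advance" | "pause" | "wait" {
  if (stats.total === 0) return "wait";
  const failureRate = stats.failed / stats.total;
  return failureRate > maxFailureRate ? "pause" : "advance";
}
```

Keeping the decision pure makes the threshold trivially unit-testable, independent of the repository and alerting plumbing.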
}Case Study 4: Fuel Consumption Anomaly Detection
The Problem
A truck's historical average is 32 L/100 km. Today it reported 58 L/100 km over a 200 km run. Determine whether this is sensor noise, route variance, fuel theft, or mechanical fault — and alert with appropriate confidence.
Why Fleet-Wide Averages Are Wrong
Different truck models, different load weights, different terrain, different drivers all produce different baselines. A 58 L/100 km reading for a fully loaded 44-tonne articulated lorry climbing alpine roads is normal. The same reading for an empty van on a motorway is an emergency.
Baseline must be per vehicle, per route profile — not fleet-wide.
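One way to maintain a per-vehicle, per-terrain baseline incrementally is Welford's online algorithm, which tracks mean and variance without storing raw history. A sketch (the class and key format are illustrative, not the platform's actual API):

```typescript
// Welford's online mean/variance, keyed per (vehicleId, terrain).
class BaselineStore {
  private stats = new Map<string, { n: number; mean: number; m2: number }>();

  add(vehicleId: string, terrain: string, value: number): void {
    const key = `${vehicleId}:${terrain}`;
    const s = this.stats.get(key) ?? { n: 0, mean: 0, m2: 0 };
    s.n += 1;
    const delta = value - s.mean;
    s.mean += delta / s.n;
    s.m2 += delta * (value - s.mean); // running sum of squared deviations
    this.stats.set(key, s);
  }

  // Z-score against this entity's own baseline; null if too few samples.
  zScore(vehicleId: string, terrain: string, value: number): number | null {
    const s = this.stats.get(`${vehicleId}:${terrain}`);
    if (!s || s.n < 10) return null; // mirror the sample-count guard below
    const stdDev = Math.sqrt(s.m2 / (s.n - 1));
    return stdDev === 0 ? 0 : (value - s.mean) / stdDev;
  }
}
```

Each reading updates the baseline in O(1), so the detector never needs a full-table scan to know what "normal" looks like for a given truck on a given terrain.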
Anomaly Detection Model
public class FuelAnomalyDetector
{
// Z-score: how many standard deviations from vehicle's own baseline
public AnomalyResult Evaluate(
Guid vehicleId,
double reportedConsumption,
RouteProfile route)
{
var baseline = _stats.GetBaseline(vehicleId, route.TerrainCategory);
if (baseline.SampleCount < 10)
return AnomalyResult.InsufficientData(vehicleId);
var zScore = (reportedConsumption - baseline.Mean) / baseline.StdDev;
// Correlate with route factors before concluding
var adjustedConsumption = ApplyRouteCorrection(
reportedConsumption, route);
var adjustedZScore =
(adjustedConsumption - baseline.Mean) / baseline.StdDev;
return new AnomalyResult
{
VehicleId = vehicleId,
ReportedConsumption = reportedConsumption,
BaselineMean = baseline.Mean,
ZScore = adjustedZScore,
Severity = ClassifySeverity(adjustedZScore),
PossibleCauses = InferCauses(adjustedZScore, route, reportedConsumption)
};
}
private static AnomalySeverity ClassifySeverity(double zScore) =>
Math.Abs(zScore) switch
{
> 3.0 => AnomalySeverity.Critical, // Less than 0.3% probability if normal
> 2.0 => AnomalySeverity.High,
> 1.5 => AnomalySeverity.Medium,
_ => AnomalySeverity.None
};
private static IEnumerable<string> InferCauses(
double zScore, RouteProfile route, double consumption)
{
var causes = new List<string>();
if (zScore > 3.0 && route.AverageGradientPercent < 2)
causes.Add("Possible fuel theft — high consumption on flat route");
if (zScore > 2.0 && route.AverageLoadKg < 5000)
causes.Add("Possible mechanical fault — high consumption on light load");
if (zScore > 1.5 && route.AverageTemperatureCelsius < -10)
causes.Add("Cold weather variance — within expected range for conditions");
return causes;
}
private static double ApplyRouteCorrection(
double consumption, RouteProfile route)
{
// Each 1% average gradient adds approximately 8% to consumption
var gradientFactor = 1 + (route.AverageGradientPercent * 0.08);
// Each 1000 kg load adds approximately 2% to consumption
var loadFactor = 1 + (route.AverageLoadKg / 1000 * 0.02);
return consumption / (gradientFactor * loadFactor);
}
}
Interview point: Present anomalies as a confidence range, not a binary flag. "58 L/100 km — 3.2 standard deviations above this vehicle's baseline on comparable routes. Possible causes: fuel theft (high confidence), mechanical fault (medium confidence)" is far more actionable than "ALERT: high fuel consumption."
Case Study 5: Maintenance Scheduling with Fleet Constraints
The Problem
Vehicles need service every 10,000 km or 6 months, whichever comes first. A fleet of 300 vehicles cannot have more than 15% (45 vehicles) in service simultaneously — operations would break down. Build a scheduler that plans upcoming maintenance, respects the capacity constraint, and notifies managers before due dates are breached.
Data Model
CREATE TABLE vehicles (
id UUID PRIMARY KEY,
fleet_id UUID NOT NULL,
registration TEXT NOT NULL,
current_odometer_km INT NOT NULL,
last_service_at TIMESTAMPTZ,
last_service_km INT
);
-- Due date is derived — never stored, always calculated
CREATE VIEW vehicle_maintenance_due AS
SELECT
v.id,
v.fleet_id,
v.registration,
LEAST(
v.last_service_at + INTERVAL '6 months',
NOW() + (
(v.last_service_km + 10000 - v.current_odometer_km)
-- Estimate days based on average daily km
/ NULLIF(avg_daily.km_per_day, 0)
* INTERVAL '1 day'
)
) AS due_at,
v.current_odometer_km - v.last_service_km AS km_since_service
FROM vehicles v
LEFT JOIN LATERAL (
SELECT AVG(daily_km) AS km_per_day
FROM vehicle_daily_usage
WHERE vehicle_id = v.id
AND recorded_date > NOW() - INTERVAL '30 days'
) avg_daily ON true;
Scheduler Service (.NET)
public class MaintenanceScheduler
{
private const double MaxFleetInServicePercent = 0.15;
public async Task<SchedulePlan> GeneratePlanAsync(
Guid fleetId, DateRange planningHorizon)
{
var vehicles = await _repo.GetVehiclesDueInRangeAsync(
fleetId, planningHorizon);
var fleetSize = await _repo.GetFleetSizeAsync(fleetId);
var maxPerDay = (int)Math.Floor(fleetSize * MaxFleetInServicePercent);
// Sort by urgency: most overdue / closest to due first
var sorted = vehicles
.OrderBy(v => v.DueAt)
.ThenByDescending(v => v.KmSinceService)
.ToList();
var plan = new SchedulePlan();
var bookings = new Dictionary<DateOnly, int>(); // date → count booked
foreach (var vehicle in sorted)
{
// Find earliest available slot within capacity
var slot = FindEarliestSlot(
vehicle.DueAt, bookings, maxPerDay, planningHorizon);
if (slot is null)
{
plan.Unschedulable.Add(vehicle);
continue;
}
bookings[slot.Value] = bookings.GetValueOrDefault(slot.Value) + 1;
plan.Scheduled.Add(new ScheduledMaintenance
{
VehicleId = vehicle.Id,
ScheduledDate = slot.Value,
DaysUntilDue = (vehicle.DueAt.Date - DateTime.UtcNow.Date).Days,
IsOverdue = vehicle.DueAt < DateTime.UtcNow
});
}
return plan;
}
private static DateOnly? FindEarliestSlot(
DateTime dueAt,
Dictionary<DateOnly, int> bookings,
int maxPerDay,
DateRange horizon)
{
// Work backwards from due date — schedule as late as safe
var target = DateOnly.FromDateTime(dueAt);
for (var d = target; d >= DateOnly.FromDateTime(horizon.Start); d = d.AddDays(-1))
{
if (bookings.GetValueOrDefault(d) < maxPerDay)
return d;
}
// No slot before due — try after (overdue case)
for (var d = target.AddDays(1);
d <= DateOnly.FromDateTime(horizon.End);
d = d.AddDays(1))
{
if (bookings.GetValueOrDefault(d) < maxPerDay)
return d;
}
return null;
}
}
Case Study 6: Multi-Tenant Access Control
The Problem
The platform serves 50 fleet companies. Company A's dispatchers must never see Company B's vehicles or data. But a logistics partner needs read-only access to specific vehicles across both companies. A national manager needs to see all sites in Norway but not in Sweden.
Why Simple JWT Claims Are Not Enough
A common mistake is to put tenant_id in the JWT and filter by it at the application layer. This works until:
- A developer forgets the filter in one query
- A bug in the auth middleware passes the wrong claim
- A token is replayed or forged
The correct pattern enforces isolation at the database layer, not just the application layer.
Row-Level Security in PostgreSQL
-- Enable RLS on all tenant-scoped tables
ALTER TABLE vehicles ENABLE ROW LEVEL SECURITY;
-- Policy: users see only their own tenant's rows
CREATE POLICY tenant_isolation ON vehicles
USING (fleet_id = current_setting('app.tenant_id')::uuid);
-- Service sets tenant context at connection time
-- Never trust client to pass tenant_id in query parameters
Hierarchical Permission Model
// Permission hierarchy: Organisation → Region → Site → Vehicle
public class PermissionScope
{
public Guid? OrganisationId { get; init; }
public Guid? RegionId { get; init; }
public Guid? SiteId { get; init; }
public Guid? VehicleId { get; init; }
public AccessLevel Level { get; init; }
}
public enum AccessLevel { Read, Write, Admin }
// Cross-tenant partner access: explicit vehicle list
public class CrossTenantGrant
{
public Guid GranteeOrganisationId { get; init; }
public Guid[] AllowedVehicleIds { get; init; }
public AccessLevel Level { get; init; }
public DateTime ExpiresAt { get; init; }
}
public class VehicleAuthorizationService
{
public async Task<bool> CanAccessVehicleAsync(
Guid userId, Guid vehicleId, AccessLevel required)
{
// Check own-tenant hierarchy first
var scope = await _repo.GetUserScopeAsync(userId);
if (await IsInScope(scope, vehicleId)) return true;
// Check cross-tenant grants (partner access)
var grants = await _repo.GetCrossTenantGrantsAsync(userId);
return grants.Any(g =>
g.AllowedVehicleIds.Contains(vehicleId) &&
g.Level >= required &&
g.ExpiresAt > DateTime.UtcNow);
}
}
Interview point: Row-level security at the database level means even a buggy query cannot leak cross-tenant data. Defence in depth — app layer AND database layer both enforce isolation independently.
Case Study 7: Offline-Capable Mobile App for Field Technicians
The Problem
A field technician visits a facility to service devices. The building has no WiFi in the equipment rooms and poor mobile signal in the basement. The technician must be able to log maintenance activities, capture photos, scan barcodes, and update device status — all while offline. Data must sync correctly when connectivity returns.
Architecture: Offline-First
Device (React Native)
├── SQLite (local queue) ← all writes go here first
├── Background sync service ← monitors connectivity
└── Conflict resolver ← handles merge on reconnect
API Server
├── REST endpoints ← accept synced events
└── Append-only event log ← source of truth
Local Queue Design
// Every action is an event, not a mutation
interface LocalEvent {
id: string; // client-generated UUID
type: string; // "DEVICE_SERVICED" | "PHOTO_CAPTURED" etc
payload: object;
createdAt: string; // ISO timestamp from device clock
syncedAt: string | null;
retryCount: number;
}
class OfflineQueue {
async enqueue(type: string, payload: object): Promise<void> {
const event: LocalEvent = {
id: crypto.randomUUID(),
type,
payload,
createdAt: new Date().toISOString(),
syncedAt: null,
retryCount: 0,
};
await db.insert("local_events", event);
// Optimistic UI: update local state immediately (applyLocally not shown —
// it applies the event to the in-memory view model)
this.applyLocally(event);
}
async syncPending(): Promise<void> {
const pending = await db.query<LocalEvent>(
"SELECT * FROM local_events WHERE syncedAt IS NULL ORDER BY createdAt"
);
for (const event of pending) {
try {
await api.post("/events", event);
await db.update("local_events",
{ syncedAt: new Date().toISOString() },
{ id: event.id }
);
} catch {
await db.update("local_events",
{ retryCount: event.retryCount + 1 },
{ id: event.id }
);
// Exponential backoff — don't hammer a struggling server
await sleep(Math.min(1000 * 2 ** event.retryCount, 30000));
}
}
}
}
Conflict Resolution Strategy
// Server-side: events are append-only — conflicts cannot occur on the log itself
// Conflict resolution only needed when deriving current state from events
public class DeviceStateProjector
{
public DeviceState Project(IEnumerable<DeviceEvent> events)
{
// Last writer wins for most fields
// Exception: status transitions must follow the state machine
var state = new DeviceState();
foreach (var evt in events.OrderBy(e => e.CreatedAt))
{
switch (evt.Type)
{
case "DEVICE_SERVICED":
// Only accept if it follows a valid state transition
if (_stateMachine.IsValidTransition(state.Status, "serviced"))
state.LastServicedAt = evt.CreatedAt;
break;
case "PHOTO_CAPTURED":
// Photos are additive — no conflict possible
state.Photos.Add(evt.Payload.Get<string>("blobUrl"));
break;
case "STATUS_UPDATED":
// Last writer wins — accept most recent timestamp
state.Status = evt.Payload.Get<string>("status");
break;
}
}
return state;
}
}
Interview point: The append-only event log is the key insight. If two technicians both log maintenance on the same device while offline, both events are valid — they both happened. The conflict is only in deriving the "current" state, not in the log itself. This is event sourcing applied to mobile sync.
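One practical consequence of the retry loop above: a flaky connection can deliver the same event twice, so the server must dedupe on the client-generated UUID before appending. A sketch of an idempotent append (in-memory for illustration; a real store would use a unique constraint on the event id):

```typescript
interface SyncedEvent { id: string; type: string; createdAt: string }

// Append-only log with idempotent insert keyed on the client-generated id.
class EventLog {
  private seen = new Set<string>();
  private events: SyncedEvent[] = [];

  // Returns false when the event was already applied (a retried duplicate).
  append(evt: SyncedEvent): boolean {
    if (this.seen.has(evt.id)) return false;
    this.seen.add(evt.id);
    this.events.push(evt);
    return true;
  }

  // Chronological view for projectors deriving current state.
  all(): SyncedEvent[] {
    return [...this.events].sort((a, b) => a.createdAt.localeCompare(b.createdAt));
  }
}
```

With this in place the mobile client can retry freely: at-least-once delivery plus server-side dedupe gives effectively exactly-once application.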
Case Study 8: Live ETA Calculation
The Problem
A customer is waiting for a delivery. The driver has 45 stops remaining and is currently 12 stops into the route. Build a system that shows the customer a live ETA that updates as the driver progresses, handles traffic, and communicates uncertainty honestly.
Why Point ETAs Destroy Trust
"Your delivery will arrive at 14:17" creates a precise expectation that will almost certainly be wrong. Traffic, a long stop, a wrong address — any of these shifts the ETA. When the customer sees 14:17 → 14:23 → 14:31, they lose trust in the platform.
Present ETAs as ranges: "Between 14:00 and 14:30" is almost always right, and customers feel better about it.
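The range itself is simple arithmetic once a point estimate exists. A sketch of the window calculation, assuming (as the service below does) roughly two minutes of variance per remaining stop:

```typescript
// ETA as a range: point estimate widened by variance that grows with remaining stops.
function etaWindow(nowMs: number, travelSeconds: number, remainingStops: number) {
  const uncertaintySec = remainingStops * 120; // ~2 min of variance per stop
  const etaMs = nowMs + travelSeconds * 1000;
  return {
    earliest: new Date(etaMs - uncertaintySec * 1000),
    latest: new Date(etaMs + uncertaintySec * 1000),
  };
}
```

With 5 stops remaining and an hour of travel time, the window spans 20 minutes; with 2 stops it narrows to 8 minutes, so precision improves exactly when the customer starts watching.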
ETA Calculation Service (.NET)
public class LiveEtaService
{
public async Task<EtaEstimate> CalculateAsync(
Guid driverId, Guid targetStopId)
{
var route = await _repo.GetActiveRouteAsync(driverId);
var currentPosition = await _redis.GetPositionAsync(driverId);
var remainingStops = route.GetStopsAfterCurrent(targetStopId);
// Get traffic-adjusted travel time for each remaining leg
var legDurations = await _trafficService.GetDurationsAsync(
currentPosition,
remainingStops.Select(s => s.Location));
// Add service time per stop (historical average per stop type)
var serviceTime = remainingStops
.Sum(s => _historicalService.GetAverageServiceTime(s.StopType));
var totalSeconds = legDurations.Sum() + serviceTime;
// Uncertainty grows with number of remaining stops
// Each stop adds roughly 2 minutes of variance
var uncertaintySeconds = remainingStops.Count * 120;
var eta = DateTime.UtcNow.AddSeconds(totalSeconds);
return new EtaEstimate
{
EstimatedArrival = eta,
EarliestArrival = eta.AddSeconds(-uncertaintySeconds),
LatestArrival = eta.AddSeconds(uncertaintySeconds),
RemainingStops = remainingStops.Count,
ConfidenceLevel = CalculateConfidence(remainingStops.Count)
};
}
private static ConfidenceLevel CalculateConfidence(int remainingStops) =>
remainingStops switch
{
<= 3 => ConfidenceLevel.High,
<= 10 => ConfidenceLevel.Medium,
_ => ConfidenceLevel.Low
};
}
Customer-Facing React Component
function EtaDisplay({ eta }: { eta: EtaEstimate }) {
const format = (d: Date) =>
d.toLocaleTimeString("en-GB", { hour: "2-digit", minute: "2-digit" });
return (
<div className="eta-card">
{eta.remainingStops <= 3 ? (
// High confidence — show narrow window
<p>
Arriving between{" "}
<strong>{format(eta.earliestArrival)}</strong> and{" "}
<strong>{format(eta.latestArrival)}</strong>
</p>
) : (
// Low confidence — show only the date and broad window
<p>
Expected today between{" "}
<strong>
{format(eta.earliestArrival)} – {format(eta.latestArrival)}
</strong>
</p>
)}
<p className="stops-remaining">
{eta.remainingStops} stop{eta.remainingStops !== 1 ? "s" : ""} before yours
</p>
</div>
);
}
Push Updates via SSE
// Server-Sent Events endpoint — simpler than WebSockets for one-way push
[HttpGet("track/{trackingCode}")]
public async Task TrackDelivery(string trackingCode)
{
Response.Headers["Content-Type"] = "text/event-stream";
Response.Headers["Cache-Control"] = "no-cache";
while (!HttpContext.RequestAborted.IsCancellationRequested)
{
// assumes an overload that resolves the tracking code to (driverId, targetStopId)
var eta = await _etaService.CalculateAsync(trackingCode);
var json = JsonSerializer.Serialize(eta);
await Response.WriteAsync($"data: {json}\n\n");
await Response.Body.FlushAsync();
// Recalculate every 30 seconds — not on every GPS ping
// Customers don't need sub-second ETA updates
await Task.Delay(30_000, HttpContext.RequestAborted);
}
}
}Patterns That Appear Across All Cases
| Pattern | When It Applies | Wrong Default |
|---------|----------------|---------------|
| Event sourcing | Any audit requirement, any offline sync | Mutable rows with UPDATE |
| State machines | Driver hours, device updates, order status | Boolean flags and counters |
| Derived values | ETAs, compliance scores, due dates | Stored computed columns |
| Baseline per entity | Anomaly detection | Fleet-wide averages |
| Row-level security | Multi-tenant data | App-layer filtering only |
| Ring-based rollout | Any update to stateful devices | Big-bang deployment |
| Buffered UI updates | Real-time dashboards | Rendering on every event |
| Range ETAs | Any time estimate with uncertainty | Point estimates |
Azure Services Relevant to This Domain
| Problem | Azure Service | Why |
|---------|--------------|-----|
| Device telemetry ingestion | IoT Hub | Device auth, routing, twin state |
| Event fan-out | Event Hub | High-throughput, multiple consumers |
| Time-series storage | TimescaleDB on Azure VM / Azure Data Explorer | Efficient range queries on time data |
| Live push to browser | SignalR Service | Managed WebSocket hub, scales horizontally |
| Identity & multi-tenancy | Azure AD B2C | External customer identity, per-tenant policies |
| Background jobs | Azure Service Bus | Reliable queue, dead-letter, scheduled messages |
| File / photo storage | Azure Blob Storage | Cheap, durable, direct upload from mobile |
| CI/CD | Azure DevOps / GitHub Actions | Ring-based deployment gates |
How to Answer These in an Interview
The evaluators are not looking for the perfect answer. They are looking for how you think under ambiguity. Three things win every time:
1. Question the requirement before designing. "500 vehicles every 5 seconds — is that 500 globally or 500 in one region? That changes whether we need a single Event Hub partition or a regional fan-out."
2. Identify the constraint that drives the architecture. "The core constraint here is that you cannot update a device while it is in active use. Everything else — the state machine, the scheduling window, the ring rollout — follows from that one constraint."
3. Know the failure modes. "The happy path is easy. What happens when the mobile app syncs 300 queued events after 8 hours offline? What happens when ring 2 has a 5% failure rate? What happens when two dispatchers edit the same route simultaneously?" Candidates who think about failure modes get hired. Candidates who only describe the happy path do not.