
System Design: Ride-Sharing Platform in .NET — Real-Time Location, Matching, and Distributed State

Design an Uber-style ride-sharing platform in .NET: real-time driver location tracking with Redis geo-hash, driver-rider matching algorithm, trip state machine, surge pricing, and distributed coordination without distributed transactions.

Asma Hafeez Khan · May 26, 2026 · 21 min read
Tags: C#, .NET, Ride-Sharing, Redis, Geo-Queries, State Machine, System Design, Case Study, Real-Time

System: Urban ride-sharing — riders, drivers, real-time dispatch, payment
Stack: ASP.NET Core 9, SignalR, Redis 7 (Geo + Pub/Sub), PostgreSQL 16, MassTransit, Stateless 5
Architecture: Event-driven microservices with saga orchestration; no distributed transactions
Scale target: 50,000 active drivers, 200,000 concurrent riders, sub-2-second matching latency

This case study is about the decisions that are invisible in the happy path but catastrophic when they fail: why you cannot use a relational database for driver locations, why distributed transactions are the wrong tool for trip coordination, and how a state machine prevents the class of bugs where a trip ends up in an impossible state.


System Overview

The ride-sharing domain has four hard real-time constraints that shape the entire architecture:

  1. Driver locations change every 4 seconds. With 50,000 active drivers, that is 12,500 writes per second, enough to saturate PostgreSQL's write IOPS before a single rider query runs.
  2. Matching must happen in under 2 seconds from request to driver assignment. This excludes any approach that serializes through a single database row lock.
  3. A rider must receive live driver location updates during approach and trip. This requires a push mechanism — polling is not acceptable at scale.
  4. Payment must execute exactly once even if the completion event is delivered multiple times (network retries, consumer restarts).

Write path for location updates:
  Driver App → POST /api/v1/drivers/{driverId}/location
    → DriverLocationService → Redis GEOADD (O(log N))
    → Redis PUBLISH to channel "driver:{driverId}:location"
    → SignalR Hub → forward to subscribed rider (if trip in progress)

Matching path:
  Rider App → POST /api/v1/trips
    → TripCommandHandler → persist TripRequest (Postgres)
    → MatchingService → Redis GEOSEARCH (find nearest available drivers)
    → Assign best candidate → update driver status in Redis + Postgres
    → Push TripAssigned event to rider via SignalR

Trip lifecycle:
  TripStateMachine (Stateless) governs all transitions
  State persisted to Postgres after every transition
  Events published to MassTransit → async workers handle payment, notifications

Data Model

Driver Location in Redis (Not PostgreSQL)

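Driver locations are not stored in PostgreSQL. They live in Redis, and each driver's status hash carries a rolling 30-second TTL. Redis cannot expire individual members of a geo set, so the status hash is the liveness signal: if a driver's app crashes or loses connectivity, the hash expires, and a candidate without a live status hash must be skipped so the driver cannot be matched to a new rider.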

C#
// DriverLocationService.cs
public sealed class DriverLocationService(
    IConnectionMultiplexer redis,
    ILogger<DriverLocationService> logger)
{
    private const string AvailableDriversGeoKey = "drivers:available:geo";
    private const string DriverStatusKeyPrefix = "driver:status:";
    private static readonly TimeSpan LocationTtl = TimeSpan.FromSeconds(30);

    public async Task UpdateLocationAsync(
        Guid driverId,
        double latitude,
        double longitude,
        VehicleType vehicleType,
        CancellationToken ct)
    {
        var db = redis.GetDatabase();
        var driverIdStr = driverId.ToString();

        // GEOADD stores the driver in the geo set — O(log N)
        // The member name encodes vehicle type so we can filter in GEOSEARCH
        var member = $"{driverIdStr}:{vehicleType}";
        await db.GeoAddAsync(
            AvailableDriversGeoKey,
            new GeoEntry(longitude, latitude, member));

        // Refresh the driver-specific status hash with a rolling TTL.
        // The geo set has no per-member expiry, so this hash is the liveness signal.
        var statusKey = $"{DriverStatusKeyPrefix}{driverIdStr}";
        var batch = db.CreateBatch();
        var hashTask = batch.HashSetAsync(statusKey, new[]
        {
            new HashEntry("lat", latitude),
            new HashEntry("lon", longitude),
            new HashEntry("vehicleType", vehicleType.ToString()),
            new HashEntry("updatedAt", DateTimeOffset.UtcNow.ToUnixTimeSeconds())
        });
        var expireTask = batch.KeyExpireAsync(statusKey, LocationTtl);
        batch.Execute();
        // Await the batched commands so Redis errors surface instead of being dropped
        await Task.WhenAll(hashTask, expireTask);

        logger.LogDebug("Driver {DriverId} location updated: {Lat},{Lon}",
            driverId, latitude, longitude);
    }

    public async Task<IReadOnlyList<NearbyDriver>> FindNearbyDriversAsync(
        double riderLatitude,
        double riderLongitude,
        VehicleType? requiredVehicleType,
        double radiusKm,
        int maxResults,
        CancellationToken ct)
    {
        var db = redis.GetDatabase();

        // GEOSEARCH: find all members within radius, sorted by distance — O(N+log M)
        var results = await db.GeoSearchAsync(
            AvailableDriversGeoKey,
            riderLongitude,
            riderLatitude,
            new GeoSearchCircle(radiusKm, GeoUnit.Kilometers),
            count: maxResults * 3,    // over-fetch to allow vehicle type filtering
            order: Order.Ascending,
            options: GeoRadiusOptions.WithDistance | GeoRadiusOptions.WithCoordinates);

        var nearby = new List<NearbyDriver>(maxResults);
        foreach (var result in results)
        {
            // Member format: "{driverId}:{vehicleType}"
            var parts = result.Member.ToString().Split(':');
            if (parts.Length != 2) continue;

            if (!Guid.TryParse(parts[0], out var driverId)) continue;
            if (!Enum.TryParse<VehicleType>(parts[1], out var vehicleType)) continue;

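            // Apply vehicle type filter if requested
            if (requiredVehicleType.HasValue && vehicleType != requiredVehicleType.Value)
                continue;

            // Skip drivers whose status hash has expired: geo set members never expire
            // on their own, so a missing status key means no update for 30 seconds.
            // This costs one extra round trip per candidate.
            if (!await db.KeyExistsAsync($"{DriverStatusKeyPrefix}{driverId}"))
                continue;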

            nearby.Add(new NearbyDriver(
                driverId,
                vehicleType,
                result.Distance.GetValueOrDefault(),
                result.Position.GetValueOrDefault().Latitude,
                result.Position.GetValueOrDefault().Longitude));

            if (nearby.Count >= maxResults) break;
        }

        return nearby;
    }

    public async Task RemoveDriverFromAvailablePoolAsync(Guid driverId, VehicleType vehicleType)
    {
        var db = redis.GetDatabase();
        var member = $"{driverId}:{vehicleType}";
        await db.GeoRemoveAsync(AvailableDriversGeoKey, member);
    }
}

Trip Entity (PostgreSQL — the durable record)

C#
// Trip.cs — the aggregate that owns trip state
public sealed class Trip
{
    public Guid Id { get; private set; }
    public Guid RiderId { get; private set; }
    public Guid? DriverId { get; private set; }
    public TripStatus Status { get; private set; }
    public VehicleType RequestedVehicleType { get; private set; }

    public double PickupLatitude { get; private set; }
    public double PickupLongitude { get; private set; }
    public double DropoffLatitude { get; private set; }
    public double DropoffLongitude { get; private set; }

    public decimal? EstimatedFare { get; private set; }
    public decimal? ActualFare { get; private set; }
    public decimal? SurgeMultiplier { get; private set; }

    public DateTimeOffset RequestedAt { get; private set; }
    public DateTimeOffset? DriverAssignedAt { get; private set; }
    public DateTimeOffset? DriverArrivedAt { get; private set; }
    public DateTimeOffset? TripStartedAt { get; private set; }
    public DateTimeOffset? TripCompletedAt { get; private set; }
    public DateTimeOffset? PaymentProcessedAt { get; private set; }

    // Optimistic concurrency
    public uint Version { get; private set; }

    private readonly List<DomainEvent> _domainEvents = new();
    public IReadOnlyList<DomainEvent> DomainEvents => _domainEvents;

    private Trip() { }

    public static Trip Request(
        Guid riderId,
        double pickupLat, double pickupLon,
        double dropoffLat, double dropoffLon,
        VehicleType vehicleType,
        decimal surgeMultiplier)
    {
        var trip = new Trip
        {
            Id = Guid.NewGuid(),
            RiderId = riderId,
            Status = TripStatus.Requested,
            RequestedVehicleType = vehicleType,
            PickupLatitude = pickupLat,
            PickupLongitude = pickupLon,
            DropoffLatitude = dropoffLat,
            DropoffLongitude = dropoffLon,
            SurgeMultiplier = surgeMultiplier,
            RequestedAt = DateTimeOffset.UtcNow
        };
        trip._domainEvents.Add(new TripRequestedEvent(trip.Id, riderId));
        return trip;
    }

    public void AssignDriver(Guid driverId, decimal estimatedFare)
    {
        GuardAgainstInvalidTransition(TripStatus.Requested, TripStatus.DriverAssigned);
        DriverId = driverId;
        EstimatedFare = estimatedFare;
        Status = TripStatus.DriverAssigned;
        DriverAssignedAt = DateTimeOffset.UtcNow;
        _domainEvents.Add(new DriverAssignedEvent(Id, driverId, estimatedFare));
    }

    public void MarkDriverArrived()
    {
        GuardAgainstInvalidTransition(TripStatus.DriverAssigned, TripStatus.DriverArrived);
        Status = TripStatus.DriverArrived;
        DriverArrivedAt = DateTimeOffset.UtcNow;
        _domainEvents.Add(new DriverArrivedEvent(Id));
    }

    public void Start()
    {
        GuardAgainstInvalidTransition(TripStatus.DriverArrived, TripStatus.InProgress);
        Status = TripStatus.InProgress;
        TripStartedAt = DateTimeOffset.UtcNow;
        _domainEvents.Add(new TripStartedEvent(Id));
    }

    public void Complete(decimal actualFare)
    {
        GuardAgainstInvalidTransition(TripStatus.InProgress, TripStatus.Completed);
        Status = TripStatus.Completed;
        ActualFare = actualFare;
        TripCompletedAt = DateTimeOffset.UtcNow;
        _domainEvents.Add(new TripCompletedEvent(Id, actualFare));
    }

    public void MarkPaymentProcessed()
    {
        GuardAgainstInvalidTransition(TripStatus.Completed, TripStatus.Paid);
        Status = TripStatus.Paid;
        PaymentProcessedAt = DateTimeOffset.UtcNow;
        _domainEvents.Add(new PaymentProcessedEvent(Id, ActualFare!.Value));
    }

    private void GuardAgainstInvalidTransition(TripStatus expected, TripStatus next)
    {
        if (Status != expected)
            throw new InvalidOperationException(
                $"Cannot transition trip {Id} from {Status} to {next}. Expected current status: {expected}.");
    }

    public void ClearDomainEvents() => _domainEvents.Clear();
}

public enum TripStatus
{
    Requested,
    DriverAssigned,
    DriverArrived,
    InProgress,
    Completed,
    Paid,
    Cancelled
}

Key Design Decisions

Decision 1: Redis Geo vs PostgreSQL for Driver Locations

PostgreSQL has a PostGIS extension with full geospatial support. We evaluated it first because it would reduce infrastructure complexity (one fewer store). It failed the load test.

At 50,000 active drivers updating every 4 seconds, that is 12,500 writes per second to PostgreSQL. Combined with the matcher issuing GEOSEARCH-equivalent queries every time a rider requests a trip, and the analytical queries from the operations team, PostgreSQL's write buffer was saturated. Index maintenance (the GiST spatial index) competed with write throughput.

Redis GEOADD is O(log N) with in-memory speed. At 50,000 members, each GEOADD completes in under 0.2ms. The entire geo set fits in under 10MB of RAM. Redis Cluster replicates it across availability zones. The TTL-based expiry for disconnected drivers is a feature, not a workaround.
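
The two figures above can be reproduced with simple arithmetic. The sketch below is illustrative only: `LocationCapacity` is a hypothetical helper, and the per-member byte cost is an assumed round number (member string, 8-byte score, sorted-set overhead), not a measured one. Use Redis `MEMORY USAGE` against real data for an actual figure.

```csharp
using System;

// Back-of-envelope capacity figures for the location write path.
public static class LocationCapacity
{
    // Steady-state writes per second when every driver reports once per interval.
    public static double WritesPerSecond(int activeDrivers, double updateIntervalSeconds)
        => activeDrivers / updateIntervalSeconds;

    // Rough geo set footprint in megabytes (1 MB = 1,000,000 bytes).
    // bytesPerMember is an ASSUMED figure, not a measurement.
    public static double GeoSetMegabytes(int members, int bytesPerMember)
        => members * (double)bytesPerMember / 1_000_000;
}
```

With 50,000 drivers and a 4-second interval, `WritesPerSecond` gives 12,500. At an assumed 150 bytes per member, `GeoSetMegabytes` gives 7.5 MB, consistent with the "under 10MB" figure.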

The key design principle: use the right tool for the access pattern, not the tool you already have.

Decision 2: The Trip State Machine with Stateless

Trip status is a state machine. A trip can only move forward through defined states. Writing this as a series of if checks scattered across controllers and services guarantees that you will eventually accept an impossible transition — a rider paying for a trip that never started, or a driver marked arrived before being assigned.

The Stateless library gives you a declarative state machine that raises InvalidOperationException on any invalid transition, with transition hooks for persistence and event publication.

C#
// TripStateMachine.cs
public sealed class TripStateMachineFactory
{
    public StateMachine<TripStatus, TripTrigger> Create(
        Trip trip,
        Func<TripStatus, Task> onTransition)
    {
        var machine = new StateMachine<TripStatus, TripTrigger>(
            stateAccessor: () => trip.Status,
            stateMutator: s => { /* state is mutated via trip.AssignDriver() etc., not directly */ });

        machine.Configure(TripStatus.Requested)
            .Permit(TripTrigger.AssignDriver, TripStatus.DriverAssigned)
            .Permit(TripTrigger.Cancel, TripStatus.Cancelled);

        machine.Configure(TripStatus.DriverAssigned)
            .Permit(TripTrigger.DriverArrived, TripStatus.DriverArrived)
            .Permit(TripTrigger.Cancel, TripStatus.Cancelled);

        machine.Configure(TripStatus.DriverArrived)
            .Permit(TripTrigger.StartTrip, TripStatus.InProgress)
            .Permit(TripTrigger.Cancel, TripStatus.Cancelled);

        machine.Configure(TripStatus.InProgress)
            .Permit(TripTrigger.CompleteTrip, TripStatus.Completed)
            // Cannot cancel an in-progress trip; the driver must complete it.
            // OnEntryAsync awaits the hook so persistence failures reach the caller
            // instead of being lost in a discarded Task. Triggers must then be
            // fired with FireAsync rather than Fire.
            .OnEntryAsync(() => onTransition(TripStatus.InProgress));

        machine.Configure(TripStatus.Completed)
            .Permit(TripTrigger.ProcessPayment, TripStatus.Paid)
            .OnEntryAsync(() => onTransition(TripStatus.Completed));

        machine.Configure(TripStatus.Paid)
            .OnEntryAsync(() => onTransition(TripStatus.Paid));

        machine.OnTransitioned(t =>
            Console.WriteLine($"Trip transitioned {t.Source} → {t.Destination} via {t.Trigger}"));

        return machine;
    }
}

public enum TripTrigger
{
    AssignDriver,
    DriverArrived,
    StartTrip,
    CompleteTrip,
    ProcessPayment,
    Cancel
}
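
The same transition rules can also be written as a plain lookup table, which is handy for unit tests that should not depend on the Stateless library. This is a minimal sketch, not the production implementation: `TripState` and `TripTransitions` are hypothetical names chosen to avoid clashing with `TripStatus`, and the table is copied by hand from the `Permit` calls above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical mirror of TripStatus, kept separate so the sketch is self-contained.
public enum TripState { Requested, DriverAssigned, DriverArrived, InProgress, Completed, Paid, Cancelled }

public static class TripTransitions
{
    // Allowed destinations per state, copied from the Permit configuration.
    private static readonly Dictionary<TripState, TripState[]> Allowed = new()
    {
        [TripState.Requested]      = new[] { TripState.DriverAssigned, TripState.Cancelled },
        [TripState.DriverAssigned] = new[] { TripState.DriverArrived, TripState.Cancelled },
        [TripState.DriverArrived]  = new[] { TripState.InProgress, TripState.Cancelled },
        [TripState.InProgress]     = new[] { TripState.Completed },
        [TripState.Completed]      = new[] { TripState.Paid },
        [TripState.Paid]           = Array.Empty<TripState>(),
        [TripState.Cancelled]      = Array.Empty<TripState>(),
    };

    // True when the machine permits moving directly from one state to the other.
    public static bool CanMove(TripState from, TripState to)
        => Array.IndexOf(Allowed[from], to) >= 0;
}
```

For example, `CanMove(TripState.Requested, TripState.Paid)` is false, which is exactly the "paying for a trip that never started" case the state machine exists to reject.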

Decision 3: Why There Are No Distributed Transactions

When a rider requests a trip, three things must happen: create the trip record (PostgreSQL), mark the driver as unavailable (Redis + PostgreSQL), and notify the driver (SignalR + push notification). These span three different systems. A distributed transaction across all three is not possible in a system that includes Redis and SignalR.

Instead, we use a saga with compensating actions and accept that individual steps may fail and retry. The key insight: idempotency at each step makes retries safe.

  • Creating the trip record is idempotent: we use INSERT ... ON CONFLICT DO NOTHING with a rider-generated request ID.
  • Marking the driver unavailable is idempotent: removing a member from a Redis set twice has the same result as removing it once.
  • Sending a push notification is idempotent: the driver app discards duplicate assignment notifications by checking the trip ID.

The saga coordinator persists its state after each successful step. On restart, it resumes from the last completed step.
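
The resume behaviour can be sketched in a few lines. This is an illustration under stated assumptions, not the MassTransit saga itself: `SagaStateStore` is a hypothetical in-memory stand-in for the durable store, and the steps are plain delegates.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for the saga's durable state store.
public sealed class SagaStateStore
{
    private readonly Dictionary<Guid, int> _completedSteps = new();

    // Number of steps already completed for this saga (0 if it never started).
    public int GetCompletedSteps(Guid sagaId)
        => _completedSteps.TryGetValue(sagaId, out var n) ? n : 0;

    public void SaveCompletedSteps(Guid sagaId, int count)
        => _completedSteps[sagaId] = count;
}

public sealed class SagaCoordinator
{
    private readonly SagaStateStore _store;
    private readonly IReadOnlyList<Action> _steps;

    public SagaCoordinator(SagaStateStore store, IReadOnlyList<Action> steps)
    {
        _store = store;
        _steps = steps;
    }

    // Runs the remaining steps in order. Progress is saved after each step that
    // succeeds, so a rerun after a crash skips what is already done. A crash
    // between a step finishing and its progress being saved repeats that step,
    // which is why every step has to be idempotent.
    public void Run(Guid sagaId)
    {
        for (var i = _store.GetCompletedSteps(sagaId); i < _steps.Count; i++)
        {
            _steps[i]();                               // may throw; progress stays at i
            _store.SaveCompletedSteps(sagaId, i + 1);
        }
    }
}
```

If the second of three steps throws on the first run, the store records one completed step. Calling `Run` again with the same saga ID starts at the second step and does not repeat the first.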


Challenges and Solutions

Challenge 1: The Matching Race Condition

Two concurrent trip requests arrive at the same millisecond. Both GEOSEARCH calls return the same nearest driver. Both sagas try to assign the same driver.

The naive solution — first write wins — means one saga assigns the driver, the second saga also assigns the same driver to a different trip, and you now have one driver serving two trips simultaneously.

The solution is a short-lived pessimistic lock on the driver assignment step. We take a Redis lock scoped to the driver ID (a single SET with NX and an expiry through StackExchange.Redis, which is simpler than full Redlock because it involves one Redis instance rather than a quorum) and hold it only for the duration of the assignment check-and-write. The lock expires after 500ms: long enough to complete the assignment write, short enough not to block other matching operations.

C#
// MatchingService.cs
public sealed class MatchingService(
    DriverLocationService locationService,
    IDriverRepository driverRepo,
    IConnectionMultiplexer redis,
    ILogger<MatchingService> logger)
{
    private static readonly TimeSpan LockDuration = TimeSpan.FromMilliseconds(500);
    private static readonly TimeSpan LockRetryDelay = TimeSpan.FromMilliseconds(50);
    private const int LockRetryCount = 3;

    public async Task<MatchResult> FindAndAssignDriverAsync(
        Trip trip,
        CancellationToken ct)
    {
        var candidates = await locationService.FindNearbyDriversAsync(
            trip.PickupLatitude,
            trip.PickupLongitude,
            requiredVehicleType: trip.RequestedVehicleType,
            radiusKm: 5.0,
            maxResults: 10,
            ct);

        if (candidates.Count == 0)
            return MatchResult.NoDriversAvailable();

        // Try candidates in order of proximity
        foreach (var candidate in candidates)
        {
            var lockKey = $"driver:assignment:lock:{candidate.DriverId}";
            var lockValue = Guid.NewGuid().ToString();

            bool acquired = false;
            for (int attempt = 0; attempt < LockRetryCount && !acquired; attempt++)
            {
                acquired = await redis.GetDatabase()
                    .StringSetAsync(lockKey, lockValue,
                        LockDuration,
                        When.NotExists);

                if (!acquired)
                    await Task.Delay(LockRetryDelay, ct);
            }

            if (!acquired)
            {
                logger.LogWarning("Could not acquire assignment lock for driver {DriverId}",
                    candidate.DriverId);
                continue;
            }

            try
            {
                // Verify driver is still available in PostgreSQL (Redis is the cache; Postgres is truth)
                var driver = await driverRepo.GetAvailableDriverAsync(candidate.DriverId, ct);
                if (driver is null) continue;

                // Mark driver as busy in both Redis and Postgres
                await locationService.RemoveDriverFromAvailablePoolAsync(
                    candidate.DriverId, candidate.VehicleType);

                await driverRepo.MarkDriverAsOnTripAsync(candidate.DriverId, trip.Id, ct);

                return MatchResult.Success(candidate.DriverId, candidate.DistanceKm);
            }
            finally
            {
                // Release the lock only if we still own it (compare value, then delete).
                // The KEYS/ARGV form needs the string overload of ScriptEvaluateAsync;
                // LuaScript.Prepare expects named @parameters instead.
                const string releaseScript =
                    "if redis.call('get', KEYS[1]) == ARGV[1] then " +
                    "  return redis.call('del', KEYS[1]) " +
                    "else " +
                    "  return 0 " +
                    "end";
                await redis.GetDatabase().ScriptEvaluateAsync(releaseScript,
                    new RedisKey[] { lockKey },
                    new RedisValue[] { lockValue });
            }
        }

        return MatchResult.NoDriversAvailable();
    }
}
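
The guarantee the lock provides is easy to check in isolation. The sketch below is an in-process analogue, not the Redis implementation: `DriverClaims` and `ClaimRaceDemo` are hypothetical names, and `ConcurrentDictionary.TryAdd` stands in for `SET ... NX`, with a value-checked remove standing in for the compare-and-delete script.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// In-process analogue of SET key value NX: the first caller to claim a driver
// wins, and every later caller is refused until the claim is released.
public sealed class DriverClaims
{
    private readonly ConcurrentDictionary<Guid, Guid> _claims = new();

    // Returns true only for the trip that claimed the driver first.
    public bool TryClaim(Guid driverId, Guid tripId) => _claims.TryAdd(driverId, tripId);

    // Releases the claim only if this trip still owns it.
    public bool Release(Guid driverId, Guid tripId)
        => _claims.TryRemove(new KeyValuePair<Guid, Guid>(driverId, tripId));
}

public static class ClaimRaceDemo
{
    // Fires `contenders` concurrent claims at one driver and counts the winners.
    public static int CountWinners(int contenders)
    {
        var claims = new DriverClaims();
        var driverId = Guid.NewGuid();
        var results = new bool[contenders];
        Parallel.For(0, contenders, i => results[i] = claims.TryClaim(driverId, Guid.NewGuid()));
        return results.Count(won => won);
    }
}
```

However many trips contend for the same driver, `CountWinners` reports exactly one successful claim. The losers move on to the next candidate, as in the `foreach` loop above.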

Challenge 2: Real-Time Location Broadcast to Rider

During approach (DriverAssigned → DriverArrived), the rider app shows the driver moving on a map. This requires the server to push driver location updates to the specific rider. Polling from the rider app is not acceptable — at 200,000 concurrent riders polling every 2 seconds, that is 100,000 HTTP requests per second just for location.

We use SignalR with Redis backplane. When the driver app posts a location update, the location service publishes to a Redis Pub/Sub channel named driver:{driverId}:location. A SignalR background service subscribes to this channel and forwards updates to the rider who is currently tracking that driver.

C#
// LocationBroadcastService.cs
public sealed class LocationBroadcastService(
    IConnectionMultiplexer redis,
    IHubContext<RiderHub, IRiderClient> hubContext,
    IActiveTripIndex activeTripIndex,
    ILogger<LocationBroadcastService> logger) : BackgroundService
{
    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        var subscriber = redis.GetSubscriber();

        await subscriber.SubscribeAsync(
            RedisChannel.Pattern("driver:*:location"),
            async (channel, message) =>
            {
                try
                {
                    // channel format: "driver:{driverId}:location"
                    var channelStr = channel.ToString();
                    var parts = channelStr.Split(':');
                    if (parts.Length < 3 || !Guid.TryParse(parts[1], out var driverId))
                        return;

                    // Find which rider is tracking this driver right now
                    var riderId = await activeTripIndex.GetRiderTrackingDriverAsync(driverId);
                    if (riderId is null) return;

                    var location = System.Text.Json.JsonSerializer
                        .Deserialize<DriverLocationUpdate>(message.ToString());
                    if (location is null) return;

                    // Push to rider's SignalR connection
                    await hubContext.Clients
                        .User(riderId.Value.ToString())
                        .ReceiveDriverLocation(location);
                }
                catch (Exception ex)
                {
                    logger.LogError(ex, "Failed to broadcast location update");
                }
            });

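        // Keep the service alive until shutdown. WaitHandle has no WaitOneAsync in
        // the BCL; Task.Delay completes (by cancellation) when the host stops.
        await Task.Delay(Timeout.Infinite, stoppingToken);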
    }
}

// IRiderClient.cs — strongly typed SignalR hub interface
public interface IRiderClient
{
    Task ReceiveDriverLocation(DriverLocationUpdate location);
    Task ReceiveTripStatusUpdate(TripStatusUpdate status);
    Task ReceiveDriverArrived();
    Task ReceiveTripCompleted(TripSummary summary);
}

Challenge 3: Surge Pricing Without Stale Data

Surge pricing depends on the ratio of available drivers to pending trip requests in a geographic cell. If this calculation uses stale data, you either under-charge during demand spikes (losing revenue) or over-charge when supply has recovered (losing riders to competitors).

We divide the city into H3 geo-cells at resolution 7 (approximately 5 km² per cell). Every minute, a background job counts active drivers and pending requests per cell and writes the counts to Redis with a 120-second TTL. The multiplier computed from those counts is cached per cell for 90 seconds, so a quoted multiplier can lag real conditions by roughly 90 seconds plus one refresh interval. That is acceptable for pricing, which riders expect to change between app opens.

C#
// SurgePricingCalculator.cs
public sealed class SurgePricingCalculator(
    IConnectionMultiplexer redis,
    ILogger<SurgePricingCalculator> logger)
{
    // Surge thresholds: demand/supply ratio → multiplier
    private static readonly (double Threshold, decimal Multiplier)[] SurgeTiers =
    [
        (1.0,  1.0m),   // Normal: at most 1 rider per driver
        (1.5,  1.2m),   // Mild surge
        (2.0,  1.5m),   // Moderate surge
        (3.0,  1.8m),   // High surge
        (double.MaxValue, 2.5m) // Extreme surge cap
    ];

    public async Task<decimal> GetSurgeMultiplierAsync(
        double latitude,
        double longitude,
        CancellationToken ct)
    {
        // Convert lat/lon to H3 cell index at resolution 7
        var cellIndex = H3.LatLngToCell(latitude, longitude, resolution: 7);
        var surgeKey = $"surge:cell:{cellIndex}";

        var db = redis.GetDatabase();
        var cachedMultiplier = await db.StringGetAsync(surgeKey);

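        // Parse and format with the invariant culture so "1.50" round-trips
        // regardless of the server's regional settings
        if (cachedMultiplier.HasValue && decimal.TryParse(
                cachedMultiplier.ToString(),
                System.Globalization.NumberStyles.Number,
                System.Globalization.CultureInfo.InvariantCulture,
                out var cached))
        {
            return cached;
        }

        // Cache miss: compute and store (race is acceptable; worst case two writes)
        logger.LogWarning("Surge cache miss for cell {CellIndex}, computing live", cellIndex);
        var multiplier = await ComputeSurgeMultiplierAsync(cellIndex, db);

        await db.StringSetAsync(surgeKey,
            multiplier.ToString("F2", System.Globalization.CultureInfo.InvariantCulture),
            TimeSpan.FromSeconds(90));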

        return multiplier;
    }

    private async Task<decimal> ComputeSurgeMultiplierAsync(ulong cellIndex, IDatabase db)
    {
        var driversKey = $"surge:drivers:{cellIndex}";
        var requestsKey = $"surge:requests:{cellIndex}";

        var driverCountStr = await db.StringGetAsync(driversKey);
        var requestCountStr = await db.StringGetAsync(requestsKey);

        var driverCount = driverCountStr.HasValue ? (int)driverCountStr : 1;
        var requestCount = requestCountStr.HasValue ? (int)requestCountStr : 0;

        // Avoid division by zero; floor driver count at 1
        var ratio = requestCount / Math.Max(driverCount, 1.0);

        foreach (var (threshold, multiplier) in SurgeTiers)
        {
            if (ratio <= threshold)
                return multiplier;
        }

        return SurgeTiers[^1].Multiplier;
    }
}

// SurgeDataRefreshWorker.cs — runs every 60 seconds
public sealed class SurgeDataRefreshWorker(
    IConnectionMultiplexer redis,
    IDriverRepository driverRepo,
    ITripRepository tripRepo,
    ILogger<SurgeDataRefreshWorker> logger) : BackgroundService
{
    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        using var timer = new PeriodicTimer(TimeSpan.FromSeconds(60));
        while (await timer.WaitForNextTickAsync(stoppingToken))
        {
            try
            {
                await RefreshSurgeDataAsync(stoppingToken);
            }
            catch (Exception ex)
            {
                logger.LogError(ex, "Surge data refresh failed");
            }
        }
    }

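    private async Task RefreshSurgeDataAsync(CancellationToken ct)
    {
        var db = redis.GetDatabase();
        var ttl = TimeSpan.FromSeconds(120);

        // Tally per-cell counts in memory first. Incrementing the Redis keys directly
        // would add to the previous run's totals: the 120-second TTL outlives the
        // 60-second refresh interval, so the counters would never reset.
        var driverCounts = new Dictionary<ulong, long>();
        var activeDrivers = await driverRepo.GetActiveDriverLocationsAsync(ct);
        foreach (var (_, lat, lon) in activeDrivers)
        {
            var cellIndex = H3.LatLngToCell(lat, lon, resolution: 7);
            driverCounts[cellIndex] = driverCounts.GetValueOrDefault(cellIndex) + 1;
        }

        var requestCounts = new Dictionary<ulong, long>();
        var pendingRequests = await tripRepo.GetPendingRequestLocationsAsync(ct);
        foreach (var (_, lat, lon) in pendingRequests)
        {
            var cellIndex = H3.LatLngToCell(lat, lon, resolution: 7);
            requestCounts[cellIndex] = requestCounts.GetValueOrDefault(cellIndex) + 1;
        }

        // Overwrite each counter with the fresh snapshot. Cells that dropped to zero
        // keep their last value until the TTL removes the key.
        var batch = db.CreateBatch();
        var writes = new List<Task>();
        foreach (var (cell, count) in driverCounts)
            writes.Add(batch.StringSetAsync($"surge:drivers:{cell}", count, ttl));
        foreach (var (cell, count) in requestCounts)
            writes.Add(batch.StringSetAsync($"surge:requests:{cell}", count, ttl));
        batch.Execute();
        await Task.WhenAll(writes);

        logger.LogInformation("Surge data refreshed: {DriverCount} drivers, {RequestCount} requests",
            activeDrivers.Count, pendingRequests.Count);
    }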
}
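
The tier lookup itself is a pure function of the two counts, so it can be checked without Redis. A minimal sketch, assuming the same tier table as `SurgePricingCalculator`; `SurgeTierLookup` is a hypothetical helper introduced only for illustration.

```csharp
using System;

public static class SurgeTierLookup
{
    // Same tiers as SurgePricingCalculator: upper bound of the
    // demand/supply ratio, then the multiplier applied up to that bound.
    private static readonly (double Threshold, decimal Multiplier)[] Tiers =
    {
        (1.0, 1.0m), (1.5, 1.2m), (2.0, 1.5m), (3.0, 1.8m), (double.MaxValue, 2.5m)
    };

    // Returns the multiplier of the first tier whose threshold covers the ratio.
    // The driver count is floored at 1 to avoid division by zero.
    public static decimal For(int pendingRequests, int availableDrivers)
    {
        var ratio = pendingRequests / (double)Math.Max(availableDrivers, 1);
        foreach (var (threshold, multiplier) in Tiers)
        {
            if (ratio <= threshold)
                return multiplier;
        }
        return Tiers[^1].Multiplier;
    }
}
```

With 20 available drivers, 10 pending requests give 1.0, 30 give 1.2 (a ratio of exactly 1.5 still falls in the mild tier), and 50 give 1.8. A cell with requests but no drivers lands in the capped 2.5 tier once the ratio exceeds 3.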

Challenge 4: Idempotent Payment in the Completion Saga

When a trip completes, the completion saga must: calculate fare, charge the rider's payment method, transfer funds to driver, and release the driver back to the available pool. The payment step is the only one with real money movement. It must execute exactly once.

The payment provider (Stripe in this case) supports idempotency keys natively. We derive the payment idempotency key from the trip ID: payment:{tripId}:charge. Even if the saga retries the payment step five times, Stripe returns the same charge result for the same idempotency key within 24 hours.

C#
// TripCompletionSaga.cs
public sealed class TripCompletionSaga(
    ITripRepository tripRepo,
    IFareCalculator fareCalculator,
    IPaymentService paymentService,
    IDriverAvailabilityService driverAvailability,
    INotificationService notifications,
    BankingDbContext dbContext)
{
    public async Task ExecuteAsync(Guid tripId, CancellationToken ct)
    {
        var trip = await tripRepo.GetAsync(tripId, ct)
            ?? throw new InvalidOperationException($"Trip {tripId} not found");

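        // Already fully processed by an earlier run: nothing left to do
        if (trip.Status == TripStatus.Paid)
            return;

        // Steps 1-2: calculate the fare and complete the trip, unless an earlier
        // attempt already did. Trip.Complete throws when the trip is not InProgress,
        // so calling it unconditionally would break every retry.
        if (trip.Status == TripStatus.InProgress)
        {
            var calculatedFare = await fareCalculator.CalculateAsync(trip, ct);
            trip.Complete(calculatedFare);

            try
            {
                await dbContext.SaveChangesAsync(ct);
            }
            catch (DbUpdateConcurrencyException)
            {
                // Another worker already completed this trip: load fresh state and continue
                await dbContext.Entry(trip).ReloadAsync(ct);
                if (trip.Status == TripStatus.Paid)
                    return; // Already fully processed
            }
        }

        // Always charge the persisted fare so every retry sends the same amount
        var fare = trip.ActualFare
            ?? throw new InvalidOperationException($"Trip {tripId} has no fare to charge");

        // Step 3: Charge rider with an idempotency key derived from the trip ID
        var chargeIdempotencyKey = $"payment:{tripId}:charge";
        var chargeResult = await paymentService.ChargeRiderAsync(
            trip.RiderId,
            fare,
            trip.Currency, // assumes Trip exposes a Currency property (not shown in the entity above)
            chargeIdempotencyKey,
            ct);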

        if (!chargeResult.Succeeded)
        {
            // Payment failed — do NOT release driver yet; mark trip for manual review
            await MarkTripForPaymentReviewAsync(trip, chargeResult.FailureReason, ct);
            return;
        }

        // Step 4: Mark payment processed and release driver
        trip.MarkPaymentProcessed();
        await dbContext.SaveChangesAsync(ct);

        await driverAvailability.ReleaseDriverAsync(trip.DriverId!.Value, ct);

        // Step 5: Send receipts (best-effort, not part of saga guarantee)
        await notifications.SendTripReceiptAsync(trip.RiderId, trip.DriverId.Value, fare, ct);
    }
}
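
The effect of the idempotency key can be demonstrated with a fake gateway. This is a sketch of the general technique, not Stripe's implementation or API: `FakePaymentGateway` and `ChargeResult` are hypothetical types, and the stored-result lookup only mimics what a provider does on its side.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public sealed record ChargeResult(string ChargeId, decimal Amount);

// Hypothetical in-memory gateway that mimics provider-side idempotency keys:
// the first request for a key performs the charge, repeats get the stored result.
public sealed class FakePaymentGateway
{
    private readonly ConcurrentDictionary<string, Lazy<ChargeResult>> _byKey = new();
    private int _chargesExecuted;

    // How many times money actually moved.
    public int ChargesExecuted => _chargesExecuted;

    public ChargeResult Charge(string idempotencyKey, decimal amount)
        => _byKey.GetOrAdd(idempotencyKey, _ => new Lazy<ChargeResult>(() =>
        {
            // Lazy guarantees this body runs once per key, even under concurrency.
            Interlocked.Increment(ref _chargesExecuted);
            return new ChargeResult(Guid.NewGuid().ToString("N"), amount);
        })).Value;
}
```

Calling `Charge` five times with the key `payment:{tripId}:charge` returns the same `ChargeId` each time and leaves `ChargesExecuted` at 1. A different trip ID produces a different key and therefore a second charge.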

.NET Implementation — The Matching Endpoint

C#
// TripEndpoints.cs
app.MapPost("/api/v1/trips", async (
    [FromBody] TripRequest request,
    [FromServices] TripCommandHandler handler,
    [FromServices] IValidator<TripRequest> validator,
    HttpContext httpContext,
    CancellationToken ct) =>
{
    var validation = await validator.ValidateAsync(request, ct);
    if (!validation.IsValid)
        return Results.ValidationProblem(validation.ToDictionary());

    var riderId = httpContext.GetUserId();
    var result = await handler.HandleAsync(new CreateTripCommand(
        RiderId: riderId,
        PickupLatitude: request.PickupLatitude,
        PickupLongitude: request.PickupLongitude,
        DropoffLatitude: request.DropoffLatitude,
        DropoffLongitude: request.DropoffLongitude,
        VehicleType: request.VehicleType,
        RequestId: request.RequestId),  // client-generated for idempotency
        ct);

    return result.Status switch
    {
        CreateTripStatus.Created    => Results.Accepted(
                                        $"/api/v1/trips/{result.TripId}",
                                        new { tripId = result.TripId,
                                              estimatedFare = result.EstimatedFare,
                                              surgeMultiplier = result.SurgeMultiplier }),
        CreateTripStatus.NoDrivers  => Results.NotFound(new { error = "No drivers available in your area" }),
        CreateTripStatus.Duplicate  => Results.Ok(new { tripId = result.TripId, status = "already_created" }),
        _                           => Results.Problem("Unexpected error during trip creation")
    };
})
.WithName("CreateTrip")
.WithOpenApi()
.RequireAuthorization("RiderPolicy");

// Driver location update endpoint — called every 4 seconds by driver app
app.MapPost("/api/v1/drivers/{driverId:guid}/location", async (
    Guid driverId,
    [FromBody] LocationUpdate update,
    [FromServices] DriverLocationService locationService,
    HttpContext httpContext,
    CancellationToken ct) =>
{
    // Validate that the authenticated driver matches the route parameter
    var authenticatedDriverId = httpContext.GetUserId();
    if (authenticatedDriverId != driverId)
        return Results.Forbid();

    await locationService.UpdateLocationAsync(
        driverId,
        update.Latitude,
        update.Longitude,
        update.VehicleType,
        ct);

    return Results.NoContent();
})
.WithName("UpdateDriverLocation")
.WithOpenApi()
.RequireAuthorization("DriverPolicy");

EF Core Trip Configuration

C#
// TripConfiguration.cs
public sealed class TripConfiguration : IEntityTypeConfiguration<Trip>
{
    public void Configure(EntityTypeBuilder<Trip> builder)
    {
        builder.ToTable("trips");
        builder.HasKey(t => t.Id);

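        // IsRowVersion() already marks the property as a concurrency token,
        // so a separate IsConcurrencyToken() call is redundant.
        // With Npgsql this assumes Version is a uint mapped to the xmin system column.
        builder.Property(t => t.Version)
            .IsRowVersion();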

        builder.Property(t => t.Status)
            .HasConversion<string>()
            .HasMaxLength(20)
            .IsRequired();

        builder.Property(t => t.ActualFare)
            .HasPrecision(10, 2);

        builder.Property(t => t.EstimatedFare)
            .HasPrecision(10, 2);

        builder.Property(t => t.SurgeMultiplier)
            .HasPrecision(4, 2);

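        // Partial index for the "find all active trips for a driver" query.
        // The raw SQL filter assumes a snake_case naming convention (column status);
        // with EF Core's default naming the column would need quoting as "Status".
        builder.HasIndex(t => new { t.DriverId, t.Status })
            .HasFilter("status NOT IN ('Completed', 'Paid', 'Cancelled')");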

        // Index for rider history queries
        builder.HasIndex(t => new { t.RiderId, t.RequestedAt });
    }
}

What We'd Do Differently

1. Use H3 Hierarchical Indexing from the Start

We initially used rectangular bounding boxes for geo-queries. A bounding box is a poor proxy for travel distance: a driver 4.9 km away in a straight line may be 6 km away by road because of rivers or highways, while a driver 5.1 km away (just outside the box) may be only 5.2 km by road. The box is also not uniform: its corners are about 41% farther from the center than its edges. H3 hexagonal cells have approximately uniform neighbor distances (all six neighbors sit at roughly the same distance from the cell center) and compose hierarchically, so you can query a coarse cell quickly and refine to finer cells only when needed.

Adding H3 after launch required a data migration of 8 million historical trip records to backfill pickup and dropoff cell indices. It should have been modelled from day one.
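
The bounding-box problem also has a purely geometric side that is independent of road layout. The following is a minimal sketch, not the production query: the `GeoMath` helper and its method names are hypothetical, and it assumes a spherical Earth with a 6,371 km radius. It compares a square box test against the great-circle (haversine) distance.

```csharp
using System;

public static class GeoMath
{
    private const double EarthRadiusKm = 6371.0; // assumption: spherical Earth

    // Great-circle distance between two points, using the haversine formula.
    public static double HaversineKm(double lat1, double lon1, double lat2, double lon2)
    {
        double dLat = ToRad(lat2 - lat1);
        double dLon = ToRad(lon2 - lon1);
        double a = Math.Sin(dLat / 2) * Math.Sin(dLat / 2)
                 + Math.Cos(ToRad(lat1)) * Math.Cos(ToRad(lat2))
                 * Math.Sin(dLon / 2) * Math.Sin(dLon / 2);
        return 2 * EarthRadiusKm * Math.Asin(Math.Sqrt(a));
    }

    // True when (lat, lon) lies inside a square box of the given half-width
    // centered on (centerLat, centerLon).
    public static bool InBoundingBox(double centerLat, double centerLon,
                                     double lat, double lon, double halfWidthKm)
    {
        double latDelta = ToDeg(halfWidthKm / EarthRadiusKm);
        double lonDelta = ToDeg(halfWidthKm / (EarthRadiusKm * Math.Cos(ToRad(centerLat))));
        return Math.Abs(lat - centerLat) <= latDelta
            && Math.Abs(lon - centerLon) <= lonDelta;
    }

    private static double ToRad(double deg) => deg * Math.PI / 180.0;
    private static double ToDeg(double rad) => rad * 180.0 / Math.PI;
}
```

A driver 4.9 km north and 4.9 km east of the pickup passes `InBoundingBox` with a 5 km half-width, yet `HaversineKm` puts that driver just under 7 km away. A cell-based index avoids this by giving every neighboring cell a comparable distance from the query cell.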

2. Separate the Matching Service from the Trip Service Earlier

Initially, matching logic lived inside the TripCommandHandler. As matching grew more sophisticated (vehicle type filtering, driver ratings, ETA vs distance trade-off, preferred driver history), the handler became a 400-line class with 12 dependencies.

The matching service was extracted into its own process with its own Redis replica. The trip service calls it via a lightweight gRPC contract. The key benefit: matching can be deployed and scaled independently during high-demand events without redeploying the trip persistence layer.

3. Invest in WebSocket Connection State Tracking Earlier

SignalR with Redis backplane works well for broadcast, but rider-to-server connection affinity is fragile under rolling deployments. When we deploy a new version, connected riders get disconnected for 2–3 seconds during the pod restart. We did not implement reconnection handling in the rider app until month 4.

The correct approach: the rider app should treat the SignalR connection as ephemeral and re-subscribe to trip updates on every reconnect. The server should replay the last known driver location and trip status on subscription, not just send future updates.
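
The replay-on-subscribe behavior can be sketched without SignalR. This is a minimal in-memory illustration: `TripSnapshot`, `TripUpdateBroker`, and their members are hypothetical names, and a production version would presumably keep the snapshot in Redis so that any pod can serve the replay after a rolling deployment.

```csharp
using System;
using System.Collections.Concurrent;

public sealed record TripSnapshot(string Status, double DriverLat, double DriverLon);

public sealed class TripUpdateBroker
{
    // Last known state per trip, kept so late or returning subscribers can catch up.
    private readonly ConcurrentDictionary<Guid, TripSnapshot> _latest = new();

    // Handlers per trip, keyed by connection id so re-subscribing replaces the old handler.
    private readonly ConcurrentDictionary<Guid, ConcurrentDictionary<string, Action<TripSnapshot>>>
        _subscribers = new();

    // Called whenever the trip status or driver position changes.
    public void Publish(Guid tripId, TripSnapshot snapshot)
    {
        _latest[tripId] = snapshot;
        if (_subscribers.TryGetValue(tripId, out var handlers))
        {
            foreach (var handler in handlers.Values)
                handler(snapshot);
        }
    }

    // Called on every connect and reconnect. The last known state is replayed
    // immediately, so the rider never waits for the next update to see the trip.
    public void Subscribe(Guid tripId, string connectionId, Action<TripSnapshot> onUpdate)
    {
        var handlers = _subscribers.GetOrAdd(
            tripId, _ => new ConcurrentDictionary<string, Action<TripSnapshot>>());
        handlers[connectionId] = onUpdate;

        if (_latest.TryGetValue(tripId, out var snapshot))
            onUpdate(snapshot);
    }

    public void Unsubscribe(Guid tripId, string connectionId)
    {
        if (_subscribers.TryGetValue(tripId, out var handlers))
            handlers.TryRemove(connectionId, out _);
    }
}
```

Because a publish can race with a subscribe, a client may occasionally receive the same snapshot twice. Treating updates as idempotent state (rather than deltas) makes that harmless.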

4. Plan for Driver Earnings Reconciliation from Day One

The payment saga charges the rider and marks the trip as paid, but driver payout is a separate batch process. We discovered after go-live that the trip completion event and the payout batch used different fare rounding conventions. A computed driver payout of £12.156 was recorded as £12.15 in the trip record but £12.16 in the payout batch. These 1p differences accumulated to £840 per month in discrepancies.

The fix was trivial (use Math.Round(fare, 2, MidpointRounding.AwayFromZero) consistently), but finding it required reconciling six months of trip and payout records. The lesson: define a canonical fare rounding rule in a shared library and enforce it at the type level.

C#
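// Money.cs: a value object that enforces rounding at construction
public readonly record struct Money
{
    private const MidpointRounding Rounding = MidpointRounding.AwayFromZero;

    public decimal Amount { get; }
    public string Currency { get; }

    // Private so callers cannot bypass rounding with `new Money(...)`;
    // a positional record struct would expose a public constructor.
    private Money(decimal amount, string currency)
    {
        Amount = amount;
        Currency = currency;
    }

    // The only way to create a non-default Money: always rounds to 2 decimal places.
    public static Money Of(decimal amount, string currency) =>
        new(Math.Round(amount, 2, Rounding), currency);

    public Money Add(Money other)
    {
        if (Currency != other.Currency)
            throw new InvalidOperationException($"Cannot add {Currency} to {other.Currency}");
        return Of(Amount + other.Amount, Currency);
    }

    public Money Multiply(decimal factor) => Of(Amount * factor, Currency);

    public override string ToString() => $"{Amount:F2} {Currency}";
}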
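
To see why the rounding mode has to be pinned explicitly, here is a minimal sketch of how one midpoint value rounds under .NET's two common modes. The `RoundingDemo` class and the sample values are illustrative only; the article does not say which conventions the two systems actually used.

```csharp
using System;

public static class RoundingDemo
{
    // Math.Round(decimal, int) defaults to MidpointRounding.ToEven (banker's rounding).
    public static decimal Default(decimal fare) => Math.Round(fare, 2);

    // The explicit mode recommended above.
    public static decimal AwayFromZero(decimal fare) =>
        Math.Round(fare, 2, MidpointRounding.AwayFromZero);
}
```

`RoundingDemo.Default(12.145m)` yields 12.14, whereas `RoundingDemo.AwayFromZero(12.145m)` yields 12.15. Values that are not exact midpoints, such as 12.156m, round to 12.16 under both modes, so two services can agree on most fares and still drift apart by a penny on the rest.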

5. Add Circuit Breakers on the Redis Path

Redis is in the critical path for location updates, matching, and surge pricing. During one incident, a Redis Cluster rebalance caused 8 seconds of elevated latency. Every driver location update and matching query queued behind the slow Redis calls, leading to a thread pool exhaustion that cascaded to the PostgreSQL connection pool.

The mitigation: add Polly circuit breakers on all Redis calls with a 500ms timeout and a fallback to stale-or-degraded behavior. For location updates, a failed Redis write is logged but does not fail the HTTP response — the driver app will retry in 4 seconds. For matching, a Redis failure falls back to a reduced-radius PostgreSQL geo-query using PostGIS.

C#
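// ResilienceExtensions.cs
using System;
using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection;
using Polly;
using Polly.CircuitBreaker;
using Polly.Retry;

public static class ResilienceExtensions
{
    public static IServiceCollection AddResilientRedis(this IServiceCollection services)
    {
        services.AddResiliencePipeline("redis-location", builder =>
        {
            // Polly v8 runs strategies in the order they are added, outermost first.
            // The timeout must be innermost: if it wrapped the circuit breaker, the
            // breaker would never see a TimeoutRejectedException and slow Redis calls
            // would not open the circuit.
            builder
                .AddRetry(new RetryStrategyOptions
                {
                    // Worst case is three attempts of 500 ms plus backoff delays.
                    MaxRetryAttempts = 2,
                    Delay = TimeSpan.FromMilliseconds(100),
                    BackoffType = DelayBackoffType.Exponential
                })
                .AddCircuitBreaker(new CircuitBreakerStrategyOptions
                {
                    FailureRatio = 0.5,
                    SamplingDuration = TimeSpan.FromSeconds(10),
                    MinimumThroughput = 20,
                    BreakDuration = TimeSpan.FromSeconds(30),
                    OnOpened = args =>
                    {
                        // Log + alert: Redis circuit open means matching is degraded
                        return ValueTask.CompletedTask;
                    }
                })
                .AddTimeout(TimeSpan.FromMilliseconds(500)); // per-attempt timeout
        });

        return services;
    }
}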
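
The fail-open behavior for location writes (log the failure, still answer the HTTP request) can be sketched without Polly. This is a dependency-free illustration that assumes .NET 6 or later for `Task.WaitAsync`; `FailOpen.TryAsync` and its parameters are hypothetical names, not part of the service described above.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class FailOpen
{
    // Runs a write with a hard time budget. Returns false instead of throwing when
    // the write is slow or fails, so the caller can still complete the HTTP response
    // and rely on the next location update arriving a few seconds later.
    public static async Task<bool> TryAsync(
        Func<CancellationToken, Task> write,
        TimeSpan budget,
        Action<Exception> log,
        CancellationToken ct = default)
    {
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(ct);
        cts.CancelAfter(budget); // asks a cooperative write to stop at the deadline

        try
        {
            // WaitAsync enforces the budget even if the write ignores the token.
            await write(cts.Token).WaitAsync(budget, ct);
            return true;
        }
        catch (Exception ex) when (!ct.IsCancellationRequested)
        {
            // Timeout or storage error: record it and degrade, do not propagate.
            log(ex);
            return false;
        }
    }
}
```

If the caller's own token is cancelled (for example, the driver app disconnected), the exception is allowed to propagate, since there is no response left to protect.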

The ride-sharing domain forces you to confront the hardest distributed systems problems in one product: you cannot avoid real-time data, you cannot use distributed transactions, and you cannot afford downtime during peak hours. The architecture described here — Redis geo for location, state machines for trip lifecycle, sagas for multi-service coordination, and circuit breakers on every external call — is not a response to hypothetical scale. Every component addresses a failure mode that will occur in production within the first six months of operation.
