system-design · advanced · 16 min read

System Design Interview

Design a Ride-Sharing App (Uber / Lyft)

Matching riders to nearby drivers in real time — geospatial indexing, supply/demand pricing, trip state machines

Key outcome: Match driver in <3 seconds
System Design · Geospatial · State Machine · Matching · WebSockets · PostgreSQL

The Interview Question

"Design a ride-sharing application. Riders request a trip, drivers nearby are matched, and the system handles the full trip lifecycle from request to payment."

This question is one of the most complex in the interview canon. It combines real-time geospatial data, state machines, matching algorithms, surge pricing, and distributed coordination — all at massive scale.


Step 1: Requirements

Functional

  • Rider opens app and sees nearby drivers on a map
  • Rider requests a trip — system matches them to the best driver
  • Driver accepts or declines; if they decline, the request moves to the next driver
  • Both parties see each other's live location during the trip
  • Trip has full lifecycle: requested → matched → en_route → arrived → in_trip → completed
  • Payment processed at trip end

Non-functional

  • 5 million trips per day (~58 per second)
  • 3 million active drivers globally at peak
  • Driver location updates every 4 seconds — 750,000 location writes/second
  • Match a rider to a driver in under 3 seconds
  • Location data accurate to within 50 metres

Step 2: The Geospatial Problem — Finding Nearby Drivers

This is the heart of the question. How do you efficiently find all drivers within 2km of a given point when you have 3 million drivers moving every 4 seconds?

Geohash

Geohash divides the world into a grid of cells, each represented by a short string. The key property: cells with similar prefixes are geographically nearby.

Precision 6 geohash: each cell ≈ 1.2km × 0.6km
Precision 7 geohash: each cell ≈ 150m × 150m

Rider at position P → geohash = "dp3wj6"
Find nearby drivers → query for all drivers with geohash prefix "dp3wj",
                       plus the 8 neighbouring cells (a rider near a cell
                       boundary can have close drivers under a different prefix)

How it works in practice:

Driver location update:
  Compute geohash of (lat, lng) → "dp3wj6t"
  Redis GEOADD drivers_geo {lng} {lat} {driver_id}

Rider requests trip at (lat=40.71, lng=-74.00):
  Redis GEORADIUS drivers_geo -74.00 40.71 2 km ASC COUNT 10
  → returns 10 nearest drivers sorted by distance

Redis has native geospatial commands built on geohash. GEORADIUS (superseded by GEOSEARCH in Redis 6.2, but equivalent for this purpose) returns the nearest N drivers within a radius in well under a millisecond, even with millions of drivers indexed.
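
A minimal sketch of both operations using the redis-py client (the key name drivers_geo and the 2 km / top-10 query come from the example above; redis-py 4.x or later is assumed):

```python
import redis

r = redis.Redis(host="localhost", port=6379)

def update_driver_location(driver_id: str, lat: float, lng: float) -> None:
    # GEOADD stores the position under the driver's ID; a repeated call
    # simply overwrites the previous coordinates for that member.
    r.geoadd("drivers_geo", (lng, lat, driver_id))

def find_nearby_drivers(lat: float, lng: float, radius_km: float = 2, limit: int = 10):
    # GEORADIUS (GEOSEARCH on Redis >= 6.2) returns the closest drivers
    # within the radius, nearest first, with their distances.
    return r.georadius(
        "drivers_geo", lng, lat, radius_km,
        unit="km", withdist=True, count=limit, sort="ASC",
    )

# Rider at (40.71, -74.00) requesting the 10 nearest drivers:
# print(find_nearby_drivers(40.71, -74.00))
```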


Step 3: Architecture

                           ┌───────────────┐
                           │ Rider App     │
                           └──────┬────────┘
                                  │ request trip
                    ┌─────────────▼────────────┐
                    │      API Gateway          │
                    └──┬──────────┬────────────┘
                       │          │
          ┌────────────▼──┐  ┌────▼────────────────┐
          │ Location      │  │   Trip Service       │
          │ Service       │  │ (state machine)      │
          │               │  └────────┬─────────────┘
          │ ← driver      │           │
          │   updates     │    ┌──────▼──────┐
          │   every 4s    │    │  Matching   │
          └───────┬───────┘    │  Service    │
                  │            └──────┬──────┘
          ┌───────▼───────┐           │ notify driver
          │  Redis Geo    │    ┌──────▼──────┐
          │  Index        │    │  Driver App │
          │  (live map)   │    └─────────────┘
          └───────────────┘
                  │ write history
          ┌───────▼───────┐
          │  Location DB  │
          │  (Cassandra)  │
          └───────────────┘

Step 4: Driver Location Updates — The Write Volume Problem

3 million drivers × 1 update every 4 seconds = 750,000 writes/second.

This cannot go directly to a relational database. Two separate concerns:

1. Live positions (for matching): Redis geospatial index. GEOADD updates a driver's position in microseconds. The entire live position dataset fits in memory — 3M drivers × ~100 bytes = 300MB.

2. Historical positions (for dispute resolution, analytics): Write to Cassandra. Partition by (driver_id, date), cluster by timestamp. Append-only, no updates, scales horizontally.

On every driver location update:
  → Update Redis geospatial index (synchronous, fast path)
  → Enqueue to Kafka (async, durable path)
  → Kafka consumer writes to Cassandra (background, doesn't block driver app)

If Redis loses its data (restart/failure), the live index is rebuilt by replaying recent location events from Kafka.
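
A sketch of the fast-path handler, assuming redis-py for the geo index and confluent-kafka for the durable queue; the topic name driver_locations is illustrative:

```python
import json
import time

import redis
from confluent_kafka import Producer

r = redis.Redis()
producer = Producer({"bootstrap.servers": "localhost:9092"})

def handle_location_update(driver_id: str, lat: float, lng: float) -> None:
    # Fast path: update the live geo index so matching sees the new position.
    r.geoadd("drivers_geo", (lng, lat, driver_id))

    # Durable path: enqueue the raw event; a background consumer writes it to
    # Cassandra, partitioned by (driver_id, date) and clustered by timestamp.
    event = {"driver_id": driver_id, "lat": lat, "lng": lng, "ts": time.time()}
    producer.produce("driver_locations", key=driver_id, value=json.dumps(event))
    producer.poll(0)  # serve delivery callbacks without blocking the driver app
```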


Step 5: Trip State Machine

Every trip progresses through states. Invalid transitions must be rejected — you cannot move from requested to completed without going through in_trip.

                   ┌──────────────┐
                   │  REQUESTED   │ ← rider submits trip request
                   └──────┬───────┘
                          │ driver matched
                   ┌──────▼───────┐
                   │   MATCHED    │ ← driver accepted
                   └──────┬───────┘
                          │ driver moving to rider
                   ┌──────▼───────┐
                   │  EN_ROUTE    │
                   └──────┬───────┘
                          │ driver at pickup location
                   ┌──────▼───────┐
                   │   ARRIVED    │
                   └──────┬───────┘
                          │ rider gets in, trip starts
                   ┌──────▼───────┐
                   │   IN_TRIP    │
                   └──────┬───────┘
                          │ destination reached
                   ┌──────▼───────┐
                   │  COMPLETED   │ → trigger payment
                   └──────────────┘

CANCELLED is reachable from: REQUESTED, MATCHED, EN_ROUTE, ARRIVED

This state machine lives in the Trip Service. The trip row in PostgreSQL stores current_state. Every state transition is an atomic DB update with an optimistic lock check: "only update to state X if current state is Y." If two events try to transition simultaneously, one wins and the other retries or fails gracefully.
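
A sketch of that transition guard, assuming a trips table with a current_state column and psycopg for PostgreSQL access (table and column names are illustrative):

```python
import psycopg

# Transitions allowed by the state machine above; anything else is rejected.
ALLOWED = {
    "REQUESTED": {"MATCHED", "CANCELLED"},
    "MATCHED": {"EN_ROUTE", "CANCELLED"},
    "EN_ROUTE": {"ARRIVED", "CANCELLED"},
    "ARRIVED": {"IN_TRIP", "CANCELLED"},
    "IN_TRIP": {"COMPLETED"},
}

def transition(conn: psycopg.Connection, trip_id: str, from_state: str, to_state: str) -> bool:
    if to_state not in ALLOWED.get(from_state, set()):
        raise ValueError(f"illegal transition {from_state} -> {to_state}")

    # Optimistic lock: the UPDATE succeeds only if the row is still in
    # from_state. A concurrent transition leaves rowcount at 0 and we
    # report failure so the caller can retry or give up gracefully.
    with conn.cursor() as cur:
        cur.execute(
            "UPDATE trips SET current_state = %s WHERE id = %s AND current_state = %s",
            (to_state, trip_id, from_state),
        )
        conn.commit()
        return cur.rowcount == 1
```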


Step 6: Matching Algorithm

The matching service has a deceptively hard sub-problem: which driver should be matched to this rider?

Naive: pick the nearest driver.

Reality: nearest driver might be stuck in traffic, might be heading away from the rider's destination, might decline frequently (lowering platform reliability).

Production factors:

Score = f(
  distance_to_rider,
  estimated_time_to_rider (ETA, not just distance — accounts for traffic),
  driver_acceptance_rate,
  driver_rating,
  direction_alignment (is the driver heading toward the rider's destination?)
)

For an interview, describe the scoring function rather than the specific weights. The important point: it's not just nearest-driver; it's a multi-variable optimisation problem.
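
As an illustration only, one way such a score could be combined; the weights below are made up and would in practice be tuned, often by a model trained on historical match outcomes:

```python
def driver_score(distance_km: float, eta_min: float, acceptance_rate: float,
                 rating: float, direction_alignment: float) -> float:
    # Lower is better for ETA and distance, higher is better for the rest,
    # so the cost terms are negated. All weights here are illustrative.
    return (
        -1.0 * eta_min               # traffic-aware time to reach the rider
        - 0.5 * distance_km          # raw distance as a tie-breaker
        + 2.0 * acceptance_rate      # 0..1, how often this driver accepts
        + 0.5 * rating               # 1..5 stars
        + 1.0 * direction_alignment  # 1 if heading toward the destination, -1 if away
    )
```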

Timeout and retry:

Match attempt:
  Find top 3 drivers by score
  Send match request to driver #1, wait 10 seconds
  → Accept: create trip
  → Decline or timeout: try driver #2
  → Repeat for driver #3
  → No match: notify rider, retry search with wider radius
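
A sketch of that offer loop; notify_and_wait and create_trip are hypothetical helpers standing in for the push-notification round trip and the trip-creation call:

```python
from typing import Optional, Sequence

OFFER_TIMEOUT_SECONDS = 10

def match_rider(rider_id: str, candidates: Sequence[str],
                notify_and_wait, create_trip) -> Optional[str]:
    # candidates: top-3 driver IDs, already sorted by score (best first).
    for driver_id in candidates[:3]:
        # Hypothetical helper: pushes the offer to the driver app and returns
        # "accept", "decline", or None on timeout.
        response = notify_and_wait(driver_id, rider_id, timeout=OFFER_TIMEOUT_SECONDS)
        if response == "accept":
            create_trip(rider_id, driver_id)  # transitions REQUESTED -> MATCHED
            return driver_id
    # No driver accepted: the caller widens the search radius and retries.
    return None
```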

Step 7: Surge Pricing — Why It's an Engineering Problem

Surge pricing isn't just a business decision — it's an infrastructure component that prevents cascading failures.

When demand spikes (e.g., rain, concerts, airport rush), without surge:

  • All riders request trips simultaneously
  • Matching service is overwhelmed
  • No drivers are available
  • Every request fails or times out

With surge:

  • Higher price → some riders wait or cancel (demand reduction)
  • Higher earnings → more drivers go online (supply increase)
  • System stabilises

Implementation:

Every 60 seconds:
  Count open ride requests per geohash zone (demand)
  Count available drivers per zone (supply)
  Surge multiplier = f(demand / supply) — published to a config cache
  
Client reads surge multiplier from cache before showing price
Trip price = base_fare × surge_multiplier

The surge multiplier is computed by a background job and cached — it does not need real-time precision.
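
A sketch of that background job, assuming per-zone demand and supply counters already exist in Redis under the illustrative keys demand:{zone} and supply:{zone}; the multiplier curve and cap are made up:

```python
import redis

r = redis.Redis()

def compute_surge(zone: str) -> float:
    # Demand: open ride requests in the zone; supply: available drivers.
    # Both counters are assumed to be maintained elsewhere.
    demand = int(r.get(f"demand:{zone}") or 0)
    supply = int(r.get(f"supply:{zone}") or 0)
    ratio = demand / max(supply, 1)
    # Illustrative curve: 1.0x until demand exceeds supply, capped at 3.0x.
    return min(max(ratio, 1.0), 3.0)

def surge_job(zones: list[str]) -> None:
    # Runs every 60 seconds; clients read the cached multiplier before quoting.
    for zone in zones:
        r.set(f"surge:{zone}", compute_surge(zone), ex=120)  # TTL expires stale values
```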


Step 8: Database Choice Summary

| Data | Database | Why |
|------|----------|-----|
| Live driver positions | Redis Geospatial | In-memory, sub-ms geo queries |
| Trip state / metadata | PostgreSQL | ACID, state machine transitions, payment records |
| Historical locations | Cassandra | 750K writes/sec, append-only, time-series |
| Driver/rider profiles | PostgreSQL | Relational, low write volume |
| Surge pricing config | Redis | Config cache, read-heavy, fast TTL refresh |


What the Interviewer Is Actually Testing

  • Do you identify the geospatial index problem and propose Redis geohash or similar?
  • Do you separate live positions (Redis) from historical positions (Cassandra)?
  • Do you design a state machine for the trip lifecycle?
  • Can you explain matching beyond "find the nearest driver"?
  • Do you handle the 750K writes/second location update problem explicitly?
  • Do you understand that surge pricing is both a business and an engineering feature?

Explore Courses