system-design · intermediate · 14 min read

System Design Interview

Design a Food Delivery App (Uber Eats / DoorDash)

Three moving pieces: customer ordering, restaurant accepting, driver pickup — how do you coordinate them?

Key outcome: Dispatch driver in <90 seconds
System Design · State Machine · Geospatial · WebSockets · PostgreSQL · Dispatch

The Interview Question

"Design a food delivery platform. Customers can browse restaurants, place orders, and track their delivery in real time. Drivers receive order requests and navigate to the restaurant, then to the customer. The platform must handle peak dinner-hour traffic across thousands of cities."

This question combines a marketplace (connecting customers, restaurants, and drivers), real-time location tracking, and state machine management. The interesting problems are in driver-order matching, live location broadcasting, and handling the burst load at 6pm.


Step 1: Requirements

Functional

  • Customers browse restaurants and menus (filtered by location, cuisine, rating)
  • Customers place orders and track delivery in real time
  • Restaurants receive orders and manage preparation status
  • Drivers receive nearby order requests, accept/reject, navigate to restaurant and customer
  • Real-time location tracking: customer sees driver position update every 4 seconds

Non-functional

  • 50 million daily active users across 10,000 cities
  • Peak: 5x normal load from 6pm to 8pm local time
  • Driver location update: < 5 seconds latency
  • Order placement to driver assignment: < 90 seconds
  • System must handle partial failures (restaurant offline, driver app crashes mid-delivery)

Step 2: Three Actors, Three Different Systems

The fundamental insight is that this isn't one system — it's three systems with a coordination layer between them.

┌─────────────────┐     ┌─────────────────────┐     ┌─────────────────┐
│   Customer App  │     │  Restaurant Tablet  │     │   Driver App    │
│                 │     │                     │     │                 │
│ - Browse menus  │     │ - Receive orders    │     │ - Receive req.  │
│ - Place order   │     │ - Update prep time  │     │ - Accept/reject │
│ - Track driver  │     │ - Mark ready        │     │ - Navigation    │
│ - Rate delivery │     │ - Manage menu       │     │ - Update loc.   │
└────────┬────────┘     └──────────┬──────────┘     └────────┬────────┘
         │                         │                         │
         └─────────────────────────┼─────────────────────────┘
                                   │
                          ┌────────▼────────┐
                          │  Order Service  │
                          │  (coordinator)  │
                          └─────────────────┘

Each actor has very different read/write patterns:

  • Customer: read-heavy (browse), occasional write (order)
  • Restaurant: write-heavy (status updates), pull-based (polling for new orders)
  • Driver: write-heavy (continuous location), push-based (new order notifications)

Step 3: Order State Machine

Every order follows a strict state progression. Modelling this as a state machine prevents invalid transitions and makes the system auditable.

                    [PLACED]
                       │
             Restaurant confirms
                       │
                  [CONFIRMED]
                       │
              Restaurant preparing
                       │
                  [PREPARING]
                       │
          Driver assigned and en route
                       │
               [DRIVER_ASSIGNED]
                       │
        Driver picks up order at restaurant
                       │
               [PICKED_UP]
                       │
          Driver hands order to customer
                       │
               [DELIVERED]
                       │
          Customer rates / closes
                       │
                [COMPLETED]

Side transitions (from any non-terminal state):
  → [CANCELLED]  (customer cancels, restaurant rejects, driver cancels)
  → [FAILED]     (payment failure, no driver available after timeout)
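The progression above can be enforced with an explicit transition table. A minimal sketch in Python, with state names taken from the diagram (the table itself is illustrative, not a prescribed implementation):

```python
# States with no outgoing transitions.
TERMINAL = {"COMPLETED", "CANCELLED", "FAILED"}

# Allowed forward transitions, mirroring the diagram above.
TRANSITIONS = {
    "PLACED": {"CONFIRMED"},
    "CONFIRMED": {"PREPARING"},
    "PREPARING": {"DRIVER_ASSIGNED"},
    "DRIVER_ASSIGNED": {"PICKED_UP"},
    "PICKED_UP": {"DELIVERED"},
    "DELIVERED": {"COMPLETED"},
}

def transition(current: str, target: str) -> str:
    """Return the new state, or raise if the transition is invalid."""
    # Side transitions: any non-terminal state may be cancelled or failed.
    if target in {"CANCELLED", "FAILED"} and current not in TERMINAL:
        return target
    if target in TRANSITIONS.get(current, set()):
        return target
    raise ValueError(f"invalid transition {current} -> {target}")
```

Because every call funnels through one function, this is also the natural place to emit the OrderStateChanged event described below.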

Why this matters in interviews: each state transition is an event. You can use Kafka to emit OrderStateChanged events, and different services subscribe — the customer app subscribes to show progress, the driver app subscribes to get pickup instructions, analytics subscribes to compute prep time metrics.

Order state transitions → Kafka topic: order.state.changed
  Payload: { order_id, from_state, to_state, timestamp, actor_id }

Consumers:
  notification.service  → notify customer "Your driver is on the way"
  analytics.service     → compute average prep time per restaurant
  driver.service        → release driver if order cancelled

Step 4: Driver Location — The Hard Part

50 million daily active users across 10,000 cities. During dinner rush in one city, you might have 10,000 active drivers all sending GPS updates every 4 seconds.

10,000 drivers per city × 0.25 updates/second = 2,500 writes/second per city
10,000 cities × 2,500 = 25,000,000 writes/second globally at peak

No single relational database can absorb 25M writes/second. The solution has two parts:

Write path: Redis geospatial

Driver app sends location update:
  POST /location  { driver_id, lat, lng, timestamp }

Location Service:
  GEOADD drivers:{city_id} longitude latitude driver_id
  SET driver_meta:{driver_id} { status, last_seen, vehicle_type } TTL 60s

TTL = 60 seconds: if a driver goes offline, their position auto-expires

Redis GEOADD is O(log N). 25M writes/second is handled by a Redis Cluster with city-level sharding — each shard owns a set of cities.
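The shard-routing piece is small enough to sketch directly. A hypothetical routing function in Python; the shard count and hashing scheme here are assumptions, not prescriptions:

```python
import hashlib

NUM_SHARDS = 64  # assumed Redis Cluster size

def shard_for_city(city_id: str) -> int:
    """Stable city -> shard mapping, so a driver's GEOADD always
    lands on the same shard regardless of which process routes it."""
    digest = hashlib.sha256(city_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def location_key(city_id: str) -> str:
    """Key pattern from the write path above: drivers:{city_id}."""
    return f"drivers:{city_id}"
```

The important property is stability: the mapping must not depend on process state or startup order, otherwise two services would route the same city to different shards.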

Query path: find nearby drivers

Customer places order at (lat=59.91, lng=10.75) in Oslo:

GEORADIUS drivers:oslo_city_id 10.75 59.91 5 km COUNT 20 ASC

Returns: [driver_789, driver_234, driver_102, ...]  ← 20 nearest drivers

Filter: only drivers with status=AVAILABLE
Dispatch: send order request to top 3 drivers simultaneously

Why not a SQL WHERE ST_Distance < 5km query? PostgreSQL PostGIS can handle this at moderate scale, but at 25M writes/second you'd need a write-optimised storage layer regardless. Redis geospatial is the standard interview answer here because one store handles both the write throughput and the radius query efficiently.
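To make the radius query concrete, here is a pure-Python stand-in for GEORADIUS using the haversine formula. This is illustrative only; in production the query stays inside Redis:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in km (Earth radius ~6371 km)."""
    dlat = radians(lat2 - lat1)
    dlng = radians(lng2 - lng1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlng / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

def nearby_drivers(drivers, lat, lng, radius_km=5.0, limit=20):
    """drivers: {driver_id: (lat, lng)}, a stand-in for the Redis geo set.
    Returns up to `limit` driver ids within radius, nearest first."""
    in_range = []
    for driver_id, (dlat, dlng) in drivers.items():
        dist = haversine_km(lat, lng, dlat, dlng)
        if dist <= radius_km:
            in_range.append((dist, driver_id))
    in_range.sort()
    return [driver_id for _, driver_id in in_range[:limit]]
```

Redis does the same thing with a geohash-indexed sorted set, which is what keeps the query O(N+log M) over a small candidate area instead of a scan over every driver.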


Step 5: Driver-Order Matching

When an order is placed, you don't just want the nearest driver — you want the best driver given pickup time, travel time to customer, and driver rating.

Matching Service receives new order:

1. Find candidate drivers within 5km radius (Redis GEORADIUS)
2. Score each candidate:
   score = w1 × (1 / eta_to_restaurant)    ← faster to pickup = better
         + w2 × (1 / eta_to_customer)       ← faster delivery = better
         + w3 × driver_acceptance_rate      ← reliable drivers preferred
         + w4 × driver_rating               ← quality signal

3. Send request to top 3 drivers simultaneously (parallel dispatch)
4. First driver to accept gets the order

Timeout logic:
  15 seconds: if no acceptance, expand radius to 8km, dispatch next 3
  30 seconds: expand to 12km
  60 seconds: mark order as FAILED, notify customer
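The scoring step reduces to a plain weighted function. A sketch in Python; the weights and field names below are placeholders, not tuned values:

```python
def score(candidate, weights=(0.4, 0.3, 0.2, 0.1)):
    """Higher is better. candidate carries ETAs in minutes,
    acceptance_rate in [0, 1], and rating in [0, 5]."""
    w1, w2, w3, w4 = weights
    return (w1 / candidate["eta_to_restaurant_min"]   # faster pickup = better
            + w2 / candidate["eta_to_customer_min"]   # faster delivery = better
            + w3 * candidate["acceptance_rate"]       # reliability signal
            + w4 * candidate["rating"])               # quality signal

def top_candidates(candidates, n=3):
    """The n drivers to dispatch to in parallel."""
    return sorted(candidates, key=score, reverse=True)[:n]
```

In practice the weights would be tuned per market; the structure (a linear combination over a small candidate set from GEORADIUS) is what matters in the interview.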

Why send to 3 drivers simultaneously rather than one at a time? Sequential dispatch means the first driver's 15-second timeout must expire before you try the second driver; in the worst case the third driver is only contacted after 30 seconds of silence. Parallel dispatch collapses those windows into one, finding an available driver up to 3x faster.

Race condition: two drivers accept simultaneously. Use Redis atomic operations:

SETNX order_lock:{order_id} {driver_id}  ← Set if not exists
If returns 1: this driver wins the order
If returns 0: another driver was faster → send rejection to this driver
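The win-or-reject logic is a few lines of code. Below is an in-memory stand-in for SETNX; a real deployment issues the Redis command, and this only illustrates the first-writer-wins semantics:

```python
import threading

class OrderLocks:
    """In-memory stand-in for Redis SETNX: the first claim on an
    order_id wins atomically, every later claim is rejected."""

    def __init__(self):
        self._locks = {}
        self._mutex = threading.Lock()

    def try_claim(self, order_id: str, driver_id: str) -> bool:
        with self._mutex:
            if order_id in self._locks:        # SETNX would return 0
                return False
            self._locks[order_id] = driver_id  # SETNX would return 1
            return True

    def winner(self, order_id: str):
        return self._locks.get(order_id)
```

With real Redis you would also attach a TTL to the lock so a crashed matching service cannot leave an order claimed forever.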

Step 6: Architecture

┌──────────────────────────────────────────────────────────────────┐
│  Customer App                  Driver App           Restaurant   │
└────────┬───────────────────────────┬──────────────────┬──────────┘
         │                           │                  │
┌────────▼───────────────────────────▼──────────────────▼──────────┐
│                         API Gateway                               │
│              (Auth, rate limiting, routing)                       │
└────┬────────────────┬────────────────┬────────────────┬───────────┘
     │                │                │                │
┌────▼────┐    ┌──────▼──────┐  ┌──────▼──────┐  ┌──────▼──────┐
│ Order   │    │  Location   │  │  Matching   │  │ Restaurant  │
│ Service │    │  Service    │  │  Service    │  │  Service    │
└────┬────┘    └──────┬──────┘  └─────────────┘  └─────────────┘
     │                │
     │         ┌──────▼──────┐
     │         │    Redis    │
     │         │  Geospatial │
     │         │  (location) │
     │         └─────────────┘
     │
┌────▼────────────────────────────────────────────────────────────┐
│                    Kafka (event bus)                             │
│  order.placed  order.state.changed  driver.location.updated     │
└────┬─────────────────────────────────────────────────────────────┘
     │
┌────▼────────────────────────────────────────────────────────────┐
│                PostgreSQL (orders, users, restaurants)           │
│                Elasticsearch (restaurant search)                 │
│                Redis (session, rate limits, deduplication)       │
└─────────────────────────────────────────────────────────────────┘

Step 7: Restaurant Search and Discovery

Customers search for restaurants near them. This is a two-step query: geospatial filter, then full-text search.

Customer searches "sushi" at location (lat, lng), radius 3km:

Step 1 — Geospatial filter (Redis or PostgreSQL PostGIS):
  Find all restaurant_ids within 3km

Step 2 — Full-text relevance search (Elasticsearch):
  GET /restaurants/_search
  {
    "query": {
      "bool": {
        "must": { "match": { "cuisine_tags": "sushi" } },
        "filter": { "ids": { "values": [restaurant_id_list] } }
      }
    },
    "sort": [{ "rating": "desc" }, { "estimated_delivery_time": "asc" }]
  }

Restaurant data synced to Elasticsearch asynchronously after any update (menu change, rating recalculation, hours change).

Why Elasticsearch and not just SQL LIKE '%sushi%'? Full-text search with relevance scoring, fuzzy matching ("sushii"), and filtering across multiple fields (cuisine, dietary, price range) is exactly what Elasticsearch is built for. SQL LIKE with a leading wildcard cannot use a B-tree index, so every search degrades into a full table scan.
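The two-step flow reduces to building that query from the geo-filtered id list. A small helper, assuming the document shape shown above:

```python
def restaurant_search_query(term, restaurant_ids):
    """Build the Elasticsearch bool query from Step 7:
    full-text match on cuisine_tags, restricted to the ids
    that passed the geospatial filter in step 1."""
    return {
        "query": {
            "bool": {
                "must": {"match": {"cuisine_tags": term}},
                "filter": {"ids": {"values": restaurant_ids}},
            }
        },
        "sort": [
            {"rating": "desc"},
            {"estimated_delivery_time": "asc"},
        ],
    }
```

Keeping the geo filter as an ids filter (rather than a geo_distance query inside Elasticsearch) is a design choice: it lets the geospatial store stay authoritative for restaurant locations while Elasticsearch only handles relevance.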


Step 8: Peak Traffic — The 6pm Problem

A food delivery platform is not uniformly loaded. In any given city, 70% of daily orders arrive between 5pm and 9pm.

Normal traffic (2pm):
  5,000 orders/hour per city
  1,400 writes/second (location + orders)

Peak traffic (6:30pm):
  25,000 orders/hour per city (5x)
  7,000 writes/second

At 10,000 cities: 70M writes/second globally at peak

Solutions:

  1. Auto-scaling for stateless services (Order, Matching, Notification): horizontal scaling via Kubernetes HPA triggers when CPU > 60%. These are stateless — spin up 5x more pods.

  2. Redis Cluster city sharding: location data is already sharded by city. Peak in Oslo doesn't affect Redis shards for London.

  3. Read replicas for restaurant catalog: customers browsing menus are read-only. Route all catalog reads to PostgreSQL read replicas. Write only when menu changes (rare during dinner rush).

  4. Throttle non-critical writes: during peak, stop flushing analytics events to the data warehouse in real time. Buffer in Kafka and process overnight.

  5. Circuit breakers on payment: if the payment provider is slow, don't block the order pipeline. Accept the order optimistically and process payment asynchronously: flag the order as PAYMENT_PENDING and cancel it if the payment has not cleared within 30 seconds.
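The circuit breaker in point 5 can be sketched as a small class. The thresholds here are illustrative defaults, not recommendations:

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: opens after `threshold` consecutive
    failures, allows a retry (half-open) after `reset_after` seconds."""

    def __init__(self, threshold=5, reset_after=30.0):
        self.threshold = threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow(self) -> bool:
        """May we call the payment provider right now?"""
        if self.opened_at is None:
            return True
        # Open circuit: only allow once the cool-down has elapsed.
        return time.monotonic() - self.opened_at >= self.reset_after

    def record(self, success: bool):
        """Report the outcome of a payment call."""
        if success:
            self.failures = 0
            self.opened_at = None
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()
```

While the circuit is open, orders flow into PAYMENT_PENDING instead of waiting on the provider, and a background worker settles them when the circuit closes again.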


Step 9: Database Schema

ORDERS
  id               UUID  PRIMARY KEY
  customer_id      UUID
  restaurant_id    UUID
  driver_id        UUID  (nullable until assigned)
  status           ENUM  (placed, confirmed, preparing, driver_assigned, picked_up, delivered, completed, cancelled, failed)
  total_amount     DECIMAL(10,2)
  delivery_address JSONB
  placed_at        TIMESTAMPTZ
  estimated_at     TIMESTAMPTZ
  delivered_at     TIMESTAMPTZ

ORDER_ITEMS
  order_id         UUID  REFERENCES orders(id)
  menu_item_id     UUID
  quantity         INT
  unit_price       DECIMAL(10,2)

RESTAURANTS
  id               UUID  PRIMARY KEY
  name             TEXT
  cuisine_tags     TEXT[]
  location         POINT  (PostGIS)
  rating           DECIMAL(3,2)
  is_open          BOOLEAN
  avg_prep_minutes INT

DRIVERS
  id               UUID  PRIMARY KEY
  name             TEXT
  phone            TEXT
  status           ENUM  (offline, available, assigned, delivering)
  vehicle_type     ENUM  (bicycle, moped, car)
  rating           DECIMAL(3,2)
  city_id          UUID

What the Interviewer Is Actually Testing

  • Do you model the three-actor system (customer / restaurant / driver) as separate domains with different patterns?
  • Can you explain why Redis geospatial handles 25M location writes/second better than SQL?
  • Do you model the order state machine and use events (Kafka) to coordinate state changes across services?
  • Do you handle the parallel driver dispatch to minimize matching latency?
  • Do you use SETNX for atomic driver assignment to prevent double-booking?
  • Do you address the 6pm peak traffic problem with concrete scaling strategies (sharding, read replicas, async writes)?
  • Do you use Elasticsearch for restaurant discovery and explain why SQL LIKE falls short?
