Backend Systems · Intermediate

API Design Interview Questions: REST, GraphQL, gRPC, WebSocket — Real Answers with Real Examples

10 senior-level API design interview questions with detailed answers. REST vs GraphQL vs gRPC vs WebSocket — when to use each, how to design versioning, handle rate limiting, secure endpoints, and make the right architectural call. Real examples from Stripe, GitHub, Uber, Netflix, and Slack.

SystemForge · April 21, 2026 · 24 min read
API Design · REST · GraphQL · gRPC · WebSocket · Interview Prep · System Design · Rate Limiting · Versioning · Authentication

How to Answer API Design Questions in an Interview

API design questions are not trick questions. Interviewers want to see three things:

  1. You know the tradeoffs — not just what each style is, but when each one fails
  2. You anchor decisions to real constraints — "we chose X because the mobile team needed Y"
  3. You've made mistakes — the best answers include "we tried X, it broke because Y, we switched to Z"

Every answer below follows this structure: the short answer → the real-world example → the tradeoff the interviewer wants to hear.


Question 1: "When would you choose GraphQL over REST, and when would you stick with REST?"

What They're Testing

Whether you understand that GraphQL is not a replacement for REST — it is a solution to a specific problem. Candidates who say "GraphQL is better" fail this question.

The Answer

REST gives you fixed endpoints that return fixed shapes. This is simple, cacheable, and universally understood. It becomes painful when:

  • Multiple clients (mobile, web, partner integrations) need different shapes from the same data
  • A single screen requires data from three or four REST calls
  • The frontend team is blocked waiting for backend to add a new field to a response

GraphQL solves the shape mismatch problem. The client describes exactly what it needs in one query, and the server returns only that.

Real example — GitHub API v4:

GitHub's REST API (v3) required multiple calls to render a pull request page:

GET /repos/{owner}/{repo}/pulls/{number}         → PR details
GET /repos/{owner}/{repo}/pulls/{number}/reviews  → review list
GET /repos/{owner}/{repo}/pulls/{number}/files    → changed files
GET /repos/{owner}/{repo}/statuses/{sha}          → CI status

Four requests, significant over-fetching on each. The GitHub CLI team moved to GraphQL v4 so they could get everything in one query and request only the fields they actually rendered:

GRAPHQL
query GetPullRequest($owner: String!, $repo: String!, $number: Int!) {
  repository(owner: $owner, name: $repo) {
    pullRequest(number: $number) {
      title
      state
      author { login }
      reviews(last: 5) {
        nodes { author { login } state }
      }
      files(first: 20) {
        nodes { path additions deletions }
      }
      commits(last: 1) {
        nodes {
          commit {
            statusCheckRollup { state }
          }
        }
      }
    }
  }
}

One request. Only the fields the CLI renders.

When to stick with REST:

  • Simple CRUD with predictable shapes — REST is faster to build and easier to cache
  • Public APIs used by third-party developers — REST is universally understood, GraphQL has a learning curve
  • File uploads — GraphQL handles binary poorly
  • When HTTP caching matters — GraphQL queries typically go over POST, so standard HTTP GET caching doesn't apply

The tradeoff to mention: GraphQL moves complexity to the server side. You need schema design, resolvers, DataLoader for N+1 prevention, and query depth/complexity limits to prevent expensive queries. For a team of two building an internal tool, REST wins. For a mobile app team tired of waiting for new endpoints, GraphQL wins.
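
As an illustration of one of those server-side guards, here is a minimal query depth limit, assuming the graphql-core library; the function names and the limit of 8 are illustrative, not a standard:

Python
from graphql import parse, OperationDefinitionNode

def max_depth(node, depth: int = 0) -> int:
    # Each nested selection set adds one level of depth
    selection_set = getattr(node, "selection_set", None)
    if not selection_set:
        return depth
    return max(max_depth(sel, depth + 1) for sel in selection_set.selections)

def enforce_depth_limit(query: str, limit: int = 8) -> None:
    # Reject deeply nested queries before executing any resolvers
    for definition in parse(query).definitions:
        if isinstance(definition, OperationDefinitionNode) and max_depth(definition) > limit:
            raise ValueError(f"query exceeds maximum depth of {limit}")

In production you would also cap query complexity (fields × list sizes), not just depth, since a shallow query over large collections can be just as expensive.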


Question 2: "How would you version a REST API without breaking existing clients?"

What They're Testing

Whether you've worked with an API that evolved over time and had to maintain backward compatibility.

The Answer

There are three common versioning strategies, each with real tradeoffs:

1. URL path versioning (most common):

/api/v1/appointments
/api/v2/appointments

Explicit, easy to test in a browser, easy to route. Downside: clients must update URLs to get new features. Used by: Twitter, Uber, most public APIs.

2. Header versioning (cleanest URLs):

HTTP
GET /appointments
Accept: application/vnd.mybcat.v2+json

The URL stays stable, versioning is in the negotiation headers. Downside: harder to test (can't just hit the URL in a browser), requires discipline from clients. Used by: GitHub (partially).
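
A minimal sketch of routing by the Accept header, assuming FastAPI; get_appointment_v1 and get_appointment_v2 are hypothetical versioned handlers:

Python
from fastapi import FastAPI, Header

app = FastAPI()

@app.get("/appointments/{appointment_id}")
async def get_appointment(appointment_id: str, accept: str = Header(default="")):
    # One stable URL; the Accept header selects the response shape
    if "vnd.mybcat.v2+json" in accept:
        return await get_appointment_v2(appointment_id)  # hypothetical v2 handler
    return await get_appointment_v1(appointment_id)      # hypothetical v1 handler (default)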

3. Date-based versioning (Stripe's approach — best for stability):

HTTP
GET /payment-intents
Stripe-Version: 2024-04-10

Real example — Stripe:

Stripe's approach is the most developer-friendly in the industry. Every API key has a default version set to the version that existed when the key was created. Breaking changes only apply if you opt in to a newer version. Old versions are supported for years.

When Stripe introduced a breaking change to the PaymentIntent object shape, existing integrations saw nothing change. Only clients that set the new Stripe-Version header got the new shape. This let Stripe evolve the API without breaking the thousands of businesses that built on it.

How to handle a breaking change in practice:

C#
// v1 response shape
public record AppointmentV1(string Id, string Date, string ProviderName);

// v2 response shape — ProviderName split into Provider object
public record AppointmentV2(string Id, string Date, ProviderDto Provider);

[ApiController]
[Route("api/v1/appointments")]
public class AppointmentsV1Controller : ControllerBase
{
    [HttpGet("{id}")]
    public async Task<AppointmentV1> Get(string id)
    {
        var apt = await _service.GetAsync(id);
        return new AppointmentV1(apt.Id, apt.Date, apt.Provider.Name);
    }
}

[ApiController]
[Route("api/v2/appointments")]
public class AppointmentsV2Controller : ControllerBase
{
    [HttpGet("{id}")]
    public async Task<AppointmentV2> Get(string id)
    {
        var apt = await _service.GetAsync(id);
        return new AppointmentV2(apt.Id, apt.Date, new ProviderDto(apt.Provider));
    }
}
// Shared service layer — versioning is only in the API layer, not the domain

The tradeoff to mention:

Maintaining multiple versions has a cost: you are running two codepaths. The general rule: support at most two major versions simultaneously, give clients a sunset date (at least six months' notice), and use deprecation headers so they know the clock is ticking:

HTTP
HTTP/1.1 200 OK
Deprecation: true
Sunset: Thu, 31 Dec 2026 23:59:59 GMT
Link: <https://docs.mybcat.com/migration/v1-to-v2>; rel="deprecation"

Question 3: "How would you design rate limiting for a public API?"

What They're Testing

Whether you understand that rate limiting is not just a number — it involves identity, fairness, communication, and graceful degradation.

The Answer

Rate limiting has three layers. Missing any one of them creates problems.

Layer 1 — Identify who is calling:

Rate limit by API key, not by IP. IP-based limiting breaks shared offices and mobile networks (hundreds of users behind one IP). Every API call carries a key in the header:

HTTP
GET /appointments
Authorization: Bearer <api_key>
X-Practice-ID: prc_123

Layer 2 — Apply the limit with the right algorithm:

The token bucket algorithm is the industry standard: each API key gets a bucket of tokens, each request consumes one token, and tokens refill at a fixed rate, so bursts are allowed as long as tokens remain. A sliding-window log is a common alternative that is simple to build on Redis sorted sets — the sketch below takes that approach:

Python
import redis
import time

def check_rate_limit(api_key: str, limit: int = 1000, window_seconds: int = 3600) -> dict:
    r = redis.Redis(host='localhost', port=6379)
    key = f"rate_limit:{api_key}"
    now = time.time()
    window_start = now - window_seconds

    pipe = r.pipeline()
    # Remove requests outside the window
    pipe.zremrangebyscore(key, 0, window_start)
    # Count remaining requests in window
    pipe.zcard(key)
    # Add this request
    pipe.zadd(key, {str(now): now})
    # Set expiry
    pipe.expire(key, window_seconds)
    results = pipe.execute()

    request_count = results[1]

    if request_count >= limit:
        # Don't count the rejected request against the window
        r.zrem(key, str(now))
        oldest = r.zrange(key, 0, 0, withscores=True)
        retry_after = int(oldest[0][1] + window_seconds - now) if oldest else window_seconds
        return {
            'allowed': False,
            'retry_after': retry_after,
            'limit': limit,
            'remaining': 0
        }

    return {
        'allowed': True,
        'limit': limit,
        'remaining': limit - request_count - 1,
        'reset': int(now + window_seconds)
    }
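
For comparison, a token bucket itself fits in a few lines. This is a minimal in-process sketch; a production limiter would keep the bucket state in Redis keyed by API key so all instances share it:

Python
import time

class TokenBucket:
    """Bucket of `capacity` tokens, refilled at `refill_rate` tokens per second."""
    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True  # burst allowed while tokens remain
        return False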

Layer 3 — Communicate the limit clearly:

HTTP
HTTP/1.1 200 OK
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 247
X-RateLimit-Reset: 1777629600

When rate limited:

HTTP
HTTP/1.1 429 Too Many Requests
Retry-After: 3600
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1777629600
Content-Type: application/json

{
  "error": "rate_limit_exceeded",
  "message": "1000 requests per hour. Resets at 2026-05-01T10:00:00Z.",
  "docs": "https://docs.mybcat.com/rate-limits"
}

Real example — GitHub API:

GitHub uses tiered rate limits: unauthenticated calls get 60/hour, authenticated get 5,000/hour, GitHub Apps get 15,000/hour. The X-RateLimit-* headers are on every response. GitHub's response when you hit the limit includes the exact Unix timestamp when your quota resets — clients can sleep until exactly that moment rather than retrying blindly.

Real example — Slack:

Slack's API has per-method rate limits. The chat.postMessage method is Tier 3 (50 requests per minute); channels.list is Tier 2 (20 requests per minute). Slack returns Retry-After: 30 in the 429 response, telling you exactly how long to wait. This design lets bots calculate whether they can finish a batch job before throttling kicks in.
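
On the client side, honouring Retry-After is straightforward. A hedged sketch using the requests library (the URL, payload, and retry count are illustrative):

Python
import time
import requests

def post_with_rate_limit(url: str, payload: dict, max_attempts: int = 5):
    for attempt in range(max_attempts):
        response = requests.post(url, json=payload, timeout=10)
        if response.status_code != 429:
            return response
        # Sleep exactly as long as the server asked, falling back to exponential backoff
        retry_after = int(response.headers.get("Retry-After", 2 ** attempt))
        time.sleep(retry_after)
    raise RuntimeError("still rate limited after retries")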


Question 4: "How does gRPC differ from REST, and when would you choose it?"

What They're Testing

Whether you've used both in production and can explain the performance difference without just saying "gRPC is faster."

The Answer

REST and gRPC both run over HTTP but solve different problems.

The fundamental difference:

REST serialises data as JSON — human-readable text. gRPC serialises with Protocol Buffers — a compact binary format. A {"appointment_id": "apt_123", "status": "confirmed"} JSON string is ~50 bytes. The equivalent Protobuf binary is ~10 bytes. At 10 million calls per day, that difference is significant in both bandwidth cost and parsing CPU.

HTTP/1.1 (used by most REST APIs) handles one request at a time per connection. HTTP/2 (used by gRPC) multiplexes many concurrent requests over a single connection. For microservices making thousands of internal calls per second, this matters.

The contract difference:

REST has no enforced contract between client and server. A field can be renamed or removed and nothing warns you until production breaks. gRPC's .proto file generates clients in any language. If you rename a field in the proto, every generated client fails to compile. Breaking changes fail at compile time, not runtime.

PROTOBUF
syntax = "proto3";

// This proto generates clients in Go, Python, Java, .NET, TypeScript
service AppointmentService {
  rpc BookAppointment(BookRequest) returns (BookResponse);

  // Server streaming — not possible in standard REST
  rpc StreamStatusUpdates(WatchRequest) returns (stream StatusEvent);
}
// Illustrative message shapes
message BookRequest   { string slot_id = 1; }
message BookResponse  { string appointment_id = 1; }
message WatchRequest  { string appointment_id = 1; }
message StatusEvent   { string status = 1; }

Real example — Netflix:

Netflix's internal microservices handle hundreds of billions of internal calls per day. Their content metadata service — which serves show titles, descriptions, artwork, and episode lists — originally used REST/JSON. Moving internal calls to gRPC reduced payload size by ~60% and serialisation CPU time by ~70%. At Netflix's scale that translates to fewer servers, lower cost.

Real example — Google:

gRPC was built by Google, whose internal predecessor, Stubby, handled trillions of RPCs per day connecting services inside Google. gRPC is the open-source successor to Stubby.

When to choose gRPC:

  • Service-to-service communication where you control both ends
  • High-frequency internal calls (payment processing, real-time data pipelines)
  • Polyglot microservices — one proto file generates clients in Go, Python, Java, .NET
  • You need streaming (live data feeds, file transfer, bidirectional communication)

When REST wins:

  • Public APIs — browsers can't call gRPC natively
  • Third-party integrations — REST is universally understood
  • Simple CRUD services where the performance difference doesn't matter
  • When human-readable requests help with debugging

Question 5: "How would you secure a REST API — walk me through every layer?"

What They're Testing

Whether your security thinking is layered — network, authentication, authorisation, input validation, output — or whether you only think about "add a JWT."

The Answer

Security is not one thing. It is five independent layers that each stop a different attack.

Layer 1 — Transport (TLS):

Every API endpoint must be HTTPS. No exceptions. A plain HTTP endpoint exposes tokens in transit to anyone on the network path. Enforce it at the infrastructure level:

Python
# FastAPI — redirect HTTP to HTTPS
from fastapi import FastAPI
from fastapi.middleware.httpsredirect import HTTPSRedirectMiddleware

app = FastAPI()
app.add_middleware(HTTPSRedirectMiddleware)
HCL
# AWS — SQS queue policy denies non-TLS requests
Condition = { Bool = { "aws:SecureTransport" = "false" } }
Effect    = "Deny"

Layer 2 — Authentication (who are you?):

JWT is the standard for stateless APIs. The token is issued at login, sent on every request in the Authorization: Bearer <token> header, and verified by the API without a database lookup.

C#
// .NET — validate JWT on every request
builder.Services.AddAuthentication(JwtBearerDefaults.AuthenticationScheme)
    .AddJwtBearer(options =>
    {
        options.Authority = "https://cognito-idp.eu-west-1.amazonaws.com/eu-west-1_abc123";
        options.TokenValidationParameters = new TokenValidationParameters
        {
            ValidateIssuer = true,
            ValidateAudience = true,
            ValidateLifetime = true,         // reject expired tokens
            ClockSkew = TimeSpan.Zero        // no leniency on expiry
        };
    });

Layer 3 — Authorisation (are you allowed to do this?):

Authentication proves identity. Authorisation checks permission. These are different checks.

C#
// Wrong — just checks authentication
[Authorize]
public async Task<IActionResult> GetAppointment(string id) { ... }

// Correct — checks authentication AND ownership
[Authorize]
public async Task<IActionResult> GetAppointment(string id)
{
    var appointment = await _service.GetByIdAsync(id);
    
    if (appointment is null)
        return NotFound();
    
    // Authorisation: this user can only see their own appointments
    var userId = User.FindFirst(ClaimTypes.NameIdentifier)?.Value;
    if (appointment.PatientId != userId)
        return Forbid();  // 403, not 404 — don't leak existence of other patients' records
    
    return Ok(appointment);
}

Layer 4 — Input Validation:

Never trust input. Validate every field at the boundary before it touches your domain.

Python
from pydantic import BaseModel, validator, constr  # pydantic v1 API (v2 renames regex= to pattern=)
import re

class BookAppointmentRequest(BaseModel):
    slot_id: constr(regex=r'^slot_[a-z0-9_]+$', max_length=50)
    reason: constr(min_length=1, max_length=500, strip_whitespace=True)
    idempotency_key: constr(regex=r'^[0-9a-f-]{36}$')  # UUID format

    @validator('reason')
    def no_html(cls, v):
        if re.search(r'<[^>]+>', v):
            raise ValueError('HTML not allowed in reason field')
        return v

Layer 5 — Rate Limiting and Abuse Prevention:

Even authenticated users can abuse an API. Rate limits at the API gateway level protect the backend from intentional or accidental flooding.

Real example — HIPAA healthcare API (MyBCAT pattern):

Healthcare data (PHI) requires all five layers plus audit logging. Every request that touches patient data must be logged: who accessed it, when, from where. This is a HIPAA technical safeguard requirement.

Python
@app.middleware("http")
async def audit_phi_access(request: Request, call_next):
    response = await call_next(request)
    
    if '/patients/' in request.url.path or '/appointments/' in request.url.path:
        await audit_log.write({
            'user_id': get_user_id(request),
            'action': f"{request.method} {request.url.path}",
            'timestamp': datetime.utcnow().isoformat(),
            'ip': request.client.host,
            'response_status': response.status_code
        })
    
    return response

Question 6: "A client reports they're seeing duplicate orders after a network timeout. How do you fix it?"

What They're Testing

Whether you understand idempotency and can design it end-to-end — not just define the word.

The Answer

The scenario is a classic distributed systems problem. The client sends a POST to create an order, the server processes it successfully, but the response never reaches the client (network timeout). The client retries. The server creates a second order. The customer pays twice.

The fix is idempotency: the client generates a unique key before making the request and sends it on every attempt. The server uses the key to guarantee the operation runs exactly once regardless of how many times the request arrives.

Python
# Client — generates key once, retries with same key
import time
import uuid

import requests

idempotency_key = str(uuid.uuid4())  # generated ONCE before the first attempt

for attempt in range(3):
    try:
        response = requests.post('/orders',
            headers={'Idempotency-Key': idempotency_key},
            json=order_data,
            timeout=10
        )
        break
    except requests.Timeout:
        if attempt < 2:
            time.sleep(2 ** attempt)
            continue  # retry with SAME idempotency_key
        raise
Python
# Server — atomic deduplication with DynamoDB
import time
from datetime import datetime

import boto3
from boto3.dynamodb.conditions import Attr

dynamodb = boto3.resource('dynamodb')
idempotency_table = dynamodb.Table('idempotency-keys')

def create_order(request_data: dict, idempotency_key: str):
    # Atomic: store key only if it doesn't exist yet
    try:
        idempotency_table.put_item(
            Item={
                'key': idempotency_key,
                'status': 'processing',
                'created_at': datetime.utcnow().isoformat(),
                'ttl': int(time.time()) + 86400  # 24h expiry
            },
            ConditionExpression=Attr('key').not_exists()
        )
    except dynamodb.meta.client.exceptions.ConditionalCheckFailedException:
        # Key already exists — this is a duplicate request
        existing = idempotency_table.get_item(Key={'key': idempotency_key})['Item']
        
        if existing['status'] == 'complete':
            # Return the same response as the original successful request
            return existing['response']
        else:
            # Still processing — tell the client to retry later
            raise ConflictError("Request is being processed")  # custom error mapped to HTTP 409
    
    # First time — actually process
    order = process_new_order(request_data)
    
    # Store result against idempotency key
    idempotency_table.update_item(
        Key={'key': idempotency_key},
        UpdateExpression='SET #s = :s, response = :r',
        ExpressionAttributeNames={'#s': 'status'},
        ExpressionAttributeValues={
            ':s': 'complete',
            ':r': {'order_id': order.id, 'total': order.total}
        }
    )
    
    return order

Real example — Stripe:

Stripe pioneered this pattern. Every mutating API call accepts an Idempotency-Key header. Stripe stores the key and the response for 24 hours. If the same key arrives again within that window, Stripe returns the cached response without re-executing the payment. This is why payment failures never result in double charges with Stripe — clients always retry safely.

Real example — AWS SQS FIFO:

SQS FIFO queues have built-in deduplication. Set MessageDeduplicationId to any stable identifier for the message. If SQS receives a message with the same ID within 5 minutes, it silently drops the duplicate. No custom code needed.


Question 7: "How would you design a real-time notification system — for example, showing a patient when their appointment is confirmed?"

What They're Testing

Whether you can distinguish between polling, SSE, and WebSocket and pick the right tool. Whether you think about connection management and scale.

The Answer

There are four approaches, and the right one depends on traffic volume, bidirectionality, and browser support.

Option 1 — Polling (simplest, worst UX):

TYPESCRIPT
// Client checks every 5 seconds
useEffect(() => {
    const interval = setInterval(async () => {
        const res = await fetch(`/appointments/${id}/status`);
        const { status } = await res.json();
        if (status === 'confirmed') clearInterval(interval);
    }, 5000);
    return () => clearInterval(interval);
}, [id]);

Wastes server resources. Patient sees the confirmation up to 5 seconds late. 1,000 concurrent users = 200 requests/second of pure polling overhead. Do not use this for production.

Option 2 — Long Polling (better, still complex):

Client sends a request. Server holds it open (up to 30 seconds) until new data is available, then responds. Client immediately sends the next request. Works behind every proxy. Hard to implement correctly.
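
A minimal long-poll endpoint sketch, assuming FastAPI and the async appointment_service used elsewhere in this article; the 30-second hold and 1-second check interval are illustrative:

Python
import asyncio

from fastapi import FastAPI

app = FastAPI()

@app.get("/appointments/{id}/changes")
async def long_poll_status(id: str, known_status: str):
    # Hold the request open up to 30 seconds, checking for a change each second
    for _ in range(30):
        appointment = await appointment_service.get(id)
        if appointment.status != known_status:
            return {"status": appointment.status}
        await asyncio.sleep(1)
    # No change — the client immediately re-issues the request
    return {"status": known_status}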

Option 3 — Server-Sent Events (best for this use case):

One-way push from server to browser over a standard HTTP connection. Browser handles reconnection automatically. No WebSocket handshake complexity.

Python
# FastAPI SSE endpoint
import asyncio
import json

from fastapi import FastAPI, Request
from sse_starlette.sse import EventSourceResponse

app = FastAPI()

@app.get("/appointments/{id}/status-stream")
async def appointment_status_stream(id: str, request: Request):
    async def event_generator():
        while True:
            if await request.is_disconnected():
                break
            
            appointment = await appointment_service.get(id)
            
            yield {
                "event": "status_update",
                "data": json.dumps({
                    "appointment_id": id,
                    "status": appointment.status,
                    "updated_at": appointment.updated_at.isoformat()
                })
            }
            
            if appointment.status in ('confirmed', 'cancelled'):
                break  # stop streaming once terminal state reached
            
            await asyncio.sleep(1)
    
    return EventSourceResponse(event_generator())
TYPESCRIPT
// Browser — zero library needed, built into every browser
const source = new EventSource(`/appointments/${appointmentId}/status-stream`, {
    withCredentials: true
});

source.addEventListener('status_update', (event) => {
    const data = JSON.parse(event.data);
    setStatus(data.status);
    if (['confirmed', 'cancelled'].includes(data.status)) {
        source.close();
    }
});
// Browser auto-reconnects if connection drops

Option 4 — WebSocket (use for bidirectional):

If the patient also needs to send messages (chat with provider, typing indicators), WebSocket. If they only need to receive, SSE is simpler and more reliable.

Scale consideration:

10,000 concurrent patients each holding an SSE connection = 10,000 open HTTP connections. Each connection holds a small amount of memory but no processing until a message is sent. With Node.js or .NET async, 10,000 is easily handled on a single instance. With a load balancer, route by appointment_id consistent hash so the same server handles the connection lifecycle.
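
The routing decision can be as simple as hashing the appointment ID. A sketch — note this is plain modulo hashing; a real deployment would use a consistent-hash ring so that adding or removing a server reshuffles only a fraction of connections:

Python
import hashlib

def route_connection(appointment_id: str, servers: list[str]) -> str:
    # The same appointment always maps to the same server while the pool is stable
    digest = hashlib.sha256(appointment_id.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]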

Real example — GitHub Actions:

When you watch a CI run in the browser, the log lines stream in real-time via SSE. The server streams each log line as the runner produces it. The browser doesn't send anything back — SSE is exactly right.


Question 8: "You have a webhook endpoint that receives Stripe payment events. How do you make it production-safe?"

What They're Testing

Whether you understand the four common webhook pitfalls: unverified signatures, timeouts, duplicates, and ordering problems.

The Answer

A naive webhook endpoint has four failure modes. Fix all four.

Failure 1 — Anyone can call your webhook:

Without signature verification, an attacker can POST fake payment events to your endpoint. Stripe signs every event with HMAC-SHA256.

Python
import hmac, hashlib, time

def verify_stripe_signature(payload: bytes, header: str, secret: str) -> bool:
    timestamp = header.split(',')[0].split('=')[1]
    
    # Reject events older than 5 minutes (replay attack prevention)
    if abs(time.time() - int(timestamp)) > 300:
        return False
    
    signature = hmac.new(
        secret.encode(),
        f"{timestamp}.{payload.decode()}".encode(),
        hashlib.sha256
    ).hexdigest()
    
    received = next(
        (p.split('=')[1] for p in header.split(',') if p.startswith('v1=')),
        ''
    )
    return hmac.compare_digest(signature, received)

Failure 2 — Processing takes longer than Stripe's 30-second timeout:

If your endpoint takes more than 30 seconds to respond, Stripe marks it as failed and retries. The retry causes a duplicate (failure 3). Fix: return 200 immediately, process asynchronously.

Python
@app.post("/webhooks/stripe")
async def stripe_webhook(request: Request, background_tasks: BackgroundTasks):
    payload = await request.body()
    
    if not verify_stripe_signature(payload, request.headers['Stripe-Signature'], STRIPE_SECRET):
        raise HTTPException(status_code=400, detail="Invalid signature")
    
    event = json.loads(payload)
    
    # Return 200 immediately — do not block on processing
    background_tasks.add_task(handle_stripe_event, event)
    return {"received": True}  # Stripe sees 200, marks delivery as success

Failure 3 — Stripe retries on failure, causing duplicate processing:

Stripe retries with the same event.id. Store processed event IDs.

Python
async def handle_stripe_event(event: dict):
    event_id = event['id']
    
    # Atomic: store only if not seen before
    stored = await redis.set(
        f"processed_stripe_event:{event_id}",
        "1",
        nx=True,      # Only set if Not eXists
        ex=86400      # Expire after 24 hours
    )
    
    if not stored:
        return  # Duplicate — already handled
    
    if event['type'] == 'payment_intent.succeeded':
        await mark_appointment_paid(event['data']['object']['metadata']['appointment_id'])
    elif event['type'] == 'payment_intent.payment_failed':
        await notify_payment_failure(event['data']['object']['metadata']['appointment_id'])

Failure 4 — Events arrive out of order:

Stripe does not guarantee event ordering. A payment_intent.succeeded event might arrive before a payment_intent.created event (rare but real). Never assume the current DB state matches what you expect based on the event alone — always load fresh data.

Python
async def mark_appointment_paid(appointment_id: str):
    # Load fresh from DB — don't rely on state from the event
    appointment = await db.get_appointment(appointment_id)
    
    if appointment.status == 'paid':
        return  # Already paid — idempotent
    
    if appointment.status == 'cancelled':
        # Payment arrived for cancelled appointment — trigger refund
        await stripe_client.refunds.create(payment_intent=appointment.payment_intent_id)
        return
    
    await db.update_appointment_status(appointment_id, 'paid')
    await send_confirmation_notification(appointment)

Real production checklist for any webhook endpoint:

✓ Verify signature before any processing
✓ Return 200 immediately — async processing only
✓ Idempotency check on event ID
✓ Load fresh state from DB — don't trust event ordering
✓ Alert if webhook error rate rises above 1%
✓ Log every event for audit trail

Question 9: "How would you design an API for a mobile app that works well on slow connections?"

What They're Testing

Whether you think about network conditions, not just correctness. Senior engineers design for the real constraints of mobile users.

The Answer

Mobile on a slow connection fails differently than desktop on a fast connection. The same API that works fine at home on WiFi times out in a hospital corridor on 3G.

Design for partial data:

Don't require the client to load everything before showing anything. Use pagination and field selection.

GET /appointments?page=1&pageSize=10&fields=id,date,provider.name,status

The client loads 10 appointments, shows the list immediately, loads more when the user scrolls. The fields parameter means the response is 70% smaller — critical on a slow connection.
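
Server-side, sparse fieldsets can be implemented with a small projection helper. A sketch — the dotted-path syntax mirrors the fields parameter above, and the function name is illustrative:

Python
def select_fields(item: dict, fields: str) -> dict:
    """Keep only the requested fields, e.g. 'id,date,provider.name,status'."""
    out: dict = {}
    for path in fields.split(','):
        parts = path.strip().split('.')
        src, dst = item, out
        for key in parts[:-1]:
            # Walk into nested objects like provider.name
            src = src.get(key, {}) if isinstance(src, dict) else {}
            dst = dst.setdefault(key, {})
        if isinstance(src, dict) and parts[-1] in src:
            dst[parts[-1]] = src[parts[-1]]
    return out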

Design for offline-first with conditional requests:

ETags let the client cache responses and validate them cheaply.

HTTP
# First request
GET /appointments/123
→ 200 OK
→ ETag: "v5-apt-123"
→ Cache-Control: max-age=300

# Second request (within 5 minutes) → browser serves from cache

# Third request (after 5 minutes) → revalidate cheaply
GET /appointments/123
If-None-Match: "v5-apt-123"
→ 304 Not Modified (empty body — saves bandwidth)

# After the server updates the appointment
GET /appointments/123
If-None-Match: "v5-apt-123"
→ 200 OK with new body and new ETag

The 304 Not Modified response has no body. On a slow connection with 10 API calls per screen load, 8 of them returning 304 instead of full JSON payloads makes the screen feel fast.

Design for retry with idempotency:

Mobile connections drop mid-request. Every mutating operation must be idempotent so the app can retry safely.

TYPESCRIPT
// Hypothetical helpers assumed by this sketch
class SlotUnavailableError extends Error {}
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function bookAppointment(slotId: string): Promise<Appointment> {
  // Key generated once per booking attempt — survives retries
  const idempotencyKey = crypto.randomUUID();
  
  const MAX_RETRIES = 3;
  for (let attempt = 0; attempt < MAX_RETRIES; attempt++) {
    try {
      const response = await fetch('/api/v1/appointments', {
        method: 'POST',
        headers: {
          'Authorization': `Bearer ${getToken()}`,
          'Idempotency-Key': idempotencyKey,  // same key on every retry
          'Content-Type': 'application/json'
        },
        body: JSON.stringify({ slotId }),
        signal: AbortSignal.timeout(10000)  // 10 second timeout
      });
      
      if (response.status === 409) {
        // Slot taken — not a network error, don't retry
        throw new SlotUnavailableError();
      }
      
      return await response.json();
      
    } catch (error) {
      if (error instanceof SlotUnavailableError) throw error;
      if (attempt === MAX_RETRIES - 1) throw error;
      
      await sleep(Math.pow(2, attempt) * 1000); // 1s, 2s, 4s backoff
    }
  }
  throw new Error('unreachable'); // the loop always returns or rethrows by the last attempt
}

Compress responses:

C#
// .NET — gzip compression for large responses
builder.Services.AddResponseCompression(options =>
{
    options.EnableForHttps = true;
    options.Providers.Add<GzipCompressionProvider>();
});

A JSON response that is 40KB uncompressed becomes 8KB gzipped. On a 1 Mbps mobile connection, that is 320ms vs 64ms for just the transfer.

Real example — WhatsApp:

WhatsApp was designed for 2G networks in developing markets. Every design decision — message batching, binary encoding (not JSON), aggressive caching, optimistic UI — reflects the constraint of a slow, unreliable connection. Most Western apps designed for fast WiFi perform terribly in these conditions.


Question 10: "Walk me through how you would design the API layer for a healthcare platform handling patient data."

What They're Testing

Whether you connect API design to security, compliance, and real-world consequences — not just technical correctness.

The Answer

Healthcare APIs are not just slower CRUD. PHI (Protected Health Information) has legal protection under HIPAA, and an API design mistake can expose it.

Authentication: short-lived tokens, no API keys:

Long-lived API keys are a HIPAA risk — if leaked, they provide persistent access to patient data. Use short-lived JWTs (15-minute expiry) with refresh tokens stored in httpOnly cookies (not localStorage — XSS can steal localStorage).

Access token:  15 minute expiry — used on every API call
Refresh token: 7 day expiry — httpOnly cookie, used to get new access tokens
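
A sketch of issuing the two tokens, assuming the PyJWT library; the cookie settings map onto FastAPI's response.set_cookie keyword arguments, and all names and lifetimes are illustrative:

Python
from datetime import datetime, timedelta, timezone

import jwt  # PyJWT

SECRET = "load-from-a-secrets-manager"  # never hardcode in production

def issue_tokens(user_id: str) -> tuple[str, dict]:
    now = datetime.now(timezone.utc)
    access_token = jwt.encode(
        {"sub": user_id, "iat": now, "exp": now + timedelta(minutes=15)},
        SECRET, algorithm="HS256")
    refresh_cookie = dict(
        key="refresh_token",
        value=jwt.encode({"sub": user_id, "exp": now + timedelta(days=7)},
                         SECRET, algorithm="HS256"),
        httponly=True,     # JavaScript cannot read it — XSS cannot steal it
        secure=True,       # sent over HTTPS only
        samesite="strict")
    return access_token, refresh_cookie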

Authorisation: resource-level ownership, not just role:

A doctor can see their own patients' appointments. They cannot see another doctor's patient records. A patient can see their own data only.

C#
// Not enough — only checks role
[Authorize(Roles = "Patient")]
public async Task<IActionResult> GetMedicalRecord(string recordId) { ... }

// Correct — checks role AND ownership
[Authorize(Roles = "Patient")]
public async Task<IActionResult> GetMedicalRecord(string recordId)
{
    var record = await _recordService.GetAsync(recordId);
    
    if (record is null) return NotFound();
    
    var patientId = User.FindFirst("patient_id")?.Value;
    
    // Critical: verify ownership before returning
    if (record.PatientId != patientId)
        return Forbid(); // 403 — don't reveal that record exists
    
    return Ok(record);
}

Audit log every PHI access:

HIPAA requires logging who accessed PHI, when, from where, and what they did.

Python
@app.middleware("http")
async def phi_audit_middleware(request: Request, call_next):
    response = await call_next(request)
    
    # Log every access to patient data endpoints
    phi_paths = ['/patients/', '/appointments/', '/medical-records/']
    if any(p in request.url.path for p in phi_paths):
        await audit_service.log({
            'user_id': extract_user_id(request),
            'resource': request.url.path,
            'method': request.method,
            'status': response.status_code,
            'ip_address': request.client.host,
            'timestamp': datetime.utcnow().isoformat(),
            'session_id': extract_session_id(request)
        })
    
    return response

Minimum necessary data — return only what is needed:

HIPAA's minimum necessary standard: don't return PHI fields the caller has no reason to need.

Python
# Admin user viewing their list — needs name, date, status
# Does NOT need diagnosis, notes, medication
class AppointmentSummaryDto(BaseModel):
    id: str
    date: str
    provider_name: str
    status: str
    # No PHI fields

# Provider viewing their patient — needs clinical context
class AppointmentDetailDto(BaseModel):
    id: str
    date: str
    patient_name: str
    reason: str
    notes: str
    # PHI included because the provider needs it for care

The answer to tie it together:

"For a healthcare API I layer it: TLS everywhere enforced at the infrastructure level, short-lived JWTs with refresh via httpOnly cookie, authorisation at the resource level not just the route, an audit middleware that logs every PHI access to CloudTrail or a tamper-proof log store, and response DTOs designed on minimum-necessary so callers only receive the fields they legitimately need. The design is informed by HIPAA's technical safeguards — it's not just best practice, it's the legal requirement."


Quick-Fire Interview Answers

"What is the difference between 401 and 403?"

"401 Unauthorised means your credentials are missing or invalid — you have not authenticated. 403 Forbidden means you are authenticated but not allowed to do this specific thing. A request with no token gets 401. A patient requesting another patient's record gets 403."

"What is HATEOAS?"

"Hypermedia as the Engine of Application State — a REST constraint where responses include links to related actions. An appointment response includes links to cancel, reschedule, or view the provider. Clients navigate the API by following links rather than constructing URLs. Rarely implemented in practice."

"REST vs RPC?"

"REST is resource-oriented — you operate on nouns (appointments, patients). RPC is action-oriented — you call functions (bookAppointment, cancelAppointment). REST is better for external APIs because it is predictable and self-describing. RPC (especially gRPC) is better for internal service communication where you want typed contracts."

"What is an API gateway and why do you need one?"

"An API gateway sits in front of your services and handles cross-cutting concerns: authentication, rate limiting, SSL termination, routing, logging, and request transformation. Without it, every service implements these independently. With it, services only handle business logic. AWS API Gateway, Azure API Management, and Kong are common choices."
