API Design Interview Questions: REST, GraphQL, gRPC, WebSocket — Real Answers with Real Examples
10 senior-level API design interview questions with detailed answers. REST vs GraphQL vs gRPC vs WebSocket — when to use each, how to design versioning, handle rate limiting, secure endpoints, and make the right architectural call. Real examples from Stripe, GitHub, Uber, Netflix, and Slack.
How to Answer API Design Questions in an Interview
API design questions are not trick questions. Interviewers want to see three things:
- You know the tradeoffs — not just what each style is, but when each one fails
- You anchor decisions to real constraints — "we chose X because the mobile team needed Y"
- You've made mistakes — the best answers include "we tried X, it broke because Y, we switched to Z"
Every answer below follows this structure: the short answer → the real-world example → the tradeoff the interviewer wants to hear.
Question 1: "When would you choose GraphQL over REST, and when would you stick with REST?"
What They're Testing
Whether you understand that GraphQL is not a replacement for REST — it is a solution to a specific problem. Candidates who say "GraphQL is better" fail this question.
The Answer
REST gives you fixed endpoints that return fixed shapes. This is simple, cacheable, and universally understood. It becomes painful when:
- Multiple clients (mobile, web, partner integrations) need different shapes from the same data
- A single screen requires data from three or four REST calls
- The frontend team is blocked waiting for backend to add a new field to a response
GraphQL solves the shape mismatch problem. The client describes exactly what it needs in one query, and the server returns only that.
Real example — GitHub API v4:
GitHub's REST API (v3) required multiple calls to render a pull request page:
GET /repos/{owner}/{repo}/pulls/{number} → PR details
GET /repos/{owner}/{repo}/pulls/{number}/reviews → review list
GET /repos/{owner}/{repo}/pulls/{number}/files → changed files
GET /repos/{owner}/{repo}/statuses/{sha} → CI status
Four requests, significant over-fetching on each. The GitHub CLI team moved to GraphQL v4 so they could get everything in one query and request only the fields they actually rendered:
query GetPullRequest($owner: String!, $repo: String!, $number: Int!) {
  repository(owner: $owner, name: $repo) {
    pullRequest(number: $number) {
      title
      state
      author { login }
      reviews(last: 5) {
        nodes { author { login } state }
      }
      files(first: 20) {
        nodes { path additions deletions }
      }
      commits(last: 1) {
        nodes {
          commit {
            statusCheckRollup { state }
          }
        }
      }
    }
  }
}
One request. Only the fields the CLI renders.
When to stick with REST:
- Simple CRUD with predictable shapes — REST is faster to build and easier to cache
- Public APIs used by third-party developers — REST is universally understood, GraphQL has a learning curve
- File uploads — GraphQL handles binary poorly
- When HTTP caching matters — GraphQL uses POST, GET caching doesn't apply
The tradeoff to mention: GraphQL moves complexity to the server side. You need schema design, resolvers, DataLoader for N+1 prevention, and query depth/complexity limits to prevent expensive queries. For a team of two building an internal tool, REST wins. For a mobile app team tired of waiting for new endpoints, GraphQL wins.
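The query depth limit mentioned above can be sketched as a recursive walk over the parsed query. A minimal illustration using a nested-dict stand-in for the AST — a real implementation would walk the document produced by a GraphQL parser such as graphql-core:

```python
def query_depth(selection: dict) -> int:
    """Depth of a query represented as nested dicts: each key is a field,
    each value is the dict of its sub-selections ({} means a leaf)."""
    if not selection:
        return 0
    return 1 + max(query_depth(sub) for sub in selection.values())

def enforce_depth_limit(selection: dict, max_depth: int = 7) -> None:
    depth = query_depth(selection)
    if depth > max_depth:
        raise ValueError(f"query depth {depth} exceeds limit {max_depth}")

# The pull-request query above nests about six levels deep:
pr_query = {
    "repository": {
        "pullRequest": {
            "title": {},
            "reviews": {"nodes": {"author": {"login": {}}}},
        }
    }
}
```

A depth cap rejects pathological queries (deeply nested self-references) before any resolver runs; complexity scoring, which weights list fields more heavily, is the usual companion to it.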
Question 2: "How would you version a REST API without breaking existing clients?"
What They're Testing
Whether you've worked with an API that evolved over time and had to maintain backward compatibility.
The Answer
There are three common versioning strategies, each with real tradeoffs:
1. URL path versioning (most common):
/api/v1/appointments
/api/v2/appointments
Explicit, easy to test in a browser, easy to route. Downside: clients must update URLs to get new features. Used by: Twitter, Uber, most public APIs.
2. Header versioning (cleanest URLs):
GET /appointments
Accept: application/vnd.mybcat.v2+json
The URL stays stable; versioning lives in the content negotiation headers. Downside: harder to test (you can't just hit the URL in a browser) and it requires discipline from clients. Used by: GitHub (partially).
3. Date-based versioning (Stripe's approach — best for stability):
GET /payment-intents
Stripe-Version: 2024-04-10
Real example — Stripe:
Stripe's approach is the most developer-friendly in the industry. Every API key has a default version set to the version that existed when the key was created. Breaking changes only apply if you opt in to a newer version. Old versions are supported for years.
When Stripe introduced a breaking change to the PaymentIntent object shape, existing integrations saw nothing change. Only clients that set the new Stripe-Version header got the new shape. This let Stripe evolve the API without breaking the thousands of businesses that built on it.
How to handle a breaking change in practice:
// v1 response shape
public record AppointmentV1(string Id, string Date, string ProviderName);

// v2 response shape — ProviderName split into Provider object
public record AppointmentV2(string Id, string Date, ProviderDto Provider);

[ApiController]
[Route("api/v1/appointments")]
public class AppointmentsV1Controller : ControllerBase
{
    [HttpGet("{id}")]
    public async Task<AppointmentV1> Get(string id)
    {
        var apt = await _service.GetAsync(id);
        return new AppointmentV1(apt.Id, apt.Date, apt.Provider.Name);
    }
}

[ApiController]
[Route("api/v2/appointments")]
public class AppointmentsV2Controller : ControllerBase
{
    [HttpGet("{id}")]
    public async Task<AppointmentV2> Get(string id)
    {
        var apt = await _service.GetAsync(id);
        return new AppointmentV2(apt.Id, apt.Date, new ProviderDto(apt.Provider));
    }
}

// Shared service layer — versioning lives only in the API layer, not the domain
The tradeoff to mention:
Maintaining multiple versions has a cost. You are running two codepaths. The general rule: support at most two major versions simultaneously, give clients a sunset date (at least 6 months notice), use deprecation headers so they know the clock is ticking:
HTTP/1.1 200 OK
Deprecation: true
Sunset: Sat, 31 Dec 2026 23:59:59 GMT
Link: <https://docs.mybcat.com/migration/v1-to-v2>; rel="deprecation"
Question 3: "How would you design rate limiting for a public API?"
What They're Testing
Whether you understand that rate limiting is not just a number — it involves identity, fairness, communication, and graceful degradation.
The Answer
Rate limiting has three layers. Missing any one of them creates problems.
Layer 1 — Identify who is calling:
Rate limit by API key, not by IP. IP-based limiting breaks shared offices and mobile networks (hundreds of users behind one IP). Every API call carries a key in the header:
GET /appointments
Authorization: Bearer <api_key>
X-Practice-ID: prc_123
Layer 2 — Apply the limit with the right algorithm:
The token bucket algorithm is the industry standard: each API key gets a bucket of tokens, each request consumes one, and tokens refill at a fixed rate, so bursts are allowed as long as tokens remain. The snippet below implements a close relative, a sliding-window log in Redis — it records each request's timestamp in a sorted set, drops entries older than the window, and counts what remains against the limit.
import redis
import time

def check_rate_limit(api_key: str, limit: int = 1000, window_seconds: int = 3600) -> dict:
    r = redis.Redis(host='localhost', port=6379)
    key = f"rate_limit:{api_key}"
    now = time.time()
    window_start = now - window_seconds

    pipe = r.pipeline()
    # Remove requests outside the window
    pipe.zremrangebyscore(key, 0, window_start)
    # Count remaining requests in window
    pipe.zcard(key)
    # Add this request (note: this records the request even if it is
    # rejected below — remove the entry on rejection if denied calls
    # should not consume the window)
    pipe.zadd(key, {str(now): now})
    # Set expiry so idle keys clean themselves up
    pipe.expire(key, window_seconds)
    results = pipe.execute()
    request_count = results[1]

    if request_count >= limit:
        oldest = r.zrange(key, 0, 0, withscores=True)
        retry_after = int(oldest[0][1] + window_seconds - now) if oldest else window_seconds
        return {
            'allowed': False,
            'retry_after': retry_after,
            'limit': limit,
            'remaining': 0
        }
    return {
        'allowed': True,
        'limit': limit,
        'remaining': limit - request_count - 1,
        'reset': int(now + window_seconds)
    }
Layer 3 — Communicate the limit clearly:
HTTP/1.1 200 OK
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 247
X-RateLimit-Reset: 1714694400
When rate limited:
HTTP/1.1 429 Too Many Requests
Retry-After: 3600
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1714694400
Content-Type: application/json
{
  "error": "rate_limit_exceeded",
  "message": "1000 requests per hour. Resets at 2026-05-01T10:00:00Z.",
  "docs": "https://docs.mybcat.com/rate-limits"
}
Real example — GitHub API:
GitHub uses tiered rate limits: unauthenticated calls get 60/hour, authenticated get 5,000/hour, GitHub Apps get 15,000/hour. The X-RateLimit-* headers are on every response. GitHub's response when you hit the limit includes the exact Unix timestamp when your quota resets — clients can sleep until exactly that moment rather than retrying blindly.
Real example — Slack:
Slack's Web API has per-method rate limits grouped into tiers. Most read methods, such as conversations.list, sit in Tier 2 (around 20 requests per minute), while chat.postMessage has its own special limit of roughly one message per second per channel, with short bursts allowed. Slack returns Retry-After: 30 in the 429 response telling you exactly how long to wait. This design lets bots calculate whether they can finish a batch job before throttling kicks in.
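The token bucket itself can be sketched in memory — a minimal illustration with an injectable clock for testability; a production version would keep the bucket state in Redis so every API node shares the same counters:

```python
import time

class TokenBucket:
    """Allows bursts up to `capacity`; refills at `rate` tokens per second."""

    def __init__(self, capacity: int, rate: float, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)  # start with a full bucket
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Compared with a fixed window, the bucket never lets a client double-spend at a window boundary, and the capacity parameter gives you an explicit knob for how bursty clients may be.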
Question 4: "How does gRPC differ from REST, and when would you choose it?"
What They're Testing
Whether you've used both in production and can explain the performance difference without just saying "gRPC is faster."
The Answer
REST and gRPC both run over HTTP but solve different problems.
The fundamental difference:
REST serialises data as JSON — human-readable text. gRPC serialises with Protocol Buffers — a compact binary format. A {"appointment_id": "apt_123", "status": "confirmed"} JSON string is ~50 bytes. The equivalent Protobuf binary is ~10 bytes. At 10 million calls per day, that difference is significant in both bandwidth cost and parsing CPU.
HTTP/1.1 (used by most REST APIs) handles one request at a time per connection. HTTP/2 (used by gRPC) multiplexes many concurrent requests over the same connection. For microservices making thousands of internal calls per second, this matters.
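A rough feel for the size gap, using Python's struct module as a stand-in for Protobuf's binary encoding (real Protobuf uses tag-and-varint framing, so exact sizes differ slightly):

```python
import json
import struct

payload = {"appointment_id": "apt_123", "status": "confirmed"}

# JSON: field names and quoting travel on every single message
as_json = json.dumps(payload, separators=(",", ":")).encode()

# Binary: the schema lives in code, so only the values travel.
# Here: 1-byte length prefix + id bytes, then a 1-byte status enum.
STATUS_CODES = {"pending": 0, "confirmed": 1, "cancelled": 2}
appt_id = payload["appointment_id"].encode()
as_binary = struct.pack(
    f"B{len(appt_id)}sB",
    len(appt_id), appt_id, STATUS_CODES[payload["status"]],
)

print(len(as_json), len(as_binary))
```

For this message the JSON form is 49 bytes and the packed form 9; real Protobuf adds a few bytes of field tags but stays in the same ballpark, and the parsing cost drops along with the size.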
The contract difference:
REST has no enforced contract between client and server. A field can be renamed or removed and nothing warns you until production breaks. gRPC's .proto file generates clients in any language. If you rename a field in the proto, every generated client fails to compile. Breaking changes fail at compile time, not runtime.
// This proto generates clients in Go, Python, Java, .NET, TypeScript
service AppointmentService {
  rpc BookAppointment(BookRequest) returns (BookResponse);

  // Streaming — not possible in standard REST
  rpc StreamStatusUpdates(WatchRequest) returns (stream StatusEvent);
}
Real example — Netflix:
Netflix's internal microservices handle hundreds of billions of internal calls per day. Their content metadata service — which serves show titles, descriptions, artwork, and episode lists — originally used REST/JSON. Moving internal calls to gRPC reduced payload size by ~60% and serialisation CPU time by ~70%. At Netflix's scale that translates to fewer servers, lower cost.
Real example — Google:
gRPC was built by Google and is used for virtually all internal API calls. Google's internal predecessor, Stubby, handled trillions of RPCs per day. gRPC is the open-source version of that.
When to choose gRPC:
- Service-to-service communication where you control both ends
- High-frequency internal calls (payment processing, real-time data pipelines)
- Polyglot microservices — one proto file generates clients in Go, Python, Java, .NET
- You need streaming (live data feeds, file transfer, bidirectional communication)
When REST wins:
- Public APIs — browsers can't call gRPC natively
- Third-party integrations — REST is universally understood
- Simple CRUD services where the performance difference doesn't matter
- When human-readable requests help with debugging
Question 5: "How would you secure a REST API — walk me through every layer?"
What They're Testing
Whether your security thinking is layered — network, authentication, authorisation, input validation, output — or whether you only think about "add a JWT."
The Answer
Security is not one thing. It is five independent layers that each stop a different attack.
Layer 1 — Transport (TLS):
Every API endpoint must be HTTPS. No exceptions. A plain HTTP endpoint exposes tokens in transit to anyone on the network path. Enforce it at the infrastructure level:
# FastAPI — redirect HTTP to HTTPS
from fastapi.middleware.httpsredirect import HTTPSRedirectMiddleware
app.add_middleware(HTTPSRedirectMiddleware)

# AWS — SQS queue policy denies non-TLS
Condition = { Bool = { "aws:SecureTransport" = "false" } }
Effect = "Deny"
Layer 2 — Authentication (who are you?):
JWT is the standard for stateless APIs. The token is issued at login, sent on every request in the Authorization: Bearer <token> header, and verified by the API without a database lookup.
// .NET — validate JWT on every request
builder.Services.AddAuthentication(JwtBearerDefaults.AuthenticationScheme)
    .AddJwtBearer(options =>
    {
        options.Authority = "https://cognito-idp.eu-west-1.amazonaws.com/eu-west-1_abc123";
        options.TokenValidationParameters = new TokenValidationParameters
        {
            ValidateIssuer = true,
            ValidateAudience = true,
            ValidateLifetime = true, // reject expired tokens
            ClockSkew = TimeSpan.Zero // no leniency on expiry
        };
    });
Layer 3 — Authorisation (are you allowed to do this?):
Authentication proves identity. Authorisation checks permission. These are different checks.
// Wrong — only checks authentication
[Authorize]
public async Task<IActionResult> GetAppointment(string id) { ... }

// Correct — checks authentication AND ownership
[Authorize]
public async Task<IActionResult> GetAppointment(string id)
{
    var appointment = await _service.GetByIdAsync(id);
    if (appointment is null)
        return NotFound();

    // Authorisation: this user can only see their own appointments
    var userId = User.FindFirst(ClaimTypes.NameIdentifier)?.Value;
    if (appointment.PatientId != userId)
        return Forbid(); // 403; return NotFound() instead if even the record's existence must stay hidden
    return Ok(appointment);
}
Layer 4 — Input Validation:
Never trust input. Validate every field at the boundary before it touches your domain.
# Pydantic v1 style — v2 renames regex → pattern and validator → field_validator
from pydantic import BaseModel, validator, constr
import re

class BookAppointmentRequest(BaseModel):
    slot_id: constr(regex=r'^slot_[a-z0-9_]+$', max_length=50)
    reason: constr(min_length=1, max_length=500, strip_whitespace=True)
    idempotency_key: constr(regex=r'^[0-9a-f-]{36}$')  # UUID format

    @validator('reason')
    def no_html(cls, v):
        if re.search(r'<[^>]+>', v):
            raise ValueError('HTML not allowed in reason field')
        return v
Layer 5 — Rate Limiting and Abuse Prevention:
Even authenticated users can abuse an API. Rate limits at the API gateway level protect the backend from intentional or accidental flooding.
Real example — HIPAA healthcare API (MyBCAT pattern):
Healthcare data (PHI) requires all five layers plus audit logging. Every request that touches patient data must be logged: who accessed it, when, from where. This is a HIPAA technical safeguard requirement.
@app.middleware("http")
async def audit_phi_access(request: Request, call_next):
    response = await call_next(request)
    if '/patients/' in request.url.path or '/appointments/' in request.url.path:
        await audit_log.write({
            'user_id': get_user_id(request),
            'action': f"{request.method} {request.url.path}",
            'timestamp': datetime.utcnow().isoformat(),
            'ip': request.client.host,
            'response_status': response.status_code
        })
    return response
Question 6: "A client reports they're seeing duplicate orders after a network timeout. How do you fix it?"
What They're Testing
Whether you understand idempotency and can design it end-to-end — not just define the word.
The Answer
The scenario is a classic distributed systems problem. The client sends a POST to create an order, the server processes it successfully, but the response never reaches the client (network timeout). The client retries. The server creates a second order. The customer pays twice.
The fix is idempotency: the client generates a unique key before making the request and sends it on every attempt. The server uses the key to guarantee the operation runs exactly once regardless of how many times the request arrives.
# Client — generates key once, retries with same key
import uuid

idempotency_key = str(uuid.uuid4())  # generated ONCE before the first attempt

for attempt in range(3):
    try:
        response = requests.post('/orders',
            headers={'Idempotency-Key': idempotency_key},
            json=order_data,
            timeout=10
        )
        break
    except requests.Timeout:
        if attempt < 2:
            time.sleep(2 ** attempt)
            continue  # retry with SAME idempotency_key
        raise

# Server — atomic deduplication with DynamoDB
import boto3
from boto3.dynamodb.conditions import Attr

dynamodb = boto3.resource('dynamodb')
idempotency_table = dynamodb.Table('idempotency-keys')

def create_order(request_data: dict, idempotency_key: str):
    # Atomic: store key only if it doesn't exist yet
    try:
        idempotency_table.put_item(
            Item={
                'key': idempotency_key,
                'status': 'processing',
                'created_at': datetime.utcnow().isoformat(),
                'ttl': int(time.time()) + 86400  # 24h expiry
            },
            ConditionExpression=Attr('key').not_exists()
        )
    except dynamodb.meta.client.exceptions.ConditionalCheckFailedException:
        # Key already exists — this is a duplicate request
        existing = idempotency_table.get_item(Key={'key': idempotency_key})['Item']
        if existing['status'] == 'complete':
            # Return the same response as the original successful request
            return existing['response']
        else:
            # Still processing — tell client to retry later
            raise ConflictError("Request is being processed")

    # First time — actually process
    order = process_new_order(request_data)

    # Store result against idempotency key
    idempotency_table.update_item(
        Key={'key': idempotency_key},
        UpdateExpression='SET #s = :s, response = :r',
        ExpressionAttributeNames={'#s': 'status'},
        ExpressionAttributeValues={
            ':s': 'complete',
            ':r': {'order_id': order.id, 'total': order.total}
        }
    )
    return order
Real example — Stripe:
Stripe pioneered this pattern. Every mutating API call accepts an Idempotency-Key header. Stripe stores the key and the response for 24 hours. If the same key arrives again within that window, Stripe returns the cached response without re-executing the payment. This is why payment failures never result in double charges with Stripe — clients always retry safely.
Real example — AWS SQS FIFO:
SQS FIFO queues have built-in deduplication. Set MessageDeduplicationId to any stable identifier for the message. If SQS receives a message with the same ID within 5 minutes, it silently drops the duplicate. No custom code needed.
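On the producer side, the FIFO pattern reduces to sending a stable deduplication ID. A sketch that builds the send_message arguments — the queue URL is hypothetical, and the actual send requires a boto3 SQS client:

```python
import hashlib
import json

def build_fifo_message(queue_url: str, body: dict, group_id: str) -> dict:
    """Build kwargs for sqs.send_message on a FIFO queue. The dedup ID is
    a content hash, so an identical retry maps to the same ID and SQS
    silently drops it within the 5-minute deduplication window."""
    serialized = json.dumps(body, sort_keys=True)  # stable ordering -> stable hash
    return {
        "QueueUrl": queue_url,
        "MessageBody": serialized,
        "MessageGroupId": group_id,  # ordering scope within the FIFO queue
        "MessageDeduplicationId": hashlib.sha256(serialized.encode()).hexdigest(),
    }

# Usage (hypothetical queue URL):
#   boto3.client("sqs").send_message(**params)
params = build_fifo_message(
    "https://sqs.eu-west-1.amazonaws.com/123456789/orders.fifo",
    {"order_id": "ord_1", "total": 120},
    group_id="ord_1",
)
```

Hashing the body means the client does not need to persist the key between retries — the same payload always yields the same deduplication ID.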
Question 7: "How would you design a real-time notification system — for example, showing a patient when their appointment is confirmed?"
What They're Testing
Whether you can distinguish between polling, SSE, and WebSocket and pick the right tool. Whether you think about connection management and scale.
The Answer
There are four approaches, and the right one depends on traffic volume, bidirectionality, and browser support.
Option 1 — Polling (simplest, worst UX):
// Client checks every 5 seconds
useEffect(() => {
  const interval = setInterval(async () => {
    const res = await fetch(`/appointments/${id}/status`);
    const { status } = await res.json();
    if (status === 'confirmed') clearInterval(interval);
  }, 5000);
  return () => clearInterval(interval);
}, [id]);
Wastes server resources. Patient sees the confirmation up to 5 seconds late. 1,000 concurrent users = 200 requests/second of pure polling overhead. Do not use this for production.
Option 2 — Long Polling (better, still complex):
Client sends a request. Server holds it open (up to 30 seconds) until new data is available, then responds. Client immediately sends the next request. Works behind every proxy. Hard to implement correctly.
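The client side of long polling can be sketched with the transport injected so the control flow is clear. Here fetch_updates is an assumed callable standing in for a blocking HTTP GET; a real client would use requests or httpx with a long read timeout:

```python
def long_poll(fetch_updates, max_cycles: int = 100):
    """fetch_updates(timeout) -> (status_code, data). The server holds the
    request open for up to `timeout` seconds; 204 means the window expired
    with no news, and the client immediately reopens the request."""
    for _ in range(max_cycles):
        status, data = fetch_updates(timeout=30)
        if status == 200:
            return data  # news arrived — hand it to the caller
        if status != 204:
            raise RuntimeError(f"unexpected status {status}")
        # 204: quiet window — loop around and reopen immediately
    return None
```

The "hard to implement correctly" part lives mostly server-side: holding requests open consumes a worker per waiting client unless the server is async, and proxies in the path must tolerate 30-second idle responses.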
Option 3 — Server-Sent Events (best for this use case):
One-way push from server to browser over a standard HTTP connection. Browser handles reconnection automatically. No WebSocket handshake complexity.
# FastAPI SSE endpoint
from sse_starlette.sse import EventSourceResponse

@app.get("/appointments/{id}/status-stream")
async def appointment_status_stream(id: str, request: Request):
    async def event_generator():
        while True:
            if await request.is_disconnected():
                break
            appointment = await appointment_service.get(id)
            yield {
                "event": "status_update",
                "data": json.dumps({
                    "appointment_id": id,
                    "status": appointment.status,
                    "updated_at": appointment.updated_at.isoformat()
                })
            }
            if appointment.status in ('confirmed', 'cancelled'):
                break  # stop streaming once terminal state reached
            await asyncio.sleep(1)
    return EventSourceResponse(event_generator())

// Browser — zero library needed, built into every browser
const source = new EventSource(`/appointments/${appointmentId}/status-stream`, {
  withCredentials: true
});

source.addEventListener('status_update', (event) => {
  const data = JSON.parse(event.data);
  setStatus(data.status);
  if (['confirmed', 'cancelled'].includes(data.status)) {
    source.close();
  }
});
// Browser auto-reconnects if connection drops
If the patient also needs to send messages (chat with provider, typing indicators), WebSocket. If they only need to receive, SSE is simpler and more reliable.
Scale consideration:
10,000 concurrent patients each holding an SSE connection = 10,000 open HTTP connections. Each connection holds a small amount of memory but no processing until a message is sent. With Node.js or .NET async, 10,000 is easily handled on a single instance. With a load balancer, route by appointment_id consistent hash so the same server handles the connection lifecycle.
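The routing idea above can be sketched with a stable hash. A simplified illustration — real load balancers use a consistent-hash ring so that adding or removing a server only remaps a fraction of connections, whereas plain modulo reshuffles almost everything:

```python
import hashlib

def route_connection(appointment_id: str, servers: list[str]) -> str:
    """Pin a streaming connection to one server by hashing its
    appointment_id. The same ID always lands on the same server
    while the server pool is stable."""
    digest = hashlib.sha256(appointment_id.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]
```

Deterministic routing means the server that accepted the SSE connection is also the one that receives status updates for that appointment, so no cross-node fan-out is needed for this simple case.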
Real example — GitHub Actions:
When you watch a CI run in the browser, the log lines stream in real-time via SSE. The server streams each log line as the runner produces it. The browser doesn't send anything back — SSE is exactly right.
Question 8: "You have a webhook endpoint that receives Stripe payment events. How do you make it production-safe?"
What They're Testing
Whether you understand the four common webhook pitfalls: unverified signatures, timeouts, duplicates, and ordering problems.
The Answer
A naive webhook endpoint has four failure modes. Fix all four.
Failure 1 — Anyone can call your webhook:
Without signature verification, an attacker can POST fake payment events to your endpoint. Stripe signs every event with HMAC-SHA256.
import hmac, hashlib

def verify_stripe_signature(payload: bytes, header: str, secret: str) -> bool:
    timestamp = header.split(',')[0].split('=')[1]
    # Reject events older than 5 minutes (replay attack prevention)
    if abs(time.time() - int(timestamp)) > 300:
        return False
    signature = hmac.new(
        secret.encode(),
        f"{timestamp}.{payload.decode()}".encode(),
        hashlib.sha256
    ).hexdigest()
    received = next(
        (p.split('=')[1] for p in header.split(',') if p.startswith('v1=')),
        ''
    )
    return hmac.compare_digest(signature, received)
Failure 2 — Processing takes longer than Stripe's 30-second timeout:
If your endpoint takes more than 30 seconds to respond, Stripe marks it as failed and retries. The retry causes a duplicate (failure 3). Fix: return 200 immediately, process asynchronously.
@app.post("/webhooks/stripe")
async def stripe_webhook(request: Request, background_tasks: BackgroundTasks):
    payload = await request.body()
    if not verify_stripe_signature(payload, request.headers['Stripe-Signature'], STRIPE_SECRET):
        raise HTTPException(status_code=400, detail="Invalid signature")
    event = json.loads(payload)
    # Return 200 immediately — do not block on processing
    background_tasks.add_task(handle_stripe_event, event)
    return {"received": True}  # Stripe sees 200, marks delivery as success
Failure 3 — Stripe retries on failure, causing duplicate processing:
Stripe retries with the same event.id. Store processed event IDs.
async def handle_stripe_event(event: dict):
    event_id = event['id']
    # Atomic: store only if not seen before
    stored = await redis.set(
        f"processed_stripe_event:{event_id}",
        "1",
        nx=True,  # Only set if Not eXists
        ex=86400  # Expire after 24 hours
    )
    if not stored:
        return  # Duplicate — already handled

    if event['type'] == 'payment_intent.succeeded':
        await mark_appointment_paid(event['data']['object']['metadata']['appointment_id'])
    elif event['type'] == 'payment_intent.payment_failed':
        await notify_payment_failure(event['data']['object']['metadata']['appointment_id'])
Failure 4 — Events arrive out of order:
Stripe does not guarantee event ordering. A payment_intent.succeeded event might arrive before a payment_intent.created event (rare but real). Never assume the current DB state matches what you expect based on the event alone — always load fresh data.
async def mark_appointment_paid(appointment_id: str):
    # Load fresh from DB — don't rely on state from the event
    appointment = await db.get_appointment(appointment_id)
    if appointment.status == 'paid':
        return  # Already paid — idempotent
    if appointment.status == 'cancelled':
        # Payment arrived for cancelled appointment — trigger refund
        await stripe_client.refunds.create(payment_intent=appointment.payment_intent_id)
        return
    await db.update_appointment_status(appointment_id, 'paid')
    await send_confirmation_notification(appointment)
Real production checklist for any webhook endpoint:
✓ Verify signature before any processing
✓ Return 200 immediately — async processing only
✓ Idempotency check on event ID
✓ Load fresh state from DB — don't trust event ordering
✓ Alert if webhook error rate rises above 1%
✓ Log every event for audit trail
Question 9: "How would you design an API for a mobile app that works well on slow connections?"
What They're Testing
Whether you think about network conditions, not just correctness. Senior engineers design for the real constraints of mobile users.
The Answer
Mobile on a slow connection fails differently than desktop on a fast connection. The same API that works fine at home on WiFi times out in a hospital corridor on 3G.
Design for partial data:
Don't require the client to load everything before showing anything. Use pagination and field selection.
GET /appointments?page=1&pageSize=10&fields=id,date,provider.name,status
The client loads 10 appointments, shows the list immediately, loads more when the user scrolls. The fields parameter means the response is 70% smaller — critical on a slow connection.
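On the server, the fields parameter can be honored with a small projection helper. A minimal sketch — production APIs often lean on a serializer library or GraphQL-style field selection instead:

```python
def project(resource: dict, fields: list[str]) -> dict:
    """Keep only the requested fields. Dotted paths select nested keys
    (e.g. 'provider.name'); unknown fields are silently ignored."""
    out: dict = {}
    for field in fields:
        parts = field.split(".")
        src, dst = resource, out
        for i, part in enumerate(parts):
            if not isinstance(src, dict) or part not in src:
                break  # path doesn't exist on this resource — skip it
            if i == len(parts) - 1:
                dst[part] = src[part]  # leaf — copy the value
            else:
                src = src[part]
                dst = dst.setdefault(part, {})
    return out
```

Silently ignoring unknown fields (rather than erroring) keeps old clients working when the resource shape evolves.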
Design for offline-first with conditional requests:
ETags let the client cache responses and validate them cheaply.
# First request
GET /appointments/123
→ 200 OK
→ ETag: "v5-apt-123"
→ Cache-Control: max-age=300
# Second request (within 5 minutes) — browser serves from cache
# Third request (after 5 minutes) — revalidate cheaply
GET /appointments/123
If-None-Match: "v5-apt-123"
→ 304 Not Modified (empty body — save bandwidth)
# After server updates the appointment
GET /appointments/123
If-None-Match: "v5-apt-123"
→ 200 OK with new body and new ETag
The 304 Not Modified response has no body. On a slow connection with 10 API calls per screen load, 8 of them returning 304 instead of full JSON payloads makes the screen feel fast.
Design for retry with idempotency:
Mobile connections drop mid-request. Every mutating operation must be idempotent so the app can retry safely.
async function bookAppointment(slotId: string): Promise<Appointment> {
  // Key generated once per booking attempt — survives retries
  const idempotencyKey = crypto.randomUUID();
  const MAX_RETRIES = 3;

  for (let attempt = 0; attempt < MAX_RETRIES; attempt++) {
    try {
      const response = await fetch('/api/v1/appointments', {
        method: 'POST',
        headers: {
          'Authorization': `Bearer ${getToken()}`,
          'Idempotency-Key': idempotencyKey, // same key on every retry
          'Content-Type': 'application/json'
        },
        body: JSON.stringify({ slotId }),
        signal: AbortSignal.timeout(10000) // 10 second timeout
      });
      if (response.status === 409) {
        // Slot taken — not a network error, don't retry
        throw new SlotUnavailableError();
      }
      return await response.json();
    } catch (error) {
      if (error instanceof SlotUnavailableError) throw error;
      if (attempt === MAX_RETRIES - 1) throw error;
      await sleep(Math.pow(2, attempt) * 1000); // 1s, 2s, 4s backoff (sleep: promise-based delay helper)
    }
  }
}
Compress responses:
// .NET — gzip compression for large responses
builder.Services.AddResponseCompression(options =>
{
    options.EnableForHttps = true;
    options.Providers.Add<GzipCompressionProvider>();
});
A JSON response that is 40KB uncompressed becomes 8KB gzipped. On a 1 Mbps mobile connection, that is 320ms vs 64ms for just the transfer.
Real example — WhatsApp:
WhatsApp was designed for 2G networks in developing markets. Every design decision — message batching, binary encoding (not JSON), aggressive caching, optimistic UI — reflects the constraint of a slow, unreliable connection. Most Western apps designed for fast WiFi perform terribly in these conditions.
Question 10: "Walk me through how you would design the API layer for a healthcare platform handling patient data."
What They're Testing
Whether you connect API design to security, compliance, and real-world consequences — not just technical correctness.
The Answer
Healthcare APIs are not just slower CRUD. PHI (Protected Health Information) has legal protection under HIPAA, and an API design mistake can expose it.
Authentication: short-lived tokens, no API keys:
Long-lived API keys are a HIPAA risk — if leaked, they provide persistent access to patient data. Use short-lived JWTs (15-minute expiry) with refresh tokens stored in httpOnly cookies (not localStorage — XSS can steal localStorage).
Access token: 15 minute expiry — used on every API call
Refresh token: 7 day expiry — httpOnly cookie, used to get new access tokens
Authorisation: resource-level ownership, not just role:
A doctor can see their own patients' appointments. They cannot see another doctor's patient records. A patient can see their own data only.
// Not enough — only checks role
[Authorize(Roles = "Patient")]
public async Task<IActionResult> GetMedicalRecord(string recordId) { ... }

// Correct — checks role AND ownership
[Authorize(Roles = "Patient")]
public async Task<IActionResult> GetMedicalRecord(string recordId)
{
    var record = await _recordService.GetAsync(recordId);
    if (record is null) return NotFound();

    var patientId = User.FindFirst("patient_id")?.Value;
    // Critical: verify ownership before returning
    if (record.PatientId != patientId)
        return Forbid(); // 403; return NotFound() instead if even the record's existence is sensitive
    return Ok(record);
}
Audit log every PHI access:
HIPAA requires logging who accessed PHI, when, from where, and what they did.
@app.middleware("http")
async def phi_audit_middleware(request: Request, call_next):
    response = await call_next(request)
    # Log every access to patient data endpoints
    phi_paths = ['/patients/', '/appointments/', '/medical-records/']
    if any(p in request.url.path for p in phi_paths):
        await audit_service.log({
            'user_id': extract_user_id(request),
            'resource': request.url.path,
            'method': request.method,
            'status': response.status_code,
            'ip_address': request.client.host,
            'timestamp': datetime.utcnow().isoformat(),
            'session_id': extract_session_id(request)
        })
    return response
Minimum necessary data — return only what is needed:
HIPAA's minimum necessary standard: don't return PHI fields the caller has no reason to need.
# Admin user viewing their list — needs name, date, status
# Does NOT need diagnosis, notes, medication
class AppointmentSummaryDto(BaseModel):
    id: str
    date: str
    provider_name: str
    status: str
    # No PHI fields

# Provider viewing their patient — needs clinical context
class AppointmentDetailDto(BaseModel):
    id: str
    date: str
    patient_name: str
    reason: str
    notes: str
    # PHI included because the provider needs it for care
The answer to tie it together:
"For a healthcare API I layer it: TLS everywhere enforced at the infrastructure level, short-lived JWTs with refresh via httpOnly cookie, authorisation at the resource level not just the route, an audit middleware that logs every PHI access to CloudTrail or a tamper-proof log store, and response DTOs designed on minimum-necessary so callers only receive the fields they legitimately need. The design is informed by HIPAA's technical safeguards — it's not just best practice, it's the legal requirement."
Quick-Fire Interview Answers
"What is the difference between 401 and 403?"
"401 Unauthorised means your credentials are missing or invalid — you have not authenticated. 403 Forbidden means you are authenticated but not allowed to do this specific thing. A request with no token gets 401. A patient requesting another patient's record gets 403."
"What is HATEOAS?"
"Hypermedia as the Engine of Application State — a REST constraint where responses include links to related actions. An appointment response includes links to cancel, reschedule, or view the provider. Clients navigate the API by following links rather than constructing URLs. Rarely implemented in practice."
"REST vs RPC?"
"REST is resource-oriented — you operate on nouns (appointments, patients). RPC is action-oriented — you call functions (bookAppointment, cancelAppointment). REST is better for external APIs because it is predictable and self-describing. RPC (especially gRPC) is better for internal service communication where you want typed contracts."
"What is an API gateway and why do you need one?"
"An API gateway sits in front of your services and handles cross-cutting concerns: authentication, rate limiting, SSL termination, routing, logging, and request transformation. Without it, every service implements these independently. With it, services only handle business logic. AWS API Gateway, Azure API Management, and Kong are common choices."