API Gateway in Production: Real Examples from Real Companies
How Uber, Stripe, Netflix, GitHub, Slack, and healthcare platforms actually use API Gateway. Real request flows, real policy configs, real failures, and the exact code that fixed them.
Why Real Examples Matter
You can read every AWS doc and still freeze when an interviewer asks: "Tell me about a time API Gateway caused a production issue." Real examples give you the mental model to both explain concepts and reason about new problems you have never seen before.
Every example in this article follows the same structure:
Company / Scenario
→ The problem they had
→ The API Gateway feature that solved it
→ The exact flow (with code)
→ What breaks if you get it wrong

EXAMPLE 1: Stripe – Webhook Delivery at Scale
The Problem
Stripe sends webhooks to millions of merchant endpoints when payments succeed, fail, or are disputed. A merchant's server might be slow, down for maintenance, or returning 500 errors. Stripe cannot lose these events – a missed "payment succeeded" webhook means a merchant never ships the customer's order.
Stripe's webhook infrastructure is essentially a massive API Gateway problem: route one event to one endpoint, guarantee delivery, handle failures gracefully.
The Flow
Payment event occurs (charge.succeeded)
        ↓
Stripe internal event bus
        ↓
Webhook delivery service (API Gateway equivalent):

1. POST https://merchant.com/webhooks/stripe
   Headers:
     Stripe-Signature: t=1745000000,v1=abc123...   ← HMAC signature
     Content-Type: application/json
   Body: { "type": "charge.succeeded", "data": { ... } }
        ↓
2. Merchant server validates signature:
     secret = "whsec_abc123..."
     expected = HMAC-SHA256(secret, timestamp + "." + body)
     if expected != header_signature → return 400 (reject)
     else → process the event → return 200
        ↓
3. If merchant returns non-200:
     Retry with exponential backoff:
     5s → 30s → 1min → 5min → 30min → 2h → 5h → 10h → 24h → 48h → 72h
     Total: 11 attempts over 72 hours
        ↓
4. If all retries fail:
     Event moves to dead letter queue
     Merchant is notified via email and dashboard
     Merchant can manually replay from the Stripe dashboard

What You Learn From This
HMAC signature validation is your API Gateway's equivalent of a Lambda Authoriser for unauthenticated webhooks. Your endpoint cannot use JWT (Stripe does not log in to your system). Instead:
- Stripe signs the payload with a secret you share
- You recompute the signature on your side
- If they match, the request genuinely came from Stripe
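That recompute-and-compare step can be sketched with Python's standard library. This is an illustrative sketch of what Stripe's SDK does for you – the secret and payload below are made-up values, and it is not Stripe's actual implementation:

```python
import hashlib
import hmac

def verify_stripe_signature(payload: str, sig_header: str, secret: str) -> bool:
    """Recompute the v1 signature and compare it in constant time."""
    # Header format: "t=<timestamp>,v1=<hex signature>"
    parts = dict(item.split('=', 1) for item in sig_header.split(','))
    signed_payload = f"{parts['t']}.{payload}".encode()
    expected = hmac.new(secret.encode(), signed_payload, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through comparison timing
    return hmac.compare_digest(expected, parts['v1'])
```

In production you would additionally reject timestamps older than a few minutes, which blocks replay of captured webhook requests.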
# Correct webhook validation in your Lambda
import json

import stripe

def handler(event, context):
    # Never skip this check – without it, anyone can fake a payment event
    stripe_sig = event['headers'].get('stripe-signature', '')
    payload = event['body']
    webhook_secret = get_secret('mybcat/prod/stripe_webhook_secret')
    try:
        stripe.WebhookSignature.verify_header(payload, stripe_sig, webhook_secret)
    except stripe.error.SignatureVerificationError:
        return {'statusCode': 400, 'body': 'Invalid signature'}

    # Safe to process – this genuinely came from Stripe
    event_data = json.loads(payload)
    if event_data['type'] == 'invoice.payment_succeeded':
        activate_practice_subscription(event_data['data']['object'])
    return {'statusCode': 200}

Without signature validation:
A bad actor discovers your webhook URL. They POST a fake invoice.payment_succeeded event. Your system activates a subscription that was never paid for. HMAC validation costs 0.1ms and prevents this entirely.
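Stripe's retry schedule above is hand-tuned, but its shape is a capped exponential. A generic sketch of that shape (an approximation, not Stripe's exact intervals):

```python
def backoff_schedule(base_seconds: int, cap_seconds: int, attempts: int) -> list:
    """Delay doubles on each retry and is clamped at the cap."""
    return [min(base_seconds * (2 ** i), cap_seconds) for i in range(attempts)]
```

Real delivery systems usually add random jitter on top, so thousands of failed deliveries to the same dead endpoint do not retry in lockstep.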
EXAMPLE 2: Uber – Real-Time Driver Location via WebSocket
The Problem
Uber's rider app shows your driver moving on a map in real time. The driver's location updates every 4 seconds. With 5 million concurrent trips, polling – where the rider app asks "where is my driver?" every 4 seconds – would be 5 million × 15 requests/minute = 75 million requests/minute. Expensive, slow, and wasteful.
WebSocket changes this: the driver app sends location updates, the server pushes them to the specific rider app. No polling.
The Flow
Driver app (phone):
  Sends location every 4 seconds via WebSocket:
  { "type": "location_update", "lat": 51.507, "lng": -0.127, "tripId": "trip_789" }
        ↓
Driver WebSocket → API Gateway WebSocket API → location-update Lambda

Lambda:
  1. Gets driver's connectionId from event
  2. Gets tripId from message
  3. Writes to DynamoDB: { PK: "TRIP#trip_789", lat: 51.507, lng: -0.127, updatedAt: now }
  4. Gets rider's connectionId for this trip from DynamoDB
  5. Pushes location to rider via API Gateway Management API
        ↓
Rider app (phone):
  Receives pushed message: { "lat": 51.507, "lng": -0.127, "eta": "4 min" }
  Map updates → driver icon moves
  No request was made by the rider app

Full Code – WebSocket Location Push
# location_update Lambda – runs when driver sends location
import json
import time

import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('uber-trips')

def handler(event, context):
    connection_id = event['requestContext']['connectionId']
    body = json.loads(event['body'])
    trip_id = body['tripId']
    lat = body['lat']
    lng = body['lng']

    # Store latest driver location
    table.update_item(
        Key={'PK': f'TRIP#{trip_id}', 'SK': 'LOCATION'},
        UpdateExpression='SET lat = :lat, lng = :lng, updatedAt = :ts',
        ExpressionAttributeValues={
            ':lat': str(lat), ':lng': str(lng),
            ':ts': int(time.time())
        }
    )

    # Get rider's WebSocket connection for this trip
    trip = table.get_item(
        Key={'PK': f'TRIP#{trip_id}', 'SK': 'METADATA'}
    ).get('Item')
    if not trip:
        return {'statusCode': 200}  # unknown trip – nothing to push
    rider_connection = trip.get('riderConnectionId')
    if not rider_connection:
        return {'statusCode': 200}  # rider app not connected right now

    # Push location to rider – no rider request needed
    apigw = boto3.client(
        'apigatewaymanagementapi',
        endpoint_url=f"https://{event['requestContext']['domainName']}/{event['requestContext']['stage']}"
    )
    try:
        apigw.post_to_connection(
            ConnectionId=rider_connection,
            Data=json.dumps({
                'type': 'driver_location',
                'lat': lat,
                'lng': lng,
                'tripId': trip_id
            })
        )
    except apigw.exceptions.GoneException:
        # Rider's app was backgrounded or closed – remove stale connection
        table.update_item(
            Key={'PK': f'TRIP#{trip_id}', 'SK': 'METADATA'},
            UpdateExpression='REMOVE riderConnectionId'
        )
    return {'statusCode': 200}

What Breaks Without GoneException Handling
The rider closes their app. The WebSocket connection is gone. Your Lambda still tries to post_to_connection to the closed connectionId. Without catching GoneException, Lambda throws an unhandled exception, the error propagates, and DynamoDB has a stale connectionId that causes errors on every future location update for that trip.
Always catch GoneException and clean up stale connections.
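The bookkeeping behind that cleanup can be modelled without AWS. Here a plain dict stands in for the DynamoDB trip item, and the handlers mirror – in simplified, hypothetical form – what $connect/$disconnect routes would do:

```python
# In-memory model of WebSocket connection bookkeeping.
# A plain dict stands in for the DynamoDB trip METADATA item.
trips = {}

def on_connect(trip_id: str, connection_id: str) -> None:
    """$connect: remember which connection belongs to this trip's rider."""
    trips.setdefault(trip_id, {})['riderConnectionId'] = connection_id

def on_disconnect(trip_id: str) -> None:
    """$disconnect (or a GoneException): drop the stale connection id."""
    trips.get(trip_id, {}).pop('riderConnectionId', None)

def rider_connection(trip_id: str):
    """Where the location-update handler looks before pushing."""
    return trips.get(trip_id, {}).get('riderConnectionId')
```

The invariant is the same as in the real system: a connection id is only ever read after the most recent connect/disconnect event for that trip, so a push never targets a connection known to be dead.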
EXAMPLE 3: Netflix – Authoriser Caching Saves Millions of Dollars
The Problem
Netflix has 250 million subscribers. Every API call includes a JWT. Without authoriser caching:

250 million users × 50 API calls per viewing session
= 12.5 billion JWT validation calls per day
= ~145,000 Lambda authoriser invocations per second, averaged over the day
At $0.0000002/invocation:
= $2,500/day just for JWT validation
= ~$900,000/year

The Solution: Authoriser Cache TTL = 300 Seconds
First request with token X:
  API Gateway calls Authoriser Lambda
  Lambda validates JWT, returns Allow policy
  API Gateway caches: { tokenHash: "abc123", policy: Allow, userId: "u123", until: now+300s }

Next 1,000 requests with token X within 5 minutes:
  API Gateway checks cache → hit → returns Allow immediately
  Authoriser Lambda: NOT called
  Latency: 0ms additional
  Cost: $0 for those 1,000 validations

Reality: at a 300s cache TTL, most JWTs are validated once per 5-minute window.

# Netflix-style Lambda Authoriser with context enrichment
import jwt  # PyJWT

def handler(event, context):
    token = event['authorizationToken'].replace('Bearer ', '')
    # Validate JWT locally – no external call, fast
    try:
        payload = jwt.decode(
            token,
            get_public_key(),  # cached in Lambda memory after first fetch
            algorithms=['RS256'],
            audience='netflix-api'
        )
    except jwt.ExpiredSignatureError:
        raise Exception('Unauthorized')
    except jwt.InvalidTokenError:
        raise Exception('Unauthorized')

    # Return policy + rich context – downstream Lambdas get this for free
    # No database call needed in downstream Lambda to get user details
    return {
        'principalId': payload['sub'],
        'policyDocument': allow_policy(event['methodArn']),
        'context': {
            'userId': payload['sub'],
            'plan': payload.get('plan', 'standard'),       # basic/standard/premium
            'region': payload.get('region', 'us-east-1'),  # for content licensing
            'deviceId': payload.get('device_id', ''),
        }
    }

What downstream Lambda receives:
def get_recommended_content(event, context):
    # These came from the authoriser context – zero extra validation needed
    user_id = event['requestContext']['authorizer']['userId']
    plan = event['requestContext']['authorizer']['plan']
    region = event['requestContext']['authorizer']['region']

    if plan == 'basic':
        return get_sd_content(user_id, region)   # SD only for basic plan
    elif plan == 'standard':
        return get_hd_content(user_id, region)   # HD for standard
    else:
        return get_4k_content(user_id, region)   # 4K for premium

No database call in the recommendation Lambda to determine the user's plan. The authoriser already did that work, cached it, and passed it forward.
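The cost arithmetic above generalises to a one-liner. A sketch using the figures quoted in this article (illustrative, not official Netflix numbers):

```python
def authoriser_cost_per_day(validations_per_day: float,
                            price_per_invocation: float,
                            cache_hit_ratio: float = 0.0) -> float:
    """Only cache misses actually invoke the authoriser Lambda."""
    misses = validations_per_day * (1 - cache_hit_ratio)
    return misses * price_per_invocation
```

At a 99% cache hit ratio, the $2,500/day bill from the calculation above drops to roughly $25/day, before even counting the latency removed from every cached request.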
EXAMPLE 4: GitHub – API Versioning That Does Not Break Integrations
The Problem
GitHub's REST API v3 has been in production since the early 2010s. Millions of integrations depend on it. In 2016, GitHub launched its GraphQL API v4 – a complete redesign. They cannot deprecate v3 overnight without breaking:
- Every CI/CD tool
- Every GitHub Action
- Every developer tool
- Every enterprise integration
Their Versioning Strategy
https://api.github.com/repos/{owner}/{repo} → REST API v3 (still works)
https://api.github.com/graphql              → GraphQL API v4
Both routes live behind the same API Gateway.
Both are maintained simultaneously.
v3 gets security patches but no new features.
v4 gets all new features.
Migration path:
- Deprecation notices added to v3 response headers
- Documentation clearly states v4 is the future
- Tooling ecosystem gradually migrates over years
- v3 never forcibly removed (too many integrations depend on it)

How You Implement This with API Gateway
# Route v3 to legacy Lambda
resource "aws_apigatewayv2_route" "repos_v3" {
  api_id    = aws_apigatewayv2_api.main.id
  route_key = "GET /repos/{owner}/{repo}"
  target    = "integrations/${aws_apigatewayv2_integration.repos_v3.id}"
}

# Route v4 to new Lambda
resource "aws_apigatewayv2_route" "graphql" {
  api_id    = aws_apigatewayv2_api.main.id
  route_key = "POST /graphql"
  target    = "integrations/${aws_apigatewayv2_integration.graphql.id}"
}

# v3 Lambda – adds deprecation headers to every response
def handler(event, context):
    data = get_repo_data(
        event['pathParameters']['owner'],
        event['pathParameters']['repo']
    )
    return {
        'statusCode': 200,
        'headers': {
            'Content-Type': 'application/json',
            'Deprecation': 'true',
            'Sunset': 'Sat, 01 Jan 2027 00:00:00 GMT',
            'Link': '<https://docs.github.com/graphql>; rel="successor-version"'
        },
        'body': json.dumps(data)
    }

The Deprecation header tells API clients: this version has an end date. Tooling that reads HTTP headers (like the GitHub CLI itself) can warn developers automatically: "You are using a deprecated API endpoint. Please migrate to the GraphQL API."
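Client-side tooling can act on those headers with a small check. A hedged sketch – the header names match the example response above, but the warning text is illustrative:

```python
def deprecation_warning(headers: dict):
    """Return a warning string if the response signals deprecation, else None."""
    # HTTP header names are case-insensitive; normalise before lookup
    h = {k.lower(): v for k, v in headers.items()}
    if h.get('deprecation') != 'true':
        return None
    warning = "You are using a deprecated API endpoint."
    if 'sunset' in h:
        warning += f" It will be removed after {h['sunset']}."
    return warning
```

A CLI would run this on every response and print the warning to stderr, so migration pressure builds without ever breaking a script.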
EXAMPLE 5: Slack – Rate Limiting That Protects Fair Use
The Problem
Slack's API is used by 500,000 third-party apps. Some are well-behaved, sending a few requests per minute. Some are poorly built – infinite loops, missing exponential backoff, polling every 100ms. A single misbehaving app should not be able to degrade the API for everyone else.
Their Rate Limiting Design
Tier 1 – Infrequent operations (workspace settings, user info):
  1 request/minute
  Use case: read configuration that rarely changes
Tier 2 – Standard operations (post message, list channels):
  20 requests/minute
  Use case: normal bot interaction
Tier 3 – High-frequency operations (read messages, list users):
  50 requests/minute
  Use case: dashboard apps that poll for updates
Tier 4 – Special endpoints (RTM, Events API):
  100 requests/minute
  Use case: real-time event processing

All tiers are per-token (per-app, per-workspace)
→ One bad app cannot affect any other app
→ Limits apply independently per workspace
  (App A in Workspace X does not share quota with App A in Workspace Y)

When limit exceeded:
HTTP 429 Too Many Requests
Retry-After: 30                  ← client must wait 30 seconds before retrying
X-Rate-Limit-Limit: 20
X-Rate-Limit-Remaining: 0
X-Rate-Limit-Reset: 1745000060

How You Build This With API Gateway
AWS REST API – Usage Plans:

# Terraform – create usage plans for different integration tiers
resource "aws_api_gateway_usage_plan" "standard_integration" {
  name = "standard-integration"

  api_stages {
    api_id = aws_api_gateway_rest_api.main.id
    stage  = aws_api_gateway_stage.prod.stage_name
  }

  throttle_settings {
    burst_limit = 10   # short-term burst capacity (token bucket)
    rate_limit  = 20   # requests per second sustained
  }

  quota_settings {
    limit  = 10000     # requests per day
    period = "DAY"
  }
}

resource "aws_api_gateway_usage_plan" "enterprise_integration" {
  name = "enterprise-integration"

  throttle_settings {
    burst_limit = 100
    rate_limit  = 200
  }

  quota_settings {
    limit  = 1000000
    period = "DAY"
  }
}

# Lambda – return proper 429 with Retry-After
def handler(event, context):
    practice_id = event['requestContext']['authorizer']['practice_id']

    # Check rate limit in Redis (for sub-second granularity)
    key = f'rate_limit:{practice_id}:post_message'
    current = redis.incr(key)
    if current == 1:
        redis.expire(key, 60)  # 1-minute window

    if current > 20:  # 20 per minute limit
        return {
            'statusCode': 429,
            'headers': {
                'Retry-After': str(redis.ttl(key)),
                'X-Rate-Limit-Limit': '20',
                'X-Rate-Limit-Remaining': '0',
                'X-Rate-Limit-Reset': str(int(time.time()) + redis.ttl(key))
            },
            'body': json.dumps({'error': 'Rate limit exceeded', 'retryAfter': redis.ttl(key)})
        }

    # Proceed with normal processing
    return process_request(event)

What breaks without Retry-After header:
A poorly built client hits 429, does not understand it, retries immediately, hits 429 again, retries again – a tight retry loop that hammers your API. With Retry-After: 30, any HTTP client library that respects the standard automatically waits 30 seconds. The flood stops.
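The well-behaved client half of that contract can be sketched with the transport injected, so the retry logic stays testable (function names here are illustrative):

```python
def request_with_retry(send, sleep, max_attempts: int = 5):
    """Retry on 429, waiting exactly as long as Retry-After instructs.

    send()  -> (status_code, headers_dict, body) for one attempt
    sleep() -> injected so tests can record waits instead of blocking
    """
    for _ in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, body
        # Respect the server's hint instead of hammering it in a tight loop
        sleep(int(headers.get('Retry-After', '1')))
    return status, body
```

Production clients layer jitter and a retry budget on top, but the core rule is the one shown: a 429 response tells you exactly how long to wait, so use it.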
EXAMPLE 6: Healthcare Platform – HIPAA-Compliant API Gateway
The Problem
A healthcare SaaS serves 500 medical practices. Each practice's patient data must be completely isolated. An agent at Practice A must never be able to access Practice B's records – even accidentally, even if they guess a patient ID.
The Architecture
Request arrives:
  GET /patients/pat_12345/appointments
  Authorization: Bearer eyJhbGc...

Step 1 – API Gateway JWT Authoriser:
  Validates JWT signature (Cognito public key)
  Extracts claims:
    practice_id: "practice_p001"
    role: "scheduling_agent"
    user_id: "user_abc"
  Returns Allow + context { practice_id, role, user_id }

Step 2 – Lambda receives enriched event:
  event.requestContext.authorizer.practice_id = "practice_p001"
  event.pathParameters.patientId = "pat_12345"

Step 3 – Lambda isolation check (defence in depth):

authorizer = event['requestContext']['authorizer']
patient = dynamodb.get_item(PK=f"PATIENT#{patient_id}")
# Verify the patient belongs to the requesting practice
# This catches bugs where API Gateway context is accidentally wrong
if patient['practiceId'] != authorizer['practice_id']:
    logger.warning("Cross-practice access attempt",
                   extra={'userId': authorizer['user_id'],
                          'requestedPatient': patient_id,
                          'agentPractice': authorizer['practice_id'],
                          'patientPractice': patient['practiceId']})
    return {'statusCode': 403, 'body': json.dumps({'error': 'Forbidden'})}
return {'statusCode': 200, 'body': json.dumps(sanitise_patient(patient))}

What sanitise_patient Looks Like
def sanitise_patient(patient: dict) -> dict:
    """
    Remove fields the API should never return.
    Even if someone adds a sensitive field to DynamoDB,
    it will not appear in API responses unless explicitly added here.
    """
    ALLOWED_FIELDS = {
        'id', 'firstName', 'lastName', 'dateOfBirth',
        'phone', 'email', 'insuranceProvider',
        'nextAppointment', 'lastVisit'
    }
    return {k: v for k, v in patient.items() if k in ALLOWED_FIELDS}

Why whitelist, not blacklist:
If you blacklist fields (exclude ssn, internalNotes), every new field added to DynamoDB is automatically exposed in the API until someone remembers to add it to the blacklist. One forgotten field = HIPAA breach. Whitelisting means new fields are never exposed until explicitly approved.
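The difference is easy to demonstrate. Suppose someone adds a new sensitive field to the record without telling the API team (field names here are illustrative):

```python
ALLOWED_FIELDS = {'id', 'firstName', 'lastName'}
BLACKLISTED_FIELDS = {'ssn', 'internalNotes'}

def whitelist(patient: dict) -> dict:
    """Only explicitly approved fields survive."""
    return {k: v for k, v in patient.items() if k in ALLOWED_FIELDS}

def blacklist(patient: dict) -> dict:
    """Everything survives unless someone remembered to exclude it."""
    return {k: v for k, v in patient.items() if k not in BLACKLISTED_FIELDS}

# A new sensitive column appears without the API team's knowledge:
record = {'id': 'pat_1', 'firstName': 'A', 'lastName': 'B', 'geneticRiskScore': 0.9}
```

The whitelist silently drops the new field; the blacklist silently leaks it. The failure mode of a whitelist is a missing harmless field, the failure mode of a blacklist is a breach.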
The CloudWatch Alarm That Catches Attacks
# CloudWatch metric filter – detects cross-practice access attempts
resource "aws_cloudwatch_log_metric_filter" "cross_practice_attempt" {
  name           = "cross-practice-access-attempt"
  pattern        = "Cross-practice access attempt"
  log_group_name = "/aws/lambda/get-patient"

  metric_transformation {
    name      = "CrossPracticeAttempts"
    namespace = "MyBCAT/Security"
    value     = "1"
  }
}

resource "aws_cloudwatch_metric_alarm" "cross_practice_alert" {
  alarm_name          = "cross-practice-access-detected"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 1
  metric_name         = "CrossPracticeAttempts"
  namespace           = "MyBCAT/Security"
  period              = 300
  statistic           = "Sum"
  threshold           = 0   # ANY attempt triggers the alarm
  alarm_actions       = [aws_sns_topic.security_alerts.arn]
  alarm_description   = "An agent attempted to access another practice's patient data"
}

Zero tolerance. One cross-practice access attempt and the security team is paged immediately. This is HIPAA audit logging in action.
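Reduced to plain logic, the filter-plus-alarm pipeline is just this (a model of the behaviour, not the CloudWatch implementation):

```python
def count_matches(log_lines, pattern: str) -> int:
    """Metric filter: emit 1 for every log line containing the pattern."""
    return sum(1 for line in log_lines if pattern in line)

def alarm_fires(metric_sum: int, threshold: int = 0) -> bool:
    """GreaterThanThreshold with threshold 0: any attempt at all pages the team."""
    return metric_sum > threshold
```

The zero threshold is the point: this metric should always be exactly zero, so the alarm needs no tuning and a single match is actionable.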
EXAMPLE 7: E-Commerce – CORS in Production
The Problem
An e-commerce company runs:
  shop.example.com  – the customer-facing storefront
  api.example.com   – the backend API
  admin.example.com – the merchant dashboard
Their React frontend on shop.example.com makes API calls to api.example.com. For months everything works. Then marketing launches a promotion widget hosted on promo.example.com. The widget needs to call the same API. Suddenly: CORS errors.
The Exact Error (What Appears in the Browser Console)
Access to fetch at 'https://api.example.com/products' from origin
'https://promo.example.com' has been blocked by CORS policy:
No 'Access-Control-Allow-Origin' header is present on the requested resource.

The Wrong Fix and The Right Fix
# WRONG FIX – engineers add wildcard to make it work
allow_origins = ["*"]
# This works, but now ANY website can call your API from a browser.
# Worse, if origins are later reflected alongside Allow-Credentials: true,
# a malicious site can make authenticated requests on a user's behalf – a CSRF-style attack

# RIGHT FIX – explicit list of allowed origins
allow_origins = [
  "https://shop.example.com",
  "https://admin.example.com",
  "https://promo.example.com"
]
# Each new origin goes through a review process before being added

The Preflight in Detail
Browser on promo.example.com tries to call api.example.com:

1. Browser sends preflight OPTIONS (automatically, before your fetch):
   OPTIONS https://api.example.com/products
   Origin: https://promo.example.com
   Access-Control-Request-Method: GET
   Access-Control-Request-Headers: Authorization

2. API Gateway checks: is promo.example.com in allow_origins?
   YES → respond with permissions:
     HTTP 200
     Access-Control-Allow-Origin: https://promo.example.com
     Access-Control-Allow-Methods: GET, POST
     Access-Control-Allow-Headers: Authorization
     Access-Control-Max-Age: 86400   ← browser caches this for 24 hours:
                                       no preflight needed for 24 hours
                                       for the same origin+method+headers combo
   NO → respond with:
     HTTP 403 (or just no CORS headers)
     → Browser blocks the actual request
     → Developer sees the CORS error

3. Browser receives permission → sends actual GET request
   GET https://api.example.com/products
   Origin: https://promo.example.com
   Authorization: Bearer eyJ...

Access-Control-Max-Age: 86400 is important for performance. Without it, every single API call triggers a preflight OPTIONS request first – doubling the number of requests. With a 24-hour cache, the preflight happens once per day per unique origin/method/header combination.
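The gateway's preflight decision is essentially a pure function over the explicit allow-list. A sketch (real gateways also match the requested method and headers against configuration):

```python
ALLOW_ORIGINS = {
    "https://shop.example.com",
    "https://admin.example.com",
    "https://promo.example.com",
}

def preflight_response(origin: str) -> dict:
    """Answer an OPTIONS preflight: echo the origin only if it is allowed."""
    if origin not in ALLOW_ORIGINS:
        return {'statusCode': 403, 'headers': {}}  # browser blocks the real call
    return {
        'statusCode': 200,
        'headers': {
            'Access-Control-Allow-Origin': origin,  # echo, never "*", for credentialed APIs
            'Access-Control-Allow-Methods': 'GET, POST',
            'Access-Control-Allow-Headers': 'Authorization',
            'Access-Control-Max-Age': '86400',
        },
    }
```

Echoing the specific origin rather than "*" is what lets the browser send Authorization headers while still confining API access to the reviewed list.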
EXAMPLE 8: Twilio – Async API Pattern for Long Operations
The Problem
Twilio lets you send bulk SMS to 100,000 numbers simultaneously. Each message involves: number lookup, carrier routing, rate compliance per country, and queuing with telco partners. Processing 100,000 messages takes minutes – far beyond API Gateway's 29-second timeout.
The Pattern: Accept → Process Async → Poll
Step 1: Client submits bulk job
POST https://api.twilio.com/bulk-messages
Body: { "messages": [...100,000 messages...] }
        ↓
Lambda (runs in < 1 second):
1. Validates the request (auth, format, account limits)
2. Writes job to DynamoDB: { jobId: "job_abc", status: "QUEUED", total: 100000 }
3. Publishes to SQS queue for async processing
4. Returns immediately:
HTTP 202 Accepted
{ "jobId": "job_abc", "statusUrl": "/bulk-messages/job_abc/status" }
Step 2: Background processing (happens over next few minutes)
SQS → Lambda workers (N parallel):
Each worker processes a batch of messages
Updates DynamoDB: { processed: 45000, failed: 12, status: "IN_PROGRESS" }
Step 3: Client polls for status
GET https://api.twilio.com/bulk-messages/job_abc/status
        ↓
Lambda (runs in < 100ms):
Reads DynamoDB: { processed: 67000, failed: 18, status: "IN_PROGRESS" }
Returns:
HTTP 200
{ "status": "IN_PROGRESS", "processed": 67000, "total": 100000, "failed": 18 }
Step 4: Job completes
GET .../status
Returns:
HTTP 200
{ "status": "COMPLETED", "processed": 100000, "failed": 23,
  "failedNumbers": [...], "completedAt": "2026-04-21T14:23:01Z" }

The Code
# Submission Lambda – fast, returns 202
import json
import uuid
from datetime import datetime

def submit_bulk_job(event, context):
    body = json.loads(event['body'])
    user_id = event['requestContext']['authorizer']['userId']

    # Validate
    if len(body['messages']) > 1_000_000:
        return {'statusCode': 400, 'body': json.dumps({'error': 'Max 1,000,000 messages per job'})}

    # Create job record
    job_id = str(uuid.uuid4())
    dynamodb.put_item(Item={
        'PK': f'JOB#{job_id}',
        'userId': user_id,
        'status': 'QUEUED',
        'total': len(body['messages']),
        'processed': 0,
        'failed': 0,
        'submittedAt': datetime.utcnow().isoformat()
    })

    # Queue for async processing – batched to stay under the SQS 256KB limit
    batches = chunk(body['messages'], size=1000)  # chunk(): helper that splits the list
    for i, batch in enumerate(batches):
        sqs.send_message(
            QueueUrl=PROCESSING_QUEUE_URL,
            MessageBody=json.dumps({'jobId': job_id, 'batchIndex': i, 'messages': batch})
        )

    # Return immediately – do not wait for processing
    return {
        'statusCode': 202,
        'body': json.dumps({
            'jobId': job_id,
            'status': 'QUEUED',
            'statusUrl': f'/bulk-messages/{job_id}/status'
        })
    }

# Status Lambda – fast read from DynamoDB
def get_job_status(event, context):
    job_id = event['pathParameters']['jobId']
    user_id = event['requestContext']['authorizer']['userId']

    job = dynamodb.get_item(Key={'PK': f'JOB#{job_id}'}).get('Item')
    if not job:
        return {'statusCode': 404, 'body': json.dumps({'error': 'Job not found'})}

    # Prevent one user seeing another user's job
    if job['userId'] != user_id:
        return {'statusCode': 403, 'body': json.dumps({'error': 'Forbidden'})}

    return {'statusCode': 200, 'body': json.dumps({
        'jobId': job_id,
        'status': job['status'],
        'total': job['total'],
        'processed': job['processed'],
        'failed': job['failed'],
        'completedAt': job.get('completedAt')
    })}

React client with automatic polling:
function BulkMessageStatus({ jobId }: { jobId: string }) {
  const { data } = useQuery({
    queryKey: ['bulk-job', jobId],
    queryFn: () => fetchJobStatus(jobId),
    // Polls every 2 seconds while in progress, stops when done
    refetchInterval: (data) =>
      data?.status === 'COMPLETED' || data?.status === 'FAILED' ? false : 2000,
  });

  if (!data) return <Spinner />;
  if (data.status === 'COMPLETED') return <SuccessPanel job={data} />;
  if (data.status === 'FAILED') return <ErrorPanel job={data} />;

  return (
    <ProgressBar
      value={data.processed}
      max={data.total}
      label={`${data.processed.toLocaleString()} / ${data.total.toLocaleString()} sent`}
    />
  );
}

React Query automatically stops polling when the job completes. No clearInterval needed. No memory leaks.
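The same poll-until-done loop, sketched in Python with the fetcher and sleeper injected so it can run anywhere (names are illustrative; a real client would GET the statusUrl over HTTP):

```python
def poll_until_done(fetch_status, sleep, interval_seconds: int = 2, max_polls: int = 1000):
    """Poll a job-status endpoint until it reaches a terminal state.

    fetch_status() -> dict like {'status': 'QUEUED' | 'IN_PROGRESS' | 'COMPLETED' | 'FAILED', ...}
    sleep()        -> injected so tests can record waits instead of blocking
    """
    for _ in range(max_polls):
        job = fetch_status()
        if job['status'] in ('COMPLETED', 'FAILED'):
            return job
        sleep(interval_seconds)
    raise TimeoutError('job did not finish within the polling budget')
```

The max_polls budget matters: a job stuck in IN_PROGRESS forever should surface as an error in the client, not as an infinite silent loop.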
EXAMPLE 9: Azure API Management – Policies in Production
Real Scenario: A Bank's Open Banking API
Under PSD2 (European banking regulation), banks must expose APIs for third-party financial apps. HSBC's Open Banking API must:
- Rate limit per registered application
- Validate the client certificate (mTLS) – not just a JWT
- Transform responses to remove internal account codes
- Add mandatory regulatory headers
- Log every request with a correlation ID for audit
This goes beyond what AWS API Gateway expresses natively. Azure API Management handles all of it via policies:
<!-- APIM Policy – applied to every request -->
<policies>
    <!-- ① INBOUND – runs before the backend is called -->
    <inbound>
        <!-- Validate client certificate (mTLS) – PSD2 requires this -->
        <validate-client-certificate
            validate-revocation="true"
            validate-trust="true"
            validate-not-before="true"
            validate-not-after="true" />

        <!-- Rate limit per registered TPP (Third Party Provider) application -->
        <rate-limit-by-key
            calls="500"
            renewal-period="60"
            counter-key="@(context.Request.Certificate?.Thumbprint ?? context.Request.IpAddress)"
            increment-condition="@(context.Response.StatusCode < 500)" />

        <!-- Generate correlation ID for audit trail (regulatory requirement) -->
        <set-header name="X-Correlation-Id" exists-action="skip">
            <value>@(Guid.NewGuid().ToString())</value>
        </set-header>

        <!-- Validate the OAuth2 token -->
        <validate-jwt header-name="Authorization"
                      failed-validation-httpcode="401"
                      failed-validation-error-message="Invalid or expired access token">
            <openid-config url="https://login.hsbc.com/.well-known/openid-configuration" />
            <required-claims>
                <claim name="scope" match="any">
                    <value>accounts:read</value>
                    <value>payments:write</value>
                </claim>
            </required-claims>
        </validate-jwt>
    </inbound>

    <!-- ② BACKEND – the actual call to the backend service -->
    <backend>
        <retry condition="@(context.Response.StatusCode >= 500)"
               count="3"
               interval="2">
            <forward-request />
        </retry>
    </backend>

    <!-- ③ OUTBOUND – runs on the backend response -->
    <outbound>
        <!-- Remove internal account codes – clients see IBAN only -->
        <json-to-xml apply="always" consider-accept-header="false" />
        <xsl-transform>
            <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
                <!-- Identity transform: copy everything... -->
                <xsl:template match="@*|node()">
                    <xsl:copy><xsl:apply-templates select="@*|node()" /></xsl:copy>
                </xsl:template>
                <!-- ...except the internal account attributes -->
                <xsl:template match="@internalAccountCode|@legacySortCode" />
            </xsl:stylesheet>
        </xsl-transform>
        <xml-to-json kind="direct" apply="always" consider-accept-header="false" />

        <!-- Add mandatory PSD2 regulatory headers -->
        <set-header name="X-Request-Id" exists-action="override">
            <value>@(context.RequestId)</value>
        </set-header>
        <set-header name="X-Correlation-Id" exists-action="override">
            <value>@(context.Request.Headers.GetValueOrDefault("X-Correlation-Id", ""))</value>
        </set-header>
    </outbound>

    <!-- ④ ERROR – runs when the backend throws an exception -->
    <on-error>
        <!-- Never return raw backend errors to external clients -->
        <return-response>
            <set-status code="500" reason="Internal Server Error" />
            <set-body>@{
                return new JObject(
                    new JProperty("error", "An unexpected error occurred"),
                    new JProperty("correlationId", context.Request.Headers.GetValueOrDefault("X-Correlation-Id", ""))
                ).ToString();
            }</set-body>
        </return-response>
    </on-error>
</policies>

This handles mTLS, rate limiting, JWT validation, retry, response transformation, regulatory headers, and error sanitisation – all without any backend code change.
EXAMPLE 10: MyBCAT – The Complete API Gateway Setup
Putting it all together for the actual healthcare platform from the job description:
Request Flow: Agent Books an Appointment
Sarah (scheduling agent, Practice A) opens the MyBCAT dashboard
and books a patient appointment for Thursday 10am.
1. React app calls:
POST https://api.mybcat.com/appointments
Headers:
Authorization: Bearer eyJhbGciOiJSUzI1NiJ9... (Cognito JWT)
Content-Type: application/json
X-Idempotency-Key: client-generated-uuid (prevents double-submit)
Body: { "patientId": "pat_789", "slotId": "slot_20260424_1000", "type": "comprehensive_exam" }
2. CloudFront receives request:
- Is this origin allowed? → app.mybcat.com → yes
- WAF check: SQL injection in body? → no
- WAF check: IP rate limit exceeded? → no
- Forwards to API Gateway
3. API Gateway HTTP API:
- Route match: POST /appointments → yes, exists
- JWT validation: decode token, check Cognito signature → valid
- Token expiry check: exp = 1745003600, now = 1745000000 → not expired
- Extracts authorizer context:
{ practice_id: "practice_p001", role: "scheduling_agent", user_id: "user_sarah" }
- Invokes book-appointment Lambda
4. book-appointment Lambda:
a. Read idempotency key from header
b. Check DynamoDB: has this idempotency key been processed?
   → No → proceed
c. Check slot availability (ConsistentRead=True):
   slot.status = "available" → proceed
d. DynamoDB TransactWriteItems (atomic):
- Update slot: status "available" → "booked"
WITH ConditionExpression: status = "available" (prevents race condition)
- Create appointment record
- Write idempotency key with TTL=24h (prevents retry creating duplicate)
e. Publish to EventBridge:
{ type: "APPOINTMENT_BOOKED", practiceId: "practice_p001",
patientId: "pat_789", slotId: "slot_20260424_1000" }
f. Return:
HTTP 201 Created
{ "appointmentId": "appt_abc123", "status": "confirmed",
"date": "2026-04-24", "time": "10:00", "type": "comprehensive_exam" }
5. EventBridge routes event to 3 SQS queues simultaneously:
- PatientNotificationQueue → Lambda sends SMS confirmation to patient
- CRMQueue → Lambda updates HubSpot with appointment details
- AnalyticsQueue → Lambda increments practice's daily booking counter
6. React app receives 201:
- React Query invalidates ["appointments", practiceId] cache
- Calendar view refetches and shows the new appointment
- Sarah sees the booking confirmed instantly

What Happens When Sarah Accidentally Clicks "Book" Twice
First click (t=0ms):
  POST /appointments, X-Idempotency-Key: "key_abc123"
  Lambda checks DynamoDB: key_abc123 not found
  DynamoDB transaction succeeds: slot booked
  DynamoDB writes: { idempotency_key: "key_abc123", result: {...}, ttl: now+24h }
  Response: 201 Created { appointmentId: "appt_abc123" }

Second click (t=200ms – double-click):
  POST /appointments, X-Idempotency-Key: "key_abc123" (same key → same request)
  Lambda checks DynamoDB: key_abc123 FOUND
  Returns cached result immediately: 201 Created { appointmentId: "appt_abc123" }
  No DynamoDB transaction → no second booking

Result: one appointment, not two
Cost: one DynamoDB read for the duplicate
Sarah sees: "Appointment confirmed" – same result, no error

The Mental Model – What Every Example Has in Common
Looking across all ten examples, the same principles appear every time:
1. VALIDATE EARLY, FAIL FAST
   Bad tokens, bad signatures, bad schemas → reject at the gateway
   Never let invalid requests consume Lambda compute or database capacity
   (Stripe signature check, JWT validation, schema validation)

2. RETURN IMMEDIATELY FOR LONG OPERATIONS
   API Gateway timeout is 29 seconds
   Anything longer → 202 Accepted + async processing + polling endpoint
   (Twilio bulk SMS, any report generation, any batch operation)

3. PROTECT THE CRITICAL PATH WITH RESERVED CONCURRENCY
   Your booking Lambda must always have headroom
   Other Lambdas must not be able to starve it
   (Netflix auth, Uber location, healthcare booking)

4. ISOLATE FAILURES WITH QUEUES
   Fan-out via SNS → SQS means one consumer's failure is contained
   A slow CRM integration cannot delay a patient booking confirmation
   (MyBCAT post-booking, NHS patient events, Deliveroo dispatch)

5. NEVER EXPOSE INTERNAL ERRORS
   Stack traces, table names, column names → attacker intelligence
   Sanitise all error responses at the gateway or Lambda boundary
   Always include a correlationId so you can find the real error in CloudWatch
   (Healthcare HIPAA, Open Banking PSD2, every public API)

6. CACHE AGGRESSIVELY, INVALIDATE PRECISELY
   Authoriser cache: 300s → saves millions in Lambda invocations
   Response cache: minutes for stable data
   React Query: invalidate specific queryKeys on mutation, not the whole cache
   (Netflix auth, insurance plans, GitHub repo data)

These are not AWS or Azure specifics. They are the principles behind every well-designed API gateway, regardless of the cloud provider.