API Gateway in Production: Real Examples from Real Companies
How Uber, Stripe, Netflix, GitHub, Slack, and healthcare platforms actually use API Gateway. Real request flows, real policy configs, real failures, and the exact code that fixed them.
Why Real Examples Matter
You can read every AWS doc and still freeze when an interviewer asks: "Tell me about a time API Gateway caused a production issue." Real examples give you the mental model to both explain concepts and reason about new problems you have never seen before.
Every example in this article follows the same structure:
Company / Scenario
→ The problem they had
→ The API Gateway feature that solved it
→ The exact flow (with code)
→ What breaks if you get it wrong

EXAMPLE 1: Stripe – Webhook Delivery at Scale
The Problem
Stripe sends webhooks to millions of merchant endpoints when payments succeed, fail, or are disputed. A merchant's server might be slow, down for maintenance, or returning 500 errors. Stripe cannot lose these events – a missed "payment succeeded" webhook means a merchant never ships the customer's order.
Stripe's webhook infrastructure is essentially a massive API Gateway problem: route one event to one endpoint, guarantee delivery, handle failures gracefully.
The Flow
Payment event occurs (charge.succeeded)
        ↓
Stripe internal event bus
        ↓
Webhook delivery service (API Gateway equivalent):

1. POST https://merchant.com/webhooks/stripe
   Headers:
     Stripe-Signature: t=1745000000,v1=abc123...   ← HMAC signature
     Content-Type: application/json
   Body: { "type": "charge.succeeded", "data": { ... } }
        ↓
2. Merchant server validates signature:
     secret = "whsec_abc123..."
     expected = HMAC-SHA256(secret, timestamp + "." + body)
     if expected != header_signature → return 400 (reject)
     else → process the event → return 200
        ↓
3. If merchant returns non-200:
     Retry with exponential backoff:
     5s → 30s → 1min → 5min → 30min → 2h → 5h → 10h → 24h → 48h → 72h
     Total: 11 attempts over 72 hours
        ↓
4. If all retries fail:
     Event moves to dead letter queue
     Merchant is notified via email and dashboard
     Merchant can manually replay from the Stripe dashboard

What You Learn From This
HMAC signature validation is your API Gateway's equivalent of a Lambda Authoriser for unauthenticated webhooks. Your endpoint cannot use JWT (Stripe does not log in to your system). Instead:
- Stripe signs the payload with a secret you share
- You recompute the signature on your side
- If they match, the request genuinely came from Stripe
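That recompute-and-compare step can be sketched with Python's standard library. This is an illustrative sketch of what Stripe's SDK does for you – the secret and payload below are made-up values, and it is not Stripe's actual implementation:

```python
import hashlib
import hmac

def verify_stripe_signature(payload: str, sig_header: str, secret: str) -> bool:
    """Recompute the v1 signature and compare it in constant time."""
    # Header format: "t=<timestamp>,v1=<hex signature>"
    parts = dict(item.split('=', 1) for item in sig_header.split(','))
    signed_payload = f"{parts['t']}.{payload}".encode()
    expected = hmac.new(secret.encode(), signed_payload, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through comparison timing
    return hmac.compare_digest(expected, parts['v1'])
```

In production you would additionally reject timestamps older than a few minutes, which blocks replay of captured webhook requests.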
# Correct webhook validation in your Lambda
import json

import stripe

def handler(event, context):
    # Never skip this check – without it, anyone can fake a payment event
    stripe_sig = event['headers'].get('stripe-signature', '')
    payload = event['body']
    webhook_secret = get_secret('mybcat/prod/stripe_webhook_secret')
    try:
        stripe.WebhookSignature.verify_header(payload, stripe_sig, webhook_secret)
    except stripe.error.SignatureVerificationError:
        return {'statusCode': 400, 'body': 'Invalid signature'}

    # Safe to process – this genuinely came from Stripe
    event_data = json.loads(payload)
    if event_data['type'] == 'invoice.payment_succeeded':
        activate_practice_subscription(event_data['data']['object'])
    return {'statusCode': 200}

Without signature validation:
A bad actor discovers your webhook URL. They POST a fake invoice.payment_succeeded event. Your system activates a subscription that was never paid for. HMAC validation costs 0.1ms and prevents this entirely.
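Stripe's retry schedule above is hand-tuned, but its shape is a capped exponential. A generic sketch of that shape (an approximation, not Stripe's exact intervals):

```python
def backoff_schedule(base_seconds: int, cap_seconds: int, attempts: int) -> list:
    """Delay doubles on each retry and is clamped at the cap."""
    return [min(base_seconds * (2 ** i), cap_seconds) for i in range(attempts)]
```

Real delivery systems usually add random jitter on top, so thousands of failed deliveries to the same dead endpoint do not retry in lockstep.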
EXAMPLE 2: Uber – Real-Time Driver Location via WebSocket
The Problem
Uber's rider app shows your driver moving on a map in real time. The driver's location updates every 4 seconds. With 5 million concurrent trips, polling – where the rider app asks "where is my driver?" every 4 seconds – would be 5 million × 15 requests/minute = 75 million requests/minute. Expensive, slow, and wasteful.
WebSocket changes this: the driver app sends location updates, the server pushes them to the specific rider app. No polling.
The Flow
Driver app (phone):
  Sends location every 4 seconds via WebSocket:
  { "type": "location_update", "lat": 51.507, "lng": -0.127, "tripId": "trip_789" }
        ↓
Driver WebSocket → API Gateway WebSocket API → location-update Lambda

Lambda:
  1. Gets driver's connectionId from event
  2. Gets tripId from message
  3. Writes to DynamoDB: { PK: "TRIP#trip_789", lat: 51.507, lng: -0.127, updatedAt: now }
  4. Gets rider's connectionId for this trip from DynamoDB
  5. Pushes location to rider via API Gateway Management API
        ↓
Rider app (phone):
  Receives pushed message: { "lat": 51.507, "lng": -0.127, "eta": "4 min" }
  Map updates → driver icon moves
  No request was made by the rider app

Full Code – WebSocket Location Push
# location_update Lambda – runs when driver sends location
import json
import time

import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('uber-trips')

def handler(event, context):
    connection_id = event['requestContext']['connectionId']
    body = json.loads(event['body'])
    trip_id = body['tripId']
    lat = body['lat']
    lng = body['lng']

    # Store latest driver location
    table.update_item(
        Key={'PK': f'TRIP#{trip_id}', 'SK': 'LOCATION'},
        UpdateExpression='SET lat = :lat, lng = :lng, updatedAt = :ts',
        ExpressionAttributeValues={
            ':lat': str(lat), ':lng': str(lng),
            ':ts': int(time.time())
        }
    )

    # Get rider's WebSocket connection for this trip
    trip = table.get_item(
        Key={'PK': f'TRIP#{trip_id}', 'SK': 'METADATA'}
    ).get('Item')
    if not trip:
        return {'statusCode': 200}  # unknown trip – nothing to push
    rider_connection = trip.get('riderConnectionId')
    if not rider_connection:
        return {'statusCode': 200}  # rider app not connected right now

    # Push location to rider – no rider request needed
    apigw = boto3.client(
        'apigatewaymanagementapi',
        endpoint_url=f"https://{event['requestContext']['domainName']}/{event['requestContext']['stage']}"
    )
    try:
        apigw.post_to_connection(
            ConnectionId=rider_connection,
            Data=json.dumps({
                'type': 'driver_location',
                'lat': lat,
                'lng': lng,
                'tripId': trip_id
            })
        )
    except apigw.exceptions.GoneException:
        # Rider's app was backgrounded or closed – remove stale connection
        table.update_item(
            Key={'PK': f'TRIP#{trip_id}', 'SK': 'METADATA'},
            UpdateExpression='REMOVE riderConnectionId'
        )
    return {'statusCode': 200}

What Breaks Without GoneException Handling
The rider closes their app. The WebSocket connection is gone. Your Lambda still tries to post_to_connection to the closed connectionId. Without catching GoneException, Lambda throws an unhandled exception, the error propagates, and DynamoDB has a stale connectionId that causes errors on every future location update for that trip.
Always catch GoneException and clean up stale connections.
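The bookkeeping behind that cleanup can be modelled without AWS. Here a plain dict stands in for the DynamoDB trip item, and the handlers mirror – in simplified, hypothetical form – what $connect/$disconnect routes would do:

```python
# In-memory model of WebSocket connection bookkeeping.
# A plain dict stands in for the DynamoDB trip METADATA item.
trips = {}

def on_connect(trip_id: str, connection_id: str) -> None:
    """$connect: remember which connection belongs to this trip's rider."""
    trips.setdefault(trip_id, {})['riderConnectionId'] = connection_id

def on_disconnect(trip_id: str) -> None:
    """$disconnect (or a GoneException): drop the stale connection id."""
    trips.get(trip_id, {}).pop('riderConnectionId', None)

def rider_connection(trip_id: str):
    """Where the location-update handler looks before pushing."""
    return trips.get(trip_id, {}).get('riderConnectionId')
```

The invariant is the same as in the real system: a connection id is only ever read after the most recent connect/disconnect event for that trip, so a push never targets a connection known to be dead.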
EXAMPLE 3: Netflix – Authoriser Caching Saves Millions of Dollars
The Problem
Netflix has 250 million subscribers. Every API call includes a JWT. Without authoriser caching:

250 million users × 50 API calls per viewing session
= 12.5 billion JWT validation calls per day
= ~145,000 Lambda authoriser invocations per second, averaged over the day
At $0.0000002/invocation:
= $2,500/day just for JWT validation
= ~$900,000/year

The Solution: Authoriser Cache TTL = 300 Seconds
First request with token X:
  API Gateway calls Authoriser Lambda
  Lambda validates JWT, returns Allow policy
  API Gateway caches: { tokenHash: "abc123", policy: Allow, userId: "u123", until: now+300s }

Next 1,000 requests with token X within 5 minutes:
  API Gateway checks cache → hit → returns Allow immediately
  Authoriser Lambda: NOT called
  Latency: 0ms additional
  Cost: $0 for those 1,000 validations

Reality: at a 300s cache TTL, most JWTs are validated once per 5-minute window.

# Netflix-style Lambda Authoriser with context enrichment
import jwt  # PyJWT

def handler(event, context):
    token = event['authorizationToken'].replace('Bearer ', '')
    # Validate JWT locally – no external call, fast
    try:
        payload = jwt.decode(
            token,
            get_public_key(),  # cached in Lambda memory after first fetch
            algorithms=['RS256'],
            audience='netflix-api'
        )
    except jwt.ExpiredSignatureError:
        raise Exception('Unauthorized')
    except jwt.InvalidTokenError:
        raise Exception('Unauthorized')

    # Return policy + rich context – downstream Lambdas get this for free
    # No database call needed in downstream Lambda to get user details
    return {
        'principalId': payload['sub'],
        'policyDocument': allow_policy(event['methodArn']),
        'context': {
            'userId': payload['sub'],
            'plan': payload.get('plan', 'standard'),       # basic/standard/premium
            'region': payload.get('region', 'us-east-1'),  # for content licensing
            'deviceId': payload.get('device_id', ''),
        }
    }

What downstream Lambda receives:
def get_recommended_content(event, context):
    # These came from the authoriser context – zero extra validation needed
    user_id = event['requestContext']['authorizer']['userId']
    plan = event['requestContext']['authorizer']['plan']
    region = event['requestContext']['authorizer']['region']

    if plan == 'basic':
        return get_sd_content(user_id, region)   # SD only for basic plan
    elif plan == 'standard':
        return get_hd_content(user_id, region)   # HD for standard
    else:
        return get_4k_content(user_id, region)   # 4K for premium

No database call in the recommendation Lambda to determine the user's plan. The authoriser already did that work, cached it, and passed it forward.
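The cost arithmetic above generalises to a one-liner. A sketch using the figures quoted in this article (illustrative, not official Netflix numbers):

```python
def authoriser_cost_per_day(validations_per_day: float,
                            price_per_invocation: float,
                            cache_hit_ratio: float = 0.0) -> float:
    """Only cache misses actually invoke the authoriser Lambda."""
    misses = validations_per_day * (1 - cache_hit_ratio)
    return misses * price_per_invocation
```

At a 99% cache hit ratio, the $2,500/day bill from the calculation above drops to roughly $25/day, before even counting the latency removed from every cached request.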
EXAMPLE 4: GitHub – API Versioning That Does Not Break Integrations
The Problem
GitHub's REST API v3 has been in production since the early 2010s. Millions of integrations depend on it. In 2016, GitHub launched its GraphQL API v4 – a complete redesign. They cannot deprecate v3 overnight without breaking:
- Every CI/CD tool
- Every GitHub Action
- Every developer tool
- Every enterprise integration
Their Versioning Strategy
https://api.github.com/repos/{owner}/{repo} → REST API v3 (still works)
https://api.github.com/graphql              → GraphQL API v4
Both routes live behind the same API Gateway.
Both are maintained simultaneously.
v3 gets security patches but no new features.
v4 gets all new features.
Migration path:
- Deprecation notices added to v3 response headers
- Documentation clearly states v4 is the future
- Tooling ecosystem gradually migrates over years
- v3 never forcibly removed (too many integrations depend on it)

How You Implement This with API Gateway
# Route v3 to legacy Lambda
resource "aws_apigatewayv2_route" "repos_v3" {
  api_id    = aws_apigatewayv2_api.main.id
  route_key = "GET /repos/{owner}/{repo}"
  target    = "integrations/${aws_apigatewayv2_integration.repos_v3.id}"
}

# Route v4 to new Lambda
resource "aws_apigatewayv2_route" "graphql" {
  api_id    = aws_apigatewayv2_api.main.id
  route_key = "POST /graphql"
  target    = "integrations/${aws_apigatewayv2_integration.graphql.id}"
}

# v3 Lambda – adds deprecation headers to every response
def handler(event, context):
    data = get_repo_data(
        event['pathParameters']['owner'],
        event['pathParameters']['repo']
    )
    return {
        'statusCode': 200,
        'headers': {
            'Content-Type': 'application/json',
            'Deprecation': 'true',
            'Sunset': 'Sat, 01 Jan 2027 00:00:00 GMT',
            'Link': '<https://docs.github.com/graphql>; rel="successor-version"'
        },
        'body': json.dumps(data)
    }

The Deprecation header tells API clients: this version has an end date. Tooling that reads HTTP headers (like the GitHub CLI itself) can warn developers automatically: "You are using a deprecated API endpoint. Please migrate to the GraphQL API."
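Client-side tooling can act on those headers with a small check. A hedged sketch – the header names match the example response above, but the warning text is illustrative:

```python
def deprecation_warning(headers: dict):
    """Return a warning string if the response signals deprecation, else None."""
    # HTTP header names are case-insensitive; normalise before lookup
    h = {k.lower(): v for k, v in headers.items()}
    if h.get('deprecation') != 'true':
        return None
    warning = "You are using a deprecated API endpoint."
    if 'sunset' in h:
        warning += f" It will be removed after {h['sunset']}."
    return warning
```

A CLI would run this on every response and print the warning to stderr, so migration pressure builds without ever breaking a script.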
EXAMPLE 5: Slack – Rate Limiting That Protects Fair Use
The Problem
Slack's API is used by 500,000 third-party apps. Some are well-behaved, sending a few requests per minute. Some are poorly built – infinite loops, missing exponential backoff, polling every 100ms. A single misbehaving app should not be able to degrade the API for everyone else.
Their Rate Limiting Design
Tier 1 – Infrequent operations (workspace settings, user info):
  1 request/minute
  Use case: read configuration that rarely changes
Tier 2 – Standard operations (post message, list channels):
  20 requests/minute
  Use case: normal bot interaction
Tier 3 – High-frequency operations (read messages, list users):
  50 requests/minute
  Use case: dashboard apps that poll for updates
Tier 4 – Special endpoints (RTM, Events API):
  100 requests/minute
  Use case: real-time event processing

All tiers are per-token (per-app, per-workspace)
→ One bad app cannot affect any other app
→ Limits apply independently per workspace
  (App A in Workspace X does not share quota with App A in Workspace Y)

When limit exceeded:
HTTP 429 Too Many Requests
Retry-After: 30                  ← client must wait 30 seconds before retrying
X-Rate-Limit-Limit: 20
X-Rate-Limit-Remaining: 0
X-Rate-Limit-Reset: 1745000060

How You Build This With API Gateway
AWS REST API – Usage Plans:

# Terraform – create usage plans for different integration tiers
resource "aws_api_gateway_usage_plan" "standard_integration" {
  name = "standard-integration"

  api_stages {
    api_id = aws_api_gateway_rest_api.main.id
    stage  = aws_api_gateway_stage.prod.stage_name
  }

  throttle_settings {
    burst_limit = 10   # short-term burst capacity (token bucket)
    rate_limit  = 20   # requests per second sustained
  }

  quota_settings {
    limit  = 10000     # requests per day
    period = "DAY"
  }
}

resource "aws_api_gateway_usage_plan" "enterprise_integration" {
  name = "enterprise-integration"

  throttle_settings {
    burst_limit = 100
    rate_limit  = 200
  }

  quota_settings {
    limit  = 1000000
    period = "DAY"
  }
}

# Lambda – return proper 429 with Retry-After
def handler(event, context):
    practice_id = event['requestContext']['authorizer']['practice_id']

    # Check rate limit in Redis (for sub-second granularity)
    key = f'rate_limit:{practice_id}:post_message'
    current = redis.incr(key)
    if current == 1:
        redis.expire(key, 60)  # 1-minute window

    if current > 20:  # 20 per minute limit
        return {
            'statusCode': 429,
            'headers': {
                'Retry-After': str(redis.ttl(key)),
                'X-Rate-Limit-Limit': '20',
                'X-Rate-Limit-Remaining': '0',
                'X-Rate-Limit-Reset': str(int(time.time()) + redis.ttl(key))
            },
            'body': json.dumps({'error': 'Rate limit exceeded', 'retryAfter': redis.ttl(key)})
        }

    # Proceed with normal processing
    return process_request(event)

What breaks without Retry-After header:
A poorly built client hits 429, does not understand it, retries immediately, hits 429 again, retries again – a tight retry loop that hammers your API. With Retry-After: 30, any HTTP client library that respects the standard automatically waits 30 seconds. The flood stops.
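The well-behaved client half of that contract can be sketched with the transport injected, so the retry logic stays testable (function names here are illustrative):

```python
def request_with_retry(send, sleep, max_attempts: int = 5):
    """Retry on 429, waiting exactly as long as Retry-After instructs.

    send()  -> (status_code, headers_dict, body) for one attempt
    sleep() -> injected so tests can record waits instead of blocking
    """
    for _ in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, body
        # Respect the server's hint instead of hammering it in a tight loop
        sleep(int(headers.get('Retry-After', '1')))
    return status, body
```

Production clients layer jitter and a retry budget on top, but the core rule is the one shown: a 429 response tells you exactly how long to wait, so use it.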
EXAMPLE 6: Healthcare Platform – HIPAA-Compliant API Gateway
The Problem
A healthcare SaaS serves 500 medical practices. Each practice's patient data must be completely isolated. An agent at Practice A must never be able to access Practice B's records – even accidentally, even if they guess a patient ID.
The Architecture
Request arrives:
  GET /patients/pat_12345/appointments
  Authorization: Bearer eyJhbGc...

Step 1 – API Gateway JWT Authoriser:
  Validates JWT signature (Cognito public key)
  Extracts claims:
    practice_id: "practice_p001"
    role: "scheduling_agent"
    user_id: "user_abc"
  Returns Allow + context { practice_id, role, user_id }

Step 2 – Lambda receives enriched event:
  event.requestContext.authorizer.practice_id = "practice_p001"
  event.pathParameters.patientId = "pat_12345"

Step 3 – Lambda isolation check (defence in depth):

authorizer = event['requestContext']['authorizer']
patient = dynamodb.get_item(PK=f"PATIENT#{patient_id}")
# Verify the patient belongs to the requesting practice
# This catches bugs where API Gateway context is accidentally wrong
if patient['practiceId'] != authorizer['practice_id']:
    logger.warning("Cross-practice access attempt",
                   extra={'userId': authorizer['user_id'],
                          'requestedPatient': patient_id,
                          'agentPractice': authorizer['practice_id'],
                          'patientPractice': patient['practiceId']})
    return {'statusCode': 403, 'body': json.dumps({'error': 'Forbidden'})}
return {'statusCode': 200, 'body': json.dumps(sanitise_patient(patient))}

What sanitise_patient Looks Like
def sanitise_patient(patient: dict) -> dict:
    """
    Remove fields the API should never return.
    Even if someone adds a sensitive field to DynamoDB,
    it will not appear in API responses unless explicitly added here.
    """
    ALLOWED_FIELDS = {
        'id', 'firstName', 'lastName', 'dateOfBirth',
        'phone', 'email', 'insuranceProvider',
        'nextAppointment', 'lastVisit'
    }
    return {k: v for k, v in patient.items() if k in ALLOWED_FIELDS}

Why whitelist, not blacklist:
If you blacklist fields (exclude ssn, internalNotes), every new field added to DynamoDB is automatically exposed in the API until someone remembers to add it to the blacklist. One forgotten field = HIPAA breach. Whitelisting means new fields are never exposed until explicitly approved.
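The difference is easy to demonstrate. Suppose someone adds a new sensitive field to the record without telling the API team (field names here are illustrative):

```python
ALLOWED_FIELDS = {'id', 'firstName', 'lastName'}
BLACKLISTED_FIELDS = {'ssn', 'internalNotes'}

def whitelist(patient: dict) -> dict:
    """Only explicitly approved fields survive."""
    return {k: v for k, v in patient.items() if k in ALLOWED_FIELDS}

def blacklist(patient: dict) -> dict:
    """Everything survives unless someone remembered to exclude it."""
    return {k: v for k, v in patient.items() if k not in BLACKLISTED_FIELDS}

# A new sensitive column appears without the API team's knowledge:
record = {'id': 'pat_1', 'firstName': 'A', 'lastName': 'B', 'geneticRiskScore': 0.9}
```

The whitelist silently drops the new field; the blacklist silently leaks it. The failure mode of a whitelist is a missing harmless field, the failure mode of a blacklist is a breach.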
The CloudWatch Alarm That Catches Attacks
# CloudWatch metric filter – detects cross-practice access attempts
resource "aws_cloudwatch_log_metric_filter" "cross_practice_attempt" {
  name           = "cross-practice-access-attempt"
  pattern        = "Cross-practice access attempt"
  log_group_name = "/aws/lambda/get-patient"

  metric_transformation {
    name      = "CrossPracticeAttempts"
    namespace = "MyBCAT/Security"
    value     = "1"
  }
}

resource "aws_cloudwatch_metric_alarm" "cross_practice_alert" {
  alarm_name          = "cross-practice-access-detected"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 1
  metric_name         = "CrossPracticeAttempts"
  namespace           = "MyBCAT/Security"
  period              = 300
  statistic           = "Sum"
  threshold           = 0   # ANY attempt triggers the alarm
  alarm_actions       = [aws_sns_topic.security_alerts.arn]
  alarm_description   = "An agent attempted to access another practice's patient data"
}

Zero tolerance. One cross-practice access attempt and the security team is paged immediately. This is HIPAA audit logging in action.
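Reduced to plain logic, the filter-plus-alarm pipeline is just this (a model of the behaviour, not the CloudWatch implementation):

```python
def count_matches(log_lines, pattern: str) -> int:
    """Metric filter: emit 1 for every log line containing the pattern."""
    return sum(1 for line in log_lines if pattern in line)

def alarm_fires(metric_sum: int, threshold: int = 0) -> bool:
    """GreaterThanThreshold with threshold 0: any attempt at all pages the team."""
    return metric_sum > threshold
```

The zero threshold is the point: this metric should always be exactly zero, so the alarm needs no tuning and a single match is actionable.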
EXAMPLE 7: E-Commerce – CORS in Production
The Problem
An e-commerce company runs:
  shop.example.com  – the customer-facing storefront
  api.example.com   – the backend API
  admin.example.com – the merchant dashboard
Their React frontend on shop.example.com makes API calls to api.example.com. For months everything works. Then marketing launches a promotion widget hosted on promo.example.com. The widget needs to call the same API. Suddenly: CORS errors.
The Exact Error (What Appears in the Browser Console)
Access to fetch at 'https://api.example.com/products' from origin
'https://promo.example.com' has been blocked by CORS policy:
No 'Access-Control-Allow-Origin' header is present on the requested resource.

The Wrong Fix and The Right Fix
# WRONG FIX – engineers add wildcard to make it work
allow_origins = ["*"]
# This works, but now ANY website can call your API from a browser.
# Worse, if origins are later reflected alongside Allow-Credentials: true,
# a malicious site can make authenticated requests on a user's behalf – a CSRF-style attack

# RIGHT FIX – explicit list of allowed origins
allow_origins = [
  "https://shop.example.com",
  "https://admin.example.com",
  "https://promo.example.com"
]
# Each new origin goes through a review process before being added

The Preflight in Detail
Browser on promo.example.com tries to call api.example.com:

1. Browser sends preflight OPTIONS (automatically, before your fetch):
   OPTIONS https://api.example.com/products
   Origin: https://promo.example.com
   Access-Control-Request-Method: GET
   Access-Control-Request-Headers: Authorization

2. API Gateway checks: is promo.example.com in allow_origins?
   YES → respond with permissions:
     HTTP 200
     Access-Control-Allow-Origin: https://promo.example.com
     Access-Control-Allow-Methods: GET, POST
     Access-Control-Allow-Headers: Authorization
     Access-Control-Max-Age: 86400   ← browser caches this for 24 hours:
                                       no preflight needed for 24 hours
                                       for the same origin+method+headers combo
   NO → respond with:
     HTTP 403 (or just no CORS headers)
     → Browser blocks the actual request
     → Developer sees the CORS error

3. Browser receives permission → sends actual GET request
   GET https://api.example.com/products
   Origin: https://promo.example.com
   Authorization: Bearer eyJ...

Access-Control-Max-Age: 86400 is important for performance. Without it, every single API call triggers a preflight OPTIONS request first – doubling the number of requests. With a 24-hour cache, the preflight happens once per day per unique origin/method/header combination.
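The gateway's preflight decision is essentially a pure function over the explicit allow-list. A sketch (real gateways also match the requested method and headers against configuration):

```python
ALLOW_ORIGINS = {
    "https://shop.example.com",
    "https://admin.example.com",
    "https://promo.example.com",
}

def preflight_response(origin: str) -> dict:
    """Answer an OPTIONS preflight: echo the origin only if it is allowed."""
    if origin not in ALLOW_ORIGINS:
        return {'statusCode': 403, 'headers': {}}  # browser blocks the real call
    return {
        'statusCode': 200,
        'headers': {
            'Access-Control-Allow-Origin': origin,  # echo, never "*", for credentialed APIs
            'Access-Control-Allow-Methods': 'GET, POST',
            'Access-Control-Allow-Headers': 'Authorization',
            'Access-Control-Max-Age': '86400',
        },
    }
```

Echoing the specific origin rather than "*" is what lets the browser send Authorization headers while still confining API access to the reviewed list.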
EXAMPLE 8: Twilio – Async API Pattern for Long Operations
The Problem
Twilio lets you send bulk SMS to 100,000 numbers simultaneously. Each message involves: number lookup, carrier routing, rate compliance per country, and queuing with telco partners. Processing 100,000 messages takes minutes – far beyond API Gateway's 29-second timeout.
The Pattern: Accept → Process Async → Poll
Step 1: Client submits bulk job
POST https://api.twilio.com/bulk-messages
Body: { "messages": [...100,000 messages...] }
        ↓
Lambda (runs in < 1 second):
1. Validates the request (auth, format, account limits)
2. Writes job to DynamoDB: { jobId: "job_abc", status: "QUEUED", total: 100000 }
3. Publishes to SQS queue for async processing
4. Returns immediately:
HTTP 202 Accepted
{ "jobId": "job_abc", "statusUrl": "/bulk-messages/job_abc/status" }
Step 2: Background processing (happens over next few minutes)
SQS → Lambda workers (N parallel):
Each worker processes a batch of messages
Updates DynamoDB: { processed: 45000, failed: 12, status: "IN_PROGRESS" }
Step 3: Client polls for status
GET https://api.twilio.com/bulk-messages/job_abc/status
        ↓
Lambda (runs in < 100ms):
Reads DynamoDB: { processed: 67000, failed: 18, status: "IN_PROGRESS" }
Returns:
HTTP 200
{ "status": "IN_PROGRESS", "processed": 67000, "total": 100000, "failed": 18 }
Step 4: Job completes
GET .../status
Returns:
HTTP 200
{ "status": "COMPLETED", "processed": 100000, "failed": 23,
  "failedNumbers": [...], "completedAt": "2026-04-21T14:23:01Z" }

The Code
# Submission Lambda – fast, returns 202
import json
import uuid
from datetime import datetime

def submit_bulk_job(event, context):
    body = json.loads(event['body'])
    user_id = event['requestContext']['authorizer']['userId']

    # Validate
    if len(body['messages']) > 1_000_000:
        return {'statusCode': 400, 'body': json.dumps({'error': 'Max 1,000,000 messages per job'})}

    # Create job record
    job_id = str(uuid.uuid4())
    dynamodb.put_item(Item={
        'PK': f'JOB#{job_id}',
        'userId': user_id,
        'status': 'QUEUED',
        'total': len(body['messages']),
        'processed': 0,
        'failed': 0,
        'submittedAt': datetime.utcnow().isoformat()
    })

    # Queue for async processing – batched to stay under the SQS 256KB limit
    batches = chunk(body['messages'], size=1000)  # chunk(): helper that splits the list
    for i, batch in enumerate(batches):
        sqs.send_message(
            QueueUrl=PROCESSING_QUEUE_URL,
            MessageBody=json.dumps({'jobId': job_id, 'batchIndex': i, 'messages': batch})
        )

    # Return immediately – do not wait for processing
    return {
        'statusCode': 202,
        'body': json.dumps({
            'jobId': job_id,
            'status': 'QUEUED',
            'statusUrl': f'/bulk-messages/{job_id}/status'
        })
    }

# Status Lambda – fast read from DynamoDB
def get_job_status(event, context):
    job_id = event['pathParameters']['jobId']
    user_id = event['requestContext']['authorizer']['userId']

    job = dynamodb.get_item(Key={'PK': f'JOB#{job_id}'}).get('Item')
    if not job:
        return {'statusCode': 404, 'body': json.dumps({'error': 'Job not found'})}

    # Prevent one user seeing another user's job
    if job['userId'] != user_id:
        return {'statusCode': 403, 'body': json.dumps({'error': 'Forbidden'})}

    return {'statusCode': 200, 'body': json.dumps({
        'jobId': job_id,
        'status': job['status'],
        'total': job['total'],
        'processed': job['processed'],
        'failed': job['failed'],
        'completedAt': job.get('completedAt')
    })}

React client with automatic polling:
function BulkMessageStatus({ jobId }: { jobId: string }) {
  const { data } = useQuery({
    queryKey: ['bulk-job', jobId],
    queryFn: () => fetchJobStatus(jobId),
    // Polls every 2 seconds while in progress, stops when done
    refetchInterval: (data) =>
      data?.status === 'COMPLETED' || data?.status === 'FAILED' ? false : 2000,
  });

  if (!data) return <Spinner />;
  if (data.status === 'COMPLETED') return <SuccessPanel job={data} />;
  if (data.status === 'FAILED') return <ErrorPanel job={data} />;

  return (
    <ProgressBar
      value={data.processed}
      max={data.total}
      label={`${data.processed.toLocaleString()} / ${data.total.toLocaleString()} sent`}
    />
  );
}

React Query automatically stops polling when the job completes. No clearInterval needed. No memory leaks.
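The same poll-until-done loop, sketched in Python with the fetcher and sleeper injected so it can run anywhere (names are illustrative; a real client would GET the statusUrl over HTTP):

```python
def poll_until_done(fetch_status, sleep, interval_seconds: int = 2, max_polls: int = 1000):
    """Poll a job-status endpoint until it reaches a terminal state.

    fetch_status() -> dict like {'status': 'QUEUED' | 'IN_PROGRESS' | 'COMPLETED' | 'FAILED', ...}
    sleep()        -> injected so tests can record waits instead of blocking
    """
    for _ in range(max_polls):
        job = fetch_status()
        if job['status'] in ('COMPLETED', 'FAILED'):
            return job
        sleep(interval_seconds)
    raise TimeoutError('job did not finish within the polling budget')
```

The max_polls budget matters: a job stuck in IN_PROGRESS forever should surface as an error in the client, not as an infinite silent loop.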
EXAMPLE 9: Azure API Management – Policies in Production
Real Scenario: A Bank's Open Banking API
Under PSD2 (European banking regulation), banks must expose APIs for third-party financial apps. HSBC's Open Banking API must:
- Rate limit per registered application
- Validate the client certificate (mTLS) – not just a JWT
- Transform responses to remove internal account codes
- Add mandatory regulatory headers
- Log every request with a correlation ID for audit
This goes beyond what AWS API Gateway expresses natively. Azure API Management handles all of it via policies:
<!-- APIM Policy – applied to every request -->
<policies>
    <!-- ① INBOUND – runs before the backend is called -->
    <inbound>
        <!-- Validate client certificate (mTLS) – PSD2 requires this -->
        <validate-client-certificate
            validate-revocation="true"
            validate-trust="true"
            validate-not-before="true"
            validate-not-after="true" />

        <!-- Rate limit per registered TPP (Third Party Provider) application -->
        <rate-limit-by-key
            calls="500"
            renewal-period="60"
            counter-key="@(context.Request.Certificate?.Thumbprint ?? context.Request.IpAddress)"
            increment-condition="@(context.Response.StatusCode < 500)" />

        <!-- Generate correlation ID for audit trail (regulatory requirement) -->
        <set-header name="X-Correlation-Id" exists-action="skip">
            <value>@(Guid.NewGuid().ToString())</value>
        </set-header>

        <!-- Validate the OAuth2 token -->
        <validate-jwt header-name="Authorization"
                      failed-validation-httpcode="401"
                      failed-validation-error-message="Invalid or expired access token">
            <openid-config url="https://login.hsbc.com/.well-known/openid-configuration" />
            <required-claims>
                <claim name="scope" match="any">
                    <value>accounts:read</value>
                    <value>payments:write</value>
                </claim>
            </required-claims>
        </validate-jwt>
    </inbound>

    <!-- ② BACKEND – the actual call to the backend service -->
    <backend>
        <retry condition="@(context.Response.StatusCode >= 500)"
               count="3"
               interval="2">
            <forward-request />
        </retry>
    </backend>

    <!-- ③ OUTBOUND – runs on the backend response -->
    <outbound>
        <!-- Remove internal account codes – clients see IBAN only -->
        <json-to-xml apply="always" consider-accept-header="false" />
        <xsl-transform>
            <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
                <!-- Identity transform: copy everything... -->
                <xsl:template match="@*|node()">
                    <xsl:copy><xsl:apply-templates select="@*|node()" /></xsl:copy>
                </xsl:template>
                <!-- ...except the internal account attributes -->
                <xsl:template match="@internalAccountCode|@legacySortCode" />
            </xsl:stylesheet>
        </xsl-transform>
        <xml-to-json kind="direct" apply="always" consider-accept-header="false" />

        <!-- Add mandatory PSD2 regulatory headers -->
        <set-header name="X-Request-Id" exists-action="override">
            <value>@(context.RequestId)</value>
        </set-header>
        <set-header name="X-Correlation-Id" exists-action="override">
            <value>@(context.Request.Headers.GetValueOrDefault("X-Correlation-Id", ""))</value>
        </set-header>
    </outbound>

    <!-- ④ ERROR – runs when the backend throws an exception -->
    <on-error>
        <!-- Never return raw backend errors to external clients -->
        <return-response>
            <set-status code="500" reason="Internal Server Error" />
            <set-body>@{
                return new JObject(
                    new JProperty("error", "An unexpected error occurred"),
                    new JProperty("correlationId", context.Request.Headers.GetValueOrDefault("X-Correlation-Id", ""))
                ).ToString();
            }</set-body>
        </return-response>
    </on-error>
</policies>

This handles mTLS, rate limiting, JWT validation, retry, response transformation, regulatory headers, and error sanitisation – all without any backend code change.
EXAMPLE 10: MyBCAT – The Complete API Gateway Setup
Putting it all together for the actual healthcare platform from the job description:
Request Flow: Agent Books an Appointment
Sarah (scheduling agent, Practice A) opens the MyBCAT dashboard
and books a patient appointment for Thursday 10am.
1. React app calls:
POST https://api.mybcat.com/appointments
Headers:
Authorization: Bearer eyJhbGciOiJSUzI1NiJ9... (Cognito JWT)
Content-Type: application/json
X-Idempotency-Key: client-generated-uuid (prevents double-submit)
Body: { "patientId": "pat_789", "slotId": "slot_20260424_1000", "type": "comprehensive_exam" }
2. CloudFront receives request:
- Is this origin allowed? → app.mybcat.com → yes
- WAF check: SQL injection in body? → no
- WAF check: IP rate limit exceeded? → no
- Forwards to API Gateway
3. API Gateway HTTP API:
- Route match: POST /appointments → yes, exists
- JWT validation: decode token, check Cognito signature → valid
- Token expiry check: exp = 1745003600, now = 1745000000 → not expired
- Extracts authorizer context:
{ practice_id: "practice_p001", role: "scheduling_agent", user_id: "user_sarah" }
- Invokes book-appointment Lambda
4. book-appointment Lambda:
a. Read idempotency key from header
b. Check DynamoDB: has this idempotency key been processed?
   → No → proceed
c. Check slot availability (ConsistentRead=True):
   slot.status = "available" → proceed
d. DynamoDB TransactWriteItems (atomic):
- Update slot: status "available" → "booked"
WITH ConditionExpression: status = "available" (prevents race condition)
- Create appointment record
- Write idempotency key with TTL=24h (prevents retry creating duplicate)
e. Publish to EventBridge:
{ type: "APPOINTMENT_BOOKED", practiceId: "practice_p001",
patientId: "pat_789", slotId: "slot_20260424_1000" }
f. Return:
HTTP 201 Created
{ "appointmentId": "appt_abc123", "status": "confirmed",
"date": "2026-04-24", "time": "10:00", "type": "comprehensive_exam" }
5. EventBridge routes event to 3 SQS queues simultaneously:
- PatientNotificationQueue → Lambda sends SMS confirmation to patient
- CRMQueue → Lambda updates HubSpot with appointment details
- AnalyticsQueue → Lambda increments practice's daily booking counter
6. React app receives 201:
- React Query invalidates ["appointments", practiceId] cache
- Calendar view refetches and shows the new appointment
- Sarah sees the booking confirmed instantly

What Happens When Sarah Accidentally Clicks "Book" Twice
First click (t=0ms):
  POST /appointments, X-Idempotency-Key: "key_abc123"
  Lambda checks DynamoDB: key_abc123 not found
  DynamoDB transaction succeeds: slot booked
  DynamoDB writes: { idempotency_key: "key_abc123", result: {...}, ttl: now+24h }
  Response: 201 Created { appointmentId: "appt_abc123" }

Second click (t=200ms – double-click):
  POST /appointments, X-Idempotency-Key: "key_abc123" (same key → same request)
  Lambda checks DynamoDB: key_abc123 FOUND
  Returns cached result immediately: 201 Created { appointmentId: "appt_abc123" }
  No DynamoDB transaction → no second booking

Result: one appointment, not two
Cost: one DynamoDB read for the duplicate
Sarah sees: "Appointment confirmed" – same result, no error

The Mental Model – What Every Example Has in Common
Looking across all ten examples, the same principles appear every time:
1. VALIDATE EARLY, FAIL FAST
   Bad tokens, bad signatures, bad schemas → reject at the gateway
   Never let invalid requests consume Lambda compute or database capacity
   (Stripe signature check, JWT validation, schema validation)

2. RETURN IMMEDIATELY FOR LONG OPERATIONS
   API Gateway timeout is 29 seconds
   Anything longer → 202 Accepted + async processing + polling endpoint
   (Twilio bulk SMS, any report generation, any batch operation)

3. PROTECT THE CRITICAL PATH WITH RESERVED CONCURRENCY
   Your booking Lambda must always have headroom
   Other Lambdas must not be able to starve it
   (Netflix auth, Uber location, healthcare booking)

4. ISOLATE FAILURES WITH QUEUES
   Fan-out via SNS → SQS means one consumer's failure is contained
   A slow CRM integration cannot delay a patient booking confirmation
   (MyBCAT post-booking, NHS patient events, Deliveroo dispatch)

5. NEVER EXPOSE INTERNAL ERRORS
   Stack traces, table names, column names → attacker intelligence
   Sanitise all error responses at the gateway or Lambda boundary
   Always include a correlationId so you can find the real error in CloudWatch
   (Healthcare HIPAA, Open Banking PSD2, every public API)

6. CACHE AGGRESSIVELY, INVALIDATE PRECISELY
   Authoriser cache: 300s → saves millions in Lambda invocations
   Response cache: minutes for stable data
   React Query: invalidate specific queryKeys on mutation, not the whole cache
   (Netflix auth, insurance plans, GitHub repo data)

These are not AWS or Azure specifics. They are the principles behind every well-designed API gateway, regardless of the cloud provider.