Back to blog
Backend Systemsintermediate

DynamoDB: Single-Table Design, Access Patterns & Queries

Master DynamoDB data modeling β€” partition keys, sort keys, GSIs, single-table design, query patterns, condition expressions, streams, and Python SDK patterns for production serverless apps.

LearnixoApril 16, 20266 min read
DynamoDBAWSNoSQLDatabaseServerlessPythonData Modeling
Share:𝕏

Why DynamoDB?

DynamoDB is AWS's fully managed NoSQL database. Unlike SQL databases, it scales to millions of requests per second with single-digit millisecond latency β€” and you pay only for what you use. It's the backbone of most serverless architectures on AWS.

Key characteristics:

  • Schemaless β€” only primary key attributes are required
  • Key-value + document β€” items can be simple {id, name} or deeply nested JSON
  • Provisioned or on-demand capacity β€” on-demand auto-scales with traffic
  • DynamoDB Streams β€” CDC event stream for every write (great for real-time pipelines)

Core Concepts

Primary Key

Every item in DynamoDB is identified by its primary key. Two options:

Simple primary key β€” just a partition key (PK):

PK: "CLINIC#CLN-001"  β†’  {name: "Sunrise Eye Care", state: "CA"}

Composite primary key β€” partition key (PK) + sort key (SK):

PK: "CLINIC#CLN-001",  SK: "APPT#2026-04-16#APT-789"

The combination of PK + SK must be unique per item. You can have many items with the same PK but different SKs β€” this is how you store "one-to-many" relationships in a single table.

Capacity Units

  • Read Capacity Unit (RCU): 1 strongly consistent read of up to 4 KB
  • Write Capacity Unit (WCU): 1 write of up to 1 KB

Use on-demand billing for variable workloads (you pay per request). Use provisioned with auto-scaling for predictable traffic.


Single-Table Design

In relational databases you normalize data into separate tables and JOIN them. DynamoDB has no JOINs β€” fetching related data in multiple round-trips is expensive. The solution: store everything in one table, design your keys so a single Query returns all data for a use case.

Naming Conventions

Use generic key names (PK, SK) and encode the entity type in the value:

PK               SK                     Type
──────────────── ───────────────────── ──────────
CLINIC#CLN-001   META                   Clinic
CLINIC#CLN-001   STAFF#USR-123          Staff member
CLINIC#CLN-001   APPT#2026-04-16#APT-1  Appointment
PATIENT#PAT-456  META                   Patient
PATIENT#PAT-456  APPT#2026-04-16#APT-1  Patient→Appt link

A single Query(PK="CLINIC#CLN-001") returns the clinic + all its staff + all its appointments.

Defining Access Patterns First

Before writing a single line of code, list every query your application needs:

| Access Pattern | Key Condition | |---------------|---------------| | Get clinic by ID | PK = CLINIC#, SK = META | | Get all appointments for a clinic | PK = CLINIC#, SK begins_with APPT# | | Get appointments for a date | PK = CLINIC#, SK begins_with APPT#2026-04-16 | | Get patient by ID | PK = PATIENT#, SK = META | | Get all appointments for a patient | PK = PATIENT#, SK begins_with APPT# |

Design your keys to satisfy these patterns.


Python SDK (boto3)

Setup

Python
import boto3
import os
from boto3.dynamodb.conditions import Key, Attr
from decimal import Decimal

dynamodb = boto3.resource("dynamodb", region_name=os.environ["AWS_REGION"])
table = dynamodb.Table(os.environ["TABLE_NAME"])

PutItem

Python
def create_appointment(appt: dict) -> None:
    table.put_item(
        Item={
            "PK": f"CLINIC#{appt['clinic_id']}",
            "SK": f"APPT#{appt['date']}#{appt['id']}",
            "GSI1PK": f"PATIENT#{appt['patient_id']}",
            "GSI1SK": f"APPT#{appt['date']}#{appt['id']}",
            "type": "APPOINTMENT",
            "id": appt["id"],
            "clinic_id": appt["clinic_id"],
            "patient_id": appt["patient_id"],
            "datetime": appt["datetime"],
            "status": "scheduled",
            "created_at": appt["created_at"],
        },
        ConditionExpression="attribute_not_exists(PK)"  # prevent overwrites
    )

Query β€” all appointments for a clinic

Python
def get_clinic_appointments(clinic_id: str, date_prefix: str = None) -> list:
    key_condition = Key("PK").eq(f"CLINIC#{clinic_id}")
    
    if date_prefix:
        key_condition &= Key("SK").begins_with(f"APPT#{date_prefix}")
    else:
        key_condition &= Key("SK").begins_with("APPT#")

    response = table.query(KeyConditionExpression=key_condition)
    return response["Items"]

# Get all appointments for clinic today
appointments = get_clinic_appointments("CLN-001", "2026-04-16")

Query with pagination

DynamoDB returns up to 1MB per call. Use LastEvaluatedKey to paginate:

Python
def get_all_clinic_appointments(clinic_id: str) -> list:
    items = []
    kwargs = {
        "KeyConditionExpression": Key("PK").eq(f"CLINIC#{clinic_id}") 
                                  & Key("SK").begins_with("APPT#")
    }
    
    while True:
        response = table.query(**kwargs)
        items.extend(response["Items"])
        
        if "LastEvaluatedKey" not in response:
            break
        kwargs["ExclusiveStartKey"] = response["LastEvaluatedKey"]
    
    return items

UpdateItem β€” conditional update

Python
def update_appointment_status(clinic_id: str, appt_id: str, 
                               date: str, new_status: str) -> None:
    table.update_item(
        Key={
            "PK": f"CLINIC#{clinic_id}",
            "SK": f"APPT#{date}#{appt_id}"
        },
        UpdateExpression="SET #status = :status, updated_at = :ts",
        ConditionExpression="#status = :current_status",
        ExpressionAttributeNames={"#status": "status"},  # 'status' is a reserved word
        ExpressionAttributeValues={
            ":status": new_status,
            ":ts": datetime.utcnow().isoformat(),
            ":current_status": "scheduled"  # only update if still scheduled
        }
    )

DeleteItem

Python
def cancel_appointment(clinic_id: str, appt_id: str, date: str) -> None:
    table.delete_item(
        Key={
            "PK": f"CLINIC#{clinic_id}",
            "SK": f"APPT#{date}#{appt_id}"
        },
        ConditionExpression="attribute_exists(PK)"
    )

Global Secondary Indexes (GSI)

Your primary key satisfies some access patterns, but you'll often need to query by a different attribute. That's what GSIs are for.

Use case: Get all appointments for a patient (across all clinics)

The main table has PK = CLINIC#{id}. A patient query needs PK = PATIENT#{id}. Define a GSI:

HCL
resource "aws_dynamodb_table" "main" {
  name         = "learnixo-${var.env}"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "PK"
  range_key    = "SK"

  attribute { name = "PK";     type = "S" }
  attribute { name = "SK";     type = "S" }
  attribute { name = "GSI1PK"; type = "S" }
  attribute { name = "GSI1SK"; type = "S" }

  global_secondary_index {
    name            = "GSI1"
    hash_key        = "GSI1PK"
    range_key       = "GSI1SK"
    projection_type = "ALL"
  }
}

Query the GSI in Python:

Python
def get_patient_appointments(patient_id: str) -> list:
    response = table.query(
        IndexName="GSI1",
        KeyConditionExpression=Key("GSI1PK").eq(f"PATIENT#{patient_id}")
                               & Key("GSI1SK").begins_with("APPT#")
    )
    return response["Items"]

Handling Decimal Types

DynamoDB returns numbers as Decimal objects in Python. Convert for JSON serialization:

Python
import json
from decimal import Decimal

def decimal_default(obj):
    if isinstance(obj, Decimal):
        return int(obj) if obj % 1 == 0 else float(obj)
    raise TypeError(f"Object of type {type(obj)} is not JSON serializable")

# Usage
body = json.dumps(items, default=decimal_default)

DynamoDB Streams

Enable streams to react to every table change β€” great for real-time analytics, search indexing, or triggering downstream workflows:

HCL
resource "aws_dynamodb_table" "main" {
  ...
  stream_enabled   = true
  stream_view_type = "NEW_AND_OLD_IMAGES"
}

resource "aws_lambda_event_source_mapping" "stream_processor" {
  event_source_arn  = aws_dynamodb_table.main.stream_arn
  function_name     = aws_lambda_function.stream_processor.arn
  starting_position = "LATEST"
  batch_size        = 100
  bisect_batch_on_function_error = true
}

Lambda receives batches of change records:

Python
def handler(event, context):
    for record in event["Records"]:
        event_name = record["eventName"]  # INSERT, MODIFY, REMOVE
        
        if event_name == "INSERT":
            new_item = record["dynamodb"]["NewImage"]
            item_type = new_item.get("type", {}).get("S")
            
            if item_type == "APPOINTMENT":
                appt_id = new_item["id"]["S"]
                notify_new_appointment(appt_id)
        
        elif event_name == "MODIFY":
            old = record["dynamodb"]["OldImage"]
            new = record["dynamodb"]["NewImage"]
            
            old_status = old.get("status", {}).get("S")
            new_status = new.get("status", {}).get("S")
            
            if old_status != new_status:
                handle_status_change(new["id"]["S"], old_status, new_status)

Common Mistakes

| Mistake | Fix | |---------|-----| | Hot partition key | Use high-cardinality PK β€” avoid sequential IDs or status values | | Scan instead of Query | Always Query with a key condition β€” Scan reads the entire table | | Forgetting reserved words | Use ExpressionAttributeNames for words like name, status, type | | Missing GSI for access pattern | Model all access patterns before building β€” adding GSIs later requires time | | Storing large blobs in items | Max item size is 400 KB β€” store files in S3, store the S3 key in DynamoDB | | Not handling pagination | Always loop on LastEvaluatedKey β€” DynamoDB caps results at 1 MB |

Enjoyed this article?

Explore the Backend Systems learning path for more.

Found this helpful?

Share:𝕏

Leave a comment

Have a question, correction, or just found this helpful? Leave a note below.