System Design Interview
Design a Photo Sharing App (Instagram-Style)
Users upload photos, others comment and react — how do you store, sort, and serve it all at scale?
The Interview Question
"Design a photo sharing application where users can upload photos, others can comment on those photos, and comments can be sorted by newest, oldest, or most liked."
This question tests whether you can think through storage for two very different data types (binary blobs and structured text), design a feed that scales with follower counts, and make a real database choice — not just say "use a database."
Step 1: Define the Requirements
Before drawing anything, clarify scope. Interviewers reward candidates who ask before they assume.
Functional requirements
- Users can upload photos (JPEG/PNG, up to 10MB)
- Users can follow other users
- Home feed shows photos from people you follow, newest first
- Anyone can comment on any photo
- Comments can be sorted: newest, oldest, most liked
- Users can like photos and comments
Non-functional requirements
- 100 million daily active users
- 10 million photo uploads per day (~115 per second)
- 500 million feed reads per day (read-heavy: 50:1 read/write ratio)
- Photos must be available within 2 seconds of upload confirmation
- Comment sort must return in under 200ms
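These numbers are worth converting to per-second rates out loud in the interview. A quick back-of-envelope check (pure arithmetic, no assumptions beyond the stated requirements):

```python
# Back-of-envelope traffic numbers from the stated requirements.
SECONDS_PER_DAY = 86_400

uploads_per_day = 10_000_000
feed_reads_per_day = 500_000_000

upload_qps = uploads_per_day / SECONDS_PER_DAY        # ~116 writes/sec
feed_qps = feed_reads_per_day / SECONDS_PER_DAY       # ~5,787 reads/sec
read_write_ratio = feed_reads_per_day / uploads_per_day  # 50:1

print(round(upload_qps), round(feed_qps), round(read_write_ratio))
```

The ~5,800 feed reads per second is the number that justifies the cache-heavy read path later in the design.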
Step 2: High-Level Architecture
┌─────────────────────────────────────┐
│ Client (mobile/web) │
└──────────────┬──────────────────────┘
│
┌──────────────▼──────────────────────┐
│ API Gateway / CDN │
│ (routes, auth, rate limiting) │
└──┬──────────┬──────────┬────────────┘
│ │ │
┌────────────▼─┐ ┌─────▼──────┐ ┌▼─────────────┐
│ Upload │ │ Feed │ │ Comment │
│ Service │ │ Service │ │ Service │
└──────┬───────┘ └─────┬──────┘ └──────┬───────┘
│ │ │
┌──────────▼───┐ ┌───────▼──────┐ ┌────▼──────────┐
│ Blob Store │ │ Feed Cache │ │ Comment DB │
│ (S3/Azure) │ │ (Redis) │ │ (PostgreSQL) │
└──────────────┘ └───────┬──────┘ └───────────────┘
│
┌───────▼──────┐
│ Photo DB │
│ (PostgreSQL) │
└──────────────┘
Step 3: Photo Storage — Why Not a Database
The wrong answer: "Store photos in the database as BLOBs."
Databases are optimised for structured data with transactions, indexes, and relational queries. Photos are binary blobs that need none of that. Storing them in a database:
- Balloons the database size (10M photos/day × 3MB average = 30TB/day, roughly 900TB/month)
- Slows down every backup
- Wastes expensive IOPS on reads the DB can't optimise
The right answer: Object storage (S3, Azure Blob Storage, Google Cloud Storage).
Upload flow:
Client → Upload Service → validate (type, size)
→ store in Blob Storage → get back a URL
→ write metadata to Photo DB (url, user_id, created_at)
→ push to CDN for global edge caching
The database stores metadata about the photo: who uploaded it, when, the URL, the like count. The photo bytes live in object storage.
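The flow above can be sketched end to end. This is a minimal sketch with in-memory stand-ins for the blob store and Photo DB; in production these would be S3 (or equivalent) and PostgreSQL, and the names `BlobStore`, `PhotoDB`, and `upload_photo` are illustrative, not a real API:

```python
import uuid
from datetime import datetime, timezone

ALLOWED_TYPES = {"image/jpeg", "image/png"}
MAX_BYTES = 10 * 1024 * 1024  # 10MB upload limit from the requirements


class BlobStore:
    """Stand-in for object storage: stores bytes, returns a URL."""
    def __init__(self):
        self._blobs = {}

    def put(self, key, data):
        self._blobs[key] = data
        return f"https://blobs.example.com/{key}"  # hypothetical CDN-backed URL


class PhotoDB:
    """Stand-in for the photos metadata table."""
    def __init__(self):
        self.rows = []

    def insert(self, row):
        self.rows.append(row)


def upload_photo(blob_store, photo_db, user_id, content_type, data):
    # 1. Validate before touching storage.
    if content_type not in ALLOWED_TYPES:
        raise ValueError("unsupported content type")
    if len(data) > MAX_BYTES:
        raise ValueError("photo too large")
    # 2. Store the bytes in object storage; get back a URL.
    photo_id = str(uuid.uuid4())
    url = blob_store.put(photo_id, data)
    # 3. Write metadata (not the bytes) to the Photo DB.
    photo_db.insert({
        "id": photo_id,
        "user_id": user_id,
        "storage_url": url,
        "created_at": datetime.now(timezone.utc),
    })
    return url


store, db = BlobStore(), PhotoDB()
url = upload_photo(store, db, "user-1", "image/jpeg", b"\xff\xd8fakejpeg")
print(url.startswith("https://blobs.example.com/"), len(db.rows))
```

Note the ordering: validate first, then blob store, then metadata. If the metadata write fails you have an orphaned blob (cheap, can be garbage-collected); the reverse ordering could leave a metadata row pointing at nothing.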
Cost note: S3 Standard is ~$0.023/GB-month. At ~900TB of new uploads per month, that's roughly $20,700/month for the latest month's photos alone, and the bill compounds as content accumulates. Move photos older than 30 days to S3 Glacier Instant Retrieval (~$0.004/GB-month) for an ~83% cost reduction on cold content.
Step 4: The Database Schema
photos
──────────────────────────────────────────
id UUID PRIMARY KEY
user_id UUID NOT NULL (FK → users)
storage_url TEXT NOT NULL
caption TEXT
like_count INT DEFAULT 0
comment_count INT DEFAULT 0
created_at TIMESTAMPTZ
comments
──────────────────────────────────────────
id UUID PRIMARY KEY
photo_id UUID NOT NULL (FK → photos)
user_id UUID NOT NULL (FK → users)
body TEXT NOT NULL
like_count INT DEFAULT 0
created_at TIMESTAMPTZ
follows
──────────────────────────────────────────
follower_id UUID NOT NULL
followee_id UUID NOT NULL
created_at TIMESTAMPTZ
PRIMARY KEY (follower_id, followee_id)
Why PostgreSQL here?
Photos, comments, users, and follows are all relational — they join, they have foreign key constraints, and they need ACID transactions (you don't want a photo row to exist without an associated user). PostgreSQL handles this well up to hundreds of millions of rows with proper indexing.
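The schema above can be stood up for experimentation. This sketch uses SQLite as a stand-in for PostgreSQL (UUID and TIMESTAMPTZ become TEXT, but the relational shape and the indexes the sort queries rely on are the same):

```python
import sqlite3

# In-memory SQLite standing in for PostgreSQL; types simplified.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE photos (
    id            TEXT PRIMARY KEY,
    user_id       TEXT NOT NULL,
    storage_url   TEXT NOT NULL,
    caption       TEXT,
    like_count    INTEGER DEFAULT 0,
    comment_count INTEGER DEFAULT 0,
    created_at    TEXT
);
CREATE TABLE comments (
    id         TEXT PRIMARY KEY,
    photo_id   TEXT NOT NULL REFERENCES photos(id),
    user_id    TEXT NOT NULL,
    body       TEXT NOT NULL,
    like_count INTEGER DEFAULT 0,
    created_at TEXT
);
CREATE TABLE follows (
    follower_id TEXT NOT NULL,
    followee_id TEXT NOT NULL,
    created_at  TEXT,
    PRIMARY KEY (follower_id, followee_id)
);
-- The indexes that Step 5's comment sorts depend on.
CREATE INDEX idx_comments_newest ON comments (photo_id, created_at);
CREATE INDEX idx_comments_liked  ON comments (photo_id, like_count DESC);
""")
```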
Step 5: Comment Sorting — The Interesting Problem
This is the part interviewers actually want to hear about.
Sort by newest / oldest: trivial. ORDER BY created_at DESC with an index on (photo_id, created_at). Fast.
Sort by most liked: harder.
Option A — sort in the database: ORDER BY like_count DESC
Index: (photo_id, like_count DESC)
Query: SELECT * FROM comments WHERE photo_id = $1 ORDER BY like_count DESC LIMIT 20
This works at small scale. At 500M comments and high write volume, every like increments like_count, and each increment updates the sorted index; that index churn is expensive under concurrent load.
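Option A in miniature, with SQLite standing in for PostgreSQL (the index and query shape are the point, not the engine):

```python
import sqlite3

# "Most liked" as a plain indexed query against the comments table.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE comments (
    id INTEGER PRIMARY KEY, photo_id TEXT, body TEXT, like_count INTEGER)""")
conn.execute("CREATE INDEX idx_liked ON comments (photo_id, like_count DESC)")
rows = [(1, "p1", "first", 3), (2, "p1", "meh", 0), (3, "p1", "best", 9)]
conn.executemany("INSERT INTO comments VALUES (?, ?, ?, ?)", rows)

top = conn.execute(
    "SELECT body FROM comments WHERE photo_id = ? "
    "ORDER BY like_count DESC LIMIT 20", ("p1",)
).fetchall()
print([b for (b,) in top])  # ['best', 'first', 'meh']
```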
Option B — pre-sort in cache
When a photo is fetched, the top comments by likes are pre-computed and stored in Redis:
Key: comments:top:{photo_id}
Value: [comment_id_1, comment_id_2, ...] (ordered list, TTL 5 min)
On like: increment Redis sorted set score for that comment_id
invalidate or update the top list
Redis sorted sets (ZADD / ZRANGE) are O(log N) for inserts and O(log N + M) for range queries. For a photo with 10,000 comments, this is the right structure.
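Option B in miniature. Against a real Redis these would be ZINCRBY and ZREVRANGE calls; this in-memory stand-in mimics a sorted set so the pattern runs without a server (the `SortedSet` class is illustrative, not redis-py):

```python
class SortedSet:
    """Minimal stand-in for a Redis sorted set (member -> score)."""
    def __init__(self):
        self.scores = {}

    def zincrby(self, member, amount=1):
        # Like ZINCRBY: bump the member's score, creating it if absent.
        self.scores[member] = self.scores.get(member, 0) + amount
        return self.scores[member]

    def zrevrange(self, start, stop):
        # Like ZREVRANGE key start stop: highest score first, inclusive stop.
        ordered = sorted(self.scores, key=self.scores.get, reverse=True)
        return ordered[start:stop + 1]


# One set per photo; the key would be comments:top:{photo_id}.
top_comments = SortedSet()
for comment_id, likes in [("c1", 3), ("c2", 9), ("c3", 1)]:
    for _ in range(likes):
        top_comments.zincrby(comment_id)  # one ZINCRBY per like event

print(top_comments.zrevrange(0, 19))  # ['c2', 'c1', 'c3']
```

Each like is a cheap O(log N) score bump, and the top-20 read never touches the relational index that Option A churns.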
The trade-off: slight staleness (up to TTL) on like counts. Acceptable for a social app; not acceptable for a financial system. Say this explicitly in the interview.
Step 6: The Feed — Fan-Out on Write vs Fan-Out on Read
The feed problem is where many candidates lose marks.
Naive approach (fan-out on read):
Feed request → SELECT followee_id FROM follows WHERE follower_id = me
→ SELECT * FROM photos WHERE user_id IN (followee_ids)
→ ORDER BY created_at DESC
→ LIMIT 20
Works fine for 50 followees. For a user following 5,000 accounts, this query scans 5,000 rows in the follows table and joins them against photos. Slow.
Fan-out on write (push model):
Upload photo → find all N followers
→ write photo_id into each follower's feed timeline cache
Feed request → read pre-built timeline from cache — O(1)
Fast reads. But: a celebrity with 50M followers uploads a photo. You need to write to 50M Redis keys. This takes minutes and creates a "hot key" write storm.
Hybrid (what Instagram actually uses):
Regular users (< 10K followers) → fan-out on write
(writes are cheap, reads are instant)
Celebrities (> 10K followers) → fan-out on read
(read their posts lazily at feed time)
Feed assembly:
pre-built timeline (from write fan-out)
+ lazy merge of followed celebrities' recent posts
= final ranked feed
The threshold between "regular" and "celebrity" is configurable. This is the answer that impresses interviewers.
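The hybrid can be sketched in a few lines. All structures here are in-memory stand-ins for the Redis timelines and the Photo DB, and the threshold and function names are illustrative:

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 10_000  # configurable cutover point

followers = defaultdict(set)      # author -> follower ids
following = defaultdict(set)      # user -> authors they follow
timelines = defaultdict(list)     # user -> [(seq, photo_id)] pushed at write time
recent_posts = defaultdict(list)  # author -> [(seq, photo_id)]
_seq = 0                          # global sequence as a stand-in for created_at


def follow(user, author):
    followers[author].add(user)
    following[user].add(author)


def upload(author, photo_id):
    global _seq
    _seq += 1
    recent_posts[author].append((_seq, photo_id))
    if len(followers[author]) < CELEBRITY_THRESHOLD:
        # Fan-out on write: push into every follower's timeline now.
        for f in followers[author]:
            timelines[f].append((_seq, photo_id))
    # Celebrities skip fan-out; their posts are merged lazily at read time.


def feed(user, limit=20):
    # Pre-built timeline + lazy merge of followed celebrities' posts.
    entries = list(timelines[user])
    for author in following[user]:
        if len(followers[author]) >= CELEBRITY_THRESHOLD:
            entries.extend(recent_posts[author])
    entries.sort(reverse=True)  # newest first by sequence number
    return [photo for _, photo in entries[:limit]]


follow("alice", "bob")    # bob is a regular user
follow("alice", "celeb")
followers["celeb"].update(f"u{i}" for i in range(20_000))  # simulate a celebrity
upload("bob", "bob-photo")
upload("celeb", "celeb-photo")
print(feed("alice"))  # ['celeb-photo', 'bob-photo']
```

Note that celeb's upload touched no follower timelines, while bob's touched exactly one; alice's feed still sees both because the read path merges the celebrity's posts in on demand.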
Step 7: Cost Control
| Component | Naive | Optimised | Saving |
|-----------|-------|-----------|--------|
| Photo storage | S3 Standard for all | Glacier after 30 days | 83% on cold storage |
| CDN | Origin every request | Cache at edge, 24h TTL | 95% origin offload |
| Feed reads | DB query per request | Redis timeline cache | 99% DB read reduction |
| Comment counts | DB query | Denormalised counter in photos.comment_count | Eliminates COUNT(*) queries |
The CDN point is critical. Once a photo is cached at the CDN edge, subsequent reads cost ~$0.009/GB (CDN egress) vs $0.09/GB (origin egress) — 10x cheaper, plus lower latency.
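The blended cost per GB served follows directly from the hit rate (approximate prices from the table above, 95% edge hit rate assumed):

```python
# Blended egress cost per GB with CDN caching in front of origin.
origin_per_gb = 0.09   # approx origin egress, $/GB
cdn_per_gb = 0.009     # approx CDN edge egress, $/GB
hit_rate = 0.95        # fraction of requests served from edge cache

blended = hit_rate * cdn_per_gb + (1 - hit_rate) * origin_per_gb
print(round(origin_per_gb / cdn_per_gb), round(blended, 5))
```

At a 95% hit rate the blended cost is ~$0.013/GB, roughly 7x cheaper than serving everything from origin.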
What the Interviewer Is Actually Testing
- Do you separate blob storage from structured storage without being prompted?
- Do you understand the fan-out problem and propose the hybrid approach?
- Can you explain the comment sort trade-off (DB index vs Redis sorted set) and choose based on scale?
- Do you think about cost, not just correctness?
- Do you clarify requirements before designing?
Getting every detail right matters less than demonstrating structured thinking across the full problem.