How would you design a global image CDN with regional replication?
A global image CDN delivers pictures from edge locations close to users while authoritative copies live in multiple regions. The core goal is simple: keep hot objects near users, fetch from the nearest healthy region on a cache miss, and control freshness with versioned URLs. This lowers latency, smooths traffic spikes, and keeps origin load and egress spend under control during real-world peaks.
Why It Matters
Image-heavy products win or lose on time to first pixel. Regional replication trims cross-ocean hops and reduces tail latency. It also improves availability because a single region outage does not take media offline. In a system design interview this topic tests knowledge of cache keys, origin shields, replication lag, TTL policy, failure handling, and cost modeling. In production the payoff is higher conversion, better retention, and healthier margins.
How It Works Step by Step
1. URL schema with immutability
- Encode identity and immutability into the path: app, media id, version, format, width, and height
- Use version bumps for edits so old versions remain cacheable
- Add a short lived token for access control and hotlink prevention when needed
2. Upload and write path
- Clients upload to the closest edge or to a regional write endpoint using signed requests
- The write region stores the original in durable object storage and records metadata in a small catalog keyed by content id
- If the product needs variants or thumbnails, kick transforms to workers via a queue and produce AVIF, WebP, and JPEG in common sizes
3. Regional replication strategy
- Enable cross region replication from the write region to read regions using storage level replication rules
- Treat replication as eventual and design read logic that can fall back to the write region
- Use object metadata or a compact manifest so any region can validate the latest version and variants
4. Edge routing and cache behavior
- Use anycast DNS so users land on the nearest CDN edge
- Compute a cache key from the resource id, version, requested size, and requested format (see the first sketch after this list)
- On a miss, fetch from an origin shield near the chosen read region to coalesce requests and protect buckets
5. Nearest healthy region selection
- Prefer the closest region that has the object
- If the replica is missing, fall back to the write region and cache the response at the shield
- Maintain a small health feed that marks regions as degraded and route away when needed
6. Freshness and invalidation
- Choose a long TTL for immutable versions and rely on version bumps for updates
- For rare critical fixes where the URL cannot change, push an explicit purge to the CDN and shield
- Support stale-while-revalidate so users see fast responses while freshness is restored in the background
7. Format and size negotiation
- Use Accept header hints to return AVIF or WebP when the client supports them
- If generating on demand, cache the derivative at the shield and optionally persist it in the regional bucket to avoid repeated compute
8. Access control and privacy
- For private images use short-lived signed URLs (see the signing sketch after this list)
- Keep secrets out of the path and place the signature in the query string
- For public assets keep the signature out of the cache key so it does not fragment the cache and reduce the hit ratio
9. Observability and safety
- Track edge and shield hit ratio, replication lag per region, origin read QPS, and tail latency
- Define SLOs for p95 latency per continent and overall success rate
- Run automated drills that simulate a read region loss and confirm expected failover behavior
10. Cost model and guardrails
- Most spend is bandwidth egress and origin operations on misses
- A high shield hit ratio plus smart size grids reduce miss traffic
- Cap the number of on demand sizes and garbage collect cold variants to control footprint
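To make the URL schema, cache key, format negotiation, and freshness policy concrete, here is a minimal Python sketch. The path template, header values, and sample ids are illustrative assumptions, not a prescribed layout.

```python
# Minimal sketch of the versioned URL, cache key, format negotiation, and
# freshness headers described above. Names and values are illustrative.

def image_url(image_id: str, version: int, width: int, fmt: str) -> str:
    """Build an immutable, versioned URL; edits bump the version instead of purging."""
    return f"/media/{image_id}/v{version}/{width}.{fmt}"

def negotiate_format(accept_header: str) -> str:
    """Return the best format the client advertises, falling back to JPEG."""
    accept = accept_header.lower()
    if "image/avif" in accept:
        return "avif"
    if "image/webp" in accept:
        return "webp"
    return "jpeg"

def cache_key(image_id: str, version: int, width: int, fmt: str) -> str:
    """Cache key fields: resource id, version, size, and format.
    Signatures and user identity stay out of the key for public assets."""
    return f"{image_id}:{version}:{width}:{fmt}"

def freshness_headers() -> dict:
    """Long TTL for immutable versions, with stale-while-revalidate as a safety net."""
    return {"Cache-Control": "public, max-age=2592000, stale-while-revalidate=86400"}

# Example request with Accept: image/avif,image/webp,*/* for image "abc", version 7
fmt = negotiate_format("image/avif,image/webp,*/*")
print(image_url("abc", 7, 640, fmt))   # /media/abc/v7/640.avif
print(cache_key("abc", 7, 640, fmt))   # abc:7:640:avif
print(freshness_headers())
```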
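For the private path, a short-lived signed URL can be produced with a plain HMAC, as in the sketch below. The secret, query parameter names, and five-minute lifetime are assumptions for illustration; most CDNs also offer a vendor-specific signing scheme that works the same way.

```python
# Sketch of short-lived signed URLs for private images. The secret and the
# query parameter names are placeholders, not a specific vendor's scheme.
import hashlib
import hmac
import time

SIGNING_SECRET = b"replace-with-a-real-secret"

def sign_url(path: str, ttl_seconds: int = 300) -> str:
    """Append an expiry timestamp and an HMAC signature as query parameters."""
    expires = int(time.time()) + ttl_seconds
    message = f"{path}?expires={expires}".encode()
    sig = hmac.new(SIGNING_SECRET, message, hashlib.sha256).hexdigest()
    return f"{path}?expires={expires}&sig={sig}"

def verify_url(path: str, expires: int, sig: str) -> bool:
    """Reject expired links and signatures that do not match."""
    if time.time() > expires:
        return False
    message = f"{path}?expires={expires}".encode()
    expected = hmac.new(SIGNING_SECRET, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sig)

# Example: sign a private avatar URL that stays valid for five minutes.
print(sign_url("/media/abc/v7/640.avif"))
```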
Real World Example
Consider a social photo app with users in North America, Europe, and South Asia. The write region processes uploads and moderation. Storage replication copies each object to regional buckets in Europe and Asia. A user in Paris requests a profile picture. The request hits a nearby edge, and on a miss the edge asks the EU shield. If the EU bucket has the object, the shield serves and caches it. If replication has not completed, the shield falls back to the write region, caches the response, and subsequent EU requests become hot. If the EU read region is marked unhealthy, routing switches to the next closest region. Versioned URLs avoid broad purges. Dashboards show hit ratio and replication lag so the team can tune TTL and shield size.
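Below is a minimal sketch of the fallback logic in this walkthrough. The region names, health set, and existence check are assumptions; in practice the health feed and the replica check would come from monitoring and object metadata.

```python
# Sketch of nearest-healthy-region selection with fallback to the write region.
# Region names and the has_object check are illustrative assumptions.

WRITE_REGION = "us-east"
REGION_PREFERENCE = {
    "eu-west": ["eu-west", "us-east", "ap-south"],   # e.g. a Paris user: EU first
    "ap-south": ["ap-south", "eu-west", "us-east"],
}

def pick_read_region(nearest: str, healthy: set, has_object) -> str:
    """Prefer the closest healthy region that already holds the object;
    otherwise fall back to the write region, whose copy is authoritative."""
    for region in REGION_PREFERENCE.get(nearest, [WRITE_REGION]):
        if region in healthy and has_object(region):
            return region
    return WRITE_REGION

# Replication to eu-west has not finished yet, so the shield reads from us-east.
healthy = {"eu-west", "us-east", "ap-south"}
print(pick_read_region("eu-west", healthy, lambda region: region == "us-east"))  # us-east
```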
Common Pitfalls or Trade-offs
- Over-purging reduces hit ratio and wastes egress. Prefer immutable version bumps
- Cache stampede during launches requires an origin shield with request coalescing
- Too many dynamic sizes fragment the cache. Use a small grid of canonical widths (see the snapping sketch after this list)
- Replication lag is normal. Always support fallback and export lag metrics
- Blindly serving AVIF can regress on some devices. Feature detect and fall back to JPEG
- Signed URL mishandling can kill the hit ratio for public assets. Separate public and private paths
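One way to keep the grid tight is to snap every requested width to a small set of canonical sizes before computing the cache key, as in the sketch below; the grid values are illustrative.

```python
# Snap arbitrary requested widths to a small canonical grid so on-demand
# resizing does not fragment the cache. The grid values are assumptions.

CANONICAL_WIDTHS = [160, 320, 640, 1280, 2048]

def snap_width(requested: int) -> int:
    """Round up to the next canonical width so the served image is never
    smaller than requested; oversized requests get the largest size."""
    for width in CANONICAL_WIDTHS:
        if requested <= width:
            return width
    return CANONICAL_WIDTHS[-1]

print(snap_width(300))   # 320
print(snap_width(700))   # 1280
print(snap_width(4000))  # 2048
```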
Interview Tip
When asked for a concrete plan, state the exact cache key fields, a TTL policy for immutable versions, and a failover path. Add simple math. For example, with a shield hit ratio of seventy percent and an edge hit ratio of ninety percent, only three percent of total requests reach the origin, which lowers origin QPS by a factor of about thirty-three and stabilizes cost during spikes.
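The arithmetic behind that claim, written out as a tiny calculation:

```python
# Worked version of the hit-ratio math in the tip above.
edge_hit = 0.90      # 90% of requests are served at the edge
shield_hit = 0.70    # 70% of edge misses are absorbed by the shield

origin_fraction = (1 - edge_hit) * (1 - shield_hit)
print(f"{origin_fraction:.1%} of requests reach origin")       # 3.0% of requests reach origin
print(f"~{1 / origin_fraction:.0f}x fewer origin requests")    # ~33x fewer origin requests
```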
Key Takeaways
- Regional replication cuts cross-ocean latency and improves availability
- Versioned URLs are the most reliable invalidation strategy for images
- An origin shield with request coalescing prevents thundering herds
- Watch hit ratio, replication lag, and tail latency to tune capacity and spend
- Keep a tight grid of sizes and formats for efficient caches
Comparison Table
| Approach | Latency | Consistency | Operational complexity | Cost profile | Best fit |
|---|---|---|---|---|---|
| CDN with single origin region | Good near origin, weak across oceans | Strong by default | Low | Lower storage, higher egress on misses | Small apps or early stage |
| CDN with regional replication | Consistently low across continents | Eventual for new objects | Medium | Balanced with strong cache savings | Global consumer apps and marketplaces |
| Multi-CDN with shared replicated origins | Very low across more networks | Eventual for new objects | High due to multi-vendor ops | Higher base, potential egress reduction via better peering | Very large scale or strict SLA needs |
| Peer-to-peer or device-side caching | Very low within clustered users | Weak and complex | High | Unpredictable | Special cases like live events |
FAQs
Q1. What is the difference between a CDN cache and a regional replica?
A CDN cache is an ephemeral copy at the edge or at a shield. A regional replica is a durable copy in object storage inside a specific cloud region.
Q2. How do I invalidate images safely at scale?
Use versioned URLs so updates create a new immutable path. Reserve purges for emergencies or very short-lived dynamic assets.
Q3. Should I generate image variants on demand or ahead of time?
If the size grid is small and predictable, pre-generate. If device diversity is large, generate on demand, cache at the shield, and persist hot sizes to avoid repeated compute.
Q4. How do signed URLs interact with caching?
For public images avoid including the signature in the cache key. For private images include the signature so only authorized viewers can access the content.
Q5. What TTL should I choose for images?
Choose a long TTL for immutable versions, often days or weeks. For avatars or admin-sensitive assets consider a shorter TTL plus stale-while-revalidate.
Q6. How do I handle replication lag gracefully?
Expose lag per region, prefer the nearest healthy region, and fall back to the write region when a replica is missing. Cache the fallback response at the shield to bridge the gap.
Further Learning
To master CDN design, caching layers, and replication strategies, continue with these expert courses from DesignGurus.io:
- Grokking the System Design Interview: Learn how to design scalable and fault-tolerant systems step by step. This course includes deep dives into caching, load balancing, data replication, and real interview-style design problems.
- Grokking Scalable Systems for Interviews: Explore large-scale architecture patterns, distributed storage, and real-world design trade-offs for global systems like CDNs, databases, and messaging pipelines. Perfect for engineers aiming for FAANG-level interviews.