Optimizing content delivery network choices for global users

A Content Delivery Network (CDN) is a globally distributed network of edge servers that cache and serve content from locations physically close to users, reducing latency from 150–200ms (cross-continental origin fetch) to under 5ms (local edge hit). Video streaming from services such as YouTube, Netflix, and Amazon Prime accounts for the majority of internet traffic (roughly 80% by some estimates), and CDNs are the infrastructure that makes this possible without collapsing the internet backbone. In system design interviews, CDNs appear in nearly every design that serves global users: "Design YouTube," "Design Instagram," "Design a news feed." Interviewers evaluate whether you understand how CDNs work internally (two-tier cache hierarchy, anycast routing, cache invalidation), when to use push versus pull strategies, and how to choose between providers based on requirements. Simply saying "put a CDN in front of it" is insufficient: you need to explain what content to cache, how to set TTLs, how to invalidate stale content, and what the cost implications are.

Key Takeaways

  • A CDN reduces latency by serving content from edge servers close to users instead of a distant origin server. A user in Tokyo fetching a cached image from a Tokyo PoP experiences under 5ms latency versus 150–200ms from a Virginia origin.
  • CDNs use a two-tier cache hierarchy: edge PoPs (100–300 cities worldwide) serve users directly; regional mid-tier caches sit between edge and origin, absorbing cache misses before they reach the origin server.
  • Three cache invalidation strategies exist: TTL-based expiration (simplest), explicit purge via API (fastest for urgent updates), and versioned URLs (safest—URL changes automatically bypass stale cache). Production systems use all three.
  • CDN provider selection depends on your requirements: CloudFront for deep AWS integration (600+ edge locations), Cloudflare for security and DDoS protection (anycast across 330+ cities), Fastly for real-time purging and edge compute (sub-150ms global purge propagation).
  • CDN costs are primarily egress-based ($0.01–$0.085/GB depending on region and volume). In interviews, mention cost optimization: compression (Brotli/gzip), cache hit ratio targets (95%+), and tiered caching to reduce origin fetches.

How CDNs Work: The Architecture

Request Flow

A user requests an image from your application. DNS resolves the domain to the CDN (via CNAME record). The CDN uses BGP anycast routing to direct the user to the topologically nearest edge PoP. The edge server checks its local cache. On a cache hit, the content is returned immediately (under 5ms). On a cache miss, the edge queries its regional mid-tier cache. If the mid-tier has it, it returns the content and the edge caches it. If neither has it, the mid-tier fetches from the origin server, caches the response, and propagates it back through the edge to the user.
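
The lookup order described above can be sketched as a toy model. This is only an illustration under simplifying assumptions: the class name `TieredCache`, the in-memory dictionaries, and the `fake_origin` function are hypothetical stand-ins, and real CDNs add TTLs, eviction, and request coalescing on top of this.

```python
from typing import Callable, Dict, List, Tuple


class TieredCache:
    """Toy model of one edge cache backed by a regional mid-tier cache."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self.edge: Dict[str, bytes] = {}
        self.mid_tier: Dict[str, bytes] = {}
        self.fetch_from_origin = fetch_from_origin

    def get(self, url: str) -> Tuple[bytes, str]:
        """Return (body, source), where source names the layer that answered."""
        if url in self.edge:
            return self.edge[url], "edge"
        if url in self.mid_tier:
            body = self.mid_tier[url]
            self.edge[url] = body  # the edge caches the object on the way back
            return body, "mid-tier"
        body = self.fetch_from_origin(url)
        self.mid_tier[url] = body  # both tiers keep a copy after an origin fetch
        self.edge[url] = body
        return body, "origin"


origin_calls: List[str] = []


def fake_origin(url: str) -> bytes:
    """Hypothetical origin that records every request it receives."""
    origin_calls.append(url)
    return f"content of {url}".encode()


cache = TieredCache(fake_origin)
first = cache.get("/img/logo.png")
second = cache.get("/img/logo.png")
print(first[1], second[1])  # origin edge
```

Only the first request reaches the origin; the repeat request is answered from the edge dictionary.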

Two-Tier Cache Hierarchy

Edge PoPs (Tier 1): Deployed in 100–300+ cities worldwide. Each PoP contains multiple cache servers. They are the first cache layer users hit. Optimized for the lowest possible latency.

Regional mid-tier caches (Tier 2): Sit between edge PoPs and the origin server. When an edge cache misses, the request goes to the regional cache instead of directly to the origin. This absorbs the majority of cache misses that would otherwise overwhelm the origin during traffic spikes or cache cold starts.

Why two tiers matter in interviews: Without mid-tier caching, every cache miss at every edge PoP hits the origin directly. During a cache-busting deployment or viral content event, hundreds of edge PoPs simultaneously miss and slam the origin—a "thundering herd" that can crash it. The mid-tier absorbs this by serving as a shared cache for all edge PoPs in a region.
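
A rough way to quantify the thundering herd is to count origin fetches when every edge PoP misses the same object once. The sketch below assumes each regional cache coalesces the concurrent misses from its edge PoPs into a single origin fetch; the region names and PoP counts are made up for illustration.

```python
from typing import Dict, List


def origin_fetches(pops_by_region: Dict[str, List[str]], use_mid_tier: bool) -> int:
    """Count origin fetches when every edge PoP misses the same object once."""
    if use_mid_tier:
        # Each regional cache fetches once and serves all of its edge PoPs.
        return sum(1 for pops in pops_by_region.values() if pops)
    # Without a mid-tier, every edge PoP goes to the origin itself.
    return sum(len(pops) for pops in pops_by_region.values())


# Hypothetical topology: 200 edge PoPs spread across three regions.
topology = {
    "north-america": [f"na-{i}" for i in range(80)],
    "europe": [f"eu-{i}" for i in range(70)],
    "asia-pacific": [f"ap-{i}" for i in range(50)],
}
print(origin_fetches(topology, use_mid_tier=False))  # 200
print(origin_fetches(topology, use_mid_tier=True))   # 3
```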

Routing: Anycast vs DNS-Based

BGP anycast (modern standard): The CDN announces the same IP address from every PoP. Internet routing protocols (BGP) automatically direct each user to the topologically nearest PoP. This requires no DNS-level geo-routing and provides inherent DDoS resilience—attack traffic is absorbed across all PoPs rather than overwhelming a single IP. Cloudflare uses anycast for all traffic.

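DNS-based geo-routing (legacy approach): The CDN's DNS server returns different IP addresses based on the user's apparent geographic location. Simpler but less precise than anycast: the DNS server typically sees the location of the user's resolver rather than the user, and cached DNS answers can keep directing users to suboptimal PoPs until the record's TTL expires.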

Caching Strategies

What to Cache

Static content (always cache): Images, CSS, JavaScript bundles, fonts, video segments. These change infrequently and benefit enormously from edge caching. Set long TTLs (24 hours to 1 year) with versioned filenames for cache busting.

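Semi-static content (cache with short TTL): API responses that change periodically (product catalogs, news feeds). Cache for 1–60 seconds at the edge. Even a 5-second cache can cut origin load by 90% or more at high QPS, because each cache node refetches a given URL about once per TTL window instead of forwarding every request to the origin.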

Dynamic content (do not cache by default): Personalized responses, authentication endpoints, real-time data. These vary per user and per request. However, modern CDNs can accelerate dynamic content through persistent origin connections and optimized routing—reducing latency even without caching.
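
The effect of a short TTL on semi-static content can be estimated with a back-of-the-envelope model. The sketch below assumes a single hot URL and that each cache node refetches it once per TTL window; the function name and the `cache_nodes` parameter are illustrative, not taken from any CDN's tooling.

```python
def origin_load_reduction(qps: float, ttl_seconds: float, cache_nodes: int = 1) -> float:
    """Estimate the fraction of requests for one hot URL that never reach the origin."""
    if qps <= 0 or ttl_seconds <= 0 or cache_nodes <= 0:
        raise ValueError("qps, ttl_seconds and cache_nodes must be positive")
    # Each node refetches once per TTL; the origin never sees more than the raw QPS.
    origin_rps = min(qps, cache_nodes / ttl_seconds)
    return 1.0 - origin_rps / qps


# 5-second TTL, 50 cache nodes (assumed numbers).
print(f"{origin_load_reduction(100, 5, cache_nodes=50):.0%}")   # 90%
print(f"{origin_load_reduction(1000, 5, cache_nodes=50):.0%}")  # 99%
```

The reduction grows with traffic: the origin cost is fixed by the TTL and the number of cache nodes, so the busier the URL, the larger the share of requests the cache absorbs.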

Cache Control Headers

HTTP Cache-Control headers are how your origin tells the CDN what to cache and for how long.

Header | Meaning | Example use
--- | --- | ---
max-age=3600 | Cache for 1 hour | Static assets
s-maxage=60 | CDN caches for 60 seconds (overrides max-age for CDN) | Semi-static API responses
no-store | Never cache | Authentication tokens, PII
private | Browser can cache; CDN cannot | User-specific data
stale-while-revalidate=30 | Serve stale content for 30s while fetching fresh copy | Smooth transitions during updates

Interview application: "I would set Cache-Control: public, max-age=31536000 for versioned static assets (images, JS bundles with hash in filename)—cached for 1 year because the filename changes when content changes. For the product listing API, I would set s-maxage=10 to cache at the CDN edge for 10 seconds—reducing origin load by 90% while ensuring near-real-time freshness."
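
One way to make such a header policy concrete is a small lookup at the origin. The sketch below is a minimal illustration: the path conventions (`/assets/` with a hash in the filename, `/api/products`, `/api/auth`) and the function name are assumptions for the example, and falling back to `private` is just one conservative choice.

```python
import re

# Hypothetical convention: hashed static assets live under /assets/, e.g. app.a1b2c3.js
HASHED_ASSET = re.compile(r"^/assets/.+\.[0-9a-f]{6,}\.(js|css|png|jpg|woff2)$")


def cache_control_for(path: str) -> str:
    """Pick a Cache-Control value for a request path (illustrative rules only)."""
    if HASHED_ASSET.match(path):
        # The filename changes with the content, so a 1-year TTL is safe.
        return "public, max-age=31536000"
    if path.startswith("/api/products"):
        # Semi-static listing: cache at the CDN for 10 seconds.
        return "public, s-maxage=10"
    if path.startswith("/api/auth"):
        # Tokens and credentials must never be stored by any cache.
        return "no-store"
    # Anything else is treated as user-specific: browser may cache, CDN may not.
    return "private"


for p in ("/assets/app.a1b2c3.js", "/api/products?page=2", "/api/auth/login", "/account"):
    print(p, "->", cache_control_for(p))
```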

Cache Invalidation

Invalidating stale content is the hardest operational challenge in CDN management.

TTL-based expiration: Content expires after its TTL and the next request fetches a fresh copy. Simplest approach but you cannot force immediate updates—users see stale content until TTL expires.

Explicit purge via API: Send a purge request to the CDN for specific URLs or cache tags. Cloudflare and Fastly propagate purges globally in under 150ms. Use for urgent updates (price changes, content takedowns, security patches).

Tag-based purge (surrogate keys): Attach cache tags to responses. Purge by tag to invalidate all URLs sharing that tag with a single API call. Example: tag all product pages with "product-123." When product 123 changes, purge the tag—every cached page referencing that product is invalidated simultaneously.
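
The idea behind tag-based purging can be sketched as a reverse index from tag to URLs. This is a simplified in-memory model, not any provider's implementation; the class name `TaggedCache` and the sample URLs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


class TaggedCache:
    """Toy cache that remembers which URLs carry which tags."""

    def __init__(self) -> None:
        self.entries: Dict[str, bytes] = {}
        self.urls_by_tag: Dict[str, Set[str]] = defaultdict(set)

    def put(self, url: str, body: bytes, tags: Iterable[str]) -> None:
        self.entries[url] = body
        for tag in tags:
            self.urls_by_tag[tag].add(url)

    def purge_tag(self, tag: str) -> int:
        """Remove every cached URL carrying the tag; return how many were dropped."""
        purged = 0
        for url in self.urls_by_tag.pop(tag, set()):
            # A URL may already be gone if another of its tags was purged earlier.
            if self.entries.pop(url, None) is not None:
                purged += 1
        return purged


cache = TaggedCache()
cache.put("/products/123", b"...", tags=["product-123"])
cache.put("/category/shoes", b"...", tags=["product-123", "product-456"])
cache.put("/products/456", b"...", tags=["product-456"])

print(cache.purge_tag("product-123"))  # 2
print(sorted(cache.entries))           # ['/products/456']
```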

Versioned URLs (safest): Embed a content hash or version in the URL: /assets/app.a1b2c3.js. When content changes, the URL changes. The old URL naturally becomes irrelevant because no one requests it. No purge needed. This is the recommended approach for static assets.
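
A build step typically derives the versioned name from the file's content. The following is a minimal sketch using SHA-256 truncated to 8 hex characters; the helper name `versioned_path` and the digest length are assumptions, and real bundlers have their own naming schemes.

```python
import hashlib
from pathlib import PurePosixPath


def versioned_path(path: str, content: bytes, digest_chars: int = 8) -> str:
    """Insert a short content hash before the file extension."""
    digest = hashlib.sha256(content).hexdigest()[:digest_chars]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


old_url = versioned_path("/assets/app.js", b"console.log('v1');")
new_url = versioned_path("/assets/app.js", b"console.log('v2');")
print(old_url)
print(new_url)
print(old_url != new_url)  # True
```

Because the URL is a function of the content, an unchanged file keeps its URL (and its cached copies), while any edit produces a new URL that no cache has seen yet.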

CDN Provider Comparison

Provider | Edge Locations | Key Strength | Purge Speed | Pricing Model | Best For
--- | --- | --- | --- | --- | ---
CloudFront | 600+ PoPs, 100+ cities | Deep AWS integration (S3, Lambda@Edge) | Seconds–minutes | Pay-per-use ($0.085/GB first 10TB) | AWS-native architectures
Cloudflare | 330+ cities | Security (DDoS, WAF, Bot Management), anycast | <150ms global | Free tier; Pro from $20/month | Security-first, startups
Fastly | 90+ PoPs | Real-time purging (<150ms), edge compute (Compute@Edge) | <150ms global | Usage-based ($0.12/GB first 10TB) | Dynamic content, real-time purging
Akamai | 4,200+ PoPs, 130+ countries | Largest network, enterprise features | Minutes | Enterprise contracts | Enterprise, media streaming

Interview application: "For an AWS-native architecture, I would use CloudFront because it integrates natively with S3 (no data transfer cost between S3 and CloudFront in the same region), supports Lambda@Edge for edge compute, and provides 600+ PoPs. If security is the primary concern—DDoS protection, bot management—I would use Cloudflare for its anycast-based DDoS absorption across 330+ cities. If we need sub-second cache purging for a content platform where freshness is critical, I would choose Fastly."

Edge Computing: Beyond Caching

Modern CDNs run custom code at edge locations, moving computation closer to users.

CloudFront Functions / Lambda@Edge: CloudFront Functions run lightweight JavaScript at CloudFront edge locations; Lambda@Edge runs Node.js or Python functions for heavier logic. Use cases: URL rewriting, A/B testing (route users to different origins based on cookies), header manipulation, and authentication token validation at the edge before the request reaches the origin.

Cloudflare Workers: Run JavaScript at every Cloudflare PoP (330+ cities). More capable than CloudFront Functions—supports full request/response transformation, KV storage at the edge, and even entire applications running without an origin server.

Fastly Compute@Edge: Run Wasm-compiled code at Fastly edge locations. Sub-millisecond cold starts. Supports Rust, Go, JavaScript compiled to WebAssembly.
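
As an illustration of the A/B testing use case, here is a sketch of a Lambda@Edge-style request handler in Python. It follows the CloudFront request event layout (`Records[0].cf.request`, with headers keyed by lowercase name), but rewrites the URI to a variant path rather than switching origins. The cookie name `experiment=b` and the `/experiment-b` prefix are hypothetical, and the sketch has not been deployed.

```python
from typing import Any, Dict

EXPERIMENT_COOKIE = "experiment=b"   # hypothetical cookie marking variant B users
EXPERIMENT_PREFIX = "/experiment-b"  # hypothetical path prefix holding variant B


def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Send users carrying the experiment cookie to the variant path."""
    request = event["Records"][0]["cf"]["request"]
    cookie_headers = request.get("headers", {}).get("cookie", [])
    in_experiment = any(
        pair.strip() == EXPERIMENT_COOKIE
        for header in cookie_headers
        for pair in header["value"].split(";")
    )
    if in_experiment and not request["uri"].startswith(EXPERIMENT_PREFIX):
        request["uri"] = EXPERIMENT_PREFIX + request["uri"]
    # Returning the request lets CloudFront continue processing it.
    return request


sample_event = {
    "Records": [{"cf": {"request": {
        "uri": "/index.html",
        "headers": {"cookie": [{"key": "Cookie", "value": "session=abc; experiment=b"}]},
    }}}]
}
print(handler(sample_event, None)["uri"])  # /experiment-b/index.html
```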

Cost Optimization

CDN costs are primarily egress-based. Optimizing cache efficiency directly reduces your bill.

Target 95%+ cache hit ratio. Every cache miss costs egress from your origin plus CDN processing. Monitor cache hit ratio as a primary operational metric. Below 90% indicates misconfigured cache headers or insufficient TTLs.

Enable compression. Brotli compression reduces text-based assets (HTML, CSS, JS, JSON) by 15–25% more than gzip. All major CDNs support Brotli. This directly reduces egress bytes and cost.

Use tiered caching. Mid-tier caches reduce origin fetches. CloudFront Origin Shield adds a third cache tier that consolidates requests to your origin from all regions through a single cache layer.

Optimize image delivery. Serve WebP/AVIF formats to supported browsers. Resize images at the edge based on device viewport. A 4K image served to a mobile phone wastes bandwidth and money. Cloudflare and CloudFront both support automatic image optimization—converting formats, resizing, and compressing on the fly at the edge without pre-generating multiple versions.
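
Format selection usually keys off the browser's Accept header. The sketch below is a simplified version of that negotiation: it ignores q-values and simply prefers AVIF, then WebP, then a fallback. The function name and the JPEG fallback are assumptions for the example.

```python
def pick_image_format(accept_header: str, fallback: str = "image/jpeg") -> str:
    """Choose the most compact image format the client says it accepts."""
    # Drop parameters such as ";q=0.8" and normalize case before comparing.
    offered = {part.split(";")[0].strip().lower() for part in accept_header.split(",")}
    for candidate in ("image/avif", "image/webp"):
        if candidate in offered:
            return candidate
    return fallback


print(pick_image_format("image/avif,image/webp,image/apng,*/*;q=0.8"))  # image/avif
print(pick_image_format("image/webp,*/*"))                              # image/webp
print(pick_image_format("*/*"))                                         # image/jpeg
```

When responses differ by Accept header, the cache key has to vary on it too (for example via `Vary: Accept`), otherwise a cached AVIF could be served to a browser that cannot decode it.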

Negotiate committed-use discounts. At high volume, CloudFront's Security Savings Bundle provides up to a 30% discount on CloudFront and WAF usage with a 1-year commitment. Note that Origin Shield is not a discount: it is billed per request ($0.0090/10,000 requests), so weigh that charge against the origin egress it saves. Cloudflare's free tier handles surprisingly large traffic volumes, so startups can scale to millions of requests before needing to upgrade.

For structured practice applying CDN architecture across complete system design problems, Grokking the System Design Interview covers CDN design as a core component in every global-scale solution. For advanced CDN patterns including multi-CDN strategies, edge compute architectures, and production-scale content delivery, Grokking the Advanced System Design Interview builds the depth required for L6+ interviews. The system design interview guide provides the broader framework for integrating CDN decisions into every interview phase.

Frequently Asked Questions

When should I include a CDN in my system design?

Any time the system serves content to global users: images, videos, static assets, API responses. CDNs reduce latency from 150–200ms (cross-continental) to under 5ms (local edge hit). If the interviewer mentions "global users" or "low latency," a CDN is expected in your design.

What is the difference between push and pull CDN?

Push CDN: content is uploaded to edge servers proactively before users request it. Used for large files and predictable content. Pull CDN: content is fetched from the origin on the first cache miss, then cached at the edge. Used for most web content. Pull is the default in interviews.

How does cache invalidation work in CDNs?

Three approaches: TTL-based expiration (automatic but delayed), explicit API purge (immediate but requires API call), and versioned URLs (safest—URL changes bypass stale cache). Production systems use all three: long TTLs for static assets with versioned filenames, short TTLs for API responses, and explicit purges for urgent updates.

Which CDN should I recommend in a system design interview?

CloudFront for AWS-native architectures (deep S3/Lambda integration, 600+ PoPs). Cloudflare for security-first requirements (DDoS protection, anycast, free tier). Fastly for real-time purging needs (sub-150ms global purge). Akamai for enterprise media streaming (4,200+ PoPs, largest network). Always explain why.

What is anycast routing?

The CDN announces the same IP address from every PoP globally. Internet routing (BGP) automatically directs each user to the topologically nearest PoP. This requires no DNS manipulation and provides inherent DDoS resilience—attack traffic is distributed across all PoPs instead of hitting one server.

How do I calculate CDN cost in a system design interview?

Estimate monthly egress: daily requests × average response size × 30 days. Apply provider pricing (CloudFront: ~$0.085/GB for first 10TB, decreasing with volume). Mention optimizations: compression (30–50% size reduction), cache hit ratio (95%+ target), and image optimization (WebP/AVIF). Show cost awareness without memorizing exact prices.
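
The estimate above is simple enough to script. The sketch below assumes a flat per-GB price (real pricing is tiered, so a flat first-tier rate gives an upper bound), decimal gigabytes, and made-up traffic numbers; the function names are illustrative.

```python
def monthly_egress_gb(daily_requests: int, avg_response_bytes: int, days: int = 30) -> float:
    """Monthly CDN egress in decimal gigabytes (1 GB = 1e9 bytes)."""
    return daily_requests * avg_response_bytes * days / 1e9


def origin_fetch_gb(egress_gb: float, cache_hit_ratio: float) -> float:
    """Portion of the egress that must also be pulled from the origin on misses."""
    return egress_gb * (1.0 - cache_hit_ratio)


# Assumed workload: 1M requests/day at 200 KB each, $0.085/GB flat rate.
egress = monthly_egress_gb(daily_requests=1_000_000, avg_response_bytes=200_000)
cost = egress * 0.085
print(f"{egress:.0f} GB")                         # 6000 GB
print(f"${cost:.2f}")                             # $510.00
print(f"{origin_fetch_gb(egress, 0.95):.1f} GB")  # 300.0 GB
```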

What is edge computing in the CDN context?

Running custom code at CDN edge locations instead of at your origin. Use cases: URL rewriting, A/B testing, authentication, image resizing, and header manipulation. CloudFront Functions, Cloudflare Workers, and Fastly Compute@Edge support this. Mentioning edge compute in interviews signals awareness of modern CDN capabilities.

What is a two-tier cache hierarchy?

Edge PoPs (tier 1) serve users directly. Regional mid-tier caches (tier 2) sit between edge and origin, absorbing cache misses. Without the mid-tier, every edge cache miss hits the origin—creating thundering herd problems during deployments or viral events. The mid-tier consolidates requests to protect the origin.

How do I set cache headers for a CDN?

Use Cache-Control headers: max-age for browser cache duration, s-maxage for CDN cache duration (overrides max-age at the CDN), no-store for content that must never be cached, and stale-while-revalidate for serving stale content briefly while fetching a fresh copy. Always set explicit headers—default behavior varies by CDN.
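
How a cache might interpret a TTL combined with stale-while-revalidate can be sketched as a three-state check on the object's age. The function and state names are illustrative; actual CDN behavior differs in detail between providers.

```python
def cache_state(age_seconds: float, max_age: float, stale_while_revalidate: float = 0) -> str:
    """Classify a cached response by its age in seconds."""
    if age_seconds < max_age:
        return "fresh"  # serve from cache, no origin contact
    if age_seconds < max_age + stale_while_revalidate:
        return "stale-revalidate"  # serve the stale copy, refresh in the background
    return "expired"  # must fetch from the origin before responding


# Example policy: s-maxage=60, stale-while-revalidate=30
for age in (10, 75, 95):
    print(age, cache_state(age, max_age=60, stale_while_revalidate=30))
# 10 fresh
# 75 stale-revalidate
# 95 expired
```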

Should I use a multi-CDN strategy?

For systems requiring the highest availability and global performance, yes. Route traffic to different CDNs per region (CloudFront for Americas, Cloudflare for EMEA). Use DNS-level failover between CDNs. The trade-off: operational complexity doubles. For most interview scenarios, a single CDN is sufficient—mention multi-CDN as a scaling option.
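
The per-region routing with failover can be sketched as an ordered preference list per region. This is a minimal illustration: the region keys, the preference order, and the health set are assumptions, and in practice the decision is usually made by a DNS or traffic-steering service fed by health checks.

```python
from typing import Dict, List, Set

# Hypothetical preference order per region: primary first, then fallback.
REGION_PREFERENCES: Dict[str, List[str]] = {
    "americas": ["cloudfront", "cloudflare"],
    "emea": ["cloudflare", "cloudfront"],
}
DEFAULT_PREFERENCE: List[str] = ["cloudflare", "cloudfront"]


def pick_cdn(region: str, healthy: Set[str]) -> str:
    """Return the first healthy CDN in the region's preference order."""
    for cdn in REGION_PREFERENCES.get(region, DEFAULT_PREFERENCE):
        if cdn in healthy:
            return cdn
    raise RuntimeError("no healthy CDN available")


print(pick_cdn("americas", {"cloudfront", "cloudflare"}))  # cloudfront
print(pick_cdn("americas", {"cloudflare"}))                # cloudflare (failover)
print(pick_cdn("apac", {"cloudfront", "cloudflare"}))      # cloudflare (default order)
```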

TL;DR

A CDN serves content from edge servers close to users, reducing latency from 150–200ms to under 5ms. The architecture uses a two-tier cache hierarchy: edge PoPs (100–300+ cities) serve users, and mid-tier regional caches absorb misses before they reach the origin. BGP anycast routing directs users to the nearest PoP automatically. Decide what to cache: static assets with long TTLs and versioned filenames, semi-static API responses with short TTLs (1–60 seconds), but not personalized or authenticated content. There are three invalidation strategies: TTL expiration (simplest), API purge (fastest for urgent updates, sub-150ms on Cloudflare/Fastly), and versioned URLs (safest for static assets). Choose CloudFront for AWS integration (600+ PoPs), Cloudflare for security and DDoS protection (330+ cities, anycast), Fastly for real-time purging and edge compute, or Akamai for enterprise streaming (4,200+ PoPs). Target 95%+ cache hit ratio. Enable Brotli compression. Serve WebP/AVIF images. In interviews, go beyond "add a CDN" to specify cache headers, TTL strategy, invalidation approach, and provider selection with reasoning.

TAGS
System Design Interview
System Design Fundamentals
CONTRIBUTOR
Design Gurus Team
Copyright © 2026 Design Gurus, LLC. All rights reserved.