On this page
Requirements gathering
Capacity estimation
API design
Data model
High-level architecture
The core decision: fan-out on write vs fan-out on read
How Facebook, Twitter, and Instagram actually do it
Ranking: how the feed knows you
Caching strategy
Scaling and fault tolerance
Follow-up questions interviewers ask
Putting it all together
Related case studies and deep dives
How to Design a Social Media News Feed: A System Design Interview Walkthrough


This blog unravels how to build a social network’s news feed system. We’ll cover how ranking algorithms personalize each feed, how to design an efficient backend and database for feed data, caching strategies for fast delivery, and techniques to scale the system for heavy read traffic.
Open Facebook, Twitter, or Instagram right now and scroll your feed. Notice something: it feels like the app knows you. The first post is someone you actually care about. The video that auto-plays is exactly the kind you'd click on. The ad is a little too relevant.
That magic — the feed that knows you — is one of the hardest engineering problems at FAANG scale. It's also one of the most-asked system design interview questions, because it touches almost every major concept: sharding, caching, fan-out, ranking, consistency trade-offs, asynchronous processing.
This guide walks through the full case study the way a senior engineer would answer it at the whiteboard. By the end, you'll know:
- How to frame requirements so the interviewer knows you're designing for the right scale
- The math on capacity estimation (QPS, storage, bandwidth — with specific numbers)
- The fan-out-on-write vs fan-out-on-read debate that's central to every feed design
- How Facebook, Twitter, and Instagram handle this differently — and what that tells you about the trade-offs
- How ranking, caching, sharding, and scaling actually work in practice
- The follow-up questions interviewers ask after you've done the initial design
Let's get into it.
Requirements gathering
The first move in any system design interview is to gather requirements. Don't skip this — interviewers specifically listen for whether candidates clarify scope before they start drawing boxes. For the deep dive on the full requirements ritual, see how to gather and prioritize requirements.
For a news feed, here's what I'd ask and then lock in:
Functional requirements:
- Users can post content (text, images, videos) visible to their followers
- Users see a personalized feed of posts from accounts they follow
- The feed is ranked (not strictly chronological) to surface more relevant content first
- Users can like, comment, and share posts (which feeds back into ranking)
- Feed loads fast — both on initial open and on pull-to-refresh
Non-functional requirements:
- Massive read-heavy traffic (reads vastly outnumber writes)
- Low latency — feed should load in under 200ms
- High availability — a brief feed outage is a top-of-homepage outage
- Eventual consistency is acceptable — if a new post takes a few hundred ms to appear for all followers, no one notices
- Scales to hundreds of millions of daily active users
Explicitly out of scope (for the initial design):
- Ads and monetization
- Live video / stories (separate systems, not the core feed)
- Direct messaging
- Content moderation (acknowledge it exists; don't design it here)
That scoping exercise alone, done out loud in an interview, separates candidates who "get it" from candidates who just start drawing.
Capacity estimation
Now for the math. Interviewers want to see concrete numbers, not hand-waving. Here's how I'd estimate for a Twitter-scale product. The full technique is covered in back-of-the-envelope estimation, but for this case study:
Users:
- 500M daily active users (DAU)
- Each user opens the app 5 times a day on average
- Each session refreshes the feed twice → 5B feed reads per day
Writes:
- 10% of DAU posts at least once per day → 50M posts/day
- Average 200 bytes text + 1 image URL per post → ~1KB per post metadata
Reads vs writes: 5B reads vs 50M writes = 100:1 read-to-write ratio. This is the single most important number in the design. It tells you to optimize ruthlessly for read performance.
QPS (queries per second):
- Feed reads: 5B / 86,400 seconds/day ≈ 58,000 QPS average, ~150,000 QPS at peak
- Posts: 50M / 86,400 ≈ 580 writes/second, ~1,500 at peak
Storage (post metadata only, ignoring media):
- 50M posts/day × 1KB × 365 days × 5 years = ~91 TB for 5 years of posts
- Media (images, video) stored separately in object storage + CDN — we'll estimate that at 10x the metadata, so ~1 PB
Social graph:
- Average user follows 200 accounts → 500M × 200 = 100B follow edges
- At ~20 bytes per edge → ~2 TB
Bandwidth:
- 150K peak QPS × ~10KB per feed response = ~1.5 GB/sec egress at peak
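The numbers tell a clear story: this is a read-heavy, storage-heavy, distributed problem. As a rough rule of thumb, a single database instance handles on the order of 10,000 simple queries per second (the real figure varies widely with hardware and query shape), which is far below our 150,000 QPS peak. We need serious distribution.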
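To make the arithmetic above easy to re-run with different assumptions, here is a small sketch that recomputes the same estimates. The variable names are mine, and it assumes decimal units (1 KB = 1,000 bytes), which is how the figures in this section round.

```python
# Back-of-the-envelope figures from this section (decimal units: 1 KB = 1,000 bytes).
SECONDS_PER_DAY = 86_400

dau = 500_000_000
feed_reads_per_day = dau * 5 * 2          # 5 sessions/day x 2 refreshes/session
posts_per_day = dau // 10                 # 10% of DAU post once per day

avg_read_qps = feed_reads_per_day / SECONDS_PER_DAY
avg_write_qps = posts_per_day / SECONDS_PER_DAY
read_write_ratio = feed_reads_per_day // posts_per_day

metadata_bytes_5y = posts_per_day * 1_000 * 365 * 5   # ~1 KB of metadata per post
graph_bytes = dau * 200 * 20                          # 200 follows/user, ~20 bytes/edge
peak_egress_bytes_per_s = 150_000 * 10_000            # peak QPS x ~10 KB/response

print(f"feed reads/day : {feed_reads_per_day:,}")
print(f"avg read QPS   : {avg_read_qps:,.0f}")
print(f"avg write QPS  : {avg_write_qps:,.0f}")
print(f"read:write     : {read_write_ratio}:1")
print(f"metadata, 5 yr : {metadata_bytes_5y / 1e12:.2f} TB")
print(f"social graph   : {graph_bytes / 1e12:.2f} TB")
print(f"peak egress    : {peak_egress_bytes_per_s / 1e9:.2f} GB/s")
```

Changing one input (say, DAU or follows per user) and re-running is a quick way to see which numbers dominate the design.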
API design
Three endpoints cover the core flow:
POST /v1/posts
Body: { content, media_urls, visibility }
Returns: { post_id, created_at }
GET /v1/feed?user_id={id}&cursor={cursor}&limit=20
Returns: { posts: [...], next_cursor }
POST /v1/posts/{post_id}/reactions
Body: { type: "like" | "comment" | "share", payload? }
Returns: { reaction_id }
Three things to flag explicitly in an interview:
- Cursor-based pagination, not offset-based. Offset breaks when new content arrives between pages. Cursor (timestamp or post ID) is stable.
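- The feed belongs to the authenticated user. The server should decide whose feed to return from the session rather than trusting a client-supplied value; if user_id stays in the query string as shown, verify that it matches the session before serving the feed.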
- Reactions are a separate endpoint. They fire asynchronously — the reaction is immediately visible to the actor but propagates to ranking signals eventually. For the full API design discussion, see mastering the API interview.
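The cursor point is easier to see in code. The sketch below is a minimal in-memory illustration, not the real endpoint: `Post`, `get_feed_page`, and the assumption that post IDs increase over time are all mine. It shows that a page fetched with a cursor stays stable even when a new post arrives between requests.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Post:
    post_id: int   # assumed to increase over time (newer post = larger ID)
    content: str


def get_feed_page(
    posts_newest_first: List[Post], cursor: Optional[int], limit: int = 20
) -> Tuple[List[Post], Optional[int]]:
    """Return up to `limit` posts strictly older than `cursor`, plus the next cursor."""
    if cursor is None:
        candidates = posts_newest_first
    else:
        candidates = [p for p in posts_newest_first if p.post_id < cursor]
    page = candidates[:limit]
    # Only hand back a cursor when there is at least one more post to fetch.
    next_cursor = page[-1].post_id if len(candidates) > limit else None
    return page, next_cursor
```

With offset-based paging, a post inserted at the top would shift every later item down by one and the second page would repeat the last item of the first. Here the second page is defined relative to the last post the client saw, so nothing repeats.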
Data model
Three tables cover the core design: posts, follows, and a precomputed per-user feed.
Posts table (sharded by post_id):
post_id (primary key)
author_id
content
media_urls
created_at
visibility
Follows table (sharded by follower_id):
follower_id
followee_id
created_at
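Indexed in both directions: we need "who does user X follow?" and "who follows user X?" to be efficient. Because the table is sharded by follower_id, the reverse lookup (which fan-out depends on) would otherwise touch every shard, so in practice it needs a global secondary index or a mirrored copy of the table sharded by followee_id.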
User feed table (sharded by user_id — the read-optimized table):
user_id
post_id
score
inserted_at
This is the precomputed feed — one row per (user, post_in_their_feed) pair. This is what we read when a user opens the app.
Why a separate feed table? Because computing a user's feed by JOINing posts and follows at read time would be catastrophic at 150K QPS. The feed table is denormalized — it duplicates data but makes reads fast. This is classic system design: trade storage for read performance. If the denormalization trade-off feels fuzzy, normalization vs denormalization has the full treatment.
Sharding strategy: All three tables shard by the query key. Posts by post_id (most lookups are "get this specific post"). Follows by follower_id (most queries are "who does user X follow?"). Feeds by user_id (we always query "get user X's feed"). Cross-shard queries are minimized. For the full sharding playbook, see the database sharding guide.
Database choice: For the posts and follows tables, a distributed SQL or wide-column store (Cassandra, Vitess, CockroachDB). For the feed table, something optimized for fast key-value reads — Redis-backed for hot users, with a Cassandra fallback for cold users. The logic: different data has different access patterns; don't force them all into one database. If the SQL vs NoSQL choice isn't clear, NoSQL databases for system design breaks down the trade-offs.
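As a rough illustration of "shard by the query key", the sketch below routes a key to a shard with a stable hash. The function name `shard_for`, the key format, and the choice of simple hash-modulo routing are assumptions for illustration; the design above does not prescribe a routing scheme, and production systems often prefer consistent hashing or a lookup directory so that adding shards does not remap most keys.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index in [0, num_shards) using a stable hash.

    Python's built-in hash() is randomized per process for strings, so a
    cryptographic digest is used here to get the same answer on every machine.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Every feed row for one user lands on the same shard, so "get user X's feed"
# is a single-shard query.
feed_shard = shard_for("user:42", 64)
```

The same function would be called with the post_id for the posts table and with the follower_id for the follows table, which is what keeps the common queries on a single shard.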
High-level architecture
The system breaks into these services:
- Post Service — writes new posts to the posts table; publishes "new post" events to a message queue
- Follow Service — manages the follows table; answers "who does X follow?"
- Feed Service — serves GET /feed requests; reads from the feed table + post table via cache
- Fan-out Service (background workers) — consumes "new post" events, computes affected followers, and writes to feed tables
- Ranking Service — computes relevance scores for posts, either in the fan-out path or at read time
- Cache Layer (Redis) — caches hot feeds, hot posts, and the social graph
- CDN — serves images and video at the edge
- Message Queue (Kafka) — decouples the post event from the fan-out work
The request flow for a feed read:
1. Client calls GET /v1/feed
2. API Gateway routes to Feed Service (load-balanced across many instances; see load balancer guide)
3. Feed Service checks Redis cache for the user's feed
   - Cache hit (90%+ of the time): return immediately
   - Cache miss: read from feed table → hydrate posts from post cache or post table → rank → return → write back to cache
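The read flow above is the cache-aside pattern. Here is a minimal sketch of it, with a plain dictionary standing in for Redis and a caller-supplied function standing in for the feed table read. `FeedReader` and `load_from_feed_table` are hypothetical names, and hydration and ranking are left out to keep the shape visible.

```python
from typing import Callable, Dict, List


class FeedReader:
    """Cache-aside read path: check the cache, fall back to storage, write back."""

    def __init__(self, load_from_feed_table: Callable[[int], List[int]]) -> None:
        self._cache: Dict[int, List[int]] = {}   # stand-in for Redis
        self._load = load_from_feed_table        # stand-in for the feed table query
        self.hits = 0
        self.misses = 0

    def get_feed(self, user_id: int) -> List[int]:
        cached = self._cache.get(user_id)
        if cached is not None:
            self.hits += 1
            return list(cached)                  # cache hit: return immediately
        self.misses += 1
        post_ids = self._load(user_id)           # cache miss: read from storage
        self._cache[user_id] = list(post_ids)    # write back for the next request
        return post_ids
```

The hit and miss counters are only there to make the behavior observable; in a real service they would be metrics feeding the 90%+ hit-rate target.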
The request flow for a post creation:
1. Client calls POST /v1/posts
2. Post Service writes to posts table (synchronous)
3. Post Service publishes new_post_created event to Kafka (async)
4. Fan-out workers consume the event, resolve followers, and write to affected feed tables
5. Client gets a success response immediately after step 2; the fan-out happens in the background
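The key property of the post-creation flow is that the response does not wait for fan-out. The sketch below imitates that with an in-process `queue.Queue` in place of Kafka and dictionaries in place of the tables. All names here (`create_post`, `drain_fanout_queue`, the sample follower list) are illustrative assumptions, and the worker is drained by hand rather than run in the background.

```python
import queue
from typing import Dict, List

posts_table: Dict[int, dict] = {}                    # stand-in for the posts table
events: queue.Queue = queue.Queue()                  # stand-in for a Kafka topic
feeds: Dict[int, List[int]] = {}                     # stand-in for per-user feed tables
followers_of: Dict[int, List[int]] = {1: [10, 11]}   # hypothetical follower lists


def create_post(post_id: int, author_id: int, content: str) -> dict:
    # Synchronous part: persist the post.
    posts_table[post_id] = {"author_id": author_id, "content": content}
    # Asynchronous part: enqueue an event and return without doing any fan-out.
    events.put({"type": "new_post_created", "post_id": post_id, "author_id": author_id})
    return {"post_id": post_id}


def drain_fanout_queue() -> None:
    """What a fan-out worker does: consume events and write to follower feeds."""
    while not events.empty():
        event = events.get()
        for follower_id in followers_of.get(event["author_id"], []):
            feeds.setdefault(follower_id, []).insert(0, event["post_id"])
```

Right after `create_post` returns, the post exists but no follower feed has been touched yet. The feeds only change once the worker runs, which is the eventual consistency the requirements allowed for.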
Now we get to the heart of the design: fan-out.
The core decision: fan-out on write vs fan-out on read
This is the question every news feed interview hinges on. Get this right and the rest of the design falls into place.
Fan-out on write (push model). When user A posts, the system immediately writes that post into every follower's feed table. By the time any follower opens the app, their feed is already assembled. Reads are O(1) lookups.
Pros: Feed reads are blazingly fast. Minimal work at read time. Scales well for the 99% of users with modest follower counts.
Cons: A celebrity with 100M followers posts, and you have to write 100M rows. The "write amplification" is massive.
Fan-out on read (pull model). When user A opens the app, the system looks up who A follows, fetches recent posts from each, merges them, ranks, and returns.
Pros: Writes are trivial: one post, one row inserted. No wasted work for inactive users.
Cons: Reads are expensive: lots of database queries, lots of merging. Latency is high at scale. Fails for users who follow thousands of accounts.
The hybrid approach. Real systems use both, switching strategies based on the follower count of the poster:
- Regular users (under ~10k followers): fan-out on write. The write amplification is tolerable.
- Celebrity users (over ~10k followers): fan-out on read for this specific user's posts, merged with the pre-computed feed at read time.
When you open your app, the system loads your pre-computed feed (which contains posts from all the regular accounts you follow), then separately fetches recent posts from any celebrities you follow, then merges and ranks the combined list. Two queries instead of one, but you avoid the 100M-write problem.
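The hybrid described above can be sketched in a few dozen lines. This is an in-memory toy under stated assumptions: `HybridFeed` and its methods are hypothetical names, the feed is ordered by timestamp instead of a ranking score, and the threshold is a constructor argument so it can be lowered for experiments.

```python
from typing import Dict, List, Set, Tuple

Entry = Tuple[int, str]  # (timestamp, post_id)


class HybridFeed:
    def __init__(self, celebrity_threshold: int = 10_000) -> None:
        self.threshold = celebrity_threshold
        self.followers: Dict[str, Set[str]] = {}         # author -> followers
        self.following: Dict[str, Set[str]] = {}         # user -> authors followed
        self.posts_by_author: Dict[str, List[Entry]] = {}
        self.precomputed: Dict[str, List[Entry]] = {}    # user -> pushed entries

    def follow(self, user: str, author: str) -> None:
        self.followers.setdefault(author, set()).add(user)
        self.following.setdefault(user, set()).add(author)

    def is_celebrity(self, author: str) -> bool:
        return len(self.followers.get(author, ())) > self.threshold

    def publish(self, author: str, timestamp: int, post_id: str) -> None:
        self.posts_by_author.setdefault(author, []).append((timestamp, post_id))
        if self.is_celebrity(author):
            return  # fan-out on read: followers pull this post when they open the app
        for follower in self.followers.get(author, ()):
            # fan-out on write: push into each follower's precomputed feed
            self.precomputed.setdefault(follower, []).append((timestamp, post_id))

    def read_feed(self, user: str, limit: int = 20) -> List[str]:
        # A set removes duplicates if an author crossed the threshold after
        # some of their posts had already been pushed.
        entries = set(self.precomputed.get(user, []))
        for author in self.following.get(user, ()):
            if self.is_celebrity(author):
                entries.update(self.posts_by_author.get(author, []))
        newest_first = sorted(entries, reverse=True)
        return [post_id for _, post_id in newest_first[:limit]]
```

Note that a celebrity post costs one write no matter how many followers there are, while each reader pays one extra lookup per celebrity they follow. That is the trade the threshold is tuning.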
In an interview: explicitly walk through this decision. The candidate who says "I'd use fan-out on write" without acknowledging the celebrity problem has given an incomplete answer. The candidate who says "I'd use a hybrid with a threshold around 10k followers, fan-out-on-write below and fan-out-on-read above" has given a senior answer.
How Facebook, Twitter, and Instagram actually do it
These design decisions aren't theoretical — the real platforms have documented their approaches. Knowing the real-world choices lets you anchor your interview answer:
Facebook uses heavy ML ranking. The feed is not chronological at all: it was originally scored by EdgeRank and is now scored by its many successors, which optimize for predicted engagement. On fan-out, be careful with simple labels. Facebook's public engineering write-ups on feed serving describe gathering candidate posts from your connections at read time and then ranking them, which is closer to fan-out on read than to pure fan-out on write. The feed storage is reportedly a Cassandra-like wide-column store; the ranking model is one of the largest production ML systems in the world.
Twitter uses a hybrid fan-out model, served by what it has called its Timeline Service. For regular accounts: fan-out on write, precomputed timelines stored in Redis. For celebrity accounts (millions of followers): fan-out on read, their tweets are fetched at read time and merged with the user's precomputed timeline. Twitter's original feed was chronological; it's now ML-ranked with a chronological option preserved.
Instagram started with fan-out on write and pure chronological ordering, then shifted to ML ranking around 2016. The feed is heavily media-driven, so the challenge is less about ranking and more about fast image/video delivery (which leans heavily on CDN infrastructure).
The pattern: all three ended up with broadly similar ingredients: some mix of push and pull fan-out, precomputed or cached feeds, ML ranking, and heavy caching. The differences are mostly in where each sits on the push/pull spectrum, the ranking models, and the specific thresholds they use.
In an interview, you can reference this: "The approach I'm describing is basically what Twitter evolved to — hybrid fan-out, precomputed feeds in Redis for regular users, fan-out on read for celebrities, ML ranking on top." That kind of grounded answer signals you've read about how the real systems work, not just memorized the textbook.
Ranking: how the feed knows you
Modern feeds aren't chronological. They're ranked by a machine learning model that scores each candidate post for each user, then sorts by predicted relevance.
The input features typically include:
- User-poster affinity signals: how often you interact with this poster, whether they're in your close-friends list, how often they view your content
- Content signals: post type (video, image, text), post age (recency decay), post engagement (likes/comments/shares per minute)
- Context signals: time of day, device, recent session activity
- Negative signals: have you hidden posts from this poster, skipped similar content, or marked it uninteresting
The model outputs a relevance score. The feed is sorted by score, not timestamp.
Where does the ranking run?
Two options, each with trade-offs:
- Rank during fan-out (pre-ranked feed). Compute the score when the post is inserted into each follower's feed and store it alongside the post_id. Read time is just "fetch sorted list." Simple, fast at read time. But: ranking can't take into account the current user's context (time of day, recent session); those signals are frozen at insert time.
- Rank at read time (on-the-fly). Pull ~200 candidate posts from the feed table, re-score them at request time with current context, return the top 20. More accurate but slower, and requires the ranking model to run at 150K QPS.
Real systems use a hybrid — a coarse ranking at fan-out time (cheap, fast) filters candidates, and a precision ranking at read time (expensive, accurate) orders the final set. This is sometimes called two-stage ranking.
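To show the shape of two-stage ranking without any ML, here is a sketch with hand-written scoring functions. Everything in it is an assumption for illustration: the `Candidate` fields, the log-engagement times exponential-decay formula, and the affinity multiplier are stand-ins for what would really be learned models.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Candidate:
    post_id: str
    author_id: str
    age_hours: float
    engagements: int


def coarse_score(c: Candidate) -> float:
    """Cheap, context-free score (fan-out time): engagement with recency decay."""
    return math.log1p(c.engagements) * math.exp(-c.age_hours / 24.0)


def precise_score(c: Candidate, affinity: Dict[str, float]) -> float:
    """Read-time score: adds a per-viewer signal (affinity to the poster)."""
    return coarse_score(c) * (1.0 + affinity.get(c.author_id, 0.0))


def rank_feed(
    candidates: List[Candidate],
    affinity: Dict[str, float],
    keep: int = 200,
    limit: int = 20,
) -> List[str]:
    # Stage 1: the coarse score trims the candidate set.
    shortlist = sorted(candidates, key=coarse_score, reverse=True)[:keep]
    # Stage 2: the more expensive, viewer-aware score orders the survivors.
    final = sorted(shortlist, key=lambda c: precise_score(c, affinity), reverse=True)
    return [c.post_id for c in final[:limit]]
```

The point of the split is cost: the expensive scorer only ever sees `keep` candidates per request, regardless of how many posts were eligible.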
The deep ML details aren't usually expected in a system design interview. What IS expected is that you can name the ranking stage explicitly, explain what signals it uses, and describe where it runs in the architecture.
Caching strategy
Caching is how a read-heavy system actually survives at scale. The caching plan for the feed:
Feed cache per user (Redis). The top 500 posts in each active user's feed, stored as a sorted set keyed by user_id. TTL of a few minutes with invalidation on new posts from followed users. This cache absorbs the vast majority of feed reads — target a 90%+ hit rate. For the full caching deep dive, see caching for system design interviews.
Post content cache. Popular posts (going viral) are fetched by millions of users. Cache the full post content keyed by post_id so every feed request hydrating that post hits Redis, not the post database.
Social graph cache. "Who does user X follow?" — this list is relatively stable and read constantly. Cache it per user with a longer TTL (minutes to hours).
CDN for media. Images and video URLs in posts should resolve to CDN edge nodes, not origin servers. The full CDN story is in the CDN system design basics post.
Cache invalidation strategy. When a new post arrives from someone you follow, we have three options: (a) invalidate your feed cache entirely (simple, costs a cache rebuild), (b) partial invalidation — insert the new post and drop the oldest (more efficient), or (c) lazy — let the TTL handle it (cheapest but the user sees stale content briefly). Most systems use (b). The cache invalidation problem is famously hard; see cache invalidation strategies for the full picture.
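Option (b) amounts to keeping a bounded, sorted feed per user: insert the new entry, then trim. A minimal sketch follows, with a sorted Python list standing in for a Redis sorted set. `FeedCache` is a hypothetical name, and the example treats the score as a timestamp so that "lowest score" means "oldest".

```python
import bisect
from typing import Dict, List, Tuple


class FeedCache:
    """Per-user feed capped at max_posts entries, kept sorted by score."""

    def __init__(self, max_posts: int = 500) -> None:
        self.max_posts = max_posts
        # user_id -> list of (score, post_id), ascending by score
        self._feeds: Dict[int, List[Tuple[float, str]]] = {}

    def add_post(self, user_id: int, score: float, post_id: str) -> None:
        feed = self._feeds.setdefault(user_id, [])
        bisect.insort(feed, (score, post_id))   # insert in sorted position
        if len(feed) > self.max_posts:
            del feed[0]                         # drop the lowest-scoring (oldest) entry

    def top(self, user_id: int, n: int) -> List[str]:
        feed = self._feeds.get(user_id, [])
        return [post_id for _, post_id in feed[::-1][:n]]
```

In Redis the same shape would roughly be a ZADD followed by a ZREMRANGEBYRANK to trim the set back to its cap. The cached feed never has to be rebuilt from scratch just because one post arrived.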
Eviction policy. LRU (least recently used) across all caches. Active users' feeds stay hot; inactive users' feeds age out naturally.
Scaling and fault tolerance
A few additional concerns that come up as follow-ups:
Message queue for fan-out work. When a user posts, the fan-out could affect millions of rows. We don't want the HTTP response to wait for that work. The Post Service writes the post synchronously, then publishes a new_post_created event to Kafka. A pool of fan-out worker services consumes the event, resolves the follower list, and writes to feed tables in parallel batches. This decouples the expensive work from the user's action. Full broker comparison in Kafka vs RabbitMQ vs ActiveMQ.
Asynchronous ranking updates. User engagement signals (likes, comments, time-on-post) flow into a separate event stream. A stream processor (Kafka Streams or Flink) updates the ranking model's features in near-real-time. See Kafka Streams vs Flink vs Storm for which processor fits where.
Replication and availability. Every shard has at least 3 replicas across availability zones. Read replicas serve the read-heavy query load. For the full availability story, see high availability in system design.
Graceful degradation. If the ranking service is slow, fall back to chronological ordering. If the feed cache is cold, serve a shorter feed. If the fan-out queue is backed up, readers can still see cached content. Never return a blank page.
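The first fallback in that list (ranking is unavailable, so serve chronological order) can be sketched as a wrapper around the ranking call. The names below are hypothetical, and a real service would enforce a timeout on the ranking call and catch narrower error types; this sketch simply catches any exception the ranker raises.

```python
from typing import Callable, List, Tuple

Post = Tuple[int, str]  # (created_at, post_id)


def chronological(posts: List[Post]) -> List[str]:
    """Newest first, no ranking model involved."""
    return [post_id for _, post_id in sorted(posts, reverse=True)]


def feed_with_fallback(
    posts: List[Post], rank: Callable[[List[Post]], List[str]]
) -> List[str]:
    try:
        return rank(posts)
    except Exception:
        # Broad catch is deliberate in this sketch: any ranking failure
        # degrades to chronological order instead of an empty page.
        return chronological(posts)
```

The user still gets a usable feed, just a less personalized one, which is the "never return a blank page" rule in practice.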
Idempotency. The fan-out worker might retry events on failure. We need the feed-insert operation to be idempotent so retries don't create duplicates. See idempotency in system design for how to design this correctly.
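One common way to get an idempotent feed insert is to key each row by (user_id, post_id), so that a retried event finds the row already present and does nothing. The sketch below uses a dictionary in place of the feed table; `FeedTable` is a hypothetical name, and in a real database the same effect would come from a primary key or unique constraint on that pair.

```python
from typing import Dict, Tuple


class FeedTable:
    def __init__(self) -> None:
        self._rows: Dict[Tuple[int, str], float] = {}   # (user_id, post_id) -> score

    def insert(self, user_id: int, post_id: str, score: float) -> bool:
        """Insert once; return False if this (user, post) pair already exists."""
        key = (user_id, post_id)
        if key in self._rows:
            return False          # retry of an event we already applied
        self._rows[key] = score
        return True

    def row_count(self) -> int:
        return len(self._rows)
```

A worker that crashes halfway through a batch can then replay the whole event safely, since the rows it already wrote are skipped.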
Consistency model. The feed is eventually consistent. A new post might take a few hundred ms to appear on all followers' feeds. That's fine — users don't notice. The deeper framework for reasoning about consistency under partition is CAP theorem vs PACELC.
Follow-up questions interviewers ask
Once you've done the initial design, the interview usually continues with follow-ups. Be ready for:
- "How would you handle a celebrity with 500M followers posting a viral tweet?" (Hybrid fan-out, pull-on-read for their posts. If the post goes viral, front-cache it in Redis so every follower's feed hydration hits cache, not the post DB.)
- "A bug was shipped that injected bad posts into user feeds. How do you recover?" (Replay from the event log: as long as Kafka retention covers the affected window, we can rebuild the affected feed tables from scratch. This is why event-sourcing-friendly architectures are valuable.)
- "How would you add an 'only show posts from the last hour' option?" (Add a filter at read time; this is trivial if the feed table is indexed by timestamp.)
- "How do you rank content from a user who just joined and follows nobody?" (Cold-start problem — serve a trending/popular feed, gradually personalize as engagement signals accumulate.)
- "What if a user unfollows someone? Do we remove that user's posts from their feed immediately?" (Trade-off: proactive cleanup is expensive; lazy cleanup is simpler. Most systems filter unfollowed users' posts at read time rather than deleting rows.)
- "How do you handle abuse — bots posting spam that dominates feeds?" (A separate moderation service scores posts for spam/abuse; posts flagged as likely spam are excluded from fan-out. Out of scope for the initial design, but acknowledge it.)
- "How would you add 'stories' (ephemeral 24-hour content)?" (Different data model — stories have a TTL on the post itself and aren't added to the permanent feed table. A separate stories service + separate feed rendering path.)
If you can handle five of those seven without hesitating, you've nailed the follow-ups.
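For the unfollow question, the read-time filter is small enough to sketch. This assumes each feed row carries the author's ID alongside the post ID (the feed table earlier in this post stores only post_id, so the author would come from post hydration), and `visible_posts` is a hypothetical helper name.

```python
from typing import List, Set, Tuple


def visible_posts(feed_rows: List[Tuple[str, str]], following: Set[str]) -> List[str]:
    """feed_rows holds (post_id, author_id); keep posts whose author is still followed."""
    return [post_id for post_id, author_id in feed_rows if author_id in following]


rows = [("p3", "carol"), ("p2", "bob"), ("p1", "carol")]
still_following = {"bob"}            # the viewer has unfollowed carol
filtered = visible_posts(rows, still_following)
```

The stale rows stay in the feed table and simply age out, which avoids a burst of deletes every time someone unfollows an active account.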
Putting it all together
Here's the one-sentence version: A news feed is a read-heavy problem where the job is to precompute as much as possible at write time, cache everything at read time, and degrade gracefully when either breaks.
In an interview, the structure that signals seniority is:
- Gather requirements explicitly before drawing anything
- Do the capacity estimation with real numbers
- Name the fan-out-on-write vs fan-out-on-read decision AND the hybrid approach
- Describe ranking as a distinct pipeline stage with clear inputs and outputs
- Make the caching strategy explicit — what's cached where, how invalidation works
- Acknowledge the reality: Facebook, Twitter, and Instagram have all converged on this basic shape
Good luck with your next interview.
Related case studies and deep dives
The news feed is one of the most interlinked problems in the case studies. Related reads:
- How to Design Instagram — a specific variant of this design with media as the primary content type.
- How to Design a Chat Application — another fan-out messaging case study with different trade-offs (real-time delivery vs precomputed feed).
- How to Design a Recommendation System — the ranking problem generalized. Feed ranking is a specific case of recommendation.
- Database Sharding Guide — the sharding strategy that makes feed storage scale.
- Caching for System Design — the deep dive on cache design patterns.
- How to Design a URL Shortener — the classic case study for the foundational interview framework, useful as a contrast to this one.
For the full system design interview roadmap, start with my complete system design interview guide.