How would you model an interaction graph (followers/friends) efficiently?
An interaction graph represents how users connect and engage with each other. Think of it as a set of user nodes with relationship edges for follow and friend. The key is to store and query these edges so that reads and writes stay fast even as the graph reaches hundreds of millions of users and billions of relationships. In a system design interview you will be expected to model this graph, choose storage and indexes, and justify trade offs for scalability and reliability.
Why It Matters
Social products live or die by graph performance. Every core feature depends on it. Follow suggestions, mutual friends, who to notify, and feed fanout all rely on quick graph lookups. The wrong data model creates hot partitions, slow pagination, and expensive joins. The right model enables predictable latency, linear scale, and simple operations across a distributed system.
How It Works Step by Step
-
Clarify semantics
- Follow is a directed edge from A to B
- Friend is an undirected relationship which can be represented as two directed edges A to B and B to A
- Decide what metadata belongs on an edge such as creation time, source of action, ranking score, privacy flags
-
Choose the primary representation
-
Adjacency list per user is the common choice for scale
-
Maintain two lists per user
- following list contains users this user follows
- followers list contains users who follow this user
-
Each entry stores neighbor user id plus a small payload such as created time and optional score
-
-
Pick a storage layout
-
Document or key value style store
- single row per user per list with a sorted container of neighbor ids
-
Wide column store
- partition by user id and cluster by neighbor id or created time for range scans
-
Relational store
- an edges table with from id and to id plus necessary indexes if scale is moderate
-
For real time reads place a memory cache in front of the system for hot lists
-
-
Index for the main access patterns
-
Membership check is A following B
- store a tiny inverted index per user so that existence checks are O log N or O one depending on structure
-
List with pagination
- sort by created time descending and use a cursor based on last seen time or neighbor id
-
Count queries
- maintain counters for follower and following totals as separate fields updated with each write
-
-
Write path for a follow
-
Validate both user ids and privacy policy
-
Idempotency key prevents double insert on retries
-
Dual write
- append A to the followers list of B
- append B to the following list of A
-
Increment counters and enqueue downstream work for feed or notifications
-
Soft delete on unfollow
- remove both adjacency entries and decrement counters
-
-
Friend modeling
- Choose synchronous confirm or asynchronous accept flow
- Store both directions at confirmation time so reads stay cheap
- Optionally materialize a friends list per user as the set intersection of following and followers kept in the background
-
Sharding and partitioning
- Partition by user id using consistent hashing to spread large users
- For celebrities with very high degree set use dedicated partitions to avoid hot shards
- Keep adjacency data local to the user partition to favor common reads such as my following and my followers
-
Pagination that does not degrade
- Avoid offset since it grows linearly with page number
- Use time based or id based cursors which let storage use clustering to jump directly to the next page
- Cap page size and expose a stable order such as newest first
-
Caching strategy
- Cache top pages of each list in memory since they are read most often
- Invalidate by simple token or short TTL coupled with background refresh
- For membership checks use a tiny bloom filter or a compact set in memory for hot users
-
Derived queries
- Mutual follows use set intersection between my following and your following
- Mutual friends are the intersection of our friends lists
- People you may know uses two hop walks such as friends of friends and can be precomputed for active users
- Counters and analytics
- Keep denormalized counters for followers and following on the user profile
- Rebuild counters asynchronously if needed by scanning the adjacency lists
- Safety and privacy
- Store block edges as a separate structure and check them in the write path to avoid illegal edges
- Honor private accounts by holding follow requests until approved
Real World Example
Consider a photo sharing app with a follow model. When user A taps follow on user B
- The service writes an edge A to B and B to followers of B
- Both user counters update in constant time
- The service enqueues a notification for B
- For feed, the system can either push A posts to B during fanout or rely on pull reads at view time. The graph store remains the source of truth for who follows whom
- When the client shows B profile, it renders followers count from the counter and the first page of followers from the cache. Membership checks use a fast existence query to show the state of the follow button
This pattern keeps reads predictable and scales linearly since each user owns their shard of adjacency.
Common Pitfalls
-
Storing the entire list as one giant blob per user which forces full rewrites on each change and causes high write amplification
-
Using offset pagination which becomes slower as the page number grows
-
Modeling friend as a computed intersection only at read time which creates heavy fan in queries under load
-
Missing a reverse index which makes membership checks slow or forces expensive scans
-
Letting celebrity users overload one shard due to naive modulo based partitioning
-
Forgetting privacy and block rules in the write path which leads to inconsistent state
Interview Tip
Expect a prompt like design the follow graph for a social app that serves one hundred million daily active users. Start with the adjacency list per user. State the key access patterns membership check list with pagination and counts. Propose sharding by user id with consistent hashing. Add counters and caching. Close by explaining how you would handle celebrity users and how you would test the write path for idempotency and race safety.
Key Takeaways
-
Use adjacency lists per user with separate followers and following
-
Keep a compact payload on each edge and store creation time for ordering
-
Shard by user id with protection for celebrity users
-
Expose cursor based pagination and fast membership checks
-
Maintain denormalized counters and cache hot pages
Table of Comparison
| Model or Store | Edge Representation | Strengths | Weaknesses | Best Use Case |
|---|---|---|---|---|
| Relational Table | One row per edge with from_id, to_id | Simple, consistent | Expensive joins at scale | Small to medium networks |
| Key-Value Store | One list per user per edge type | High scalability, fast reads | Hard to query across users | Large follow graphs |
| Wide-Column Store | Partition by user, cluster by neighbor/time | Efficient range queries | Schema complexity | High throughput systems |
| Graph Database | Nodes and edges as first-class citizens | Powerful traversals | Slow for deep graphs across shards | Analytics or recommendations |
| In-Memory Cache | Hot pages and memberships | Low latency | Eviction management | Popular users or repeated queries |
FAQs
Q1. How should I store both followers and following efficiently?
Use two separate adjacency lists per user—one for followers and one for following. Partition by user ID so both lists live close together for efficient reads.
Q2. What’s the best way to paginate large follow lists?
Use cursor-based pagination with IDs or timestamps. Avoid offsets since they require scanning skipped rows, which gets slower over time.
Q3. How do you prevent hot shards for celebrity users?
Distribute their lists across multiple partitions or use dedicated shards. Combine with caching and rate limiting on deep pagination.
Q4. How do you calculate mutual friends or mutual follows?
Perform a set intersection between following lists of two users. For large graphs, precompute and store these intersections periodically.
Q5. How do you maintain accurate follower counts?
Store follower/following counters on the user profile. Update them transactionally with each follow or unfollow, and periodically reconcile in the background.
Q6. Should we use a graph database for social graphs?
Only for small to medium systems or analytics workloads. For large-scale networks, adjacency lists in distributed KV or wide-column stores perform better.
Further Learning
-
Build a strong foundation with the fundamentals in our course Grokking System Design Fundamentals
-
Practice end to end social graph reasoning and feed scenarios in Grokking Scalable Systems for Interviews
GET YOUR FREE
Coding Questions Catalog
$197

$78
$78