How would you model an interaction graph (followers/friends) efficiently?

An interaction graph represents how users connect and engage with each other. Think of it as a set of user nodes with relationship edges for follow and friend. The key is to store and query these edges so that reads and writes stay fast even as the graph reaches hundreds of millions of users and billions of relationships. In a system design interview you will be expected to model this graph, choose storage and indexes, and justify trade offs for scalability and reliability.

Why It Matters

Social products live or die by graph performance. Every core feature depends on it. Follow suggestions, mutual friends, who to notify, and feed fanout all rely on quick graph lookups. The wrong data model creates hot partitions, slow pagination, and expensive joins. The right model enables predictable latency, linear scale, and simple operations across a distributed system.

How It Works Step by Step

Clarify semantics
- Follow is a directed edge from A to B
- Friend is an undirected relationship which can be represented as two directed edges A to B and B to A
- Decide what metadata belongs on an edge such as creation time, source of action, ranking score, privacy flags
Choose the primary representation
- Adjacency list per user is the common choice for scale
- Maintain two lists per user
  - following list contains users this user follows
  - followers list contains users who follow this user
- Each entry stores neighbor user id plus a small payload such as created time and optional score
Pick a storage layout
- Document or key value style store
  - single row per user per list with a sorted container of neighbor ids
- Wide column store
  - partition by user id and cluster by neighbor id or created time for range scans
- Relational store
  - an edges table with from id and to id plus necessary indexes if scale is moderate
- For real time reads place a memory cache in front of the system for hot lists
Index for the main access patterns
- Membership check is A following B
  - store a tiny inverted index per user so that existence checks are O log N or O one depending on structure
- List with pagination
  - sort by created time descending and use a cursor based on last seen time or neighbor id
- Count queries
  - maintain counters for follower and following totals as separate fields updated with each write
Write path for a follow
- Validate both user ids and privacy policy
- Idempotency key prevents double insert on retries
- Dual write
  - append A to the followers list of B
  - append B to the following list of A
- Increment counters and enqueue downstream work for feed or notifications
- Soft delete on unfollow
  - remove both adjacency entries and decrement counters
Friend modeling
- Choose synchronous confirm or asynchronous accept flow
- Store both directions at confirmation time so reads stay cheap
- Optionally materialize a friends list per user as the set intersection of following and followers kept in the background
Sharding and partitioning
- Partition by user id using consistent hashing to spread large users
- For celebrities with very high degree set use dedicated partitions to avoid hot shards
- Keep adjacency data local to the user partition to favor common reads such as my following and my followers
Pagination that does not degrade
- Avoid offset since it grows linearly with page number
- Use time based or id based cursors which let storage use clustering to jump directly to the next page
- Cap page size and expose a stable order such as newest first
Caching strategy
- Cache top pages of each list in memory since they are read most often
- Invalidate by simple token or short TTL coupled with background refresh
- For membership checks use a tiny bloom filter or a compact set in memory for hot users
Derived queries

Mutual follows use set intersection between my following and your following
Mutual friends are the intersection of our friends lists
People you may know uses two hop walks such as friends of friends and can be precomputed for active users

Counters and analytics

Keep denormalized counters for followers and following on the user profile
Rebuild counters asynchronously if needed by scanning the adjacency lists

Safety and privacy

Store block edges as a separate structure and check them in the write path to avoid illegal edges
Honor private accounts by holding follow requests until approved

Real World Example

Consider a photo sharing app with a follow model. When user A taps follow on user B

The service writes an edge A to B and B to followers of B
Both user counters update in constant time
The service enqueues a notification for B
For feed, the system can either push A posts to B during fanout or rely on pull reads at view time. The graph store remains the source of truth for who follows whom
When the client shows B profile, it renders followers count from the counter and the first page of followers from the cache. Membership checks use a fast existence query to show the state of the follow button

This pattern keeps reads predictable and scales linearly since each user owns their shard of adjacency.

Common Pitfalls

Storing the entire list as one giant blob per user which forces full rewrites on each change and causes high write amplification
Using offset pagination which becomes slower as the page number grows
Modeling friend as a computed intersection only at read time which creates heavy fan in queries under load
Missing a reverse index which makes membership checks slow or forces expensive scans
Letting celebrity users overload one shard due to naive modulo based partitioning
Forgetting privacy and block rules in the write path which leads to inconsistent state

Interview Tip

Expect a prompt like design the follow graph for a social app that serves one hundred million daily active users. Start with the adjacency list per user. State the key access patterns membership check list with pagination and counts. Propose sharding by user id with consistent hashing. Add counters and caching. Close by explaining how you would handle celebrity users and how you would test the write path for idempotency and race safety.

Key Takeaways

Use adjacency lists per user with separate followers and following
Keep a compact payload on each edge and store creation time for ordering
Shard by user id with protection for celebrity users
Expose cursor based pagination and fast membership checks
Maintain denormalized counters and cache hot pages

Table of Comparison

Model or Store	Edge Representation	Strengths	Weaknesses	Best Use Case
Relational Table	One row per edge with from_id, to_id	Simple, consistent	Expensive joins at scale	Small to medium networks
Key-Value Store	One list per user per edge type	High scalability, fast reads	Hard to query across users	Large follow graphs
Wide-Column Store	Partition by user, cluster by neighbor/time	Efficient range queries	Schema complexity	High throughput systems
Graph Database	Nodes and edges as first-class citizens	Powerful traversals	Slow for deep graphs across shards	Analytics or recommendations
In-Memory Cache	Hot pages and memberships	Low latency	Eviction management	Popular users or repeated queries

FAQs

Q1. How should I store both followers and following efficiently?

Use two separate adjacency lists per user—one for followers and one for following. Partition by user ID so both lists live close together for efficient reads.

Q2. What’s the best way to paginate large follow lists?

Use cursor-based pagination with IDs or timestamps. Avoid offsets since they require scanning skipped rows, which gets slower over time.

Q3. How do you prevent hot shards for celebrity users?

Distribute their lists across multiple partitions or use dedicated shards. Combine with caching and rate limiting on deep pagination.

Q4. How do you calculate mutual friends or mutual follows?

Perform a set intersection between following lists of two users. For large graphs, precompute and store these intersections periodically.

Q5. How do you maintain accurate follower counts?

Store follower/following counters on the user profile. Update them transactionally with each follow or unfollow, and periodically reconcile in the background.

Only for small to medium systems or analytics workloads. For large-scale networks, adjacency lists in distributed KV or wide-column stores perform better.

Further Learning

Build a strong foundation with the fundamentals in our course Grokking System Design Fundamentals
Practice end to end social graph reasoning and feed scenarios in Grokking Scalable Systems for Interviews