The explanation for “What if we can combine sharding by TweetID and Tweet creation time” is very poor.
julian_humecki
Jun 20, 2024
You correctly say that sharding by TweetID fixes the hot-user problem but increases latency, since you need to query all shards (and if one shard takes longer to respond, the whole query waits on it). This is fine.
Your second suggestion is sharding by epoch time, and that doesn’t work because the servers holding recent data will be much hotter than the other servers (due to range-based partitioning).
What’s not fine is claiming that if you combine epoch time with a sequence number to form your TweetID, all your problems are automatically solved. You go back to using your hash function, and you still need to query all your DB servers.
Do correct me, but I don’t see how that fixes the latency problem.
Comments
Design Gurus
6 months ago
Your understanding is right.
The latency issue is discussed at the end of the section. For reference, here it is:
In the above approach, we still have to query all the servers for timeline generation, but our reads (and writes) will be substantially quicker.
- ...
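To make the trade-off discussed in this thread concrete, here is a minimal Python sketch, not the course's implementation. It assumes a TweetID built from epoch seconds in the high bits and a sequence number in the low bits, and a hash-based shard choice. The names `make_tweet_id`, `shard_for`, `latest_tweets`, the 17-bit sequence width, and the shard count are all hypothetical, chosen only for illustration. It shows that a timeline read still fans out to every shard, while each shard can return its newest tweets just by ordering on the ID, with no separate creation-time field. That is one plausible reading of "substantially quicker", not something the quoted text confirms.

```python
import heapq
import zlib

SEQ_BITS = 17    # assumed width of the per-second sequence counter
NUM_SHARDS = 4   # assumed shard count for this demo


def make_tweet_id(epoch_seconds: int, seq: int) -> int:
    """Pack creation time into the high bits and a sequence number into the low bits."""
    if not 0 <= seq < (1 << SEQ_BITS):
        raise ValueError("sequence number out of range")
    return (epoch_seconds << SEQ_BITS) | seq


def shard_for(tweet_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Hash the ID so that consecutive (recent) tweets spread across shards."""
    return zlib.crc32(tweet_id.to_bytes(8, "big")) % num_shards


def latest_tweets(shards, k):
    """Fan out to every shard, take each shard's k largest IDs, then merge them."""
    # One query per shard: this fan-out is the latency cost raised in the post.
    per_shard = [heapq.nlargest(k, shard) for shard in shards]
    # Larger ID means newer tweet, so merging by ID is merging by creation time.
    return heapq.nlargest(k, (tid for top in per_shard for tid in top))


# Demo: 150 tweets over 50 seconds, distributed by hash, then read back.
shards = [[] for _ in range(NUM_SHARDS)]
base = 1_718_841_600  # an arbitrary epoch second
ids = [make_tweet_id(base + t, s) for t in range(50) for s in range(3)]
for tid in ids:
    shards[shard_for(tid)].append(tid)
newest = latest_tweets(shards, 5)
```

In this sketch the read path still touches all `NUM_SHARDS` shards, so the slowest shard bounds the response time, exactly as the post says. What changes is the work done per shard: sorting by the primary key alone yields time order.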