How do you use time partitioning to manage large-scale data?
Time partitioning is a powerful technique for managing massive datasets efficiently. By dividing data into time-based partitions (for example, by day or hour), systems can query, store, and maintain data more effectively. This approach is widely used in analytics, logging, and observability platforms at companies like Netflix, Amazon, and Uber to handle petabytes of time-series data. When done right, it improves query performance, reduces storage cost, and simplifies data lifecycle management.
Why It Matters
In large-scale distributed systems, scanning unnecessary data can be extremely costly. Time partitioning allows systems to skip irrelevant data by reading only partitions that match a query’s time range. It also enables efficient data retention and backfilling while supporting concurrent ingestion at scale. From a system design interview perspective, this concept demonstrates your ability to balance query latency, cost efficiency, and scalability. You are showing not only how to store data but how to structure it for predictable performance as volume grows.
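Partition pruning is easy to see in a small sketch. The helper below (a hypothetical name, not from any specific engine) shows how a planner maps a query's time range to the exact set of daily partitions to scan, skipping everything else:

```python
from datetime import date, timedelta

def partitions_for_range(start: date, end: date) -> list[str]:
    """Return the daily partition paths a query over [start, end] must scan."""
    days = (end - start).days
    return [f"dt={start + timedelta(days=i):%Y-%m-%d}" for i in range(days + 1)]

# A query over three days touches exactly three partitions; all others are pruned.
paths = partitions_for_range(date(2025, 11, 3), date(2025, 11, 5))
# → ['dt=2025-11-03', 'dt=2025-11-04', 'dt=2025-11-05']
```

A year of data holds 365 daily partitions, so a three-day dashboard query reads under 1% of the dataset.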
How It Works (Step-by-Step)
1. Profile the workload
Understand the workload pattern — what time range users query, how often new data arrives, and the acceptable latency for analytics and reporting. This helps determine partition granularity and compaction strategy.
2. Choose the correct time attribute
Use event time (when an event occurred) for analytics, since it aligns with user behavior. Ingest time (when data is stored) is useful for debugging or monitoring ingestion pipelines.
3. Decide partition granularity
Daily partitions are standard for most analytical workloads. Hourly partitions are preferred for near-real-time monitoring systems. Use the coarsest granularity that still meets your query filter requirements.
4. Define partition naming and layout
Organize partitions consistently. Example paths: `dt=2025-11-05` for daily, or `dt=2025-11-05/hr=14` for hourly.
Always store and partition in UTC to avoid timezone confusion.
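A minimal sketch of this layout rule: normalize the event timestamp to UTC first, then derive the folder name (the function name is illustrative):

```python
from datetime import datetime, timezone

def partition_path(event_ts: datetime, hourly: bool = False) -> str:
    """Map an event timestamp to its partition folder, always in UTC."""
    ts = event_ts.astimezone(timezone.utc)  # normalize before formatting
    path = f"dt={ts:%Y-%m-%d}"
    if hourly:
        path += f"/hr={ts:%H}"
    return path

partition_path(datetime(2025, 11, 5, 14, 30, tzinfo=timezone.utc), hourly=True)
# → 'dt=2025-11-05/hr=14'
```

Because the conversion happens inside the function, writers in different local timezones still agree on the same folder for the same event.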
5. Manage file format and size
Use columnar formats like Parquet or ORC. Keep file sizes around 100–500 MB to optimize scan performance. Merge smaller files periodically using a compaction job.
6. Handle late or out-of-order data
Real-world systems receive delayed data. Use a watermark strategy (for example, event time minus one hour/day) to accept late arrivals and reprocess only impacted partitions.
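The watermark decision can be sketched as a small router (the function and the one-hour default are illustrative): events inside the watermark append to the open partition, while older arrivals trigger a targeted rewrite of their original partition.

```python
from datetime import datetime, timedelta, timezone

def route_event(event_time: datetime, now: datetime,
                watermark: timedelta = timedelta(hours=1)) -> tuple[str, str]:
    """On-time events append; events older than the watermark are routed
    to a reprocess of the partition they belong to."""
    partition = f"dt={event_time:%Y-%m-%d}/hr={event_time:%H}"
    action = "append" if event_time >= now - watermark else "reprocess"
    return action, partition

now = datetime(2025, 11, 5, 15, 0, tzinfo=timezone.utc)
route_event(datetime(2025, 11, 5, 12, 0, tzinfo=timezone.utc), now)
# → ('reprocess', 'dt=2025-11-05/hr=12')
```

Only the `hr=12` partition is rewritten; every other partition stays untouched.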
7. Use clustering and indexing
Within each time partition, cluster or sort data by frequently filtered columns (for example, user_id or service_name) to enhance scan efficiency.
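As a toy illustration (synthetic records, not a real pipeline): sorting a partition's rows by the filter column before writing lets a columnar reader skip row groups whose min/max statistics miss the predicate.

```python
# Toy records for one hourly partition.
records = [
    {"service_name": "web", "latency_ms": 120},
    {"service_name": "auth", "latency_ms": 45},
    {"service_name": "web", "latency_ms": 80},
]

# Cluster by the most frequently filtered column before writing the file;
# a query for service_name = 'auth' can then skip the 'web' row groups entirely.
clustered = sorted(records, key=lambda r: r["service_name"])
# service_name order after clustering: auth, web, web
```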
8. Plan data retention and tiering
Older partitions can be dropped or moved to cheaper storage (like S3 Glacier or cold tier). This keeps operational datasets light while retaining historical data affordably.
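Retention becomes a cheap metadata operation with this layout: a nightly job can select expired partitions by name alone, without scanning any data (the helper name and 90-day default are assumptions for illustration):

```python
from datetime import date, timedelta

def expired_partitions(partitions: list[str], today: date,
                       keep_days: int = 90) -> list[str]:
    """Pick dt=YYYY-MM-DD partitions that fall outside the retention window."""
    cutoff = today - timedelta(days=keep_days)
    return [p for p in partitions
            if date.fromisoformat(p.removeprefix("dt=")) < cutoff]

expired_partitions(["dt=2025-08-01", "dt=2025-09-01"], date(2025, 11, 5))
# → ['dt=2025-08-01']
```

The selected paths can then be dropped, or moved to a cold tier, with a single bulk operation per partition.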
9. Distribute write load
If an hour becomes a hotspot, add secondary bucketing (for example, hashing by user ID within that hour) to distribute load across multiple files or writers.
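A sketch of the bucketing step: a stable hash (not Python's built-in `hash()`, which varies across processes) assigns each user to one of N sub-buckets within the hot hour, so every writer agrees on the layout.

```python
import hashlib

def bucket_for(user_id: str, num_buckets: int = 8) -> int:
    """Stable bucket assignment: md5 is consistent across processes,
    so all writers place a given user in the same sub-bucket."""
    digest = hashlib.md5(user_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_buckets

# An event then lands under dt=.../hr=.../bucket=<bucket_for(user_id)>,
# spreading one hot hour across num_buckets files or writers.
```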
10. Automate maintenance tasks
Automate partition discovery, compaction, and metadata cleanup. Monitor metrics like small-file count, query pruning efficiency, and partition skew.
Real-World Example
At Netflix, client logs stream into an event lake partitioned by dt and hr. Streaming jobs write data into hourly folders, compact them periodically, and move older data to cheaper tiers. Dashboards scan only recent partitions, reducing scan cost dramatically. Late arrivals are re-merged with watermark-based jobs that rewrite only affected partitions. The combination of time partitioning and compaction keeps storage efficient while ensuring analytics remain accurate and low-latency.
Common Pitfalls or Trade-offs
1. **Too fine-grained partitions.** Minute-level partitions increase metadata and file counts, slowing query planning and compaction.
2. **Too coarse partitions.** Month-level partitions reduce pruning efficiency and force wide scans.
3. **Using ingest time instead of event time.** Misalignment between event occurrence and ingestion leads to inaccurate analytics and retention logic.
4. **Small-file problem.** Streaming pipelines often generate too many small files. Without periodic compaction, query performance degrades sharply.
5. **Timezone inconsistencies.** Partitioning in local time can cause duplicated or missing hours during daylight saving transitions. Always use UTC.
6. **Skewed partitions.** Events like flash sales or outages create high-volume partitions. Use hash bucketing within time partitions to balance the load.
Interview Tip
A common interview follow-up is: “What happens when late data arrives after the partition is compacted?”
A strong answer mentions using a watermark to define lateness, idempotent ingestion (based on event ID), and reprocessing only the affected partitions. Mention that using UTC-based hourly partitions simplifies retention and analytics. Understanding these trade-offs shows real-world design maturity.
For in-depth practice, explore Grokking Scalable Systems for Interviews.
Key Takeaways
- Time partitioning enables query pruning and faster analytics.
- Choose event time over ingest time for correct results.
- Use daily or hourly partitions depending on workload.
- Avoid small files by scheduling regular compaction.
- Drop old partitions to enforce retention efficiently.
Comparison Table
| Strategy | Best For | Strengths | Weaknesses | Typical Query Pattern |
|---|---|---|---|---|
| Time partitioning (day/hour) | Analytics, observability | Fast pruning, easy retention | Late data handling, small-file issue | last N days, rolling windows |
| Hash partitioning (by user/tenant) | Uniform workloads | Load balancing, avoids hotspots | Poor pruning on time queries | user_id = X |
| Range partitioning (on numeric key) | Metric thresholds | Efficient range scans | Moving hot spots | metric between A and B |
| Clustered partitioning (Z-order or sorted) | Mixed filters | Better skipping inside partitions | More maintenance overhead | time + user_id |
| Hybrid time + hash partitioning | Hot hours or bursts | Balances write load within time | Higher metadata count | hour = X, user_id in subset |
FAQs
Q1. Should I partition by event time or ingest time?
Always partition by event time for analytics accuracy. Use ingest time only for pipeline debugging or monitoring.
Q2. How large should files be within a partition?
Aim for 100–500 MB per file. Merge smaller files using compaction jobs to maintain scan efficiency.
Q3. How do I handle late-arriving events?
Use a watermark window to capture late arrivals and rewrite only the affected partitions. Keep ingestion idempotent to avoid duplicates.
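Idempotent ingestion can be sketched as an upsert keyed by event ID (an in-memory dict stands in for the partition store here): replaying a late batch after a watermark rewrite cannot create duplicate rows.

```python
def ingest(partition: dict, events: list[dict]) -> dict:
    """Idempotent upsert keyed by event_id: re-delivered events overwrite
    themselves instead of duplicating rows."""
    for event in events:
        partition[event["event_id"]] = event
    return partition

store = ingest({}, [{"event_id": "a", "value": 1}])
store = ingest(store, [{"event_id": "a", "value": 1},   # replayed duplicate
                       {"event_id": "b", "value": 2}])  # genuinely new
# store now holds exactly two events, despite the replay
```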
Q4. Which is better: daily or hourly partitions?
Use daily partitions for most batch workloads. Use hourly for near-real-time analytics or systems with spiky data arrival.
Q5. Can I combine time partitioning with hash partitioning?
Yes. Combine them when hourly partitions become hotspots. Hashing within a time bucket spreads write load evenly.
Q6. How do I implement data retention with time partitions?
Automate deletion of old partitions (for example, keep 90 days). Run vacuum or cleanup jobs to remove stale metadata.
Further Learning
For deeper insights into data partitioning, pruning, and storage efficiency, check out Grokking Scalable Systems for Interviews.
To build strong foundations in system design principles and scalability strategies, start with Grokking System Design Fundamentals.