How would you implement TTL/expiry semantics reliably in databases?
TTL (time to live) defines how long a record should stay valid before expiring. Expiry semantics ensure that outdated data, like session tokens or cached items, is removed automatically. Implementing TTL reliably in a distributed database is not trivial because of factors like replication lag, compaction delays, and inconsistent clocks across nodes. This guide explains how to design TTL in a way that is both reliable and efficient, especially for system design interviews.
Why It Matters
TTL is crucial for keeping systems clean, compliant, and cost-effective. In real-world systems:
- Performance: Expiring unused data improves query performance and reduces storage costs.
- Correctness: Expired sessions or tokens must not be reused, even if still stored.
- Compliance: Helps meet data retention and privacy laws.
- Reliability: Prevents stale data from affecting analytics or decision-making.
In a system design interview, discussing TTL demonstrates your understanding of lifecycle management, background processing, and database internals.
How It Works (Step-by-Step)
1. Define the Expiry Model
Add an expires_at column that stores an absolute expiry timestamp (NOW() + TTL_interval). This ensures consistency even if data replication happens later. You can also use a sliding TTL where the expiry extends after each access (common for sessions).
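As a minimal sketch of the absolute-expiry model, here is Python with SQLite standing in for the database. The `sessions` table, `TTL_SECONDS`, and `create_session` are illustrative names, not from any particular system:

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sessions (
        id TEXT PRIMARY KEY,
        token TEXT NOT NULL,
        expires_at REAL NOT NULL  -- absolute Unix timestamp, not a relative TTL
    )
""")

TTL_SECONDS = 1800  # 30-minute TTL

def create_session(session_id, token):
    # Store an absolute expiry so every replica interprets it identically,
    # no matter when the row arrives there.
    conn.execute(
        "INSERT INTO sessions (id, token, expires_at) VALUES (?, ?, ?)",
        (session_id, token, time.time() + TTL_SECONDS),
    )

create_session("s1", "abc123")
row = conn.execute("SELECT expires_at FROM sessions WHERE id = 's1'").fetchone()
```

Storing `NOW() + TTL` at write time, rather than a relative TTL, is what makes the row's validity independent of when a replica receives it.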
2. Enforce TTL at Read Time
Always filter queries by WHERE expires_at > NOW(). This prevents serving expired data even before it’s physically deleted. Create an index on expires_at for efficiency. Many production systems implement this via database views or APIs to ensure it cannot be bypassed.
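A read-time filter can be sketched as follows, again using SQLite as a stand-in; `get_session` is a hypothetical accessor that every read path would go through:

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (id TEXT PRIMARY KEY, expires_at REAL NOT NULL)")
# Index on expires_at so both the read filter and any cleanup job stay cheap.
conn.execute("CREATE INDEX idx_sessions_expires ON sessions (expires_at)")

now = time.time()
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?)",
    [("live", now + 600), ("expired", now - 600)],
)

def get_session(session_id):
    # The filter makes expired rows invisible even before physical deletion.
    return conn.execute(
        "SELECT id FROM sessions WHERE id = ? AND expires_at > ?",
        (session_id, time.time()),
    ).fetchone()

live = get_session("live")
gone = get_session("expired")
```

Routing all reads through one function (or a database view) is what prevents the filter from being bypassed accidentally.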
3. Perform Background Cleanup
Use a background job or scheduled task to delete expired records. Deletion should happen in small batches (e.g., 10K rows per batch) to avoid write amplification and locking issues. In distributed databases like Cassandra or DynamoDB, compaction eventually purges tombstoned rows.
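A throttled sweeper might look like this sketch (batch size and table are illustrative; a production job would also sleep between batches and run on a schedule):

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, expires_at REAL)")
now = time.time()
# Seed 500 expired rows and 100 live ones.
conn.executemany("INSERT INTO items (expires_at) VALUES (?)",
                 [(now - 1,)] * 500 + [(now + 3600,)] * 100)

BATCH = 100  # small batches limit lock time and write amplification

def sweep_expired():
    """Delete expired rows in small batches; returns the total deleted."""
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM items WHERE id IN ("
            "  SELECT id FROM items WHERE expires_at <= ? LIMIT ?)",
            (time.time(), BATCH),
        )
        conn.commit()
        if cur.rowcount == 0:
            return total
        total += cur.rowcount
        # time.sleep(0.1)  # throttle between batches in a real sweeper

deleted = sweep_expired()
remaining = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
```

Each batch commits separately, so locks are held briefly and the job can be paused or killed between batches without losing progress.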
4. Use Partitioned Tables for Bulk Expiry
For very large datasets, partition tables by time (e.g., events_2025_11_10) and drop entire partitions. This is the most efficient form of expiry, especially for logs or metrics.
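SQLite has no native partitioning, so this sketch simulates time-based partitions with per-day tables (`events_YYYY_MM_DD`); in PostgreSQL or a similar engine, detaching or dropping a declared partition would replace the `DROP TABLE`:

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
today = date(2025, 11, 10)

def partition_name(day):
    return f"events_{day.strftime('%Y_%m_%d')}"

# One table per day stands in for native range partitions.
for offset in range(5):
    conn.execute(f"CREATE TABLE {partition_name(today - timedelta(days=offset))} "
                 "(id INTEGER, payload TEXT)")

def drop_expired_partitions(retention_days):
    # Dropping a whole table is O(1) relative to row count: no tombstones,
    # no per-row delete I/O.
    cutoff = today - timedelta(days=retention_days)
    dropped = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' AND name LIKE 'events_%'"
    ).fetchall()
    for (name,) in tables:
        day = date(*map(int, name.removeprefix("events_").split("_")))
        if day < cutoff:
            conn.execute(f"DROP TABLE {name}")
            dropped.append(name)
    return dropped

dropped = drop_expired_partitions(retention_days=3)
remaining = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'").fetchall()]
```

The key property carries over to real partitioned tables: expiry cost is proportional to the number of partitions, not the number of rows.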
5. Handle Replication and Clock Skew
All TTL computation should use server-side time. Never rely on client timestamps because device clocks can drift. During replication, expired data should be invisible across all replicas. Always enforce the expires_at filter during reads.
6. Consider Native TTL Features
Some databases (like Cassandra, Redis, and MongoDB) support native TTL. These systems automatically remove expired items. However, TTL enforcement may be lazy (triggered during read or compaction), so for strong guarantees, still add a read filter.
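The lazy-enforcement behavior can be illustrated with a small in-memory store; `LazyTTLStore` is a hypothetical class that mimics the Redis/Cassandra pattern where an expired key is only purged when next touched, which is why the read-time check is essential:

```python
import time

class LazyTTLStore:
    """Mimics lazy TTL enforcement: an expired key is only purged when it
    is next read, so the read path itself must check expiry."""

    def __init__(self):
        self._data = {}  # key -> (value, expires_at)

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, time.monotonic() + ttl_seconds)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            # Lazy purge: physical deletion happens on access, not at expiry.
            del self._data[key]
            return None
        return value

store = LazyTTLStore()
store.set("token", "abc", ttl_seconds=0.05)
before = store.get("token")
time.sleep(0.1)
after = store.get("token")
```

Between expiry and the next read, the dead entry still occupies space (and, on a replica, could be visible to a path that skips the check), which is exactly the gap the explicit read filter closes.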
7. Sliding TTL and Race Conditions
For session or token renewal, you can refresh expiry time using:
UPDATE sessions SET expires_at = GREATEST(expires_at, NOW() + interval '30 minutes') WHERE id = session_id;
Use atomic updates or compare-and-swap to avoid overwriting newer expiry times.
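The same guarded refresh can be demonstrated in SQLite, where the scalar `MAX()` plays the role of PostgreSQL's `GREATEST()`; because the guard and the write happen in one atomic `UPDATE`, there is no read-modify-write race:

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (id TEXT PRIMARY KEY, expires_at REAL)")

now = time.time()
# Session already has a far-future expiry (e.g. set by a newer writer).
conn.execute("INSERT INTO sessions VALUES ('s1', ?)", (now + 3600,))

SLIDE = 1800  # 30-minute sliding window

def touch_session(session_id):
    # MAX() prevents a stale writer from shrinking a newer expiry:
    # the expiry can only move forward, never backward.
    conn.execute(
        "UPDATE sessions SET expires_at = MAX(expires_at, ?) WHERE id = ?",
        (time.time() + SLIDE, session_id),
    )

touch_session("s1")
expires = conn.execute(
    "SELECT expires_at FROM sessions WHERE id = 's1'").fetchone()[0]
```

Here the touch offers `now + 30min`, but the stored `now + 1h` is larger, so the guard keeps the later time instead of overwriting it.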
Real-World Example
Netflix stores temporary streaming metadata in a Cassandra cluster with a 24-hour TTL. The application relies on Cassandra’s native TTL for automatic expiry but also adds a filter to ensure no stale metadata is returned. For user sessions, Netflix uses a sliding TTL model—each time a session is validated, the TTL resets. Expired sessions are eventually compacted out without impacting read performance.
Common Pitfalls or Trade-offs
- Client-side expiry: Dangerous due to clock drift. Always compute expires_at server-side.
- Large delete batches: Can cause lock contention or I/O spikes. Use throttled cleanup.
- Replica lag: Expired data might still appear on replicas. Add filters for correctness.
- Lazy compaction: Some databases only purge during compaction, which delays space reclamation.
- Over-refreshing TTL: Sliding TTL on popular keys can lead to write storms.
Interview Tip
Interviewers might ask, “How would you ensure expired sessions are not accessible during replica lag?”
You can answer: “I’d enforce expiry both at the application layer and the read path (WHERE expires_at > NOW()), ensuring expired data is never returned. Physical deletion can be asynchronous to reduce write pressure.”
Key Takeaways
- Always compute TTL server-side using a trusted clock.
- Combine logical (read-time) and physical (delete-time) expiry for safety.
- Use partitions for efficient bulk expiry.
- Throttle deletion jobs to avoid performance spikes.
- Never rely only on lazy compaction or cache TTL for correctness.
Comparison of TTL Approaches
| Approach | Enforcement Mechanism | Space Reclamation Method | Consistency Strength | Cost and Complexity | Best Use Case |
|---|---|---|---|---|---|
| Native TTL (Cassandra, Redis) | Engine auto-deletes expired data | Automatic via compaction/GC | Weak on replicas (lazy purge) | Low | Session tokens, cache, ephemeral keys |
| Manual expires_at column | Query filter expires_at > NOW() | Batch deletion by sweeper job | Strong, if enforced in queries | Moderate | Application data with correctness needs |
| Partition-based expiry | Partition boundaries filter data | Drop partitions periodically | Very strong | Low for bulk ops | Logs, metrics, analytics tables |
| Soft delete (tombstones) | Logical flag deleted_at IS NULL | Later vacuum or compaction | Strong, if flag checked at read | Medium (extra read-path cost, recoverable) | Auditable or recoverable datasets |
| Cache-only TTL | Cache evicts after TTL | Auto-eviction | Weak (DB still holds data) | Low | Temporary performance cache |
FAQs
Q1. What is the difference between soft TTL and hard TTL?
Soft TTL allows serving slightly stale data while triggering background refresh, whereas hard TTL makes data instantly invalid after expiry.
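The distinction can be sketched with a small cache; `SoftTTLCache` and its `needs_refresh` set are illustrative, not from any real library:

```python
import time

class SoftTTLCache:
    """Soft vs. hard TTL: past the soft TTL, stale data is still served but
    flagged for background refresh; past the hard TTL it is invalid outright."""

    def __init__(self, soft_ttl, hard_ttl):
        self.soft_ttl, self.hard_ttl = soft_ttl, hard_ttl
        self._data = {}          # key -> (value, written_at)
        self.needs_refresh = set()

    def set(self, key, value):
        self._data[key] = (value, time.monotonic())
        self.needs_refresh.discard(key)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, written = entry
        age = time.monotonic() - written
        if age >= self.hard_ttl:
            del self._data[key]          # hard TTL: instantly invalid
            return None
        if age >= self.soft_ttl:
            self.needs_refresh.add(key)  # soft TTL: serve stale, refresh later
        return value

cache = SoftTTLCache(soft_ttl=0.05, hard_ttl=0.3)
cache.set("k", "v")
fresh = cache.get("k")
time.sleep(0.08)
stale = cache.get("k")   # past soft TTL: still served, flagged for refresh
time.sleep(0.25)
gone = cache.get("k")    # past hard TTL: invalid
```

Soft TTL trades a bounded amount of staleness for latency (no blocking refresh on the read path); hard TTL is the right choice when correctness forbids serving expired data at all.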
Q2. How do I avoid load spikes from TTL deletions?
Use throttled batch jobs and incremental partition drops. Deleting millions of rows in one go can overload storage I/O.
Q3. Why not rely only on cache TTL?
Cache expiry only affects the in-memory layer. The database will still store expired data, which could violate compliance or correctness rules.
Q4. How does replication affect TTL behavior?
Replicas might lag behind the primary and show expired data temporarily. Always add expires_at > NOW() checks during reads to stay safe.
Q5. What is the most efficient way to handle large-scale expiry?
Time-based partitions. Dropping an old partition is far cheaper than row-by-row deletion.
Q6. What TTL approach is best for compliance-driven retention?
Use absolute TTL with enforced read filters and partition-level expiry to ensure deterministic deletion and auditability.
Further Learning
For a deeper dive into lifecycle management, caching, and cleanup mechanisms, explore Grokking System Design Fundamentals.
If you want to master large-scale expiry handling in distributed databases, see Grokking Scalable Systems for Interviews.
For advanced interview preparation on database internals and design trade-offs, check out Grokking the System Design Interview.