How would you implement TTL/expiry semantics reliably in databases?
TTL (time to live) defines how long a record should stay valid before expiring. Expiry semantics ensure that outdated data, like session tokens or cached items, is removed automatically. Implementing TTL reliably in a distributed database is not trivial because of factors like replication lag, compaction delays, and inconsistent clocks across nodes. This guide explains how to design TTL in a way that is both reliable and efficient, especially for system design interviews.
Why It Matters
TTL is crucial for keeping systems clean, compliant, and cost-effective. In real-world systems:
- Performance: Expiring unused data improves query performance and reduces storage costs.
- Correctness: Expired sessions or tokens must not be reused, even if still stored.
- Compliance: Helps meet data retention and privacy laws.
- Reliability: Prevents stale data from affecting analytics or decision-making.
In a system design interview, discussing TTL demonstrates your understanding of lifecycle management, background processing, and database internals.
How It Works (Step-by-Step)
1. Define the Expiry Model
Add an expires_at column that stores an absolute expiry timestamp (NOW() + TTL_interval). This ensures consistency even if data replication happens later. You can also use a sliding TTL where the expiry extends after each access (common for sessions).
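As a minimal sketch of the absolute-expiry model, here is Python with SQLite standing in for the database. The `sessions` table, `TTL_SECONDS`, and `create_session` are illustrative names, not from any particular system:

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sessions (
        id TEXT PRIMARY KEY,
        token TEXT NOT NULL,
        expires_at REAL NOT NULL  -- absolute Unix timestamp, not a relative TTL
    )
""")

TTL_SECONDS = 1800  # 30-minute TTL

def create_session(session_id, token):
    # Store an absolute expiry so every replica interprets it identically,
    # no matter when the row arrives there.
    conn.execute(
        "INSERT INTO sessions (id, token, expires_at) VALUES (?, ?, ?)",
        (session_id, token, time.time() + TTL_SECONDS),
    )

create_session("s1", "abc123")
row = conn.execute("SELECT expires_at FROM sessions WHERE id = 's1'").fetchone()
```

Storing `NOW() + TTL` at write time, rather than a relative TTL, is what makes the row's validity independent of when a replica receives it.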
2. Enforce TTL at Read Time
Always filter queries by WHERE expires_at > NOW(). This prevents serving expired data even before it’s physically deleted. Create an index on expires_at for efficiency. Many production systems implement this via database views or APIs to ensure it cannot be bypassed.
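A read-time filter can be sketched as follows, again using SQLite as a stand-in; `get_session` is a hypothetical accessor that every read path would go through:

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (id TEXT PRIMARY KEY, expires_at REAL NOT NULL)")
# Index on expires_at so both the read filter and any cleanup job stay cheap.
conn.execute("CREATE INDEX idx_sessions_expires ON sessions (expires_at)")

now = time.time()
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?)",
    [("live", now + 600), ("expired", now - 600)],
)

def get_session(session_id):
    # The filter makes expired rows invisible even before physical deletion.
    return conn.execute(
        "SELECT id FROM sessions WHERE id = ? AND expires_at > ?",
        (session_id, time.time()),
    ).fetchone()

live = get_session("live")
gone = get_session("expired")
```

Routing all reads through one function (or a database view) is what prevents the filter from being bypassed accidentally.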
3. Perform Background Cleanup
Use a background job or scheduled task to delete expired records. Deletion should happen in small batches (e.g., 10K rows per batch) to avoid write amplification and locking issues. In distributed databases like Cassandra or DynamoDB, compaction eventually purges tombstoned rows.
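A throttled sweeper might look like this sketch (batch size and table are illustrative; a production job would also sleep between batches and run on a schedule):

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, expires_at REAL)")
now = time.time()
# Seed 500 expired rows and 100 live ones.
conn.executemany("INSERT INTO items (expires_at) VALUES (?)",
                 [(now - 1,)] * 500 + [(now + 3600,)] * 100)

BATCH = 100  # small batches limit lock time and write amplification

def sweep_expired():
    """Delete expired rows in small batches; returns the total deleted."""
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM items WHERE id IN ("
            "  SELECT id FROM items WHERE expires_at <= ? LIMIT ?)",
            (time.time(), BATCH),
        )
        conn.commit()
        if cur.rowcount == 0:
            return total
        total += cur.rowcount
        # time.sleep(0.1)  # throttle between batches in a real sweeper

deleted = sweep_expired()
remaining = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
```

Each batch commits separately, so locks are held briefly and the job can be paused or killed between batches without losing progress.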
4. Use Partitioned Tables for Bulk Expiry
For very large datasets, partition tables by time (e.g., events_2025_11_10) and drop entire partitions. This is the most efficient form of expiry, especially for logs or metrics.
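SQLite has no native partitioning, so this sketch simulates time-based partitions with per-day tables (`events_YYYY_MM_DD`); in PostgreSQL or a similar engine, detaching or dropping a declared partition would replace the `DROP TABLE`:

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
today = date(2025, 11, 10)

def partition_name(day):
    return f"events_{day.strftime('%Y_%m_%d')}"

# One table per day stands in for native range partitions.
for offset in range(5):
    conn.execute(f"CREATE TABLE {partition_name(today - timedelta(days=offset))} "
                 "(id INTEGER, payload TEXT)")

def drop_expired_partitions(retention_days):
    # Dropping a whole table is O(1) relative to row count: no tombstones,
    # no per-row delete I/O.
    cutoff = today - timedelta(days=retention_days)
    dropped = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' AND name LIKE 'events_%'"
    ).fetchall()
    for (name,) in tables:
        day = date(*map(int, name.removeprefix("events_").split("_")))
        if day < cutoff:
            conn.execute(f"DROP TABLE {name}")
            dropped.append(name)
    return dropped

dropped = drop_expired_partitions(retention_days=3)
remaining = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'").fetchall()]
```

The key property carries over to real partitioned tables: expiry cost is proportional to the number of partitions, not the number of rows.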
5. Handle Replication and Clock Skew
All TTL computation should use server-side time. Never rely on client timestamps because device clocks can drift. During replication, expired data should be invisible across all replicas. Always enforce the expires_at filter during reads.
6. Consider Native TTL Features
Some databases (like Cassandra, Redis, and MongoDB) support native TTL. These systems automatically remove expired items. However, TTL enforcement may be lazy (triggered during read or compaction), so for strong guarantees, still add a read filter.
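The lazy-enforcement behavior can be illustrated with a small in-memory store; `LazyTTLStore` is a hypothetical class that mimics the Redis/Cassandra pattern where an expired key is only purged when next touched, which is why the read-time check is essential:

```python
import time

class LazyTTLStore:
    """Mimics lazy TTL enforcement: an expired key is only purged when it
    is next read, so the read path itself must check expiry."""

    def __init__(self):
        self._data = {}  # key -> (value, expires_at)

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, time.monotonic() + ttl_seconds)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            # Lazy purge: physical deletion happens on access, not at expiry.
            del self._data[key]
            return None
        return value

store = LazyTTLStore()
store.set("token", "abc", ttl_seconds=0.05)
before = store.get("token")
time.sleep(0.1)
after = store.get("token")
```

Between expiry and the next read, the dead entry still occupies space (and, on a replica, could be visible to a path that skips the check), which is exactly the gap the explicit read filter closes.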
7. Sliding TTL and Race Conditions
For session or token renewal, you can refresh expiry time using:
UPDATE sessions SET expires_at = GREATEST(expires_at, NOW() + interval '30 minutes') WHERE id = session_id;
Use atomic updates or compare-and-swap to avoid overwriting newer expiry times.
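The same guarded refresh can be demonstrated in SQLite, where the scalar `MAX()` plays the role of PostgreSQL's `GREATEST()`; because the guard and the write happen in one atomic `UPDATE`, there is no read-modify-write race:

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (id TEXT PRIMARY KEY, expires_at REAL)")

now = time.time()
# Session already has a far-future expiry (e.g. set by a newer writer).
conn.execute("INSERT INTO sessions VALUES ('s1', ?)", (now + 3600,))

SLIDE = 1800  # 30-minute sliding window

def touch_session(session_id):
    # MAX() prevents a stale writer from shrinking a newer expiry:
    # the expiry can only move forward, never backward.
    conn.execute(
        "UPDATE sessions SET expires_at = MAX(expires_at, ?) WHERE id = ?",
        (time.time() + SLIDE, session_id),
    )

touch_session("s1")
expires = conn.execute(
    "SELECT expires_at FROM sessions WHERE id = 's1'").fetchone()[0]
```

Here the touch offers `now + 30min`, but the stored `now + 1h` is larger, so the guard keeps the later time instead of overwriting it.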
Real-World Example
Netflix stores temporary streaming metadata in a Cassandra cluster with a 24-hour TTL. The application relies on Cassandra’s native TTL for automatic expiry but also adds a filter to ensure no stale metadata is returned. For user sessions, Netflix uses a sliding TTL model—each time a session is validated, the TTL resets. Expired sessions are eventually compacted out without impacting read performance.
Common Pitfalls or Trade-offs
- Client-side expiry: Dangerous due to clock drift. Always compute expires_at server-side.
- Large delete batches: Can cause lock contention or I/O spikes. Use throttled cleanup.
- Replica lag: Expired data might still appear on replicas. Add filters for correctness.
- Lazy compaction: Some databases only purge during compaction, which delays space reclamation.
- Over-refreshing TTL: Sliding TTL on popular keys can lead to write storms.
Interview Tip
Interviewers might ask, “How would you ensure expired sessions are not accessible during replica lag?”
You can answer: “I’d enforce expiry both at the application layer and the read path (WHERE expires_at > NOW()), ensuring expired data is never returned. Physical deletion can be asynchronous to reduce write pressure.”
Key Takeaways
- Always compute TTL server-side using a trusted clock.
- Combine logical (read-time) and physical (delete-time) expiry for safety.
- Use partitions for efficient bulk expiry.
- Throttle deletion jobs to avoid performance spikes.
- Never rely only on lazy compaction or cache TTL for correctness.
Comparison of TTL Approaches
| Approach | Enforcement Mechanism | Space Reclamation Method | Consistency Strength | Cost and Complexity | Best Use Case |
|---|---|---|---|---|---|
| Native TTL (Cassandra, Redis) | Engine auto-deletes expired data | Automatic via compaction/GC | Weak on replicas (lazy purge) | Low | Session tokens, cache, ephemeral keys |
| Manual expires_at column | Query filter expires_at > NOW() | Batch deletion by sweeper job | Strong, if enforced in queries | Moderate | Application data with correctness needs |
| Partition-based expiry | Partition boundaries filter data | Drop partitions periodically | Very strong | Low for bulk ops | Logs, metrics, analytics tables |
| Soft delete (tombstones) | Logical flag deleted_at IS NULL | Later vacuum or compaction | Strong, if flag checked at read | Medium (extra read-path cost, recoverable) | Auditable or recoverable datasets |
| Cache-only TTL | Cache evicts after TTL | Auto-eviction | Weak (DB still holds data) | Low | Temporary performance cache |
FAQs
Q1. What is the difference between soft TTL and hard TTL?
Soft TTL allows serving slightly stale data while triggering background refresh, whereas hard TTL makes data instantly invalid after expiry.
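The distinction can be sketched with a small cache; `SoftTTLCache` and its `needs_refresh` set are illustrative, not from any real library:

```python
import time

class SoftTTLCache:
    """Soft vs. hard TTL: past the soft TTL, stale data is still served but
    flagged for background refresh; past the hard TTL it is invalid outright."""

    def __init__(self, soft_ttl, hard_ttl):
        self.soft_ttl, self.hard_ttl = soft_ttl, hard_ttl
        self._data = {}          # key -> (value, written_at)
        self.needs_refresh = set()

    def set(self, key, value):
        self._data[key] = (value, time.monotonic())
        self.needs_refresh.discard(key)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, written = entry
        age = time.monotonic() - written
        if age >= self.hard_ttl:
            del self._data[key]          # hard TTL: instantly invalid
            return None
        if age >= self.soft_ttl:
            self.needs_refresh.add(key)  # soft TTL: serve stale, refresh later
        return value

cache = SoftTTLCache(soft_ttl=0.05, hard_ttl=0.3)
cache.set("k", "v")
fresh = cache.get("k")
time.sleep(0.08)
stale = cache.get("k")   # past soft TTL: still served, flagged for refresh
time.sleep(0.25)
gone = cache.get("k")    # past hard TTL: invalid
```

Soft TTL trades a bounded amount of staleness for latency (no blocking refresh on the read path); hard TTL is the right choice when correctness forbids serving expired data at all.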
Q2. How do I avoid load spikes from TTL deletions?
Use throttled batch jobs and incremental partition drops. Deleting millions of rows in one go can overload storage I/O.
Q3. Why not rely only on cache TTL?
Cache expiry only affects the in-memory layer. The database will still store expired data, which could violate compliance or correctness rules.
Q4. How does replication affect TTL behavior?
Replicas might lag behind the primary and show expired data temporarily. Always add expires_at > NOW() checks during reads to stay safe.
Q5. What is the most efficient way to handle large-scale expiry?
Time-based partitions. Dropping an old partition is far cheaper than row-by-row deletion.
Q6. What TTL approach is best for compliance-driven retention?
Use absolute TTL with enforced read filters and partition-level expiry to ensure deterministic deletion and auditability.
Further Learning
For a deeper dive into lifecycle management, caching, and cleanup mechanisms, explore Grokking System Design Fundamentals.
If you want to master large-scale expiry handling in distributed databases, see Grokking Scalable Systems for Interviews.
For advanced interview preparation on database internals and design trade-offs, check out Grokking the System Design Interview.