How would you implement soft vs hard TTL for GDPR deletion?

Soft TTL and hard TTL are practical patterns for honoring GDPR erasure while keeping systems safe and consistent. Soft TTL hides or masks personal data immediately, so the user experience reflects deletion right away, while a background clock counts down to a final purge. Hard TTL performs the irreversible purge and proves that no readable copy remains across primary stores, caches, search, logs, and backups. Together they give you fast compliance signals with reliable follow-through at scale.

Why It Matters

GDPR requires prompt and verifiable deletion of personal data. Product teams need fast, user-visible deletion. Platform teams need a defensible purge that reaches every replica and derivative store. Interviewers love this topic because it mixes privacy by design, data lifecycle, multi-region replication, and failure handling. A clean design shows you can balance correctness, latency, and operational risk in a distributed system.

How It Works Step by Step

Concepts: Soft TTL means immediate suppression of personal data from all reads, plus a scheduled final purge time recorded in metadata. Hard TTL means the purge itself, along with evidence that the data and keys are gone.

Step 1: Identity resolution. Create a stable subject identifier, such as subject_id, that maps the user to all rows, documents, objects, and index entries linked through a data catalog. The catalog must include primary storage, caches, search indices, data lake tables, streaming topics, and analytics models.
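
As a rough illustration, the catalog entry for one subject might look like the Python sketch below; the structure and field names are illustrative assumptions, not a prescribed schema.

```python
from dataclasses import dataclass, field

@dataclass
class SubjectDataMap:
    """Illustrative catalog entry: every store that holds data for one subject."""
    subject_id: str
    database_rows: list[str] = field(default_factory=list)    # e.g. "users:42", "posts:107"
    cache_keys: list[str] = field(default_factory=list)        # e.g. "profile:42"
    search_documents: list[str] = field(default_factory=list)  # search index document ids
    object_keys: list[str] = field(default_factory=list)       # blob storage paths
    stream_topics: list[str] = field(default_factory=list)     # topics carrying the subject's events
    derived_datasets: list[str] = field(default_factory=list)  # lake tables, feature stores, models

# Hypothetical registration for subject "42"
catalog = {
    "42": SubjectDataMap(
        subject_id="42",
        database_rows=["users:42", "posts:107"],
        cache_keys=["profile:42"],
        search_documents=["media-index:doc-991"],
        object_keys=["media/42/photo1.jpg"],
    )
}
```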

Step 2: Accept a deletion request. Validate the request and issue a deletion_token with an idempotency key. Write a single source-of-truth record in a privacy_orchestrator table with fields such as subject_id, requested_at, soft_ttl_deadline, hard_ttl_deadline, and status.
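
A minimal Python sketch of this step, assuming the privacy_orchestrator table is modeled as an in-memory dict and the TTL windows are placeholder values.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Stand-in for the privacy_orchestrator table, keyed by idempotency key.
privacy_orchestrator: dict[str, dict] = {}

def accept_deletion_request(subject_id: str, idempotency_key: str) -> dict:
    """Write a single truth record for the erasure request; repeated calls return the same record."""
    if idempotency_key in privacy_orchestrator:
        return privacy_orchestrator[idempotency_key]  # idempotent: never create a duplicate request

    now = datetime.now(timezone.utc)
    record = {
        "deletion_token": str(uuid.uuid4()),
        "subject_id": subject_id,
        "requested_at": now,
        "soft_ttl_deadline": now + timedelta(minutes=5),  # placeholder window
        "hard_ttl_deadline": now + timedelta(days=30),    # placeholder window
        "status": "SOFT_PENDING",
    }
    privacy_orchestrator[idempotency_key] = record
    return record
```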

Step 3: Apply soft TTL immediately (a read-path sketch in Python follows this list)

  • Primary database: set is_erased to true, and store erased_at = now and purge_at = soft_ttl_deadline. All read paths must filter on is_erased and return either nothing or a masked surrogate.
  • Caches: evict keys for the subject and block rehydration if the backing record is marked erased.
  • Search: write a tombstone document and trigger a near-term reindex of affected shards.
  • Object storage: move pointers to a quarantine namespace with a lifecycle rule that enforces purge_at.
  • Streams: publish a deletion event with the idempotency key so downstream services update their own soft TTL state.
  • Auth and personalization: revoke tokens and embeddings tied to the subject, and do not recompute new features for the subject.
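
Here is a minimal read-path sketch in Python, assuming the database and cache are modeled as plain dicts; read_user, apply_soft_ttl, and MASKED are illustrative names rather than a real API.

```python
MASKED = {"display_name": "[deleted user]"}  # masked surrogate returned instead of personal data

def apply_soft_ttl(db: dict, cache: dict, user_id: str, purge_at) -> None:
    """Mark the record erased, stamp purge_at, and evict any cached copy."""
    row = db[user_id]
    row["is_erased"] = True
    row["purge_at"] = purge_at
    cache.pop(user_id, None)

def read_user(db: dict, cache: dict, user_id: str):
    """Read path with soft-TTL suppression and a cache rehydration guard."""
    if user_id in cache:
        return cache[user_id]
    row = db.get(user_id)
    if row is None:
        return None
    if row.get("is_erased"):
        return MASKED            # suppress personal data and do NOT repopulate the cache
    cache[user_id] = row         # rehydration is only allowed for non-erased records
    return row
```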

Step 4: Quarantine and suppression. During the soft window, no feature computation or analytics job should consume quarantined data. If your platform mandates retention for security or fraud, quarantine keeps the data encrypted and out of business use.

Step 5: Observe and verify the soft state. Emit metrics like percent_of_reads_suppressed and caches_evicted_for_subject. Store a compact audit log entry with the subject_id and the actions taken. Include retriable error codes for any lagging subsystem.

Step 6: Trigger hard TTL. At or before hard_ttl_deadline, the orchestrator fans out purge tasks. These tasks must be idempotent and monotonic. They physically delete database rows, index entries, cache entries, objects in storage, and any derived features. They also update a deletion_ledger with a cryptographic digest of the identifiers, so you can later prove that exactly those items were purged.
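
A sketch of an idempotent purge task and a digest-backed deletion_ledger, assuming each store is modeled as a dict; the identifier format and ledger fields are assumptions for illustration.

```python
import hashlib
import json

deletion_ledger: list[dict] = []  # append-only evidence of what was purged, no personal data

def purge_subject(subject_id: str, items: dict[str, list[str]], stores: dict[str, dict]) -> str:
    """Physically delete every listed item; safe to retry because deletes are no-ops when repeated."""
    purged = []
    for store_name, keys in items.items():
        store = stores[store_name]
        for key in keys:
            store.pop(key, None)                 # already-deleted items make the retry a no-op
            purged.append(f"{store_name}:{key}")

    # Record only a digest of the purged identifiers so the ledger itself holds no personal data.
    digest = hashlib.sha256(json.dumps(sorted(purged)).encode()).hexdigest()
    deletion_ledger.append({
        "subject_hash": hashlib.sha256(subject_id.encode()).hexdigest(),
        "items_digest": digest,
        "item_count": len(purged),
    })
    return digest
```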

Step 7: Handle backups and snapshots. You cannot rewrite immutable point-in-time backups. Instead, use crypto shredding: encrypt personal data with a subject-scoped key and, on hard TTL, destroy that key in the key management system. Any restored backup will then contain ciphertext that is unrecoverable for that subject. Maintain a restore policy that replays the deletion_ledger after any disaster recovery to avoid resurrecting erased data.
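
A crypto-shredding sketch in Python, using the cryptography package's Fernet recipe as a stand-in for subject-scoped keys held in a real key management system; the function names are illustrative.

```python
from cryptography.fernet import Fernet, InvalidToken  # pip install cryptography

# Stand-in for a KMS: one key per subject. In production, keys live in a hardened key manager.
subject_keys: dict[str, bytes] = {}

def encrypt_for_subject(subject_id: str, plaintext: bytes) -> bytes:
    """All personal data written to backups is encrypted under the subject's own key."""
    key = subject_keys.setdefault(subject_id, Fernet.generate_key())
    return Fernet(key).encrypt(plaintext)

def shred_subject_key(subject_id: str) -> None:
    """Hard TTL for backups: destroying the key makes every backed-up ciphertext unreadable."""
    subject_keys.pop(subject_id, None)

def read_restored_backup(subject_id: str, ciphertext: bytes):
    """Simulates a restore: without the key, the subject's data cannot be recovered."""
    key = subject_keys.get(subject_id)
    if key is None:
        return None
    try:
        return Fernet(key).decrypt(ciphertext)
    except InvalidToken:
        return None
```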

Step 8: Derived data and analytics. For aggregated analytics, replace results that include the subject with recomputed values or subtractive corrections. If the data is genuinely safe after irreversible anonymization, document the transformation. Otherwise, purge and recompute.
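
For the subtractive-correction idea, a toy Python example, assuming the aggregate is a running sum and count and that each subject's contribution was tracked at write time.

```python
def remove_subject_from_average(total_sum: float, total_count: int,
                                subject_sum: float, subject_count: int) -> float:
    """Drop one subject's contribution from an aggregate without re-reading the purged raw rows."""
    remaining = total_count - subject_count
    return (total_sum - subject_sum) / remaining if remaining else 0.0

# Example: average session length recomputed after erasing a subject's 3 sessions
print(remove_subject_from_average(total_sum=5000.0, total_count=100,
                                  subject_sum=90.0, subject_count=3))
```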

Step 9: Concurrency and propagation. Use event-driven propagation with at-least-once delivery. Each consumer must treat deletion events as high priority. Guard against race conditions by checking a deletion_version field on each record and writing application updates only if the version is unchanged and the record is not erased.
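
A sketch of the deletion_version guard in Python, assuming records are dicts and that expected_deletion_version was read when the application update was prepared.

```python
def safe_application_write(db: dict, record_id: str, changes: dict,
                           expected_deletion_version: int) -> bool:
    """Apply an application update only if the record is not erased and no deletion
    event has bumped its deletion_version since the update was prepared."""
    row = db.get(record_id)
    if row is None or row.get("is_erased"):
        return False  # deletion won the race; drop the write
    if row.get("deletion_version", 0) != expected_deletion_version:
        return False  # a deletion event touched this record in the meantime; retry or drop
    row.update(changes)
    return True
```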

Step 10: Prove it. Provide a user-facing deletion receipt that references the deletion_token and the times at which the soft and hard actions completed. Internally, keep minimal audit metadata with no personal fields: a subject hash, timestamps, and action codes.
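
A sketch of such a receipt in Python; the field names and action codes are assumptions for illustration, and only a hash of the subject identifier is retained.

```python
import hashlib
from datetime import datetime, timezone

def build_deletion_receipt(deletion_token: str, subject_id: str,
                           soft_completed_at: datetime, hard_completed_at: datetime) -> dict:
    """User-facing receipt plus minimal internal audit metadata; no personal fields are stored."""
    return {
        "deletion_token": deletion_token,
        "subject_hash": hashlib.sha256(subject_id.encode()).hexdigest(),
        "soft_ttl_completed_at": soft_completed_at.isoformat(),
        "hard_ttl_completed_at": hard_completed_at.isoformat(),
        "action_codes": ["SOFT_SUPPRESSED", "HARD_PURGED", "KEY_SHREDDED"],  # illustrative codes
    }
```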

Real World Example

Think of an Instagram-style platform. A user requests erasure. Within seconds, the profile disappears and content no longer shows up in search. That is soft TTL. A privacy orchestrator updates databases, evicts caches, and writes tombstones to the media index so feeds cannot fetch the content. The system sets purge_at to a short window to allow safety checks and fraud investigations. When the clock hits purge time, media objects in blob storage are deleted, the user's key in the key manager is destroyed, and a deletion receipt is issued. If the platform must restore from an old snapshot later, the destroyed key ensures that photos for that subject remain unreadable.

Common Pitfalls and Trade-offs

  • Backups and snapshots: Keeping personal data readable in immutable backups violates the spirit of erasure. Crypto shredding with subject-scoped keys, plus a replayable deletion_ledger, is the clean path.

  • Search and caches: Teams often delete the database row but forget to remove or tombstone the index entry and the cache key. Reads will still surface data for hours. Make deletion a platform event that invalidates every layer.

  • Restoration risk: Restoring a cluster without replaying the deletion_ledger can resurrect erased data. Bake the replay into the recovery runbook.

  • Eventual consistency gaps: In multi-region setups, rely on tombstones and read-time suppression so data stays hidden while you wait for the hard purge to reach every replica.

  • Over-retention in analytics: If a model training set keeps the subject, your product can still infer private traits. Track lineage from source to model and either purge and retrain or apply a correction method.

  • Clock issues: TTL logic tied to unsynchronized clocks causes surprise delays. Use a single time source and store both requested_at and effective_from to resolve ordering.

Interview Tip

Interviewers often ask how you would prove deletion while using immutable backups. Say you would use subject-scoped encryption with keys in a hardened key manager, perform soft suppression immediately, then destroy the keys on hard TTL, and keep a deletion_ledger for replay after any restore. They may also probe how you prevent data from reappearing in caches and indices, so describe the tombstone pattern and cache rehydration guards.

Key Takeaways

  • Soft TTL makes data invisible fast.

  • Hard TTL finishes the purge and proves it across every store.

  • Crypto shredding plus a deletion_ledger closes the backup gap.

  • Idempotent fan-out jobs and deletion events keep the system robust.

  • Read-time suppression and tombstones block stale reads during propagation.

Table of Comparison

Approach | Primary Goal | User-Visible Effect | Reversibility | Backup Safety | Common Use
Soft TTL | Immediate suppression | Data disappears from reads now | Possible before purge | Needs crypto keys intact | Fast compliance and quarantine
Hard TTL | Permanent purge | No copy remains in live systems | Not reversible | Unsafe unless combined with crypto shredding | Final erasure and audit closure
Crypto Shredding | Defang immutable backups | Restored data becomes unreadable | Not reversible | Strong for snapshots | Backups and low-touch archives

FAQs

Q1 What is the difference between soft TTL and hard TTL for GDPR deletion?

Soft TTL hides data immediately and schedules a purge. Hard TTL performs the irreversible purge across primary stores, indices, caches, and derived data.

Q2 How fast should soft TTL apply to be GDPR friendly?

Aim for seconds. The moment a request is accepted, reads should return no personal data. Any lagging store must be guarded by read-time suppression and tombstones.

Q3 How can I handle immutable backups during deletion?

Encrypt personal data with subject-scoped keys and destroy those keys on hard TTL. Keep a deletion_ledger so any restore can replay erasures for live systems.

Q4 Do I need to retrain models after deletion?

If the subject meaningfully influenced a model or a dataset, remove their contribution and retrain or apply a correction technique. Document the lineage and action.

Q5 How do I prevent rehydration of erased data from caches or search?

Publish a deletion event and block cache fills for records with is_erased true. For search write tombstones and force a targeted reindex of affected shards.

Q6 What metrics prove that deletion is working at scale?

Track percent_of_reads_suppressed, time_to_soft_ttl, time_to_hard_ttl, error_rate_by_subsystem, and count_of_items_purged. Store minimal audit metadata for evidence.
