How do you enforce GDPR deletes in denormalized caches?

When a user requests GDPR deletion, the hardest part is often cleaning the denormalized caches that hold that user's data in multiple derived forms. This guide shows how to enforce GDPR deletes systematically across such caches in scalable distributed systems, a topic that comes up often in advanced system design interviews.

Introduction

Denormalized caches store precomputed or derived data to improve performance. Examples include materialized feed entries, profile summaries, and search indexes. Under GDPR's right to erasure, when a user requests deletion, their personal data must be erased not only from primary storage but also from these derived caches. Failing to do so is a compliance violation and can re-expose data the user asked to have removed.

Why It Matters

GDPR compliance is not optional. Cached or derived data can re-expose deleted information through user feeds, search results, or CDN snapshots. For engineers, this topic demonstrates awareness of privacy-driven architecture, eventual consistency challenges, and cache invalidation.

How It Works (Step by Step)

1. Create a Privacy Subject Record: Maintain a single identifier for the user (subject ID) across systems. Give each deletion request an idempotent request ID so that progress can be tracked per cache and retries are safe to repeat.
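A minimal sketch of such a record, assuming the request ID is derived deterministically from the subject ID plus an external ticket reference. The `DeletionRequest` class, the `ticket` field, and the namespace UUID are hypothetical names chosen for illustration.

```python
import uuid
from dataclasses import dataclass, field

# Hypothetical namespace for deriving request IDs; any fixed UUID would do.
ERASURE_NAMESPACE = uuid.UUID("12345678-1234-5678-1234-567812345678")


@dataclass
class DeletionRequest:
    subject_id: str                              # one identifier for the user across systems
    ticket: str                                  # external reference for the request (assumed)
    pending: set = field(default_factory=set)    # stores that still need purging
    done: set = field(default_factory=set)       # stores already purged

    @property
    def request_id(self) -> str:
        # Deterministic: a retry of the same request maps to the same ID.
        return str(uuid.uuid5(ERASURE_NAMESPACE, f"{self.subject_id}:{self.ticket}"))

    def mark_done(self, store: str) -> None:
        # Safe to call more than once for the same store.
        self.pending.discard(store)
        self.done.add(store)

    @property
    def complete(self) -> bool:
        return not self.pending


req = DeletionRequest("user:1234", "ticket-1", pending={"feed", "search", "cdn"})
req.mark_done("feed")
req.mark_done("feed")  # a repeated acknowledgement changes nothing
```

Deriving the ID from stable inputs means a retried or duplicated request lands on the same progress record instead of starting a second purge.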

2. Emit a Deletion Event: Trigger an asynchronous “user delete” event in your data pipeline. All systems that store denormalized copies must subscribe to this event.
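The fan-out can be sketched with a small in-process bus standing in for a real message broker. The `EventBus` class, the `user.delete` topic name, and the event fields are assumptions for illustration, and this version delivers synchronously, whereas a production pipeline would deliver asynchronously with retries.

```python
import json
from collections import defaultdict


class EventBus:
    """In-process stand-in for a broker topic (hypothetical, for illustration)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        payload = json.dumps(event)  # serialize as a broker would
        for handler in self._subscribers[topic]:
            handler(json.loads(payload))  # each consumer gets its own copy


received = []
bus = EventBus()

# Every system holding denormalized copies registers a handler.
for store in ("feed", "search", "profile_cache"):
    bus.subscribe("user.delete", lambda e, s=store: received.append((s, e["subject_id"])))

bus.publish("user.delete", {"request_id": "req-1", "subject_id": "user:1234"})
```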

3. Maintain a Reverse Index of Cache Keys: During writes, tag cached data with the user’s subject ID. This allows targeted purging later. Example: user:1234 → [feed:567, profile:1234, search:query_1234].
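One way to keep that index is to update it in the same code path as the cache write. The sketch below uses plain dictionaries in place of a real cache, and `TaggedCache` is a hypothetical name.

```python
from collections import defaultdict


class TaggedCache:
    """Cache wrapper that records which subjects each key contains (illustrative)."""

    def __init__(self):
        self.data = {}
        self.keys_by_subject = defaultdict(set)  # the reverse index

    def put(self, key, value, subject_ids):
        self.data[key] = value
        for sid in subject_ids:
            self.keys_by_subject[sid].add(key)

    def keys_for(self, subject_id):
        # .get avoids creating empty index entries for unknown subjects.
        return sorted(self.keys_by_subject.get(subject_id, ()))


cache = TaggedCache()
cache.put("feed:567", ["post by 1234", "post by 42"], ["user:1234", "user:42"])
cache.put("profile:1234", {"name": "example"}, ["user:1234"])
cache.put("search:query_1234", ["hit"], ["user:1234"])
```

Here `cache.keys_for("user:1234")` returns the three keys from the example above, while `user:42` maps only to `feed:567`.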

4. Apply Immediate Serving Guardrails: Add the user ID to a denylist at the API or CDN layer to block further exposure while caches are being cleaned.
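A read-path guardrail might look like the following sketch. It assumes every cached item carries an `owner` field, which is an illustrative convention, and the `Denylist` class and both serve functions are hypothetical.

```python
class Denylist:
    """Set of subject IDs whose data must not be served (illustrative)."""

    def __init__(self):
        self._blocked = set()

    def block(self, subject_id):
        self._blocked.add(subject_id)

    def is_blocked(self, subject_id):
        return subject_id in self._blocked


def serve_profile(subject_id, profile_cache, denylist):
    if denylist.is_blocked(subject_id):
        return None  # behave as "not found" even if the cache still holds the entry
    return profile_cache.get(subject_id)


def serve_feed(items, denylist):
    # Filter out items owned by a blocked subject before responding.
    return [item for item in items if not denylist.is_blocked(item["owner"])]


denylist = Denylist()
profile_cache = {"user:1234": {"name": "example"}}
feed_items = [{"owner": "user:1234", "text": "a"}, {"owner": "user:42", "text": "b"}]

denylist.block("user:1234")
profile = serve_profile("user:1234", profile_cache, denylist)
feed = serve_feed(feed_items, denylist)
```

The stale entry is still in `profile_cache` at this point; the denylist only hides it until the purge catches up.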

5. Execute Targeted Cache Purges: Use the reverse index to delete related cache entries. This includes:

In-memory stores: Delete by key or tag.
Search indexes: Delete or reindex documents.
Feed stores: Remove or recompute affected shards.
CDNs: Purge by tag or URL pattern.
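The purge in step 5 can be sketched as a loop over the reverse index. Dictionaries stand in for the in-memory, search, and feed stores here, and the convention that a key's prefix names its store (`feed:567` lives in the `feed` store) is an assumption. A CDN purge would go through the provider's own purge API instead and is not shown.

```python
def purge_subject(subject_id, reverse_index, stores):
    """Delete every key indexed for subject_id; return the removed keys per store."""
    removed = {}
    for key in sorted(reverse_index.get(subject_id, ())):
        store_name = key.split(":", 1)[0]  # "feed:567" -> "feed"
        store = stores.get(store_name)
        if store is not None and key in store:
            del store[key]
            removed.setdefault(store_name, []).append(key)
    # Drop the index entry only after the deletes ran, so a crash midway can be retried.
    reverse_index.pop(subject_id, None)
    return removed


reverse_index = {"user:1234": {"feed:567", "profile:1234", "search:query_1234"}}
stores = {
    "feed": {"feed:567": ["post"], "feed:999": ["other post"]},
    "profile": {"profile:1234": {"name": "example"}},
    "search": {"search:query_1234": ["hit"]},
}

first = purge_subject("user:1234", reverse_index, stores)
second = purge_subject("user:1234", reverse_index, stores)  # retry is a no-op
```

Because a second call finds nothing left to delete, the purge can be retried safely after a partial failure.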

6. Prevent Resurrection: Introduce tombstones, which are markers in storage that prevent deleted data from being rehydrated by background jobs or replication streams.
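As a rough illustration, a backfill job can consult a tombstone registry before writing anything back. `TombstoneRegistry`, `backfill`, and the row shape are hypothetical. The registry here keeps only the subject ID and a timestamp, on the assumption that a tombstone should not itself hold personal data.

```python
import time


class TombstoneRegistry:
    """Records which subjects were erased, without storing their data (illustrative)."""

    def __init__(self):
        self._deleted_at = {}

    def add(self, subject_id, now=None):
        # setdefault keeps the original deletion time if the tombstone is re-added.
        self._deleted_at.setdefault(subject_id, time.time() if now is None else now)

    def is_deleted(self, subject_id):
        return subject_id in self._deleted_at


def backfill(cache, source_rows, tombstones):
    """Rebuild cache entries, skipping rows that belong to an erased subject."""
    skipped = 0
    for row in source_rows:
        if tombstones.is_deleted(row["subject_id"]):
            skipped += 1
            continue
        cache[row["key"]] = row["value"]
    return skipped


tombstones = TombstoneRegistry()
tombstones.add("user:1234")

# A stale replica or snapshot still contains the erased user's row.
stale_rows = [
    {"subject_id": "user:1234", "key": "profile:1234", "value": {"name": "example"}},
    {"subject_id": "user:42", "key": "profile:42", "value": {"name": "other"}},
]
rebuilt = {}
skipped = backfill(rebuilt, stale_rows, tombstones)
```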

7. Verify and Audit: Log each purge operation and verify deletion through automated scans or sample queries. Generate a compliance audit record.
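A verification pass could probe each store for the purged keys and emit a record like the one below. The function name and the record fields are assumptions; the format a compliance team actually needs will differ.

```python
from datetime import datetime, timezone


def verify_erasure(request_id, subject_id, stores, keys_to_probe):
    """Probe each store for the purged keys and return an audit record."""
    leftovers = {}
    for name, store in stores.items():
        found = [key for key in keys_to_probe if key in store]
        if found:
            leftovers[name] = found
    return {
        "request_id": request_id,
        "subject_id": subject_id,
        "checked_at": datetime.now(timezone.utc).isoformat(),
        "stores_checked": sorted(stores),
        "leftovers": leftovers,       # keys that survived the purge
        "verified": not leftovers,    # True only when nothing was found
    }


stores = {
    "feed": {"feed:999": ["other post"]},
    "replica": {"profile:1234": {"name": "example"}},  # a replica the purge missed
}
record = verify_erasure("req-1", "user:1234", stores, ["feed:567", "profile:1234"])
```

In this run the record reports `verified` as false and lists `profile:1234` under `replica`, which is the signal to re-run the purge against that store.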

Real-World Example

Consider a photo-sharing service like Instagram. When a user deletes their account, the system must remove data from profile caches, follow graphs, story thumbnails, and search autocomplete. The primary service publishes a “GDPR_DELETE” event containing the user’s ID. Downstream systems (feed, cache, search) consume this event, delete all references using their reverse index, and insert a tombstone to prevent regeneration. Within minutes, the data becomes unservable, and verification jobs confirm removal across regions.

Common Pitfalls or Trade-offs

1. Missing Reverse Index: Without key tagging, deletion becomes a time-consuming full scan.

2. Resurrection Through Backfills: Data may reappear through asynchronous rebuilds or sync jobs unless tombstones are enforced.

3. Soft Delete Without Guardrails: If you only mark data as deleted, caches might still serve it until expiration.

4. Over-deletion: Deleting too broadly (e.g., shared cache entries) may degrade performance or remove unrelated data.

5. Incomplete Verification: Without periodic scans, ghost data may persist in secondary replicas or metrics stores.
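For the over-deletion pitfall, one option is to rewrite a shared entry rather than drop it. The sketch below assumes a shared feed page stored as a dictionary whose `items` each carry an `owner` field; both the shape and the function name are illustrative.

```python
def redact_shared_entry(entry, subject_id):
    """Return a copy of a shared entry without the items owned by subject_id."""
    kept = [item for item in entry["items"] if item["owner"] != subject_id]
    return {**entry, "items": kept}


shared_page = {
    "page": 1,
    "items": [
        {"owner": "user:1234", "text": "to be erased"},
        {"owner": "user:42", "text": "unrelated"},
    ],
}
redacted = redact_shared_entry(shared_page, "user:1234")
```

The other user's item stays cached, so the purge removes only what the request covers and the entry does not need to be rebuilt from scratch.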

Interview Tip

Interviewers often ask, “How would you ensure GDPR deletes propagate across caches?” A strong answer: “I’d design a reverse index mapping user IDs to cache keys, emit deletion events to downstream consumers, enforce an immediate denylist on the read path, and maintain tombstones to prevent regeneration.”

Key Takeaways

Maintain a reverse index to track cache dependencies.
Use deletion events to notify all denormalized stores.
Add denylist filters to block reads during propagation.
Apply tombstones to prevent accidental resurrection.
Audit and verify deletion for compliance.

Comparison Table

Approach	Speed	Accuracy	Complexity	Resurrection Risk
TTL-based expiry	Slow	Low	Low	High
Targeted purge via reverse index	Fast	High	Medium	Low
Full cache flush	Fast	Medium	High	Medium
Read-path denylist	Instant	High	Medium	None
Tombstone registry	Moderate	High	Medium	Very Low

FAQs

Q1. What is a reverse index in this context?

A reverse index maps user IDs to all cache keys or objects that reference them. It enables precise, fast deletion without scanning entire caches.

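Q2. How quickly must cached data be deleted?

Ideally, online caches should be cleared within minutes, while long-term analytics stores can be updated within the one-month window that GDPR sets for responding to an erasure request (extendable in some cases).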

Q3. How do I ensure deleted data doesn’t reappear?

Use tombstones and ensure that background jobs or replication systems check for them before rehydrating data.

Q4. Do CDNs and search indexes also need to be purged?

Yes. Any layer that stores or serves personal data, including CDNs, search clusters, and metrics, must participate in the deletion workflow.

Q5. What is the role of an edge denylist?

It provides instant protection by blocking any response containing deleted user data during the purge process.

Q6. How can I verify deletion?

Run automated probes to confirm that cache reads for the deleted user return no data. Keep logs for compliance audits.

Further Learning

Learn more about building privacy-compliant, scalable architectures in Grokking Scalable Systems for Interviews.
Strengthen your fundamentals of caching and data consistency with Grokking the System Design Interview.