How do you enforce GDPR deletes in denormalized caches?

When a user requests GDPR deletion, the hardest part is usually cleaning the denormalized caches that hold that user’s data in multiple derived forms. This guide shows how to enforce GDPR deletes systematically across such caches in scalable distributed systems, a topic that comes up often in advanced system design interviews.

Introduction

Denormalized caches store precomputed or derived data to improve performance. Examples include materialized feed entries, profile summaries, and search indexes. Under GDPR, when a user requests data deletion, that user’s personal data must be erased not only from primary storage but also from these derived caches. Failing to do so is a compliance violation and can re-expose data the user asked to have removed.

Why It Matters

GDPR compliance is not optional. Cached or derived data can re-expose deleted information through user feeds, search results, or CDN snapshots. For engineers, handling this well demonstrates awareness of privacy-driven architecture, eventual consistency challenges, and cache invalidation.

How It Works (Step by Step)

1. Create a Privacy Subject Record: Maintain a single identifier for the user (subject ID) across systems. Give each deletion request an idempotent request ID so that progress can be tracked across caches and every operation can be retried safely.
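As a minimal sketch of step 1, the snippet below tracks one deletion request per subject and makes repeated submissions return the same record. All names (`DeletionRegistry`, `make_request_id`, the `del-` prefix) are hypothetical, the registry is an in-memory stand-in for a durable store, and deriving the request ID from a hash is only one possible way to get idempotency.

```python
import hashlib
from dataclasses import dataclass, field


def make_request_id(subject_id: str, requested_at: str) -> str:
    """Derive a deterministic ID so a retried request maps to the same record."""
    digest = hashlib.sha256(f"{subject_id}:{requested_at}".encode()).hexdigest()
    return f"del-{digest[:16]}"


@dataclass
class DeletionRequest:
    subject_id: str
    request_id: str
    # Per-system progress, e.g. {"feed": "done", "search": "pending"}.
    status: dict = field(default_factory=dict)


class DeletionRegistry:
    """In-memory stand-in for a durable store of deletion requests (assumption)."""

    def __init__(self):
        self._requests = {}

    def open_request(self, subject_id, requested_at, systems):
        request_id = make_request_id(subject_id, requested_at)
        # Idempotent: a repeated call returns the existing record unchanged.
        if request_id not in self._requests:
            self._requests[request_id] = DeletionRequest(
                subject_id, request_id, {name: "pending" for name in systems}
            )
        return self._requests[request_id]

    def mark_done(self, request_id, system):
        self._requests[request_id].status[system] = "done"

    def is_complete(self, request_id):
        statuses = self._requests[request_id].status.values()
        return all(state == "done" for state in statuses)
```

Because the ID is derived from the request itself, a client or worker that retries after a timeout lands on the same record instead of opening a second, competing deletion.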

2. Emit a Deletion Event: Publish an asynchronous “user delete” event to your data pipeline. Every system that stores denormalized copies must subscribe to this event.
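The fan-out in step 2 might look roughly like the sketch below. `EventBus` is a hypothetical in-process stand-in for a real message broker, and it calls handlers synchronously only to keep the example runnable; the topic name, event fields, and `CacheConsumer` class are likewise assumptions made for illustration.

```python
import json
from collections import defaultdict


class EventBus:
    """In-process stand-in for a message broker (hypothetical)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        # Round-trip through JSON to mimic crossing a process boundary.
        payload = json.dumps(event)
        for handler in self._subscribers[topic]:
            handler(json.loads(payload))


def make_delete_event(subject_id, request_id):
    return {"type": "user_delete", "subject_id": subject_id, "request_id": request_id}


class CacheConsumer:
    """A downstream system that holds denormalized copies."""

    def __init__(self, name):
        self.name = name
        self.seen_requests = set()
        self.purged_subjects = []

    def handle(self, event):
        # Brokers commonly redeliver, so skip request IDs already processed.
        if event["request_id"] in self.seen_requests:
            return
        self.seen_requests.add(event["request_id"])
        # A real consumer would purge its own store here; this one only records.
        self.purged_subjects.append(event["subject_id"])
```

Keying deduplication on the request ID is what lets the producer publish with at-least-once semantics without consumers purging twice.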

3. Maintain a Reverse Index of Cache Keys: During writes, tag cached data with the user’s subject ID so it can be purged precisely later. Example: user:1234 → [feed:567, profile:1234, search:query_1234].
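One way to picture the reverse index from step 3 is a cache wrapper that records, at write time, which subjects each key depends on. The sketch below is dict-backed and the `TaggedCache` name is hypothetical; in a shared cache the same idea would need a per-subject set stored alongside the cached data.

```python
from collections import defaultdict


class TaggedCache:
    """Dict-backed cache that records which subjects each key depends on."""

    def __init__(self):
        self._data = {}
        self._keys_by_subject = defaultdict(set)

    def put(self, key, value, subject_ids):
        self._data[key] = value
        # Tag the key under every subject whose data it contains.
        for subject_id in subject_ids:
            self._keys_by_subject[subject_id].add(key)

    def get(self, key):
        return self._data.get(key)

    def keys_for(self, subject_id):
        # Return a copy so callers cannot mutate the index by accident.
        return set(self._keys_by_subject.get(subject_id, ()))
```

Note that a shared entry such as a feed page is tagged under every contributing subject, which is what makes the later purge precise rather than a full scan.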

4. Apply Immediate Serving Guardrails: Add the user ID to a denylist at the API or CDN layer to block further exposure while caches are being cleaned.
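A read-path guardrail for step 4 can be sketched as a check that runs before any cache lookup. The `Denylist` class, the `serve_profile` and `filter_feed` helpers, the `author_id` field, and the choice of a 404 response are all assumptions for illustration; the cache is any object with a dict-style `get`.

```python
class Denylist:
    """Set of subject IDs whose data must not be served (in-memory stand-in)."""

    def __init__(self):
        self._blocked = set()

    def block(self, subject_id):
        self._blocked.add(subject_id)

    def is_blocked(self, subject_id):
        return subject_id in self._blocked


def serve_profile(subject_id, cache, denylist):
    """Return (status, body); blocked subjects are never read from the cache."""
    if denylist.is_blocked(subject_id):
        return 404, None
    body = cache.get(f"profile:{subject_id}")
    if body is None:
        return 404, None
    return 200, body


def filter_feed(items, denylist):
    """Drop items authored by blocked subjects from a cached feed page."""
    return [item for item in items if not denylist.is_blocked(item["author_id"])]
```

The filter on shared responses matters as much as the direct lookup: a stale feed page belonging to another user can still carry the deleted user's items until the purge reaches it.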

5. Execute Targeted Cache Purges: Use the reverse index to delete related cache entries. This includes:

  • In-memory stores: Delete by key or tag.
  • Search indexes: Delete or reindex documents.
  • Feed stores: Remove or recompute affected shards.
  • CDNs: Purge by tag or URL pattern.
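The purge in step 5 can be sketched as a loop over the reverse index that dispatches each key to the store that owns it. Here `KeyValueStore` is a hypothetical stand-in for any backend that can delete by key, and the convention that the prefix before the first ":" names the store is an assumption borrowed from the example keys in step 3.

```python
class KeyValueStore:
    """Stand-in for any store that can delete by key (cache, index, CDN)."""

    def __init__(self, entries):
        self.entries = dict(entries)

    def delete(self, key):
        # Returns True if something was removed; a missing key is a no-op.
        return self.entries.pop(key, None) is not None


def purge_subject(subject_id, reverse_index, stores):
    """Delete every key the reverse index lists for the subject.

    reverse_index maps subject_id -> iterable of "store:key" strings.
    Returns a per-key report that can feed the audit step.
    """
    report = {}
    for full_key in sorted(reverse_index.get(subject_id, ())):
        store_name, _, key = full_key.partition(":")
        store = stores.get(store_name)
        if store is None:
            report[full_key] = "unknown_store"
        elif store.delete(key):
            report[full_key] = "deleted"
        else:
            report[full_key] = "absent"
    return report
```

Running the purge a second time reports every key as absent rather than failing, so a worker can safely retry after a partial failure.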

6. Prevent Resurrection: Introduce tombstones, which are markers in storage that prevent deleted data from being rehydrated by background jobs or replication streams.
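To illustrate step 6, the sketch below shows a backfill job that consults a tombstone registry before writing anything back into a cache. `TombstoneRegistry`, `backfill`, and the row shape are hypothetical; a production registry would need to be durable and visible to every job that can rehydrate data.

```python
class TombstoneRegistry:
    """Records subjects whose data must never be rehydrated (in-memory stand-in)."""

    def __init__(self):
        self._deleted = set()

    def add(self, subject_id):
        self._deleted.add(subject_id)

    def __contains__(self, subject_id):
        return subject_id in self._deleted


def backfill(cache, source_rows, tombstones):
    """Rehydrate cache entries from source rows, skipping tombstoned subjects.

    Returns the number of rows skipped so the job can report it.
    """
    skipped = 0
    for row in source_rows:
        if row["subject_id"] in tombstones:
            skipped += 1
            continue
        cache[row["key"]] = row["value"]
    return skipped
```

The check sits in the write path of the rebuild job itself, so a stale snapshot or a lagging replica cannot quietly restore what the purge removed.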

7. Verify and Audit: Log each purge operation and verify deletion through automated scans or sample queries. Generate a compliance audit record.
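Step 7 can be approximated with a probe that re-reads every key the purge was supposed to remove, followed by a structured audit record. The function names and record fields below are assumptions, not a prescribed audit format, and whether the audit log may hold the raw subject ID or only a hashed form is a policy question outside this sketch.

```python
import json
from datetime import datetime, timezone


def verify_deletion(expected_keys, read_fns):
    """Probe each (store, key) pair and return the ones that still hold data.

    read_fns maps store name -> callable(key) returning the value or None.
    """
    leftovers = []
    for store_name, key in expected_keys:
        if read_fns[store_name](key) is not None:
            leftovers.append((store_name, key))
    return leftovers


def audit_record(request_id, subject_id, leftovers, now=None):
    """Serialize the verification outcome as one JSON log line."""
    now = now or datetime.now(timezone.utc)
    return json.dumps(
        {
            "request_id": request_id,
            "subject_id": subject_id,  # Assumption: may need hashing in practice.
            "verified_at": now.isoformat(),
            "status": "clean" if not leftovers else "incomplete",
            "leftover_keys": [f"{store}:{key}" for store, key in leftovers],
        },
        sort_keys=True,
    )
```

An "incomplete" record is as useful as a "clean" one: it names the exact keys that need another purge attempt.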

Real-World Example

Consider a photo-sharing service such as Instagram. When a user deletes their account, the system must remove data from profile caches, follow graphs, story thumbnails, and search autocomplete. In a design like the one described here, the primary service publishes a “GDPR_DELETE” event containing the user’s ID. Downstream systems (feed, cache, search) consume this event, delete all references using their reverse index, and insert a tombstone to prevent regeneration. Within minutes, the data becomes unservable, and verification jobs confirm removal across regions.

Common Pitfalls or Trade-offs

1. Missing Reverse Index: Without key tagging, deletion becomes a time-consuming full scan.

2. Resurrection Through Backfills: Data may reappear through asynchronous rebuilds or sync jobs unless tombstones are enforced.

3. Soft Delete Without Guardrails: If you only mark data as deleted, caches might still serve it until expiration.

4. Over-deletion: Deleting too broadly (e.g., shared cache entries) may degrade performance or remove unrelated data.

5. Incomplete Verification: Without periodic scans, ghost data may persist in secondary replicas or metrics stores.
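For the over-deletion pitfall in particular, one option is to scrub a shared entry instead of dropping it. The sketch below assumes a hypothetical entry shape (a dict with an `items` list whose elements carry an `author_id`) and a dict-backed cache; it deletes an entry only when every item in it belongs to the deleted subject.

```python
def scrub_shared_entry(entry, subject_id):
    """Return a copy of a shared entry without the subject's items."""
    kept = [item for item in entry["items"] if item["author_id"] != subject_id]
    return {**entry, "items": kept}


def purge_or_scrub(cache, key, subject_id):
    """Remove the subject's data from one cache entry with minimal collateral."""
    entry = cache.get(key)
    if entry is None:
        return "absent"
    authors = {item["author_id"] for item in entry["items"]}
    if subject_id not in authors:
        return "untouched"
    if authors == {subject_id}:
        # The entry held only this subject's data, so drop it entirely.
        del cache[key]
        return "deleted"
    # Other users' items remain valid, so rewrite the entry without the subject.
    cache[key] = scrub_shared_entry(entry, subject_id)
    return "scrubbed"
```

Scrubbing keeps hot shared entries warm for everyone else, at the cost of a read-modify-write that a real cache would need to make atomic.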

Interview Tip

Interviewers often ask, “How would you ensure GDPR deletes propagate across caches?” A strong answer: “I’d design a reverse index mapping user IDs to cache keys, emit deletion events to downstream consumers, enforce an immediate denylist on the read path, and maintain tombstones to prevent regeneration.”

Key Takeaways

  • Maintain a reverse index to track cache dependencies.

  • Use deletion events to notify all denormalized stores.

  • Add denylist filters to block reads during propagation.

  • Apply tombstones to prevent accidental resurrection.

  • Audit and verify deletion for compliance.

Table of Comparison

| Approach | Speed | Accuracy | Complexity | Resurrection Risk |
|---|---|---|---|---|
| TTL-based expiry | Slow | Low | Low | High |
| Targeted purge via reverse index | Fast | High | Medium | Low |
| Full cache flush | Fast | Medium | High | Medium |
| Read-path denylist | Instant | High | Medium | None |
| Tombstone registry | Moderate | High | Medium | Very Low |

FAQs

Q1. What is a reverse index for GDPR deletes?

A reverse index maps user IDs to all cache keys or objects that reference them. It enables precise, fast deletion without scanning entire caches.

Q2. How fast should GDPR deletes propagate?

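Ideally, online caches should be cleared within minutes. Long-term analytics stores can be cleaned on a slower schedule, provided the work finishes within the regulatory deadline: GDPR requires erasure requests to be handled without undue delay and at the latest within one month, with limited extensions.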

Q3. How do I ensure deleted data doesn’t reappear?

Use tombstones and ensure that background jobs or replication systems check for them before rehydrating data.

Q4. Are CDNs and search indexes part of GDPR scope?

Yes. Any layer that stores or serves personal data, including CDNs, search clusters, and metrics stores, must participate in the deletion workflow.

Q5. What is the role of an edge denylist?

It provides instant protection by blocking any response containing deleted user data during the purge process.

Q6. How can I verify deletion?

Run automated probes to confirm that cache reads for the deleted user return no data. Keep logs for compliance audits.

CONTRIBUTOR
Design Gurus Team
Copyright © 2026 Design Gurus, LLC. All rights reserved.