Per-tenant encryption at rest with BYOK (Bring Your Own Key) lets every customer own and control their own encryption key while your system handles secure data storage and retrieval. This design improves trust and compliance, and it strengthens isolation between tenants in a multi-tenant SaaS platform.
Introduction
Bring Your Own Key allows each customer to supply or control their encryption key through their own cloud KMS or HSM. Your application encrypts data using tenant-specific data keys, which are themselves wrapped with the customer’s key. This model enforces strong boundaries between tenants and lets each tenant revoke or rotate their key whenever needed.
Why It Matters
BYOK enhances customer trust, supports enterprise compliance, and improves security posture by delegating key ownership to tenants. It’s frequently discussed in system design interviews because it demonstrates understanding of key hierarchies, data control, and how to balance performance with security in distributed systems.
How It Works (Step-by-Step)
1. Key Hierarchy
- The tenant manages a root key in their KMS or HSM.
- Your system generates a unique data key for encrypting the tenant’s data.
- The data key is wrapped (encrypted) using the tenant’s root key and stored securely with metadata.
2. Tenant Onboarding
- Tenant registers their key reference (KMS ARN or URI).
- Ownership verification occurs via a challenge decrypt or signed attestation.
- Only key references are stored—never plaintext keys.
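The challenge-decrypt step in onboarding can be sketched as follows. This is a minimal illustration, not a definitive implementation: `InMemoryKms`, `verify_key_access`, `onboard_tenant`, and the `kms://...` key references are all hypothetical names, and the KMS class is a test double that mimics only the wrap/unwrap contract and access grants, with no real cryptography.

```python
import secrets


class InMemoryKms:
    """Hypothetical test double for a tenant KMS. NOT real cryptography:
    it only mimics the encrypt/decrypt contract and per-key access grants."""

    def __init__(self):
        self._granted = set()  # key references the provider may use
        self._blobs = {}       # opaque blob -> (key_ref, plaintext)

    def grant(self, key_ref):
        self._granted.add(key_ref)

    def encrypt(self, key_ref, plaintext):
        if key_ref not in self._granted:
            raise PermissionError(f"no access to {key_ref}")
        blob = secrets.token_bytes(16)
        self._blobs[blob] = (key_ref, plaintext)
        return blob

    def decrypt(self, key_ref, blob):
        if key_ref not in self._granted:
            raise PermissionError(f"no access to {key_ref}")
        stored_ref, plaintext = self._blobs[blob]
        if stored_ref != key_ref:
            raise ValueError("blob was not produced under this key")
        return plaintext


def verify_key_access(kms, key_ref):
    """Challenge decrypt: round-trip a random value through the tenant key."""
    challenge = secrets.token_bytes(32)
    try:
        blob = kms.encrypt(key_ref, challenge)
        return secrets.compare_digest(kms.decrypt(key_ref, blob), challenge)
    except (PermissionError, ValueError, KeyError):
        return False


def onboard_tenant(registry, kms, tenant_id, key_ref):
    """Store only the key reference, and only after the challenge passes."""
    if not verify_key_access(kms, key_ref):
        raise PermissionError("key access could not be verified")
    registry[tenant_id] = key_ref
```

The registry ends up holding a tenant ID mapped to a key reference and nothing else, which matches the rule that plaintext keys are never stored.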
3. Write Path
- Generate or retrieve a data key for the tenant.
- Encrypt data using AES-256-GCM (object level) or AES-256-XTS (disk level).
- Wrap the data key using the tenant’s root key and store the wrapped key alongside the encrypted data.
- Record audit logs for traceability.
4. Read Path
- Retrieve the encrypted data and its metadata.
- Check the in-memory cache for an unwrapped data key (short TTL).
- If it is missing, call the tenant’s KMS to unwrap the data key.
- Decrypt and verify data before returning it.
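The cache lookup and KMS fallback in the read path might look like the sketch below. It covers only key resolution, not the payload decryption itself. `DataKeyCache`, `resolve_data_key`, the record field names, and the injected `unwrap` callable (standing in for a network call to the tenant's KMS) are assumptions made for illustration.

```python
import time


class DataKeyCache:
    """In-memory cache of unwrapped data keys with a short TTL."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._entries = {}   # (tenant_id, wrapped_key) -> (data_key, expires_at)

    def get(self, tenant_id, wrapped_key):
        entry = self._entries.get((tenant_id, wrapped_key))
        if entry is None:
            return None
        data_key, expires_at = entry
        if self._clock() >= expires_at:
            # Expired entries are dropped so the next read goes back to KMS.
            del self._entries[(tenant_id, wrapped_key)]
            return None
        return data_key

    def put(self, tenant_id, wrapped_key, data_key):
        expires_at = self._clock() + self._ttl
        self._entries[(tenant_id, wrapped_key)] = (data_key, expires_at)


def resolve_data_key(cache, unwrap, tenant_id, record):
    """Return the plaintext data key for a record, calling KMS only on a miss."""
    wrapped = record["wrapped_key"]
    data_key = cache.get(tenant_id, wrapped)
    if data_key is None:
        data_key = unwrap(record["key_ref"], wrapped)
        cache.put(tenant_id, wrapped, data_key)
    return data_key
```

Keying the cache on the tenant ID as well as the wrapped key keeps one tenant's cached key from ever being returned for another tenant's request.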
5. Rotation and Rewrap
- Data key rotation: Generate a new data key for future writes.
- Root key rotation: Rewrap existing wrapped keys using the new root key.
- Use lazy rewrap—perform rewrap on read to avoid downtime.
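Lazy rewrap on read can be sketched as a single function. The record layout (`root_key_version`, `wrapped_key`) and the injected `unwrap`, `wrap`, and `save` callables are hypothetical; in a real system the first two would be KMS calls and the third a metadata store write.

```python
def unwrap_with_lazy_rewrap(record, current_version, unwrap, wrap, save):
    """Unwrap a record's data key. If it was wrapped under an older root key
    version, rewrap it under the current version and persist the metadata.
    The encrypted payload itself is never touched."""
    data_key = unwrap(record["root_key_version"], record["wrapped_key"])
    if record["root_key_version"] != current_version:
        record["wrapped_key"] = wrap(current_version, data_key)
        record["root_key_version"] = current_version
        save(record)
    return data_key
```

Only the small wrapped key changes, so records migrate to the new root key version as they are read. Records that are never read still need a background job before the old version can be retired.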
6. Revocation and Offboarding
- Tenant disables or deletes the root key.
- Your system must fail closed—denying reads and evicting caches immediately.
- Recovery only occurs if the tenant restores decrypt permissions.
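Fail-closed behavior is sketched below under simplifying assumptions: the cache is a plain dict keyed by `(tenant_id, wrapped_key)`, TTL handling is omitted, and the KMS signals a disabled or deleted key by raising `PermissionError`. The names `KeyRevokedError`, `purge_tenant`, and `get_key_or_fail_closed` are illustrative only.

```python
class KeyRevokedError(Exception):
    """Raised when the tenant's KMS refuses to unwrap a data key."""


def purge_tenant(cache, tenant_id):
    """Evict every cached data key that belongs to one tenant."""
    for cache_key in [k for k in cache if k[0] == tenant_id]:
        del cache[cache_key]


def get_key_or_fail_closed(cache, unwrap, tenant_id, key_ref, wrapped_key):
    """Return a data key, or refuse outright if the tenant revoked access."""
    cached = cache.get((tenant_id, wrapped_key))
    if cached is not None:
        return cached
    try:
        data_key = unwrap(key_ref, wrapped_key)
    except PermissionError as exc:
        # Fail closed: drop all cached keys for this tenant and deny the read.
        purge_tenant(cache, tenant_id)
        raise KeyRevokedError(f"tenant {tenant_id} revoked key access") from exc
    cache[(tenant_id, wrapped_key)] = data_key
    return data_key
```

A cache hit still succeeds until the entry is purged, so `purge_tenant` would also need to run on a revocation notification from the tenant, with short TTLs as the backstop.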
7. Performance Optimization
- Cache unwrapped data keys securely in memory.
- Use short TTLs and purge on key rotation events.
- Batch KMS requests and run rewrap jobs asynchronously to keep them off the request path.
8. Multi Region Handling
- Store encrypted data globally but decrypt only in the same region as the tenant key.
- Use local KMS calls for compliance and latency efficiency.
- Implement safe fallback for regional outages using cached keys with strict TTL.
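The same-region rule can be enforced with a small guard before any decrypt call. This sketch assumes the tenant registered an AWS-style KMS key ARN, whose fourth colon-separated field is the region. The function names are hypothetical, and other key stores would need their own parsing.

```python
def kms_arn_region(key_arn):
    """Extract the region from an AWS-style KMS key ARN
    (arn:partition:kms:region:account-id:key/key-id)."""
    parts = key_arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "kms":
        raise ValueError(f"not a KMS key ARN: {key_arn!r}")
    return parts[3]


def assert_decrypt_allowed(service_region, key_arn):
    """Refuse to decrypt outside the region that hosts the tenant key."""
    key_region = kms_arn_region(key_arn)
    if key_region != service_region:
        raise PermissionError(
            f"key lives in {key_region}; refusing to decrypt from {service_region}"
        )
```

Running the same check on the write path catches data that is about to land in a region where the tenant key cannot be used.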
9. Auditing and Access Control
- Tie encrypt/decrypt permissions to service roles scoped per tenant.
- Maintain immutable logs for every cryptographic action with key ID and tenant ID.
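One way to make audit records tamper-evident is to chain them by hash, as in the sketch below. Hash chaining is a technique chosen here for illustration, not something the steps above prescribe. It detects edits rather than preventing them, so it would normally sit on top of write-once storage. The function and field names are assumptions.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(body):
    """Hash an entry body in a stable, key-sorted JSON form."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_audit_event(log, tenant_id, key_id, action, timestamp):
    """Append an event whose hash also covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    body = {
        "tenant_id": tenant_id,
        "key_id": key_id,
        "action": action,
        "timestamp": timestamp,
        "prev_hash": prev_hash,
    }
    log.append({**body, "hash": _entry_hash(body)})


def verify_audit_log(log):
    """Return True only if no entry was altered, removed, or reordered."""
    prev_hash = GENESIS_HASH
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev_hash"] != prev_hash or entry["hash"] != _entry_hash(body):
            return False
        prev_hash = entry["hash"]
    return True
```

Because each hash covers the previous one, changing any earlier event invalidates every entry after it.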
Real World Example
Consider a hypothetical SaaS file storage service in the style of Dropbox Business. Each customer manages their own key through AWS KMS or Azure Key Vault. Files are encrypted using tenant data keys, and each file’s metadata includes the wrapped data key and tenant ID. When the tenant rotates their KMS key, new files use the updated key and existing wrapped data keys are rewrapped over time. When the tenant revokes the key, existing files become unreadable until access is restored. This model supports strict compliance requirements (GDPR, HIPAA) while giving the tenant direct control over access.
Common Pitfalls or Trade-offs
Too Many KMS Calls: Using a new data key per file increases latency and cost. Use per-tenant or per-shard data keys with caching and rotation policies.
Improper Tenant Binding: Not validating the tenant ID against the key reference can lead to cross-tenant data exposure. Always enforce binding at encryption and decryption.
Incomplete Encryption Coverage: Ensure backups, logs, and analytics pipelines are encrypted with tenant keys. Missing these leads to compliance violations.
Region Drift: Replicating data to regions where the tenant key is unavailable breaks decrypt operations. Validate regions during writes.
Fail-Open Caching: Long-TTL caches may still decrypt after revocation. Keep TTLs short and purge on policy changes.
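The tenant-binding pitfall above can be guarded against with an explicit check before every encrypt or decrypt. The sketch assumes a hypothetical `registry` mapping each tenant ID to its registered key reference, and records that carry `tenant_id` and `key_ref` fields.

```python
def check_tenant_binding(registry, caller_tenant_id, record):
    """Reject any record whose tenant or key reference does not match the caller."""
    if record["tenant_id"] != caller_tenant_id:
        raise PermissionError("record belongs to a different tenant")
    if registry.get(caller_tenant_id) != record["key_ref"]:
        raise PermissionError("key reference is not registered to this tenant")
```

Some KMS services can also bind such context cryptographically. AWS KMS, for example, accepts an encryption context on wrap that must be supplied again on unwrap, so the tenant ID could be enforced there as a second layer.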
Interview Tip
If asked “How would you let a tenant immediately revoke your access?”, explain that every decrypt operation depends on the tenant root key, so revoking decrypt permission locks the provider out as soon as cached data keys expire or are purged. Mention that you cache unwrapped keys with very short TTLs and log all decrypt attempts for audit visibility.
Key Takeaways
- BYOK enables true tenant control with strong compliance guarantees.
- Use envelope encryption and strict key hierarchy separation.
- Cache securely and minimally for performance.
- Always validate tenant ID to prevent cross tenant leaks.
- Plan for rotation, rewrap, and revocation workflows.
Table of Comparison
| Approach | Who Controls Root Keys | Latency | Revocation Power | Complexity | Ideal Use Case |
|---|---|---|---|---|---|
| Provider Default Encryption | Provider | Lowest | Provider Only | Low | Small tenants or basic compliance |
| BYOK per Tenant (Cloud KMS) | Tenant | Low to Medium | Tenant Controlled | Medium | Enterprise SaaS with isolation needs |
| Hold Your Own Key (External HSM) | Tenant (External HSM) | Medium to High | Tenant Absolute | High | Regulated industries (Finance, Healthcare) |
| Application Level Encryption | Shared or Tenant App Keys | Medium | Partial | High | Sensitive field protection |
| Per Object Data Keys | Provider | Low | Provider Only | Medium | Workloads needing frequent rotation |
FAQs
Q1. What is BYOK for encryption at rest?
It’s a model where each tenant manages their own encryption key in a KMS or HSM, while your service uses envelope encryption to wrap data keys under that tenant key.
Q2. How does BYOK differ from provider-managed encryption?
Provider-managed encryption relies on keys that the provider owns and controls, often shared across customers. BYOK gives each tenant their own key, allowing independent rotation, revocation, and compliance boundaries.
Q3. What happens when a tenant rotates their key?
New data keys are wrapped under the new key version, while existing wrapped data keys are rewrapped lazily or through batch rewrap jobs. The underlying data does not need to be decrypted and re-encrypted.
Q4. Can BYOK work across multiple regions?
Yes, as long as data decryption happens in the same region as the tenant’s key. Multi-region replication should copy only ciphertext and wrapped data keys, so replicas stay unreadable outside the key’s region.
Q5. How do you reduce latency with BYOK?
Cache unwrapped data keys securely in memory with short TTL, batch KMS requests, and avoid frequent rewraps.
Q6. How can you demonstrate compliance to auditors?
Use structured audit logs for every encrypt and decrypt event with tenant ID, key ID, and timestamp, ensuring immutability and verifiability.
Further Learning
Deepen your understanding of security, encryption, and scalability patterns in our Grokking System Design Fundamentals course. To explore real-world SaaS architectures and multi-tenant encryption designs, check out Grokking Scalable Systems for Interviews.