How would you implement tenant‑aware encryption (envelope encryption, BYOK)?

Tenant-aware encryption is a pattern where every customer of a multi-tenant platform gets its own cryptographic boundary. You encrypt data with a short-lived data key and then wrap that data key with a tenant-specific master key. This is called envelope encryption. If an enterprise supplies and controls its own master key, you get bring your own key (BYOK). The result is strong isolation, fast decryption at runtime, and clear control for audits and revocation.

Why It Matters

Interviewers love this topic because it combines security, storage, and reliability in one design. In production you need it for least privilege, a smaller breach blast radius, and compliance goals such as data residency and customer control. It lets a customer rotate or revoke access without forcing a full migration of your storage. It also enables premium plans where security-sensitive tenants bring their own keys while smaller tenants use platform keys managed by your KMS.

How It Works Step by Step

Actors and key hierarchy

Application services read and write tenant data.
A crypto service mediates every encrypt and decrypt.
A KMS or HSM holds master keys and enforces policy.
Hierarchy: a service root key protects a per-tenant key encryption key, which in turn wraps per-item data encryption keys.

Write path

Resolve tenant identity and authorization for the crypto call. Derive associated data that will be bound to the ciphertext, for example tenant id, table, and record id.
Generate a random 256-bit data encryption key. AES-256-GCM is a solid default. Never reuse a nonce with the same key.
Encrypt the payload with the data key and a unique 12-byte nonce. Include the associated data so tampering with metadata is detectable.
Wrap the data key using the tenant key in KMS. Persist only the wrapped key, the KMS key id, the nonce, and the ciphertext plus tag. Zero the plaintext data key from memory as soon as the write completes.
Emit an audit event that includes who called, which tenant key was used, and why.
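Two details of the write path are easy to get subtly wrong: building the associated data unambiguously and producing a nonce that can never repeat for a given data key. The sketch below illustrates one way to do both with the standard library only. The names `build_aad` and `NonceSequence` and the 4-byte prefix plus 8-byte counter layout are assumptions made for illustration, not something prescribed above, and the AES-GCM call itself is left to whichever crypto library you use.

```python
import os
import struct


def build_aad(tenant_id: str, table: str, record_id: str) -> bytes:
    """Encode the context fields as length-prefixed UTF-8.

    Length prefixes stop ("ab", "c") and ("a", "bc") from producing the
    same bytes, which plain concatenation would allow.
    """
    parts = []
    for field in (tenant_id, table, record_id):
        raw = field.encode("utf-8")
        # 2-byte big-endian length; struct raises if a field exceeds 65535 bytes.
        parts.append(struct.pack(">H", len(raw)) + raw)
    return b"".join(parts)


class NonceSequence:
    """12-byte nonces for ONE data key: 4 random bytes plus an 8-byte counter.

    Hypothetical helper. Create a new instance for every data key and never
    share one instance across keys or processes.
    """

    def __init__(self) -> None:
        self._prefix = os.urandom(4)
        self._counter = 0

    def next(self) -> bytes:
        if self._counter >= 2**64:
            raise OverflowError("nonce space exhausted; generate a new data key")
        nonce = self._prefix + struct.pack(">Q", self._counter)
        self._counter += 1
        return nonce
```

With one fresh data key per item, a single random 12-byte nonce is also reasonable. The counter form matters once a data key encrypts more than one payload.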

Read path

Fetch the record. It includes the wrapped data key, the nonce, the KMS key id, and the ciphertext.
Ask KMS to unwrap the data key. KMS checks policy, logs the access, and returns the data key.
Decrypt the payload with AES-GCM, verify the tag with the same associated data, and zero the data key from memory once it is no longer needed. Optionally cache the unwrapped data key for a very short time in an in-process cache whose memory is never swapped to disk.
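The read path can be sketched as a small function that takes the two primitives it depends on as parameters. Everything here is an assumption for illustration: the `Envelope` fields mirror the record described above, while `Unwrap` and `AeadDecrypt` are hypothetical callables standing in for your KMS client and your AES-GCM library.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Envelope:
    kms_key_id: str
    wrapped_key: bytes
    nonce: bytes
    ciphertext: bytes  # ciphertext with the GCM tag appended


# Hypothetical signatures for the injected primitives.
Unwrap = Callable[[str, bytes], bytes]  # (kms_key_id, wrapped_key) -> data key
AeadDecrypt = Callable[[bytearray, bytes, bytes, bytes], bytes]  # (key, nonce, ct, aad)


def read_record(env: Envelope, aad: bytes, unwrap: Unwrap,
                aead_decrypt: AeadDecrypt) -> bytes:
    """Unwrap the data key, decrypt, and wipe the key buffer on every exit path."""
    # Copy into a mutable buffer so it can be overwritten afterwards.
    data_key = bytearray(unwrap(env.kms_key_id, env.wrapped_key))
    try:
        # A real AEAD implementation raises here if the tag or AAD do not match.
        return aead_decrypt(data_key, env.nonce, env.ciphertext, aad)
    finally:
        # Best effort only: Python cannot guarantee that no other copy exists
        # (the immutable bytes returned by unwrap is one such copy).
        for i in range(len(data_key)):
            data_key[i] = 0
```

Injecting the primitives keeps the control flow testable without a live KMS. The `finally` block makes sure the buffer is wiped even when tag verification fails.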

BYOK variants

Imported key. The tenant generates a master key in its HSM and imports it into your KMS. You reference it with an alias but cannot export it later.
External key manager, also called hold your own key. Your crypto service calls an external key portal controlled by the tenant. The portal applies policy and performs a single wrap or unwrap per request. Expect higher latency, and design for timeouts plus graceful degradation.
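For the external key manager variant, the timeout handling might look like the following minimal sketch. `call_ekm` is a hypothetical zero-argument callable that performs the remote unwrap. `TenantKeyUnavailable`, the pool size, and the 0.5 second default are likewise assumptions chosen for illustration.

```python
import concurrent.futures
from typing import Callable


class TenantKeyUnavailable(Exception):
    """The tenant-controlled key manager failed or did not answer in time."""


# Shared worker pool for outbound key-manager calls (size is an assumption).
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=8)


def unwrap_with_deadline(call_ekm: Callable[[], bytes],
                         timeout_s: float = 0.5) -> bytes:
    """Run the external unwrap with a hard deadline.

    Any failure is mapped to one exception type so callers can degrade
    gracefully, for example by returning a clear "key unavailable" error
    for this tenant while other tenants keep working.
    """
    future = _pool.submit(call_ekm)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError as exc:
        # cancel() only helps if the call has not started; a running call
        # still needs its own network-level timeout.
        future.cancel()
        raise TenantKeyUnavailable("external key manager timed out") from exc
    except Exception as exc:
        raise TenantKeyUnavailable("external key manager call failed") from exc
```

Mapping every failure to a single exception keeps tenant outages from leaking into generic error handling, so one slow key portal does not look like a platform-wide fault.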

Rotation

Tenant master key rotation. Mark a new active version. Rewrap existing wrapped data keys lazily on read, or run a background job that rewrites envelopes. Your data stays encrypted at rest the whole time.
Data key rotation. Generate fresh data keys for new writes. Re-encrypt older objects only if risk or policy requires it.
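Lazy rewrap on read can be sketched as a single function that touches only the wrapped data key and leaves the ciphertext alone. The `Kms` protocol below is a hypothetical interface, not the API of any particular KMS product. Real SDKs name these calls differently, and some offer a server-side re-encrypt operation so the plaintext data key never leaves the KMS.

```python
from dataclasses import dataclass, replace
from typing import Protocol


class Kms(Protocol):
    """Hypothetical KMS interface used only for this illustration."""

    def unwrap(self, key_id: str, version: int, wrapped: bytes) -> bytes: ...

    def wrap(self, key_id: str, version: int, data_key: bytes) -> bytes: ...


@dataclass(frozen=True)
class WrappedKey:
    key_id: str
    version: int  # version of the tenant master key that produced blob
    blob: bytes


def rewrap_if_stale(kms: Kms, wk: WrappedKey, active_version: int) -> WrappedKey:
    """Return wk unchanged if it is current, else rewrap under active_version.

    Only the small wrapped key changes; the payload ciphertext is untouched,
    which is why master key rotation does not require re-encrypting data.
    """
    if wk.version == active_version:
        return wk
    data_key = kms.unwrap(wk.key_id, wk.version, wk.blob)
    new_blob = kms.wrap(wk.key_id, active_version, data_key)
    return replace(wk, version=active_version, blob=new_blob)
```

The caller persists the returned `WrappedKey` when it differs from the input. A background job can run the same function over cold records.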

Caching and performance

Use an in-process LRU cache for unwrapped data keys with a sub-second TTL. Pin the memory and wipe keys on eviction. This reduces KMS calls while keeping the exposure window tiny.
Batch KMS operations when rewrapping at scale and use concurrency limits to avoid KMS throttling.
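A short-lived key cache along these lines could look like the sketch below. The class name `DataKeyCache`, the default sizes, and the injectable `clock` are assumptions for illustration. Pure Python cannot pin memory, so the wipe is best effort, and the class is not thread-safe as written.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional


class DataKeyCache:
    """LRU cache of unwrapped data keys with a TTL and wipe-on-removal."""

    def __init__(self, max_entries: int = 1024, ttl_s: float = 0.5,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max = max_entries
        self._ttl = ttl_s
        self._clock = clock
        # Maps wrapped key bytes -> (expiry time, mutable key buffer).
        self._entries = OrderedDict()

    @staticmethod
    def _wipe(key: bytearray) -> None:
        for i in range(len(key)):
            key[i] = 0

    def put(self, wrapped_key: bytes, data_key: bytes) -> None:
        old = self._entries.pop(wrapped_key, None)
        if old is not None:
            self._wipe(old[1])
        self._entries[wrapped_key] = (self._clock() + self._ttl, bytearray(data_key))
        # Evict least recently used entries beyond the size limit.
        while len(self._entries) > self._max:
            _, (_, evicted) = self._entries.popitem(last=False)
            self._wipe(evicted)

    def get(self, wrapped_key: bytes) -> Optional[bytes]:
        entry = self._entries.get(wrapped_key)
        if entry is None:
            return None
        expires_at, key = entry
        if self._clock() >= expires_at:
            del self._entries[wrapped_key]
            self._wipe(key)
            return None
        self._entries.move_to_end(wrapped_key)  # mark as recently used
        return bytes(key)
```

Keying the cache by the wrapped key bytes means a rewrapped envelope naturally misses the cache and goes back to the KMS, where current policy is enforced.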

Multiregion

Create region-local tenant keys so decryption never crosses regions. Keep key aliases consistent across regions and maintain a control plane that creates and rotates them in lockstep. Store the region in the envelope header.
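One job of that control plane is detecting tenants whose key exists in some regions but not others. A minimal sketch follows, assuming a hypothetical alias scheme `alias/tenant/<id>` and an inventory that maps each region to the set of aliases present there. How you obtain that inventory depends on your KMS.

```python
from typing import Dict, List, Set, Tuple


def key_alias(tenant_id: str) -> str:
    """Hypothetical alias scheme: the same alias string in every region."""
    return f"alias/tenant/{tenant_id}"


def find_missing_keys(inventory: Dict[str, Set[str]],
                      tenants: List[str]) -> List[Tuple[str, str]]:
    """Return (region, alias) pairs where a tenant key is expected but absent.

    inventory maps region name -> set of key aliases that exist there.
    """
    missing = []
    for region in sorted(inventory):
        for tenant in tenants:
            alias = key_alias(tenant)
            if alias not in inventory[region]:
                missing.append((region, alias))
    return missing
```

The reconciler would then create or replicate each missing key before traffic for that tenant is routed to the region.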

Policy and audit

Every unwrap requires a service identity scoped to the tenant. Bind policy to tenant id and environment. Send all events to an immutable audit log for later review.
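The policy check and audit event can be sketched together so that both allowed and denied attempts leave a record. `ServiceIdentity`, the event field names, and the `emit` callback are assumptions for illustration. In practice `emit` would forward to whatever append-only log you use.

```python
import json
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ServiceIdentity:
    name: str
    tenant_id: str    # the single tenant this identity is scoped to
    environment: str  # for example "prod" or "staging"


def authorize_unwrap(identity: ServiceIdentity, tenant_id: str, environment: str,
                     key_id: str, reason: str,
                     emit: Callable[[str], None]) -> None:
    """Allow the unwrap only if tenant and environment both match.

    The audit event is emitted before the decision is enforced, so denied
    attempts are recorded as well.
    """
    allowed = (identity.tenant_id == tenant_id
               and identity.environment == environment)
    emit(json.dumps({
        "ts": time.time(),
        "caller": identity.name,
        "tenant_id": tenant_id,
        "environment": environment,
        "key_id": key_id,
        "reason": reason,
        "decision": "allow" if allowed else "deny",
    }, sort_keys=True))
    if not allowed:
        raise PermissionError(
            f"{identity.name} may not unwrap keys for tenant {tenant_id}")
```

This application-level check complements, and does not replace, the policy the KMS itself enforces on the tenant key.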

Real World Example

Consider a global marketplace that stores orders for many businesses. Small merchants use a platform-managed tenant key. Large enterprises choose BYOK. When order data is written, the service generates a fresh data key, encrypts the order, and wraps that key with the tenant master key. A month later an enterprise rotates its key. New writes use the new version immediately, while a background rewrap job updates older envelopes. If that enterprise leaves the platform and requests revocation, you disable decrypt permission on its key. The data becomes unreadable without deleting anything from storage, which is exactly the control auditors expect.

Common Pitfalls or Trade-offs

Nonce reuse in GCM. Reusing a nonce with the same key breaks confidentiality and also lets an attacker forge authentication tags. Use a random or counter-based nonce strategy that guarantees uniqueness per data key.
Unbounded KMS calls. Per-record unwraps can crush latency and cost. Use short in-process caches, batch rewraps, and combine round trips where the KMS allows it.
Over-rotating. Frequent full re-encrypt cycles can be expensive. Prefer lazy rewrap with background migration and rotate on a clear schedule.
Policy drift across regions. A tenant key may exist in one region but not another. Keep a control plane that reconciles keys, aliases, and grants.
BYOK expectation gap. BYOK improves control and audit but does not make your provider blind if decryption happens in your process. If a tenant requires maximum control, consider external key management with server-side decryption inside the tenant boundary, at the cost of latency and availability risks.
Losing track of envelopes. Always store a compact header that includes algorithm, KMS key id, key version, nonce, and a checksum of associated data.
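A compact envelope header of the kind described in the last pitfall might be serialized as follows. The byte layout, the `ENV1` magic value, the algorithm code, and the choice of SHA-256 for the associated data checksum are all assumptions for illustration. The checksum is a debugging aid for mismatched metadata, and the GCM tag remains the actual integrity check.

```python
import struct
from dataclasses import dataclass
from typing import Tuple

MAGIC = b"ENV1"       # hypothetical format marker
ALG_AES_256_GCM = 1   # hypothetical algorithm code

# magic, algorithm, key version, key-id length, 12-byte nonce, 32-byte digest
_FIXED = struct.Struct(">4sBIH12s32s")


@dataclass(frozen=True)
class Header:
    algorithm: int
    key_version: int
    kms_key_id: str
    nonce: bytes
    aad_digest: bytes  # SHA-256 of the associated data


def pack_header(h: Header) -> bytes:
    key_id = h.kms_key_id.encode("utf-8")
    if len(h.nonce) != 12 or len(h.aad_digest) != 32:
        raise ValueError("nonce must be 12 bytes and digest 32 bytes")
    return _FIXED.pack(MAGIC, h.algorithm, h.key_version, len(key_id),
                       h.nonce, h.aad_digest) + key_id


def parse_header(buf: bytes) -> Tuple[Header, bytes]:
    """Return the parsed header and the remaining bytes (wrapped key, ciphertext)."""
    if len(buf) < _FIXED.size:
        raise ValueError("truncated envelope header")
    magic, alg, version, key_len, nonce, digest = _FIXED.unpack_from(buf)
    if magic != MAGIC:
        raise ValueError("not an envelope header")
    end = _FIXED.size + key_len
    if len(buf) < end:
        raise ValueError("truncated key id")
    key_id = buf[_FIXED.size:end].decode("utf-8")
    return Header(alg, version, key_id, nonce, digest), buf[end:]
```

Keeping the key version in the header is what makes lazy rewrap and version allow lists possible without an extra lookup.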

Interview Tip

A favorite prompt is this. An enterprise rotates its BYOK master key at noon and demands that no decrypt of old data happens with the retired version after that minute. Explain how you will satisfy this with minimal downtime. A strong answer mentions a cache TTL shorter than the notice window, an allow list of active key versions in the crypto service, immediate rewrap for hot partitions, and lazy rewrap for the long tail with strict enforcement at decrypt time.
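The allow list of active key versions mentioned in that answer can be sketched as a small in-memory policy object that the crypto service consults before every unwrap. `KeyVersionPolicy` and `RetiredKeyVersion` are hypothetical names. A real deployment would also need to distribute updates to every crypto service instance faster than the cache TTL.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


class RetiredKeyVersion(Exception):
    """Raised when an envelope references a key version that is not allowed."""


@dataclass
class KeyVersionPolicy:
    # tenant id -> set of key versions that may currently be used to unwrap
    allowed: Dict[str, Set[int]] = field(default_factory=dict)

    def activate(self, tenant_id: str, version: int) -> None:
        self.allowed.setdefault(tenant_id, set()).add(version)

    def retire(self, tenant_id: str, version: int) -> None:
        self.allowed.get(tenant_id, set()).discard(version)

    def check(self, tenant_id: str, version: int) -> None:
        """Call before every unwrap; unknown tenants and versions are denied."""
        if version not in self.allowed.get(tenant_id, set()):
            raise RetiredKeyVersion(
                f"key version {version} is not active for tenant {tenant_id}")
```

At the cutoff, the control plane calls `retire` for the old version. Any envelope not yet rewrapped then fails closed at decrypt time instead of silently using the retired key.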

Key Takeaways

Envelope encryption isolates data per tenant and limits breach blast radius.
BYOK gives customers lifecycle control through rotation, disable, and audit.
Store only wrapped data keys next to ciphertext and never store plaintext keys.
Use AES-GCM with unique nonces and bind metadata through associated data.
Control cost and latency with short-lived key caches and batch rewrap jobs.

Table of Comparison

Approach	Who Controls Master Key	Blast Radius	Latency	Rotation Complexity	Revocation Power	Typical Fit
Single Shared Platform Key	Provider	Largest	Lowest	Simple	Weak	Internal tools, low-risk data
Per-Tenant Key in Provider KMS	Provider	Small	Low	Moderate	Good	Most SaaS platforms
BYOK (Imported into KMS)	Customer	Small	Low	Moderate	Very Good	Enterprise SaaS plans
External Key Manager (Customer-Held)	Customer	Small	High	High	Strongest	Regulated or government workloads
Per-Object Data Key (Single Root Key)	Provider	Medium	Low	Simple	Limited	Content-heavy services

FAQs

Q1. What is envelope encryption and why is it used?

Envelope encryption uses a fast symmetric data key to encrypt the payload and then wraps that data key with a master key. It gives performance at scale and clear control of key lifecycle.

Q2. How does BYOK differ from provider-managed keys?

With BYOK the customer supplies or owns the master key. You can import it into your KMS or keep it external. This enables customer driven rotation and revocation while the provider focuses on safe implementation.

Q3. How often should tenant master keys be rotated?

Common guidance is to rotate at least once per year, or after a sensitive event such as staff changes. Many enterprises prefer quarterly rotation. Use lazy rewrap to spread the cost while keeping new writes on the latest version.

Q4. Does BYOK stop the cloud provider from ever seeing my data?

Not by itself. If decryption happens inside the provider application, then runtime access still exists. For maximal control, keep the master key external and run server-side decryption inside a tenant boundary under tenant policy.

Q5. What happens if a tenant disables or deletes its key?

Decrypts will fail. Your application must show a clear error and provide a path for re-enabling or replacing the key. Never build an escape hatch that bypasses tenant control.

Q6. How do I handle backup and restore with tenant-aware encryption?

Backups contain only ciphertext and wrapped keys. Restores require the same tenant keys and policies. Test disaster recovery by restoring into an isolated environment with read-only access to confirm that you can still decrypt.
