How do you run per‑tenant KMS keys and isolate encryption domains?

Per tenant KMS keys sound scary at first, but the core idea is simple. In a multi tenant SaaS system you want each customer to live in its own cryptographic world. That world is called an encryption domain. Per tenant KMS design means you give every tenant its own key or even its own key hierarchy so that a mistake or breach in one tenant does not help an attacker decrypt data from any other tenant.

This topic sits right at the intersection of security architecture, scalable systems, and real world compliance. It is a favorite area for senior level system design interviews because it forces you to think clearly about boundaries, key lifecycle, performance, and blast radius.

Why it Matters

Per tenant keys and isolated encryption domains matter for several reasons in production systems and in system design interviews.

Strong blast radius control. If a key leaks, only one tenant is affected rather than your entire fleet.
Compliance needs. Many regulated customers expect their own keys or even bring your own key (BYOK) setups, sometimes with keys in their own cloud account.
Clear security stories. It is much easier to explain to auditors that no tenant can decrypt another tenant's data because there is simply no shared key.
Operational flexibility. You can rotate, revoke, or escrow keys for one tenant without disturbing others.
Interview strength. When you talk about per tenant KMS keys in a system design interview you signal that you understand isolation beyond just tables and schemas. You understand cryptographic blast radius and key domains.

When you combine this with scalable architecture patterns and distributed systems concepts like multi region replication, this becomes an advanced yet very practical topic.

How it Works Step by Step

At a high level you are solving two related problems.

Give each tenant an independent encryption domain.
Make it efficient and safe for your services to use the right key at the right time.

You usually solve this with three building blocks.

A key management service such as AWS KMS, GCP KMS, or Azure Key Vault.
Envelope encryption with data keys protected by tenant specific KMS keys.
Strong metadata to bind ciphertext to a tenant and a key version.

Let us walk through a common design.

Step 1 Identify the encryption domain

First decide what one encryption domain means in your system. Often it is one tenant, but it could also be one tenant per region or one tenant per product line.

You decide that:

Each tenant will have a tenant id used everywhere in storage and in KMS.
Every ciphertext will carry a tenant id and a key id or key version.

This alignment between application identity and encryption domain is the foundation for everything else.
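As a minimal sketch of the metadata that travels with every ciphertext, the header below carries the tenant id, key id, and key version. The class name, field names, and JSON encoding are assumptions for illustration, not a format the article prescribes.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class CiphertextHeader:
    """Metadata stored next to every ciphertext (field names are illustrative)."""
    tenant_id: str
    kms_key_id: str
    key_version: int

    def to_bytes(self) -> bytes:
        # Sorted keys give a stable encoding, so the same header always
        # serializes to the same bytes.
        return json.dumps(asdict(self), sort_keys=True).encode("utf-8")

    @classmethod
    def from_bytes(cls, raw: bytes) -> "CiphertextHeader":
        return cls(**json.loads(raw.decode("utf-8")))


# Hypothetical tenant and alias names.
header = CiphertextHeader(tenant_id="tenant-42", kms_key_id="alias/tenant-42", key_version=3)
restored = CiphertextHeader.from_bytes(header.to_bytes())
```

A stable byte encoding is useful because the same header bytes can later be fed to an authenticated cipher as associated data, which ties the ciphertext to its tenant.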

Step 2 Provision per tenant KMS keys

For each tenant you create a KMS key or key alias. Common patterns:

One symmetric KMS key per tenant for general encryption.
Optional extra keys per tenant for special purposes such as signing, payment tokens, or regulated data.
Tenant specific KMS key policy that restricts which services can use this key and in which operations.

You also store the mapping between tenant id and KMS key id in a metadata store. This can be a configuration database, a dedicated key registry, or even KMS aliases that embed the tenant id in the alias name.
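The mapping could be sketched as a small registry like the one below. It is an in-memory stand-in for a configuration database; the class names, the alias naming scheme, and the key ids are all hypothetical.

```python
from dataclasses import dataclass


class UnknownTenantError(KeyError):
    """Raised when a tenant has no provisioned key for the requested purpose."""


@dataclass(frozen=True)
class TenantKeyRecord:
    tenant_id: str
    kms_key_id: str          # opaque id returned by the KMS at creation time
    alias: str               # readable alias that embeds the tenant id
    purpose: str = "general"


class TenantKeyRegistry:
    """In-memory stand-in for a configuration database or key registry."""

    def __init__(self) -> None:
        self._records: dict[tuple[str, str], TenantKeyRecord] = {}

    def register(self, tenant_id: str, kms_key_id: str, purpose: str = "general") -> TenantKeyRecord:
        slot = (tenant_id, purpose)
        if slot in self._records:
            raise ValueError(f"key already registered for {tenant_id}/{purpose}")
        alias = f"alias/tenant-{tenant_id}-{purpose}"  # assumed naming convention
        record = TenantKeyRecord(tenant_id, kms_key_id, alias, purpose)
        self._records[slot] = record
        return record

    def lookup(self, tenant_id: str, purpose: str = "general") -> TenantKeyRecord:
        try:
            return self._records[(tenant_id, purpose)]
        except KeyError:
            # Fail closed: never fall back to a shared or default key.
            raise UnknownTenantError(f"no {purpose} key for tenant {tenant_id}") from None


registry = TenantKeyRegistry()
registry.register("acme", "key-1111")
registry.register("acme", "key-2222", purpose="signing")
registry.register("globex", "key-3333")
```

The important design choice is that a missing entry raises an error instead of silently falling back to some shared key, which would quietly merge encryption domains.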

Step 3 Use envelope encryption for data

You typically do not encrypt large blobs directly with the KMS key. Instead you use envelope encryption.

For each object or record that needs encryption:

Generate a fresh data encryption key in your service, often via a KMS generate data key operation tied to the tenant KMS key.
Use the data key to encrypt the payload locally, for example with AES GCM.
Store the ciphertext plus a small header that includes tenant id, KMS key id or alias, and data key version.
Store the data key only in wrapped form, encrypted by the tenant KMS key. The KMS generate operation typically returns this wrapped copy alongside the plaintext key.

On read:

Read the ciphertext and header.
Use the tenant id to look up the correct KMS key.
Ask KMS to decrypt the wrapped data key using the tenant KMS key.
Use the plaintext data key to decrypt the payload, then discard the plaintext data key from memory after use or after a short cache lifetime.

This way, even if your storage layer leaks, an attacker still needs the tenant specific KMS key to decrypt anything.
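The write and read paths above can be sketched end to end with the standard library only. Everything here is a stand-in: FakeKms replaces a real KMS, and toy_seal / toy_open are a simplified hash-based construction used in place of AES GCM so the example runs without third-party packages. Do not use the toy cipher for real data.

```python
import hashlib
import hmac
import secrets


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    out, counter = b"", 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def toy_seal(key: bytes, plaintext: bytes, aad: bytes) -> bytes:
    """Toy authenticated encryption (stand-in for AES GCM, NOT for production)."""
    nonce = secrets.token_bytes(12)
    body = bytes(a ^ b for a, b in zip(plaintext, _keystream(key, nonce, len(plaintext))))
    tag = hmac.new(key, nonce + aad + body, hashlib.sha256).digest()
    return nonce + body + tag


def toy_open(key: bytes, sealed: bytes, aad: bytes) -> bytes:
    nonce, body, tag = sealed[:12], sealed[12:-32], sealed[-32:]
    expected = hmac.new(key, nonce + aad + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed: wrong key, header, or tampered data")
    return bytes(a ^ b for a, b in zip(body, _keystream(key, nonce, len(body))))


class FakeKms:
    """In-memory stand-in for a KMS; master keys never leave this object."""

    def __init__(self) -> None:
        self._master_keys: dict[str, bytes] = {}

    def create_key(self, key_id: str) -> None:
        self._master_keys[key_id] = secrets.token_bytes(32)

    def generate_data_key(self, key_id: str) -> tuple[bytes, bytes]:
        plaintext = secrets.token_bytes(32)
        wrapped = toy_seal(self._master_keys[key_id], plaintext, key_id.encode())
        return plaintext, wrapped  # plaintext for local use, wrapped copy for storage

    def decrypt(self, key_id: str, wrapped: bytes) -> bytes:
        return toy_open(self._master_keys[key_id], wrapped, key_id.encode())


def encrypt_record(kms: FakeKms, tenant_id: str, key_id: str, payload: bytes) -> dict:
    data_key, wrapped_key = kms.generate_data_key(key_id)
    header = f"{tenant_id}|{key_id}".encode()   # bound to the ciphertext as associated data
    ciphertext = toy_seal(data_key, payload, header)
    return {"tenant_id": tenant_id, "key_id": key_id,
            "wrapped_key": wrapped_key, "ciphertext": ciphertext}


def decrypt_record(kms: FakeKms, record: dict) -> bytes:
    header = f"{record['tenant_id']}|{record['key_id']}".encode()
    data_key = kms.decrypt(record["key_id"], record["wrapped_key"])
    return toy_open(data_key, record["ciphertext"], header)


kms = FakeKms()
kms.create_key("alias/tenant-a")
kms.create_key("alias/tenant-b")
record = encrypt_record(kms, "tenant-a", "alias/tenant-a", b"quarterly numbers")
plaintext = decrypt_record(kms, record)
```

Only the wrapped data key and the ciphertext are stored. Trying to unwrap tenant A's data key with tenant B's KMS key fails authentication, which is the isolation property the design relies on.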

Step 4 Enforce policy in KMS

KMS policy is where you really isolate encryption domains. You configure each tenant key so that:

Only specific microservices can call encrypt or decrypt for that tenant id.
Optional conditions ensure that the caller passes the correct tenant id and purpose in request context.
Logging and key usage metrics are enabled per key so you can audit each tenant independently.
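To make the policy rules concrete, here is a hedged sketch of a default-deny check. It is not the policy language of any real KMS; KeyPolicy, authorize, the service names, and the audit_log list are hypothetical and only model the three rules above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyPolicy:
    tenant_id: str
    # Maps a service principal to the operations it may perform with this key.
    grants: dict[str, frozenset[str]]


# One entry per decision: (tenant_id, principal, operation, allowed).
audit_log: list[tuple[str, str, str, bool]] = []


def authorize(policy: KeyPolicy, principal: str, operation: str, context: dict[str, str]) -> bool:
    """Default deny: the principal, the operation, and the tenant context must all match."""
    allowed_ops = policy.grants.get(principal, frozenset())
    decision = operation in allowed_ops and context.get("tenant_id") == policy.tenant_id
    audit_log.append((policy.tenant_id, principal, operation, decision))
    return decision


acme_policy = KeyPolicy(
    tenant_id="acme",
    grants={
        "ingest-service": frozenset({"encrypt"}),
        "query-service": frozenset({"decrypt"}),
    },
)

ok = authorize(acme_policy, "ingest-service", "encrypt", {"tenant_id": "acme"})
wrong_tenant = authorize(acme_policy, "query-service", "decrypt", {"tenant_id": "globex"})
```

In a real deployment this check lives inside the KMS itself, so a compromised application cannot skip it. The sketch only shows which inputs the decision depends on.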

For bring your own key setups, the tenant KMS key may live in the tenant's cloud account. You configure cross account access so that your service can use only the specific KMS key for that tenant, and nothing else. The encryption domain is now split across two accounts, which is even stronger from the tenant's perspective.

Step 5 Handle key rotation and revocation

Key rotation is per tenant.

You periodically create a new KMS key version for the tenant.
New data uses the new key version, while existing ciphertext still points to old versions.
At low traffic times you can migrate old data in the background, re-encrypting it with the new key version, but this is optional if KMS supports decryption with older versions.
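The rotation bookkeeping can be sketched without any cryptography, since it is mostly about version metadata. The class and function names below are assumptions; a real KMS tracks versions internally, and this sketch only mirrors what the application needs to know.

```python
from dataclasses import dataclass, field


@dataclass
class TenantKeyVersions:
    tenant_id: str
    current: int = 1
    # Versions that can still decrypt; old versions stay here after rotation.
    readable: set[int] = field(default_factory=lambda: {1})

    def rotate(self) -> int:
        self.current += 1
        self.readable.add(self.current)
        return self.current

    def version_for_write(self) -> int:
        return self.current

    def can_read(self, version: int) -> bool:
        return version in self.readable


def migration_backlog(records: list[dict], keys: TenantKeyVersions) -> list[dict]:
    """Records still under an older version, candidates for background re-encryption."""
    return [
        r for r in records
        if r["tenant_id"] == keys.tenant_id and r["key_version"] < keys.current
    ]


keys = TenantKeyVersions("acme")
records = [{"id": 1, "tenant_id": "acme", "key_version": keys.version_for_write()}]
keys.rotate()
records.append({"id": 2, "tenant_id": "acme", "key_version": keys.version_for_write()})
backlog = migration_backlog(records, keys)
```

Because each record stores the version it was written with, reads keep working after rotation and the background job knows exactly which records are left to migrate.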

If you must revoke a tenant:

Disable or schedule deletion of that tenant KMS key.
Data is still present in storage but can no longer be decrypted once any cached plaintext data keys have expired.
This is a very strong form of logical deletion, often called crypto-shredding, that some privacy regulations appreciate.
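A small state model illustrates why revoking one tenant leaves the others untouched. RevocableKms is a toy: it hands out opaque handles instead of real wrapped keys, and the state names are assumptions loosely modeled on how cloud KMS products describe key states.

```python
import secrets
from enum import Enum


class KeyState(Enum):
    ENABLED = "enabled"
    DISABLED = "disabled"
    PENDING_DELETION = "pending_deletion"


class KeyUnavailableError(Exception):
    """Raised when a key is disabled, pending deletion, or unknown."""


class RevocableKms:
    """Toy KMS: tracks key state and returns opaque handles instead of wrapped keys."""

    def __init__(self) -> None:
        self._state: dict[str, KeyState] = {}
        self._data_keys: dict[tuple[str, str], bytes] = {}

    def create_key(self, key_id: str) -> None:
        self._state[key_id] = KeyState.ENABLED

    def generate_data_key(self, key_id: str) -> tuple[bytes, str]:
        self._require_enabled(key_id)
        data_key, handle = secrets.token_bytes(32), secrets.token_hex(16)
        self._data_keys[(key_id, handle)] = data_key
        return data_key, handle

    def unwrap(self, key_id: str, handle: str) -> bytes:
        self._require_enabled(key_id)
        return self._data_keys[(key_id, handle)]

    def disable(self, key_id: str) -> None:
        self._state[key_id] = KeyState.DISABLED

    def schedule_deletion(self, key_id: str) -> None:
        self._state[key_id] = KeyState.PENDING_DELETION

    def _require_enabled(self, key_id: str) -> None:
        if self._state.get(key_id) is not KeyState.ENABLED:
            raise KeyUnavailableError(f"{key_id} cannot be used")


revocable = RevocableKms()
revocable.create_key("alias/tenant-a")
revocable.create_key("alias/tenant-b")
key_a, handle_a = revocable.generate_data_key("alias/tenant-a")
key_b, handle_b = revocable.generate_data_key("alias/tenant-b")

revocable.disable("alias/tenant-a")   # revoke tenant A only
```

After the disable call, unwrapping tenant A's data key raises an error while tenant B is unaffected. Any service that cached tenant A's plaintext data keys should flush them at the same moment, otherwise revocation only takes effect when those cache entries expire.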

Real World Example

Take a multi tenant analytics SaaS where you ingest events from hundreds of companies into one large data lake and serve dashboards. All tenants share the same storage cluster and compute layer, but you want strong guarantees that data from Company A can never be decrypted using keys from Company B.

A common design:

One KMS key per tenant in your cloud account, or optionally a key per tenant in the tenant's own account for bring your own key.
Events arrive tagged with tenant id. Before storing them, the ingestion service encrypts each batch with a fresh data key generated under that tenant KMS key.
The encrypted data key and tenant id travel with the data into the lake and onward into columnar storage, cold storage, and possibly log based replication.
When a query for a tenant dashboard runs, the query engine fetches the encrypted data keys, calls KMS only with the correct tenant KMS key, then decrypts local fragments in memory.

Now imagine that someone mistakenly configures a query to read a shard that contains mixed tenant data. The worst that happens is that decryption fails for fragments that belong to other tenants: their data keys were wrapped under a different tenant KMS key, so the requesting tenant's key cannot unwrap them, and a header check on tenant id can reject them before KMS is even called. This is a practical way to enforce cryptographic boundaries across very large shared infrastructure.

Common Pitfalls or Trade offs

Overusing keys or underusing keys. If you use one single KMS key for every tenant you lose most of the isolation benefits. On the other hand, if you create too many KMS keys per tenant for every tiny use case, you increase cost and operational complexity. A good default is one main encryption key per tenant plus extra keys only for clearly distinct domains such as separate compliance zones.

Ignoring performance and KMS limits. KMS services have rate limits and per operation latency. If you decrypt a fresh data key for every single small object on the hot path you can create a bottleneck and higher costs. You usually cache decrypted data keys in memory for a short time window per tenant, or design batch operations that reuse a key for many rows.
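A short lived cache of unwrapped data keys might look like the sketch below. DataKeyCache, its parameters, and the fake unwrap function are assumptions for illustration; the injectable clock exists only so expiry can be demonstrated without sleeping.

```python
import time
from typing import Callable


class DataKeyCache:
    """Caches unwrapped data keys per (tenant, wrapped key) for a short time."""

    def __init__(
        self,
        unwrap: Callable[[str, bytes], bytes],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._unwrap = unwrap
        self._ttl = ttl_seconds
        self._clock = clock
        # (tenant_id, wrapped_key) -> (expiry time, plaintext data key)
        self._entries: dict[tuple[str, bytes], tuple[float, bytes]] = {}

    def get(self, tenant_id: str, wrapped_key: bytes) -> bytes:
        now = self._clock()
        cache_key = (tenant_id, wrapped_key)
        hit = self._entries.get(cache_key)
        if hit is not None and hit[0] > now:
            return hit[1]
        plaintext = self._unwrap(tenant_id, wrapped_key)  # the expensive KMS call
        self._entries[cache_key] = (now + self._ttl, plaintext)
        return plaintext

    def evict_tenant(self, tenant_id: str) -> None:
        """Drop every cached key for one tenant, for example on revocation."""
        for cache_key in [k for k in self._entries if k[0] == tenant_id]:
            del self._entries[cache_key]


kms_calls: list[str] = []


def fake_unwrap(tenant_id: str, wrapped_key: bytes) -> bytes:
    kms_calls.append(tenant_id)              # count simulated KMS round trips
    return b"plaintext-for-" + wrapped_key


fake_now = [0.0]
cache = DataKeyCache(fake_unwrap, ttl_seconds=60.0, clock=lambda: fake_now[0])

cache.get("acme", b"wk1")
cache.get("acme", b"wk1")                    # served from cache
calls_after_two_reads = len(kms_calls)

fake_now[0] = 61.0                           # move past the TTL
cache.get("acme", b"wk1")                    # expired, so KMS is called again
```

Keying the cache by tenant id as well as by wrapped key keeps entries from different tenants apart, and the TTL bounds how long a plaintext key stays in memory.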

Weak metadata binding. If ciphertext does not carry a reliable tenant id and key id, services can accidentally try to decrypt with the wrong key. Always embed tenant id and key version in the header stored next to the ciphertext, and validate that combination before decrypting.
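One way to sketch that validation step is a check that runs before any KMS call. The header fields, the registry layout, and the error type are hypothetical; the point is that a mismatch stops the request early.

```python
class HeaderMismatchError(Exception):
    """Raised when a ciphertext header does not match the requesting tenant."""


def validate_header(header: dict, request_tenant_id: str, registry: dict[str, dict]) -> str:
    """Return the KMS key id to use, or raise before any KMS call is made."""
    for name in ("tenant_id", "key_id", "key_version"):
        if name not in header:
            raise HeaderMismatchError(f"missing header field: {name}")
    if header["tenant_id"] != request_tenant_id:
        raise HeaderMismatchError("ciphertext belongs to a different tenant")
    entry = registry.get(request_tenant_id)
    if entry is None or entry["key_id"] != header["key_id"]:
        raise HeaderMismatchError("key id is not registered for this tenant")
    if header["key_version"] not in entry["versions"]:
        raise HeaderMismatchError("unknown key version for this tenant")
    return header["key_id"]


# Hypothetical registry contents.
key_registry = {
    "acme": {"key_id": "alias/tenant-acme", "versions": {1, 2}},
    "globex": {"key_id": "alias/tenant-globex", "versions": {1}},
}

good_header = {"tenant_id": "acme", "key_id": "alias/tenant-acme", "key_version": 2}
key_to_use = validate_header(good_header, "acme", key_registry)
```

A fragment from another tenant that ends up in the wrong shard fails the tenant id comparison, so the service never even asks KMS to unwrap it.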

Inconsistent policy across regions. In distributed systems with multi region replication, it is easy to forget that KMS keys are often regional. If you copy ciphertext across regions, either keep the same encryption domain with multi region keys where your KMS supports them, or clearly split domains by region and re-encrypt on arrival. Be explicit in the design.
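Being explicit can be as simple as a function that decides, per tenant, what replication into another region has to do. The plan type, action names, and key ids below are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicationPlan:
    action: str          # "copy_as_is" or "reencrypt"
    target_key_id: str


def plan_replication(
    tenant_id: str,
    target_region: str,
    regional_keys: dict[tuple[str, str], str],
    multi_region_keys: dict[str, str],
) -> ReplicationPlan:
    """Decide how ciphertext for one tenant should arrive in the target region."""
    if tenant_id in multi_region_keys:
        # Same encryption domain everywhere: ciphertext can be copied unchanged.
        return ReplicationPlan("copy_as_is", multi_region_keys[tenant_id])
    try:
        # Region scoped domain: data must be re-encrypted under the target key.
        return ReplicationPlan("reencrypt", regional_keys[(tenant_id, target_region)])
    except KeyError:
        raise LookupError(f"no key for tenant {tenant_id} in {target_region}") from None


# Hypothetical key layout: acme uses per region keys, globex a multi region key.
regional = {("acme", "eu-west"): "acme-eu-key", ("acme", "us-east"): "acme-us-key"}
multi_region = {"globex": "globex-global-key"}

acme_plan = plan_replication("acme", "eu-west", regional, multi_region)
globex_plan = plan_replication("globex", "eu-west", regional, multi_region)
```

Failing loudly when a tenant has no key in the target region is deliberate: replication should stop rather than land data under some other key.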

Complicated bring your own key flows. Cross account KMS, external customer HSMs, and on premise keys add a lot of moving parts. These setups are powerful for enterprise customers but increase failure modes. In an interview, acknowledge the complexity and call out good monitoring and fallback plans.

Per tenant KMS works very well when you care deeply about tenant isolation and blast radius and you are willing to pay a bit more effort in key provisioning and metadata plumbing. It is not ideal for simple internal tools or tiny services where a single application key is enough and operational simplicity is more important.

Interview Tip

A common senior level system design interview question looks like this.

You are designing a multi tenant SaaS for financial data. Tenants want guarantees that their data is cryptographically isolated from each other, and some want to use keys they control. How would you design key management using KMS?

A strong answer would:

Start by defining the encryption domain as per tenant and possibly per region.
Propose one KMS key per tenant, with envelope encryption for data.
Explain how ciphertext carries tenant id and key version.
Mention KMS policies that scope access to specific services and tenant context.
Describe rotation and revocation per tenant.
Mention optional bring your own key support via cross account KMS.

If you can talk through these points clearly and briefly, you are already ahead of the average candidate on this topic.

Key Takeaways

Per tenant KMS keys create separate encryption domains so that each tenant lives in its own cryptographic world.
Envelope encryption with data keys wrapped by tenant KMS keys is the standard way to scale this design.
Strong metadata binding between ciphertext, tenant id, and key version is critical for safety and debuggability.
KMS policies and cross account setups are where you enforce real isolation and enable bring your own key scenarios.
Performance, cost, and operational complexity must be managed through caching, batching, and clear key lifecycle processes.

Table of Comparison

Key strategy	Security isolation	Operational complexity	Cost profile	Typical use case
Single shared KMS key for all tenants	Lowest. One key compromise affects all tenants	Very simple to manage	Lowest direct KMS cost	Internal tools, low risk data, early prototypes
Per tenant KMS key in provider account	High. Blast radius limited to one tenant key	Moderate. Requires tenant key registry and rotation	Moderate. More keys and KMS calls	Standard SaaS with strong isolation requirements
Per tenant KMS key in tenant account (BYOK)	Very high. Tenants fully control their key domain	High. Cross account setup and onboarding complexity	Higher operational overhead	Enterprise customers with strict compliance or security demands

FAQs

Q1. What is a per tenant KMS key model in system design?

It is a key management pattern where each tenant in a multi tenant SaaS gets its own KMS key or key hierarchy. Application data for that tenant is encrypted using keys scoped to that tenant, so a compromise or misconfiguration of one tenant's key does not let anyone decrypt another tenant's data. This is a strong architecture choice for security focused system design interviews.

Q2. How do per tenant KMS keys improve multi tenant isolation?

Per tenant keys create separate encryption domains. Even if an attacker gains access to storage or one tenant key, ciphertext for other tenants is still protected by different keys and by KMS policies. Combined with good access control and logging, this greatly limits blast radius in distributed systems that host many customers on shared infrastructure.

Q3. Is per tenant key management expensive in terms of performance?

It can be if you call KMS for every record on the hot path. In practice, you use envelope encryption and short lived caches for decrypted data keys so that KMS is only used to wrap and unwrap data keys occasionally. You also monitor KMS latency and rate limits and design your system to batch requests where possible. For most real world system design interview scenarios, this cost is acceptable compared to the security benefits.

Q4. When should I consider bring your own key on top of per tenant keys?

You add bring your own key when enterprise customers want direct control of keys in their own cloud accounts. Your service uses cross account KMS access to perform encryption and decryption on their behalf. This is common for financial services, healthcare, and regulated industries and is a good enhancement to mention in senior system design interview answers.

Q5. How do I rotate per tenant KMS keys without downtime?

You create a new key version for each tenant and start encrypting new data with the new version while leaving old data encrypted under the old version. KMS usually supports decryption with older versions, so reads continue to work. In the background you can gradually re-encrypt older data. The important part is to store key version metadata with every ciphertext so your services know which version to use.

Q6. How does per tenant KMS key design interact with multi region or cross zone deployments?

KMS keys are often regional, so you decide whether one encryption domain spans multiple regions or is region specific. You can either create a separate key per region and re-encrypt when data crosses regions, or use multi region keys where supported. In a system design interview, call out that you must align tenant boundaries, regional replication strategy, and KMS key layout to avoid surprises.

Further Learning

If you want a structured path through topics like encryption domains, key hierarchy, and other security conscious patterns for scalable architecture, take a look at the advanced scenarios in Grokking Scalable Systems for Interviews.

If you are still building your base in system design interview concepts such as multi tenant data models, distributed systems basics, and high level architecture patterns, start with Grokking System Design Fundamentals and then layer per tenant KMS design on top.