How would you design multi‑tenant search indices without leakage?
Designing multi-tenant search means serving many customers from a shared engine while guaranteeing zero data leakage. The core idea is simple: every document carries a tenant identity, and every query runs inside a server-side sandbox that enforces tenant filters for hits, aggregations, highlights, and caches. The tricky part is making this airtight while keeping relevance, latency, and cost under control, which is exactly the trade-off a system design interview expects you to reason about.
Why It Matters
Multi-tenant search is a common requirement in SaaS and marketplace systems. If a user from Tenant A can view or infer data from Tenant B through search results, facets, typeahead, or analytics, you have a breach. Leakage can happen even when the top hits look fine, because counts, suggestions, or highlight snippets might cross tenant boundaries. Interviewers ask this question to test your grasp of distributed systems, access control, and performance trade-offs under growth.
Real World Example
Imagine an online marketplace similar to Amazon where each seller is a tenant. A support agent at Seller Alpha searches tickets for "refund policy". The gateway authenticates the agent and sets the tenant context to Alpha. The search service rewrites the query by injecting `tenantId` equals Alpha as a constant filter and routes the request using a hash of Alpha, which keeps shards balanced. The aggregation on ticket status and the suggestions for the phrase "refund" are computed inside the same filter, so counts and typeahead never reveal activity from Seller Beta. Query caches include the tenant in their key, so a cached response for Beta cannot be served to Alpha. If Seller Gamma pays for premium relevance with custom synonyms, Gamma either uses a dedicated index or joins a synonym family with compatible tenants.
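The query rewrite and tenant-aware cache key from this example can be sketched as follows. This is a minimal illustration, assuming an Elasticsearch-style query DSL (`bool`, `filter`, `term`, `aggs`); the request envelope with a `routing` key, the function names, and the `text` and `status` fields are hypothetical and not taken from any specific product.

```python
import hashlib
import json
from typing import Any


def scoped_search_request(tenant_id: str, user_query: dict[str, Any]) -> dict[str, Any]:
    """Wrap a caller-supplied query in a mandatory tenant filter.

    tenant_id must come from the authenticated session, never from the client.
    """
    if not tenant_id:
        raise ValueError("tenant context is required")
    return {
        # Hypothetical envelope: route by tenant so one tenant's data stays together.
        "routing": tenant_id,
        "body": {
            "query": {
                "bool": {
                    "must": [user_query],
                    # Non-scoring filter that the caller cannot remove or override.
                    "filter": [{"term": {"tenantId": tenant_id}}],
                }
            },
            # Declared alongside the filtered query so counts are computed
            # only over documents that passed the tenant filter.
            "aggs": {"by_status": {"terms": {"field": "status"}}},
        },
    }


def cache_key(tenant_id: str, body: dict[str, Any]) -> str:
    """Prefix the cache key with the tenant so entries are never shared."""
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()
    return f"{tenant_id}:{digest}"


user_query = {"match": {"text": "refund policy"}}
alpha_request = scoped_search_request("alpha", user_query)
beta_request = scoped_search_request("beta", user_query)
```

Because the tenant appears both in the filter and in the key prefix, the same user query issued by Alpha and Beta produces different request bodies and different cache keys.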
How It Works (Step-by-Step)
- Capture Tenant Context at the Edge
  - Authenticate the user and resolve a canonical tenant ID in your gateway or API layer.
  - Never trust a tenant ID from the client side. Use server-controlled tokens or identity claims.
- Encode Tenant Identity at Index Time
  - Add a mandatory `tenantId` field to every document.
  - Compose the document ID as `tenantId:docId` to ensure uniqueness and simplify deletes.
- Partition and Route Data
  - Siloed Indices: One index per tenant (high isolation, higher cost).
  - Shared Index with Routing: Route documents by tenant ID for balance and scalability.
  - Hybrid Approach: Small tenants share, large tenants get dedicated indices.
- Server-Side Query Enforcement
  - Wrap all queries in a filter like `tenantId == currentTenant`.
  - Apply this filter to search results, aggregations, and autocomplete suggestions.
- Cache Isolation
  - Include `tenantId` in all cache keys.
  - Disable global warmers or pre-fetchers that lack tenant context.
- Access Control and Security
  - Enforce field-level and document-level security to protect sensitive tenant data.
- Testing and Auditing
  - Regularly test queries for leakage between tenants.
  - Maintain logs of tenant-level queries and results for traceability.
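The first two steps (resolving tenant context on the server and stamping it onto every document) can be sketched as below. This is only an illustration under stated assumptions: the claim names `tenant_id` and `sub`, the `TenantContext` type, and the `id`/`routing`/`source` envelope are all hypothetical, and token verification is assumed to have happened before these functions are called.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class TenantContext:
    tenant_id: str
    user_id: str


def resolve_tenant(verified_claims: Mapping[str, Any]) -> TenantContext:
    """Build tenant context from server-verified token claims only.

    Request parameters, headers, or body fields are deliberately not consulted.
    """
    tenant_id = verified_claims.get("tenant_id")  # hypothetical claim name
    user_id = verified_claims.get("sub")
    if not tenant_id or not user_id:
        raise PermissionError("token carries no tenant identity")
    return TenantContext(str(tenant_id), str(user_id))


def to_indexed_doc(ctx: TenantContext, doc_id: str, fields: Mapping[str, Any]) -> dict[str, Any]:
    """Stamp tenant identity onto a document before it is indexed."""
    claimed = fields.get("tenantId")
    if claimed is not None and claimed != ctx.tenant_id:
        raise PermissionError("document claims a different tenant")
    return {
        # Composite ID: unique across tenants and easy to target on delete.
        "id": f"{ctx.tenant_id}:{doc_id}",
        "routing": ctx.tenant_id,
        "source": {**fields, "tenantId": ctx.tenant_id},
    }


ctx = resolve_tenant({"sub": "agent-7", "tenant_id": "alpha"})
doc = to_indexed_doc(ctx, "t-1001", {"status": "open", "text": "refund policy"})
```

Rejecting a mismatched `tenantId` at index time, rather than silently overwriting it, surfaces upstream bugs early and gives the audit log something concrete to record.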
Common Pitfalls or Trade-Offs
- Trusting Client Input: Never rely on client-sent tenant IDs. This is the most common source of leakage.
- Partial Filtering: Forgetting to apply tenant filters to aggregations, highlights, or suggestions leaks metadata.
- Cache Pollution: Shared cache keys can expose one tenant’s results to another.
- Over-Isolation: Too many small indices increase memory footprint and reduce cache efficiency.
- Complex Analyzers: Per-tenant analyzers in a shared index can break mappings and relevance scoring.
- Incomplete Testing: Without automated leakage tests, small edge cases can slip through unnoticed.
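An automated leakage test can be as simple as seeding two tenants with overlapping vocabulary and asserting that hits, facet counts, and suggestions never cross the boundary. The sketch below runs against a toy in-memory `search` function that stands in for a real search service; the data, field names, and function names are all hypothetical.

```python
from collections import Counter

DOCS = [
    {"id": "alpha:1", "tenantId": "alpha", "status": "open", "text": "refund policy question"},
    {"id": "alpha:2", "tenantId": "alpha", "status": "closed", "text": "shipping delay"},
    {"id": "beta:1", "tenantId": "beta", "status": "open", "text": "refund policy dispute"},
    {"id": "beta:2", "tenantId": "beta", "status": "open", "text": "refund requested"},
]


def search(tenant_id: str, term: str) -> dict:
    """Toy stand-in for the search service: every output is tenant-scoped."""
    scoped = [d for d in DOCS if d["tenantId"] == tenant_id]
    hits = [d for d in scoped if term in d["text"]]
    words = {w for d in scoped for w in d["text"].split()}
    return {
        "hits": hits,
        "facets": dict(Counter(d["status"] for d in hits)),
        "suggestions": sorted(w for w in words if w.startswith(term)),
    }


def assert_no_leakage(tenant_id: str, term: str) -> None:
    """Check hits, facet counts, and suggestions for cross-tenant data."""
    result = search(tenant_id, term)
    own = [d for d in DOCS if d["tenantId"] == tenant_id]
    other = [d for d in DOCS if d["tenantId"] != tenant_id]

    assert all(h["tenantId"] == tenant_id for h in result["hits"])

    # Facet counts must match what the tenant's own documents would produce.
    expected = Counter(d["status"] for d in own if term in d["text"])
    assert result["facets"] == dict(expected)

    # Words that exist only in other tenants must never be suggested.
    own_words = {w for d in own for w in d["text"].split()}
    foreign_only = {w for d in other for w in d["text"].split()} - own_words
    assert not foreign_only & set(result["suggestions"])


for tenant in ("alpha", "beta"):
    for term in ("refund", "re", "policy"):
        assert_no_leakage(tenant, term)
```

The prefix `re` is a useful probe here: Beta has the word `requested` and Alpha does not, so an unscoped suggester would fail the last assertion for Alpha.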
Table of Comparison
| Approach | Leakage Risk | Performance | Cost | Operational Complexity | Ideal Use Case |
|---|---|---|---|---|---|
| Silo per Tenant Indices | Very Low | Stable | High | High | Enterprise tenants needing strict compliance |
| Shared Index with Tenant Partition | Low (if filtered correctly) | Excellent | Low | Medium | Startups or multi-tenant SaaS platforms |
| Hybrid Approach | Low | Balanced | Balanced | Medium | Platforms with mixed tenant sizes |
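For the hybrid row, the placement decision can be sketched as a small function that sends large or compliance-bound tenants to a dedicated index and hashes everyone else into a fixed pool of shared indices. The threshold, pool size, and index naming scheme below are hypothetical values chosen for illustration, not recommendations.

```python
import hashlib

DEDICATED_DOC_THRESHOLD = 5_000_000  # hypothetical cutoff for "large" tenants
SHARED_POOL_SIZE = 8                 # hypothetical number of shared indices


def index_for(tenant_id: str, doc_count: int, requires_isolation: bool = False) -> str:
    """Pick the index that holds a tenant's documents under a hybrid layout."""
    if requires_isolation or doc_count >= DEDICATED_DOC_THRESHOLD:
        return f"tickets-{tenant_id}"
    # Use a stable hash: Python's built-in hash() is salted per process,
    # so it would assign tenants to different buckets after a restart.
    digest = hashlib.sha256(tenant_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % SHARED_POOL_SIZE
    return f"tickets-shared-{bucket}"


small = index_for("alpha", doc_count=12_000)
large = index_for("beta", doc_count=9_000_000)
regulated = index_for("gamma", doc_count=500, requires_isolation=True)
```

Queries against a shared index still need the tenant filter; the dedicated index reduces blast radius but does not replace server-side enforcement. Promoting a tenant from shared to dedicated also implies reindexing that tenant's documents.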
Further Learning
- Strengthen your foundations with Grokking System Design Fundamentals. Learn the core building blocks of indexing, partitioning, and scalability.
- Master real-world scenarios in Grokking the System Design Interview. Practice building secure, multi-tenant systems under interview conditions.
- Deep dive into scalable architectures in Grokking Scalable Systems for Interviews. Explore routing, caching, and performance trade-offs in distributed search systems.