How would you design multi‑tenant search indices without leakage?

Designing multi-tenant search means serving many customers from a shared engine while guaranteeing that no tenant can see or infer another tenant's data. The core idea is simple: every document carries a tenant identity, and every query runs inside a server-side sandbox that enforces the tenant filter on hits, aggregations, highlights, and caches. The hard part is making this airtight while keeping relevance, latency, and cost under control, which is exactly the trade-off a system design interview will probe.

Why It Matters

Multi-tenant search is a common requirement in SaaS and marketplace systems. If a user from Tenant A can view or infer data from Tenant B through search results, facets, typeahead, or analytics, you have a breach. Leakage can happen even when the top hits look fine, because counts, suggestions, or highlight snippets might cross tenant boundaries. Interviewers ask this to test your grasp of distributed systems, access control, and performance trade-offs under growth.

Real World Example

Imagine an online marketplace similar to Amazon where each seller is a tenant. A support agent at Seller Alpha searches tickets for "refund policy". The gateway authenticates the agent and sets the tenant context to Alpha. The search service rewrites the query by injecting tenantId == Alpha as a constant filter and routes the request using a hash of Alpha, which spreads tenants across shards and limits the query to the shard that holds Alpha's documents. The aggregation on ticket status and the suggestions for the refund phrase are computed inside the same filter, so counts and typeahead never reveal activity from Seller Beta. Query caches include the tenant in their key, so a cached response for Beta cannot be served to Alpha. If Seller Gamma pays for premium relevance with custom synonyms, Gamma either uses a dedicated index or joins a synonym family shared with compatible tenants.
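
The query rewrite in this example can be sketched as follows. This is a minimal illustration that assumes an Elasticsearch-style query body expressed as a plain dictionary; the function name `scope_to_tenant`, the helper `_contains_global_agg`, and the field names `text` and `status` are hypothetical. Because aggregations in that style of engine are computed over the documents matched by the query, placing the tenant term in the filter clause also scopes the facet counts. The one exception handled here is a `global` aggregation, which ignores the query scope and is therefore rejected.

```python
import copy


def _contains_global_agg(aggs: dict) -> bool:
    """Return True if any (possibly nested) aggregation is a 'global' one."""
    for spec in aggs.values():
        if not isinstance(spec, dict):
            continue
        if "global" in spec:
            return True
        nested = spec.get("aggs") or spec.get("aggregations") or {}
        if _contains_global_agg(nested):
            return True
    return False


def scope_to_tenant(user_query: dict, tenant_id: str) -> dict:
    """Wrap a client query body so hits and aggregations see one tenant only."""
    if not tenant_id:
        raise ValueError("tenant context is required")
    body = copy.deepcopy(user_query)  # never mutate the caller's request
    aggs = body.get("aggs") or body.get("aggregations") or {}
    if _contains_global_agg(aggs):
        raise ValueError("global aggregations bypass the tenant filter")
    inner = body.pop("query", {"match_all": {}})
    body["query"] = {
        "bool": {
            "must": [inner],
            # Non-scoring filter: restricts results without changing relevance.
            "filter": [{"term": {"tenantId": tenant_id}}],
        }
    }
    return body


# The agent's request, as the client sent it (no tenant information).
request = {
    "query": {"match": {"text": "refund policy"}},
    "aggs": {"by_status": {"terms": {"field": "status"}}},
}
scoped = scope_to_tenant(request, "alpha")
```

Suggesters and highlighters may need their own scoping mechanism depending on the engine, so a wrapper like this is only one part of the enforcement layer.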

How It Works (Step-by-Step)

  1. Capture Tenant Context at the Edge

    • Authenticate the user and resolve a canonical tenant ID in your gateway or API layer.
    • Never trust a tenant ID from the client side. Use server-controlled tokens or identity claims.
  2. Encode Tenant Identity at Index Time

    • Add a mandatory tenantId field to every document.
    • Compose the document ID as tenantId:docId to ensure uniqueness and simplify deletes.
  3. Partition and Route Data

    • Siloed Indices: One index per tenant (high isolation, higher cost).
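    • Shared Index with Routing: Route documents by tenant ID so each tenant's data lands on a predictable shard and queries avoid full fan-out. Watch for hot shards caused by very large tenants.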
    • Hybrid Approach: Small tenants share, large tenants get dedicated indices.
  4. Server-Side Query Enforcement

    • Wrap all queries in a filter like tenantId == currentTenant.
    • Apply this filter to search results, aggregations, and autocomplete suggestions.
  5. Cache Isolation

    • Include tenantId in all cache keys.
    • Disable global warmers or pre-fetchers that lack tenant context.
  6. Access Control and Security

    • Enforce field-level and document-level security to protect sensitive tenant data.
  7. Testing and Auditing

    • Regularly test queries for leakage between tenants.
    • Maintain logs of tenant-level queries and results for traceability.
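
Steps 1, 2, and 5 above can be sketched in a few small functions. This is an illustration under stated assumptions, not a definitive implementation: the claim name `tenant_id`, the `_id` and `_source` keys, the `search:` cache prefix, and all function names are hypothetical, and token verification is assumed to have happened already in the gateway.

```python
import hashlib
import json
from typing import Optional


def resolve_tenant(verified_claims: dict,
                   client_supplied_tenant: Optional[str] = None) -> str:
    """Step 1: the tenant comes only from server-verified identity claims."""
    tenant = verified_claims.get("tenant_id")
    if not tenant:
        raise PermissionError("no tenant in verified claims")
    # A client-sent tenant ID is never used; a mismatch is treated as an attack.
    if client_supplied_tenant is not None and client_supplied_tenant != tenant:
        raise PermissionError("client-supplied tenant does not match identity")
    return tenant


def prepare_document(tenant_id: str, doc_id: str, fields: dict) -> dict:
    """Step 2: stamp tenantId and build a tenantId:docId document ID."""
    if ":" in tenant_id:
        # Assumption: tenant IDs never contain the separator, so IDs stay unique.
        raise ValueError("tenant ID must not contain ':'")
    source = dict(fields)
    source["tenantId"] = tenant_id  # overwrite any value supplied in fields
    return {"_id": f"{tenant_id}:{doc_id}", "_source": source}


def cache_key(tenant_id: str, query_body: dict) -> str:
    """Step 5: the tenant is part of the key, so entries never cross tenants."""
    canonical = json.dumps(query_body, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"search:{tenant_id}:{digest}"


tenant = resolve_tenant({"tenant_id": "alpha", "sub": "agent-7"})
doc = prepare_document(tenant, "42", {"text": "refund policy", "tenantId": "beta"})
key = cache_key(tenant, {"query": {"match": {"text": "refund policy"}}})
```

Note that `prepare_document` overwrites a `tenantId` that arrives inside the payload, so a tenant cannot write a document into another tenant's space by setting the field themselves.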

Common Pitfalls or Trade-Offs

  • Trusting Client Input: Never rely on client-sent tenant IDs. This is the most common source of leakage.

  • Partial Filtering: Forgetting to apply tenant filters to aggregations, highlights, or suggestions leaks metadata.

  • Cache Pollution: Shared cache keys can expose one tenant’s results to another.

  • Over-Isolation: Too many small indices increase memory footprint and reduce cache efficiency.

  • Complex Analyzers: Per-tenant analyzers in a shared index can break mappings and relevance scoring.

  • Incomplete Testing: Without automated leakage tests, small edge cases can slip through unnoticed.
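
An automated leakage test can be sketched with canary documents: each tenant gets a unique marker string, and the test fails if any marker or count from another tenant shows up in a response. The example below runs against a toy in-memory corpus rather than a real search engine; `DOCS`, `CANARIES`, `search`, `leaky_search`, and `find_leaks` are all hypothetical names. `leaky_search` deliberately reproduces the partial-filtering pitfall by computing facets without the tenant filter.

```python
import json
from collections import Counter

DOCS = [
    {"tenantId": "alpha", "status": "open", "text": "refund policy canary-alpha"},
    {"tenantId": "alpha", "status": "closed", "text": "refund issued"},
    {"tenantId": "beta", "status": "open", "text": "refund policy canary-beta"},
]
CANARIES = {"alpha": "canary-alpha", "beta": "canary-beta"}


def search(tenant_id: str, term: str) -> dict:
    """Correct behavior: hits and facets both come from the tenant's documents."""
    hits = [d for d in DOCS if d["tenantId"] == tenant_id and term in d["text"]]
    return {"hits": hits, "facets": dict(Counter(d["status"] for d in hits))}


def leaky_search(tenant_id: str, term: str) -> dict:
    """Buggy behavior: hits are filtered, but facets count every tenant."""
    hits = [d for d in DOCS if d["tenantId"] == tenant_id and term in d["text"]]
    everyone = [d for d in DOCS if term in d["text"]]
    return {"hits": hits, "facets": dict(Counter(d["status"] for d in everyone))}


def find_leaks(tenant_id: str, response: dict) -> list:
    """Describe anything in the response that belongs to another tenant."""
    leaks = []
    for hit in response["hits"]:
        if hit["tenantId"] != tenant_id:
            leaks.append(f"foreign hit from {hit['tenantId']}")
    blob = json.dumps(response)
    for owner, token in CANARIES.items():
        if owner != tenant_id and token in blob:
            leaks.append(f"canary of {owner} visible")
    own_total = sum(1 for d in DOCS if d["tenantId"] == tenant_id)
    if sum(response["facets"].values()) > own_total:
        leaks.append("facet counts exceed the tenant's own documents")
    return leaks


clean = find_leaks("alpha", search("alpha", "refund"))
leaky = find_leaks("alpha", leaky_search("alpha", "refund"))
```

Here `clean` is an empty list, while `leaky` reports the facet problem even though every returned hit belongs to Alpha, which is the kind of metadata leak that inspecting the top hits alone would miss.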

Table of Comparison

Approach                           | Leakage Risk                | Performance | Cost     | Operational Complexity | Ideal Use Case
Silo per Tenant Indices            | Very Low                    | Stable      | High     | High                   | Enterprise tenants needing strict compliance
Shared Index with Tenant Partition | Low (if filtered correctly) | Excellent   | Low      | Medium                 | Startups or multi-tenant SaaS platforms
Hybrid Approach                    | Low                         | Balanced    | Balanced | Medium                 | Platforms with mixed tenant sizes
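
The hybrid row can be made concrete with a small placement function: large tenants get a dedicated index, and small tenants are hashed into one of a few shared indices and routed by tenant ID. This is a sketch under assumptions; the index naming scheme, the threshold of one million documents, the pool count, and the names `Placement` and `place_tenant` are illustrative choices, not values from any particular engine.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Placement:
    index: str               # index that stores the tenant's documents
    routing: Optional[str]   # routing key for shared indices, None if dedicated


def place_tenant(tenant_id: str, doc_count: int,
                 dedicated_threshold: int = 1_000_000,
                 shared_pools: int = 4) -> Placement:
    """Pick a dedicated index for large tenants, a shared pool otherwise."""
    if doc_count >= dedicated_threshold:
        return Placement(index=f"tickets-{tenant_id}", routing=None)
    # hashlib gives a hash that is stable across processes and restarts;
    # Python's built-in hash() is randomized per process for strings.
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    pool = int.from_bytes(digest[:8], "big") % shared_pools
    return Placement(index=f"tickets-shared-{pool}", routing=tenant_id)


large = place_tenant("gamma", 5_000_000)
small = place_tenant("alpha", 12_000)
```

Even in a dedicated index, the server-side tenant filter is worth keeping, so that a misrouted request fails closed instead of returning another tenant's data.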

TAGS
System Design Interview
System Design Fundamentals
CONTRIBUTOR
Design Gurus Team
Copyright © 2026 Design Gurus, LLC. All rights reserved.