How would you design multi‑tenant search indices without leakage?

Designing multi-tenant search means serving many customers from a shared engine while guaranteeing zero data leakage. The core idea is simple: every document carries a tenant identity, and every query runs inside a server-side sandbox that enforces tenant filters on hits, aggregations, highlights, and caches. The tricky part is making this airtight while keeping relevance, latency, and cost under control, which is exactly the balance a system design interview expects you to reason about.

Why It Matters

Multi-tenant search is a common requirement in SaaS and marketplace systems. If a user from Tenant A can view or infer data from Tenant B through search results, facets, typeahead, or analytics, you have a breach. Leakage can happen even when the top hits look fine, because counts, suggestions, or highlight snippets might cross tenant boundaries. Interviewers ask this question to test your grasp of distributed systems, access control, and performance trade-offs under growth.

Real World Example

Imagine an online marketplace similar to Amazon where each seller is a tenant. A support agent at Seller Alpha searches tickets for "refund policy". The gateway authenticates the agent and sets the tenant context to Alpha. The search service rewrites the query by injecting tenantId equals Alpha as a constant filter, then routes the request by a hash of Alpha's tenant ID, which spreads tenants across shards and sends the query only to the shard that holds Alpha's documents. The aggregation on ticket status and the suggestions for the refund phrase are computed inside the same filter, so counts and typeahead never reveal activity from Seller Beta. Query caches include the tenant in their key, so a cached response for Beta cannot be served to Alpha. If Seller Gamma pays for premium relevance with custom synonyms, Gamma either uses a dedicated index or joins a synonym family with compatible tenants.
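
The query rewrite in this example can be sketched as follows. The article does not name a search engine, so the Elasticsearch-style query body is an assumption, and the `text` and `status` field names and the `scope_to_tenant` helper are hypothetical. In that style of engine, aggregations are computed over the documents the query matches, so a tenant filter on the query also scopes the counts.

```python
from typing import Any, Dict, Optional


def scope_to_tenant(
    tenant_id: str,
    user_query: Dict[str, Any],
    aggs: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Wrap a caller-supplied query so it can only match one tenant's documents."""
    body: Dict[str, Any] = {
        "query": {
            "bool": {
                # The user's query still drives relevance scoring.
                "must": [user_query],
                # The tenant filter is added server-side and does not affect scores.
                "filter": [{"term": {"tenantId": tenant_id}}],
            }
        }
    }
    if aggs:
        # Aggregations live in the same request, so they see only filtered documents.
        body["aggs"] = aggs
    return body


# Seller Alpha's agent searches tickets; tenant_id comes from the gateway, not the client.
body = scope_to_tenant(
    "alpha",
    {"match": {"text": "refund policy"}},
    aggs={"by_status": {"terms": {"field": "status"}}},
)
```

One caveat if the engine is Elasticsearch: a `global` aggregation ignores the query scope, so a rewrite layer like this would likely also need to reject or strip such aggregations from caller input.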

How It Works (Step-by-Step)

  1. Capture Tenant Context at the Edge

    • Authenticate the user and resolve a canonical tenant ID in your gateway or API layer.
    • Never trust a tenant ID from the client side. Use server-controlled tokens or identity claims.
  2. Encode Tenant Identity at Index Time

    • Add a mandatory tenantId field to every document.
    • Compose the document ID as tenantId:docId to ensure uniqueness and simplify deletes.
  3. Partition and Route Data

    • Siloed Indices: One index per tenant (high isolation, higher cost).
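    • Shared Index with Routing: Route documents by tenant ID so each tenant's data stays together and queries touch fewer shards. Watch for hot shards when one tenant is much larger than the rest.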
    • Hybrid Approach: Small tenants share, large tenants get dedicated indices.
  4. Server-Side Query Enforcement

    • Wrap all queries in a filter like tenantId == currentTenant.
    • Apply this filter to search results, aggregations, highlights, and autocomplete suggestions.
  5. Cache Isolation

    • Include tenantId in all cache keys.
    • Disable global warmers or pre-fetchers that lack tenant context.
  6. Access Control and Security

    • Enforce document-level and field-level security so that, within a tenant, users see only the documents and fields their role allows.
  7. Testing and Auditing

    • Regularly test queries for leakage between tenants.
    • Maintain logs of tenant-level queries and results for traceability.
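
Steps 1, 2, and 5 above can be sketched in a few small helpers. This is a minimal illustration, not a definitive implementation: the `tenant_id` claim name and all three function names are hypothetical, and token verification is assumed to have happened upstream in the gateway.

```python
import hashlib
import json


def resolve_tenant(verified_claims: dict) -> str:
    """Step 1: read the tenant only from server-verified identity claims."""
    tenant_id = verified_claims.get("tenant_id")
    if not tenant_id:
        raise PermissionError("no tenant in verified claims")
    return tenant_id


def make_doc_id(tenant_id: str, doc_id: str) -> str:
    """Step 2: compose the document ID as tenantId:docId."""
    if ":" in tenant_id:
        # Keeps the prefix unambiguous when IDs are later split or range-deleted.
        raise ValueError("tenant ID must not contain ':'")
    return f"{tenant_id}:{doc_id}"


def cache_key(tenant_id: str, query_body: dict) -> str:
    """Step 5: build a cache key that always includes the tenant."""
    # Sorting keys makes logically equal queries produce the same key.
    canonical = json.dumps(query_body, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"{tenant_id}:{digest}"


# Request parameters are never consulted; only the verified claims are.
tenant = resolve_tenant({"sub": "agent-17", "tenant_id": "alpha"})
doc_id = make_doc_id(tenant, "42")
query = {"match": {"text": "refund policy"}}
key_alpha = cache_key("alpha", query)
key_beta = cache_key("beta", query)
```

Because the tenant is part of the key, the same query text from two tenants can never resolve to the same cache entry.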

Common Pitfalls or Trade-Offs

  • Trusting Client Input: Never rely on client-sent tenant IDs. A tenant ID taken from a request parameter or body is a common source of leakage.

  • Partial Filtering: Forgetting to apply tenant filters to aggregations, highlights, or suggestions leaks metadata.

  • Cache Pollution: Shared cache keys can expose one tenant’s results to another.

  • Over-Isolation: Too many small indices increase memory footprint and reduce cache efficiency.

  • Complex Analyzers: Per-tenant analyzers in a shared index can break mappings and relevance scoring.

  • Incomplete Testing: Without automated leakage tests, small edge cases can slip through unnoticed.
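
An automated leakage check can be as small as the sketch below. It assumes an Elasticsearch-style response shape (`hits.hits[]._source.tenantId`), and the `find_leaks` helper and the sample responses are hypothetical. A hit with no `tenantId` at all is treated as a leak, since its owner cannot be proven.

```python
def find_leaks(tenant_id: str, response: dict) -> list:
    """Return the IDs of hits that do not belong to the calling tenant."""
    leaks = []
    for hit in response.get("hits", {}).get("hits", []):
        owner = hit.get("_source", {}).get("tenantId")
        if owner != tenant_id:
            leaks.append(hit.get("_id"))
    return leaks


# Hand-written sample responses standing in for real search output.
clean = {"hits": {"hits": [{"_id": "alpha:1", "_source": {"tenantId": "alpha"}}]}}
leaky = {
    "hits": {
        "hits": [
            {"_id": "alpha:1", "_source": {"tenantId": "alpha"}},
            {"_id": "beta:7", "_source": {"tenantId": "beta"}},
        ]
    }
}

clean_leaks = find_leaks("alpha", clean)
leaky_leaks = find_leaks("alpha", leaky)
```

A check like this covers hits only. Aggregation counts, suggestions, and highlights need their own assertions, for example by seeding a second tenant with a distinctive term and asserting that it never appears in the first tenant's facets or typeahead.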

Table of Comparison

| Approach | Leakage Risk | Performance | Cost | Operational Complexity | Ideal Use Case |
|---|---|---|---|---|---|
| Silo per Tenant Indices | Very Low | Stable | High | High | Enterprise tenants needing strict compliance |
| Shared Index with Tenant Partition | Low (if filtered correctly) | Excellent | Low | Medium | Startups or multi-tenant SaaS platforms |
| Hybrid Approach | Low | Balanced | Balanced | Medium | Platforms with mixed tenant sizes |
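
For the hybrid approach, the routing decision might look like the sketch below. The index names, the `DEDICATED_TENANTS` set, and the `index_target` helper are all hypothetical; a real system would more likely load the tenant-to-index mapping from configuration.

```python
# Hypothetical: tenants large enough (or regulated enough) to get their own index.
DEDICATED_TENANTS = {"gamma"}
SHARED_INDEX = "tickets-shared"


def index_target(tenant_id: str) -> dict:
    """Pick the index and routing value for a tenant under the hybrid approach."""
    if tenant_id in DEDICATED_TENANTS:
        # A dedicated index holds one tenant, so no routing value is needed.
        return {"index": f"tickets-{tenant_id}", "routing": None}
    # Small tenants share one index and are routed by tenant ID.
    return {"index": SHARED_INDEX, "routing": tenant_id}


gamma_target = index_target("gamma")
alpha_target = index_target("alpha")
```

Keeping the tenant filter on queries even against a dedicated index is a cheap form of defense in depth, since it protects against a misconfigured mapping sending a tenant to the wrong index.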

TAGS
System Design Interview
System Design Fundamentals
CONTRIBUTOR
Design Gurus Team
Copyright © 2025 Design Gurus, LLC. All rights reserved.