01. Why Search Is Its Own Thing
Most candidates can name "Elasticsearch" and stop there. The interview goes deeper: how does the inverted index actually work? What's the query lifecycle from text input to ranked results? How does ranking happen, and why does it matter more than the database choice? Where does search sit relative to your primary store, and why isn't it the primary store itself?
The depth here lives in three places. First, the data structure: the inverted index is the load-bearing concept, and walking through how it answers a query is the difference between vocabulary and understanding. Second, ranking: returning matches is the easy part; ranking them well is the entire game. Third, the architectural pattern: production search is almost always secondary to a primary store, synced through change data capture, because search engines aren't reliable systems of record.
This page covers all three. Algorithms and tools are here too, but they set up the architectural framing. The senior signal is recognizing that search is its own database category but rarely the primary store, and that ranking quality matters more than the engine name.
The Senior Move
The senior signal in search interviews isn't naming Elasticsearch. It's recognizing that search engines are caches with rich query semantics, not systems of record. Naming the primary-store-plus-search-index pattern, with change data capture between them, is what separates senior candidates from "I'd add Elasticsearch" candidates.
02. What Search Actually Does (and Why It's Hard)
Search takes a text query ("cheap flights to tokyo") and returns the documents most likely to satisfy the user's intent. That's the simple version. The real version: search has to handle multiple things at once, each of which is its own engineering problem.
- Match. Find documents that contain the query terms. This is what the inverted index solves: instead of scanning every document for the words, the index goes from word to documents directly.
- Rank. Of the matched documents, return the most relevant ones first. This is where TF-IDF, BM25, learning-to-rank, and modern hybrid scoring live.
- Tolerate variation. Users misspell. They use synonyms. They mix languages. They expect plurals to match singulars. Production search handles all of this through tokenization, stemming, and analyzers.
- Stay fast. Search has to return results in tens of milliseconds even over a corpus of millions of documents. The data structure choices and the query-time tradeoffs all serve this constraint.
What search does not do well, despite what some prep material implies:
- Serve as a primary database. Search engines have weaker durability guarantees and consistency models than primary stores. If you lose your Elasticsearch cluster, you should be able to rebuild it from your primary store, not the other way around.
- Handle transactional writes. Search indexes are eventually consistent with their source data. Writes propagate through indexing pipelines that take seconds to minutes. Don't expect read-your-writes from a search index.
- Replace structured queries. "Find users by id" or "join orders to customers" belongs in your relational store. Search shines for full-text, faceted, and ranking queries; using it for structured queries is a mismatch.
03. The Inverted Index, Walked Through
Most candidates can say "inverted index" but few can explain how it actually works. This section walks through the data structure with a concrete example, because the interview probe is "explain how Elasticsearch matches a query" and naming the structure isn't enough.
The forward index (the wrong way)
A forward index maps documents to their content. Document 1 contains "cheap flights to tokyo." Document 2 contains "tokyo restaurants." Document 3 contains "cheap hotels paris." To find documents containing "tokyo," a forward index requires scanning every document and checking. Linear in the corpus size. Useless at scale.
The inverted index (the right way)
An inverted index flips the relationship: it maps each word to the documents containing it. To find documents containing "tokyo," you look up "tokyo" in the index and get the list of matching documents directly: constant or logarithmic time per term, instead of a scan that's linear in the corpus.
For the three documents above, the inverted index looks like:
Inverted Index (simplified)

cheap      → [doc1, doc3]
flight     → [doc1]
tokyo      → [doc1, doc2]
restaurant → [doc2]
hotel      → [doc3]
paris      → [doc3]

("to" is dropped as a stop word; "flights," "restaurants," and "hotels" are reduced by stemming.)
To find documents matching "cheap tokyo," you look up both terms and intersect their posting lists: [doc1, doc3] ∩ [doc1, doc2] = [doc1]. One document matches both terms.
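A toy version makes the mechanics concrete. This sketch is a minimal Python model of the idea: it uses sets in place of the sorted, compressed posting lists a real engine maintains, and whitespace splitting in place of real tokenization.

```python
from collections import defaultdict

def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of document ids containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def and_query(index: dict[str, set[str]], terms: list[str]) -> set[str]:
    """AND semantics: intersect the posting set of every query term."""
    postings = [index.get(t, set()) for t in terms]
    return set.intersection(*postings) if postings else set()

docs = {
    "doc1": "cheap flights to tokyo",
    "doc2": "tokyo restaurants",
    "doc3": "cheap hotels paris",
}
index = build_inverted_index(docs)
print(and_query(index, ["cheap", "tokyo"]))  # {'doc1'}
```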
What real production indexes add
The simplified version above is the core idea. Production indexes add several layers:
- Term frequency per document. Each posting includes how many times the term appears in that document, used by ranking algorithms to score the match. "Tokyo" appearing five times in a document is stronger evidence of relevance than appearing once.
- Position information. Each posting can include where in the document the term appears. This enables phrase queries ("new york" must appear as adjacent words) and proximity scoring.
- Field-level indexing. Documents have multiple fields (title, body, tags). Each field gets its own index so queries can target specific fields with different weights.
- Compression. Posting lists for common terms can be huge ("the" appears in nearly every document). Production indexes compress them aggressively using techniques like delta encoding and variable-byte encoding. The compression matters: a single uncompressed inverted index can be larger than the source corpus.
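The delta-plus-variable-byte combination from the last bullet is compact enough to sketch. Gaps between sorted doc ids are small numbers, and variable-byte encoding spends one byte on a small number instead of four; this is a minimal illustration, not the block-oriented codecs real engines use.

```python
def delta_encode(doc_ids: list[int]) -> list[int]:
    """Replace sorted doc ids with the gaps between them."""
    return [doc_ids[0]] + [b - a for a, b in zip(doc_ids, doc_ids[1:])]

def varbyte_encode(numbers: list[int]) -> bytes:
    """7 payload bits per byte; the high bit marks the final byte of a number."""
    out = bytearray()
    for n in numbers:
        chunks = []
        while True:
            chunks.append(n & 0x7F)
            n >>= 7
            if n == 0:
                break
        chunks[0] |= 0x80              # terminate on the lowest-order byte
        out.extend(reversed(chunks))
    return bytes(out)

posting = [1000, 1003, 1007, 1100]
encoded = varbyte_encode(delta_encode(posting))
print(len(encoded))  # 5 bytes, versus 16 for four 32-bit ints
```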
Tokenization: the surprisingly hard part
Before you can build the index, you have to break documents into terms. This is tokenization, and it's surprisingly hard.
- Word boundaries. English has spaces. Chinese and Japanese don't. Tokenization for CJK languages requires either dictionary-based or statistical segmentation, both of which are imperfect.
- Case normalization. "Tokyo" and "tokyo" should match. Most indexes lowercase everything at index time and again at query time.
- Stemming and lemmatization. "Running," "runs," and "ran" should arguably all match a query for "run." Stemmers reduce words to their root form (often imperfectly).
- Stop words. "The," "a," "and" appear in nearly every document and rarely affect relevance. Most indexes filter them out at index time.
- Synonyms. "NYC" and "New York" should match. Synonym handling is its own configuration layer, and it can be applied at index time, query time, or both.
The interview move: name tokenization explicitly when describing the indexing pipeline. "We tokenize the source text, lowercase, remove stop words, apply stemming, then build the inverted index." That sentence signals that you understand the pipeline rather than treating Elasticsearch as a black box.
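A minimal version of that pipeline, with a deliberately crude suffix-stripper standing in for a real Porter or Snowball stemmer and a tiny stop-word list:

```python
STOP_WORDS = {"the", "a", "an", "and", "of", "to"}

def naive_stem(term: str) -> str:
    """Crude suffix stripping; real analyzers use Porter/Snowball stemmers."""
    for suffix in ("ing", "es", "s"):
        if term.endswith(suffix) and len(term) > len(suffix) + 2:
            return term[: -len(suffix)]
    return term

def tokenize(text: str) -> list[str]:
    """Lowercase, split, drop stop words, stem. Queries and documents must
    run through the identical pipeline, or matches silently fail."""
    return [naive_stem(t) for t in text.lower().split() if t not in STOP_WORDS]

print(tokenize("Cheap Flights to Tokyo"))  # ['cheap', 'flight', 'tokyo']
```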
04. The Query Lifecycle
What happens between a user typing "cheap flights to tokyo" and seeing ranked results? Five stages, each with its own concerns. The diagram below shows the flow; the prose walks through each stage.
The Search Query Lifecycle
The query lifecycle. Tokenize the input, look up posting lists for each term, intersect to find candidates, rank by relevance, return top-K. Each stage has its own engineering challenges.
Stage 1: Tokenize
The query goes through the same tokenization the documents went through at index time. Lowercase, stem, drop stop words. "Cheap flights to tokyo" becomes [cheap, flight, tokyo]. The query and documents have to be tokenized identically; mismatches here are a common bug.
Stage 2: Index lookup
Each token's posting list is fetched from the inverted index. For "cheap" you get the list of documents containing that term. Same for "flight" and "tokyo." This is the part that the inverted index makes fast: instead of scanning the corpus, you do three small lookups.
Stage 3: Intersect
For an AND query, intersect the posting lists: only documents in all three lists are candidates. Real query parsers handle AND, OR, NOT, phrase queries, and field-specific queries differently. The intersection is efficient because posting lists are sorted (usually by document id): a merge-style pass runs in O(n+m), and skip pointers let the engine jump over long runs of non-matching ids.
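The merge-style intersection is worth being able to sketch. A minimal two-pointer pass over two sorted doc-id lists (real engines layer skip pointers on top of this):

```python
def intersect(a: list[int], b: list[int]) -> list[int]:
    """Intersect two sorted posting lists in O(len(a) + len(b))."""
    i = j = 0
    out: list[int] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

print(intersect([1, 3, 7, 42], [2, 3, 40, 42]))  # [3, 42]
```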
Stage 4: Rank
The intersected list is the candidate set. Now rank it by relevance. This is where TF-IDF, BM25, and modern signals (recency, popularity, click-through history, vector similarity) combine into a final score per document.
Stage 5: Top-K
Return the top K documents (often 10 or 20). The application can page deeper into the ranked list, but the engine materializes only a fixed window per request. Pagination beyond the first few pages is more expensive: the engine has to score and sort more candidates to give you a stable ordering.
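Selecting the top K without sorting every candidate is a standard heap trick: O(n log k) rather than O(n log n). A minimal sketch:

```python
import heapq

def top_k(scored: dict[str, float], k: int = 10) -> list[tuple[str, float]]:
    """Return the k highest-scoring (doc_id, score) pairs."""
    return heapq.nlargest(k, scored.items(), key=lambda item: item[1])

scores = {"doc1": 2.4, "doc2": 0.7, "doc3": 1.9, "doc4": 3.1}
print(top_k(scores, k=2))  # [('doc4', 3.1), ('doc1', 2.4)]
```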
The interview move: walk through the lifecycle when asked "how does search work?" Don't just say "inverted index." Say "tokenize, look up, intersect, rank, return top-K." Five words that signal you understand the flow.
05. Ranking: Where Search Differentiates
Returning matches is the easy part. Ranking them well is the entire game. Two search engines can match the same documents but rank them differently; users will prefer the one that ranks better. The depth probe at staff level is "how would you improve ranking?" — a question with no single right answer but several recognizable approaches.
TF-IDF
The 1990s default
Term frequency times inverse document frequency. Common terms (appear in many documents) get downweighted; rare terms get boosted. A document containing "tokyo" five times scores higher than one containing it once, but a document with rare term "geisha" scores higher still per occurrence.
When to use: Baseline understanding. Almost no production system uses pure TF-IDF anymore, but the intuition (rare terms weigh more, more occurrences weigh more) underlies everything that came after.
BM25
The modern default
A refinement of TF-IDF with diminishing returns on term frequency (the 100th occurrence of a term doesn't help much) and document length normalization (long documents shouldn't dominate just because they have more words). The standard scoring algorithm in Elasticsearch, OpenSearch, Solr, and most production engines.
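The formula is compact enough to write out. A minimal scorer using the common default parameters (k1 = 1.2, b = 0.75) and the Lucene-style IDF; a sketch for intuition, not a tuned implementation:

```python
import math

def bm25_score(query_terms: list[str], doc_tf: dict[str, int], doc_len: int,
               avg_len: float, n_docs: int, doc_freq: dict[str, int],
               k1: float = 1.2, b: float = 0.75) -> float:
    """BM25 for one document. doc_tf: term counts in this document;
    doc_freq: how many documents in the corpus contain each term."""
    score = 0.0
    for term in query_terms:
        tf = doc_tf.get(term, 0)
        if tf == 0:
            continue
        df = doc_freq[term]
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)  # rare terms weigh more
        # Saturating tf (diminishing returns) with document-length normalization.
        norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_len))
        score += idf * norm
    return score
```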
When to use: The default for keyword search. If someone asks "what's the scoring algorithm?", the answer is BM25 unless something specific demands otherwise.
Learning to Rank
Production augmentation
Use machine learning to combine BM25 with other signals: recency, popularity, user click-through history, document quality, business rules. The model learns which combinations of signals predict which documents users actually clicked. Common at scale (Google, Amazon, LinkedIn) but operationally heavy.
When to use: You have click-through data and the resources to train and operate a ranking model. For most products, BM25 plus a few hand-tuned boosts is a better starting point.
Beyond pure relevance: business signals
Production ranking blends pure relevance scoring with business signals. Common ones:
- Recency. News search prefers recent articles even if older ones are technically more relevant. Apply a recency boost or penalize old documents.
- Popularity. Documents that many users have engaged with get boosted. Click-through rate, dwell time, share count.
- Business rules. E-commerce promotes products from preferred sellers. Job search promotes paid listings. The system has to surface these without making relevance suffer too much.
- Personalization. Same query from different users returns different results, weighted by their history and preferences.
The interview move on ranking: when asked "how would you improve search quality?", the strong response goes beyond the algorithm. "We'd start with BM25 as the relevance baseline, layer in recency for time-sensitive queries, add popularity from click-through data, and over time train a learning-to-rank model on top. The algorithm change is one knob; the signals you blend in are several knobs."
Returning matches is the easy part. Ranking is where search systems differentiate, and the gap between "BM25 only" and "BM25 plus signals" is what users feel.
06. The Primary-Store-Plus-Search-Index Pattern
This is the architectural pattern most candidates miss. Production search isn't a standalone system; it's a secondary index over a primary store, kept in sync through change data capture. Naming this pattern explicitly is what distinguishes senior candidates.
Primary Store + Search Index, Synced via CDC
Production search lives next to the primary store, not as a replacement for it. Change data capture (CDC) streams writes from the primary into the search index. The index is rebuildable from the primary, never the other way around.
Why this pattern, and not just "use Elasticsearch"
Three reasons production systems split primary store from search index:
- Durability and consistency. Primary stores (Postgres, DynamoDB) offer ACID transactions, point-in-time recovery, and strong consistency guarantees. Search engines don't, or do so weakly. If you lose your search cluster, you can rebuild it from the primary; if you lose your primary, your data is gone. The primary is the system of record.
- Different data shapes. The primary stores normalized records optimized for transactional updates. The search index stores denormalized documents optimized for query-time relevance. The same logical entity (a product, a job posting) lives in both stores in different shapes.
- Different operational characteristics. Primary stores favor write throughput and consistency. Search indexes favor read throughput and scoring. Mixing the two workloads on one system gives you the worst of both.
How the sync actually works
Change data capture (CDC) is the bridge. Every write to the primary store generates a change event; the change event flows through a streaming pipeline; the search index applies the change. Common implementations:
- Database transaction logs. Tools like Debezium read Postgres's WAL or MySQL's binlog directly, producing a stream of changes. The application doesn't have to do anything; the CDC tool watches the database.
- Outbox pattern. The application writes the change to an outbox table in the same transaction as the data change. A separate process reads the outbox and publishes to a queue. The message queues deep-dive covers this pattern; a sketch follows this list.
- Dual-write at the application. The application explicitly writes to both stores. Simple but error-prone: any failure between the two writes leaves them inconsistent. Avoid unless the alternative is impossible.
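The outbox write is worth sketching because the whole trick is a single transaction. A minimal version, assuming psycopg 3 and hypothetical products and outbox tables:

```python
import json
import psycopg  # assuming psycopg 3; any driver with transactions works

def create_product(conn: psycopg.Connection, product: dict) -> None:
    """Write the entity and its change event atomically: the outbox row
    exists if and only if the data change committed. A separate relay
    process reads the outbox and publishes events to the queue."""
    with conn.transaction():
        conn.execute(
            "INSERT INTO products (id, title, price) VALUES (%s, %s, %s)",
            (product["id"], product["title"], product["price"]),
        )
        conn.execute(
            "INSERT INTO outbox (aggregate_id, event_type, payload) "
            "VALUES (%s, %s, %s)",
            (product["id"], "product_created", json.dumps(product)),
        )
```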
The lag between primary write and search index visibility is usually seconds. Users searching for content they just created may not see it for a moment. This is the "eventual consistency" trade you're making for the secondary-index pattern.
The Interview Move
"How would you add search to this system?" The strong response: "We'd add Elasticsearch as a secondary index, sync from Postgres via CDC using Debezium or the outbox pattern. Postgres stays the source of truth; Elasticsearch is rebuildable. We accept eventual consistency on search results — typically a few seconds of lag — in exchange for the query power Elasticsearch provides." That sentence covers the architecture, the sync mechanism, and the consistency tradeoff in three sentences.
07. The Hybrid Keyword + Vector Shift
The 2026 default for new search systems is hybrid: keyword search (BM25 over an inverted index) combined with vector search (semantic similarity over embeddings). Each technique has weaknesses the other covers, and the combination is meaningfully better than either alone.
Where keyword search struggles
Keyword search matches exact terms. It fails on:
- Synonyms. A query for "couch" doesn't match a document about "sofa" without explicit synonym configuration.
- Conceptual queries. "Things to do with kids in tokyo" has very few terms that match document text directly. Documents about "family activities in japan" should rank highly but won't with pure keyword matching.
- Misspellings. "Tokio" doesn't match "Tokyo" without fuzzy matching, which has its own tradeoffs.
- Cross-language queries. A query in English doesn't match content in Japanese without translation.
Where vector search struggles
Vector search embeds queries and documents in a high-dimensional space and returns nearest neighbors. It fails on:
- Exact match queries. Looking for a specific product code "SKU-12345" or a specific person's name. Vector search may surface conceptually similar items instead of the exact match.
- Filtering by structured fields. "Tokyo restaurants under $50 with vegan options." Vector search can find similar restaurants but isn't great at hard filters; you usually combine with structured filtering.
- Explainability. Why did this document rank highly? With BM25, you can point to specific terms. With vector similarity, the answer is "because the embeddings are close in 768-dimensional space." That's not much help when you're debugging relevance.
- Cost and latency. Vector search is more expensive per query. Storing and searching embeddings has real infrastructure cost. The vector databases deep-dive covers the operational concerns.
How hybrid actually works
Run both searches in parallel. Get a candidate set from each. Combine the scores. Return the merged ranking. Three common combination approaches:
- Reciprocal rank fusion (RRF). Each document gets a score based on its rank in each list (1/(k + rank), summed over lists, with a small smoothing constant k). Simple, effectively parameter-free, surprisingly effective. The default for many production hybrid systems in 2026. A minimal implementation follows this list.
- Linear combination. Final score = α × BM25 score + β × vector similarity. Tunable but requires the two scores to be comparable, which usually requires normalization.
- Two-stage retrieval. Use vector search to retrieve a broad candidate set (top 1000), then re-rank with a more expensive model that uses BM25 plus other signals. Common in modern RAG and recommendation systems.
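RRF fits in a few lines, which is part of its appeal. A minimal sketch using the commonly cited smoothing constant k = 60:

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several rankings: each document scores sum(1 / (k + rank))
    across the lists it appears in, then everything is re-sorted."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

keyword_hits = ["doc1", "doc4", "doc2"]   # BM25 ranking
semantic_hits = ["doc4", "doc3", "doc1"]  # vector ranking
print(rrf([keyword_hits, semantic_hits]))  # doc4 and doc1 rise to the top
```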
The 2026 honest answer
For most products in 2026, the right starting point is hybrid: BM25 for exact matches and structured queries, vector search for semantic queries, RRF or linear combination to merge. Postgres with pgvector handles the small-scale case in one database; dedicated vector stores (Pinecone, Weaviate) become appropriate at scale.
The interview move: name hybrid as the default. "We'd use Elasticsearch with BM25 plus a vector search component, combining results with RRF. Pure keyword search misses conceptual queries; pure vector misses exact matches. Hybrid gives us both." That sentence reflects what production systems actually look like in 2026.
08. Failure Modes
Failure 01
Index drift from primary store
The CDC pipeline drops a message, or a malformed record fails to index, or a manual data fix bypasses the pipeline. The search index slowly diverges from the primary store. Users see stale or missing results that look like data corruption but are really sync drift.
The fix is reconciliation: a periodic process that compares primary and index, surfaces discrepancies, and re-indexes the affected documents. Most production systems run this nightly. Rebuilding the index from scratch is the nuclear option when drift is bad enough.
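The comparison step can be as simple as diffing (id, version) pairs pulled from both sides. A sketch, with the data fetching and the index writes left as hypothetical callbacks:

```python
from typing import Callable

def reconcile(primary: dict[str, int], index: dict[str, int],
              reindex: Callable[[str], None],
              delete: Callable[[str], None]) -> None:
    """primary and index map doc id -> version (or content hash).
    Re-index anything missing or stale; delete anything orphaned."""
    for doc_id, version in primary.items():
        if index.get(doc_id) != version:
            reindex(doc_id)          # missing from or stale in the index
    for doc_id in index.keys() - primary.keys():
        delete(doc_id)               # deleted from the primary, still indexed
```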
Failure 02
Hot shards on celebrity content
Search indexes are sharded for scale. A celebrity post or a viral product gets searched far more than average. The shard holding that document becomes a hot shard, with all the dynamics from the sharding deep-dive: high latency, uneven utilization, capacity ceiling.
The fix is the same as for any hot key: cache the popular results, replicate hot shards more aggressively, or split the hot document across multiple shards if the engine supports it. Elasticsearch's read replicas help; aggressive caching at the application layer helps more.
Failure 03
Indexing falls behind during traffic spikes
A sudden write spike (a launch, a sale, a news event) sends the indexing pipeline into backpressure. The primary store keeps accepting writes, but they take minutes or hours to reach the search index. Users searching for the new content can't find it.
The fix is the same backpressure machinery from queues: provision the indexing pipeline for the peak rate, autoscale the indexer workers, and set alerts on indexing lag. For some workloads, you accept that the search index lags during incidents and warn users; for others, you have to scale aggressively to keep up.
Failure 04
Search engine treated as primary store
Someone discovers Elasticsearch can do queries the primary store can't, and starts writing data only to Elasticsearch. Months later, an Elasticsearch cluster failure or a corrupted index loses data with no source to rebuild from.
The fix is policy and review: writes go to the primary, period. Anything that goes only to the search index is treated as ephemeral data that can be recomputed or lost. New features that need to be searchable should write to the primary first, then propagate to the index. Catching this in code review is much cheaper than catching it in an outage.
09. How Search Interacts With Other Concepts
- Search × Database selection. Search is its own database category in database selection. The decision is rarely "search vs relational" but "primary store + search index, in what configuration." pgvector inside Postgres often replaces a separate search store at small to moderate scale.
- Search × Sharding. Search indexes shard the same way primary stores do, with the same hot-key risks. The shard key choice is often by document type or by tenant. Sharding covers the analogous tradeoffs.
- Search × Message queues. CDC pipelines from primary store to search index almost always run through a queue (Kafka, Kinesis, Pub/Sub). The outbox pattern from message queues is the canonical bridge.
- Search × Replication. Search indexes are typically replicated for read throughput and availability, with the same eventual-consistency tradeoffs from replication and consistency.
- Search × Caching. Hot search queries are often cached because the same queries repeat. Cache the top results for popular queries; the cache absorbs most of the load while the index handles the long tail. Caching covers placement.
For more cross-concept interactions, see the concepts library hub.
10. Practice Scenarios
Three scenarios. Read the setup. Decide your approach before opening the reveal.
Scenario 01
An e-commerce site needs product search across 10 million SKUs with filters (price, category, brand) and ranking by relevance plus popularity. How do you architect this?
Postgres is the system of record. Search needs to handle full-text on title and description, faceted filters on structured fields, and ranking that combines text relevance with click-through-driven popularity. Latency target p95 under 200ms. Roughly 100 searches per second.
How to think about this
This is the canonical primary-store-plus-search-index pattern. Postgres stays the source of truth; Elasticsearch (or OpenSearch) is the secondary index for search queries.
Sync: Debezium reading Postgres's WAL, streaming changes through Kafka, into Elasticsearch indexers. Indexing lag of seconds is acceptable for product search.
Index design: One Elasticsearch index per locale (English, Spanish, etc.) for proper tokenization. Documents include searchable fields (title, description) and structured fields (price, category, brand, popularity score) for filtering.
Ranking: BM25 for text relevance plus a popularity boost based on click-through data. Recompute popularity scores nightly from analytics; push them into the index. Maybe a learning-to-rank model later if we have the data and it justifies the operational cost.
Caching: Top searches (the popular product searches) get cached at the application layer. Cache hit rate of even 30% removes meaningful load from Elasticsearch.
Strong answer: "Elasticsearch as secondary index, synced from Postgres via Debezium + Kafka. BM25 ranking with popularity boost from analytics. Per-locale indexes for tokenization. Cache the popular queries at the application layer. Plan for hybrid (vector) search later when the data and pattern justify it."
Scenario 02
A user types "cheap weekend trip near san francisco for outdoor activities" — a conceptual query that doesn't match document text directly. How do you handle this?
Existing system uses BM25 keyword search over a corpus of travel content. The query above returns poor results because few documents contain those exact terms. Users complain that obvious matches are missing.
How to think about this
This is the conceptual-query failure mode of pure keyword search. The fix is hybrid: add vector search alongside BM25.
Add embeddings. Use a text embedding model (OpenAI, Cohere, or open-source like sentence-transformers) to embed each document. Store the embeddings in pgvector if scale allows, or a dedicated vector store if not.
Embed the query at search time. Same embedding model converts the user's query into a vector. Find nearest neighbors in the embedding space.
Combine. Run both BM25 and vector search. Combine results with reciprocal rank fusion (RRF) or a linear combination of normalized scores. The merged ranking gives keyword precision plus semantic recall.
Watch the latency. Two searches in parallel plus combination logic. With reasonable caching and fast vector search infrastructure, this stays under 200ms.
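The vector leg of that hybrid query is a short SQL statement once the embeddings are stored. A sketch against a hypothetical travel_docs table, using psycopg 3 and passing the query vector as a pgvector literal string:

```python
import psycopg

def semantic_candidates(conn: psycopg.Connection, query_vec: str,
                        limit: int = 50) -> list[str]:
    """Nearest neighbors by cosine distance (pgvector's <=> operator).
    query_vec is a pgvector literal, e.g. '[0.12, -0.03, ...]'."""
    rows = conn.execute(
        "SELECT id FROM travel_docs ORDER BY embedding <=> %s::vector LIMIT %s",
        (query_vec, limit),
    ).fetchall()
    return [row[0] for row in rows]

# Fuse these ids with the BM25 candidate list via RRF in the application.
```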
Strong answer: "Add vector search alongside BM25 for hybrid retrieval. Embed documents using a sentence-transformer model, store in pgvector if our scale allows. At query time, run both searches in parallel and combine with RRF. The keyword path catches exact matches; the vector path catches conceptual queries. We get both."
Scenario 03
A team proposes using Elasticsearch as the only database, eliminating Postgres entirely. Should they?
The argument is that Elasticsearch can do everything Postgres does (CRUD, aggregations, joins via parent-child) plus search. They want to simplify the stack from two databases to one.
How to think about this
No. The proposal misunderstands what Elasticsearch is. The pitch ("we can simplify by using one database") sounds reasonable but ignores fundamental differences.
Durability is weaker. Elasticsearch can lose recent writes during certain failure modes. Postgres doesn't, with appropriate replication. For data you can't lose (orders, payments, user accounts), Elasticsearch is the wrong choice.
Consistency is weaker. Elasticsearch doesn't offer ACID transactions. Multi-document updates aren't atomic. The system has to handle the resulting consistency holes at the application layer.
Operational cost is higher. Elasticsearch clusters are operationally more demanding than Postgres. Memory tuning, shard balancing, version upgrades — all things Postgres handles more gracefully.
The "simplify to one database" goal is real but misdirected. The right way to simplify isn't to replace Postgres with Elasticsearch; it's to use Postgres with full-text search and pgvector for cases where Elasticsearch isn't strictly needed. Postgres native FTS is good enough for many workloads.
Strong answer: "Don't do this. Elasticsearch is a search engine, not a system of record. We'd lose durability and consistency in exchange for query convenience that Postgres can mostly handle natively. The right simplification is to use Postgres FTS or pgvector for cases that don't need Elasticsearch's full power. Reach for Elasticsearch when search complexity actually demands it."
11. Search and Indexing FAQ
Elasticsearch or OpenSearch?
OpenSearch is the AWS-led fork of Elasticsearch from when Elastic changed the license in 2021. They're API-compatible for most operations. OpenSearch is the right default if you're on AWS or want a fully open-source path. Elasticsearch is the right default if you're using Elastic Cloud or have features only Elasticsearch supports. The architectural patterns are identical; the choice rarely matters in interview discussions.
What about Postgres full-text search?
Underrated. Postgres's built-in FTS (tsvector, tsquery) handles small to moderate scale workloads competently. For maybe 60% of cases where teams reach for Elasticsearch, Postgres FTS would be sufficient. The honest 2026 answer for a new system: try Postgres FTS first. Reach for Elasticsearch when you need cross-field ranking, complex analyzers, or scale that Postgres can't handle. Database selection covers the broader pattern.
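A minimal setup is two statements plus one query; table and column names are illustrative, and the generated tsvector column needs Postgres 12 or newer:

```python
# Run once at migration time: a stored tsvector column plus a GIN index.
SETUP_SQL = """
ALTER TABLE articles
    ADD COLUMN tsv tsvector
    GENERATED ALWAYS AS (to_tsvector('english', title || ' ' || body)) STORED;
CREATE INDEX articles_tsv_idx ON articles USING gin (tsv);
"""

# Run per search: match with @@, rank with ts_rank.
SEARCH_SQL = """
SELECT id, title, ts_rank(tsv, q) AS rank
FROM articles, plainto_tsquery('english', %(query)s) AS q
WHERE tsv @@ q
ORDER BY rank DESC
LIMIT 10;
"""
```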
How do I tokenize for non-English languages?
Use language-specific analyzers. Elasticsearch ships with analyzers for many languages (kuromoji for Japanese, smartcn for Chinese, etc.). For each language you support, either run a separate index with the right analyzer, or use a multi-field configuration that runs multiple analyzers per field. Don't try to use the English analyzer on Japanese text; it'll fail catastrophically because there are no spaces between words.
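Wiring a per-language analyzer is a one-time mapping decision. A sketch that creates a Japanese index using the kuromoji analyzer (requires the analysis-kuromoji plugin; index and field names are illustrative):

```python
import requests

mapping = {
    "mappings": {
        "properties": {
            "title": {"type": "text", "analyzer": "kuromoji"},
            "body": {"type": "text", "analyzer": "kuromoji"},
        }
    }
}
# PUT on the index name creates the index with this mapping.
resp = requests.put("http://localhost:9200/articles_ja", json=mapping)
resp.raise_for_status()
```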
What's the deal with sharding in Elasticsearch?
Each Elasticsearch index has a fixed number of primary shards set at creation time. Shards distribute data across nodes for parallelism. The choice is hard because changing shard count requires reindexing. Common defaults: 1-3 shards for small indexes, more for very large ones. The principle is the same as sharding for databases: pick a count that gives you parallelism without making each shard tiny.
How do I handle real-time search requirements?
"Real-time" usually means seconds, not milliseconds. Elasticsearch refresh interval (the time between writing and search visibility) is configurable; default is 1 second. You can make it faster (down to ~100ms) at the cost of indexing throughput, or slower for higher throughput. For workflows that need true real-time (sub-second visibility), the application can read from the primary store directly until the index catches up.
What about typo tolerance and "did you mean"?
Two layers. Typo tolerance happens at query time through fuzzy matching: the engine matches terms within a small edit distance of the query. "Did you mean" is a separate suggestion layer driven by query logs (suggesting the most common similar query) or dictionary-based correction. Most production search uses both. Elasticsearch supports fuzziness on most query types and has a phrase suggester for "did you mean."
How does this interact with vector search?
The 2026 default is hybrid: keyword search and vector search, combined. They cover different failure modes (keyword for exact match, vector for semantic). The combination through reciprocal rank fusion or linear scoring is genuinely better than either alone. The vector databases deep-dive covers the vector side; this page covers how they fit together.
Should I build search myself or use a managed service?
Managed for almost all cases. Elastic Cloud, AWS OpenSearch Service, Algolia (which is more opinionated and easier for small workloads). Self-managed Elasticsearch is operationally heavy. Build it yourself only when you have specific requirements managed services can't meet: very high scale, custom analyzers, specific compliance constraints. For most products, the managed option is the right answer.