How would you manage secondary indexes in distributed KV stores?

Secondary indexes in distributed key-value stores enable queries beyond simple primary-key lookups. They let a system fetch items by other attributes, such as email, status, or category, without performing a full scan. In distributed databases like DynamoDB, Cassandra, or RocksDB-based systems, secondary indexes are implemented as separate data structures that must stay consistent with the main data.

Why It Matters

In large-scale systems, the lack of efficient secondary indexes can cripple query performance. An index turns an O(N) scan into a lookup that is typically O(log N), plus the cost of fetching the matching records. However, indexes add complexity in distributed environments because data is sharded across nodes, so an index entry and the record it points to may live on different machines. Managing them correctly tests your understanding of consistency, replication, and fault tolerance, which are core skills interviewers expect in system design interviews.

How It Works (Step by Step)

Identify Access Patterns – Determine which non-primary attributes are queried frequently. Example: in a user table, if lookups by email are common, you need a secondary index on that field.
Select Index Type – A local index is co-located with the primary partition and works well when queries include the partition key. A global index is partitioned separately from the main keyspace, which allows lookups by the indexed attribute regardless of the base partition key but makes synchronization harder.
Build the Index Structure – Each indexed value maps to the set of primary keys (or records) that carry it. For example, for status=active, store a mapping to all user IDs with that status.
Maintain Consistency – On every insert, update, or delete, the index must be updated. Use synchronous updates (within the same transaction) for strong consistency, or asynchronous updates (via change streams or logs) for higher performance with eventual consistency.
Handle Rewrites and Deletes – When the value of an indexed field changes, remove the old mapping and insert the new one. Make the operation idempotent so that retries do not create duplicates.
Partition and Scale – Distribute global indexes using a hash function over the indexed field. Add “salting” or bucketing to avoid hotspots on popular values.
Query Processing – The system first looks up the secondary index to find matching keys and then fetches the actual records. A “covering index” can store commonly queried fields to avoid the second fetch.
Rebuild and Repair – Use background jobs or snapshot scans to rebuild indexes after data corruption or schema changes. Validate consistency using Merkle trees or checksums.
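
The build, maintain, rewrite, and query steps above can be sketched in a few lines. This is a minimal single-process illustration, not a distributed implementation: `IndexedStore` and its methods are hypothetical names, the "transaction" is just one method call, and the index is a plain in-memory dict.

```python
from collections import defaultdict


class IndexedStore:
    """Toy single-process KV store with one secondary index (illustrative only)."""

    def __init__(self, indexed_field):
        self.indexed_field = indexed_field
        self.records = {}               # primary key -> record dict
        self.index = defaultdict(set)   # indexed value -> set of primary keys

    def put(self, key, record):
        # If the key already exists, drop its old index mapping first.
        old = self.records.get(key)
        if old is not None:
            self._unindex(key, old)
        self.records[key] = dict(record)
        value = record.get(self.indexed_field)
        if value is not None:
            # A set makes the insert idempotent: a retry adds nothing new.
            self.index[value].add(key)

    def delete(self, key):
        old = self.records.pop(key, None)
        if old is not None:
            self._unindex(key, old)

    def _unindex(self, key, record):
        value = record.get(self.indexed_field)
        keys = self.index.get(value)
        if keys is not None:
            keys.discard(key)
            if not keys:
                del self.index[value]

    def query(self, value):
        # Step 1: index lookup. Step 2: fetch each record by primary key.
        return [self.records[k] for k in sorted(self.index.get(value, ()))]


store = IndexedStore("status")
store.put("u1", {"email": "a@example.com", "status": "active"})
store.put("u2", {"email": "b@example.com", "status": "active"})
store.put("u1", {"email": "a@example.com", "status": "suspended"})  # value change
store.put("u1", {"email": "a@example.com", "status": "suspended"})  # retried write
```

After these calls, `store.query("active")` returns only the record for `u2`, and the retried write leaves a single entry for `u1` under `suspended`. In a real cluster the record write and the index write would need a transaction or a change stream between them, as described in the consistency step.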

Real-World Example

In a Dynamo-style order management system, the main table might use OrderID as the primary key. But customer support queries like “Find all orders by customer X with status pending” require a global secondary index (GSI). The GSI could use (CustomerID, Status) as the key and store the creation timestamp for sorting. Writes to this index can be asynchronous via a stream that processes order events, keeping the main table isolated from index write latency.
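
The asynchronous path in this example could look roughly like the sketch below. Everything here is an assumption made for illustration: the `deque` stands in for a change stream, the per-order `version` counter is a hypothetical way to make replays harmless, and events for a given order are assumed to arrive in order (possibly more than once).

```python
from collections import deque

orders = {}            # base table: OrderID -> order dict
stream = deque()       # stand-in for a change stream or log
gsi = {}               # (CustomerID, Status) -> {OrderID: created_at}
applied_version = {}   # OrderID -> last version applied to the index


def write_order(order_id, customer_id, status, created_at):
    """Write the base record and emit a change event; the index is not touched here."""
    old = orders.get(order_id)
    version = old["version"] + 1 if old else 1
    new = {"customer_id": customer_id, "status": status,
           "created_at": created_at, "version": version}
    orders[order_id] = new
    stream.append({"order_id": order_id, "old": old, "new": new})


def apply_event(event):
    """Apply one change event to the GSI; safe to replay."""
    order_id, old, new = event["order_id"], event["old"], event["new"]
    if applied_version.get(order_id, 0) >= new["version"]:
        return  # duplicate or stale delivery, nothing to do
    if old is not None:
        gsi.get((old["customer_id"], old["status"]), {}).pop(order_id, None)
    gsi.setdefault((new["customer_id"], new["status"]), {})[order_id] = new["created_at"]
    applied_version[order_id] = new["version"]


def drain():
    """Index worker: consume all pending events."""
    while stream:
        apply_event(stream.popleft())


def pending_orders(customer_id):
    """Order IDs for a customer with status 'pending', oldest first."""
    entries = gsi.get((customer_id, "pending"), {})
    return sorted(entries, key=entries.get)


write_order("o1", "c1", "pending", created_at=100)
write_order("o2", "c1", "pending", created_at=50)
stale = pending_orders("c1")   # [] because the index has not caught up yet
drain()
fresh = pending_orders("c1")   # ["o2", "o1"], oldest first
```

The gap between `stale` and `fresh` is the eventual-consistency window: the base write returns immediately, and the index only reflects it once the worker has processed the event.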

Common Pitfalls or Trade-offs

Write Amplification – Each new index means extra writes per record. Optimize only for truly necessary queries.
Inconsistent Data – Asynchronous updates can create temporary mismatches between base and index tables.
Hotspots on Popular Keys – Certain attributes (e.g., status=pending) may cause uneven traffic. Use hashing or time bucketing.
Complex Updates – Attribute changes require atomic removal and insertion in the index; failing to handle retries properly causes duplicates.
Operational Complexity – Global indexes require extra monitoring, partition management, and rebalancing during cluster scale-out.

Interview Tip

A common interview question is: “How would you enforce uniqueness for a username in a distributed system?” The best approach is to store usernames as keys in a strongly consistent index (or a dedicated shard) and perform conditional writes. If the username already exists, the write fails. This demonstrates understanding of atomicity and partitioning.
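
A minimal sketch of that conditional-write pattern follows. The `UsernameIndex` class is a hypothetical in-memory stand-in for a strongly consistent index, with a lock playing the role of the store's atomic compare-and-set; in a real system the check and the write would be a single conditional operation on the store.

```python
import threading


class UsernameIndex:
    """Toy stand-in for a strongly consistent index keyed by username."""

    def __init__(self):
        self._owners = {}
        self._lock = threading.Lock()

    def put_if_absent(self, username, user_id):
        """Conditional write: succeeds if the name is free or already held by user_id."""
        with self._lock:
            owner = self._owners.setdefault(username, user_id)
            return owner == user_id

    def release(self, username, user_id):
        """Undo a claim, but only if user_id still holds it."""
        with self._lock:
            if self._owners.get(username) == user_id:
                del self._owners[username]


def register(index, users, user_id, username):
    # Claim the username first, then write the main record.
    if not index.put_if_absent(username, user_id):
        return False
    try:
        users[user_id] = {"username": username}  # a real store write can fail here
    except Exception:
        index.release(username, user_id)  # compensate so the name is not leaked
        raise
    return True
```

Treating a repeated claim by the same `user_id` as success keeps the operation idempotent, so a client that retries after a timeout does not lock itself out of its own username.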

Key Takeaways

Secondary indexes trade write simplicity for query speed.
Choose between local and global indexes based on query patterns.
Use asynchronous updates for scalability but monitor staleness.
Prevent hotspots using key salting or time-based partitioning.
Plan for rebuilds, backfills, and integrity checks early.

Table of Comparison

Strategy	Best For	Write Cost	Read Cost	Consistency	Complexity
Local Secondary Index	Queries within same partition	Low	Very Low	Strong	Low
Global Secondary Index (Sync)	Cross-partition queries needing real-time results	High	Low	Strong	Medium
Global Secondary Index (Async)	High write throughput, relaxed freshness	Medium	Low	Eventual	Low
Search Sidecar (e.g., Elastic/OpenSearch)	Complex text or multi-field queries	Medium	Medium	Eventual	Medium
Materialized Views	Precomputed frequent queries	High	Very Low	Strong	High

FAQs

Q1. What is the difference between a local and a global secondary index?

A local index resides on the same shard as the base data and requires the partition key for queries. A global index is partitioned by the indexed attribute, so it spans the base table's partitions and supports lookups by that attribute without the partition key.
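
The routing difference can be illustrated with a small sketch. The shard count and function names are hypothetical, and the fan-out branch shows what a store that permits local-index queries without a partition key would have to do; stores that require the partition key simply reject such a query.

```python
import hashlib

NUM_SHARDS = 4  # assumed cluster size for the example


def shard_for(value):
    """Stable hash of a key onto a shard number."""
    digest = hashlib.sha256(str(value).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def shards_for_local_query(partition_key=None):
    """A local index lives next to its base partition.

    With the partition key the query touches one shard; without it,
    every shard's local index would have to be consulted.
    """
    if partition_key is None:
        return list(range(NUM_SHARDS))
    return [shard_for(partition_key)]


def shards_for_global_query(indexed_value):
    """A global index is partitioned by the indexed value itself."""
    return [shard_for(indexed_value)]
```

A lookup by email therefore reaches a single shard through a global index, while the same lookup against local indexes, with no partition key available, would touch all of them.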

Q2. How do you keep indexes consistent with the primary store?

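Either update the index synchronously in the same transaction as the base write (strong consistency) or use change streams for asynchronous updates (eventual consistency). Plain dual writes without a transaction can leave the two out of sync if one write fails. The choice depends on latency and freshness needs.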

Q3. How can you prevent hotspots in secondary indexes?

Use key salting or random prefixes on hot attributes. This spreads writes for a popular value across partitions, at the cost of reads that must query every salt bucket and merge the results.
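
A small sketch of salting is shown below. It swaps the random prefix for a deterministic suffix derived from the primary key, an assumption made here so that retries and deletes can find the same bucket again; the bucket and partition counts are arbitrary example values.

```python
import hashlib

NUM_PARTITIONS = 8   # assumed number of index partitions
SALT_BUCKETS = 4     # assumed fan-out for a hot value


def partition_of(index_key):
    """Hash an index key onto a partition."""
    digest = hashlib.sha256(index_key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


def salted_key_for_write(value, primary_key):
    """Pick a bucket from the primary key so the same record always maps to the same key."""
    digest = hashlib.sha256(primary_key.encode()).digest()
    bucket = digest[0] % SALT_BUCKETS
    return f"{value}#{bucket}"


def salted_keys_for_read(value):
    """A read for one value must query every bucket and merge the results."""
    return [f"{value}#{bucket}" for bucket in range(SALT_BUCKETS)]
```

With these settings, writes for `status=pending` are split across four index keys (`pending#0` through `pending#3`) instead of landing on one, and a query for pending items issues four lookups and merges them.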

Q4. When should I rebuild an index?

Rebuild indexes after data corruption, after schema changes, or when lag in an asynchronous pipeline grows beyond what the pipeline can catch up on. Rebuilds often use snapshots plus live catch-up streams.
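
The snapshot half of a rebuild, together with a simple checksum comparison, might be sketched as follows. The function names are hypothetical, the snapshot is a plain dict, indexed values are assumed to be strings, and the live catch-up stream is left out.

```python
import hashlib


def build_index(snapshot, field):
    """Rebuild a secondary index from a snapshot of the base table."""
    index = {}
    for key, record in snapshot.items():
        value = record.get(field)
        if value is not None:
            index.setdefault(value, set()).add(key)
    return index


def index_checksum(index):
    """Digest over sorted (value, primary key) pairs, independent of insertion order."""
    h = hashlib.sha256()
    for value in sorted(index):
        for key in sorted(index[value]):
            h.update(f"{value}\x00{key}\n".encode())
    return h.hexdigest()


def needs_repair(live_index, snapshot, field):
    """Compare the live index against one rebuilt from the snapshot."""
    return index_checksum(live_index) != index_checksum(build_index(snapshot, field))
```

A single checksum only says that something differs. Computing one per key range, or arranging them in a Merkle tree, narrows the mismatch to a small range that can be repaired without a full rebuild.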

Q5. When is it better to use a search engine instead of secondary indexes?

If you need full-text search, ranking, or multi-field filtering, it’s better to use systems like OpenSearch or Solr instead of multiple global indexes.

Q6. How can I enforce uniqueness on a secondary attribute?

Perform a conditional write to the index key (e.g., username) before inserting the main record. If it already exists, the operation fails atomically.

Further Learning

For deeper insights on data modeling and scalability trade-offs, explore Grokking Scalable Systems for Interviews. To strengthen your foundation on consistency, partitioning, and indexing, check Grokking System Design Fundamentals. For hands-on interview preparation with real-world examples, enroll in Grokking the System Design Interview.

CONTRIBUTOR

Design Gurus Team