How do you do zero‑downtime migrations (DB + app coordination)?

A zero-downtime migration changes a live database, and the application that uses it, without user-visible impact. The core technique is the expand-and-contract pattern. First, expand the schema so it supports both old and new code. Then, once the new path is fully stable, contract it by removing the old structures. The hard part is sequencing the database and application changes so that every stage stays backward compatible and has a rollback option.

Why It Matters

High-availability services such as Netflix or Amazon run 24/7 and serve very large global traffic. Taking the database offline for a deploy is not an option, so every deployment must be seamless. In interviews, explaining how to run such a migration shows that you understand data consistency, backward compatibility, and safe deployment patterns in distributed systems.

How It Works (Step-by-Step)

Step 1. Prepare Safety Nets

  • Define migration steps, rollbacks, and success criteria.
  • Introduce separate feature flags for reads and writes, so each can be enabled or rolled back independently.
  • Set up monitoring for read/write latency, errors, and data parity.
  • Validate backups and ensure point-in-time restore is possible.
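The following minimal Python sketch makes the separate read and write flags concrete. The `MigrationFlags` class and its field names are hypothetical. A real system would load these values from its feature-flag service instead of hard-coding them. The sketch shows two rules: writes to the new path are enabled before any reads use it, and rollback is a flag change rather than a redeploy.

```python
from dataclasses import dataclass


@dataclass
class MigrationFlags:
    # Hypothetical flag set for one migration; real systems read these from a flag service.
    write_new: bool = False       # dual-write to the new structure
    read_new_percent: int = 0     # share of reads served from the new structure (0-100)
    shadow_read: bool = False     # also query the new path and compare, without serving it

    def validate(self) -> None:
        if not 0 <= self.read_new_percent <= 100:
            raise ValueError("read_new_percent must be between 0 and 100")
        # Reading from a structure that is not yet being written would return stale data.
        if (self.read_new_percent > 0 or self.shadow_read) and not self.write_new:
            raise ValueError("enable write_new before reading from the new path")


# Rollback keeps dual writes on (so the new copy stays current) but sends all reads back.
ROLLBACK = MigrationFlags(write_new=True, read_new_percent=0, shadow_read=False)
```

Keeping `write_new` on during rollback means you can retry the read cutover later without another backfill.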

Step 2. Expand Schema (Backward Compatible)

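  • Add new tables, and add new columns as nullable with no default, so existing inserts keep working and the change avoids a full-table rewrite.
  • Build indexes concurrently or online (for example, CREATE INDEX CONCURRENTLY in PostgreSQL) to avoid blocking writes.
  • Avoid renaming columns directly, because old code still queries the old name; introduce a new column instead.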
  • For NoSQL, make code tolerant of both old and new data shapes.
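Here is a small sketch of the expand step. Python's built-in `sqlite3` stands in for the production database, and the `users` table and `display_name` column are hypothetical. Because the column is nullable, old code can keep inserting rows without knowing about it. New code reads with a fallback, so it tolerates rows in either shape. In PostgreSQL, adding a nullable column without a default is a metadata-only change. Check your own engine's documentation to see which ALTER forms avoid a table rewrite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for the production database
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO users (id, name) VALUES (1, 'Ada Lovelace')")

# Expand: a nullable column with no default, so existing rows need no new value.
conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")

# Old code is unaware of the new column and keeps inserting successfully.
conn.execute("INSERT INTO users (id, name) VALUES (2, 'Alan Turing')")


def read_display_name(user_id: int) -> str:
    # New code tolerates both shapes: fall back to the old column while the new one is NULL.
    row = conn.execute(
        "SELECT COALESCE(display_name, name) FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return row[0]


def read_doc_display_name(doc: dict) -> str:
    # NoSQL equivalent: accept documents written before and after the change.
    return doc.get("display_name") or doc["name"]
```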

Step 3. Backfill Data Gradually

  • Copy historical data in small chunks using background jobs.
  • Use throttling or batching to avoid database overload.
  • Monitor progress metrics such as rows processed and failure counts.
  • Use Change Data Capture (CDC) to sync new writes during backfill.
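Below is a minimal chunked backfill. It assumes SQLite and a hypothetical `users` table with a `name` column and a new nullable `display_name` column. The backfill works in four ways:

  • It walks the table by primary key (keyset pagination), so each batch is a cheap range scan.
  • It commits each batch in its own short transaction.
  • It sleeps between batches as a simple throttle.
  • It skips rows that are already filled, so a rerun after a crash is safe.

Syncing concurrent writes, through CDC or dual writes, is outside this sketch.

```python
import sqlite3
import time


def backfill_display_name(conn: sqlite3.Connection, batch_size: int = 500,
                          pause_s: float = 0.05):
    """Copy name -> display_name in small batches; returns (rows_updated, batches)."""
    last_id = 0
    rows_updated = 0
    batches = 0
    while True:
        # Keyset pagination: the next batch starts after the last id we saw.
        ids = [r[0] for r in conn.execute(
            "SELECT id FROM users WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        )]
        if not ids:
            break
        # One short transaction per batch keeps locks brief; the IS NULL guard
        # makes reruns skip rows that were already copied.
        with conn:
            cur = conn.execute(
                "UPDATE users SET display_name = name "
                "WHERE id BETWEEN ? AND ? AND display_name IS NULL",
                (ids[0], ids[-1]),
            )
        rows_updated += cur.rowcount
        batches += 1
        last_id = ids[-1]
        time.sleep(pause_s)  # throttle to leave headroom for live traffic
    return rows_updated, batches
```

The returned counts are the raw material for the progress metrics mentioned above. In production you would emit them per batch rather than only at the end.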

Step 4. Enable Dual Writes

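  • Write every new transaction to both the old and new structures, keeping the old one as the source of truth until reads switch over.
  • Make writes idempotent (for example, upserts keyed by primary key) so retries and message-queue redeliveries do not create duplicates.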
  • Implement shadow reads for verification between old and new data.
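The next sketch shows dual writes and a shadow read against two hypothetical tables, `users_old` and `users_new`, held in one SQLite database. Because both tables live in one database, a single transaction covers both writes. When the old and new stores are separate systems, that guarantee disappears. You would then typically route the second write through a queue, an outbox table, or CDC, and rely on idempotency to make retries harmless.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users_old (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE users_new (id INTEGER PRIMARY KEY, display_name TEXT)")
metrics = Counter()  # stand-in for a real metrics client


def dual_write(user_id: int, name: str) -> None:
    # Upserts keyed by primary key: replaying the same write leaves one row, not two.
    with conn:  # both tables share one database here, so one transaction covers both
        conn.execute("INSERT OR REPLACE INTO users_old (id, name) VALUES (?, ?)",
                     (user_id, name))
        conn.execute("INSERT OR REPLACE INTO users_new (id, display_name) VALUES (?, ?)",
                     (user_id, name))


def shadow_read(user_id: int):
    # Query both paths, count disagreements, but always serve the old path.
    old = conn.execute("SELECT name FROM users_old WHERE id = ?", (user_id,)).fetchone()
    new = conn.execute("SELECT display_name FROM users_new WHERE id = ?", (user_id,)).fetchone()
    metrics["shadow_reads"] += 1
    if old != new:
        metrics["shadow_mismatches"] += 1  # alert on this counter
    return old[0] if old else None
```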

Step 5. Switch Reads Gradually

  • Start with a small canary rollout.
  • Use shadow queries to verify results before full cutover.
  • Monitor metrics to detect latency or consistency issues.
  • Keep rollback switches ready in case of errors.
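One common way to run the canary is deterministic bucketing. You hash a stable user ID into 100 buckets and serve the new path to users whose bucket falls below the rollout percentage. The sketch below assumes string user IDs and an arbitrary salt. `read_old` and `read_new` are hypothetical callables for the two data paths. Raising the percentage only adds users: a user in bucket 7 is included at both 10% and 50%. The kill switch sends every read back to the old path.

```python
import hashlib


def in_canary(user_id: str, percent: int, salt: str = "users-read-migration") -> bool:
    # Hashing (not random choice) keeps each user on the same path across requests,
    # so nobody flips between old and new results from one request to the next.
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percent


def read_user(user_id: str, percent: int, kill_switch: bool, read_old, read_new):
    # kill_switch is the rollback lever: when set, all reads return to the old path.
    if kill_switch or not in_canary(user_id, percent):
        return read_old(user_id)
    return read_new(user_id)
```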

Step 6. Contract Schema

  • After full cutover, remove unused tables, columns, and indexes.
  • Clean up dual-write logic and feature flags.
  • Archive migration logs and metrics for audit and rollback traceability.

Real World Example

Netflix once refactored its catalog database to support richer metadata and faster search. They first created new tables and backfilled data from the old schema. Next, they dual-wrote updates to both the old and new tables. When confidence was high, they gradually redirected reads to the new structure. Finally, they dropped the old tables once all clients had switched over. The migration completed without user-visible downtime.

Common Pitfalls or Trade-Offs

1. Dropping Too Early

Removing columns or tables before all reads switch can break existing code. Always expand first and contract last.

2. Non-Online Schema Changes

Avoid blocking ALTER statements that rewrite or lock entire tables. Use online or concurrent operations instead.

3. Missing Idempotency

If dual writes are not idempotent, retries may duplicate data. Always design for replays.

4. No Verification Metrics

Without comparing old and new data paths, silent corruption can occur. Always use parity checks and mismatch counters.
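Shadow reads only cover the rows that happen to be requested, so a full parity check usually also compares the two stores in bulk. The following is a minimal sketch. It assumes hypothetical `users_old(id, name)` and `users_new(id, display_name)` tables in SQLite. It hashes rows per ID range on each side and reports the ranges whose digests differ, which narrows a row-by-row diff to just those ranges. It also assumes both sides format values identically. Real stores often need normalization first, for example for types, time zones, or trailing whitespace.

```python
import hashlib
import sqlite3


def chunk_checksums(conn: sqlite3.Connection, query: str, chunk_size: int = 1000) -> dict:
    # `query` must return (id, value) rows ordered by id so each digest is deterministic.
    sums = {}
    for row_id, value in conn.execute(query):
        h = sums.setdefault(row_id // chunk_size, hashlib.sha256())
        h.update(f"{row_id}|{value}\n".encode())
    return {chunk: h.hexdigest() for chunk, h in sums.items()}


def mismatched_chunks(conn: sqlite3.Connection) -> list:
    old = chunk_checksums(conn, "SELECT id, name FROM users_old ORDER BY id")
    new = chunk_checksums(conn, "SELECT id, display_name FROM users_new ORDER BY id")
    # A chunk that exists on only one side also counts as a mismatch.
    return sorted(c for c in old.keys() | new.keys() if old.get(c) != new.get(c))
```

Export the length of the returned list as a mismatch counter and alert when it is nonzero.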

5. Ignoring Caches and Indexes

When switching data sources, caches or search indexes can return stale data. Refresh or invalidate them during the cutover.
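One way to avoid stale cache reads during a cutover is to include the active read source in every cache key. Flipping the source then makes old entries unreachable, so there is no need for a mass purge. `VersionedCache` is a hypothetical wrapper, and a dict stands in for Redis or Memcached. Writes invalidate the key under both versions, so rolling back to the old source cannot serve a value that was cached before a later write. Search indexes need a separate reindex or alias swap, which this sketch does not cover.

```python
class VersionedCache:
    """Hypothetical cache wrapper whose keys include the active read source."""

    SOURCES = ("v1", "v2")  # v1 = old schema, v2 = new schema

    def __init__(self):
        self._store = {}          # dict stands in for Redis or Memcached
        self.read_source = "v1"

    def _key(self, source: str, key: str) -> str:
        return f"{source}:{key}"

    def get_or_load(self, key: str, loader):
        k = self._key(self.read_source, key)
        if k not in self._store:
            self._store[k] = loader(key)  # cache miss: load from the active source
        return self._store[k]

    def invalidate(self, key: str) -> None:
        # Called on every write: drop the entry under both sources so a later
        # rollback cannot serve a value cached before this write.
        for source in self.SOURCES:
            self._store.pop(self._key(source, key), None)

    def cut_over(self, new_source: str) -> None:
        # Changing the source changes every key, so entries filled from the
        # old path are never read again and simply age out.
        self.read_source = new_source
```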

6. Permanent Feature Flags

Forgetting to remove temporary flags leads to configuration complexity. Schedule cleanup tasks post-migration.
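A simple guard against permanent flags is to record an owner and a removal date when each temporary flag is created, then fail a CI check once that date has passed. The `TempFlag` record and `overdue_flags` function below are hypothetical and not part of any flag library.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TempFlag:
    name: str
    owner: str        # who is on the hook for cleanup
    remove_by: date   # deadline recorded when the flag is created


def overdue_flags(flags, today: date) -> list:
    # Names of temporary flags past their removal date, e.g. to fail a CI check.
    return [f.name for f in flags if today > f.remove_by]
```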

Interview Tip

Interviewers often ask how you’d rename a column used by multiple services without downtime. A strong answer: add the new column, dual-write to both columns, backfill existing rows, switch reads behind a flag, verify data parity, and drop the old column only after no service reads or writes it.
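The ordering in that answer can be encoded as a small state machine so that nobody skips ahead, for example by dropping the old column before reads have moved. This is a sketch with hypothetical names: `users.name` is renamed to `users.display_name`. Unlike the step order above, it starts dual writes before the backfill. That ordering covers rows written mid-backfill without needing CDC. Either ordering works as long as no write falls into the gap. Every phase before `CONTRACT` can be rolled back by flipping flags, which is why contraction comes last.

```python
from enum import IntEnum


class Phase(IntEnum):
    """Ordered phases for renaming users.name -> users.display_name (hypothetical)."""
    EXPAND = 1        # add display_name as a nullable column
    DUAL_WRITE = 2    # every writer sets both columns
    BACKFILL = 3      # copy name -> display_name for existing rows
    READ_NEW = 4      # readers switch to display_name behind a flag
    VERIFIED = 5      # parity checks clean for an agreed window
    CONTRACT = 6      # stop writing name, then drop it


class RenameMigration:
    def __init__(self):
        self.phase = Phase.EXPAND

    def advance(self, mismatches: int = 0) -> Phase:
        nxt = Phase(self.phase + 1)  # raises ValueError when already at CONTRACT
        if nxt is Phase.VERIFIED and mismatches:
            raise RuntimeError(f"{mismatches} parity mismatches; not safe to verify")
        self.phase = nxt
        return nxt

    def roll_back(self) -> Phase:
        # Before CONTRACT the old column is intact, so stepping back is a flag flip.
        if self.phase is Phase.CONTRACT:
            raise RuntimeError("old column already removed; restore from backup instead")
        self.phase = Phase(max(self.phase - 1, Phase.EXPAND))
        return self.phase
```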

Key Takeaways

  • Expand before you contract.
  • Use dual writes and feature flags to isolate risk.
  • Make migrations idempotent and observable.
  • Validate every step with metrics before cleanup.
  • Always keep rollback paths ready.

Table of Comparison

| Migration Approach | Downtime Risk | Complexity | Data Safety | Best Use Case |
|---|---|---|---|---|
| Expand & Contract (Dual Write) | Very Low | High | Very High | Global-scale systems needing 24/7 uptime |
| Blue-Green Database Migration | Low | Medium | High | Systems that can run parallel stacks |
| Read-Only Window + Backfill | Medium | Low | Medium | Internal tools or admin systems |
| Big Bang Migration | High | Low | Low | Early-stage projects with small traffic |

FAQs

Q1. What does zero downtime migration mean?

It’s the process of modifying your database or schema while your application remains online. Users experience no disruption during the change.

Q2. Why use the expand and contract pattern?

It ensures backward compatibility between old and new code during the transition, minimizing data inconsistency and rollback complexity.

Q3. How do you handle large database backfills?

Break them into smaller chunks, use throttling, and leverage change data capture to sync new writes during the process.

Q4. Can zero downtime work for both SQL and NoSQL databases?

Yes. The principles stay the same: backward compatibility, incremental rollout, and verification. Only the tools and commands differ by database type.

Q5. How do you verify data after migration?

Use parity checks and shadow reads to compare results between old and new sources. Automate mismatch detection and alerting.

Q6. What are safe rollback strategies?

Keep old schema and dual writes active until verification is complete. If issues arise, switch reads back instantly using feature flags.

CONTRIBUTOR
Design Gurus Team
Copyright © 2025 Design Gurus, LLC. All rights reserved.