How do you do zero‑downtime migrations (DB + app coordination)?
Zero-downtime migration means changing a live database and application without user-visible impact. The approach revolves around the expand-and-contract pattern: first expand the schema so that both old and new code can run against it, then contract it once the new system is fully stable. The challenge is orchestrating database and application changes safely, with a rollback option at every stage.
Why It Matters
At high-availability companies such as Netflix or Amazon, downtime is not an option: their systems run 24/7, serving millions of requests per second worldwide, so every deployment must be seamless. In interviews, knowing how to handle such migrations shows you understand data consistency, backward compatibility, and safe deployment patterns in distributed systems.
How It Works (Step-by-Step)
Step 1. Prepare Safety Nets
- Define migration steps, rollbacks, and success criteria.
- Introduce feature flags for reads and writes separately.
- Set up monitoring for read/write latency, errors, and data parity.
- Validate backups and ensure point-in-time restore is possible.
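The separate read and write flags above can be sketched as independent switches. This is a minimal in-process sketch; the class and flag names are illustrative, and a real system would back this with a dynamic configuration service or a database table rather than a dict.

```python
# Minimal sketch of separate read/write feature flags for a migration.
# Flag names are illustrative; real systems use a dynamic config service.

class MigrationFlags:
    """Independent switches for each side of the migration."""

    def __init__(self):
        self.flags = {
            "dual_write_enabled": False,  # Step 4: write to old + new
            "read_from_new": False,       # Step 5: serve reads from new schema
        }

    def enable(self, name):
        self.flags[name] = True

    def disable(self, name):  # the instant-rollback path
        self.flags[name] = False

    def is_enabled(self, name):
        return self.flags.get(name, False)


flags = MigrationFlags()
flags.enable("dual_write_enabled")        # turn on dual writes first...
print(flags.is_enabled("read_from_new"))  # ...while reads stay on the old path
```

Keeping the two flags independent is what lets you roll back reads instantly without touching the write path.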
Step 2. Expand Schema (Backward Compatible)
- Add new columns or tables as nullable to prevent locks.
- Build indexes concurrently or online to avoid blocking writes.
- Avoid renaming columns directly; introduce new ones instead.
- For NoSQL, make code tolerant of both old and new data shapes.
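The expand phase above can be sketched with SQLite as a stand-in store (table and column names are hypothetical). Note the hedge in the comments: SQLite has no online index builds, so on PostgreSQL you would use `CREATE INDEX CONCURRENTLY` instead.

```python
import sqlite3

# Expand-phase sketch using SQLite as a stand-in database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, full_name TEXT)")

# Add the new column as nullable: existing rows need no rewrite, no lock.
conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")

# Instead of renaming full_name -> display_name, both columns coexist
# until the contract phase.
conn.execute("CREATE INDEX idx_users_display_name ON users (display_name)")
# PostgreSQL equivalent, to avoid blocking writes:
#   CREATE INDEX CONCURRENTLY idx_users_display_name ON users (display_name);

cols = [row[1] for row in conn.execute("PRAGMA table_info(users)")]
print(cols)  # ['id', 'full_name', 'display_name'] -- old and new side by side
```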
Step 3. Backfill Data Gradually
- Copy historical data in small chunks using background jobs.
- Use throttling or batching to avoid database overload.
- Monitor progress metrics like rows processed and failed counts.
- Use Change Data Capture (CDC) to sync new writes during backfill.
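A chunked backfill like the one described can be sketched as a loop over id ranges (names and batch size are illustrative; throttling between batches is omitted for brevity):

```python
import sqlite3

# Backfill sketch: copy full_name into the new display_name column in small
# id-keyed batches, so no single transaction holds locks for long.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, full_name TEXT, display_name TEXT)")
conn.executemany("INSERT INTO users (id, full_name) VALUES (?, ?)",
                 [(i, f"user-{i}") for i in range(1, 101)])

BATCH = 25
max_id = conn.execute("SELECT MAX(id) FROM users").fetchone()[0]
last_id, rows_processed = 0, 0
while last_id < max_id:
    cur = conn.execute(
        "UPDATE users SET display_name = full_name "
        "WHERE display_name IS NULL AND id > ? AND id <= ?",
        (last_id, last_id + BATCH))
    conn.commit()                    # one small transaction per batch
    rows_processed += cur.rowcount   # progress metric for monitoring
    last_id += BATCH

print(rows_processed)  # 100
```

The `display_name IS NULL` guard makes re-running the job safe: rows written by dual writes during the backfill are skipped rather than overwritten.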
Step 4. Enable Dual Writes
- Update both old and new structures for new transactions.
- Make writes idempotent so retries (for example, replayed messages from a queue) cannot create duplicates.
- Implement shadow reads for verification between old and new data.
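The dual-write step can be sketched as below; the two dicts stand in for the old and new data stores. Keying every write by a stable record id is what makes a replayed write overwrite instead of duplicate.

```python
# Dual-write sketch with idempotent upserts. Store names are illustrative.
old_store, new_store = {}, {}

def dual_write(record):
    rid = record["id"]       # stable key makes the write idempotent
    old_store[rid] = record  # old path remains the source of truth for now
    new_store[rid] = record  # a failure here should be queued for retry,
                             # not fail the user's request

dual_write({"id": 7, "name": "ada"})
dual_write({"id": 7, "name": "ada"})   # replayed message: no duplicate
print(len(old_store), len(new_store))  # 1 1
```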
Step 5. Switch Reads Gradually
- Start with a small canary rollout.
- Use shadow queries to verify results before full cutover.
- Monitor metrics to detect latency or consistency issues.
- Keep rollback switches ready in case of errors.
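A canary read switch like this is often implemented with deterministic bucketing: hash each user id into a 0-99 bucket and serve the new path only below the rollout percentage. This sketch assumes user-id strings; the function name is illustrative.

```python
import hashlib

# Deterministic bucketing keeps a given user on one read path as the
# rollout percentage grows, instead of flip-flopping per request.
def read_from_new_path(user_id: str, rollout_percent: int) -> bool:
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return bucket < rollout_percent

users = [f"user-{i}" for i in range(1000)]
canary = sum(read_from_new_path(u, 5) for u in users)    # roughly 5% of users
full = sum(read_from_new_path(u, 100) for u in users)    # everyone

# Rollback is the same dial turned back: set rollout_percent to 0.
```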
Step 6. Contract Schema
- After full cutover, remove unused tables, columns, and indexes.
- Clean up dual-write logic and feature flags.
- Archive migration logs and metrics for audit and rollback traceability.
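The contract step can be gated on evidence that the old path is truly idle. A minimal sketch, again with SQLite as a stand-in and an illustrative monitoring counter:

```python
import sqlite3

# Contract-phase sketch: drop the old table only after monitoring confirms
# no traffic has hit it. Table names and the counter are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE catalog_old (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE catalog_new (id INTEGER PRIMARY KEY)")

reads_on_old_last_7_days = 0  # from monitoring; must be zero before contract

if reads_on_old_last_7_days == 0:
    conn.execute("DROP TABLE catalog_old")  # irreversible: verify first

tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")]
print(tables)  # ['catalog_new']
```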
Real World Example
Netflix once refactored its catalog database to support richer metadata and faster search. They first created new tables and backfilled data from the old schema. Next, they dual wrote updates to both tables. When confidence was high, they gradually redirected reads to the new structure. Finally, the old tables were dropped once all clients had switched over. The migration completed without user-visible downtime.
Common Pitfalls or Trade-Offs
1. Dropping Too Early
Removing columns or tables before all reads switch can break existing code. Always expand first and contract last.
2. Non-Online Schema Changes
Avoid blocking alters that rewrite entire tables. Use online or concurrent operations.
3. Missing Idempotency
If dual writes are not idempotent, retries may duplicate data. Always design for replays.
4. No Verification Metrics
Without comparing old and new data paths, silent corruption can occur. Always use parity checks and mismatch counters.
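A parity check can be as simple as comparing records between the two paths and counting mismatches. In production this runs as sampled shadow reads with the mismatch counter exported to monitoring; the dicts below are stand-ins.

```python
# Parity-check sketch: find records where old and new paths disagree.
old_path = {1: "ada", 2: "grace", 3: "alan"}
new_path = {1: "ada", 2: "Grace", 3: "alan"}  # one divergent record

mismatches = [key for key in old_path
              if new_path.get(key) != old_path[key]]

print(len(mismatches))  # 1 -- this count feeds an alerting threshold
```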
5. Ignoring Caches and Indexes
When switching data sources, caches or search indexes can return stale data. Refresh or invalidate them during the cutover.
6. Permanent Feature Flags
Forgetting to remove temporary flags leads to configuration complexity. Schedule cleanup tasks post-migration.
Interview Tip
Interviewers often ask how you’d rename a column used by multiple services without downtime. A strong answer walks the expand-and-contract steps: add a new column, dual-write to both, switch reads gradually, verify data parity, then remove the old column once it is safe.
Key Takeaways
- Expand before you contract.
- Use dual writes and feature flags to isolate risk.
- Make migrations idempotent and observable.
- Validate every step with metrics before cleanup.
- Always keep rollback paths ready.
Comparison of Migration Approaches
| Migration Approach | Downtime Risk | Complexity | Data Safety | Best Use Case |
|---|---|---|---|---|
| Expand & Contract (Dual Write) | Very Low | High | Very High | Global-scale systems needing 24/7 uptime |
| Blue-Green Database Migration | Low | Medium | High | Systems that can run parallel stacks |
| Read-Only Window + Backfill | Medium | Low | Medium | Internal tools or admin systems |
| Big Bang Migration | High | Low | Low | Early-stage projects with small traffic |
FAQs
Q1. What does zero downtime migration mean?
It’s the process of modifying your database or schema while your application remains online. Users experience no disruption during the change.
Q2. Why use the expand and contract pattern?
It ensures backward compatibility between old and new code during the transition, minimizing data inconsistency and rollback complexity.
Q3. How do you handle large database backfills?
Break them into smaller chunks, use throttling, and leverage change data capture to sync new writes during the process.
Q4. Can zero downtime work for both SQL and NoSQL databases?
Yes. The principles remain the same—compatibility, incremental rollout, and verification. The tools and commands differ per database type.
Q5. How do you verify data after migration?
Use parity checks and shadow reads to compare results between old and new sources. Automate mismatch detection and alerting.
Q6. What are safe rollback strategies?
Keep old schema and dual writes active until verification is complete. If issues arise, switch reads back instantly using feature flags.
Further Learning
Enhance your understanding of safe migrations and scalable systems with these expert-designed courses:
- Grokking Scalable Systems for Interviews – Learn advanced rollout, data replication, and zero downtime techniques.
- Grokking System Design Fundamentals – Build the foundational knowledge of scalability, caching, and distributed systems required for mastering migrations.