Decoupling Data: How to Split a Monolith Database

Arslan Ahmad
Learn how to migrate monolithic architectures to microservices without system downtime using the Strangler Fig pattern and dual-write strategies.
On this page

Understanding the Core Problem

The Trap of the "Big Bang" Rewrite

The Incremental Approach: The Strangler Fig Pattern

How the Proxy Works

The Hardest Part: Migrating Data

The Dual Write Strategy

Dark Launching and Canary Deployments

Dark Launching

Canary Deployments

Managing Distributed Transactions

Eventual Consistency

The Role of Observability

Handling Dependencies and Shared Libraries

The Human Element of Migration

Conclusion

Software systems inevitably become difficult to manage as they grow in size and complexity.

Codebases that started as clean and organized projects often turn into tightly coupled structures where changing one feature breaks another.

This natural progression leads engineering teams to consider moving from a single, unified codebase to a distributed architecture. The challenge lies in executing this transition without stopping the service or disrupting the experience for active users.

Refactoring a massive system while it handles live traffic is one of the most difficult tasks in software engineering. It requires a deep understanding of architecture, data consistency, and risk management.

This process is often a focal point in system design interviews because it tests a candidate's ability to prioritize availability and stability over speed.

Understanding the mechanics of zero-downtime migration is essential for anyone looking to work on large-scale applications.

Understanding the Core Problem

The primary issue with legacy systems is often their monolithic architecture.

In a monolith, all functional components of an application reside in a single deployable unit. The user interface, business logic, and data access layers are woven together. While this makes development easy in the early stages, it becomes a bottleneck as the team and traffic grow.

When a monolith grows too large, deployments slow down and every change carries system-wide risk.

A small bug in the payment processing code might crash the entire website, including the login page and the product catalog. To solve this, teams migrate to microservices.

This architecture breaks the application into small, independent services. Each service handles a specific domain, such as payments, users, or notifications.

The migration problem is not about the end state but the journey. A team cannot simply shut down the application for three months to rewrite the code.

The business would lose customers and revenue. The system must remain operational and stable throughout the transition. This constraint forces engineers to adopt incremental strategies rather than attempting a total rewrite.

The Trap of the "Big Bang" Rewrite

A common mistake is believing that the best solution is to build a brand new system from scratch alongside the old one and switch them on launch day.

This is known as a "Big Bang" rewrite. History shows this approach almost always fails.

Building a new system takes time. While developers are writing the new version, the old version is still being updated with new features and bug fixes to keep the business running.

The new system is constantly playing catch-up. By the time the new system is ready, it is often missing critical logic that was added to the old system during development.

Furthermore, switching all traffic at once introduces a massive risk of catastrophic failure.

If the new system has a critical flaw, the entire platform goes down.

The Incremental Approach: The Strangler Fig Pattern

The industry standard for safe migration is the Strangler Fig Pattern. This pattern involves gradually replacing specific functionalities of the legacy system with new applications and services. The new system eventually grows until the old system can be decommissioned.

This process relies heavily on a component called an API Gateway or a Proxy. This is a server that sits between the client (the user's browser or mobile app) and the backend services.

How the Proxy Works

The proxy acts as a router. In the beginning, the proxy sends all requests to the legacy monolith. When the engineering team is ready to migrate a specific feature, such as the "User Profile," they build a new microservice just for that feature.

Once the new service is ready, the team updates the proxy configuration. The proxy is told to look at the incoming network requests.

If a request is for the User Profile, the proxy routes it to the new microservice. If the request is for anything else, the proxy continues to route it to the legacy monolith.

This allows the team to migrate the system piece by piece.


If the new User Profile service fails, the proxy can be instantly updated to route traffic back to the legacy system. This capability minimizes risk and ensures that a single error does not take down the entire application.
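To make the routing decision concrete, here is a minimal sketch in Python. The upstream URLs and the MIGRATED_PREFIXES table are hypothetical; in practice this logic usually lives in an API gateway or reverse-proxy configuration rather than in application code.

```python
# Minimal sketch of strangler-fig routing at the proxy layer.
# All URLs and path prefixes below are hypothetical.

LEGACY_MONOLITH = "http://legacy-monolith.internal"

# Paths that have already been carved out into new microservices.
MIGRATED_PREFIXES = {
    "/users/profile": "http://user-profile-service.internal",
}

def route(path: str) -> str:
    """Return the upstream that should handle this request path."""
    for prefix, upstream in MIGRATED_PREFIXES.items():
        if path.startswith(prefix):
            return upstream
    # Everything not yet migrated still goes to the monolith.
    return LEGACY_MONOLITH
```

Rolling back is equally simple: deleting the entry for a misbehaving service sends its traffic straight back to the monolith.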

The Hardest Part: Migrating Data

Moving code is relatively straightforward compared to moving data.

In a monolith, there is usually a single, massive database.

All parts of the code read and write to this shared data source.

Microservices, however, usually require their own private databases to ensure they are decoupled.

Separating the data is where most migrations face complications. You cannot simply copy the database to a new location because the data changes every second. By the time the copy finishes, the live data has already changed, and the copy is outdated.

The Dual Write Strategy

To solve the data migration problem, engineers use a strategy called Dual Write.

This process ensures that data exists in both the old database and the new database in real time.

  1. Insert the Interceptor: The team modifies the legacy code. Whenever the legacy system writes data to the old database, it also sends a copy of that data to the new microservice or database.

  2. Backfill Historic Data: At this point, new data is entering both systems, but the new database is missing all the old records. The team runs a background script to copy historical data from the old database to the new one.

  3. Verification: This is a critical step. The system is not switched over yet. Instead, a verification process runs to compare the records in both databases. It checks if the data matches perfectly.

  4. Change the Read Path: Once the team is confident that the data is synchronized and correct, they update the proxy or the application code to read from the new database. The write operations still go to both places to ensure safety.

This method allows for a rollback at any moment.


If the new database performs poorly, the system can instantly revert to reading from the old database, which has been kept up to date via the dual write process.
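As a rough illustration of steps 1 and 3, here is a minimal sketch of a dual-write interceptor and a verification pass, assuming hypothetical old_db, new_service, and record_mismatch objects. The key design choice is that a failed write to the new system never fails the user's request; the verification job reconciles differences later.

```python
import logging

logger = logging.getLogger("migration")

def save_user_profile(profile, old_db, new_service):
    # Step 1: the old database remains the source of truth.
    old_db.save(profile)

    # Mirror the write to the new system. A failure here must never
    # break the user's request; it is logged and repaired later.
    try:
        new_service.save(profile)
    except Exception:
        logger.exception("dual write to new service failed")

def verify_sample(sample_ids, old_db, new_service, record_mismatch):
    # Step 3: compare a sample of records in both stores before any
    # read traffic is switched over to the new database.
    for record_id in sample_ids:
        old, new = old_db.load(record_id), new_service.load(record_id)
        if old != new:
            record_mismatch(record_id, old, new)
```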

Dark Launching and Canary Deployments

Even with the code moved and the data synchronized, releasing the new system to 100% of users immediately is risky.

The new service might have performance issues that only appear under heavy load.

To mitigate this, engineers use Dark Launching and Canary Deployments.

Dark Launching

Dark launching involves deploying the new service and having it process requests, but without showing the results to the user.

For example, when a user searches for a product, the system sends the search query to both the old search engine and the new search microservice.

The application returns the results from the old engine to the user.

However, in the background, the engineers capture the results from the new service and compare them to the old ones. This allows them to test the accuracy and performance of the new system with real production traffic without the user knowing.
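A minimal sketch of that shadowing logic might look like the following, with hypothetical legacy_search, new_search, and record_mismatch callables standing in for the real clients.

```python
import threading

def search(query, legacy_search, new_search, record_mismatch):
    # The user always gets the legacy engine's answer.
    legacy_results = legacy_search(query)

    def shadow():
        try:
            new_results = new_search(query)
            if new_results != legacy_results:
                record_mismatch(query, legacy_results, new_results)
        except Exception:
            pass  # a shadow failure must never affect the user

    # Fire-and-forget: the user never waits on the new service.
    threading.Thread(target=shadow, daemon=True).start()
    return legacy_results
```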

Canary Deployments

Once the new service is proven to work in the dark, the team begins a Canary Deployment.

This technique involves routing a small percentage of real user traffic to the new service.

The team might configure the proxy to send 1% of all requests to the new microservice. The remaining 99% go to the legacy system.

The team monitors the 1% closely. They look for error rates, slow response times, or crashes.

If the metrics look good, they increase the traffic to 5%, then 10%, and eventually 100%.

If errors occur at the 1% level, only a tiny fraction of users are affected.

The team can quickly route that 1% back to the legacy system and fix the bug.

This controlled rollout significantly reduces the blast radius of any potential errors.
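Here is a minimal sketch of how that percentage split might be computed, with hypothetical upstream URLs. Deriving the bucket from a stable user ID, rather than choosing randomly on every request, keeps each user pinned to one system during the rollout.

```python
import hashlib

CANARY_PERCENT = 1  # raise to 5, 10, ... 100 as confidence grows

def choose_upstream(user_id: str) -> str:
    # Hash the user ID into a stable bucket in the range 0-99.
    digest = hashlib.sha256(user_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100

    if bucket < CANARY_PERCENT:
        return "http://new-service.internal"
    return "http://legacy-monolith.internal"
```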

Managing Distributed Transactions

In a monolith, saving data across different tables is easy because of database transactions. A transaction ensures that either all data is saved correctly, or none of it is.

For example, when a user places an order, the system must deduct stock from the inventory table and add an entry to the order history table. If the database fails halfway through, the transaction rolls back, and no data is corrupted.
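As a minimal sketch, here is that atomic unit of work expressed with Python's sqlite3 module and hypothetical table names.

```python
import sqlite3

def place_order(conn: sqlite3.Connection, user_id: int, item_id: int):
    # The `with` block commits if both statements succeed and rolls
    # back if either raises, so the data is never half-updated.
    with conn:
        conn.execute(
            "UPDATE inventory SET stock = stock - 1 WHERE item_id = ?",
            (item_id,),
        )
        conn.execute(
            "INSERT INTO order_history (user_id, item_id) VALUES (?, ?)",
            (user_id, item_id),
        )
```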

In a microservices architecture, the inventory lives in one database and the order history lives in another. A single database transaction cannot span across two different physical servers. This introduces the problem of distributed transactions.

If the order service saves the order, but the inventory service fails to deduct the stock, the data becomes inconsistent. The system thinks an item was sold, but the inventory count does not reflect it.

Eventual Consistency

To handle this, distributed systems rely on Eventual Consistency. This concept accepts that the system may be inconsistent for a brief window, often just milliseconds, but will eventually converge to a consistent state.

A common pattern to implement this is the Saga Pattern.

Instead of one big transaction, the process is broken down into a series of local transactions.

  1. The Order Service creates the order and publishes an event called "OrderCreated."
  2. The Inventory Service listens for this event. When it receives it, it deducts the stock.
  3. If the Inventory Service fails to deduct stock, it publishes an "OutOfStock" event.
  4. The Order Service listens for this event and executes a compensating transaction, such as cancelling the order.

This approach is more complex than a monolithic transaction but is necessary for the system to scale and remain decoupled. It requires a shift in mindset from expecting immediate consistency to designing for failure and recovery.
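Here is a minimal, self-contained sketch of that saga, using an in-memory event bus as a stand-in for a real message broker such as Kafka or RabbitMQ. All names and data structures are hypothetical.

```python
# In-memory event bus standing in for a real message broker.
subscribers = {}

def subscribe(event_type, handler):
    subscribers.setdefault(event_type, []).append(handler)

def publish(event_type, payload):
    for handler in subscribers.get(event_type, []):
        handler(payload)

orders = {}           # Order Service's local database
stock = {"book": 0}   # Inventory Service's local database

# Order Service: local transaction 1, then announce the event.
def create_order(order):
    orders[order["id"]] = dict(order, status="PENDING")
    publish("OrderCreated", order)

# Inventory Service: local transaction 2, or a failure event.
def on_order_created(order):
    if stock.get(order["item"], 0) > 0:
        stock[order["item"]] -= 1
    else:
        publish("OutOfStock", order)

# Order Service: the compensating transaction cancels the order.
def on_out_of_stock(order):
    orders[order["id"]]["status"] = "CANCELLED"

subscribe("OrderCreated", on_order_created)
subscribe("OutOfStock", on_out_of_stock)

create_order({"id": 1, "item": "book"})
print(orders[1]["status"])  # CANCELLED: the stock was never there
```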

The Role of Observability

When a system moves from a single application to many distributed services, debugging becomes much harder.

In a monolith, you can follow a request by looking at a single log file.

In a microservices architecture, a single user request might hop through five different services.

Migration cannot succeed without Observability. This goes beyond simple logging. It involves three pillars:

  1. Logs: Detailed records of discrete events (e.g., "Database connection failed").
  2. Metrics: Aggregated numerical data over time (e.g., "Average CPU usage is 40%").
  3. Tracing: The ability to track a single request as it travels across different services.

Distributed tracing assigns a unique ID to every request that enters the system. This ID is passed along to every service that handles the request.

If a user reports an error, an engineer can search for that ID and see exactly which service failed and why.
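A minimal sketch of that propagation, using a generic X-Request-ID header, might look like this. Real deployments typically rely on the W3C Trace Context standard via a library such as OpenTelemetry, and the http client below is a hypothetical stand-in.

```python
import uuid

def incoming_trace_id(headers: dict) -> str:
    # Reuse the caller's ID, or mint a new one at the system's edge.
    return headers.get("X-Request-ID") or uuid.uuid4().hex

def call_downstream(http, url: str, trace_id: str):
    # Every hop forwards the same ID and logs it with every event, so
    # one search in the log store reconstructs the request's full path.
    return http.get(url, headers={"X-Request-ID": trace_id})
```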

Without this visibility, fixing bugs during a migration is like trying to find a needle in a haystack in the dark.

Handling Dependencies and Shared Libraries

A hidden challenge in migration is dealing with shared code.

In a monolith, it is common to have a "Utils" or "Common" folder that contains helper functions used by every part of the application.

When extracting a service, developers often feel the urge to copy this shared code into the new service or create a shared library that all microservices import.

While this seems efficient, it can reintroduce coupling. If the shared library is updated, every single microservice might need to be rebuilt and redeployed to use the new version.

A better approach during migration is to tolerate some code duplication. It is often safer to copy a specific helper function into the new microservice than to create a rigid dependency on a shared library.

This allows the new microservice to evolve independently without fear of breaking other services when a shared library changes.


The Human Element of Migration

Technical challenges are only half the battle.

Migrations are also a management and cultural challenge. Engineers generally prefer writing new code to refactoring old code.

Migration work is often invisible to the business stakeholders because it does not immediately add new features for customers.

To sustain a migration, the team must communicate the value in terms of reliability and speed. The goal is not just "clean code," but the ability to ship features faster in the future.

Furthermore, the team must avoid "Migration Fatigue."

Migrations can take years.

If the team focuses 100% on migration, they will burn out, and the product will stagnate. The most successful teams balance migration work with feature work. They might dedicate 20% of their sprint capacity to the migration while spending 80% on new business value.

This keeps the stakeholders happy and keeps the engineering team motivated.

Conclusion

Migrating a legacy system is a test of patience, discipline, and architectural knowledge. It is not about finding a magic tool that converts code automatically. It is about reducing risk through incremental steps.

By avoiding the big bang rewrite and embracing patterns like Strangler Fig and Dual Write, teams can modernize their architecture without impacting the user experience.

Here are the key takeaways for a safe system migration:

  • Avoid Big Bang Rewrites: Never attempt to rewrite the entire system at once. It increases risk and pauses feature development.

  • Use the Strangler Fig Pattern: Build new services around the edges of the old system and gradually route traffic to them.

  • Leverage a Proxy: Use an API Gateway to control traffic flow between the monolith and new microservices.

  • Implement Dual Writes: Keep data consistent by writing to both old and new databases during the transition.

  • Verify Before Switching: Run background verification processes to ensure data integrity before changing the read path.

  • Deploy with Canaries: Roll out changes to a small percentage of users first to limit the impact of potential bugs.

  • Prioritize Observability: Implement distributed tracing to track requests across the new and old systems.

  • Balance the Work: Mix migration tasks with new feature development to maintain business momentum and team morale.
