How do you design DDoS protection (L3/4/7) with scrubbing centers?
DDoS protection at L3, L4, and L7 with scrubbing centers is a classic system design interview topic because it touches networking, distributed systems, routing, and application security in a single design. At a high level, you want to detect malicious traffic at the edge, divert it to powerful scrubbing centers that can absorb and clean the flood, then send only valid traffic back to your origin service with minimal extra latency.
Think of scrubbing centers as giant shock absorbers for your traffic. When an attack starts, traffic is routed through these centers where filters, rate limits, protocol checks, and application logic separate bots from real users. Getting this architecture right is a strong signal that you can design scalable and resilient internet facing systems.
Why It Matters
For real production systems, especially global products such as streaming platforms, social networks, and e-commerce sites, DDoS attacks are not hypothetical. They can:
- Exceed your network bandwidth and saturate links
- Exhaust kernel resources with packet floods
- Overwhelm load balancers and reverse proxies
- Take down critical endpoints such as login, checkout, or public APIs
In a system design interview, DDoS protection is a chance to show:
- Awareness of the full stack from L3 ingress to L7 logic
- Ability to design for internet scale adversarial traffic
- Trade-off reasoning between cost, latency, and protection strength
- How to integrate external DDoS providers and scrubbing centers into your architecture
If you can explain multi-layer DDoS protection clearly, with L3 volumetric controls, L4 transport defenses, and L7 application-level filtering, you demonstrate a strong understanding of scalable architecture and real-world reliability concerns.
How It Works Step by Step
At a conceptual level, the flow is: detect, divert, scrub, return, observe. Let us break this down into practical, interview-friendly steps.
Step 1 Understand DDoS types and OSI layers
You should start by naming attack types and where they hit:
- L3 volumetric attacks
  - Flood the network with packets toward your IP range
  - Goal is to saturate bandwidth or routers
- L4 transport attacks
  - SYN floods, UDP floods, connection exhaustion
  - Goal is to exhaust connection tables and stateful resources
- L7 application attacks
  - Fake logins, search requests, expensive API calls
  - Often low bandwidth but high resource cost per request
This classification naturally leads to layered controls at each level.
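To make the classification concrete, the small sketch below pairs each attack class with the layer it hits and a representative first-line control. The names and pairings are illustrative, drawn from the list above rather than from any specific product.

```python
# Illustrative taxonomy: attack class -> (layer it hits, representative first-line control).
# The pairings mirror the classification above; real systems combine several controls per class.
ATTACK_TAXONOMY = {
    "volumetric_flood":    ("L3", "anycast spreading + scrubbing-center absorption"),
    "syn_flood":           ("L4", "SYN cookies + connection rate limits"),
    "udp_flood":           ("L4", "protocol validation + per-source packet limits"),
    "expensive_api_abuse": ("L7", "WAF rules + per-endpoint rate limiting"),
}

for attack, (layer, control) in ATTACK_TAXONOMY.items():
    print(f"{attack:<22} {layer}  ->  {control}")
```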
Step 2 Front your service with global anycast and a DDoS provider
Rather than exposing your origin IPs to the public internet, you front them with:
- Anycast IP addresses served by a DDoS provider or CDN
- Edge points of presence spread across regions
- Optional on-premises appliances near your data centers
Normal user traffic flows from the client to the closest edge, then to the origin. During an attack, the same anycast address helps spread traffic across many locations rather than concentrating it in one data center.
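Fronting only helps if attackers cannot reach the origin directly, so the origin firewall or ingress proxy typically accepts traffic only from the provider's published edge ranges. Here is a minimal sketch of that check, assuming hypothetical placeholder ranges; real deployments pull the provider's IP list and refresh it periodically.

```python
import ipaddress

# Hypothetical edge ranges for illustration; real lists come from the provider.
EDGE_RANGES = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

def is_from_edge(source_ip: str) -> bool:
    """True if the connection arrived through the fronting provider's edge."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in EDGE_RANGES)

# At the origin: drop anything that bypassed the edge.
for src in ["203.0.113.45", "192.0.2.10"]:
    print(src, "allowed" if is_from_edge(src) else "dropped")
```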
Step 3 Continuous monitoring and anomaly detection
Your provider and your own network stack watch for:
- Sudden jumps in packets per second or bits per second per IP or per prefix
- Surges in SYNs without corresponding ACKs
- Sudden traffic concentration from a region or ASN
- Abnormal L7 patterns such as many login attempts per IP or per account
Thresholds, heuristic rules, and sometimes machine learning models trigger an alert or an automatic protection mode. For the interview, it is enough to say that flow logs and sampled packets feed a detection engine that classifies traffic as a likely attack or legitimate.
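As a rough illustration of such a detection engine, the sketch below applies fixed thresholds to aggregated flow counters. The thresholds, field names, and classes are assumptions for the example; production detectors learn per-prefix baselines from history instead of hard-coding numbers.

```python
from dataclasses import dataclass

@dataclass
class FlowSample:
    """Aggregated counters for one prefix over a short window, e.g. from flow logs."""
    packets_per_second: float
    bits_per_second: float
    syn_count: int
    ack_count: int

# Illustrative fixed thresholds; real detectors derive baselines per prefix.
PPS_LIMIT = 1_000_000
BPS_LIMIT = 5_000_000_000
SYN_ACK_RATIO_LIMIT = 3.0

def classify(sample: FlowSample) -> str:
    if sample.packets_per_second > PPS_LIMIT or sample.bits_per_second > BPS_LIMIT:
        return "likely_volumetric_attack"   # L3 signal: raw volume
    if sample.syn_count > SYN_ACK_RATIO_LIMIT * max(sample.ack_count, 1):
        return "likely_syn_flood"           # L4 signal: many SYNs without matching ACKs
    return "normal"

print(classify(FlowSample(2_500_000, 8e9, 10_000, 9_500)))  # likely_volumetric_attack
print(classify(FlowSample(50_000, 4e8, 40_000, 2_000)))     # likely_syn_flood
```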
Step 4 Divert traffic to scrubbing centers
Once an attack is detected, traffic is diverted through scrubbing centers. Common mechanisms:
- BGP diversion
  - Your provider or your own network advertises your IP prefix from scrubbing centers
  - Internet routes DDoS traffic to those centers instead of to your origin
- GRE or IP tunnel back to origin
  - Clean traffic is encapsulated and tunneled to your real data center
- DNS-based redirection for application-level protection
  - For some cases, DNS records can be adjusted to point to scrubbing farms
From the interview perspective, call out that BGP-based diversion is the standard mechanism for L3 and L4 protection at scale, because it works at the routing level and does not require per-client changes.
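Diversion itself is usually automated so that mitigation starts within seconds of detection. The sketch below shows what a trigger might look like against a hypothetical provider HTTP API; the endpoint, request fields, and `divert_prefix` helper are invented for illustration and are not any real vendor's interface.

```python
import requests  # assumes the provider exposes an HTTP API; everything below is hypothetical

SCRUBBING_API = "https://api.scrubbing-provider.example/v1/diversions"

def divert_prefix(prefix: str, api_token: str) -> None:
    """Ask the provider to start advertising `prefix` from its scrubbing centers (BGP diversion)."""
    resp = requests.post(
        SCRUBBING_API,
        headers={"Authorization": f"Bearer {api_token}"},
        json={
            "prefix": prefix,             # e.g. "198.51.100.0/24", your origin prefix
            "mode": "bgp",                # announce from scrubbing POPs
            "return_path": "gre-tunnel",  # clean traffic comes back over a pre-built tunnel
        },
        timeout=10,
    )
    resp.raise_for_status()

# Typically called by the detection pipeline once an attack on this prefix is confirmed:
# divert_prefix("198.51.100.0/24", api_token="...")
```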
Step 5 Scrub traffic at L3 and L4
Inside the scrubbing center, high-capacity devices or software clusters apply strict but efficient rules. Typical controls:
- Stateless packet filters based on source IP ranges, ports, and malformed headers
- Rate limits per source IP or per subnet
- Connection validation such as SYN cookies and aggressive timeouts
- Protocol correctness checks to drop abnormal TCP or UDP patterns
The goal is to cut down the majority of clearly malicious flows with simple, cheap checks, so deeper L7 inspection can focus on the remaining subset.
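Here is a toy version of that first pass, assuming made-up port lists and per-source budgets; real scrubbing gear enforces these checks in hardware or kernel-bypass datapaths, not in Python.

```python
import time
from collections import defaultdict

MAX_PACKETS_PER_SECOND = 5_000        # illustrative per-source budget
BLOCKED_PORTS = {0, 19, 1900}         # example: ports the origin never serves
counters = defaultdict(lambda: [0, time.monotonic()])  # src_ip -> [count, window_start]

def keep_packet(src_ip: str, dst_port: int, header_len: int) -> bool:
    """Cheap stateless checks first, then a per-source one-second rate limit."""
    if dst_port in BLOCKED_PORTS or header_len < 20:    # malformed or unwanted traffic
        return False
    count, window_start = counters[src_ip]
    now = time.monotonic()
    if now - window_start >= 1.0:                       # start a new one-second window
        counters[src_ip] = [1, now]
        return True
    if count >= MAX_PACKETS_PER_SECOND:                 # over budget: drop
        return False
    counters[src_ip][0] += 1
    return True

print(keep_packet("198.51.100.7", 443, 40))   # True: normal HTTPS-bound packet
print(keep_packet("198.51.100.7", 1900, 40))  # False: blocked port
```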
Step 6 Scrub traffic at L7
For application layer DDoS and API abuse, the scrubbing center can integrate:
- Web application firewall rules based on paths, headers, and parameters
- CAPTCHA or challenge-response checks for suspicious clients
- Bot detection using fingerprints, cookies, and JavaScript challenges
- Request rate limits per user, per IP, per token, or per account
- Prioritization of authenticated users or critical API paths
Here you tie in business-specific knowledge, for example protecting checkout endpoints more strictly than a public marketing page, or placing special limits on expensive search or recommendation APIs.
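A minimal sketch of per-path rate limiting with stricter budgets on sensitive endpoints follows. The paths, limits, and sliding-window approach are illustrative assumptions; in practice this logic lives in the provider's edge rules or your own API gateway.

```python
import time
from collections import defaultdict

PATH_LIMITS = {"/login": 5, "/checkout": 10, "/search": 30}  # requests/minute, illustrative
DEFAULT_LIMIT = 120
request_times = defaultdict(list)  # (client_id, path) -> timestamps seen in the last minute

def allow_request(client_id: str, path: str) -> bool:
    """Sliding one-minute window per (client, path); over-limit clients get challenged or rejected."""
    limit = PATH_LIMITS.get(path, DEFAULT_LIMIT)
    now = time.monotonic()
    window = request_times[(client_id, path)]
    window[:] = [t for t in window if now - t < 60]  # keep only the last minute
    if len(window) >= limit:
        return False  # a real system would return 429 or serve a CAPTCHA here
    window.append(now)
    return True

for _ in range(7):
    print(allow_request("client-42", "/login"))  # sixth and seventh attempts are rejected
```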
Step 7 Send clean traffic back to origin
After scrubbing, valid traffic needs to reach your actual service with minimal disruption:
- Scrubbing center encapsulates packets into a tunnel to your data center
- Your edge routers terminate the tunnel and forward packets into your private network
- Load balancers and application servers process these as normal requests
Some designs keep scrubbing always in the path; others only divert during an active attack. Always-on protection simplifies routing and makes behavior predictable, but may add extra latency and cost. On-demand diversion saves cost but requires fast automation and good runbooks.
Step 8 Observability, controls, and human response
Finally, a DDoS design is incomplete without visibility and controls:
- Dashboards for traffic levels at each layer
- Attack summaries by vector, source, and duration
- Controls to adjust thresholds or manually block or allow segments
- Playbooks for SRE and security teams
In an interview, mention that you would design good metrics such as clean traffic rate, false positive rate, and time to mitigation, and you would integrate alerts with incident response tooling.
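As one way to make those metrics concrete, here is a small sketch that turns per-attack counters into the numbers you would put on a dashboard. The counter names and the false-positive definition are assumptions for the example.

```python
from dataclasses import dataclass

@dataclass
class AttackWindow:
    """Per-attack counters exported by the scrubbing path (field names are illustrative)."""
    detected_at: float            # epoch seconds when detection fired
    mitigated_at: float           # epoch seconds when drop rates stabilized
    total_requests: int
    dropped_requests: int
    blocked_legit_requests: int   # confirmed false positives, e.g. from support tickets

def summarize(w: AttackWindow) -> dict:
    clean = w.total_requests - w.dropped_requests
    return {
        "time_to_mitigation_s": w.mitigated_at - w.detected_at,
        "clean_traffic_rate": clean / w.total_requests,
        "false_positive_rate": w.blocked_legit_requests / max(w.dropped_requests, 1),
    }

print(summarize(AttackWindow(1_000.0, 1_180.0, 2_000_000, 1_600_000, 800)))
```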
Real World Example
Imagine protecting a social media platform similar to Instagram. Critical entry points are:
- Feed and profile pages
- Media upload APIs
- Login and signup flows
- Public content viewing endpoints used by web and mobile apps
A realistic DDoS design could look like this:
- All traffic goes through a global CDN or DDoS provider with anycast IPs
- Static media such as images and videos are cached heavily at the edge, which naturally absorbs a lot of GET floods
- Dynamic APIs for login, posting, and feed queries pass through L7 inspection and rate limiting rules at the provider or at your own API gateway
- When a volumetric attack is detected, the provider automatically diverts that IP range through scrubbing centers, drops most malicious packets, and tunnels valid ones to the origin regions
- Internally, each region has local rate limiters, circuit breakers, and back pressure to stop remaining malicious or abusive requests from taking down stateful components like databases or message queues
From a system design interview angle, you can sketch this as concentric layers of defense around core services. You can highlight that for mobile users you care a lot about latency, so you prefer to keep scrubbing near the user through many edge locations.
Common Pitfalls and Trade-offs
Some typical pitfalls and trade-offs to mention in interviews:
- Over-reliance on a CDN alone
  - CDNs help with L7 and caching, but pure volumetric L3 floods can still saturate links without proper scrubbing centers and BGP diversion
- Ignoring application layer attacks
  - Many candidates only talk about bandwidth floods and forget that a modest number of costly API calls can take down a database cluster
- False positives harming real users
  - Aggressive rules might block enterprise customers coming from carrier NATs or shared proxies
  - Trade-off between safety and friction for real clients
- No protection for internal service-to-service paths
  - If your internal network or management APIs are exposed on the internet, they also need some protective controls or strict network separation
- Lack of capacity planning with providers
  - Scrubbing centers and upstream peers need to be sized to absorb worst-case attacks for your traffic profile
- Missing layered defense
  - A strong design uses multiple layers at L3, L4, and L7, plus application-level back pressure, not just a single filter
The key message for interviews is that DDoS protection is about balance. You balance cost with risk, latency with security, and automated defenses with human oversight.
Interview Tip
A common interview pattern is:
Suppose your product uses a single region with a load balancer front door. How would you evolve this to protect against massive DDoS attacks while keeping latency low for a global user base?
A strong answer:
- Starts with a simple baseline and evolves the design step by step
- Introduces a global DDoS provider and anycast addresses
- Explains BGP diversion and scrubbing centers for volumetric attacks
- Adds L7 controls such as rate limiting and CAPTCHAs for abuse on login and search APIs
- Mentions trade offs, for example always on versus on demand scrubbing, cost considerations, and impact on user experience
If you connect this to other system design interview topics such as API gateways, rate limiters, and multi-region failover, you show that you think like an architect, not just an implementer of one feature.
Key Takeaways
- DDoS attacks span L3, L4, and L7, so you need layered defenses at the network, transport, and application layers
- Scrubbing centers absorb and filter large traffic floods, then send only valid traffic back to your origin through tunnels
- BGP diversion and anycast IPs are key tools to route attack traffic into scrubbing centers at internet scale
- Application layer protections such as WAF rules, CAPTCHAs, and smart rate limiting are essential for modern API-heavy products
- Good observability and playbooks are part of the design, not an afterthought, especially for real production operations
Comparison Table
| Approach | Protected layers | Scalability | Latency impact | Typical use case |
|---|---|---|---|---|
| On-premises firewall or load balancer only | Mostly L4 and some L7 | Limited by your data center bandwidth and hardware | Low for normal traffic but may fail under large attacks | Small to medium services with modest attack surface |
| CDN or WAF without scrubbing centers | L7 plus partial L4 filtering | Better than on premise but still vulnerable to large L3 floods | Small added latency through edge nodes | Web heavy workloads where caching and WAF are the primary defenses |
| DDoS provider with global scrubbing centers | L3 plus L4 and L7 layered inspection | Can absorb very high bandwidth and packet rates when sized correctly | Slightly higher but consistent latency due to scrubbing path | Large global platforms that must stay online during intense attacks |
FAQs
Q1. What is a scrubbing center in DDoS protection?
A scrubbing center is a high-capacity facility provided by a network or security vendor where incoming traffic is redirected during a DDoS attack. The center filters malicious packets at L3 and L4, applies application-level checks at L7, and then forwards only clean traffic to your origin servers, often through secure tunnels.
Q2. How is traffic redirected to a DDoS scrubbing center during an attack?
Most providers use BGP diversion. They start advertising your IP prefix from their scrubbing centers so that internet routers send traffic there instead of directly to your data center. After scrubbing, valid traffic is tunneled back to your origin. In some cases, DNS changes or gateway reconfiguration are used for application level redirection.
Q3. What is the difference between L3, L4, and L7 DDoS attacks?
L3 attacks target raw network bandwidth, usually with packet floods. L4 attacks target transport protocols such as TCP and UDP, for example with SYN floods or connection exhaustion. L7 attacks hit the application logic, such as login or search endpoints, with many expensive requests that can overload databases or compute services.
Q4. Do I still need rate limiting if I use a DDoS scrubbing provider?
Yes. Scrubbing providers handle large-scale attacks and obvious bots, but you still need fine-grained rate limiting inside your application stack to protect specific endpoints, tenants, or users. Internal rate limits can enforce business rules, such as per-user request budgets, that a generic provider cannot easily know.
Q5. How does DDoS protection impact latency for real users?
Scrubbing introduces an extra hop because traffic flows through the provider before reaching your origin. With a global edge network, this extra latency is usually small, but it can be noticeable for very latency-sensitive applications. This is why many designs combine global edge locations with intelligent routing so that clean traffic travels the shortest reasonable path.
Q6. How should I explain DDoS protection with scrubbing centers in a system design interview?
Start with the requirements and threat model, then show a layered architecture: anycast edge, DDoS provider, scrubbing centers, tunnels to origin, and application level rate limiting. Mention how detection, diversion, and scrubbing work, and highlight trade offs between cost, latency, and level of protection.
Further Learning
If you want to deepen your understanding of internet-facing architectures, traffic management, and end-to-end system design for interviews, a structured path helps a lot.
You can build strong fundamentals around networks, load balancers, and security focused design patterns with Grokking System Design Fundamentals. It walks through core concepts that appear again and again in DDoS and resilience discussions, such as replication, partitioning, and caching.
Once you are comfortable with the basics, you can level up with real interview style deep dives in Grokking the System Design Interview. It covers end to end designs for large scale services where DDoS protection, global traffic routing, and high availability strategies fit naturally into the story you present to interviewers.
For even more scale focused topics such as advanced traffic management, global routing, and multi region architectures, you can continue into Grokking Scalable Systems for Interviews, which is ideal if you are targeting senior or staff level roles.