The Load Balancer Mistakes That Quietly Kill Your Interview Score

What This Blog Covers
- Why "add a load balancer" is the most common wasted opportunity in system design interviews
- The six load balancer mistakes that interviewers notice but never tell you about
- L4 vs L7: the distinction that most candidates get wrong
- Why round robin is almost never the right answer in interviews
- Health checks, sticky sessions, and the details that signal engineering depth
- How to discuss load balancing in 60 seconds and sound like you have run production systems
Every system design interview includes a load balancer. It sits at the top of every architecture diagram, the first box the candidate draws.
And for most candidates, it is also the last thing they think about. They draw a box labeled "LB," draw arrows to a few servers, and move on to the database layer.
This is a missed opportunity.
The load balancer is one of the easiest components to discuss well, and one of the most common components to discuss poorly. Interviewers notice. They do not say anything. They just mark the evaluation: "candidate treated load balancing as a checkbox, not a design decision."
The mistakes are subtle. None of them will fail your interview outright.
But each one quietly chips away at your score, signaling that you have studied system design from diagrams rather than from operating real systems.
Six mistakes, compounded across 45 minutes, can be the difference between "strong hire" and "borderline."
This guide covers the six mistakes, why they matter, and exactly what to say instead.
Mistake 1: Treating the Load Balancer as a Black Box
Most candidates draw a load balancer and never mention it again. They do not specify what type of load balancer, what algorithm it uses, whether it operates at L4 or L7, or how it handles server failures. The load balancer becomes a magic box that "distributes traffic."
Why interviewers notice: A load balancer is not one thing. It is a category of solutions with different capabilities. An L4 load balancer (TCP/UDP level) makes routing decisions based on IP addresses and port numbers. It is fast and simple but cannot make decisions based on request content. An L7 load balancer (application level) can inspect HTTP headers, URLs, and cookies. It can route different API endpoints to different backend services, terminate SSL, and apply rate limiting.
What to say instead: "I would use an L7 load balancer here because we need to route API requests to different backend services based on the URL path. Specifically, /api/users routes to the user service, /api/orders routes to the order service. An L4 load balancer could not do this path-based routing because it does not inspect the HTTP layer. The trade-off is that L7 adds approximately 1 to 2 milliseconds of latency for header inspection, which is negligible for our use case."
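To make the L4 vs L7 distinction concrete, here is a minimal sketch of the path-based routing an L7 balancer can do. The route prefixes, service names, and ports are hypothetical; the point is that an L4 balancer never sees the URL, so none of this logic is available to it.

```python
# Hypothetical route table: URL prefix -> pool of backend instances.
ROUTES = {
    "/api/users":  ["user-svc-1:8080", "user-svc-2:8080"],
    "/api/orders": ["order-svc-1:8080", "order-svc-2:8080"],
}

def pick_backend_pool(path: str) -> list[str]:
    """Return the pool whose prefix matches the request path (an L7-only decision)."""
    for prefix, pool in ROUTES.items():
        if path.startswith(prefix):
            return pool
    raise LookupError(f"no route for {path}")

print(pick_backend_pool("/api/orders/42"))  # -> the order service pool
```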
Mistake 2: Defaulting to Round Robin Without Justification
When interviewers ask "what algorithm does your load balancer use?", the overwhelming majority of candidates say "round robin." Most of them cannot explain why, or what alternatives exist.
Round robin distributes requests sequentially: server 1, server 2, server 3, server 1, server 2, server 3. It is the simplest algorithm and works when all servers are identical and all requests have roughly equal cost.
Why it is usually wrong: In real systems, not all requests have equal cost. A request to generate a PDF report takes 500ms. A request to return a cached user profile takes 5ms. Round robin sends both to the next server in sequence regardless of how busy that server already is. The server processing the PDF report gets another request while still working on the heavy one. The server that just returned a cached profile sits idle.
What to say instead: "I would use least-connections for the load balancing algorithm because our API has mixed workload: some endpoints are CPU-intensive (report generation) and others are fast (cached lookups). Least-connections routes each new request to the server with the fewest active connections, which naturally balances load across uneven request costs. Round robin would work if all requests had similar latency, but with a 100x variance in request cost, it creates hot spots."
Other algorithms to know (a short sketch of two of them follows this list):
- Weighted round robin: Distributes more requests to more powerful servers. Useful when servers have different hardware specifications.
- IP hash: Routes all requests from the same client IP to the same server. Useful for basic session affinity without cookies.
- Least response time: Routes to the server with the fastest recent response time. More sophisticated than least-connections but requires latency tracking.
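As a rough illustration of how two of these strategies differ, here is a toy sketch. Server names and connection counts are made up; a real balancer tracks the counts itself.

```python
import hashlib

# In-flight request counts per server, as the balancer would track them.
active = {"srv-a": 12, "srv-b": 3, "srv-c": 7}

def least_connections(counts: dict[str, int]) -> str:
    """Route the next request to the server with the fewest active connections."""
    return min(counts, key=counts.get)

def ip_hash(client_ip: str, servers: list[str]) -> str:
    """Pin a client IP to one server: basic session affinity without cookies."""
    digest = int(hashlib.md5(client_ip.encode()).hexdigest(), 16)
    return servers[digest % len(servers)]

print(least_connections(active))               # srv-b: least loaded right now
print(ip_hash("203.0.113.7", sorted(active)))  # same server for this IP every time
```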
Mistake 3: Forgetting Health Checks
Candidates draw 5 backend servers behind a load balancer. They never mention what happens when one of those servers crashes. The load balancer continues sending traffic to a dead server, and 20% of requests fail.
Why interviewers notice: Health checks are not an advanced concept. They are a basic operational requirement. A load balancer without health checks is a traffic distributor that sends requests to dead servers. Every production load balancer has health checks configured. Mentioning them demonstrates that you think about failure, not just the happy path.
What to say instead: "The load balancer performs health checks on each backend server every 10 seconds by sending an HTTP GET to /health. If a server fails 3 consecutive checks, it is removed from the rotation. When it passes 2 consecutive checks, it is re-added. This means a server failure is detected within 30 seconds. During that window, the load balancer's retry logic sends failed requests to a healthy server, so users experience a retry delay (approximately 100ms) rather than an error."
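A sketch of that policy, using the same numbers, might look like the following. The server addresses and the /health endpoint are placeholders; production balancers run this loop internally.

```python
import time
import urllib.request

SERVERS = ["http://10.0.1.10:8080", "http://10.0.1.11:8080"]
FAIL_THRESHOLD, RECOVER_THRESHOLD, INTERVAL_S = 3, 2, 10

fails = {s: 0 for s in SERVERS}
passes = {s: 0 for s in SERVERS}
in_rotation = set(SERVERS)

def probe(server: str) -> bool:
    """One health probe: HTTP GET /health, healthy on a 200 within 2 seconds."""
    try:
        with urllib.request.urlopen(server + "/health", timeout=2) as resp:
            return resp.status == 200
    except OSError:
        return False

def run_checks_forever() -> None:
    while True:
        for s in SERVERS:
            if probe(s):
                fails[s], passes[s] = 0, passes[s] + 1
                if s not in in_rotation and passes[s] >= RECOVER_THRESHOLD:
                    in_rotation.add(s)          # re-add after 2 consecutive passes
            else:
                passes[s], fails[s] = 0, fails[s] + 1
                if fails[s] >= FAIL_THRESHOLD:
                    in_rotation.discard(s)      # remove after 3 consecutive failures
        time.sleep(INTERVAL_S)
```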
Mistake 4: Ignoring Sticky Sessions When They Matter
Some systems require that all requests from a single user session go to the same backend server: authentication state stored in server memory, shopping cart data held in a server-local cache, or WebSocket connections that are inherently tied to a specific server.
Candidates rarely mention session affinity, even when their design requires it.
Why interviewers notice: If you design a system where the application server stores session data in memory (which is common for early-stage architectures), your load balancer must support sticky sessions. Without it, a user's first request might go to server A (which creates the session), and their second request goes to server B (which has no session data, so the user appears logged out).
What to say instead: "Since the application server stores session tokens in memory, I need session affinity at the load balancer. I would use cookie-based sticky sessions: the load balancer inserts a cookie on the first response that identifies the backend server. Subsequent requests from the same browser include this cookie, and the load balancer routes to the same server. The trade-off is that if that server crashes, the user loses their session and must re-authenticate. A better long-term solution is externalizing session state to Redis, which eliminates the need for sticky sessions and makes the application servers truly stateless."
The senior follow-up: "Externalizing session state to Redis costs approximately 1ms per request (the Redis lookup), but it means any server can handle any request. This simplifies scaling (just add servers), deployment (rolling restarts do not kill sessions), and failure handling (server crash loses nothing). I would start with sticky sessions for a quick launch, then migrate to Redis sessions as the system matures."
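If the conversation moves toward the Redis option, a minimal sketch of externalized sessions looks like this. It assumes the redis-py client and a reachable Redis host; the key names and TTL are illustrative. Because any server can run these two functions, the balancer needs no sticky sessions.

```python
import secrets
import redis  # assumes the redis-py client and a reachable Redis instance

r = redis.Redis(host="sessions.internal", port=6379, decode_responses=True)
SESSION_TTL_S = 30 * 60

def create_session(user_id: str) -> str:
    """Issue a token and store the session centrally instead of in server memory."""
    token = secrets.token_urlsafe(32)
    r.setex(f"session:{token}", SESSION_TTL_S, user_id)  # ~1ms write
    return token

def authenticate(token: str) -> str | None:
    """Any app server can validate any request with one Redis lookup."""
    return r.get(f"session:{token}")                      # ~1ms lookup
```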
Mistake 5: Not Distinguishing Load Balancer From API Gateway
Many candidates use "load balancer" and "API gateway" interchangeably. They are related but different components with different responsibilities.
A load balancer distributes traffic across multiple instances of the same service. Its job is horizontal scaling and high availability.
An API gateway sits in front of multiple different services and handles cross-cutting concerns: authentication, rate limiting, request transformation, API versioning, and routing to different backend services based on the request path.
Why interviewers notice: When a candidate says "the load balancer handles authentication and rate limiting," the interviewer knows the candidate has confused the two concepts. Load balancers distribute traffic. API gateways enforce policies. In many architectures, you have both: an API gateway that handles authentication and routes to the correct service, and a load balancer in front of each service that distributes traffic across instances.
What to say instead: "Requests first hit the API gateway, which validates the JWT token, applies rate limiting (100 requests per minute per user), and routes the request to the correct service based on the URL path. Behind each service, a load balancer distributes requests across the service's instances using least-connections. The API gateway handles cross-cutting concerns. The load balancer handles horizontal scaling."
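A toy sketch of that division of labor, with made-up endpoints, pools, and limits: the gateway enforces policy and picks a service, while the per-service balancer only spreads traffic across instances.

```python
import time

SERVICE_POOLS = {
    "user":  ["user-1:8080", "user-2:8080"],
    "order": ["order-1:8080", "order-2:8080", "order-3:8080"],
}
RATE_LIMIT = 100  # requests per minute per user
request_log: dict[str, list[float]] = {}

def balance(pool: list[str]) -> str:
    # Load balancer concern: spread requests across a service's instances
    # (least-connections in practice; simple rotation keeps the sketch short).
    pool.append(pool.pop(0))
    return pool[-1]

def gateway(path: str, user_id: str, jwt_valid: bool) -> str:
    if not jwt_valid:
        return "401 Unauthorized"                         # gateway concern: authentication
    window = [t for t in request_log.get(user_id, []) if t > time.time() - 60]
    if len(window) >= RATE_LIMIT:
        return "429 Too Many Requests"                    # gateway concern: rate limiting
    request_log[user_id] = window + [time.time()]
    service = "order" if path.startswith("/api/orders") else "user"
    return balance(SERVICE_POOLS[service])                # hand off to that service's LB

print(gateway("/api/orders/7", "u123", jwt_valid=True))   # -> an order-service instance
```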
Mistake 6: Single Load Balancer as a Single Point of Failure
Candidates draw one load balancer at the top of their architecture. The interviewer asks: "What happens if the load balancer goes down?" The candidate has no answer because they never considered that the component responsible for availability is itself a single point of failure.
Why interviewers notice: This is a foundational distributed systems concept. Every component that is critical to the system's availability must itself be highly available. A single load balancer means that one hardware failure or one software crash takes down the entire system, regardless of how many backend servers you have.
What to say instead: "I would deploy the load balancer in an active-passive pair. The primary load balancer handles all traffic. The secondary monitors the primary using a heartbeat mechanism. If the primary fails, the secondary takes over using virtual IP (VIP) failover, typically within 2 to 5 seconds. For cloud deployments, managed load balancers like AWS ALB and Google Cloud Load Balancer handle this redundancy automatically. I would use a managed service rather than self-hosting to eliminate this operational burden."
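To make the active-passive idea concrete, here is a rough sketch of the secondary's monitoring loop. Addresses and thresholds are illustrative, and the VIP takeover is a stub standing in for keepalived/VRRP or a cloud API call.

```python
import socket
import time

PRIMARY = ("10.0.0.10", 8080)
HEARTBEAT_INTERVAL_S, MISSED_BEFORE_FAILOVER = 1, 3

def primary_alive() -> bool:
    """One heartbeat: can we open a TCP connection to the primary?"""
    try:
        with socket.create_connection(PRIMARY, timeout=1):
            return True
    except OSError:
        return False

def claim_virtual_ip() -> None:
    print("failover: secondary now owns the VIP")  # placeholder for VRRP / cloud API

def monitor() -> None:
    missed = 0
    while True:
        missed = 0 if primary_alive() else missed + 1
        if missed >= MISSED_BEFORE_FAILOVER:
            claim_virtual_ip()                     # promote after ~3 seconds of silence
            return
        time.sleep(HEARTBEAT_INTERVAL_S)
```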
The advanced version: "For global traffic distribution, I would use DNS-based load balancing (GeoDNS or anycast) to route users to the nearest regional load balancer. Each region has its own active-passive load balancer pair. This provides both geographic latency optimization and fault isolation: a load balancer failure in US-East does not affect users routed to EU-West."
Bonus Mistake: Ignoring SSL Termination
Candidates almost never mention where SSL/TLS is terminated. This might seem like an implementation detail, but it has significant architectural implications.
SSL termination at the load balancer means the load balancer decrypts HTTPS traffic and forwards plain HTTP to the backend servers. This offloads the CPU-intensive encryption/decryption work from the application servers.
The benefit: backend servers run faster because they do not perform TLS handshakes.
The trade-off: traffic between the load balancer and backend servers is unencrypted. In a secure network (same VPC, same data center), this is usually acceptable. In a multi-tenant or compliance-sensitive environment (HIPAA, PCI-DSS), you may need end-to-end encryption.
SSL passthrough means the load balancer forwards encrypted traffic directly to the backend server without decrypting it; the backend handles TLS itself.
The benefit: end-to-end encryption.
The trade-off: the load balancer cannot inspect request content, so it operates at L4 only (no path-based routing, no cookie-based sticky sessions, no header inspection).
What to say: "I would terminate SSL at the load balancer to offload TLS processing from the application servers. Traffic between the load balancer and backend servers runs over plain HTTP within our VPC. If we needed end-to-end encryption for compliance, I would use SSL passthrough but lose L7 routing capabilities, or re-encrypt traffic between the LB and backends (which adds latency but satisfies compliance)."
Global Load Balancing: The Multi-Region Answer
For systems that serve global users, a single regional load balancer is not enough.
Candidates who only draw one load balancer in one region miss the opportunity to demonstrate global infrastructure thinking.
DNS-based global load balancing routes users to the nearest region before they even reach a load balancer. A user in Tokyo resolves the API domain to the Tokyo region's IP address. A user in Berlin resolves to the Frankfurt region's IP.
How it works: The global DNS service (Route 53, Cloud DNS, Cloudflare) returns different IP addresses based on the user's geographic location. Each IP points to a regional load balancer. Each regional load balancer distributes traffic across that region's application servers.
Failover at the DNS level: If the Tokyo region goes down entirely, the DNS service detects the health check failure and stops returning Tokyo's IP address. Users in Tokyo are automatically routed to the next closest region (perhaps Singapore or US-West). This failover takes 30 to 60 seconds (DNS TTL propagation).
What to say: "For our global user base, I would add a DNS-based global load balancing layer. Route 53 with latency-based routing directs users to the nearest of our three regions: US-East, EU-West, and AP-Southeast. Each region has its own ALB distributing traffic across application servers. If a region fails, Route 53 health checks detect it within 30 seconds and stop routing traffic there. This gives us both latency optimization and regional fault isolation."
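A toy model of what that managed DNS layer is doing under the hood. The regions, IPs, and latency numbers are invented; in practice Route 53 or an equivalent service owns this logic and the health checks behind it.

```python
REGIONS = {
    "us-east-1":      {"ip": "198.51.100.10", "healthy": True},
    "eu-west-1":      {"ip": "198.51.100.20", "healthy": True},
    "ap-southeast-1": {"ip": "198.51.100.30", "healthy": False},  # failed health check
}
# Estimated RTT in ms from a client's rough location to each region.
LATENCY_MS = {
    "tokyo":  {"us-east-1": 150, "eu-west-1": 230, "ap-southeast-1": 70},
    "berlin": {"us-east-1": 90,  "eu-west-1": 25,  "ap-southeast-1": 160},
}

def resolve(client_location: str) -> str:
    """Return the IP of the lowest-latency healthy region for this client."""
    candidates = {r: rtt for r, rtt in LATENCY_MS[client_location].items()
                  if REGIONS[r]["healthy"]}
    best = min(candidates, key=candidates.get)
    return REGIONS[best]["ip"]

print(resolve("tokyo"))   # ap-southeast-1 is down, so Tokyo users fail over to us-east-1
print(resolve("berlin"))  # eu-west-1
```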
Putting It All Together: Interview Walkthrough
Here is how load balancing comes up in a real system design interview and how to handle it without making any of the six mistakes.
Interviewer: "Design a food delivery application."
You (during the infrastructure discussion): "For the load balancing layer, I would use two levels. First, a DNS-based global load balancer using Route 53 with latency-based routing to direct users to the nearest regional deployment. Second, within each region, an L7 application load balancer, specifically AWS ALB."
"L7 because we need path-based routing: /api/restaurants goes to the restaurant service, /api/orders goes to the order service, /api/delivery goes to the delivery tracking service. The algorithm is least-connections because order creation involves database writes and payment processing that take 200 to 500 milliseconds, while restaurant listing is a cached read that returns in 10 milliseconds. Round robin would overload servers processing orders."
"Health checks run every 10 seconds on each service's /health endpoint. A service instance is removed after 3 consecutive failures and re-added after 2 successes. The ALB is a managed service, so redundancy is handled automatically."
"SSL terminates at the ALB. Backend traffic runs over HTTP within the VPC. All services are stateless with sessions in Redis, so no sticky sessions needed. This means I can scale any service independently by adding instances behind its load balancer."
This answer takes 60 to 90 seconds and covers every dimension: global routing, L4 vs L7 decision, algorithm with justification, health checks with parameters, SSL termination, high availability, and session handling.
The interviewer heard none of the six mistakes, and heard six positive signals instead.
Notice what this answer does not do: it does not spend 5 minutes explaining what a load balancer is. It does not draw a single box labeled "LB" and move on. It treats the load balancer as a design decision with trade-offs, just like every other component.
That is the shift: from "load balancer as a checkbox" to "load balancer as a deliberate architectural choice."
The interviewer walks away knowing that you understand infrastructure at a level that goes beyond diagrams.
You have discussed routing algorithms, failure detection, SSL architecture, session management, and global distribution in under 90 seconds. That density of engineering insight, packed into a short discussion of a single component, is what separates a "strong hire" from "borderline."
And it started with simply not treating the load balancer as a black box.
How to Discuss Load Balancing in 60 Seconds
Here is the structured answer that hits every signal interviewers evaluate:
"For the load balancing layer, I would use an L7 load balancer (something like AWS ALB or Nginx) with least-connections routing. L7 because we need path-based routing to different backend services, and least-connections because our API has mixed request latency. Health checks run every 10 seconds on the /health endpoint, with a 3-failure threshold for removal and 2-success threshold for re-addition. The load balancer is deployed as a managed service for automatic redundancy. For session handling, the backend services are stateless with sessions externalized to Redis, so no sticky sessions are needed. This keeps the servers interchangeable and simplifies both scaling and deployment."
This answer takes 30 to 45 seconds and covers: type (L7), algorithm (least-connections with justification), health checks (with specific parameters), high availability (managed service), and session handling (stateless design). No wasted words. Every sentence carries a signal.
For structured practice incorporating load balancing decisions into complete system designs, the Grokking the System Design Interview course covers infrastructure design patterns including load balancer placement, algorithm selection, and failure handling.
Conclusion: Key Takeaways
- Do not treat the load balancer as a black box. Specify L4 vs L7, name the algorithm, mention health checks. These details take 30 seconds and signal operational depth.
- Round robin is rarely the right answer. With mixed request latency, least-connections distributes load more evenly. Name the algorithm and justify it.
- Always mention health checks. Interval, failure threshold, recovery threshold. A load balancer without health checks sends traffic to dead servers.
- Know when sticky sessions matter. If session state is in server memory, you need session affinity. The better answer is to externalize sessions to Redis and eliminate the need entirely.
- Load balancer is not an API gateway. Load balancers distribute traffic. API gateways enforce policies (auth, rate limiting, routing). Many architectures use both.
- Avoid the single point of failure. Active-passive pair for self-hosted. Managed service for cloud. GeoDNS for global distribution.