Common security vulnerabilities and mitigation in distributed system designs
Security vulnerabilities in system design are architectural weaknesses that allow attackers to compromise the confidentiality, integrity, or availability of a distributed system. Broken access control is the top-ranked risk category in the OWASP Top 10 (2021 edition). In system design interviews, security is evaluated as a non-functional requirement alongside scalability, availability, and performance. Candidates who proactively discuss authentication, encryption, and threat mitigation without being prompted signal the senior-level thinking interviewers reward. In 2026, with increasing API-first architectures and microservices deployments, understanding how security vulnerabilities emerge from architectural decisions, and how to mitigate them at the design level, is essential for every engineer above mid-level.
Key Takeaways
- Security is a system design trade-off, not an add-on. Every architectural decision—database selection, API design, service-to-service communication, data storage—has security implications that should be discussed during the design phase.
- The six vulnerability categories that appear most frequently in system design contexts are: broken access control, injection attacks, insecure inter-service communication, data exposure, DDoS, and supply chain vulnerabilities.
- Defense in depth is the guiding principle: no single security control protects the system. Layer authentication, authorization, encryption, rate limiting, input validation, and monitoring so that a failure in one layer does not compromise the entire system.
- In interviews, discuss security at three points: during non-functional requirements ("What are our security and compliance constraints?"), during the deep dive ("I would encrypt data at rest and in transit"), and during trade-offs ("mTLS between services adds 2ms latency but prevents internal network attacks").
- Zero-trust architecture—where every request is authenticated and authorized regardless of network location—has become the default security model for distributed systems in 2026.
Why Security Matters in System Design Interviews
System design interviews are not security interviews. You will not be asked to write a penetration test or configure a WAF. But interviewers evaluate whether you consider security as part of your architecture, just as they evaluate whether you consider scalability or reliability.
At Amazon, security is not one of the formal Leadership Principles, but AWS publicly describes it as "job zero," and interviewers note whether candidates discuss authentication, authorization, and data protection unprompted. At Meta and Google, proactively raising security during the trade-offs phase signals maturity. At the staff level and above, ignoring security entirely is a negative signal: it suggests you design systems for the happy path without considering adversarial conditions.
The key is knowing when and how much to discuss. You do not need to design a complete security architecture. You need to identify the most critical security concerns for the system you are designing and propose appropriate mitigations at the architectural level.
The Six Most Common Vulnerability Categories
1. Broken Access Control
What it is: Users accessing resources or performing actions they are not authorized for. This includes vertical privilege escalation (a regular user accessing admin functions), horizontal privilege escalation (a user accessing another user's data), and missing function-level access controls.
How it appears in system design: An API endpoint that returns user data based on a user_id parameter without verifying that the requesting user owns that user_id. A microservice that trusts all internal requests without authentication. An admin dashboard accessible to any authenticated user without role verification.
Mitigation strategies:
Implement authorization checks at every service boundary, not just at the API gateway. Use role-based access control (RBAC) or attribute-based access control (ABAC) consistently across all services. Follow the principle of least privilege—every service, user, and process gets the minimum permissions required. Validate object ownership on every request: when a user requests /api/orders/123, verify that order 123 belongs to the authenticated user before returning data.
Interview phrasing: "Every API endpoint validates both authentication (who is this?) and authorization (are they allowed to do this?). The order service checks that the requesting user_id matches the order's owner_id before returning data. This prevents horizontal privilege escalation—a user cannot access another user's orders by guessing order IDs."
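The ownership check described above can be sketched in a few lines. This is a minimal illustration, not a full authorization layer: the `Order` type, the in-memory `ORDERS` store, the `Forbidden` exception, and the `is_admin` flag are all hypothetical names introduced for the example.

```python
from dataclasses import dataclass


class Forbidden(Exception):
    """Raised when the caller may not access the requested resource."""


@dataclass(frozen=True)
class Order:
    order_id: int
    owner_id: int
    total_cents: int


# Hypothetical in-memory store standing in for the orders database.
ORDERS = {
    123: Order(order_id=123, owner_id=1, total_cents=4999),
    124: Order(order_id=124, owner_id=2, total_cents=1500),
}


def get_order(authenticated_user_id: int, order_id: int, is_admin: bool = False) -> Order:
    """Return the order only if the caller owns it (or is an admin)."""
    order = ORDERS.get(order_id)
    # Treat "missing" and "not yours" the same way so that responses
    # do not reveal which order IDs exist.
    if order is None or (order.owner_id != authenticated_user_id and not is_admin):
        raise Forbidden(f"user {authenticated_user_id} may not access order {order_id}")
    return order
```

The important design choice is that `authenticated_user_id` comes from the verified session or token, never from a request parameter the client controls.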
2. Injection Attacks
What it is: Untrusted input is interpreted as code or commands by a backend system. SQL injection, NoSQL injection, command injection, and LDAP injection are the most common variants.
How it appears in system design: A search service that constructs database queries by concatenating user input. A notification service that passes user-supplied templates to a rendering engine without sanitization. Any system that accepts user input and passes it to an interpreter.
Mitigation strategies:
Use parameterized queries (prepared statements) for all database interactions—never concatenate user input into SQL strings. Validate and sanitize all input at the API gateway level before it reaches backend services. Apply allowlisting over denylisting: define what valid input looks like rather than trying to filter out malicious input. Use ORM frameworks that handle parameterization automatically.
Interview phrasing: "The search service uses parameterized queries for all database interactions. User input is also validated at the API gateway: the query parameter is limited to alphanumeric characters and a maximum of 200 characters. Parameterization is what prevents SQL injection; the gateway validation is a second layer that limits the damage if a downstream service has a bug in its query construction."
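A small sketch of both layers, using Python's built-in `sqlite3` module. The `products` table, the helper names, and the decision to allow spaces in the allowlist are assumptions made for the example; the point is that user input is bound as a parameter and never concatenated into the SQL string.

```python
import re
import sqlite3

# Allowlist: 1 to 200 letters, digits, or spaces (spaces are an assumption here).
QUERY_PATTERN = re.compile(r"[A-Za-z0-9 ]{1,200}")


def validate_query(raw: str) -> str:
    """Reject anything that does not match the allowlist."""
    if not QUERY_PATTERN.fullmatch(raw):
        raise ValueError("query must be 1-200 alphanumeric characters or spaces")
    return raw


def search_products(conn: sqlite3.Connection, raw_query: str) -> list:
    term = validate_query(raw_query)
    # The "?" placeholder sends the value separately from the SQL text,
    # so the database never interprets user input as SQL.
    rows = conn.execute(
        "SELECT name FROM products WHERE name LIKE ? ORDER BY name",
        (f"%{term}%",),
    ).fetchall()
    return [name for (name,) in rows]


def make_demo_db() -> sqlite3.Connection:
    """Build a throwaway in-memory database for the example."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (name TEXT)")
    conn.executemany(
        "INSERT INTO products VALUES (?)",
        [("red shoe",), ("blue shoe",), ("hat",)],
    )
    return conn
```

Even without the validation step, a classic payload such as `' OR '1'='1` bound through the placeholder is treated as a literal search string and matches nothing.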
3. Insecure Inter-Service Communication
What it is: Services communicating over unencrypted channels or without mutual authentication, allowing attackers who gain access to the internal network to intercept, modify, or impersonate traffic between services.
How it appears in system design: Microservices communicating over plain HTTP within a VPC, assuming the network perimeter provides sufficient protection. A service that accepts requests from any source within the internal network without verifying the caller's identity. Message queues transmitting sensitive data without encryption.
Mitigation strategies:
Implement mutual TLS (mTLS) between all services so that both the client and server authenticate each other. Use a service mesh (Istio, Linkerd) to manage mTLS automatically without modifying application code. Encrypt all data in transit, including internal traffic. In zero-trust architecture, no network is trusted—every request must be authenticated regardless of origin.
Interview phrasing: "I would implement mTLS between all microservices using a service mesh like Istio. This ensures that the payment service only accepts requests from authenticated internal services, not from an attacker who has compromised a single container within the network. The trade-off is latency: each new connection pays for a TLS handshake, and every request pays a small encryption and proxy cost, roughly 2ms per hop in total. Reusing long-lived connections keeps the handshake cost off the request path."
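When a service mesh is not available, the server side of mTLS can be configured directly. The sketch below uses Python's standard `ssl` module; the function name and the optional file-path parameters are assumptions for illustration, and a real service would always supply its certificate, key, and internal CA bundle.

```python
import ssl


def build_mtls_server_context(cert_file=None, key_file=None, ca_file=None) -> ssl.SSLContext:
    """Create a server-side TLS context that requires a client certificate."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    # CERT_REQUIRED is what makes the TLS "mutual": the handshake fails
    # unless the client presents a certificate signed by a trusted CA.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cert_file and key_file:
        # This service's own identity (paths are supplied by the caller).
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if ca_file:
        # The internal CA whose client certificates we accept.
        ctx.load_verify_locations(cafile=ca_file)
    return ctx
```

A mesh such as Istio or Linkerd does the equivalent in a sidecar proxy and also handles certificate issuance and rotation, which is why the article recommends it over hand-rolled configuration.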
4. Data Exposure and Cryptographic Failures
What it is: Sensitive data (credentials, personal information, payment data) exposed through weak encryption, improper storage, excessive logging, or unprotected API responses.
How it appears in system design: Storing passwords in plaintext or with weak hashing (MD5, SHA-1). Returning sensitive fields (SSN, credit card numbers) in API responses that do not need them. Logging request bodies that contain authentication tokens or personal data. Storing encryption keys alongside the encrypted data.
Mitigation strategies:
Encrypt data at rest using AES-256 and data in transit using TLS 1.3. Hash passwords with bcrypt, scrypt, or Argon2—never MD5 or SHA-1. Manage encryption keys using a dedicated key management service (AWS KMS, GCP Cloud KMS, HashiCorp Vault)—never hard-code keys or store them in configuration files. Apply data minimization: API responses include only the fields the client needs. Mask sensitive data in logs—replace credit card numbers with the last four digits.
| Data State | Protection | Implementation |
|---|---|---|
| At rest | AES-256 encryption | Database-level encryption (Aurora, DynamoDB), encrypted S3 buckets |
| In transit | TLS 1.3 | HTTPS for external, mTLS for internal service-to-service |
| In use | Access controls, tokenization | Mask sensitive fields in logs, tokenize payment data |
| Key management | Hardware security modules | AWS KMS, GCP Cloud KMS, HashiCorp Vault |
Interview phrasing: "User passwords are hashed with bcrypt (cost factor 12) and stored in the authentication database. The payment service tokenizes credit card numbers through Stripe, so we never store raw card data; this substantially reduces, though does not eliminate, the PCI DSS scope of our infrastructure. All database volumes are encrypted with AES-256 via AWS KMS."
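Two of these controls, slow password hashing and log masking, are easy to sketch. The phrasing above names bcrypt, which is a third-party package in Python; this example swaps in scrypt (also recommended above) because it ships in the standard library. The cost parameters and helper names are illustrative assumptions, not tuned recommendations.

```python
import hashlib
import hmac
import os

# Illustrative scrypt cost parameters (assumed, not tuned for production).
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1, "dklen": 32}


def hash_password(password: str) -> tuple:
    """Return (salt, digest); store both, never the plaintext password."""
    salt = os.urandom(16)
    digest = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    candidate = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(candidate, expected)


def mask_card_number(card_number: str) -> str:
    """Keep only the last four digits for logs; mask everything else."""
    digits = [c for c in card_number if c.isdigit()]
    if len(digits) < 8:
        # Too short to be a card number: reveal nothing.
        return "*" * len(digits)
    return "*" * (len(digits) - 4) + "".join(digits[-4:])
```

For example, `mask_card_number("4242 4242 4242 4242")` yields twelve asterisks followed by `4242`.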
5. Denial of Service (DoS and DDoS)
What it is: Overwhelming a system with traffic or requests to make it unavailable to legitimate users. Distributed denial-of-service attacks use botnets to generate traffic from thousands of sources simultaneously.
How it appears in system design: An API without rate limiting that allows a single client to send millions of requests per second. A system without auto-scaling that crashes under unexpected traffic spikes. A search endpoint that triggers expensive database queries, amplifying the impact of each request.
Mitigation strategies:
Implement rate limiting at the API gateway using token bucket or leaky bucket algorithms. Use a CDN (CloudFront, Cloudflare) as the first line of defense: CDNs absorb volumetric DDoS traffic at edge locations before it reaches your origin servers. Deploy AWS Shield or Cloudflare DDoS protection for network-layer attacks. Design endpoints to limit computational cost per request: cap search results, paginate responses, and time out expensive queries.
Interview phrasing: "I would implement three layers of DDoS protection. First, CloudFront absorbs volumetric attacks at the edge. Second, the API gateway enforces rate limiting at 100 requests per second per authenticated user and 10 requests per second per IP for unauthenticated endpoints. Third, the search service has a 5-second query timeout—if a query takes longer, it is terminated to prevent a single expensive request from consuming all database connections."
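The token bucket algorithm mentioned above fits in a short class. This is a single-process sketch with assumed names; a real gateway would keep one bucket per user or IP in shared storage such as Redis. The `clock` parameter is injectable only so the behavior can be tested deterministically.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate_per_sec` tokens per second."""

    def __init__(self, rate_per_sec: float, capacity: float, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity          # start with a full bucket
        self.updated_at = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        elapsed = now - self.updated_at
        # Refill for the time that has passed, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated_at = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False                    # caller should respond with HTTP 429
```

The limits quoted above would map to `TokenBucket(rate_per_sec=100, capacity=100)` per authenticated user and `TokenBucket(rate_per_sec=10, capacity=10)` per IP. The `cost` argument lets expensive endpoints, such as search, consume more than one token per call.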
6. Supply Chain and Dependency Vulnerabilities
What it is: Vulnerabilities introduced through third-party libraries, open-source dependencies, container base images, or CI/CD pipeline compromises.
How it appears in system design: A microservice using an outdated library with a known remote code execution vulnerability. A Docker base image containing unpatched system packages. A CI/CD pipeline where a compromised dependency injects malicious code during the build process.
Mitigation strategies:
Scan all dependencies with software composition analysis (SCA) tools (Dependabot, Snyk, FOSSA) in the CI/CD pipeline. Pin dependency versions and review updates before adopting them. Use minimal base images (Alpine, distroless) to reduce the attack surface of containers. Generate and maintain a Software Bill of Materials (SBOM) for every deployed service. Sign container images to verify their integrity before deployment.
Interview phrasing: "Every service runs on a distroless base image to minimize the attack surface. Dependabot scans dependencies weekly and creates automated pull requests for security patches. The CI/CD pipeline fails the build if any critical or high-severity vulnerability is detected in the dependency tree."
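The "fail the build" rule can be expressed as a small gate that runs after the scanner. The finding format below (dictionaries with `id`, `package`, and `severity` keys) is a hypothetical schema invented for this sketch; real SCA tools each have their own report format, which would need to be mapped onto it.

```python
# Severities that should block a deployment (from the policy described above).
BLOCKING_SEVERITIES = {"critical", "high"}


def blocking_findings(findings: list) -> list:
    """Return only the findings severe enough to fail the build."""
    return [
        f for f in findings
        if f.get("severity", "").lower() in BLOCKING_SEVERITIES
    ]


def gate_exit_code(findings: list) -> int:
    """Return 1 (fail the CI job) if any blocking finding exists, else 0."""
    blockers = blocking_findings(findings)
    for f in blockers:
        print(f"BLOCKING {f['severity'].upper()}: {f['package']} ({f['id']})")
    return 1 if blockers else 0
```

A CI step would pass the parsed scanner report to `gate_exit_code` and exit with the returned value, so a non-zero result stops the pipeline before deployment.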
The Zero-Trust Security Model
Zero trust is the dominant security architecture for distributed systems in 2026. The core principle: never trust, always verify. Every request is authenticated and authorized regardless of whether it originates from inside or outside the network perimeter.
Traditional perimeter security: "Everything inside the VPC is trusted. Only external traffic is authenticated."
Zero-trust security: "Nothing is trusted. Every service-to-service call is authenticated (mTLS), authorized (policy check), and encrypted (TLS). An attacker who compromises one service cannot use that foothold alone to move laterally to others."
In interviews, framing your security approach as zero-trust signals current thinking: "I would design this system with a zero-trust model. Every inter-service request is authenticated with mTLS and authorized against a centralized policy engine. Even if an attacker compromises the image processing service, they cannot access the payment service because the payment service rejects requests without valid mTLS certificates and the appropriate service identity."
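The "authorized against a centralized policy engine" step can be illustrated with a default-deny lookup. The service names, the `POLICY` table, and the `authorize` function are hypothetical; in practice the caller identity would come from the verified mTLS certificate, and the policy would live in a dedicated engine rather than a dictionary.

```python
from typing import Optional

# Hypothetical policy: target service -> caller identity -> allowed actions.
POLICY = {
    "payment-service": {"checkout-service": {"charge", "refund"}},
    "image-service": {"upload-service": {"resize"}},
}


def authorize(caller_identity: Optional[str], target: str, action: str) -> bool:
    """Default deny: allow only what the policy explicitly lists."""
    if caller_identity is None:
        # No verified mTLS identity means the request is rejected outright.
        return False
    allowed_actions = POLICY.get(target, {}).get(caller_identity, set())
    return action in allowed_actions
```

With this table, a compromised image processing service that calls the payment service is rejected because no rule lists it as an allowed caller, even though it holds a valid certificate of its own.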
For structured practice integrating security considerations into complete system design solutions, Grokking the System Design Interview covers non-functional requirements including security across 18 real-world design problems. For advanced security patterns in production-scale distributed systems—including multi-region encryption, secret management, and secure multi-tenant architectures—Grokking the Advanced System Design Interview builds the depth required for L6+ interviews.
How to Discuss Security in System Design Interviews
During requirements gathering: "Before I design the architecture, I want to clarify our security constraints. Does this system handle PII or payment data? What compliance requirements apply—GDPR, HIPAA, PCI DSS? What is our threat model—are we protecting against external attackers, insider threats, or both?"
During high-level design: "I am placing an API gateway in front of all services. The gateway handles authentication (JWT validation), rate limiting (100 req/s per user), and TLS termination. Internal services communicate over mTLS."
During deep dive: "The authentication service issues JWTs with 15-minute expiry and rotatable refresh tokens stored in a secure HTTP-only cookie. Passwords are hashed with bcrypt. Failed login attempts are rate-limited to 5 per minute per account to prevent brute force."
During trade-offs: "mTLS between all services adds approximately 2ms of latency per hop. For our use case, a payment system where security is paramount, this is an acceptable trade-off. For a low-latency gaming system, I would keep mTLS but minimize the overhead with long-lived connections and TLS session resumption, so the handshake cost is paid once rather than on every request."
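To make the deep-dive answer concrete, here is a minimal sketch of issuing and validating a short-lived HS256 JWT using only the standard library. It is meant to show what "JWT validation" checks (signature, algorithm, expiry), not to replace a vetted JWT library; the function names and the shared-secret setup are assumptions for the example.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: bytes) -> str:
    return _b64url(hmac.new(secret, signing_input.encode(), hashlib.sha256).digest())


def issue_token(user_id: str, secret: bytes, now=None, ttl_seconds: int = 15 * 60) -> str:
    """Issue a token that expires after 15 minutes by default."""
    now = int(time.time()) if now is None else int(now)
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    claims = {"sub": user_id, "iat": now, "exp": now + ttl_seconds}
    payload = _b64url(json.dumps(claims).encode())
    return f"{header}.{payload}.{_sign(header + '.' + payload, secret)}"


def verify_token(token: str, secret: bytes, now=None) -> dict:
    """Return the claims if the token is authentic and unexpired, else raise ValueError."""
    now = int(time.time()) if now is None else int(now)
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    expected = _sign(header_b64 + "." + payload_b64, secret)
    # Check the signature before trusting anything inside the token.
    if not hmac.compare_digest(expected.encode(), signature_b64.encode()):
        raise ValueError("bad signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    claims = json.loads(_b64url_decode(payload_b64))
    if now >= claims["exp"]:
        raise ValueError("token expired")
    return claims
```

The short expiry limits how long a stolen token is useful; the refresh token mentioned above is what lets legitimate clients obtain a new one without logging in again.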
Frequently Asked Questions
How important is security in system design interviews?
Security is a non-functional requirement evaluated alongside scalability and reliability. You do not need to design a complete security architecture, but you should proactively identify the most critical security concerns for your system and propose appropriate mitigations. Amazon interviewers note whether candidates raise security unprompted. At the staff level, ignoring security is a negative signal.
What are the most common security vulnerabilities in distributed systems?
The six most common are: broken access control (the top-ranked category in the OWASP Top 10, 2021 edition), injection attacks (SQL, NoSQL, command), insecure inter-service communication (unencrypted internal traffic), data exposure and cryptographic failures, denial-of-service attacks, and supply chain vulnerabilities from third-party dependencies.
What is zero-trust architecture?
A security model where every request is authenticated and authorized regardless of network location. No network is trusted—internal traffic receives the same scrutiny as external traffic. Implemented through mTLS between services, centralized policy engines, and the principle of least privilege. It is the dominant security model for distributed systems in 2026.
How do I protect data in a distributed system?
Encrypt data at rest (AES-256), in transit (TLS 1.3 for external, mTLS for internal), and manage keys using a dedicated KMS (AWS KMS, HashiCorp Vault). Hash passwords with bcrypt or Argon2. Apply data minimization—API responses include only necessary fields. Mask sensitive data in logs. Tokenize payment data to avoid storing raw card numbers.
What is the difference between authentication and authorization?
Authentication verifies identity (who is this user?). Authorization verifies permissions (is this user allowed to perform this action?). Both must be checked at every service boundary. A common vulnerability is checking authentication at the API gateway but skipping authorization in downstream services.
How should I discuss rate limiting in a system design interview?
Frame rate limiting as both a security and reliability mechanism. "The API gateway enforces rate limiting using a token bucket algorithm—100 requests per second per authenticated user. This protects against DDoS attacks and prevents a single misbehaving client from degrading service for all users. Unauthenticated endpoints have a stricter limit of 10 requests per second per IP."
What is mTLS and when should I use it?
Mutual TLS (mTLS) is a mode of TLS in which both the client and server authenticate each other using certificates. Use it for all service-to-service communication in a microservices architecture. It prevents an attacker who gains access to the internal network from impersonating a service. Service meshes like Istio automate mTLS certificate management.
How do I prevent injection attacks at the architecture level?
Use parameterized queries for all database interactions. Validate and sanitize input at the API gateway before it reaches backend services. Apply allowlisting (define valid input) over denylisting (filter malicious input). Use ORM frameworks that handle parameterization automatically. These are architectural decisions, not just coding practices.
Should I use HTTPS for internal service-to-service communication?
In a zero-trust model, yes. Internal traffic should be encrypted with mTLS. The traditional approach of relying on VPC security (encrypting only external traffic) is insufficient—if an attacker breaches one service, unencrypted internal traffic allows them to intercept credentials and data flowing between all other services.
How do I protect against supply chain attacks in system design?
Pin dependency versions, scan with SCA tools (Dependabot, Snyk) in CI/CD, use minimal container base images (distroless, Alpine), sign container images, and maintain an SBOM for every service. Fail the build on critical vulnerabilities. These practices are architectural decisions that should be mentioned when discussing deployment and operations.
TL;DR
Security vulnerabilities in distributed system design stem from architectural decisions—broken access control, injection attacks, insecure inter-service communication, data exposure, DDoS susceptibility, and supply chain weaknesses. Mitigate through defense in depth: authenticate and authorize at every service boundary, use parameterized queries, encrypt data at rest (AES-256) and in transit (TLS 1.3, mTLS), implement rate limiting at the API gateway, and scan dependencies in CI/CD. Adopt a zero-trust model where every request is verified regardless of network location. In interviews, discuss security at three points: during requirements (compliance constraints), during the deep dive (specific controls like JWT authentication, bcrypt hashing, mTLS), and during trade-offs (mTLS adds 2ms latency but prevents lateral movement). Proactively raising security without being asked signals the senior-level judgment interviewers reward.