How do you set up synthetic monitoring and availability probes?

Synthetic monitoring simulates user actions from outside your system to validate availability and performance. Availability probes operate inside your infrastructure to determine if an instance or service should receive traffic. Used together, they provide both external and internal visibility into system health, improving reliability and reducing downtime.

Why It Matters

In a system design interview, understanding synthetic monitoring and probes shows you can think beyond building features. You demonstrate readiness for real-world reliability challenges. Synthetic checks reveal user-facing issues like certificate expiry or CDN failure. Probes ensure that unhealthy instances are removed from load balancers automatically. Together, they form the foundation of an observable and resilient architecture.

How It Works (Step-by-Step)

1. Define Critical Journeys: Identify core user flows such as login, checkout, or video playback. These are your “golden paths” that synthetic checks will simulate.

2. Choose Check Types

  • HTTP Checks: Validate URL reachability and expected responses.
  • API Checks: Test request-response correctness with sample payloads.
  • Browser Journeys: Use headless browsers to perform real clicks and flows.
  • Network Checks: Measure DNS, TCP, and TLS handshake success.
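As a rough illustration of the HTTP check type, the sketch below uses only Python's standard library. The names `CheckResult`, `evaluate_response`, and `run_http_check`, and the default thresholds, are hypothetical choices for this example and not any monitoring vendor's API.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass


@dataclass
class CheckResult:
    ok: bool
    reason: str
    latency_ms: float


def evaluate_response(status: int, body: str, latency_ms: float,
                      expected_text: str, max_latency_ms: float = 2000.0) -> CheckResult:
    """Judge one response on status code, expected content, and latency budget."""
    if status != 200:
        return CheckResult(False, f"unexpected status {status}", latency_ms)
    if expected_text not in body:
        return CheckResult(False, "expected text missing", latency_ms)
    if latency_ms > max_latency_ms:
        return CheckResult(False, "too slow", latency_ms)
    return CheckResult(True, "ok", latency_ms)


def run_http_check(url: str, expected_text: str, timeout_s: float = 5.0) -> CheckResult:
    """Fetch the URL once and evaluate it; network errors count as failures."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            status = resp.status
            body = resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses arrive as exceptions; keep the status for evaluation.
        status, body = exc.code, ""
    except (urllib.error.URLError, OSError) as exc:
        latency_ms = (time.monotonic() - start) * 1000
        return CheckResult(False, f"request failed: {exc}", latency_ms)
    latency_ms = (time.monotonic() - start) * 1000
    return evaluate_response(status, body, latency_ms, expected_text)
```

Separating the fetch from the evaluation keeps the pass/fail rules testable without network access. An API check could follow the same shape with a sample payload and a parsed response.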

3. Select Vantage Points: Run synthetic tests from multiple regions and networks. This helps detect geo-specific issues and routing errors.

4. Configure Frequency and Timeouts: Critical flows can run every minute, while less critical checks may run every 5 minutes. Set each timeout a little above the flow's normal worst-case latency so that a slow but successful run is not reported as an outage.

5. Define Alerting Rules: Alert only when several consecutive runs fail in more than one region, so that a single blip or one bad vantage point does not page anyone. Use aggregation and suppression to minimize noise.
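One way to express such a rule is sketched below. The function names and the defaults of three consecutive failures in two regions are assumptions for illustration, not recommended values.

```python
from typing import Dict, List


def consecutive_failures(history: List[bool]) -> int:
    """Count failed runs at the end of a history (True = pass, oldest first)."""
    count = 0
    for passed in reversed(history):
        if passed:
            break
        count += 1
    return count


def should_alert(results_by_region: Dict[str, List[bool]],
                 min_consecutive: int = 3, min_regions: int = 2) -> bool:
    """Alert only if enough regions each show a run of recent failures."""
    failing = [
        region for region, history in results_by_region.items()
        if consecutive_failures(history) >= min_consecutive
    ]
    return len(failing) >= min_regions
```

With these defaults, one region failing three times in a row stays quiet, while two regions doing so at once raise an alert.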

6. Build Health Endpoints for Probes

  • Liveness Probe: Checks that the app process is running and still responsive (not hung or deadlocked); repeated failures trigger a restart.
  • Readiness Probe: Confirms if the service is ready to accept traffic.
  • Startup Probe: Ensures slow-booting services are not restarted prematurely.
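The three probe types could be served by a small HTTP handler like the sketch below. The paths `/livez`, `/readyz`, and `/startupz`, the `AppState` fields, and the port are assumptions chosen for this example; real services pick their own.

```python
from dataclasses import dataclass
from http.server import BaseHTTPRequestHandler, HTTPServer


@dataclass
class AppState:
    started: bool = False      # initialization has finished
    cache_warm: bool = False   # required in-process data is loaded


STATE = AppState()


def probe_status(path: str, state: AppState) -> int:
    """Map a probe path to an HTTP status (200 = healthy, 503 = not yet)."""
    if path == "/livez":
        return 200  # being able to answer at all means the process is not hung
    if path == "/startupz":
        return 200 if state.started else 503
    if path == "/readyz":
        return 200 if state.started and state.cache_warm else 503
    return 404


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        status = probe_status(self.path, STATE)
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        # Keep the body minimal so the endpoint leaks no internal detail.
        self.wfile.write(b"ok" if status == 200 else b"unavailable")


def serve(port: int = 8080) -> None:
    """Serve probe endpoints until interrupted (hypothetical default port)."""
    HTTPServer(("0.0.0.0", port), ProbeHandler).serve_forever()
```

The application would flip `STATE.started` and `STATE.cache_warm` as it boots, so readiness turns green only once the instance can actually serve requests.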

7. Integrate with Load Balancers and Orchestrators: Orchestrators like Kubernetes use readiness probes to decide whether an instance receives traffic and liveness probes to decide when to restart it. Configure intervals and thresholds to prevent flapping.
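To see why thresholds prevent flapping, the sketch below models them as a small state machine. It loosely mirrors the idea behind Kubernetes' `failureThreshold` and `successThreshold` probe fields, but the `ReadinessTracker` class is a hypothetical simplification, not how any orchestrator is implemented.

```python
class ReadinessTracker:
    """Flip state only after a run of consecutive results that disagree with it."""

    def __init__(self, failure_threshold: int = 3, success_threshold: int = 2) -> None:
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        self.ready = False
        self._streak = 0  # consecutive results that contradict the current state

    def observe(self, probe_passed: bool) -> bool:
        """Record one probe result and return the (possibly updated) ready state."""
        if probe_passed == self.ready:
            self._streak = 0  # result agrees with current state, so reset the streak
            return self.ready
        self._streak += 1
        needed = self.success_threshold if probe_passed else self.failure_threshold
        if self._streak >= needed:
            self.ready = probe_passed
            self._streak = 0
        return self.ready
```

A single failed probe between successes leaves the instance in rotation; only three failures in a row (with these assumed defaults) take it out.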

8. Automate Pre- and Post-Deployment Checks: Run synthetic tests before deployment to ensure baseline health, and after rollout to catch regressions early.
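A deployment pipeline might wrap this in a gate along the following lines. The function names and the idea of passing checks as callables are assumptions for this sketch; a real pipeline would call its monitoring tool instead.

```python
from typing import Callable, Dict, List


def run_suite(checks: Dict[str, Callable[[], bool]]) -> Dict[str, bool]:
    """Run each named check; an exception counts as a failure."""
    results: Dict[str, bool] = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    return results


def pre_deploy_gate(baseline: Dict[str, bool]) -> bool:
    """Allow the rollout only if every baseline check passed."""
    return all(baseline.values())


def post_deploy_regressions(baseline: Dict[str, bool],
                            after: Dict[str, bool]) -> List[str]:
    """List checks that passed before the rollout but fail after it."""
    return sorted(
        name for name, ok in after.items()
        if baseline.get(name, False) and not ok
    )
```

A non-empty regression list after rollout is a natural trigger for an automated rollback or a paused canary.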

9. Trend and Analyze Data: Track metrics such as availability percentage, latency distribution, and regional failure rate. Use this data to adjust Service Level Objectives (SLOs).
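These metrics are simple to compute from raw check results, as the sketch below suggests. The function names are hypothetical, and the percentile uses the nearest-rank method, which is one of several common definitions.

```python
import math
from typing import Dict, List, Tuple


def availability_pct(results: List[bool]) -> float:
    """Share of successful runs, as a percentage."""
    if not results:
        raise ValueError("no results")
    return 100.0 * sum(results) / len(results)


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile, e.g. pct=95 for p95 latency."""
    if not values:
        raise ValueError("no values")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100.0))
    return ordered[rank - 1]


def failure_rate_by_region(samples: List[Tuple[str, bool]]) -> Dict[str, float]:
    """Fraction of failed runs per region from (region, passed) pairs."""
    totals: Dict[str, int] = {}
    failures: Dict[str, int] = {}
    for region, passed in samples:
        totals[region] = totals.get(region, 0) + 1
        failures[region] = failures.get(region, 0) + (0 if passed else 1)
    return {region: failures[region] / totals[region] for region in totals}
```

Comparing the measured availability against the SLO target over a rolling window shows how much error budget remains.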

Real-World Example

Consider, as a hypothetical illustration, an e-commerce platform at Amazon’s scale. Synthetic checks simulate “add to cart” and “checkout” every minute from different continents. Readiness probes run against each service instance, marking it “ready” only after caches are loaded and the dependencies it cannot serve without are reachable. When a data center experiences latency spikes, synthetic checks flag the user-facing impact, while failing readiness probes let an orchestrator such as Kubernetes stop routing traffic to unhealthy pods automatically.

Common Pitfalls or Trade-offs

  • Overly Strict Probes: Liveness probes that depend on downstream services can restart healthy instances during transient dependency failures.
  • Unrealistic Synthetic Flows: Mocked logins or bypassed security can hide real user issues.
  • Single Region Monitoring: Misses regional failures or CDN misconfigurations.
  • Alert Fatigue: Poorly tuned alert thresholds create noise and desensitize teams.
  • Blind Spots: Synthetic checks don’t reflect load conditions; combine with telemetry and tracing.

Interview Tip

Interviewers often test your operational mindset by asking, “How would you detect an expired TLS certificate before users are affected?” A great answer is to describe an automated daily TLS synthetic check that triggers alerts 30 days before expiry and blocks deployment if under 7 days.
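A minimal sketch of such a check, using Python's standard `ssl` and `socket` modules, might look like this. The function names are hypothetical, and the 30-day and 7-day thresholds simply restate the numbers above.

```python
import socket
import ssl
from datetime import datetime, timezone


def fetch_cert_expiry(host: str, port: int = 443, timeout_s: float = 5.0) -> datetime:
    """Connect over TLS and return the server certificate's notAfter time (UTC)."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout_s) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    seconds = ssl.cert_time_to_seconds(cert["notAfter"])
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def classify_expiry(not_after: datetime, now: datetime,
                    warn_days: int = 30, block_days: int = 7) -> str:
    """Return 'block', 'warn', or 'ok' based on the days remaining."""
    days_left = (not_after - now).total_seconds() / 86400
    if days_left < block_days:
        return "block"
    if days_left < warn_days:
        return "warn"
    return "ok"
```

Because the default context verifies the certificate, an already expired one fails the handshake with an exception, which the scheduled job should also treat as an alert.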

Key Takeaways

  • Combine synthetic monitoring (external) and availability probes (internal) for complete coverage.
  • Run checks from multiple regions to detect user-facing issues early.
  • Tune probes to prevent flapping or unnecessary restarts.
  • Store, trend, and visualize metrics to improve reliability over time.
  • Automate pre-release and post-release synthetic validations.

Table of Comparison

| Approach | Primary Goal | Signal Source | Strengths | Limitations | Best Use Case |
| --- | --- | --- | --- | --- | --- |
| Synthetic Monitoring | Validate real user journeys | External agents | Detects DNS, TLS, CDN issues | May miss load-induced problems | Top user flows and post-deploy checks |
| Availability Probes | Control routing and restarts | Internal service | Fast failure detection | Limited to internal view | Managing service health in clusters |
| Real User Monitoring | Measure actual experience | User devices | True user perspective | Slow detection | Long-term UX trends |
| Tracing & Metrics | Explain internal causes | Application telemetry | Deep insights | Needs instrumentation | Debugging and performance tuning |

FAQs

Q1. What is the main difference between liveness and readiness probes?

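Liveness checks whether the process is still running and responsive, and a failing liveness probe leads to a restart. Readiness confirms whether the instance can serve traffic, and a failing readiness probe only removes it from rotation. Use liveness for recovery, readiness for routing.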

Q2. How often should I run synthetic checks?

Run top business flows every minute from at least two regions, and secondary flows every five minutes for balanced coverage.

Q3. Can probes check external dependencies?

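Ideally, no. Liveness should never depend on external services, or an outage elsewhere can trigger needless restarts. Readiness should verify only what’s essential for safe traffic. Broader external dependency checks belong in synthetic monitoring.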

Q4. How can I prevent false positives in synthetic checks?

Run tests from multiple regions, randomize timing, require multiple consecutive failures, and verify both status code and response content.
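Randomized timing can be as simple as adding jitter to each scheduled run, as in the sketch below. The function name and the 10% default jitter are assumptions for illustration.

```python
import random
from typing import List, Optional


def jittered_schedule(interval_s: float, runs: int, jitter_fraction: float = 0.1,
                      rng: Optional[random.Random] = None) -> List[float]:
    """Return start offsets in seconds, spaced by interval_s plus random jitter."""
    rng = rng or random.Random()
    max_jitter = interval_s * jitter_fraction
    return [i * interval_s + rng.uniform(0, max_jitter) for i in range(runs)]
```

Jitter keeps probes in different regions from hitting the service at the same instant, which avoids self-inflicted spikes that look like failures.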

Q5. What metrics should I collect from probes and synthetic checks?

Collect uptime percentage, latency percentiles (p95/p99), regional distribution, and failure cause categories such as DNS, TLS, or application errors.
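Failure cause categories can often be derived from the exception or status code a check produced. The sketch below is one possible mapping using Python's standard exception types; the category labels and the `categorize_failure` name are hypothetical.

```python
import socket
import ssl
import urllib.error
from typing import Optional


def categorize_failure(exc: Optional[BaseException] = None,
                       status: Optional[int] = None) -> str:
    """Map a check's exception or HTTP status to a coarse failure category."""
    if isinstance(exc, urllib.error.HTTPError):
        # The server answered, so judge by status code instead.
        status, exc = exc.code, None
    elif isinstance(exc, urllib.error.URLError) and isinstance(exc.reason, BaseException):
        exc = exc.reason  # unwrap the underlying socket or TLS error
    if isinstance(exc, socket.gaierror):
        return "dns"
    if isinstance(exc, ssl.SSLError):
        return "tls"
    if isinstance(exc, (TimeoutError, socket.timeout)):
        return "timeout"
    if exc is not None:
        return "network"
    if status is not None and status >= 500:
        return "application"
    if status is not None and status >= 400:
        return "client_error"
    return "ok"
```

Tagging each failed run this way makes it possible to chart, for example, DNS failures separately from application errors.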

Q6. How do I secure health endpoints?

Expose only minimal information, restrict access to internal networks, and avoid sensitive data in probe responses.

CONTRIBUTOR
Design Gurus Team
Copyright © 2025 Design Gurus, LLC. All rights reserved.