How do you handle clock skew (NTP/PTP) and monotonic time?

Clock skew is the difference in wall clock time between machines. In distributed systems even a small offset can break leases, expire tokens too early, or reorder events in confusing ways. Network Time Protocol (NTP) and Precision Time Protocol (PTP) keep wall clocks in sync, while monotonic time is a local clock that only moves forward and is ideal for measuring durations. Using synchronized wall clocks for timestamps and a monotonic clock for durations is the foundation of reliable timestamps, timeouts, and ordering in a scalable architecture.

Why It Matters

Time shows up everywhere in a system design interview and in production systems. You set TTL on cache entries, calculate request timeouts, rotate keys, compute analytics windows, and decide who is leader based on a lease with an expiry. If clocks drift, a server may think a lease has expired when it has not. If the kernel steps the clock backward, a scheduler may believe a task completed in negative time. Using NTP or PTP for wall clocks and a monotonic clock for durations keeps your system safe from these surprises.

How It Works Step by Step

Separate concepts of time: Wall clock time is calendar time used for human events, logs, and timestamps. Monotonic time is a steadily increasing clock that ignores manual clock changes and leap seconds. Use wall clock for labels. Use monotonic time for durations and timeouts.
Use a monotonic clock for every duration: Read a monotonic clock at start and stop to compute elapsed time. Never subtract two wall clock timestamps to find a duration. This single habit prevents negative durations during clock steps and leap second events.
Choose NTP for general fleets and PTP for high precision zones: NTP is widely available and good to a few milliseconds on local networks. PTP uses hardware timestamping and special switches to reach microsecond level accuracy in data centers.
Harden the time client: Use multiple time sources for redundancy. Prefer clients that support authenticated time, such as Network Time Security (NTS) for NTP. Enable slew mode where possible so the clock is adjusted gradually instead of stepped. For PTP enable hardware timestamping on the network interface and use boundary or transparent clocks in switches.
Publish a clock error budget: Decide a maximum tolerated offset for each tier, such as one millisecond for PTP zones and fifty milliseconds for general clusters. Expose metrics for offset, jitter, and root dispersion. Gate risky operations when the offset breaches budget.
Apply uncertainty to time based logic: When expiring items or validating leases, subtract an uncertainty margin from the expiry or wait out the uncertainty window before declaring success. This mirrors commit wait, which ensures the chosen timestamp is in the past on every node before success is reported.
Use logical clocks for consistent ordering: Avoid using wall clock to order cross node events with strong guarantees. Use Lamport clocks, vector clocks, or hybrid logical clocks to achieve reliable ordering without perfect sync.
Tame leap seconds with a consistent policy: Choose either time smear or an inserted second and apply it across the fleet. Document the behavior so developers and testers know what to expect.
Make logs and tracing bilingual: Log a wall clock timestamp for humans and a monotonic based duration for machines. Tracing spans should report elapsed time from monotonic sources.
Test with deliberate skew: Run chaos tests that skew clocks on some hosts both forward and backward. Verify that leases, timeouts, and retention logic behave correctly and that alerts fire when offsets exceed budget.
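
The duration and logging steps above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the names timed, Deadline, and log_record are hypothetical helpers invented for this example, and the only library calls used are time.monotonic and datetime.now from the standard library.

```python
import time
from datetime import datetime, timezone


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds) measured on the monotonic clock."""
    start = time.monotonic()
    result = fn(*args, **kwargs)
    # The difference of two monotonic readings is never negative,
    # even if the wall clock is stepped while fn is running.
    elapsed = time.monotonic() - start
    return result, elapsed


class Deadline:
    """A timeout stored as a point on the monotonic clock (hypothetical helper)."""

    def __init__(self, timeout_s: float) -> None:
        self._expires_at = time.monotonic() + timeout_s

    def remaining(self) -> float:
        """Seconds left before the deadline, clamped at zero."""
        return max(0.0, self._expires_at - time.monotonic())

    def expired(self) -> bool:
        return time.monotonic() >= self._expires_at


def log_record(event: str, elapsed_s: float) -> dict:
    """Bilingual record: wall clock label for humans, monotonic duration for machines."""
    return {
        "event": event,
        "wall_time_utc": datetime.now(timezone.utc).isoformat(),
        "elapsed_ms": round(elapsed_s * 1000.0, 3),
    }
```

A caller would wrap a request with timed, pass the elapsed value to log_record, and check Deadline.remaining() before each retry. The wall clock appears only in the label field and never in the arithmetic.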

Real World Example

Consider a global relational service that offers external consistency similar to Google Spanner. Each replica synchronizes with multiple time sources and exposes an uncertainty window, that is, an earliest and a latest bound on the true current time. During a write the coordinator chooses a commit timestamp at the upper bound of that window, then performs a short commit wait until the lower bound has passed the commit timestamp, so that once the client receives success, the timestamp is guaranteed to be in the past everywhere. Readers can safely use snapshot reads at any timestamp that is older than the lower bound of the current uncertainty window. Inside each node, timeouts and backoff use a monotonic clock so local scheduling remains correct even if the wall clock changes. In networks that require very tight windows, the same cluster enables PTP with hardware timestamping to keep uncertainty small and throughput high.
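
The commit wait described above can be sketched as follows. This is a simplified illustration under stated assumptions, not Spanner's actual implementation: UncertainClock is a hypothetical stand-in for a TrueTime style API, the uncertainty bound epsilon_s is assumed to be supplied by the time sync layer, and the write is applied before the wait only to keep the example short.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class TimeInterval:
    earliest: float  # true time is assumed to be no earlier than this
    latest: float    # true time is assumed to be no later than this


class UncertainClock:
    """Hypothetical TrueTime style clock: wall time plus or minus epsilon_s."""

    def __init__(self, epsilon_s: float, wall: Callable[[], float] = time.time) -> None:
        self._epsilon_s = epsilon_s
        self._wall = wall

    def now(self) -> TimeInterval:
        t = self._wall()
        return TimeInterval(t - self._epsilon_s, t + self._epsilon_s)


def commit(
    clock: UncertainClock,
    apply_write: Callable[[float], None],
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Pick a commit timestamp, apply the write, then wait out the uncertainty."""
    commit_ts = clock.now().latest  # upper bound of the uncertainty window
    apply_write(commit_ts)
    # Commit wait: do not acknowledge until even the earliest possible
    # current time is past commit_ts.
    while clock.now().earliest <= commit_ts:
        sleep(0.001)
    return commit_ts
```

The wait lasts roughly twice epsilon_s, which is why a tighter sync (for example PTP) translates directly into lower write latency.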

Common Pitfalls or Trade offs

Using wall clock to compute durations, which creates negative or absurd values when the clock is stepped
Depending on timestamps for total order of events across nodes rather than on logical or hybrid logical clocks
Running a single time source, which turns a routine outage into a fleet wide time jump
Ignoring uncertainty in leases, which lets leaders overlap during skew and causes split brain
Handling leap seconds differently across services, which leads to misaligned windows and confusing logs
Forcing PTP everywhere, which adds cost and operational complexity where millisecond accuracy is more than enough

Interview Tip

A favorite prompt is to ask you to design a session store with a thirty minute expiry across several regions. A strong answer says sessions carry an issue timestamp from the authority, services apply a safety margin equal to measured uncertainty, and all timeouts and backoff use a monotonic clock. Mention that you monitor offset and refuse session writes if clock error exceeds a threshold until sync recovers.
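
A minimal sketch of the expiry check in that answer is shown below. The Session type, the is_session_valid function, and the way uncertainty_s is obtained are all assumptions made for illustration. The point is only that the check fails closed by treating the local clock as if it were running behind by the measured uncertainty.

```python
from dataclasses import dataclass

SESSION_TTL_S = 30 * 60  # thirty minute expiry from the prompt


@dataclass(frozen=True)
class Session:
    session_id: str
    issued_at: float  # Unix seconds, stamped by the issuing authority


def is_session_valid(
    session: Session,
    local_now: float,
    uncertainty_s: float,
    ttl_s: float = SESSION_TTL_S,
) -> bool:
    """Return True only if the session is unexpired even in the worst case.

    local_now is this host's wall clock reading (Unix seconds) and
    uncertainty_s is the measured bound on its error. Adding the bound to
    local_now assumes the true time may be that much later than we think.
    """
    return local_now + uncertainty_s < session.issued_at + ttl_s
```

For a session issued at 1000.0, a host reading 2799.98 with 0.05 seconds of uncertainty rejects the session, while the same reading with zero uncertainty accepts it. The margin trades a slightly shorter effective session for never honoring an expired one.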

Key Takeaways

Treat wall clock and monotonic clock as different tools for different jobs
Use NTP for general sync and PTP where you need microsecond level accuracy
Add an explicit uncertainty margin to leases and time based invariants
For cross node ordering use logical or hybrid logical clocks rather than raw timestamps
Log wall clock for humans and measure durations with monotonic time for correctness

Comparison Table

Concept	What it delivers	Typical accuracy	Best fit	Main drawback
NTP	Network wide wall clock sync at low cost	Few milliseconds on local networks	General computing fleets and web services	Not ideal for ultra tight ordering guarantees
PTP	Hardware assisted wall clock sync	Microsecond level in data centers	Trading, storage, high rate telemetry	Requires special switches and setup
Monotonic time	Stable local clock for durations and timeouts	Not absolute time, only relative	Retries, backoff, circuit breakers, schedulers	Cannot label events for humans
Lamport or vector clocks	Order without relying on wall clock	Logical only	Conflict resolution in stores and queues	Less intuitive than timestamps
Hybrid logical clocks	Blend of physical time and counters	Close to physical with small uncertainty	Global ordering with good performance	More complex to implement correctly
TrueTime style uncertainty	Timestamp interval with commit wait	Bounded by sync quality	Externally consistent transactions	Write latency rises with uncertainty
Time smear for leap seconds	Smooths the extra second over a day	Invisible to most apps	Large fleets with mixed stacks	Needs consistent fleet wide policy

FAQs

Q1. What is the practical difference between wall clock and monotonic time?

Wall clock aligns with calendars and is used for timestamps and reports. Monotonic time only moves forward and is used to measure durations and enforce timeouts. Never use wall clock to compute elapsed time.

Q2. When should I choose PTP instead of NTP?

Choose PTP when you need microsecond level accuracy, such as in trading, storage, or very low jitter telemetry. For most services NTP is sufficient and simpler to operate.

Q3. How do I make leases safe under clock skew?

Estimate the current uncertainty on each host and subtract a margin from lease duration, or perform a short wait before declaring a commit or a leadership claim valid. This prevents overlapping leaders.
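
One way to picture both halves of that answer is the sketch below. LeaseHolder and takeover_wait_s are hypothetical names, the margin is assumed to cover the measured clock uncertainty plus any network delay, and the injectable clock parameter exists only so the logic can be tested. The holder gives up early and a challenger waits late, so their views of the lease cannot overlap as long as the real error stays within the margin.

```python
import time
from typing import Callable


class LeaseHolder:
    """Holder side view of a lease, tracked on the local monotonic clock."""

    def __init__(
        self,
        duration_s: float,
        margin_s: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        if not 0.0 <= margin_s < duration_s:
            raise ValueError("margin_s must be non-negative and shorter than duration_s")
        self._clock = clock
        # Treat the lease as ending margin_s early. In a real system the start
        # reading should be taken before the acquire request is sent.
        self._safe_until = clock() + duration_s - margin_s

    def still_held(self) -> bool:
        return self._clock() < self._safe_until


def takeover_wait_s(duration_s: float, margin_s: float) -> float:
    """How long a challenger waits after last seeing the lease before claiming it."""
    return duration_s + margin_s
```

With a ten second lease and a two second margin, the holder stops acting as leader after eight seconds and a challenger waits twelve, which leaves a four second gap to absorb skew.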

Q4. How do leap seconds affect applications?

If the kernel inserts a leap second, time may appear to pause or repeat, which can confuse timestamp based logic. A smear avoids a sudden jump by spreading the adjustment over a day. Pick one policy and apply it everywhere.
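
To make the smear concrete, the sketch below computes how much of the leap second has been absorbed at a given moment, assuming a linear smear over a 24 hour window. The function name, the window start parameter, and the linear shape are assumptions for illustration. Real deployments should follow whatever smear their time servers actually implement.

```python
def smear_offset_s(
    t: float,
    window_start: float,
    window_len_s: float = 86_400.0,
    leap_s: float = 1.0,
) -> float:
    """Portion of the leap second already applied at time t (linear smear).

    t and window_start are seconds on the same unsmeared timeline.
    Before the window the offset is 0, after it the full leap_s has been
    absorbed, and in between it grows in proportion to elapsed time.
    """
    if t <= window_start:
        return 0.0
    if t >= window_start + window_len_s:
        return leap_s
    return leap_s * (t - window_start) / window_len_s
```

Halfway through the window, smear_offset_s returns 0.5, so a smeared clock differs from an unsmeared one by half a second at that point. This is why mixing the two policies inside one fleet produces misaligned windows.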

Q5. How do I detect time drift in production?

Collect metrics from NTP or PTP clients including offset, jitter, and root dispersion. Alert when offsets breach your error budget. During incidents, hold writes that depend on strict ordering.
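
A budget check of this kind might look like the sketch below. ClockHealth, ClockOutOfBudget, and guarded_write are hypothetical names, the budget constants reuse the example figures from the error budget step, and adding root dispersion to the absolute offset is a deliberately simple error estimate chosen for this example, not the formula any particular time daemon uses. How the two values are scraped from the client is left out.

```python
from dataclasses import dataclass
from typing import Callable, TypeVar

T = TypeVar("T")

PTP_BUDGET_S = 0.001      # one millisecond for PTP zones
GENERAL_BUDGET_S = 0.050  # fifty milliseconds for general clusters


@dataclass(frozen=True)
class ClockHealth:
    offset_s: float           # measured offset from the time source
    root_dispersion_s: float  # accumulated error reported by the client


class ClockOutOfBudget(RuntimeError):
    """Raised when the local clock error exceeds the tier's budget."""


def estimated_error_s(health: ClockHealth) -> float:
    # Simplified estimate (an assumption of this sketch).
    return abs(health.offset_s) + health.root_dispersion_s


def within_budget(health: ClockHealth, budget_s: float) -> bool:
    return estimated_error_s(health) <= budget_s


def guarded_write(health: ClockHealth, budget_s: float, write: Callable[[], T]) -> T:
    """Run write() only while the clock is within budget, otherwise refuse."""
    if not within_budget(health, budget_s):
        raise ClockOutOfBudget(
            f"estimated clock error {estimated_error_s(health):.6f}s "
            f"exceeds budget {budget_s:.6f}s"
        )
    return write()
```

The same within_budget result can drive both the alert and the write gate, so operators and the service agree on when the clock is considered unsafe.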

Q6. Can I rely on timestamps for total ordering across nodes?

Only if you can prove very small bounded uncertainty. In most systems prefer logical or hybrid logical clocks for ordering, and reserve physical timestamps for labels and coarse windows.
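
For readers who have not seen one, a hybrid logical clock can be sketched in a few dozen lines. The version below follows the commonly published update rules (take the maximum of local, remote, and physical time, and use a counter to break ties). The class name, the millisecond resolution, and the injectable physical_ms function are choices made for this example only.

```python
import time
from typing import Callable, Tuple

Timestamp = Tuple[int, int]  # (logical time in ms, tie-breaking counter)


def wall_ms() -> int:
    return time.time_ns() // 1_000_000


class HybridLogicalClock:
    def __init__(self, physical_ms: Callable[[], int] = wall_ms) -> None:
        self._physical_ms = physical_ms
        self._l = 0  # highest physical time seen so far
        self._c = 0  # counter for events that share the same _l

    def now(self) -> Timestamp:
        """Timestamp a local or send event."""
        pt = self._physical_ms()
        if pt > self._l:
            self._l, self._c = pt, 0
        else:
            # Physical clock has not advanced (or went backward): bump counter.
            self._c += 1
        return (self._l, self._c)

    def update(self, remote: Timestamp) -> Timestamp:
        """Merge a timestamp received from another node and stamp the receive event."""
        r_l, r_c = remote
        pt = self._physical_ms()
        new_l = max(self._l, r_l, pt)
        if new_l == self._l and new_l == r_l:
            new_c = max(self._c, r_c) + 1
        elif new_l == self._l:
            new_c = self._c + 1
        elif new_l == r_l:
            new_c = r_c + 1
        else:
            new_c = 0
        self._l, self._c = new_l, new_c
        return (new_l, new_c)
```

Timestamps compare as ordinary tuples. If node A stamps a message at (1000, 0) and node B, whose wall clock reads 900, receives it, B's update returns (1000, 1). The receive is ordered after the send even though B's physical clock is behind.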
