Explain Model Serving vs Batch Scoring.

Model serving returns predictions on demand, typically through a low-latency API, as each request arrives. Batch scoring runs a model over a large dataset at scheduled intervals and stores the results for later use.
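
The difference can be sketched in a few lines. This is a minimal illustration, not a production pattern: `score` and its `WEIGHTS` are hypothetical stand-ins for a trained model, and `serve_request` and `batch_score` are illustrative names. The same model is called once per incoming request in the serving path and once per row of a dataset in the batch path.

```python
from typing import Dict, List

# Hypothetical stand-in for a trained model: a fixed linear scorer.
WEIGHTS: Dict[str, float] = {"amount": 0.002, "num_items": 0.05}


def score(features: Dict[str, float]) -> float:
    """Return a score clamped to [0, 1] for one feature row."""
    raw = sum(WEIGHTS[name] * features.get(name, 0.0) for name in WEIGHTS)
    return max(0.0, min(1.0, raw))


def serve_request(request: Dict[str, float]) -> Dict[str, float]:
    """Model serving: score a single request and return immediately."""
    return {"score": score(request)}


def batch_score(rows: List[Dict[str, float]]) -> List[float]:
    """Batch scoring: score a whole dataset in one scheduled run."""
    return [score(row) for row in rows]


if __name__ == "__main__":
    # Serving path: one prediction per incoming call.
    print(serve_request({"amount": 120.0, "num_items": 3}))
    # Batch path: many predictions per run, usually written to a table.
    print(batch_score([
        {"amount": 50.0, "num_items": 1},
        {"amount": 900.0, "num_items": 10},
    ]))
```

In practice the serving function would sit behind an HTTP or RPC endpoint and the batch function would be triggered by a scheduler; both are left out here to keep the sketch self-contained.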

When to Use

Model Serving: fraud detection at checkout, real-time recommendations, chatbots, search ranking, dynamic pricing.
Batch Scoring: nightly churn predictions, customer segmentation, credit risk assessments, forecasting, precomputing features for campaigns.

Example

An e-commerce site uses model serving for live product recommendations during browsing and batch scoring overnight to decide which customers should receive discount coupons the next morning.

Why Is It Important

Choosing the right strategy affects latency, scalability, and cost. Many production systems use a hybrid: batch scoring precomputes the bulk of the predictions, and serving handles the personalized, last-mile predictions at request time.
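
One way to picture the hybrid, as a rough sketch under stated assumptions: a nightly job has written per-user scores to a lookup table (`BATCH_SCORES` here, a hypothetical in-memory dict standing in for a key-value store), and the serving path adjusts that score with a request-time signal (`session_boost`, also hypothetical). Users missing from the table fall back to an online model passed in by the caller.

```python
from typing import Callable, Dict, Optional

# Hypothetical output of a nightly batch job: user_id -> precomputed score.
BATCH_SCORES: Dict[str, float] = {"u1": 0.82, "u2": 0.10}


def hybrid_predict(
    user_id: str,
    session_boost: float,
    online_model: Callable[[str], float],
    batch_scores: Optional[Dict[str, float]] = None,
) -> float:
    """Combine a precomputed batch score with a request-time adjustment."""
    table = BATCH_SCORES if batch_scores is None else batch_scores
    base = table.get(user_id)
    if base is None:
        # No batch result (for example, a brand-new user): score online.
        base = online_model(user_id)
    # Last-mile personalization, clamped to the valid score range [0, 1].
    return max(0.0, min(1.0, base + session_boost))


if __name__ == "__main__":
    fallback = lambda user_id: 0.5  # hypothetical online model
    print(hybrid_predict("u1", 0.1, fallback))   # uses the batch score
    print(hybrid_predict("new", 0.0, fallback))  # falls back to online model
```

The design choice is that the expensive computation runs in bulk and off-peak, while the request path does only a lookup and a cheap adjustment.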

Interview Tips

Start with clear definitions.
Compare along latency, throughput, cost, and data freshness.
Mention monitoring (drift detection, canary releases) and explain when to combine both approaches.
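
To make the drift-detection point concrete, here is a minimal sketch of one common check, the population stability index (PSI), which compares the binned distribution of recent model scores against a baseline. The bin edges, the sample data, and the 0.2 alert threshold are assumptions chosen for illustration; appropriate values depend on the model and the data.

```python
import math
from typing import List, Sequence


def bin_fractions(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Fraction of values per bin; bin i covers [edges[i], edges[i + 1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            # The last bin also includes its right edge.
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
        # Values outside [edges[0], edges[-1]] are ignored in this sketch.
    total = max(sum(counts), 1)
    return [c / total for c in counts]


def psi(expected: Sequence[float], actual: Sequence[float],
        edges: Sequence[float], eps: float = 1e-6) -> float:
    """PSI = sum over bins of (a - e) * ln(a / e), with empty bins floored at eps."""
    total = 0.0
    for e, a in zip(bin_fractions(expected, edges), bin_fractions(actual, edges)):
        e, a = max(e, eps), max(a, eps)
        total += (a - e) * math.log(a / e)
    return total


if __name__ == "__main__":
    edges = [0.0, 0.5, 1.0]                       # assumed score bins
    baseline = [0.1, 0.2, 0.3, 0.6, 0.7]          # made-up training-time scores
    recent = [0.6, 0.7, 0.8, 0.9, 0.95]           # made-up production scores
    value = psi(baseline, recent, edges)
    print("drift suspected" if value > 0.2 else "stable")  # 0.2 is an assumed threshold
```

The same check applies to both modes: a batch pipeline can run it on each scheduled output, while a serving system would run it over a sliding window of logged predictions.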

Trade-offs

Model Serving:
- Pros: fresh, user-specific predictions at low latency
- Cons: complex scaling, higher cost per request
Batch Scoring:
- Pros: cost-efficient, reproducible, simpler pipelines
- Cons: stale results, scheduling delays, less personalized