System Design

Online Serving, Batch Scoring, and Caching

Reason about latency budgets, retrieval tiers, fallbacks, and cost-aware inference paths.

Recommended on day 61 · 100 minutes · Advanced

Learning objectives

  • Choose between online, asynchronous, and batch inference patterns
  • Explain where caching helps and where it silently hurts freshness
  • Connect latency budgets to feature design and model complexity
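To make the caching-versus-freshness tradeoff concrete, here is a minimal sketch of a feature cache with a time-to-live. The class name, the `fetch_features` helper, and the feature keys are all illustrative assumptions, not part of any particular system: the point is that a longer TTL cuts latency and load on the feature store, but lets the model score on features that may be up to `ttl_seconds` stale.

```python
import time

class TTLFeatureCache:
    """Cache feature vectors with a time-to-live.

    A longer TTL reduces latency and feature-store load, but scores
    may be computed on features up to `ttl_seconds` old.
    """

    def __init__(self, fetch_fn, ttl_seconds=60.0):
        self.fetch_fn = fetch_fn      # fallback path to the feature store
        self.ttl = ttl_seconds
        self._store = {}              # key -> (expires_at, value)

    def get(self, key):
        now = time.monotonic()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]           # hit: fast, but possibly stale
        value = self.fetch_fn(key)    # miss: slow, but fresh
        self._store[key] = (now + self.ttl, value)
        return value


# Hypothetical slow feature-store lookup; `calls` counts how often it runs.
calls = []
def fetch_features(user_id):
    calls.append(user_id)
    return {"txn_count_1h": 3}

cache = TTLFeatureCache(fetch_features, ttl_seconds=60.0)
cache.get("u1")
cache.get("u1")  # second call is served from the cache
```

The "silent hurt" in the objective above is exactly the hit path: it returns without consulting the store at all, so a fraud signal updated within the TTL window is invisible until the entry expires.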

Interview prompts

  • How do you design a low-latency fraud scoring service?
  • When should you use a candidate generation and ranking split?
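The second prompt can be sketched in a few lines. Everything here is a toy assumption (the catalog, the feature names, the scoring weights); the structural point is that a cheap heuristic narrows a large catalog to a shortlist, and the expensive scorer runs only on that shortlist.

```python
# Two-stage retrieval: cheap candidate generation over the full catalog,
# then a costlier ranking pass over only the shortlist.
CATALOG = {
    f"item_{i}": {"popularity": i % 97, "quality": (i * 31) % 101}
    for i in range(10_000)
}

def generate_candidates(k=200):
    # Stage 1: cheap signal (popularity) applied to every item.
    return sorted(
        CATALOG, key=lambda item: CATALOG[item]["popularity"], reverse=True
    )[:k]

def rank(candidates, k=10):
    # Stage 2: richer (and in practice far more expensive) score,
    # applied only to the k-hundred survivors of stage 1.
    def score(item):
        f = CATALOG[item]
        return 0.6 * f["quality"] + 0.4 * f["popularity"]
    return sorted(candidates, key=score, reverse=True)[:k]

top = rank(generate_candidates())
```

In a real system stage 1 is typically an approximate nearest-neighbor or inverted-index lookup and stage 2 a learned ranker; the split exists because running the expensive model over the full catalog would blow the latency budget.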