System Design

Online Serving, Batch Scoring, and Caching

Reason about latency budgets, retrieval tiers, fallbacks, and cost-aware inference paths.

Recommended on day 61 · 100 minutes · Advanced

Learning objectives

  • Choose between online, asynchronous, and batch inference patterns
  • Explain where caching helps and where it silently hurts freshness
  • Connect latency budgets to feature design and model complexity
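To make the caching-versus-freshness tradeoff concrete, here is a minimal sketch of a feature cache with a time-to-live. The class name, the `fetch_features` helper, and the feature keys are all illustrative assumptions, not part of any particular system: the point is that a longer TTL cuts latency and load on the feature store, but lets the model score on features that may be up to `ttl_seconds` stale.

```python
import time

class TTLFeatureCache:
    """Cache feature vectors with a time-to-live.

    A longer TTL reduces latency and feature-store load, but scores
    may be computed on features up to `ttl_seconds` old.
    """

    def __init__(self, fetch_fn, ttl_seconds=60.0):
        self.fetch_fn = fetch_fn      # fallback path to the feature store
        self.ttl = ttl_seconds
        self._store = {}              # key -> (expires_at, value)

    def get(self, key):
        now = time.monotonic()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]           # hit: fast, but possibly stale
        value = self.fetch_fn(key)    # miss: slow, but fresh
        self._store[key] = (now + self.ttl, value)
        return value


# Hypothetical slow feature-store lookup; `calls` counts how often it runs.
calls = []
def fetch_features(user_id):
    calls.append(user_id)
    return {"txn_count_1h": 3}

cache = TTLFeatureCache(fetch_features, ttl_seconds=60.0)
cache.get("u1")
cache.get("u1")  # second call is served from the cache
```

The "silent hurt" in the objective above is exactly the hit path: it returns without consulting the store at all, so a fraud signal updated within the TTL window is invisible until the entry expires.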

Interview prompts

  • How do you design a low-latency fraud scoring service?
  • When should you use a candidate generation and ranking split?
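The second prompt can be sketched in a few lines. Everything here is a toy assumption (the catalog, the feature names, the scoring weights); the structural point is that a cheap heuristic narrows a large catalog to a shortlist, and the expensive scorer runs only on that shortlist.

```python
# Two-stage retrieval: cheap candidate generation over the full catalog,
# then a costlier ranking pass over only the shortlist.
CATALOG = {
    f"item_{i}": {"popularity": i % 97, "quality": (i * 31) % 101}
    for i in range(10_000)
}

def generate_candidates(k=200):
    # Stage 1: cheap signal (popularity) applied to every item.
    return sorted(
        CATALOG, key=lambda item: CATALOG[item]["popularity"], reverse=True
    )[:k]

def rank(candidates, k=10):
    # Stage 2: richer (and in practice far more expensive) score,
    # applied only to the k-hundred survivors of stage 1.
    def score(item):
        f = CATALOG[item]
        return 0.6 * f["quality"] + 0.4 * f["popularity"]
    return sorted(candidates, key=score, reverse=True)[:k]

top = rank(generate_candidates())
```

In a real system stage 1 is typically an approximate nearest-neighbor or inverted-index lookup and stage 2 a learned ranker; the split exists because running the expensive model over the full catalog would blow the latency budget.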