Design from problem to production

ML System Design

Learn how to structure open-ended interviews around requirements, data, training, serving, monitoring, and trade-offs.

Featured topics

7 topic cards built for interview prep

Each topic includes a summary, practical learning goals, representative interview prompts, and a suggested roadmap day.

Intermediate · 85 min · Day 58

ML System Design Framework

Use a repeatable framework to drive an ambiguous ML system design interview from start to finish.

Learning objectives

  • Clarify product requirements, data availability, and evaluation metrics
  • Break the system into training, inference, feedback, and monitoring layers

Intermediate · 90 min · Day 58

Requirements, Metrics, and Scope in ML System Design

Start ML system design interviews with product goals, users, constraints, labels, metrics, baselines, and failure modes.

Learning objectives

  • Clarify requirements before proposing models
  • Separate product, model, system, and guardrail metrics

Advanced · 90 min · Day 60

Feature Stores and Training-Serving Consistency

Learn when feature stores help, where they add overhead, and how they relate to freshness and parity.

Learning objectives

  • Contrast online and offline feature stores
  • Explain freshness, backfills, and point-in-time correctness

Advanced · 100 min · Day 61

Online Serving, Batch Scoring, and Caching

Reason about latency budgets, retrieval tiers, fallbacks, and cost-aware inference paths.

Learning objectives

  • Choose between online, asynchronous, and batch inference patterns
  • Explain where caching helps and where it silently hurts freshness

Advanced · 120 min · Day 63

Retrieval, Ranking, and Recommendation Systems

Prepare multi-stage recommender and search designs with retrieval, ranking, reranking, diversity, freshness, and feedback loops.

Learning objectives

  • Design candidate generation, ranking, reranking, and exploration layers
  • Handle cold start, position bias, diversity, and delayed labels

Advanced · 120 min · Day 68

Search, Ads, and Feed Ranking Design

Prepare for high-frequency ranking systems with query understanding, auctions, personalization, calibration, and latency budgets.

Learning objectives

  • Compare search, ads, and feed ranking architectures
  • Reason about retrieval, auction, relevance, calibration, and business constraints

Advanced · 110 min · Day 70

Real-Time Fraud, Risk, and Abuse Systems

Cover low-latency risk scoring, graph features, rules plus ML, delayed labels, investigation queues, and adversarial adaptation.

Learning objectives

  • Design real-time feature generation and low-latency inference
  • Use rules, ML, graph signals, and human review together

Practice prompts

Daily-plan topics tied directly to this pillar

These are pulled from the same 133-day roadmap content used by Browse Questions.

Day 106 · ML System Design · Framework

7-step framework: clarify → metrics → data → model → infra → eval → edge cases

  • Walk me through your 7-step framework for any ML system design interview.
  • How do you avoid running out of time on the model section?

Day 106 · ML System Design · Framework

Clarifying questions: scope, scale, latency, constraints

  • What are the first five clarifying questions you ask in any ML system design interview?
  • How do you confirm the business metric vs the ML metric without burning 10 minutes on it?

Day 106 · ML System Design · Framework

Online vs offline metrics, business metrics

  • How do you map a business metric to an offline ML metric?
  • Walk through three real cases where offline gains didn't translate online.

Day 106 · ML System Design · Framework

Product ML judgment: first principles, Rules of ML, technical debt

  • How would you map a vague product opportunity into data, labels, baseline, metrics, and launch criteria?
  • When should you avoid ML and ship a heuristic or rules-based product first?

Day 107 · ML System Design · Recommendations

Design YouTube/TikTok feed recommendations

  • Walk me through designing TikTok's For You feed end-to-end.
  • How would you handle the cold-start problem for a new user?

Day 107 · ML System Design · Recommendations

Two-stage candidate generation + ranking

  • Why do industrial recommender systems use a two-stage (recall + rank) architecture?
  • How do you avoid training/serving skew in the recall and ranking stages?

Day 108 · ML System Design · Search ranking

Design Google/Amazon search ranking

  • Walk me through designing Amazon's product search.
  • How do you balance relevance, freshness, and diversity in search ranking?

Day 108 · ML System Design · Search ranking

Learning-to-rank: pointwise, pairwise, listwise

  • Compare pointwise, pairwise, and listwise learning-to-rank (LTR): when does each fit?
  • Why does pairwise LTR usually beat pointwise LTR on click data, even though pointwise is simpler?