Day 120 of 133
Latency vs accuracy: caching, distillation, cascades + DSA review
What you actually do when accuracy is great but P99 is too high.
DSA · NeetCode Trees
- Same Tree (DSA · Trees)
Interview questions to prep
- Compare BFS vs DFS for this problem — which fits, and what's the iterative version?
- What's the recursion's space cost on the stack, and how would you go iterative if you needed O(log n) space?
- What's the relationship between this problem's invariant and the BST property (if any)?
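For the iterative-version question, a minimal sketch of Same Tree using an explicit queue (BFS). The `TreeNode` class and the `is_same_tree` name are assumptions for illustration, not taken from the problem statement.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class TreeNode:
    # Hypothetical node type for this sketch.
    val: int
    left: Optional["TreeNode"] = None
    right: Optional["TreeNode"] = None


def is_same_tree(p: Optional[TreeNode], q: Optional[TreeNode]) -> bool:
    """Compare two trees level by level with an explicit queue of node pairs."""
    queue = deque([(p, q)])
    while queue:
        a, b = queue.popleft()
        if a is None and b is None:
            continue  # both subtrees end here, nothing to compare
        if a is None or b is None or a.val != b.val:
            return False  # shape or value mismatch
        queue.append((a.left, b.left))
        queue.append((a.right, b.right))
    return True
```

Swapping `popleft()` for `pop()` turns the same loop into an iterative DFS. The pending pairs then grow with tree height rather than level width, which is O(log n) only when the tree is balanced.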
ML System Design · Cross-cutting trade-offs
Interview questions to prep
- What levers do you pull when accuracy is great but latency misses the budget?
- Walk through where you'd add caching in a RAG + LLM pipeline to halve P99.
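One way to answer the caching question is with two cache layers: an answer cache that skips the whole pipeline, and a retrieval cache that skips only the vector search. The sketch below assumes exact-match keys on a normalized query. `retrieve` and `generate` are hypothetical callables standing in for the retriever and the LLM call, and the TTL values are placeholders.

```python
import hashlib
import time
from typing import Callable, Dict, List, Optional, Tuple


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        self._store: Dict[str, Tuple[float, object]] = {}

    def get(self, key: str) -> Optional[object]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # stale entry, treat as a miss
            return None
        return value

    def put(self, key: str, value: object) -> None:
        self._store[key] = (time.monotonic() + self.ttl, value)


def cache_key(query: str) -> str:
    # Normalize case and whitespace so trivially different queries share a key.
    normalized = " ".join(query.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class CachedRagPipeline:
    def __init__(
        self,
        retrieve: Callable[[str], List[str]],       # hypothetical retriever
        generate: Callable[[str, List[str]], str],  # hypothetical LLM call
        retrieval_ttl: float = 3600.0,  # assumption: documents change slowly
        answer_ttl: float = 300.0,      # assumption: answers go stale sooner
    ) -> None:
        self.retrieve = retrieve
        self.generate = generate
        self.retrieval_cache = TTLCache(retrieval_ttl)
        self.answer_cache = TTLCache(answer_ttl)

    def answer(self, query: str) -> str:
        key = cache_key(query)
        cached = self.answer_cache.get(key)
        if cached is not None:
            return cached  # full hit: no retrieval, no generation
        docs = self.retrieval_cache.get(key)
        if docs is None:
            docs = self.retrieve(query)
            self.retrieval_cache.put(key, docs)
        result = self.generate(query, docs)
        self.answer_cache.put(key, result)
        return result
```

A cache only speeds up the requests it hits, so it moves P99 only if the slow requests are among the hits. It is worth checking the hit rate on tail traffic before promising a halved P99.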
Interview questions to prep
- How would you design a cascade: cheap model first, expensive only when needed?
- What's the right verifier for the cheap model's output — and when does it dominate cost?
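A minimal sketch of a two-stage cascade, assuming the cheap model returns a confidence score that is calibrated well enough to act as its own verifier. The model callables, the 0.9 threshold, and the cost units are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

Model = Callable[[str], Tuple[str, float]]  # returns (label, confidence)


@dataclass
class CascadeResult:
    label: str
    confidence: float
    used_expensive: bool


def cascade_predict(
    x: str,
    cheap_model: Model,
    expensive_model: Model,
    threshold: float = 0.9,  # assumption: tuned on a held-out set
) -> CascadeResult:
    """Accept the cheap answer when it is confident, otherwise escalate."""
    label, conf = cheap_model(x)
    if conf >= threshold:
        return CascadeResult(label, conf, used_expensive=False)
    label, conf = expensive_model(x)
    return CascadeResult(label, conf, used_expensive=True)


def expected_cost(
    cheap_cost: float,
    expensive_cost: float,
    escalation_rate: float,
    verifier_cost: float = 0.0,
) -> float:
    """Average per-request cost: every request pays for the cheap model and
    the verifier; only the escalated fraction pays for the expensive model."""
    return cheap_cost + verifier_cost + escalation_rate * expensive_cost
```

The cost function shows when the verifier dominates: it runs on every request, so if `verifier_cost` approaches `expensive_cost` the cascade saves little even at a low escalation rate.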
Interview questions to prep
- Walk through cold-start strategies for new users vs new items.
- Compare bandit-based exploration vs content-based bridges for cold start — when does each fit?
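For the bandit side of the comparison, a minimal Thompson-sampling sketch for new items, assuming binary click feedback and a Beta prior per item. The class and method names are hypothetical. The prior arguments on `add_item` are where a content-based bridge could plug in, for example by seeding a new item from the click statistics of similar items.

```python
import random
from typing import Dict, Iterable, List, Optional


class BetaBandit:
    """Thompson sampling over items with Beta(clicks, skips) posteriors."""

    def __init__(self, item_ids: Iterable[str], seed: Optional[int] = None) -> None:
        self.rng = random.Random(seed)
        # [alpha, beta] per item; Beta(1, 1) is a uniform prior.
        self.stats: Dict[str, List[float]] = {i: [1.0, 1.0] for i in item_ids}

    def add_item(self, item_id: str, prior_clicks: float = 1.0,
                 prior_skips: float = 1.0) -> None:
        # A content-based estimate can be passed in as pseudo-counts.
        self.stats[item_id] = [prior_clicks, prior_skips]

    def select(self) -> str:
        # Sample a plausible click rate per item and show the best sample.
        # Uncertain (new) items have wide posteriors, so they get explored.
        samples = {
            item: self.rng.betavariate(a, b)
            for item, (a, b) in self.stats.items()
        }
        return max(samples, key=samples.get)

    def update(self, item_id: str, clicked: bool) -> None:
        self.stats[item_id][0 if clicked else 1] += 1.0
```

This covers new items only. A brand-new user with no history has no per-user posterior to sample from, which is where content or context features usually have to carry the first requests.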
Interview questions to prep
- When would you reach for federated learning vs differential privacy vs on-device inference?
- What's the accuracy cost of DP-SGD at typical ε values, and how do you decide if it's acceptable?
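To reason about where DP-SGD's accuracy cost comes from, it helps to see the two operations it adds to plain SGD: per-example gradient clipping and Gaussian noise scaled to the clip norm. The sketch below is a pure-Python illustration with hypothetical function names. It does not compute ε, which needs a separate privacy accountant.

```python
import math
import random
from typing import List, Sequence


def clip_to_norm(grad: Sequence[float], max_norm: float) -> List[float]:
    """Scale a gradient down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(
    weights: Sequence[float],
    per_example_grads: Sequence[Sequence[float]],
    lr: float,
    clip_norm: float,
    noise_multiplier: float,
    rng: random.Random,
) -> List[float]:
    """One DP-SGD update: clip each example's gradient, sum, add noise, average."""
    dim = len(weights)
    summed = [0.0] * dim
    for grad in per_example_grads:
        clipped = clip_to_norm(grad, clip_norm)
        for i in range(dim):
            summed[i] += clipped[i]
    # Noise standard deviation is tied to the clip norm, which bounds how much
    # any single example can change the sum.
    sigma = noise_multiplier * clip_norm
    batch_size = len(per_example_grads)
    noisy_mean = [(s + rng.gauss(0.0, sigma)) / batch_size for s in summed]
    return [w - lr * g for w, g in zip(weights, noisy_mean)]
```

Clipping biases the gradient and the noise adds variance. A larger `noise_multiplier` gives a smaller ε for the same number of steps, but a noisier update, and that is the trade-off the question asks you to weigh.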
References & further reading
- Eugene Yan: applied ML writing
- vLLM docs
- Anthropic: Prompt Engineering Guide