Day 120 of 133

Latency vs accuracy: caching, distillation, cascades + DSA review

What you actually do when accuracy is great but P99 is too high.

DSA · NeetCode Trees

Same Tree · DSA · Trees
Interview questions to prep
1. Compare BFS vs DFS for this problem — which fits, and what's the iterative version?
2. What's the recursion's space cost on the call stack for balanced vs skewed trees, and how would you go iterative to avoid stack overflow on a deep tree?
3. What's the relationship between this problem's invariant and the BST property (if any)?
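A minimal Python sketch for Same Tree, assuming LeetCode-style binary-tree nodes (the `TreeNode` dataclass below is a stand-in, not the judge's class). It shows the recursive DFS, an iterative DFS with an explicit stack of node pairs, and BFS with a queue:

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class TreeNode:
    val: int
    left: Optional["TreeNode"] = None
    right: Optional["TreeNode"] = None


def is_same_tree_recursive(p: Optional[TreeNode], q: Optional[TreeNode]) -> bool:
    # Recursive DFS: call-stack depth is O(h), i.e. O(log n) balanced, O(n) skewed.
    if p is None and q is None:
        return True
    if p is None or q is None or p.val != q.val:
        return False
    return (is_same_tree_recursive(p.left, q.left)
            and is_same_tree_recursive(p.right, q.right))


def is_same_tree_dfs(p: Optional[TreeNode], q: Optional[TreeNode]) -> bool:
    # Iterative DFS: an explicit stack of node pairs replaces the call stack.
    stack = [(p, q)]
    while stack:
        a, b = stack.pop()
        if a is None and b is None:
            continue
        if a is None or b is None or a.val != b.val:
            return False
        # Push right first so the left subtree is compared first (same order as recursion).
        stack.append((a.right, b.right))
        stack.append((a.left, b.left))
    return True


def is_same_tree_bfs(p: Optional[TreeNode], q: Optional[TreeNode]) -> bool:
    # BFS: a FIFO queue of node pairs, compared level by level.
    queue = deque([(p, q)])
    while queue:
        a, b = queue.popleft()
        if a is None and b is None:
            continue
        if a is None or b is None or a.val != b.val:
            return False
        queue.append((a.left, b.left))
        queue.append((a.right, b.right))
    return True
```

All three are O(n) time. The explicit stack keeps roughly the same O(h) bound as recursion but lives on the heap, so a skewed tree thousands of levels deep no longer hits Python's default recursion limit of 1000. BFS needs space for the widest level and can report a mismatch near the root sooner; in the worst case (identical trees) every node pair is compared whichever traversal you pick.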

Latency vs accuracy: caching, distillation, cascades · ML System Design · Eugene Yan
Interview questions to prep
1. What levers do you pull when accuracy is great but latency misses the budget?
2. Walk through where you'd add caching in a RAG + LLM pipeline to halve P99.
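One way to sketch two common cache layers in a RAG + LLM pipeline: an embedding cache keyed on the normalized query, and an answer cache keyed on the normalized query plus the retrieved document ids. `TTLLRUCache`, `normalize`, `answer`, and the injected `embed`/`retrieve`/`generate` callables are hypothetical names, not any framework's API, and the sizes and TTLs are placeholder values:

```python
import time
from collections import OrderedDict
from typing import Callable, Hashable, List


class TTLLRUCache:
    """Tiny LRU cache with per-entry expiry (illustrative, not thread-safe)."""

    def __init__(self, max_size: int = 1024, ttl_s: float = 300.0):
        self.max_size = max_size
        self.ttl_s = ttl_s
        self._data = OrderedDict()  # key -> (inserted_at, value)
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, key: Hashable, compute: Callable[[], object]):
        now = time.monotonic()
        entry = self._data.get(key)
        if entry is not None and now - entry[0] < self.ttl_s:
            self._data.move_to_end(key)  # refresh LRU position
            self.hits += 1
            return entry[1]
        self.misses += 1
        value = compute()
        self._data[key] = (now, value)
        self._data.move_to_end(key)
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict least recently used
        return value


def normalize(query: str) -> str:
    # Case/whitespace normalization so trivially different queries share a key.
    return " ".join(query.lower().split())


# Hypothetical TTLs: embeddings are stable, answers go stale as docs change.
embed_cache = TTLLRUCache(max_size=10_000, ttl_s=24 * 3600)
answer_cache = TTLLRUCache(max_size=1_000, ttl_s=300)


def answer(query: str,
           embed: Callable[[str], List[float]],
           retrieve: Callable[[List[float]], List[str]],
           generate: Callable[[str, tuple], str]) -> str:
    q = normalize(query)
    vec = embed_cache.get_or_compute(q, lambda: embed(q))
    # Retrieval is left uncached in this sketch.
    doc_ids = tuple(retrieve(vec))
    # Keying on doc ids too means a corpus change that alters retrieval misses.
    return answer_cache.get_or_compute((q, doc_ids), lambda: generate(q, doc_ids))
```

A hit on the answer cache skips the LLM call entirely, which is usually the dominant latency term. Whether this halves P99 depends on how many tail-latency requests become hits, so measure the hit rate on slow requests specifically, not just overall.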

Multi-model routing & cascades · ML System Design · Anyscale
Interview questions to prep
1. How would you design a cascade: cheap model first, expensive only when needed?
2. What's the right verifier for the cheap model's output — and when does it dominate cost?
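A minimal cascade sketch, assuming the cheap model, the verifier, and the expensive model are injected callables and that the verifier returns a score in [0, 1]. All names and the 0.8 threshold are hypothetical:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CascadeResult:
    answer: str
    used_expensive: bool


def cascade(query: str,
            cheap: Callable[[str], str],
            verifier: Callable[[str, str], float],
            expensive: Callable[[str], str],
            threshold: float = 0.8) -> CascadeResult:
    draft = cheap(query)
    score = verifier(query, draft)  # e.g. a calibrated confidence in [0, 1]
    if score >= threshold:
        return CascadeResult(draft, used_expensive=False)
    # Escalate: this request pays cheap + verify + expensive, in sequence.
    return CascadeResult(expensive(query), used_expensive=True)


def expected_cost(c_cheap: float, c_verify: float, c_expensive: float,
                  p_escalate: float) -> float:
    # Every request pays for the cheap model and the verifier;
    # only the escalated fraction also pays for the expensive model.
    return c_cheap + c_verify + p_escalate * c_expensive
```

With made-up unit costs, `expected_cost(1.0, 0.5, 20.0, 0.25)` is 6.5 versus 20.0 for always calling the expensive model. In general the cascade is cheaper only while `c_cheap + c_verify < (1 - p_escalate) * c_expensive`, so a verifier that is itself a large-model call can eat the savings. Latency follows the same shape: escalated requests pay all three stages, so the escalation rate feeds directly into P99.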

Cold-start strategies (recsys, search, ads) · ML System Design · Eugene Yan
Interview questions to prep
1. Walk through cold-start strategies for new users vs new items.
2. Compare bandit-based exploration vs content-based bridges for cold start — when does each fit?
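For new items, one common pattern combines both ideas: Thompson sampling over a Beta-Bernoulli click model, with each new item's prior seeded from a content-based CTR estimate. This sketch assumes binary click feedback; `prior_ctr` and `prior_strength` are hypothetical knobs, and in practice the prior estimate would come from similar items:

```python
import random
from dataclasses import dataclass
from typing import Dict


@dataclass
class Arm:
    alpha: float  # prior pseudo-clicks + observed clicks
    beta: float   # prior pseudo-non-clicks + observed non-clicks


def new_item_arm(prior_ctr: float = 0.05, prior_strength: float = 20.0) -> Arm:
    # Content-based bridge: seed the Beta prior from an estimated CTR.
    # prior_strength is how many pseudo-impressions that estimate is worth.
    return Arm(alpha=prior_ctr * prior_strength,
               beta=(1.0 - prior_ctr) * prior_strength)


def choose(arms: Dict[str, Arm], rng: random.Random) -> str:
    # Thompson sampling: draw one plausible CTR per item and show the argmax.
    # Wide posteriors (new items) sometimes draw high; that is the exploration.
    return max(arms, key=lambda k: rng.betavariate(arms[k].alpha, arms[k].beta))


def update(arm: Arm, clicked: bool) -> None:
    # Conjugate Beta-Bernoulli update from one impression.
    if clicked:
        arm.alpha += 1.0
    else:
        arm.beta += 1.0
```

Setting `prior_strength` near zero gives close to pure bandit exploration; setting it very high lets the content-based estimate dominate until the item has accumulated a comparable number of real impressions.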

Privacy-preserving ML: federated, DP, on-device · ML System Design · Google
Interview questions to prep
1. When would you reach for federated learning vs differential privacy vs on-device inference?
2. What's the accuracy cost of DP-SGD at typical ε values, and how do you decide if it's acceptable?
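The core DP-SGD update (as in Abadi et al., 2016) on plain Python lists: clip each per-example gradient to L2 norm `clip_norm`, sum, add Gaussian noise with standard deviation `noise_multiplier * clip_norm`, average, and step. This is a sketch of the mechanism only, not a usable trainer, and it does not compute ε:

```python
import math
import random
from typing import List

Vector = List[float]


def clip(g: Vector, clip_norm: float) -> Vector:
    # Scale the per-example gradient so its L2 norm is at most clip_norm.
    norm = math.sqrt(sum(x * x for x in g))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in g]


def dp_sgd_step(params: Vector, per_example_grads: List[Vector], lr: float,
                clip_norm: float, noise_multiplier: float,
                rng: random.Random) -> Vector:
    batch = len(per_example_grads)
    clipped = [clip(g, clip_norm) for g in per_example_grads]
    summed = [sum(col) for col in zip(*clipped)]
    # Noise is added once to the summed gradient, scaled to the clip bound.
    noisy = [s + rng.gauss(0.0, noise_multiplier * clip_norm) for s in summed]
    avg = [x / batch for x in noisy]
    return [p - lr * g for p, g in zip(params, avg)]
```

Mapping `noise_multiplier`, sampling rate, step count, and δ to an ε requires a privacy accountant (Opacus and TensorFlow Privacy both ship one). Clipping biases the gradient and the noise adds variance, so a practical way to judge whether the cost is acceptable is to train at the target ε and compare held-out metrics against a non-private baseline.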

References & further reading