Day 73 of 133
Decoding strategies + speculative decoding + JSON mode + DSA Greedy
Greedy, beam, top-k, top-p (nucleus), and temperature decoding; constrained generation.
DSA · NeetCode Greedy
- Jump Game II (DSA · Greedy)
Interview questions to prep
- Prove the greedy choice — why is the locally-optimal pick safe globally? (Exchange argument or staying-ahead.)
- When does greedy fail on a similar-looking problem, and what would you reach for instead (DP, BFS)?
- Walk through edge cases that often break naive greedy: ties, negatives, single element.
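A minimal greedy sketch for Jump Game II (the "BFS by levels" view): track the farthest index reachable, and take a jump only when you walk past the end of the current level. Function and variable names here are my own choices, not from any particular reference solution.

```python
def jump(nums):
    """Minimum number of jumps to reach the last index.

    Greedy invariant: `current_end` is the farthest index reachable
    with `jumps` jumps; crossing it forces one more jump, which
    extends reach to the farthest point seen so far.
    """
    jumps = 0
    current_end = 0   # end of the range reachable with `jumps` jumps
    farthest = 0      # farthest index reachable from any visited index
    for i in range(len(nums) - 1):
        farthest = max(farthest, i + nums[i])
        if i == current_end:      # must jump to move past this level
            jumps += 1
            current_end = farthest
    return jumps
```

The exchange argument: any optimal sequence of jumps can be rewritten, jump by jump, to always land as far right as possible without increasing the jump count.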
- Gas Station (DSA · Greedy)
Interview questions to prep
- Prove the greedy choice — why is the locally-optimal pick safe globally? (Exchange argument or staying-ahead.)
- When does greedy fail on a similar-looking problem, and what would you reach for instead (DP, BFS)?
- Walk through edge cases that often break naive greedy: ties, negatives, single element.
GenAI · Decoding strategies
Interview questions to prep
- Compare greedy, beam, top-k, and nucleus (top-p) decoding.
- Why is beam search usually a bad choice for open-ended generation?
- What does temperature actually do to the softmax distribution?
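To answer the temperature question concretely: temperature divides the logits before the softmax, so T < 1 sharpens the distribution toward the argmax and T > 1 flattens it toward uniform. A small self-contained sketch (NumPy, my own helper name):

```python
import numpy as np

def softmax_with_temperature(logits, temperature=1.0):
    """softmax(logits / T): T < 1 sharpens, T > 1 flattens."""
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()            # subtract max for numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum()

logits = [2.0, 1.0, 0.1]
print(softmax_with_temperature(logits, 0.5))  # peaked around the argmax
print(softmax_with_temperature(logits, 2.0))  # much flatter
```

Note that temperature never changes the ranking of tokens, only the relative probabilities; as T → 0 sampling degenerates to greedy decoding.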
GenAI · Speculative decoding
Interview questions to prep
- How does speculative decoding speed up inference without hurting quality?
- Why doesn't speculative decoding always help — what's the relationship between draft acceptance rate and speedup?
GenAI · JSON mode
Interview questions to prep
- How would you force an LLM to emit valid JSON — and what are the failure modes?
- Compare grammar-constrained decoding vs prompt-+-retry-+-validate — when does each fit?
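The prompt + validate + retry side of that comparison can be sketched in a few lines. Here `generate` is a hypothetical stand-in for any black-box completion call (signature prompt → str); grammar-constrained decoding avoids the retry loop entirely but requires logit-level access to the model.

```python
import json

def generate_json(generate, prompt, max_retries=3):
    """Call `generate`, validate the output as JSON, and retry with
    the parse error appended to the prompt. Trades extra API calls
    for working with any black-box text-completion endpoint."""
    last_error = None
    for _ in range(max_retries):
        text = generate(prompt)
        try:
            return json.loads(text)
        except json.JSONDecodeError as err:
            last_error = err
            prompt += (f"\nYour last output was invalid JSON ({err}). "
                       "Return only valid JSON, no prose.")
    raise ValueError(f"no valid JSON after {max_retries} attempts: {last_error}")
```

Failure modes worth naming: prose wrapped around the JSON, trailing commas, truncation at the token limit, and schema-valid-but-wrong content (which `json.loads` alone cannot catch; a schema validator is the next layer).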
Interview questions to prep
- Implement top-k and nucleus sampling inside an autoregressive generate() loop.
- How do temperature, top-k, and top-p interact when a model becomes repetitive or incoherent?
- What stopping conditions do you need for a chat model beyond max_new_tokens?
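A minimal sketch of the sampling step those questions ask about: temperature scaling, then top-k and nucleus (top-p) filtering, then a draw from the renormalized distribution. This is a single step of a generate() loop in NumPy, with my own function name and argument defaults; a real loop would also check EOS tokens and stop sequences, not just `max_new_tokens`.

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=0, top_p=1.0, rng=None):
    """One decoding step: softmax(logits / T), optionally keep only the
    top-k tokens and/or the nucleus (smallest set whose cumulative
    probability reaches top_p), renormalize, and sample an index."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=float) / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    if top_k > 0:  # keep only the k most probable tokens
        cutoff = np.sort(probs)[-top_k]
        probs = np.where(probs >= cutoff, probs, 0.0)
        probs /= probs.sum()

    if top_p < 1.0:  # nucleus: smallest prefix with mass >= top_p
        order = np.argsort(probs)[::-1]
        cum = np.cumsum(probs[order])
        keep = cum - probs[order] < top_p  # always keeps the top token
        mask = np.zeros_like(probs, dtype=bool)
        mask[order[keep]] = True
        probs = np.where(mask, probs, 0.0)
        probs /= probs.sum()

    return int(rng.choice(len(probs), p=probs))
```

Setting `top_k=1` (or a tiny `top_p`) collapses this to greedy decoding, which is a handy property to mention when asked how the three knobs interact: temperature reshapes the distribution, while top-k and top-p truncate its tail.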
References & further reading
- Hugging Face LLM course
- vLLM docs
- OpenAI platform docs