Day 73 of 133
Decoding strategies + speculative decoding + JSON mode + DSA Greedy
Greedy, beam, top-k, top-p (nucleus), and temperature decoding; constrained generation.
DSA · NeetCode Greedy
- Jump Game II (DSA · Greedy)
Interview questions to prep
- Prove the greedy choice — why is the locally-optimal pick safe globally? (Exchange argument or staying-ahead.)
- When does greedy fail on a similar-looking problem, and what would you reach for instead (DP, BFS)?
- Walk through edge cases that often break naive greedy: ties, negatives, single element.
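A minimal greedy sketch for Jump Game II (the "BFS by levels" view): track the farthest index reachable, and take a jump only when you walk past the end of the current level. Function and variable names here are my own choices, not from any particular reference solution.

```python
def jump(nums):
    """Minimum number of jumps to reach the last index.

    Greedy invariant: `current_end` is the farthest index reachable
    with `jumps` jumps; crossing it forces one more jump, which
    extends reach to the farthest point seen so far.
    """
    jumps = 0
    current_end = 0   # end of the range reachable with `jumps` jumps
    farthest = 0      # farthest index reachable from any visited index
    for i in range(len(nums) - 1):
        farthest = max(farthest, i + nums[i])
        if i == current_end:      # must jump to move past this level
            jumps += 1
            current_end = farthest
    return jumps
```

The exchange argument: any optimal sequence of jumps can be rewritten, jump by jump, to always land as far right as possible without increasing the jump count.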
- Gas Station (DSA · Greedy)
Interview questions to prep
- Prove the greedy choice — why is the locally-optimal pick safe globally? (Exchange argument or staying-ahead.)
- When does greedy fail on a similar-looking problem, and what would you reach for instead (DP, BFS)?
- Walk through edge cases that often break naive greedy: ties, negatives, single element.
GenAI · Decoding strategies
Interview questions to prep
- Compare greedy, beam, top-k, and nucleus (top-p) decoding.
- Why is beam search usually a bad choice for open-ended generation?
- What does temperature actually do to the softmax distribution?
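To answer the temperature question concretely: temperature divides the logits before the softmax, so T < 1 sharpens the distribution toward the argmax and T > 1 flattens it toward uniform. A small self-contained sketch (NumPy, my own helper name):

```python
import numpy as np

def softmax_with_temperature(logits, temperature=1.0):
    """softmax(logits / T): T < 1 sharpens, T > 1 flattens."""
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()            # subtract max for numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum()

logits = [2.0, 1.0, 0.1]
print(softmax_with_temperature(logits, 0.5))  # peaked around the argmax
print(softmax_with_temperature(logits, 2.0))  # much flatter
```

Note that temperature never changes the ranking of tokens, only the relative probabilities; as T → 0 sampling degenerates to greedy decoding.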
GenAI · Speculative decoding
Interview questions to prep
- How does speculative decoding speed up inference without hurting quality?
- Why doesn't speculative decoding always help — what's the relationship between draft acceptance rate and speedup?
GenAI · JSON mode
Interview questions to prep
- How would you force an LLM to emit valid JSON — and what are the failure modes?
- Compare grammar-constrained decoding vs prompt-+-retry-+-validate — when does each fit?
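The prompt + validate + retry side of that comparison can be sketched in a few lines. Here `generate` is a hypothetical stand-in for any black-box completion call (signature prompt → str); grammar-constrained decoding avoids the retry loop entirely but requires logit-level access to the model.

```python
import json

def generate_json(generate, prompt, max_retries=3):
    """Call `generate`, validate the output as JSON, and retry with
    the parse error appended to the prompt. Trades extra API calls
    for working with any black-box text-completion endpoint."""
    last_error = None
    for _ in range(max_retries):
        text = generate(prompt)
        try:
            return json.loads(text)
        except json.JSONDecodeError as err:
            last_error = err
            prompt += (f"\nYour last output was invalid JSON ({err}). "
                       "Return only valid JSON, no prose.")
    raise ValueError(f"no valid JSON after {max_retries} attempts: {last_error}")
```

Failure modes worth naming: prose wrapped around the JSON, trailing commas, truncation at the token limit, and schema-valid-but-wrong content (which `json.loads` alone cannot catch; a schema validator is the next layer).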
Interview questions to prep
- Implement top-k and nucleus sampling inside an autoregressive generate() loop.
- How do temperature, top-k, and top-p interact when a model becomes repetitive or incoherent?
- What stopping conditions do you need for a chat model beyond max_new_tokens?
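A minimal sketch of the sampling step those questions ask about: temperature scaling, then top-k and nucleus (top-p) filtering, then a draw from the renormalized distribution. This is a single step of a generate() loop in NumPy, with my own function name and argument defaults; a real loop would also check EOS tokens and stop sequences, not just `max_new_tokens`.

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=0, top_p=1.0, rng=None):
    """One decoding step: softmax(logits / T), optionally keep only the
    top-k tokens and/or the nucleus (smallest set whose cumulative
    probability reaches top_p), renormalize, and sample an index."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=float) / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    if top_k > 0:  # keep only the k most probable tokens
        cutoff = np.sort(probs)[-top_k]
        probs = np.where(probs >= cutoff, probs, 0.0)
        probs /= probs.sum()

    if top_p < 1.0:  # nucleus: smallest prefix with mass >= top_p
        order = np.argsort(probs)[::-1]
        cum = np.cumsum(probs[order])
        keep = cum - probs[order] < top_p  # always keeps the top token
        mask = np.zeros_like(probs, dtype=bool)
        mask[order[keep]] = True
        probs = np.where(mask, probs, 0.0)
        probs /= probs.sum()

    return int(rng.choice(len(probs), p=probs))
```

Setting `top_k=1` (or a tiny `top_p`) collapses this to greedy decoding, which is a handy property to mention when asked how the three knobs interact: temperature reshapes the distribution, while top-k and top-p truncate its tail.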
References & further reading
- Hugging Face LLM course
- vLLM docs
- OpenAI platform docs