Level 2 · 35 min
Decoding Strategies: Sampling, Beam Search, Speculative
Decoding turns a model's next-token probability distributions into text. Temperature, top-p, top-k, beam search, and speculative decoding each change some mix of diversity, determinism, speed, and the kinds of errors that show up in the output. The same model can behave very differently under different decoding settings.
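To make the sampling knobs concrete, here is a minimal sketch in pure Python of temperature scaling, top-k filtering, and top-p (nucleus) filtering over a single next-token distribution. The four-element `logits` list is made-up toy data, and the function names are illustrative rather than any library's API.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; use argmax for greedy decoding")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_filter(probs, k):
    """Keep the k most probable tokens, zero out the rest, renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = set(order[:k])
    filtered = [p if i in kept else 0.0 for i, p in enumerate(probs)]
    total = sum(filtered)
    return [p / total for p in filtered]

def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = set(), 0.0
    for i in order:
        kept.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    filtered = [q if i in kept else 0.0 for i, q in enumerate(probs)]
    total = sum(filtered)
    return [q / total for q in filtered]

def sample(probs, rng):
    """Draw one token index according to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Toy logits for a 4-token vocabulary (illustrative values only).
logits = [2.0, 1.0, 0.5, -1.0]
sharp = softmax(logits, temperature=0.5)  # more mass on the top token
flat = softmax(logits, temperature=2.0)   # mass spread more evenly

rng = random.Random(0)  # fixed seed so a run is repeatable
token = sample(top_p_filter(top_k_filter(softmax(logits), k=3), p=0.9), rng)
```

Temperature reshapes the whole distribution, while top-k and top-p only truncate its tail. The order in which the filters are applied is a design choice, and real inference stacks may differ.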
Mental model for decoding
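A decoding strategy is useful only when you can explain the abstraction and its failure boundary. Start by naming the inputs (the model's next-token probabilities and the decoding parameters), the outputs (a token sequence), the guarantees the strategy makes, and what it refuses to guarantee. Greedy decoding and beam search, for example, are deterministic, but neither guarantees the highest-probability sequence overall. That framing prevents cargo-cult use of a technique that happens to be popular.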
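One way to see a guarantee and its boundary is a tiny beam search. The sketch below assumes a hypothetical toy model, `TOY_MODEL`, that maps a prefix to next-token probabilities; the numbers are invented so that the greedy choice at step one leads to a worse full sequence.

```python
import math

# Hypothetical toy model: prefix (tuple of tokens) -> next-token probabilities.
TOY_MODEL = {
    (): {"A": 0.6, "B": 0.4},
    ("A",): {"x": 0.5, "y": 0.5},
    ("B",): {"x": 0.9, "y": 0.1},
}

def beam_search(model, steps, beam_width):
    """Return (sequence, log_prob) pairs, best first, after `steps` tokens."""
    beams = [((), 0.0)]
    for _ in range(steps):
        candidates = []
        for seq, score in beams:
            for token, prob in model[seq].items():
                # Log-probabilities add where raw probabilities would multiply.
                candidates.append((seq + (token,), score + math.log(prob)))
        # Keep only the highest-scoring partial sequences.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams

# beam_width=1 is greedy decoding: it commits to "A" and never revisits "B".
greedy_seq, greedy_score = beam_search(TOY_MODEL, steps=2, beam_width=1)[0]
beam_seq, beam_score = beam_search(TOY_MODEL, steps=2, beam_width=2)[0]

print(greedy_seq, round(math.exp(greedy_score), 2))
print(beam_seq, round(math.exp(beam_score), 2))
```

Here width 2 happens to find the best sequence because it covers every first token. With a real vocabulary, a finite beam prunes most prefixes, so the result is deterministic but still only approximately optimal.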
Production design questions
For a senior interview, connect the concept to reliability, latency, cost, security, and observability. Explain what you would measure, what assumption could break first, and how you would roll out a change safely.
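As one example of "what you would measure", speculative decoding trades extra draft-model work for lower latency, and its acceptance rate is a natural metric to track. The sketch below shows only the single-token accept-or-resample rule under made-up `TARGET` and `DRAFT` distributions; production systems draft several tokens and verify them in one pass of the target model, which this sketch does not attempt.

```python
import random

def speculative_step(target_probs, draft_probs, rng):
    """One-token speculative sampling step. Returns (token, accepted)."""
    vocab = list(range(len(target_probs)))
    # The cheap draft model proposes a token.
    proposal = rng.choices(vocab, weights=draft_probs, k=1)[0]
    p, q = target_probs[proposal], draft_probs[proposal]
    # Accept with probability min(1, p / q); q > 0 because the draft sampled it.
    if rng.random() < min(1.0, p / q):
        return proposal, True
    # Rejected: resample from the residual max(0, target - draft).
    # rng.choices normalizes the weights, so no explicit division is needed.
    residual = [max(0.0, t - d) for t, d in zip(target_probs, draft_probs)]
    return rng.choices(vocab, weights=residual, k=1)[0], False

def simulate(target_probs, draft_probs, trials, seed=0):
    """Estimate the output distribution and the acceptance rate empirically."""
    rng = random.Random(seed)
    counts = [0] * len(target_probs)
    accepted = 0
    for _ in range(trials):
        token, ok = speculative_step(target_probs, draft_probs, rng)
        counts[token] += 1
        accepted += ok
    return [c / trials for c in counts], accepted / trials

# Illustrative distributions over a 3-token vocabulary (not from a real model).
TARGET = [0.5, 0.3, 0.2]
DRAFT = [0.2, 0.5, 0.3]

empirical, acceptance_rate = simulate(TARGET, DRAFT, trials=20000)
print("empirical output distribution:", empirical)
print("acceptance rate:", acceptance_rate)
```

The empirical distribution should track `TARGET`, not `DRAFT`: the accept-or-resample rule is what keeps output quality tied to the target model. The acceptance rate falls as the draft diverges from the target, and that is the assumption most likely to break first when traffic shifts.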
Common failure mode
The common mistake is treating decoding as a black box. When the system fails, you need enough internal model to inspect inputs, intermediate state, and outputs without guessing.
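A small trace can open the black box. The sketch below assumes a hypothetical `toy_next_token_probs` function standing in for a real model, and records the chosen token, the runner-up candidates, and the probability margin at each greedy step.

```python
def toy_next_token_probs(prefix):
    """Hypothetical stand-in for a model: prefix tuple -> next-token probabilities."""
    table = {
        (): {"the": 0.55, "a": 0.40, "an": 0.05},
        ("the",): {"cat": 0.34, "dog": 0.33, "car": 0.33},
        ("the", "cat"): {"sat": 0.9, "ran": 0.1},
    }
    return table.get(prefix, {})  # empty dict means "stop"

def greedy_decode_with_trace(next_token_probs, max_steps, top_n=3):
    """Greedy decoding that records intermediate state at every step."""
    prefix, trace = [], []
    for step in range(max_steps):
        probs = next_token_probs(tuple(prefix))
        if not probs:
            break
        ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
        chosen, chosen_prob = ranked[0]
        runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
        trace.append({
            "step": step,
            "prefix": tuple(prefix),
            "chosen": chosen,
            "chosen_prob": chosen_prob,
            "alternatives": ranked[1:top_n],
            "margin": chosen_prob - runner_up,  # small margin = fragile choice
        })
        prefix.append(chosen)
    return prefix, trace

tokens, trace = greedy_decode_with_trace(toy_next_token_probs, max_steps=10)

# Steps where the winner barely beat the runner-up are the first place to look.
fragile_steps = [t["step"] for t in trace if t["margin"] < 0.05]
for entry in trace:
    print(entry)
```

A step with a small margin is one where a slightly different temperature or seed would have produced different text, so it is a reasonable first place to look when an output is surprising.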
Code example
Checklist:
1. Define the user-facing goal
2. State the system guarantee
3. Identify assumptions
4. Add measurement
5. Test the most likely failure mode