Top K

Top-K sampling restricts the choice of the next token to the k most probable candidates, then samples from that reduced set. A higher k admits more candidates and increases randomness; a lower k makes the output more deterministic.

How it Works

To illustrate Top-K sampling, consider generating the next word after "the quick brown fox" using a model trained on extensive English text. Suppose the model's predicted probabilities for the next word are as follows:

jumps: 0.4
runs: 0.3
walks: 0.2
eats: 0.05
sleeps: 0.05

Applying Top-K sampling with k=3 limits our choices to jumps, runs, and walks.

The probabilities of these three tokens are then renormalized so they sum to 1: jumps becomes 0.4/0.9 ≈ 0.44, runs becomes 0.3/0.9 ≈ 0.33, and walks becomes 0.2/0.9 ≈ 0.22. A random draw from this new distribution favors jumps while still allowing runs or walks; eats and sleeps can never be chosen.
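The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production decoder: the function name top_k_sample and the use of NumPy are assumptions for the example.

```python
import numpy as np

def top_k_sample(probs, k, rng=None):
    """Sample a token index with Top-K: keep the k most probable
    tokens, renormalize their probabilities, and draw from them."""
    rng = rng or np.random.default_rng()
    probs = np.asarray(probs, dtype=float)
    # Indices of the k highest-probability tokens.
    top_idx = np.argsort(probs)[-k:]
    top_probs = probs[top_idx]
    top_probs /= top_probs.sum()  # renormalize so the kept mass sums to 1
    return rng.choice(top_idx, p=top_probs)

tokens = ["jumps", "runs", "walks", "eats", "sleeps"]
probs = [0.4, 0.3, 0.2, 0.05, 0.05]

# With k=3, only jumps, runs, or walks can ever be drawn.
print(tokens[top_k_sample(probs, k=3)])
```

With k=3, repeated calls return jumps roughly 44% of the time, runs about 33%, and walks about 22%, matching the renormalized distribution; eats and sleeps are excluded entirely.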