nucleus sampling (top_p)
- finds the smallest set of most-probable tokens whose total probability is > p
- then renormalizes their probabilities and samples from these tokens only.
i.e.
- sort the tokens by probability in descending order
- take all tokens until prefix sum > p
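The two steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names `top_p_filter` and `sample_top_p` are made up for this note, and `probs` is assumed to be a list of probabilities that already sums to 1.

```python
import random


def top_p_filter(probs, p):
    """Return (token_index, renormalized_prob) pairs forming the nucleus."""
    # Sort token indices by probability, highest first.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, cumulative = [], 0.0
    for i in order:
        nucleus.append(i)
        cumulative += probs[i]
        if cumulative > p:  # stop as soon as the prefix sum exceeds p
            break
    # Renormalize so the kept probabilities sum to 1 again.
    return [(i, probs[i] / cumulative) for i in nucleus]


def sample_top_p(probs, p, rng=random):
    """Sample one token index from the nucleus (hypothetical helper)."""
    nucleus = top_p_filter(probs, p)
    indices = [i for i, _ in nucleus]
    weights = [w for _, w in nucleus]
    return rng.choices(indices, weights=weights, k=1)[0]


# Example: with p = 0.7 only the two most likely tokens are kept.
kept = top_p_filter([0.5, 0.3, 0.15, 0.05], p=0.7)
```

If the prefix sum never exceeds p (e.g. p = 1.0), the loop simply keeps every token, which reduces to ordinary sampling.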
[diagram: temperature-vs-nucleus-sampling.excalidraw, nucleus sampling part]
temperature sampling
[diagram: temperature-vs-nucleus-sampling.excalidraw, temperature sampling part]
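rescales the logits by a temperature T before the softmax. With T > 1 the distribution over tokens becomes more uniform (evened out) → thus more likely to pick tokens with low probability. With T < 1 it becomes sharper instead, concentrating on the most likely tokens.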
formula
$$p_i = \frac{\exp(z_i / T)}{\sum_{j=1}^{N} \exp(z_j / T)}$$
with:
- $p_i$ : the probability of the i-th token after rescaling using temperature
- $z_i$ : the logit of the i-th token
- $N$ : number of tokens in the vocab
- $T$ : the temperature
→ higher temp = more diverse examples (more creative)
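To make the effect of the temperature concrete, here is a small sketch of a softmax over temperature-scaled logits. The function name `softmax_with_temperature` and the example logits are invented for illustration; subtracting the maximum is a common numerical-stability trick and does not change the result.

```python
import math


def softmax_with_temperature(logits, T=1.0):
    """Convert logits to probabilities after dividing them by temperature T."""
    if T <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / T for z in logits]
    m = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # made-up logits for three tokens
sharp = softmax_with_temperature(logits, T=0.2)  # mass concentrates on token 0
base = softmax_with_temperature(logits, T=1.0)   # plain softmax
flat = softmax_with_temperature(logits, T=2.0)   # closer to uniform
```

Comparing `sharp`, `base`, and `flat` shows the top token's share shrinking as T grows, which is why a higher temperature yields more varied samples.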
choosing temperature for pass@k
- for small k, use low temp
- e.g. k = 1, temp = 0.2
- for bigger k, use higher temp ⇒ to get more diverse examples
- e.g. k = 100, temp = 0.8
temperature sampling vs nucleus sampling (top_p)
[diagram: temperature-vs-nucleus-sampling.excalidraw]