sampling

nucleus sampling (top_p)

sampling

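  • finds the smallest set of highest-probability tokens whose total probability reaches p (the "nucleus")
  • then renormalizes over these tokens and samples only from them.

i.e.

  • sort the tokens by probability in descending order
  • keep tokens until the prefix sum first reaches p, and discard the rest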
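
The two steps above can be sketched in plain Python. This is a minimal illustration, not a library implementation: the names `top_p_filter` and `sample_top_p` are hypothetical, the input is assumed to be an already-normalized list of probabilities, and the cutoff is assumed to be reached once the running sum is at least p.

```python
import random

def top_p_filter(probs, p):
    """Return (kept_indices, renormalized_probs) for the nucleus of `probs`.

    Assumes `probs` is a normalized distribution and 0 < p <= 1.
    """
    # Sort token indices by probability, highest first.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, total = [], 0.0
    for i in order:
        kept.append(i)
        total += probs[i]
        if total >= p:  # assumption: stop once the prefix sum reaches p
            break
    # Renormalize so the kept probabilities sum to 1.
    return kept, [probs[i] / total for i in kept]

def sample_top_p(probs, p, rng=random):
    """Draw one token index from the nucleus."""
    kept, renorm = top_p_filter(probs, p)
    return rng.choices(kept, weights=renorm, k=1)[0]

probs = [0.5, 0.3, 0.1, 0.05, 0.05]
kept, renorm = top_p_filter(probs, 0.75)
token = sample_top_p(probs, 0.75)
```

With p = 0.75 only the first two tokens survive here, so the three low-probability tokens can never be sampled, however many draws are made.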

Transclude of temperature-vs-nucleus-sampling.excalidraw#^arealqivecaghcyncm573z0dk

temperature sampling

sampling

Transclude of temperature-vs-nucleus-sampling.excalidraw#^areawetlobcrzbh03u6zwbjjq

idea

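rescales the logits before the softmax to reshape the token distribution. With T > 1 the distribution becomes more uniform (evened out), so tokens with low probability are more likely to be picked; with T < 1 it becomes sharper and concentrates on the most likely tokens.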

formula

$$q_i = \frac{\exp(z_i / T)}{\sum_{j=1}^{N} \exp(z_j / T)}$$

with:

  • $q_i$ : the probability of the i-th token after rescaling using temperature
  • $z_i$ : the logit of the i-th token
  • $N$ : number of tokens in the vocab
  • $T$ : the temperature

higher temp = more diverse examples (more creative)
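
To see the effect numerically, here is a small sketch of a softmax with temperature in plain Python. The function name `softmax_with_temperature` and the example logits are assumptions made for illustration; subtracting the maximum before exponentiating is a standard numerical-stability trick and does not change the result.

```python
import math

def softmax_with_temperature(logits, T):
    """Convert logits to probabilities after dividing each logit by T."""
    if T <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / T for z in logits]
    m = max(scaled)  # shift by the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]  # hypothetical logits for a 3-token vocab
sharp = softmax_with_temperature(logits, 0.5)  # T < 1: more peaked
base = softmax_with_temperature(logits, 1.0)   # T = 1: plain softmax
flat = softmax_with_temperature(logits, 2.0)   # T > 1: closer to uniform
```

Comparing the three lists, the top token's probability shrinks as T grows while the least likely token's probability rises, which is where the extra diversity comes from.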

choosing temperature for pass@k

  • for small k, use low temp
    • e.g. k = 1, temp = 0.2
  • for bigger k, use higher temp to get more diverse examples
    • e.g. k = 100, temp = 0.8

temperature sampling vs nucleus sampling (top_p)

temperature-vs-nucleus-sampling.excalidraw#^groupvdytww2lsxumo3b1m8ps_