sampling

[Figure: temperature vs. nucleus sampling (embedded Excalidraw drawing)]

idea

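rescales the logits before the softmax to adjust the probability distribution of the tokens. A temperature above 1 makes the distribution more uniform (evens it out), so tokens with low probability are more likely to be picked; a temperature below 1 sharpens it toward the most likely tokens, and a temperature of 1 leaves it unchanged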

formula

$$q_i = \frac{\exp(z_i / T)}{\sum_{j=1}^{N} \exp(z_j / T)}$$

with:

  • $q_i$ : the probability of the i-th token after rescaling using temperature
  • $z_i$ : the logit of the i-th token
  • $N$ : number of tokens in the vocab
  • $T$ : the temperature
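
A minimal sketch of the rescaling described above: divide each logit by the temperature, then apply a softmax. The function names and the example logits are made up for illustration; real inference libraries do this on tensors, but the arithmetic should be the same.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Softmax over logits divided by the temperature (hypothetical helper)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    # Subtract the max before exponentiating for numerical stability;
    # this shifts every term equally and does not change the result.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; higher means closer to uniform."""
    return -sum(p * math.log(p) for p in probs if p > 0)


logits = [2.0, 1.0, 0.0]  # made-up logits for a 3-token vocab
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs], round(entropy(probs), 3))
```

Running it should show the top token's probability shrinking and the entropy growing as the temperature increases, which is the "more uniform" effect in the idea above.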

higher temp = more diverse examples (more creative)

choosing temperature for pass@k

  • for small k, use low temp
    • e.g. k = 1, temp = 0.2
  • for bigger k, use higher temp to get more diverse examples
    • e.g. k = 100, temp = 0.8
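
For context, pass@k is the probability that at least one of k sampled solutions is correct. A sketch of the commonly used unbiased estimator, assuming n samples were generated per problem and c of them passed (the function name and the counts below are illustrative, not taken from these notes):

```python
from math import comb


def pass_at_k(n, c, k):
    """Estimate pass@k as 1 - C(n - c, k) / C(n, k).

    n: samples generated for one problem
    c: how many of those samples were correct
    k: budget of samples being scored (k <= n)
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k samples are incorrect,
    # so any draw of k samples must then contain a correct one.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Illustrative counts only: 10 samples, 3 of them correct.
print(pass_at_k(10, 3, 1))
print(pass_at_k(10, 3, 5))
```

With k = 1 only a single sample counts, so concentrating probability on the most likely answer (low temp) tends to help; with a large k one success among many samples is enough, so diversity (higher temp) tends to pay off.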