![[temperature-vs-nucleus-sampling.excalidraw#^areawetlobcrzbh03u6zwbjjq]]
idea
adjusts the probability distribution over the tokens: a higher temperature evens out (flattens) the distribution → more likely to pick low-probability tokens; a lower temperature sharpens it toward the most likely token
formula
$$p_i = \frac{\exp(z_i / T)}{\sum_{j=1}^{N} \exp(z_j / T)}$$
with:
- $p_i$: the probability of the i-th token after rescaling using temperature
- $z_i$: the logit of the i-th token
- $N$: the number of tokens in the vocab
- $T$: the temperature
→ higher temp = more diverse samples (more creative)
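a minimal sketch of the rescaling in NumPy (the function name and toy logits are mine, not from any particular library):

```python
import numpy as np

def temperature_softmax(logits, T=1.0):
    """Rescale logits by temperature T, then softmax: p_i = exp(z_i / T) / sum_j exp(z_j / T)."""
    z = np.asarray(logits, dtype=np.float64) / T
    z -= z.max()              # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# toy logits for a 4-token vocab (made-up numbers, just to show the effect)
logits = [4.0, 2.0, 1.0, 0.5]
print(temperature_softmax(logits, T=0.5))  # sharper: mass concentrates on the top token
print(temperature_softmax(logits, T=1.0))  # plain softmax
print(temperature_softmax(logits, T=2.0))  # flatter: low-probability tokens become more likely
```

as $T \to 0$ this approaches greedy (argmax) decoding; as $T$ grows it approaches uniform sampling over the vocab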
choosing temperature for pass@k
- for small k, use low temp
- e.g. k = 1, temp = 0.2
- for bigger k, use higher temp ⇒ to get more diverse samples (see the sketch below)
- e.g. k = 100, temp = 0.8
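pass@k isn't defined above; a toy sketch assuming the usual unbiased estimator $\text{pass@}k = 1 - \binom{n-c}{k}/\binom{n}{k}$ over $n$ samples per problem with $c$ passing — the per-problem counts below are made up purely to illustrate why a higher temperature can win at large k:

```python
import numpy as np

def pass_at_k(n, c, k):
    """Unbiased pass@k estimate from n samples of which c pass: 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# made-up per-problem pass counts out of n = 200 samples at two temperatures:
# low temp  -> near-deterministic: a problem is either almost always solved or never
# high temp -> more diverse: fewer passes on easy problems, occasional passes on hard ones
n = 200
counts = {0.2: [200, 198, 0, 0, 0], 0.8: [150, 120, 9, 5, 2]}
for k in (1, 100):
    for temp, cs in counts.items():
        score = np.mean([pass_at_k(n, c, k) for c in cs])
        print(f"temp={temp}: pass@{k} ≈ {score:.3f}")  # 0.2 wins at k=1, 0.8 wins at k=100
```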