nlp

source: In NLP, why do we use perplexity instead of the loss? - Quora

Entropy is a measure of information. Without going into details, entropy involves a logarithm which, in principle, can be in any base. If you calculate entropy using the natural logarithm (base e), you measure it in nats. Computer scientists like base 2 because it corresponds to bits, so you will often see base 2 when looking at the information theory literature.
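A minimal sketch of the base question, using an arbitrary made-up distribution: the same entropy comes out in nats with the natural log and in bits with log base 2, and the two differ only by a constant factor.

```python
import math

# Entropy of the same distribution measured in two different bases.
# Only the unit changes: natural log gives nats, log base 2 gives bits.
probs = [0.5, 0.25, 0.125, 0.125]  # arbitrary example distribution

entropy_nats = -sum(p * math.log(p) for p in probs)   # ~1.213 nats
entropy_bits = -sum(p * math.log2(p) for p in probs)  # 1.75 bits

# Switching base is just a constant rescaling: H_bits = H_nats / ln(2)
assert abs(entropy_bits - entropy_nats / math.log(2)) < 1e-12
```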

While entropy can be seen as a quantity of information, perplexity can be seen as the “number of choices” the random variable has.

The fact is that the answer is just: we prefer perplexity. It’s mathematically the same to “return” entropy or perplexity, just like I could tell you that the following sentences are equivalent (a short sketch after the list works out the numbers):

  • “This die has 6 faces”
  • “This die has 2.58 bits of entropy”
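Here is that equivalence worked out for the fair six-sided die: the entropy in bits, exponentiated back with base 2, gives the six faces.

```python
import math

# A fair six-sided die: entropy in bits, then perplexity = 2**entropy.
probs = [1 / 6] * 6

entropy_bits = -sum(p * math.log2(p) for p in probs)  # ~2.585 bits
perplexity = 2 ** entropy_bits                        # ~6.0 "effective choices"

print(f"{entropy_bits:.2f} bits of entropy == a perplexity of {perplexity:.1f}")
```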

Also, entropy uses logarithms, as we said. Perplexity, by exponentiating (2^H, or e^H if you used the natural log), brings it back to a linear scale, which we humans usually prefer.
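For the NLP case in the question, a minimal sketch of that conversion, assuming the usual setup where the reported loss is the mean per-token negative log-likelihood in nats (the loss value here is made up, purely for illustration):

```python
import math

# Cross-entropy loss in NLP is typically the mean negative log-likelihood
# per token, in nats, so perplexity is just exp(loss).
mean_nll_loss = 3.21                   # hypothetical loss value, nats per token
perplexity = math.exp(mean_nll_loss)   # ~24.8: as uncertain as a uniform
                                       # choice over ~25 tokens
print(perplexity)
```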