- Most of the computation in LLMs happens in the self-attention and MLP blocks.
- But the inputs to these blocks are normalized (LayerNorm/RMSNorm), so they have a fixed length (norm).
- That means the vector's length is a degree of freedom that is not used for representational capacity inside these blocks. So what is it used for? (See the sketch below.)
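To make the "fixed length" point concrete, here is a minimal sketch, assuming a standard pre-norm transformer where an RMSNorm sits in front of each block (the `rms_norm` function below is a simplified stand-in for the real layer, without the learned gain): no matter how large or small the residual-stream vector is, the block receives an input of fixed norm.

```python
import torch

def rms_norm(x, eps=1e-6):
    # Simplified RMSNorm: rescale x to unit RMS.
    # Real implementations also multiply by a learned per-channel gain.
    return x / torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + eps)

d_model = 16
v = torch.randn(d_model)

for scale in [0.1, 1.0, 10.0]:
    x = scale * v    # residual-stream vector with a different length each time
    y = rms_norm(x)  # what the attention / MLP block actually receives
    print(f"||x|| = {x.norm().item():7.2f}  ->  ||rms_norm(x)|| = {y.norm().item():.4f}")

# The normalized input always has norm ~sqrt(d_model) = 4.0 here,
# so the block cannot see the original vector's length.
```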
Hypothesis: It’s a filter mechanism. Suppose that an activation is a weighted sum of related features: