Theory / hypothesis / research question
- Suppose that the column vectors of a model weight matrix are nearly orthogonal (and nearly unit-norm?) vectors
- That would mean that if we apply a transformation such that one of those vectors becomes a standard basis vector, then all the other vectors will approximately align with the other standard basis vectors
- Then we would have a nearly sparse matrix: a few entries near (but not exactly) 1 and many entries near (but not exactly) 0
- Then the delta between this matrix and the exactly sparse (0/1) matrix would be a matrix of near-zero values
- What if the values in that delta follow a Gaussian? Then we would have decomposed a weight matrix into
- a lower-rank matrix
- a sparse matrix
- and a Gaussian modelling the near-zero residual
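The rotation-to-basis idea above can be checked numerically. A minimal sketch, assuming NumPy and using the nearest orthogonal matrix (computed via SVD) as the aligning transformation; the choice of transformation is an assumption here, not something the note specifies:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64

# Synthetic stand-in for model weights: start from a random orthogonal
# matrix and perturb it slightly, so the columns are nearly orthonormal.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
W = Q + 0.001 * rng.standard_normal((n, n))

# Nearest orthogonal matrix to W (polar decomposition via SVD).
U, s, Vt = np.linalg.svd(W)
O = U @ Vt

# Rotating by O^T sends each column close to a standard basis vector.
S = O.T @ W          # near-identity: ~1 on the diagonal, ~0 elsewhere
R = S - np.eye(n)    # the delta: a matrix of near-zero values

print(np.abs(np.diag(S) - 1).max())  # diagonal entries close to 1
print(np.abs(R).max())               # off-identity residual close to 0
```

Under this construction the "few nearly 1, many nearly 0" picture falls out directly; the open question is whether real trained weight matrices behave like the synthetic `W` here.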
- Possible applications:
- A more efficient way of storing model weights
- We could force one of the column vectors to be a standard basis vector before training, and then we would get a near-sparse weight matrix for free after training
- If that works, then instead of learning a dense weight matrix we can learn a sparse matrix and a Gaussian
- then we could look at this from a graphical-modelling perspective, or treat it like a VAE
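To get a rough feel for the storage application, here is a hypothetical sketch (again assuming NumPy and a synthetic near-orthogonal matrix standing in for trained weights): store the orthogonal factor, the rounded 0/1 sparse matrix, and only the mean and standard deviation of the Gaussian residual, dropping the residual entries themselves.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 64

# Hypothetical "trained" weights with nearly orthonormal columns.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
W = Q + 0.001 * rng.standard_normal((n, n))

# Decompose: orthogonal factor O, exact sparse part S0, residual R.
U, _, Vt = np.linalg.svd(W)
O = U @ Vt
S = O.T @ W
S0 = np.round(S)   # exactly-0/1 entries -> cheap sparse storage
R = S - S0         # near-zero residual

# Summarize the residual with two numbers instead of n*n floats.
mu, sigma = R.mean(), R.std()

# Lossy reconstruction: keep only O and S0, drop the residual.
W_hat = O @ S0
err = np.abs(W - W_hat).max()
print(np.count_nonzero(S0), err)  # n nonzeros; small max error
```

Under these assumptions `S0` has only `n` nonzero entries out of `n*n`, and the reconstruction error is bounded by the size of the residual; whether real weights admit this decomposition, and whether the residual is actually Gaussian, is exactly the research question of this note.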