Theory / hypothesis / research question

  • Suppose the column vectors of a model's weight matrix are nearly orthogonal (and nearly unit-norm?) vectors
  • That would mean that if we apply an orthogonal transformation mapping one of those vectors onto a standard basis vector, then all the other vectors should approximately align with the remaining standard basis vectors
  • The transformed matrix would then be nearly sparse: a few entries close to (but not exactly) 1, and many entries close to (but not exactly) 0
  • Taking the delta Δ between the transformed matrix and its exactly-sparse rounding would then give a matrix of near-0 values
  • What if the values in Δ follow a Gaussian? Then we would have decomposed a weight matrix into
    • a lower-rank matrix
    • a sparse matrix
    • and a Gaussian representing the near-0 residual Δ
  • Possible applications:
    • A more efficient way of storing model weights
    • We could constrain one of the column vectors to be a standard basis vector before training, and then we would get a near-sparse weight matrix for free after training
    • If that works, then instead of learning a dense weight matrix we could learn a sparse matrix and a Gaussian
      • then we could look at this from a graphical-modelling perspective, or like a VAE
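The hypothesized decomposition can be sketched numerically on synthetic weights. This is a minimal sketch, not a test on real model weights: it *assumes* the near-orthonormal-columns premise by constructing `W` as a random orthogonal matrix plus small Gaussian noise, uses the nearest orthogonal matrix (polar decomposition via SVD) as the aligning transformation, and the names `Q`, `S`, and `Delta` are introduced here for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64

# Hypothetical setup: a weight matrix whose columns are nearly orthonormal,
# modeled as a random orthogonal matrix plus small Gaussian perturbation.
Q0, _ = np.linalg.qr(rng.standard_normal((n, n)))
W = Q0 + 0.01 * rng.standard_normal((n, n))

# Align the columns with the standard basis: the nearest orthogonal matrix
# to W (polar factor, computed via SVD) plays the role of the transformation.
U, _, Vt = np.linalg.svd(W)
Q = U @ Vt

A = Q.T @ W       # near-identity: diagonal entries ~1, off-diagonal ~0
S = np.round(A)   # the exactly-sparse part (here it rounds to the identity)
Delta = A - S     # the residual: small, and empirically bell-shaped

print("max |Delta|:", np.abs(Delta).max())
print("Delta mean/std:", Delta.mean(), Delta.std())

# Lossy "storage" sketch: keep only Q, the sparse S, and the two Gaussian
# parameters, then reconstruct by sampling fresh noise in place of Delta.
mu, sigma = Delta.mean(), Delta.std()
W_hat = Q @ (S + rng.normal(mu, sigma, size=(n, n)))
print("reconstruction error per entry:", np.abs(W - W_hat).mean())
```

Note that this sketch still stores the aligning rotation `Q` densely; any real storage saving would need `Q` itself to be cheap, e.g. fixed before training as suggested above.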