Distributed

What

  • Matrix multiplication factorization
    • Split a matrix into smaller matrices and multiply each one separately.
  • Run each smaller multiplication on a different GPU, or run them on the same GPU as separate multiplications.
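
The splitting idea above can be sketched in a few lines. This is only an illustration, assuming the weight matrix is split by columns into two equal blocks; the notes do not say which axis is split, and the helper names (matmul, split_columns, concat_columns) and the example values are made up for the sketch. It uses plain Python lists and no GPU, so each block product simply stands in for work that could be placed on its own device.

```python
# Sketch: split the weights by columns, multiply each block separately,
# then glue the partial results back together.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def split_columns(m, parts):
    """Split matrix m into `parts` column blocks of equal width."""
    width = len(m[0]) // parts
    return [[row[i * width:(i + 1) * width] for row in m] for i in range(parts)]

def concat_columns(blocks):
    """Join column blocks side by side into one matrix."""
    return [sum(rows, []) for rows in zip(*blocks)]

A = [[1, 2], [3, 4]]              # activations, shape 2x2 (illustrative values)
B = [[1, 0, 2, 1], [0, 1, 1, 3]]  # weights, shape 2x4 (illustrative values)

B_blocks = split_columns(B, 2)                    # two 2x2 weight blocks
partials = [matmul(A, blk) for blk in B_blocks]   # each product is independent
result = concat_columns(partials)

# The blockwise result matches the ordinary product.
assert result == matmul(A, B)
```

Because the two block products do not depend on each other, they can run on two devices at once or one after the other on a single device.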

Hence

  • Reduces the memory each multiplication needs for both the weights and the activations.
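
A rough count makes the saving concrete. The sizes below are invented for illustration, and the sketch assumes a column split of the weights across two devices, which the notes do not specify. Under that assumption each device holds half of the weights and produces half of the output activations; the input activations would still be needed in full on each device.

```python
# Sketch: element counts for a full multiplication vs. one column shard.

def num_elements(rows, cols):
    """Number of entries in a rows x cols matrix."""
    return rows * cols

# Illustrative sizes (assumptions, not from the notes).
batch, d_in, d_out, n_devices = 8, 1024, 4096, 2

full_weights = num_elements(d_in, d_out)    # whole weight matrix
full_output = num_elements(batch, d_out)    # whole output activations

shard_weights = num_elements(d_in, d_out // n_devices)   # one column block
shard_output = num_elements(batch, d_out // n_devices)   # its output slice

print(f"weights: {full_weights} -> {shard_weights} per shard")
print(f"output activations: {full_output} -> {shard_output} per shard")
```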