idea: NLP

  • start with pairs of documents in 2 different languages (e.g. books)
  • force the embeddings of the model for the target language to be aligned with those of the model for the source language
    • using extra encoders to align the encoded embeddings ?
      • document embeddings must be calculated using the token embeddings
    • using a reward model ?
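The alignment idea above could be sketched minimally as follows — this is a hypothetical illustration, not a committed design: document embeddings are mean-pooled from token embeddings, and an alignment loss (here cosine distance, one plausible choice) penalizes divergence between the source-language and target-language document embeddings.

```python
import numpy as np

def document_embedding(token_embeddings: np.ndarray) -> np.ndarray:
    """Pool token embeddings of shape (num_tokens, dim) into one
    document vector; mean pooling is an assumption, not a requirement."""
    return token_embeddings.mean(axis=0)

def alignment_loss(src_doc: np.ndarray, tgt_doc: np.ndarray) -> float:
    """Cosine-distance alignment loss between the source- and
    target-language document embeddings: 0 when perfectly aligned,
    up to 2 when pointing in opposite directions."""
    cos = src_doc @ tgt_doc / (np.linalg.norm(src_doc) * np.linalg.norm(tgt_doc))
    return float(1.0 - cos)

# toy example: identical paired documents should give zero loss
src_tokens = np.array([[1.0, 0.0], [0.0, 1.0]])
tgt_tokens = np.array([[1.0, 0.0], [0.0, 1.0]])
print(alignment_loss(document_embedding(src_tokens),
                     document_embedding(tgt_tokens)))  # → 0.0
```

In practice the loss would be backpropagated through the extra encoders (or used as a reward signal) while the paired documents are fed to the two models.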