Remove the least semantically significant tokens first
The most semantically significant tokens (in sequence) are the ones that most effectively summarize the meaning of the input. E.g.:
a quick brown fox jumps over the lazy dog
_ quick brown fox jumps over ___ lazy dog
_ _____ brown fox jumps over ___ ____ dog
_ _____ _____ fox jumps ____ ___ ____ dog
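The masking order above can be sketched as follows. This is a toy illustration only: the significance scores are hand-assigned, whereas a real system would need a learned scorer or a proxy such as IDF or attention weights.

```python
def mask_progressively(tokens, scores, tokens_per_step, n_steps):
    """Mask the lowest-scoring tokens first, `tokens_per_step` at a time.

    Each masked token is replaced with underscores of the same length,
    so the sequence layout stays readable. Returns one string per step.
    """
    order = sorted(range(len(tokens)), key=lambda i: scores[i])
    masked = set()
    steps = []
    for step in range(n_steps):
        for i in order[step * tokens_per_step:(step + 1) * tokens_per_step]:
            masked.add(i)
        steps.append(" ".join("_" * len(t) if i in masked else t
                              for i, t in enumerate(tokens)))
    return steps

tokens = "a quick brown fox jumps over the lazy dog".split()
# Hypothetical significance scores (function words lowest, content nouns highest):
scores = [0.05, 0.2, 0.3, 0.9, 0.8, 0.32, 0.06, 0.22, 0.85]
for seq in mask_progressively(tokens, scores, tokens_per_step=2, n_steps=3):
    print(seq)
# → _ quick brown fox jumps over ___ lazy dog
# → _ _____ brown fox jumps over ___ ____ dog
# → _ _____ _____ fox jumps ____ ___ ____ dog
```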
Measure the KL divergence / optimal-transport (OT) distance between the probability distributions of newly generated tokens
denoising process
each step generates new tokens that don't change the semantics too much
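One way to enforce "doesn't change the semantics too much" is to bound the per-position KL divergence between the token distributions before and after a denoising step. A minimal sketch, assuming per-position categorical distributions over the same vocabulary and a hypothetical threshold `max_kl`:

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def step_is_semantically_stable(dists_before, dists_after, max_kl=0.1):
    """Accept a denoising step only if no position's token distribution
    moved by more than `max_kl` nats (the threshold is a made-up knob)."""
    return all(kl_divergence(p, q) <= max_kl
               for p, q in zip(dists_before, dists_after))

before = [[0.7, 0.2, 0.1]]
print(step_is_semantically_stable(before, [[0.65, 0.25, 0.1]]))  # small shift
print(step_is_semantically_stable(before, [[0.1, 0.2, 0.7]]))    # large shift
```

An OT distance (e.g. Sinkhorn over token embeddings) could replace `kl_divergence` if token-identity shifts should be weighted by semantic similarity.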
Restrict generation to positions next to existing tokens. How? question
Filter to only the positions that are adjacent to an existing token?
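The adjacency filter itself is cheap if positions are represented as a boolean mask over slots. A sketch (the slot representation is an assumption, not a committed design):

```python
def adjacent_positions(present):
    """Given a boolean mask of which slots currently hold a token,
    return the indices of empty slots that neighbor a filled slot."""
    n = len(present)
    return [i for i in range(n)
            if not present[i]
            and ((i > 0 and present[i - 1]) or (i + 1 < n and present[i + 1]))]

# slots:  _  fox  _  _  jumps
present = [False, True, False, False, True]
print(adjacent_positions(present))  # → [0, 2, 3]
```

The harder problem, noted below, is that this assumes a fixed number of slots, which conflicts with letting the sequence grow.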
Make the generate-able positions dynamic, e.g.
The current sequence: a quick brown fox
Generate-able positions: _ a _ quick _ brown _ fox _ → the sequence length is not fixed. How to implement this? question
Always add generate-able positions at the beginning and end of the sequence so that generation can expand at both ends
Again, the sequence length is not fixed → heavy engineering needed question
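One possible shape for the dynamic-slot idea above: interleave an explicit gap slot between every pair of tokens and at both ends, and re-expand the gaps whenever one is filled. This is only a data-structure sketch; it says nothing about how a fixed-shape model would attend over a growing sequence, which is where the heavy engineering lives.

```python
def with_gap_slots(tokens, gap="_"):
    """Interleave a generate-able gap slot between every pair of tokens
    and at both ends, so the sequence can grow anywhere."""
    out = [gap]
    for t in tokens:
        out += [t, gap]
    return out

def fill_gap(seq, gap_index, new_token, gap="_"):
    """Replace the gap at `gap_index` with `new_token`, surrounded by
    fresh gaps — so the sequence length grows by two per fill."""
    assert seq[gap_index] == gap
    return seq[:gap_index] + [gap, new_token, gap] + seq[gap_index + 1:]

seq = with_gap_slots("a quick brown fox".split())
print(seq)  # → ['_', 'a', '_', 'quick', '_', 'brown', '_', 'fox', '_']
seq = fill_gap(seq, 8, "jumps")
print(seq)  # → ['_', 'a', '_', 'quick', '_', 'brown', '_', 'fox', '_', 'jumps', '_']
```

In practice the variable length would probably be handled by padding to a maximum length and masking unused slots, which is the usual workaround for fixed-shape batches.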