idea from: Episode 6: Sam Altman - YouTube - 6:32
what
- For LLMs, we currently spend the same amount of compute on every token
- “a dumb one or figuring out some complicated math”
- “Do the Riemann hypothesis… is the same compute as saying The”
- Can we implement dynamic computation based on the complexity of the input? ?question
- Can it be generalized to all models, not just LLMs? ?question
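One way the dynamic-compute idea can be sketched (roughly the Mixture-of-Depths approach): a learned router scores each token, only the top-k tokens within a capacity budget go through the expensive block, and the rest skip it via the residual stream. A minimal numpy sketch, assuming random weights as stand-ins for trained parameters; `heavy_block`, `router_w`, and `capacity` are hypothetical names for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

d = 8          # hidden size
seq_len = 6    # number of tokens
capacity = 2   # only this many tokens get full compute at this layer

W = rng.normal(size=(d, d)) * 0.1       # stand-in block weights
router_w = rng.normal(size=d)           # stand-in router weights

def heavy_block(x):
    # Stand-in for an expensive transformer block, with a residual connection.
    return np.tanh(x @ W) + x

x = rng.normal(size=(seq_len, d))       # token hidden states

# Router scores every token; only the top-`capacity` tokens are processed.
scores = x @ router_w
chosen = np.argsort(scores)[-capacity:]

# Unchosen tokens pass through unchanged on the residual stream,
# so per-token compute now depends on the router, not the sequence length alone.
out = x.copy()
out[chosen] = heavy_block(x[chosen])
```

The key design choice is the fixed capacity: compute per layer is bounded and predictable, while *which* tokens get it varies with the input, matching the "dumb token vs. hard token" intuition above.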
related papers
- Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
- RHO-1: Not All Tokens Are What You Need