idea from: Episode 6: Sam Altman - YouTube - 6:32

what

  • For LLMs, we currently spend the same amount of compute on every token
    • “a dumb one or figuring out some complicated math”
    • Working on the Riemann hypothesis costs the same compute per token as saying “The”
  • Can we implement dynamic computation based on the complexity of the input? ?question
    • Can it be generalized to all models, not just LLMs? ?question
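The idea above can be sketched as a toy routing layer: a router scores each token, only the top-k “hardest” tokens pass through an expensive block, and the rest skip it via the residual path. This is a minimal numpy sketch, not any paper's implementation; the random-projection router and `expensive_block` are stand-ins for learned components.

```python
import numpy as np

def expensive_block(x):
    # Stand-in for a costly per-token transform (e.g. a transformer layer).
    return np.tanh(x) * 2.0

def dynamic_compute_layer(tokens, capacity=0.5, rng=None):
    """Route only the top-k highest-scoring tokens through the expensive
    block; the rest pass through unchanged (residual skip path).

    tokens: (seq_len, d_model) array.
    capacity: fraction of tokens allowed to use the expensive block.
    """
    seq_len, d_model = tokens.shape
    rng = rng or np.random.default_rng(0)
    # Hypothetical router: a fixed random projection scoring each token's
    # "difficulty"; in practice this would be learned.
    w_router = rng.standard_normal(d_model)
    scores = tokens @ w_router                 # one score per token
    k = max(1, int(capacity * seq_len))
    chosen = np.argsort(scores)[-k:]           # indices of top-k tokens
    out = tokens.copy()                        # skip path: identity
    out[chosen] = tokens[chosen] + expensive_block(tokens[chosen])
    return out, chosen

tokens = np.random.default_rng(1).standard_normal((8, 4))
out, routed = dynamic_compute_layer(tokens, capacity=0.25)
print(f"{len(routed)}/8 tokens used the expensive block")
```

With `capacity=0.25`, only 2 of the 8 tokens pay for the expensive block; the other 6 are passed through untouched, which is the per-token compute saving the note asks about.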

related papers

  • Mixture-of-Depths
  • Rho-1: Not All Tokens Are What You Need