• A base vocabulary that includes every possible character would be very large (Unicode defines over a million code points)
  • GPT-2 instead uses bytes as the base vocabulary, which fixes its size at 256: any text can be represented as a sequence of UTF-8 bytes
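A minimal sketch of the byte-level base vocabulary idea, assuming plain UTF-8 encoding. This only shows how any string maps onto ids in the range 0 to 255 and back. It is not GPT-2's actual tokenizer code (which additionally remaps bytes to printable characters and learns BPE merges on top of this base). The helper names `to_base_ids` and `from_base_ids` are hypothetical.

```python
BASE_VOCAB_SIZE = 256  # one id per possible byte value


def to_base_ids(text: str) -> list[int]:
    """Hypothetical helper: map a string to byte-level base ids via UTF-8."""
    return list(text.encode("utf-8"))


def from_base_ids(ids: list[int]) -> str:
    """Hypothetical helper: reassemble the bytes and decode them as UTF-8."""
    return bytes(ids).decode("utf-8")


# ASCII takes 1 byte per character, "é" takes 2, CJK characters take 3,
# and most emoji take 4, yet every id stays inside the 256-entry base vocab.
for s in ["hi", "é", "日本", "🙂"]:
    ids = to_base_ids(s)
    assert all(0 <= i < BASE_VOCAB_SIZE for i in ids)
    assert from_base_ids(ids) == s  # the mapping is lossless
    print(repr(s), ids)

print(to_base_ids("é"))  # [195, 169]
```

The trade-off is sequence length: rare characters cost several base tokens instead of one, which is part of why merges are learned on top of the 256-byte base.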