idea

  • algorithm that splits a blank document step by step into blocks that together form a valid layout
  • then train a model to predict the reading order of those blocks
  • the reading order is closely related to the splitting, so the split history could provide the reading-order labels for training
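
A minimal sketch of how the three points above could fit together, not a definitive implementation. It assumes axis-aligned binary cuts on a unit-square page, a left-to-right and top-to-bottom script, and a reading order equal to a depth-first walk over the split tree. All names (Block, Node, split, reading_order, min_size, stop_prob) are hypothetical and chosen only for illustration.

```python
import random
from dataclasses import dataclass, field
from typing import List


@dataclass
class Block:
    # Axis-aligned rectangle; y grows downward, as on a page.
    x0: float
    y0: float
    x1: float
    y1: float


@dataclass
class Node:
    # One step of the splitting: a box and the parts it was cut into.
    box: Block
    children: List["Node"] = field(default_factory=list)


def split(box: Block, depth: int, rng: random.Random,
          min_size: float = 0.15, stop_prob: float = 0.2) -> Node:
    """Recursively cut `box` in two until a stop condition is met."""
    can_cut_x = (box.x1 - box.x0) >= 2 * min_size  # room for left/right parts
    can_cut_y = (box.y1 - box.y0) >= 2 * min_size  # room for top/bottom parts
    if depth == 0 or not (can_cut_x or can_cut_y) or rng.random() < stop_prob:
        return Node(box)  # leaf: a final layout block

    axes = [a for a, ok in (("x", can_cut_x), ("y", can_cut_y)) if ok]
    if rng.choice(axes) == "x":
        cut = rng.uniform(box.x0 + min_size, box.x1 - min_size)
        # Children stored left first, then right.
        parts = [Block(box.x0, box.y0, cut, box.y1),
                 Block(cut, box.y0, box.x1, box.y1)]
    else:
        cut = rng.uniform(box.y0 + min_size, box.y1 - min_size)
        # Children stored top first, then bottom.
        parts = [Block(box.x0, box.y0, box.x1, cut),
                 Block(box.x0, cut, box.x1, box.y1)]
    children = [split(p, depth - 1, rng, min_size, stop_prob) for p in parts]
    return Node(box, children)


def reading_order(node: Node) -> List[Block]:
    """Leaves in depth-first order: the assumed reading order."""
    if not node.children:
        return [node.box]
    ordered: List[Block] = []
    for child in node.children:
        ordered.extend(reading_order(child))
    return ordered


page = Block(0.0, 0.0, 1.0, 1.0)
tree = split(page, depth=3, rng=random.Random(0))
for i, block in enumerate(reading_order(tree)):
    print(i, block)
```

Each (layout, order) pair produced this way could serve as one training example for the reading-order model. Real layouts are not always reducible to binary cuts, so this covers only the simplest case.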