  1. take the model
  2. generate several outputs for a given prompt
  3. ask humans what they think of the outputs
    • is an output good or bad?
    • is one output better than another?
    • how much better?
  4. train a separate model (a reward model) to predict the human score
  • goal: use the predicted score to get the original model to align better with human preferences
    • this is the technique people usually mean when they talk about alignment
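
The steps above can be sketched in a few lines of Python. This is a toy illustration only, under loud assumptions: each output is reduced to a hypothetical two-number feature vector (helpfulness, rudeness), the human feedback is a list of made-up "A is better than B" comparisons, and the model that predicts the score is a plain linear function trained with a pairwise logistic (Bradley-Terry style) loss. The last step is replaced by simple best-of-n reranking; real systems (usually known as reinforcement learning from human feedback, RLHF) instead update the model's weights with a reinforcement learning algorithm.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def reward(weights, features):
    """Hypothetical linear reward model: score = w . x."""
    return sum(w * x for w, x in zip(weights, features))


def train_reward_model(comparisons, dim, lr=0.1, epochs=200, seed=0):
    """Step 4: fit weights so that preferred outputs get higher scores.

    comparisons is a list of (preferred_features, rejected_features)
    pairs, i.e. human answers to "which output is better?".
    Minimizes loss = -log(sigmoid(r(preferred) - r(rejected))).
    """
    rng = random.Random(seed)
    weights = [0.0] * dim
    data = list(comparisons)
    for _ in range(epochs):
        rng.shuffle(data)
        for good, bad in data:
            margin = reward(weights, good) - reward(weights, bad)
            # Gradient of the loss w.r.t. the margin is -(1 - sigmoid(margin)),
            # so we step in the direction (good - bad), scaled by that factor.
            scale = 1.0 - sigmoid(margin)
            for i in range(dim):
                weights[i] += lr * scale * (good[i] - bad[i])
    return weights


def pick_best(weights, candidates):
    """Stand-in for the final step: keep the highest-scoring output."""
    return max(candidates, key=lambda feats: reward(weights, feats))


# Steps 1-3 (assumed, made-up data): features are [helpfulness, rudeness];
# in each pair, annotators preferred the first output over the second.
comparisons = [
    ([0.9, 0.1], [0.2, 0.8]),
    ([0.7, 0.0], [0.6, 0.9]),
    ([0.8, 0.2], [0.1, 0.3]),
]

weights = train_reward_model(comparisons, dim=2)
best = pick_best(weights, [[0.3, 0.7], [0.8, 0.1], [0.5, 0.5]])
```

On this toy data the learned weights reward helpfulness and penalize rudeness, so `pick_best` selects the helpful, polite candidate. Answers to "how much better?" could be used to weight each comparison, which this sketch leaves out.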