- take the model
- generate several outputs for a given prompt
- ask humans what they think of them
  - is an output good or bad?
  - is one output better than another?
  - if so, how much better?
- train a second model (a reward model) to predict the human score
- fine-tune the original model against that predicted score so its outputs align better with human preferences
- when people talk about alignment, this is often the technique they have in mind; it is commonly called reinforcement learning from human feedback (RLHF)
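
The steps above can be sketched in a few lines of Python. This is a toy illustration under several assumptions that are not in the notes: outputs are represented by hand-made two-number feature vectors (hypothetical "helpfulness" and "rudeness" scores), the score-predicting model is a plain linear function rather than a neural network, human feedback arrives as "A is better than B" pairs, and the model is trained with a pairwise logistic (Bradley-Terry style) loss, which is one common choice. The final step is also simplified: instead of updating the original model with reinforcement learning, the sketch just picks the candidate the trained scorer likes best (best-of-n selection).

```python
import math
import random


def reward(weights, features):
    """Linear reward model: score = w . x (a stand-in for a neural network)."""
    return sum(w * x for w, x in zip(weights, features))


def train_reward_model(preferences, dim, lr=0.1, epochs=200, seed=0):
    """Fit weights so that human-preferred outputs score higher.

    preferences: list of (preferred_features, rejected_features) pairs,
    each pair standing for one human judgment "A is better than B".
    """
    rng = random.Random(seed)
    weights = [0.0] * dim
    data = list(preferences)
    for _ in range(epochs):
        rng.shuffle(data)
        for better, worse in data:
            margin = reward(weights, better) - reward(weights, worse)
            # Probability the model currently assigns to the human's choice.
            p = 1.0 / (1.0 + math.exp(-margin))
            # Gradient descent step on the loss -log(p).
            for i in range(dim):
                weights[i] += lr * (1.0 - p) * (better[i] - worse[i])
    return weights


def pick_best(weights, candidates):
    """Best-of-n selection: return the name of the highest-scoring candidate.

    This is a simplification of the last step, not RL fine-tuning.
    """
    return max(candidates, key=lambda name: reward(weights, candidates[name]))


# Hypothetical features: [helpfulness, rudeness]. Made-up toy data.
PREFERENCES = [
    ([1.0, 0.0], [0.0, 1.0]),
    ([0.8, 0.1], [0.3, 0.9]),
    ([0.9, 0.2], [0.4, 0.2]),
]

weights = train_reward_model(PREFERENCES, dim=2)

candidates = {
    "polite answer": [0.9, 0.1],
    "rude answer": [0.5, 0.9],
}
best = pick_best(weights, candidates)
print(best)  # prints "polite answer"
```

In a real system the feature vectors would be replaced by the language model's own representation of the text, and the last step would typically update the model's parameters rather than filter its outputs.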