Post by @leading • Hey

Safety RLHF improved Llama 2's safety, according to the reward model score distributions.
