Tag: RLHF alignment challenges

The problem of flattery in large language models and what can be done about it