What is RLHF?
Alignment
RLHF (Reinforcement Learning from Human Feedback): A technique for aligning LLMs with human preferences. A reward model is first trained on human comparisons between candidate responses, and reinforcement learning then optimizes the LLM against that learned reward.
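The two stages in this definition can be illustrated with a small sketch. It assumes a Bradley-Terry style pairwise loss for the reward model and a KL-style penalty against a frozen reference model during the reinforcement learning stage, which are common choices but not the only ones. The function names, the scalar inputs, and the `beta` default are illustrative assumptions, not part of any particular library.

```python
import math


def pairwise_preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Reward-model loss for one human comparison (assumed Bradley-Terry form).

    Computes -log(sigmoid(reward_chosen - reward_rejected)), which is small
    when the reward model scores the human-preferred response higher.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch to avoid overflow in exp().
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def kl_penalized_reward(
    reward: float,
    logp_policy: float,
    logp_reference: float,
    beta: float = 0.1,  # hypothetical penalty strength
) -> float:
    """Reward signal for the RL stage (assumed form).

    Subtracts a penalty proportional to how much more likely the current
    policy makes the response than the frozen reference model does.
    """
    return reward - beta * (logp_policy - logp_reference)


if __name__ == "__main__":
    # The reward model agrees with the human label: low loss.
    print(pairwise_preference_loss(reward_chosen=2.0, reward_rejected=0.0))
    # The reward model disagrees with the human label: high loss.
    print(pairwise_preference_loss(reward_chosen=0.0, reward_rejected=2.0))
    # Reward after discounting drift away from the reference model.
    print(kl_penalized_reward(reward=1.0, logp_policy=-1.0, logp_reference=-2.0))
```

In a real system the rewards and log-probabilities would come from neural networks evaluated on batches of responses; plain floats are used here only to keep the arithmetic visible.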
FAQ
What is RLHF?
RLHF trains LLMs to follow human preferences by combining a reward model with reinforcement learning. It was part of the training process used to make ChatGPT more helpful and safer.
Is RLHF still used?
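Yes, but DPO (Direct Preference Optimization) is increasingly preferred for its simplicity: it trains directly on preference pairs without a separate reward model or reinforcement learning loop. RLHF remains important for understanding alignment.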
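To make the simplicity of DPO (Direct Preference Optimization) concrete, here is a minimal sketch of its per-example loss. It assumes you already have summed log-probabilities of the preferred and rejected responses under both the model being trained and a frozen reference model. The function name, argument names, and the `beta` default are illustrative assumptions.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # hypothetical temperature on the log-ratio gap
) -> float:
    """DPO loss for one preference pair.

    -log(sigmoid(beta * ((policy_chosen - ref_chosen)
                         - (policy_rejected - ref_rejected))))
    No reward model and no RL loop: only log-probabilities are needed.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(x)) == log(1 + exp(-x)); branch to avoid overflow in exp().
    if logits > 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))


if __name__ == "__main__":
    # Policy identical to the reference: loss equals log(2).
    print(dpo_loss(-5.0, -6.0, -5.0, -6.0))
    # Policy has shifted probability toward the preferred response: lower loss.
    print(dpo_loss(-4.0, -7.0, -5.0, -6.0))
```

Compared with the reward-model-plus-RL pipeline, this is a single supervised-style objective, which is the main reason it is often described as simpler to run.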