
But that's still false. RLHF is not instruction fine-tuning; it is alignment. GPT-3.5 was first fine-tuned (supervised, not RL) on an instruction dataset, and then aligned to human preferences using RLHF.
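For anyone unfamiliar with the distinction, here is a toy PyTorch sketch of the two stages. None of this is OpenAI's actual code; the model, reward model, and tensors are stand-ins. Stage 1 is plain supervised cross-entropy on instruction/response pairs, with no RL involved; stage 2 shows where a preference-trained reward model's signal enters via a policy-gradient step (real RLHF uses PPO with a KL penalty against the SFT policy, which this omits).

  import torch
  import torch.nn.functional as F

  vocab, hidden = 100, 32
  model = torch.nn.Linear(hidden, vocab)    # stand-in for an LM head
  opt = torch.optim.Adam(model.parameters(), lr=1e-3)

  # Stage 1: supervised instruction fine-tuning (SFT).
  # Cross-entropy against human-written response tokens -- no RL here.
  states = torch.randn(8, hidden)           # hypothetical hidden states
  targets = torch.randint(0, vocab, (8,))   # hypothetical response tokens
  loss = F.cross_entropy(model(states), targets)
  opt.zero_grad(); loss.backward(); opt.step()

  # Stage 2: RLHF-style alignment. The model samples its own outputs,
  # a reward model (trained on human preference comparisons) scores them,
  # and a policy-gradient step raises the odds of high-reward outputs.
  reward_model = torch.nn.Linear(vocab, 1)  # stand-in preference scorer
  logits = model(torch.randn(8, hidden))
  dist = torch.distributions.Categorical(logits=logits)
  actions = dist.sample()                   # the model's own sampled tokens
  rewards = reward_model(F.one_hot(actions, vocab).float()).squeeze(-1).detach()
  pg_loss = -(dist.log_prob(actions) * rewards).mean()
  opt.zero_grad(); pg_loss.backward(); opt.step()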


You're right, thanks for the correction.



