Hacker News
Synthetic RLHF with up to 66% success rate (twitter.com/tatsu_hashimoto)
2 points by reality_inspctr on May 23, 2023 | 1 comment


"We are releasing AlpacaFarm, a simulator enabling everyone to run and study the full RLHF pipeline at a fraction of the time (<24h) and cost (<$200) w/ LLM-simulated annotators. Starting w/ Alpaca, we show RLHF gives big 10+% winrate gains vs davinci003 (http://crfm.stanford.edu/2023/05/22/alpaca-farm.html)"



