Hacker News

Are there models that haven't been RLHF'd to the point of sycophancy that are good for this? I find that the models are so keen to affirm, they'll generally write a continuation where any plan the PCs propose works out somehow, no matter what it is.


Doesn't seem impossible to fix either way. You could add a preliminary step where a conventional algorithm randomly decides whether a proposal will work, with the probability depending on some variable, before handing the result off to the DM AI: "The player says they want to do this: <proposed course of action>. This will not work. Explain why."
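A minimal sketch of that idea: roll the outcome with a plain RNG first, then bake the predetermined result into the prompt so the model's job is only to narrate it, not to judge it. The function name, the `success_chance` parameter, and the prompt wording are all made up for illustration.

```python
import random

def adjudicate(proposal: str, success_chance: float) -> str:
    """Roll for success outside the model, then build a prompt that
    forces the DM model to narrate that predetermined outcome.

    Hypothetical sketch: the heuristic for success_chance (player
    stats, difficulty, etc.) is left out and assumed to exist.
    """
    succeeded = random.random() < success_chance
    if succeeded:
        instruction = "This will work. Narrate how it succeeds."
    else:
        instruction = "This will not work. Explain why it fails."
    return (
        f"The player says they want to do this: {proposal}\n"
        f"{instruction}"
    )

# Example: a long-shot plan with a 20% chance of working.
prompt = adjudicate("bribe the guard with a turnip", success_chance=0.2)
print(prompt)
```

The key design point is that the sycophancy-prone model never gets to decide success; it only receives a verdict to justify, which sidesteps its tendency to affirm any plan.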



