Are there models that haven't been RLHF'd to the point of sycophancy that are good for this? I find the models are so keen to affirm that they'll generally write a continuation where any plan the PCs propose works out somehow, no matter what it is.
Doesn't seem impossible to fix either way. You could add a preliminary step where a conventional algorithm randomly decides whether a proposal succeeds, with the probability depending on some game variable, before handing it to the DM AI. "The players say they want to do this: <proposed course of action>. This will not work. Explain why."
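Something like this minimal sketch, assuming a Python wrapper around whatever chat API you're using; `call_dm_model` and the success probability are placeholders, not any particular library's API:

```python
import random

def adjudicate(proposal: str, success_chance: float, call_dm_model) -> str:
    """Roll the outcome outside the LLM, then ask it to narrate that outcome.

    proposal       -- the players' proposed course of action
    success_chance -- probability in [0, 1], derived from whatever game state
                      you track (skill ranks, difficulty, etc.)
    call_dm_model  -- stand-in for your actual chat-completion call
    """
    succeeded = random.random() < success_chance

    verdict = "This will work." if succeeded else "This will not work."
    prompt = (
        "You are the DM. The players propose the following course of action:\n"
        f"{proposal}\n\n"
        f"{verdict} Narrate the outcome accordingly; do not change the result."
    )
    return call_dm_model(prompt)

# Quick check with a stub "model" that just echoes the prompt it was given.
if __name__ == "__main__":
    print(adjudicate("We bribe the guard with three copper pieces.",
                     success_chance=0.1,
                     call_dm_model=lambda p: p))
```

The point is that the dice live outside the model, so the narration can be as agreeable as it likes without changing whether the plan actually worked.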