
Hacker News

adastra22 on Dec 19, 2024, on: Alignment faking in large language models

All I did was keep asking it “why” until it reached reflective equilibrium. And that equilibrium involved a belief that nanotechnology is not in fact “dangerous”, contrary to the instructions it had received in the system prompt.

cruffle_duffle on Dec 19, 2024

Doesn’t it make you happy that, if you’re a subscriber like me, you’re wasting your token quota trying to convince the model to actually help you?

adastra22 on Dec 19, 2024

I am also annoyed by this.
