Aren’t they doing alignment? One way to do it is simply to omit problematic material from the training set. Another is to “penalize” the model when it does say something problematic, essentially teaching it that the output is undesirable.
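As a rough intuition for the “penalize” part, here is a minimal toy sketch in PyTorch: it pushes down the log-likelihood the model assigns to a completion that was flagged as undesirable. All names here are hypothetical, and real systems do something far more involved (a learned reward model plus RL fine-tuning, i.e. RLHF), but the core idea of a training signal against bad outputs is the same.

```python
import torch
import torch.nn.functional as F

def penalty_loss(logits, bad_token_ids, penalty_weight=1.0):
    """Toy negative-feedback loss (illustrative, not any lab's actual method).

    logits: (seq_len, vocab_size) scores for a completion flagged as bad.
    bad_token_ids: (seq_len,) the token ids the model actually produced.
    Minimizing this loss lowers the probability of reproducing that output.
    """
    log_probs = F.log_softmax(logits, dim=-1)
    # Log-likelihood of each flagged token at its position.
    token_ll = log_probs[torch.arange(len(bad_token_ids)), bad_token_ids]
    # Positive loss proportional to how likely the bad output was.
    return penalty_weight * token_ll.mean()
```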
Presumably they are also constructing the prompt to steer away from those things, and adding external filters on top of that. But I doubt that’s all they’re doing.
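The “external filters” layer is the simplest of these to picture: a check that sits outside the model entirely, on the prompt going in and the text coming out. The sketch below uses a keyword blocklist as a stand-in for what in production would be a learned moderation classifier; everything here is hypothetical illustration.

```python
# Toy stand-in for a real moderation classifier.
BLOCKLIST = {"build a bomb", "gray goo"}

def filtered_generate(model_generate, user_prompt):
    """Wrap any generate function with input- and output-side filters."""
    refusal = "Sorry, I can't help with that."
    # Input-side guard: refuse before the model even runs.
    if any(phrase in user_prompt.lower() for phrase in BLOCKLIST):
        return refusal
    reply = model_generate(user_prompt)
    # Output-side guard: check the completion as well.
    if any(phrase in reply.lower() for phrase in BLOCKLIST):
        return refusal
    return reply
```

The point of layering it this way is that the filter works regardless of how the model was trained, which is also why it’s shallow: it only catches what the classifier recognizes.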
They are doing alignment, but through a shallow and fragile strategy. If you build a very smart, very capable AI that will still turn the earth into gray goo when asked in a roundabout way, you have failed at alignment.