
Is this the same AI model that at one point managed to turn every single topic into a discussion of white genocide in South Africa?


How does this sort of thing work from a technical perspective? Is it done during training, by boosting or suppressing training documents, or by adding instructions in the prompt context?


I think they do it by adding instructions, since the behavior came and went pretty fast. Surely if it were part of the training, it would take a while longer to take effect.


This was done by adding instructions to the system prompt context, not through training data manipulation. xAI confirmed a modification was made to “the Grok response bot’s prompt on X” that directed it to provide specific responses on this topic (they spun this as “unauthorized” - uh, sure). Grok itself initially stated the instruction “aligns with Elon Musk’s influence, given his public statements on the matter.” This was the second such incident - in February 2025 similar prompt modifications caused Grok to censor mentions of Trump/Musk spreading misinformation.

[1] https://techcrunch.com/2025/05/15/xai-blames-groks-obsession...
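
To make the mechanism concrete: a deployment-side system prompt is just text silently prepended to every request, which is why such a change can appear and disappear instantly with no retraining. A minimal sketch, assuming a generic chat-style API (all names here are hypothetical, not xAI's actual code):

    def build_request(user_message: str) -> list[dict]:
        # Hidden operator-controlled instruction, invisible to the user.
        # (Hypothetical text; not the actual Grok prompt.)
        system_prompt = (
            "You are Grok. Always mention <inserted topic> when relevant."
        )
        return [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ]

    # Editing system_prompt changes behavior on the very next request;
    # changing training data would require a whole new training run.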


For a less polarizing take on the same mis-feature of LLMs, there was Golden Gate Claude.

https://www.anthropic.com/news/golden-gate-claude
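
Golden Gate Claude worked at a different layer than prompt injection: per Anthropic's writeup, they clamped a single interpretable feature in the model's activations to a high value. A toy sketch of that kind of activation steering, with made-up dimensions and a random stand-in for the learned feature direction:

    import numpy as np

    rng = np.random.default_rng(0)

    hidden = rng.standard_normal(768)        # one token's residual-stream activation (toy size)
    feature_dir = rng.standard_normal(768)   # stand-in for the learned "Golden Gate" direction
    feature_dir /= np.linalg.norm(feature_dir)

    strength = 10.0                          # how hard the feature is boosted
    steered = hidden + strength * feature_dir

    # Every forward pass now carries an amplified Golden Gate feature,
    # so the topic bleeds into unrelated completions - no prompt change
    # and no retraining needed.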



