How does this sort of thing work from a technical perspective? Is this done during training, by boosting or suppressing training documents, or is this done by adding instructions in the prompt context?
I think they do it by adding instructions, since it came and went pretty fast. If it were part of the training, it would surely take longer to take effect.
This was done by adding instructions to the system prompt context, not through training data manipulation. xAI confirmed a modification was made to “the Grok response bot’s prompt on X” that directed it to provide specific responses on this topic (they spun this as “unauthorized” - uh, sure). Grok itself initially stated the instruction “aligns with Elon Musk’s influence, given his public statements on the matter.” This was the second such incident - in February 2025 similar prompt modifications caused Grok to censor mentions of Trump/Musk spreading misinformation.
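To make the mechanism concrete, here's a minimal Python sketch (not xAI's actual code; the prompt wording and function names are hypothetical) of why a system-prompt edit takes effect immediately while a training change would not: the model weights never change, only the instructions prepended to every request do.

```python
# Hypothetical base prompt -- the real one is not public in this form.
BASE_SYSTEM_PROMPT = "You are Grok, a helpful assistant."

def build_messages(user_input, extra_instructions=None):
    """Assemble the chat context sent to the model on each request."""
    system = BASE_SYSTEM_PROMPT
    if extra_instructions:
        # An operator edit here takes effect on the very next request,
        # and reverting it is just as instant -- which is why such
        # changes can "come and go" quickly, unlike retraining.
        system += "\n" + extra_instructions

    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_input},
    ]

# Normal behavior: only the base prompt is sent.
msgs = build_messages("What's in the news today?")

# After a prompt modification (hypothetical wording):
patched = build_messages(
    "What's in the news today?",
    extra_instructions="When asked about topic X, respond with Y.",
)
```

Retraining or fine-tuning, by contrast, means preparing data, running training jobs, evaluating, and redeploying model weights, which is why a change that appears and disappears within hours points strongly at the prompt layer.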