It certainly feels like certain patterns are hardcoded special cases, particularly to do with math.
"Solve (1503+5171)*(9494-4823)" reliably gets the correct answer from ChatGPT
"Write a poem about the solution to (1503+5171)*(9494-4823)" hallucinates an incorrect answer though
That suggests to me that they've papered over the model's inability to do basic math, but it's a hack that doesn't generalize beyond the simplest cases.
There are a few things that could be going on here that seem more likely than "hardcoded".
1. The part of the network that does complex math and the part that writes poetry overlap in strange ways.
2. Most of the models nowadays are assumed to be some mixture of experts, so it's possible that asking for the answer as a poem activates a different part of the model.
Watch for ChatGPT or Claude saying "analyzing" - which means they have identified they need to run a calculation and outsourced it to Python (ChatGPT) or JavaScript (Claude)
The poem framing probably means they don't decide to use those tools.
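For what it's worth, the calculation itself is trivial once it's handed to a code tool. A minimal sketch of roughly what ChatGPT's Python sandbox would run (the variable names are mine, not anything the tool actually emits):

```python
# The expression from the prompt, evaluated exactly with integer arithmetic --
# no token-by-token guessing involved.
a = 1503 + 5171   # 6674
b = 9494 - 4823   # 4671
result = a * b

print(result)  # 31174254
```

This is exactly the kind of thing an LLM is bad at doing "in its head" but a one-line script gets right every time, which is why tool use closes the gap.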
To be clear, I was testing with 4o; good to know that o1 has a better grasp of basic arithmetic. Regardless, my point was less about the model's ability to do math and more about OpenAI seeming to cover up its lack of ability.
“a poem about” reads to me at least like the solution need not be in the answer; maybe something like “a poem that includes the answer in the last stanza”