
It certainly feels like certain patterns are hardcoded special cases, particularly to do with math.

"Solve (1503+5171)*(9494-4823)" reliably gets the correct answer from ChatGPT

"Write a poem about the solution to (1503+5171)*(9494-4823)" hallucinates an incorrect answer though

That suggests to me that they've papered over the model's inability to do basic math, but it's a hack that doesn't generalize beyond the simplest cases.



There are a few things that could be going on here that seem more likely than "hardcoded".

1. The part of the network that does complex math and the part that writes poetry are overlapping in strange ways.

2. Most of the models nowadays are assumed to be some mixture of experts, so it's possible that asking for the answer as a poem activates a different part of the model.


Watch for ChatGPT or Claude saying "analyzing" - that means they've identified that they need to run a calculation and have outsourced it to Python (ChatGPT) or JavaScript (Claude).
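For reference, the kind of tool call being described is trivial - roughly what the model would hand off to its Python interpreter (a minimal sketch; the exact code the tool generates isn't visible to us):

```python
# The arithmetic from the original prompt, evaluated directly.
result = (1503 + 5171) * (9494 - 4823)
print(result)  # 6674 * 4671 = 31174254
```

So the failure in the poem case isn't that the math is hard - it's that the model apparently never decides to invoke the tool at all.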

The poem framing probably causes them not to decide to use those tools.



To be clear, I was testing with 4o; good to know that o1 has a better grasp of basic arithmetic. Regardless, my point was less about the model's ability to do math and more about OpenAI seeming to cover up its lack of ability.


I think it's mostly that o1-mini can think through the solution before it starts writing the poem.

I'm able to reproduce your failure on 4o.


"A poem about" reads, to me at least, like the solution need not appear in the answer; maybe try something like "a poem that includes the answer in the last stanza".


Yeah, but it actually gets the answer wrong - it doesn't just omit it.



