I dunno, these reasoning models seem kinda "dumb" because they try to bootstrap themselves via reasoning, even when a simple direct answer might not exist (for example, when key information is missing for a proper answer).
Ask something like: "Ravioli: x = y: France, what could be x and y?" (it thought for 500s and the answers were "weird")
Or "Order from left to right these items ..." and give partial information on their relative position, eg Laptop is on the left of the cup and the cup is between the phone and the notebook. (Didn't have enough patience nor time to wait the thinking procedure for this)
IME all "reasoning" models do is confuse themselves, because the underlying problem of hallucination hasn't been solved. So if the model produces 10K tokens of "reasoning" junk, the context is poisoned, and any further interaction will lead to more junk.
I've had much better results from non-"reasoning" models by judging their output, doing the actual reasoning myself, and then feeding new ideas back to them to steer the conversation. This too can go astray, since most LLMs tend to agree with whatever the human says, so it hinges on me actually being right.
Ask something like: "Ravioli: x = y: France, what could be x and y?" (it thought for 500s and the answers were "weird")
Or "Order from left to right these items ..." and give partial information on their relative position, eg Laptop is on the left of the cup and the cup is between the phone and the notebook. (Didn't have enough patience nor time to wait the thinking procedure for this)