LLMs spend a fixed amount of compute on each token they output, in a single feedforward pass. There's no recursion in the network other than feeding the tokens it has already produced back in as context for the next prediction. So it's not really time pressure in the way you might experience it, but it does mean the available compute is sometimes not enough for the next token (and sometimes excessive). Thinking modes try to address this by essentially letting the LLM 'talk to itself' before sending anything to the user.
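
To make that concrete, here's a rough sketch (plain Python, with a hypothetical `step` callable standing in for any autoregressive model's single forward pass) of how a thinking mode wraps the same fixed-compute-per-token loop: the model first emits hidden reasoning tokens, then the visible answer is predicted with that reasoning already in context. The `<think>`/`</think>` markers and function names are made up for illustration, not any particular model's actual format.

```python
from typing import Callable, List

Token = str

def generate(step: Callable[[List[Token]], Token],
             context: List[Token],
             stop: Token) -> List[Token]:
    """Hypothetical autoregressive loop: `step` does one fixed-size forward
    pass and returns exactly one token; the only feedback is appending that
    token to the context before the next pass."""
    out: List[Token] = []
    while True:
        tok = step(context + out)   # same amount of compute on every call
        if tok == stop:
            return out
        out.append(tok)

def answer_with_thinking(step: Callable[[List[Token]], Token],
                         prompt: List[Token]) -> List[Token]:
    # Phase 1: hidden scratchpad -- the model 'talks to itself'; each thinking
    # token buys one more full forward pass before anything reaches the user.
    thoughts = generate(step, prompt + ["<think>"], stop="</think>")
    # Phase 2: the user-visible answer is generated with the hidden reasoning
    # already in context, so hard tokens get more total compute behind them.
    return generate(step, prompt + ["<think>", *thoughts, "</think>"], stop="<eos>")
```

Note that the per-token compute never changes; the thinking phase just spends more tokens (and so more forward passes) on the parts of the problem that need it, before committing to an answer.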