
> To my understanding this is managed by the temperature

This is true, but the sampling strategy also plays a fairly large role. The model produces a probability distribution over the next token; temperature rescales the logits before the softmax (sharpening or flattening the distribution), and sampling techniques (top-k, top-p/nucleus, beam search, others) then restrict or reweight which tokens can actually be picked. A rough sketch of that pipeline is below.
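
Here's a minimal sketch in Python/NumPy of how temperature and top-k / top-p filtering fit together. The function name and structure are my own for illustration, not any particular library's API:

    import numpy as np

    def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None):
        """Sketch: temperature scaling plus top-k / top-p (nucleus) filtering."""
        # Temperature rescales the logits: <1 sharpens the distribution
        # toward the argmax, >1 flattens it toward uniform.
        logits = np.asarray(logits, dtype=np.float64) / max(temperature, 1e-8)

        # Softmax to get a probability distribution over the vocabulary.
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()

        # Top-k: zero out everything but the k most likely tokens.
        if top_k is not None:
            k = min(top_k, len(probs))
            cutoff = np.sort(probs)[-k]
            probs = np.where(probs >= cutoff, probs, 0.0)
            probs /= probs.sum()

        # Top-p (nucleus): keep the smallest set of tokens whose
        # cumulative probability reaches p, zero out the rest.
        if top_p is not None:
            order = np.argsort(probs)[::-1]
            cumulative = np.cumsum(probs[order])
            keep = order[:np.searchsorted(cumulative, top_p) + 1]
            mask = np.zeros_like(probs)
            mask[keep] = probs[keep]
            probs = mask

        probs /= probs.sum()  # renormalize after filtering
        return np.random.choice(len(probs), p=probs)

The key point: temperature alone only reshapes the distribution; the filtering steps change which tokens are even eligible, which is why two deployments with the same temperature can still behave quite differently.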

> I wasn't under the impression that it was to give the user a feeling of "realism", but rather that it produced better results with a slightly random prediction.

My understanding is that it's a bit of both. If the AI responded exactly the same way to every "hi can you help me" prompt, I think users would call it more robotic. Slightly varying the token prediction also helps prevent repetitive, degenerate text.
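The determinism point is easy to see with a toy example (continuing the NumPy sketch above):

    import numpy as np

    probs = np.array([0.5, 0.3, 0.2])

    # Greedy decoding: same token for the same prompt every run, which is
    # why long greedy generations tend to loop and repeat themselves.
    print(np.argmax(probs))                       # always 0

    # Sampled decoding: varies run to run, in proportion to the probabilities.
    print(np.random.choice(len(probs), p=probs))  # usually 0, sometimes 1 or 2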


