Hacker News

> I think it’s effectively built in to the design.

It isn't. There is no guarantee that successive tokens will be comprehensible.

> Usually it outputs “logits”, which become a probability distribution when combined with a “temperature” parameter.

The logits are the probability distribution (well, technically the logits are unnormalized scores; you'd apply softmax to turn them into one). Temperature is a parameter for how you sample from those logits in a non-greedy fashion.



> Temperature is a parameter for how you sample those logits in a non-greedy fashion.

I think temperature is better understood as a pre-softmax pass over logits. You'd divide logits by the temp, and then their softmax becomes more/less peaky.

    probs = (logits / temp).softmax(dim=-1)
Sampling is a whole different thing.
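A minimal NumPy sketch of the point above (the `softmax` helper and variable names are mine, not from any particular library): dividing the logits by a temperature below 1 sharpens the resulting distribution, while a temperature above 1 flattens it toward uniform.

```python
import numpy as np

def softmax(x):
    # Subtract the max for numerical stability before exponentiating.
    z = x - np.max(x)
    e = np.exp(z)
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.5])

base = softmax(logits)          # temperature = 1
sharp = softmax(logits / 0.5)   # low temp -> peakier distribution
flat = softmax(logits / 2.0)    # high temp -> flatter distribution
```

Here `sharp` puts more mass on the largest logit than `base` does, and `flat` puts less; all three still sum to 1.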


Sure, my comment about softmax was simply about the probability distribution. But temperature is still part of sampling. If you’re greedy decoding, temperature doesn’t matter.
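A quick sketch of why temperature is irrelevant under greedy decoding (variable names are mine): dividing by any positive temperature is a monotonic transformation, so the argmax of the logits never changes.

```python
import numpy as np

rng = np.random.default_rng(0)
logits = rng.normal(size=8)

greedy_choice = np.argmax(logits)

# Scaling by any positive temperature preserves the ordering of the
# logits, so the greedy (argmax) token is the same at every temperature.
for temp in (0.1, 0.7, 1.0, 10.0):
    assert np.argmax(logits / temp) == greedy_choice
```

Temperature only matters once you actually sample from the softmax distribution, where it reshapes the probabilities you draw from.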



