Hacker News

> I think it’s effectively built in to the design.

It isn't. There is no guarantee that successive tokens will be comprehensible.

> Usually it outputs “logits”, which become a probability distribution when combined with a “temperature” parameter.

The logits are the probability distribution (well, technically the logits are unnormalized scores; you'd apply softmax to turn them into one). Temperature is a parameter for how you sample from those logits in a non-greedy fashion.



> Temperature is a parameter for how you sample those logits in a non-greedy fashion.

I think temperature is better understood as a pre-softmax pass over logits. You'd divide logits by the temp, and then their softmax becomes more/less peaky.

    probs = (logits / temp).softmax(dim=-1)
Sampling is a whole different thing.
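A minimal NumPy sketch of the point above (the `softmax` helper and variable names are mine, not from any particular library): dividing the logits by a temperature below 1 sharpens the resulting distribution, while a temperature above 1 flattens it toward uniform.

```python
import numpy as np

def softmax(x):
    # Subtract the max for numerical stability before exponentiating.
    z = x - np.max(x)
    e = np.exp(z)
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.5])

base = softmax(logits)          # temperature = 1
sharp = softmax(logits / 0.5)   # low temp -> peakier distribution
flat = softmax(logits / 2.0)    # high temp -> flatter distribution
```

Here `sharp` puts more mass on the largest logit than `base` does, and `flat` puts less; all three still sum to 1.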


Sure, my comment about softmax was simply about the probability distribution. But temperature is still part of sampling. If you’re greedy decoding, temperature doesn’t matter.
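A quick sketch of why temperature is irrelevant under greedy decoding (variable names are mine): dividing by any positive temperature is a monotonic transformation, so the argmax of the logits never changes.

```python
import numpy as np

rng = np.random.default_rng(0)
logits = rng.normal(size=8)

greedy_choice = np.argmax(logits)

# Scaling by any positive temperature preserves the ordering of the
# logits, so the greedy (argmax) token is the same at every temperature.
for temp in (0.1, 0.7, 1.0, 10.0):
    assert np.argmax(logits / temp) == greedy_choice
```

Temperature only matters once you actually sample from the softmax distribution, where it reshapes the probabilities you draw from.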



