This is intriguing but I don't quite follow - really naive, but:

isn't the final token at some position N?

And given a context size limit Y, when we generate the next token, right now it gets attention over positions N - Y to N?

And this proposal instead gives it attention over positions 0 to N, but with the attention weight decreasing exponentially as we approach token 0?
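If I've understood the two regimes right, they can be sketched like this (a numpy-only illustration of my reading, not the actual method; `window` and `rate` are made-up parameters, and the decay shown is an ALiBi-style linear score penalty, which becomes exponential decay after softmax):

```python
import numpy as np

def sliding_window_mask(n, window):
    """Hard cutoff: token i attends only to positions [i - window, i]."""
    mask = np.zeros((n, n), dtype=bool)
    for i in range(n):
        mask[i, max(0, i - window):i + 1] = True
    return mask

def decayed_attention(scores, rate):
    """Soft decay: subtract rate * distance from the raw causal scores,
    so after softmax the weight on a token at distance d shrinks
    roughly like exp(-rate * d) -- but never reaches exactly zero."""
    n = scores.shape[0]
    dist = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
    causal = np.tri(n, dtype=bool)  # each token sees only itself and the past
    biased = np.where(causal, scores - rate * dist, -np.inf)
    e = np.exp(biased - biased.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# With window=2, the last of 6 tokens sees only positions 3..5;
# with decay, it sees all the way back to token 0, just ever more faintly.
m = sliding_window_mask(6, 2)
w = decayed_attention(np.zeros((6, 6)), rate=1.0)
```

So the difference as I read it: the window is a hard boundary (token 0 gets exactly zero attention once it falls outside), while the decay keeps every past token reachable with exponentially diminishing weight.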