Isn't the final token at some position N?
And given a context size limit Y, when we generate the next token, does the model currently attend only to positions N − Y through N?
Or does attention span the full range from 0 to N, with the attention weight decaying exponentially as we approach token 0?
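To make the two patterns in these questions concrete, here is a minimal sketch of the attention masks they describe: plain causal attention, where position N may attend to every position 0 through N (there is no built-in exponential decay; any falloff in attention weights is learned, not architectural), versus sliding-window attention, where position N attends only to the last Y positions. The `causal_mask` helper is hypothetical, for illustration only, and not taken from any specific model.

```python
import numpy as np

def causal_mask(n, window=None):
    """Boolean mask: mask[i, j] is True iff query position i may attend to key j.

    Plain causal attention lets position i see positions 0..i; a sliding
    window of size Y restricts that to max(0, i - Y + 1)..i.
    """
    i = np.arange(n)[:, None]  # query positions, as a column
    j = np.arange(n)[None, :]  # key positions, as a row
    mask = j <= i              # causal: never attend to future tokens
    if window is not None:
        mask &= j > i - window  # sliding window: only the last `window` positions
    return mask

# With N = 5 and window Y = 3, the final token attends to positions 3, 4, 5:
m = causal_mask(6, window=3)
print(np.nonzero(m[5])[0])  # -> [3 4 5]

# Without a window, the final token attends to every position 0..5:
full = causal_mask(6)
print(np.nonzero(full[5])[0])  # -> [0 1 2 3 4 5]
```

So under a hard window limit Y the earliest tokens are simply invisible to the model, rather than attended to with exponentially small weight.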