
Not disagreeing with your comment in general, but this particular sentence annoys me a bit:

> where tokens are roughly equally important throughout the text, such as a dense academic paper or a reference manual.

Even in these, not all tokens are equal. Most of a text is actually pretty low-information, with a few key clusters of tokens that carry most of the information you're going to need throughout the entire text (that's why we use highlighters when studying). That's why O(n²) attention is pretty wasteful. At the same time, you need to be able to pick the right tokens, and I agree with you that picking them with a simple heuristic is probably not going to be enough.



A better phrasing would have been "the important tokens are roughly evenly distributed throughout the text"; that was the intended reading.


Are you thinking more like a research paper or more like a textbook?

For a textbook at least, it often seems that you need to have fully ingested the big-picture ideas of one chapter before moving on to some of the later ones. To me, though, that looks more like updating your model than sampling context from the whole book (it's an analogy, of course, so neither matches perfectly).



