
> html

Would be willing to bet this is the issue. Adding HTML files to context for Gemini models results in a ton of token use.
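A rough way to see how much of an HTML file is markup rather than visible text. This is only a sketch: it uses Python's standard-library `html.parser`, counts characters as a crude stand-in for tokens (real tokenizer counts will differ), and the names `TextExtractor` and `html_to_text` are made up for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Keeps visible text only; drops tags, attributes, scripts and styles."""

    SKIP = {"script", "style"}  # elements whose content is never shown to a reader

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Ignore whitespace-only runs and anything inside <script>/<style>.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


if __name__ == "__main__":
    page = (
        '<html><head><style>p{color:red}</style></head>'
        '<body><p class="x">Hello</p><div>World</div></body></html>'
    )
    text = html_to_text(page)
    # Character counts only hint at token cost; they are not token counts.
    print(len(page), len(text))
```

On real pages the gap between the two numbers is usually much larger, since inline styles, scripts and attribute-heavy markup dominate the file.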



why?

EDIT: why must users care?


Gotta learn all the quirks of the model before it's replaced in 8 minutes.


Quirks? Like the context window?


I'm saying it's egregious to expect all users to know the fact that an HTML document, for some reason, uses an enormous amount of context in an LLM designed specifically for working with code.



The accepted answer is one that doesn’t care about the questioner’s use case and instead gives a pretty excessive "Don’t do it".


It does also give the right solution: using an XML parser.
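For illustration, a minimal sketch of what the parser route can look like. The thread names neither a language nor a library, so this assumes Python's standard-library `html.parser`, and the task (collecting `href` values from `<a>` tags) is a hypothetical stand-in for whatever the questioner needed.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags (hypothetical example task)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names, and copes with
        # quoting and attribute-order variations that trip up regexes.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html: str) -> list[str]:
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    sample = '<p><a href="/x">x</a> <A class="z" HREF=\'/y\'>y</A></p>'
    print(extract_links(sample))
```

The point is that the parser handles case, quoting and attribute order for you, so the calling code only deals with the tags it cares about.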


We don’t know the use case.

Maybe the questioner is also in full control of the HTML creation and they don’t need a parser for all possible HTML edge cases.


Maybe they are, but they would also need to restrict themselves to a well-defined subset of HTML and show that the subset is a regular (Chomsky Type 3) grammar.

It seems that even the very conceptually simple example given by the questioner is impossible.



