I agree entirely with you. While Claude Code is amazing, it is also slow as hell and the context issue keeps coming up (usually at what feels like the worst possible time for me).
It honestly feels like dialup with most LLMs (apart from this one!).
AFAIK, with traditional models, context size is very memory-intensive (though I know there are a lot of efforts to optimize this). I believe memory usage grows with the square of the context length, so even 10x-ing the context requires 100x the memory.
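Rough arithmetic behind that claim, assuming vanilla self-attention that materializes the full n x n score matrix (a simplification; techniques like FlashAttention avoid ever storing it, and the linearly-growing KV cache is a separate cost):

```python
BYTES_PER_ELEMENT = 2  # assuming fp16 activations

def attention_matrix_bytes(context_len: int, num_heads: int = 1) -> int:
    """Memory for the n x n attention score matrix alone (toy model)."""
    return context_len * context_len * num_heads * BYTES_PER_ELEMENT

base = attention_matrix_bytes(8_000)
bigger = attention_matrix_bytes(80_000)
print(f"8k context:  {base / 1e9:.2f} GB")    # 0.13 GB
print(f"80k context: {bigger / 1e9:.2f} GB")  # 12.80 GB
print(f"10x the context -> {bigger / base:.0f}x the memory")
```

Just a toy illustration of the n^2 scaling, not a real memory model for any particular implementation.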
(Image) diffusion doesn't grow like that; it's much more linear. But I have no idea (yet!) about text diffusion models, if someone wants to chip in :).