
What are the current ways to minimize context usage when streaming with multiple tool calls? I can offload some of the work to the tools themselves, e.g. a tool wraps another LLM that does the heavy lifting (say, reading a 200k-token-long markdown file) and returns only a structured distillation. However, even that can fill the main model's context quickly in some scenarios.
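Concretely, the pattern I mean looks roughly like this (a minimal sketch using the OpenAI Python SDK; the model name, prompt, chunk size, and output shape are all placeholders, not anything specific I'm committed to):

  import json
  from openai import OpenAI

  client = OpenAI()

  def distill_document(text: str, max_chunk_chars: int = 60_000) -> str:
      """Run a cheap side model over a long document and return only
      a compact structured distillation to the main model."""
      chunks = [text[i:i + max_chunk_chars]
                for i in range(0, len(text), max_chunk_chars)]
      notes = []
      for chunk in chunks:
          resp = client.chat.completions.create(
              model="gpt-4o-mini",  # placeholder side model
              messages=[
                  {"role": "system",
                   "content": "Extract the key facts as a JSON list of strings."},
                  {"role": "user", "content": chunk},
              ],
          )
          notes.append(resp.choices[0].message.content)
      # Only this small string ever enters the main model's context;
      # the 200k-token source document never does.
      return json.dumps({"notes": notes})

The full document stays on the tool side, but if the main model calls a tool like this many times in one run, the accumulated distillations still add up.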


How are you orchestrating this? Just the usual sub-agents, or something custom?



