Maybe it depends on the use case, but my opinion is, if you do need to apply com...

Maybe it depends on the use case, but my opinion is, if you do need to apply compression, it should be done via a tool call real time instead of in a pipeline.

For example, if you’re trying to summarize the status of a project, instead of feeding an agent (in real time or via summarization pipeline), it’s better to write a script that summarizes the status of all of the jira tickets, instead of asking the agent to read all of the tickets to create a summary

Another small data point, I think people would prefer to ask questions of an AI model instead of reading the generated summaries.