Maybe it depends on the use case, but my opinion is, if you do need to apply compression, it should be done via a tool call real time instead of in a pipeline.
For example, if you’re trying to summarize the status of a project, instead of feeding an agent (in real time or via summarization pipeline), it’s better to write a script that summarizes the status of all of the jira tickets, instead of asking the agent to read all of the tickets to create a summary
Another small data point, I think people would prefer to ask questions of an AI model instead of reading the generated summaries.
For example, if you’re trying to summarize the status of a project, instead of feeding an agent (in real time or via summarization pipeline), it’s better to write a script that summarizes the status of all of the jira tickets, instead of asking the agent to read all of the tickets to create a summary
Another small data point, I think people would prefer to ask questions of an AI model instead of reading the generated summaries.