You’re right, but I think it’s also pretty clear that
A) there is demand for functionality that depends on semi-real-time data, e.g. a prompt like “explain {recent_trending_topic} to me and describe its evolution”, where the response could be useful in a variety of contexts;
B) the degradation of the search experience and the explosion of chat interfaces suggest that “the future of search is chat”, and the number of Google searches prefixed or suffixed with “reddit” makes it obvious that LLM-powered chat models with search functionality will want to query Reddit extensively; for the example prompt above, the tree of queries generated to fulfill a single prompt could be sizeable;
C) improvements to fine-tuning pipelines make it increasingly feasible to use near-real-time data with LLMs, e.g. a “trending summary” function that could cache many potentially related queries from Reddit, Twitter, etc. and use them to fine-tune a model that would serve a response to my example prompt (rough sketch below).
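To make C) concrete, here’s a minimal sketch of what that “trending summary” cache might look like. Everything here is hypothetical: it hits Reddit’s public JSON listing endpoint (which rate-limits aggressively and may block scripted user agents), and the prompt/completion JSONL output is just one common fine-tuning data format; a real pipeline would need auth, deduplication, and an actual summarization step.

    import json
    import time
    import requests

    CACHE_TTL = 300  # seconds; refresh trending data every 5 minutes
    _cache = {"fetched_at": 0.0, "posts": []}

    def fetch_trending(limit=25):
        """Fetch currently-hot Reddit posts via the public JSON endpoint,
        caching results so repeated queries don't hammer the API."""
        now = time.time()
        if now - _cache["fetched_at"] > CACHE_TTL:
            resp = requests.get(
                "https://www.reddit.com/r/all/top.json",
                params={"t": "hour", "limit": limit},
                headers={"User-Agent": "trending-summary-sketch/0.1"},  # Reddit requires a UA
                timeout=10,
            )
            resp.raise_for_status()
            _cache["posts"] = [c["data"] for c in resp.json()["data"]["children"]]
            _cache["fetched_at"] = now
        return _cache["posts"]

    def build_finetune_examples(topic):
        """Turn cached posts matching a topic into prompt/completion pairs
        (JSONL) that a hypothetical fine-tuning job could consume."""
        examples = []
        for post in fetch_trending():
            if topic.lower() in post["title"].lower():
                examples.append({
                    "prompt": f"explain {topic} to me and describe its evolution",
                    "completion": f'{post["title"]} (r/{post["subreddit"]}, '
                                  f'{post["score"]} points): {post.get("selftext", "")[:500]}',
                })
        return "\n".join(json.dumps(e) for e in examples)

The point isn’t this exact code, it’s that the cache layer sits between thousands of near-identical user prompts and a single periodic pull from Reddit, which is exactly the access pattern Reddit would want to charge for.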