Hacker News

RAG doesn't require much data or involve any training; it's a fancy name for "automatically paste some relevant context into the prompt".

Basically, if you have a database of three emails and ask when Biff wanted to meet for lunch, a RAG system would select the most relevant email using any kind of search (embeddings are the most fashionable choice) and build a prompt like

"""Given this document: <your email>, answer the question "When does Biff want to meet for lunch?"""
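That "select, then paste" loop can be sketched in a few lines. This is a toy illustration, not a real RAG stack: it stands in a bag-of-words cosine similarity for the retriever (where a production system would use an embedding model), and all the names and the sample emails are made up.

```python
import math
from collections import Counter

def vectorize(text):
    """Toy stand-in for an embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question, documents):
    """Return the single document most similar to the question."""
    q = vectorize(question)
    return max(documents, key=lambda d: cosine(q, vectorize(d)))

def build_prompt(question, documents):
    """The whole 'augmentation' step: paste the hit into the prompt."""
    doc = retrieve(question, documents)
    return f'Given this document: {doc}, answer the question "{question}"'

emails = [
    "Reminder: dentist appointment moved to Friday.",
    "Biff here, want to meet to eat lunch at noon on Tuesday?",
    "Your package has shipped and arrives Monday.",
]
prompt = build_prompt("When does Biff want to meet for lunch?", emails)
```

The resulting prompt contains Biff's email as pasted context; the only remaining step is handing that string to the LLM.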



That's not how RAG works. What you're describing is something closer to prompt optimization.

Sibling comment from discordance has a more accurate description of RAG. There's a longer description from Nvidia here: https://blogs.nvidia.com/blog/what-is-retrieval-augmented-ge...


Right, you read something nebulous about how "the LLM combines the retrieved words and its own response to the query into a final answer it presents to the user", and you think there is some magic going on, and then you click one link deeper and read at https://ai.meta.com/blog/retrieval-augmented-generation-stre... :

> Given the prompt “When did the first mammal appear on Earth?” for instance, RAG might surface documents for “Mammal,” “History of Earth,” and “Evolution of Mammals.” These supporting documents are then concatenated as context with the original input and fed to the [...] model

Finding the relevant context to put in the prompt is a search problem. Nearest-neighbour search on embeddings is one basic way to do it, but the singular focus on "vector databases" is a bit of a hype phenomenon IMO: a real-world product should factor a lot more than pure textual content into the relevancy score. Or is your personal AI assistant going to treat emails from yesterday as equally relevant as emails from a year ago?
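One way that recency point could look in code: a hypothetical relevancy score that multiplies the text-similarity score by an exponential age decay. The half-life value and the multiplicative combination are made-up illustrations of the idea, not a standard formula.

```python
import math
from datetime import datetime, timezone

def recency_weight(sent_at, now, half_life_days=30.0):
    """Exponential decay: an email half_life_days old scores 0.5."""
    age_days = (now - sent_at).total_seconds() / 86400.0
    return 0.5 ** (age_days / half_life_days)

def relevancy(similarity, sent_at, now):
    """Combine pure text similarity with recency (both in [0, 1])."""
    return similarity * recency_weight(sent_at, now)

now = datetime(2024, 6, 1, tzinfo=timezone.utc)
yesterday = datetime(2024, 5, 31, tzinfo=timezone.utc)
last_year = datetime(2023, 6, 1, tzinfo=timezone.utc)

# Two emails with identical textual similarity 0.9:
fresh = relevancy(0.9, yesterday, now)  # close to 0.9
stale = relevancy(0.9, last_year, now)  # nearly zero
```

With a 30-day half-life, yesterday's email keeps almost its full similarity score while the year-old one is suppressed by orders of magnitude; a real ranker might also weigh sender, thread activity, and so on.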


Legit explanation; that's how it works AFAIK.



