
I will totally pay for something like this if it answers from my local documents, bookmarks, browser history etc.


There are already several open-source RAG chat solutions available. Two that immediately come to mind are:

Danswer

https://github.com/danswer-ai/danswer

Khoj

https://github.com/khoj-ai/khoj


Stupid question but what does RAG stand for?


Retrieval-augmented generation. In short, you index your documents (or chunks of them) up front, typically by embedding them. Then, when you want to ask the LLM a question, you pull the most relevant ones back and feed them to it as additional context.


I don't get it. To my understanding, it takes huge amounts of data to build any form of RAG, simply because it enlarges the statistical model you later prompt. If the model is not big enough, how would you expect it to answer you in any satisfying manner? It simply can't.

So I don't really buy it, and I have yet to see it work better than any RDBMS search index.

Tell me I'm wrong; I would like to see a local model, based on my own docs, give me quality answers to quality prompts.


RAG doesn't require much data or involve any training; it's a fancy name for "automatically paste some relevant context into the prompt".

Basically, if you have a database of three emails and ask when Biff wanted to meet for lunch, a RAG system would select the most relevant email using any kind of search (embeddings are the most fashionable) and create a prompt like

"""Given this document: <your email>, answer the question "When does Biff want to meet for lunch?"""


That's not how RAG works. What you're describing is something closer to prompt optimization.

Sibling comment from discordance has a more accurate description of RAG. There's a longer description from Nvidia here: https://blogs.nvidia.com/blog/what-is-retrieval-augmented-ge...


Right, you read something nebulous about how "the LLM combines the retrieved words and its own response to the query into a final answer it presents to the user", and you think there is some magic going on, and then you click one link deeper and read at https://ai.meta.com/blog/retrieval-augmented-generation-stre... :

> Given the prompt “When did the first mammal appear on Earth?” for instance, RAG might surface documents for “Mammal,” “History of Earth,” and “Evolution of Mammals.” These supporting documents are then concatenated as context with the original input and fed to the [...] model

Finding the relevant context to put in the prompt is a search problem. Nearest-neighbour search on embeddings is one basic way to do it, but the singular focus on "vector databases" is a bit of a hype phenomenon IMO; a real-world product should factor a lot more than just pure textual content into the relevancy score. Or is your personal AI assistant going to treat emails from yesterday as equally relevant as emails from a year ago?
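To illustrate, a blended score might look like this sketch; the 0.7/0.3 weights and 30-day half-life are made-up values, not a recommendation:

    # Sketch: blend textual similarity with recency so yesterday's email
    # outranks an equally similar one from a year ago.
    import math
    from datetime import datetime, timedelta, timezone

    def cosine(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
        return dot / norm if norm else 0.0

    def recency(sent_at: datetime, half_life_days: float = 30.0) -> float:
        age_days = (datetime.now(timezone.utc) - sent_at).days
        return 0.5 ** (age_days / half_life_days)  # exponential decay

    def relevance(query_vec, doc_vec, sent_at) -> float:
        # Illustrative weights: 70% text similarity, 30% recency.
        return 0.7 * cosine(query_vec, doc_vec) + 0.3 * recency(sent_at)

    v = [0.1, 0.9]  # pretend both emails embed identically
    print(relevance(v, v, datetime.now(timezone.utc) - timedelta(days=1)))    # ~0.99
    print(relevance(v, v, datetime.now(timezone.utc) - timedelta(days=365)))  # ~0.70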


Legit explanation; that's how it works AFAIK.


RAG:

1. First you create embeddings from your documents (or chunks of them)

2. Store those in a vector DB

3. Take the user's question and search the vector DB for it (cosine similarity, etc.)

4. Feed the relevant chunks from the search results to your LLM as context and do the usual LLM stuff (a minimal sketch follows)
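A minimal in-memory sketch of those four steps; the hashing embed() stands in for a real embedding model, and a plain list stands in for the vector DB:

    import math
    import string

    def embed(text: str, dim: int = 64) -> list[float]:
        """Toy stand-in for a real embedding model (hashing trick)."""
        vec = [0.0] * dim
        for word in text.lower().split():
            word = word.strip(string.punctuation)
            if word:
                vec[hash(word) % dim] += 1.0
        return vec

    def cosine(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
        return dot / norm if norm else 0.0

    # Steps 1 & 2: embed document chunks and "store" them.
    docs = ["Invoice #42 is due on Friday.", "Biff suggested lunch Tuesday at noon."]
    index = [(doc, embed(doc)) for doc in docs]

    # Step 3: embed the user's question and rank by cosine similarity.
    query = "when is lunch with Biff?"
    query_vec = embed(query)
    best_doc, _ = max(index, key=lambda pair: cosine(query_vec, pair[1]))

    # Step 4: feed the winning chunk to the LLM as context (call omitted).
    print(f"Context: {best_doc}\nQuestion: {query}")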


Although RAG is often implemented via vector databases to find 'relevant' content, I'm not sure that's a necessary component. I've been doing what I call RAG by finding 'relevant' content for the current prompt context via a number of different algorithms that don't use vectors.
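For instance, a vector-free retriever can be as simple as TF-IDF-style keyword scoring. This is just an illustration of the idea, not a claim about my actual algorithms; in practice you might use BM25 or a full-text search index:

    import math
    from collections import Counter

    def tfidf_score(query: str, doc: str, corpus: list[str]) -> float:
        n = len(corpus)
        doc_counts = Counter(doc.lower().split())
        score = 0.0
        for term in query.lower().split():
            # Document frequency: how many docs contain the term.
            df = sum(1 for d in corpus if term in d.lower().split())
            if df:
                score += doc_counts[term] * math.log(n / df)
        return score

    corpus = [
        "The deploy script lives in the scripts folder",
        "Lunch options near the office",
        "Rollback procedure for failed deploys",
    ]
    best = max(corpus, key=lambda d: tfidf_score("how do I deploy", d, corpus))
    print(best)  # -> "The deploy script lives in the scripts folder"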

Would you define RAG only as 'prompt optimisation that involves embeddings'?


Sure, your RAG approach sounds intriguing, especially since you're sidestepping vector databases. But doesn't the input context-length cap affect it (ChatGPT Plus at 32K [0], GPT-4 via the OpenAI API at 128K [1])? Seems like those cases would be pretty rare, though.

[0]: https://openai.com/chatgpt/pricing#:~:text=8K-,32K,-32K

[1]: https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turb...
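For what it's worth, a common workaround is to pack only as many top-ranked chunks as fit in a token budget. A rough sketch; the 4-characters-per-token estimate is a crude heuristic, real tokenizers differ:

    def pack_context(ranked_chunks: list[str], max_tokens: int = 32_000) -> str:
        def est_tokens(s: str) -> int:
            return len(s) // 4  # rough heuristic, not a real tokenizer
        picked: list[str] = []
        used = 0
        for chunk in ranked_chunks:  # assumed sorted best-first by relevance
            cost = est_tokens(chunk)
            if used + cost > max_tokens:
                break
            picked.append(chunk)
            used += cost
        return "\n\n".join(picked)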


Yes, context window is a limiting factor, but that's true however you identify the content to augment generation.


You're misunderstanding. Imagine your query is matched against chunks of text from the database, with relevance evaluated for a window each time it slides. The n most relevant chunks are then collected and included in the prompt, so the LLM can provide answers from the source documents verbatim. This is useful where precise, exact answers are needed: for example, searching a package's docs for the right API to call. You don't want a name that's close to right; it has to be correct to the character.
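Roughly, in code; the word-overlap scoring here is a crude stand-in for a real relevance model:

    # Slide a window over the document, score each window against the
    # query, and keep the n most relevant chunks for the prompt.

    def windows(text: str, size: int = 200, step: int = 100):
        words = text.split()
        for i in range(0, max(len(words) - size, 0) + 1, step):
            yield " ".join(words[i:i + size])

    def overlap(query: str, chunk: str) -> int:
        return len(set(query.lower().split()) & set(chunk.lower().split()))

    def top_chunks(query: str, document: str, n: int = 3) -> list[str]:
        ranked = sorted(windows(document),
                        key=lambda c: overlap(query, c), reverse=True)
        return ranked[:n]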


Ahh, OK, I see. It's basically what Microsoft 365 Copilot does too, with its "grounding" step.


Yes.


This. There was a post on HN last week, IIRC, referring to just such a solution called Zenfetch(?). I would have adopted it in a heartbeat, but they don't currently have a means of exporting the source data you feed it (should you elect it as your sole means of bookmarking, etc.).


Hey there,

This is Gabe, the founder of Zenfetch. Thanks for sharing. We're putting together an export option where you can download all your saved data as a CSV; we should get that out by the end of the week.


Seems like this would be a good tool to build lessons on: you could share a "class" and export a link for others to copy it and expand on the lesson/class/topic in their own AI. But as a separate "class", not fully integrated into my regular history blob?

I want the ability to search all my downloaded files and organize them based on the context within. Have it create a category table, and allow me to "put all pics of my cat in this folder and upload them to a gallery on Imgur."


We're working on the ability to share folders of your knowledge so that others can search/chat across them.

We've been thinking of this as a "subscription" to the creator's folder, similar to how you might subscribe to a Spotify playlist.


Consider using tar files for this. There's lots of tooling (versioning, hashing, storage) around them already; Docker layers come to mind.
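A minimal sketch, assuming a hypothetical knowledge folder on disk:

    # Bundle the folder and publish a hash so subscribers can verify
    # and version what they pull.
    import hashlib
    import tarfile

    with tarfile.open("knowledge.tar.gz", "w:gz") as tar:
        tar.add("my_knowledge_folder", arcname="knowledge")  # hypothetical path

    with open("knowledge.tar.gz", "rb") as f:
        print(hashlib.sha256(f.read()).hexdigest())  # version fingerprint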


Or an RSS feed?


Yes, that would be the next big focus for this. Personal data connectivity is where I see local AI excelling, despite differences in model power.


I have doubts about that. Most personal data actually lives in the cloud these days. If you need your Gmail emails, you'll need to use their API, which is guarded behind a certification fee of $50k or so. I think there is a simpler version for personal use, but you still need to get the API key. Who's going to teach their mom about API keys? So I think for a lot of these data sources, you'll end up with enterprise AIs integrating them first for a seamless experience.


Why wouldn't you be able to use IMAP instead of the Gmail API? IMAP returns the text and headers of all your emails, which is what you'd want the LLM to ingest anyway.


Seconding a sibling question: what $50k API fee? To access your Gmail? I've been using Gmail since 2008 or so without ever touching their web/app interface or getting an API key. You just use it as an IMAP server.
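For example, with nothing but Python's standard library and a Gmail app password (assumed already created in your Google account settings; no paid API involved):

    import email
    import imaplib

    M = imaplib.IMAP4_SSL("imap.gmail.com")
    M.login("you@gmail.com", "your-app-password")  # placeholder credentials
    M.select("INBOX", readonly=True)

    _, data = M.search(None, "ALL")
    for num in data[0].split()[-5:]:  # last five messages
        _, msg_data = M.fetch(num, "(RFC822)")
        msg = email.message_from_bytes(msg_data[0][1])
        print(msg["Date"], msg["From"], msg["Subject"])

    M.logout()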


To use Google's sensitive APIs in production, you have to certify your product, and that costs tens of thousands. To be honest, I didn't think about IMAP at first, but it looks like that could be getting tougher soon too: https://support.google.com/a/answer/14114704?hl=en. Soon they will require OAuth for IMAP, and with OAuth you'll need the certification: https://developers.google.com/gmail/imap/xoauth2-protocol. If it's for personal use, you might be able to get by with just some warnings in the login flow, but it won't be easy to get the OAuth flow set up in the first place.
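For reference, the XOAUTH2 mechanism from that second link boils down to logging in over IMAP with an access token instead of a password. Obtaining the token (consent screens, client registration, and possibly the certification above) is the hard part; this sketch assumes you already have one:

    import imaplib

    user = "you@gmail.com"
    access_token = "ya29...."  # placeholder: an already-obtained OAuth2 token

    # SASL XOAUTH2 string format, per Google's docs.
    auth_string = f"user={user}\x01auth=Bearer {access_token}\x01\x01"

    M = imaplib.IMAP4_SSL("imap.gmail.com")
    M.authenticate("XOAUTH2", lambda _: auth_string.encode())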


Yeah, Thunderbird integrated OAuth in the last few releases, mainly to keep up with the Gmail and Hotmail requirements. They made it very user-friendly to set up in the GUI, right within Thunderbird. I don't see this being a major obstacle.

I'm not sure I can imagine a scenario in production where Google would, or should, allow API access to individual Gmail accounts. What's that for? So you can read all your employees' mail without running your own email server?


I'm not sure what you mean.

> You will no longer use a password for access (with the exception of app passwords)

I'm not seeing anywhere that I'd need to pay money to use OAuth via an app like Thunderbird or another email client. That app would either need to support using OAuth to let the user auth and get credentials, or use an app password.


Right, but Thunderbird had to pay up and set themselves up as a middleman to allow this. My point is that local LLMs might not have that many advantages for personal data, because most of that data doesn't live locally on your computer to begin with. I guess an argument could be made that running them locally prevents an AI provider from gobbling up ALL of your data. On the other hand, Google already has most of my data: email, YouTube, etc.


I think this is a good take. While there's a big enough niche for local personal data, I'd love it if there were a way to solve for email/cloud data that requires API keys.


Ideally, though, a sufficiently smart LLM shouldn't need API access. It could navigate to your social media login page, supply your credentials, and scrape what it sees. Better yet, it should just reverse-engineer the API ;)


What?

I manage both Gmail and Proton Mail via Thunderbird, where I have better search and sorting using IMAP.


Good to know there's a market for that. I'm currently building something out: integrating data from numerous sources, then processing and utilizing it.

Nice.


Yeah, we’re getting closer to “Her”


I would even let it have longer processing times for queries that run against every document in my system, and allow it to specialize/train itself on a daily basis…

Use all the resources you want if it saves me brainpower.


Agreed; there's a non-real-time angle to this.


"give me a summary of the news around this topic each morning for my daily read"

Help me plan for upcoming meetings: if I put something in the calendar, it builds a little dossier for the event and includes relevant info based on the type of event or meeting, mostly scheduling reminders or prompting you with updates or changes to the event, etc.


“filter out baby pictures from my family text threads”



The next version of macOS will probably have that.


As long as you use Safari for browsing, Notes for note-taking, iCloud for mail …


https://news.ycombinator.com/item?id=38787892 ("Show HN: Rem: Remember Everything (open source)") ?

https://github.com/jasonjmcghee/rem


Plus one. I would love to configure a folder of markdown/txt files (and eventually images and PDFs) that this can have access to. Ideally it could RAG over them in a sensible way. Would love to help support this!


Thank you! I'd love to learn more about your use cases. Would you mind sending an email to feedback@recurse.chat or DM me on https://x.com/chxy to get the conversation started?


I use paperless-ngx for that.



