It's surprising that this works (I haven't tried it). `Content-Length` has one goal: to ensure data integrity by comparing the response size against the header value. I'd expect an HTTP client to handle this out of the box, gzip or not. Is that not the case? If so, that changes everything; a lot of servers would need priority updates.
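For what it's worth, Go's standard client shows one way this gets handled: when the transport transparently decompresses a gzipped response, it deliberately invalidates `ContentLength` so the header can't be naively compared against the decompressed body. A minimal sketch (the test server and all names here are mine, just for illustration):

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// fetch spins up a test server that sends a gzipped body with Content-Length
// set to the *compressed* size, then reads it with the default client.
func fetch() (uncompressed bool, contentLength int64, body string) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		var buf bytes.Buffer
		zw := gzip.NewWriter(&buf)
		zw.Write([]byte("hello, world"))
		zw.Close()
		w.Header().Set("Content-Encoding", "gzip")
		w.Header().Set("Content-Length", fmt.Sprint(buf.Len()))
		w.Write(buf.Bytes())
	}))
	defer srv.Close()

	resp, err := http.Get(srv.URL)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	b, _ := io.ReadAll(resp.Body)
	return resp.Uncompressed, resp.ContentLength, string(b)
}

func main() {
	u, cl, body := fetch()
	// The transport decompressed transparently: Uncompressed is true and
	// ContentLength is reset to -1 since it no longer matches the body bytes.
	fmt.Println(u, cl, body)
}
```

So at least in Go, the client won't blindly flag a mismatch between the compressed `Content-Length` and the decompressed body; other clients may differ.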
I read the complete article. It does feel like it was written by a human, but only if you don't read all of it. So overall, I would have saved time if it had been written entirely by AI (I'd have chosen not to read past a glance).
Only 20 training samples improving LLM performance sounds unrealistic! My experience with RLHF for LLM performance differs. Can you be more specific about the case where you achieved this, and share technical details about how you did it?
We are not doing RLHF but fine-tuning directly against a reward function. Our task was improving a coding agent that writes JSONata (https://jsonata.org).
GPT-4o is quite bad at this, as there are not many JSONata snippets on the internet. We collected 20 coding problems; the reward function then just assigned a scalar value based on whether the model's code output was syntactically correct or not. (Most interestingly, we found that by optimizing for syntax, it also got better at getting the semantics right.)
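A reward function of that shape is only a few lines. A sketch of the idea, not the authors' actual code: here `isBalanced` is a toy stand-in for a real JSONata parser, which is what the syntactic-correctness check would actually call.

```go
package main

import "fmt"

// isBalanced is a deliberately trivial placeholder for "does this JSONata
// expression parse?" — a real implementation would feed the expression to a
// JSONata library and check for a parse error.
func isBalanced(expr string) bool {
	depth := 0
	for _, r := range expr {
		switch r {
		case '(', '[', '{':
			depth++
		case ')', ']', '}':
			depth--
			if depth < 0 {
				return false
			}
		}
	}
	return depth == 0
}

// syntaxReward is the binary scalar reward described above:
// 1.0 if the model's output is syntactically valid, 0.0 otherwise.
func syntaxReward(modelOutput string) float64 {
	if isBalanced(modelOutput) {
		return 1.0
	}
	return 0.0
}

func main() {
	fmt.Println(syntaxReward("$sum(orders.price)")) // 1
	fmt.Println(syntaxReward("$sum(orders.price"))  // 0
}
```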
I think the discrepancy between our result with direct RL and your experience with RLHF comes down to RLHF being built around non-verifiable/subjective domains, where the reward signal obtained through the human-feedback proxy is intrinsically weaker, i.e. for the same training scenario/prompt you need more samples to get the same gradient.
What is the architecture/tech stack used to build this? I couldn't find this info in the GitHub README or on the website.
I like that it's written in Go and small enough to skim over a weekend, but after repeatedly burning my time on dozens of LLM-ecosystem tools, I'm careful about even exploring the code myself without seeing these basic disclosures upfront. I'm sure you'd see more people adopting your tool if you provided a high-level overview of the project's architecture (ideally in visual form).
Hey! Yes, that's something I was planning to do: complete documentation of the code, its architecture, and the entire stack, so others can develop alongside me. I just deployed a functional version, and soon the website will have documentation covering the architecture and a visualization of the entire codebase.
But for now, here is the stack used:

- Core language: Go (chosen for performance, cross-platform compatibility, and single-binary distribution)
- CLI framework: Cobra (for command-line interface structure)
- LLM integration: Ollama API (for embeddings and completions)
- Storage: local filesystem-based storage (JSON files for simplicity and portability)
- Vector search: custom implementation of cosine similarity for embedding retrieval
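For anyone curious what a "custom cosine similarity" amounts to, a minimal sketch in Go (illustrative only; names are mine, not from the project's code):

```go
package main

import (
	"fmt"
	"math"
)

// cosineSimilarity computes dot(a, b) / (|a| * |b|), the usual similarity
// measure for comparing embedding vectors. Assumes len(a) == len(b).
func cosineSimilarity(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0 // define similarity with a zero vector as 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

func main() {
	fmt.Println(cosineSimilarity([]float64{1, 0}, []float64{1, 0})) // 1
	fmt.Println(cosineSimilarity([]float64{1, 0}, []float64{0, 1})) // 0
}
```

Retrieval is then just scoring the query embedding against every stored vector and taking the top-k, which is perfectly fine at small scale.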
Hi, if you want to keep using a Go embedded/in-process vector store but with some additional features, you can check out my project: https://github.com/philippgille/chromem-go
I got most of my answers from the README. Well written. I read most of it.
Can you share what kind of resources (and how much of them) were required to fine-tune Wav2Vec2-BERT?
I taught my parents how to use LLM chat apps and was pleasantly surprised to see them use them all the time. I was even more shocked to see them pasting entire WhatsApp messages containing passwords, uploading income tax files, and sharing many more private details with LLMs. They rarely pause to think about privacy/security before sharing info with LLM services. So I'm working on an interface that acts as a privacy filter, making sure private info does not leave the device: it redacts/anonymizes/obfuscates private information from what we share with LLMs via an on-device model, then plugs the private info back into the output so it looks almost the same as it would have otherwise.
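The redact-then-restore round trip looks roughly like this. A sketch only: a toy regex detector stands in for the on-device model, and all names (`redact`, `restore`, the `<PII_n>` placeholder format) are illustrative, not from the actual project.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// emailRe is a toy detector; a real filter would use an on-device model to
// find many kinds of PII, not just email addresses.
var emailRe = regexp.MustCompile(`[\w.+-]+@[\w-]+\.[\w.]+`)

// redact replaces each detected span with a placeholder and records the
// mapping so the LLM's response can be rehydrated afterwards.
func redact(text string) (string, map[string]string) {
	mapping := map[string]string{}
	i := 0
	out := emailRe.ReplaceAllStringFunc(text, func(m string) string {
		i++
		ph := fmt.Sprintf("<PII_%d>", i)
		mapping[ph] = m
		return ph
	})
	return out, mapping
}

// restore plugs the original private values back into the model's output.
func restore(text string, mapping map[string]string) string {
	for ph, orig := range mapping {
		text = strings.ReplaceAll(text, ph, orig)
	}
	return text
}

func main() {
	safe, m := redact("Contact alice@example.com about the tax file.")
	fmt.Println(safe)             // Contact <PII_1> about the tax file.
	fmt.Println(restore(safe, m)) // Contact alice@example.com about the tax file.
}
```

Only the redacted text crosses the network; the placeholder-to-value mapping never leaves the device.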