foundzen's comments

It is surprising that it works (I haven't tried it). `Content-Length` has one goal: to ensure data integrity by comparing the response size with the header value. I'd expect an HTTP client to handle this out of the box, gzip or not. Is that not the case? If it isn't, that changes everything, and a lot of servers need priority updates.


You don't need to set a content length header, it'll take the page as finished when you close the connection
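
A minimal Python sketch of the two framing cases mentioned in this exchange: a body delimited by `Content-Length` versus one delimited by the connection closing. The helper `frame_body` and the lowercase header dict are hypothetical names used for illustration, not part of any HTTP library, and chunked transfer encoding is ignored. It also shows that with `Content-Encoding: gzip` the declared length describes the compressed bytes on the wire, not the decompressed size.

```python
import gzip
from typing import Optional


def frame_body(headers: dict[str, str], received: bytes) -> tuple[bytes, Optional[bool]]:
    """Return (body, complete) for the bytes read before the peer closed.

    With Content-Length, the body is exactly that many bytes and a short
    read means truncation. Without it (and without chunked encoding), the
    body is everything received before the close, so completeness cannot
    be verified and is reported as None.
    """
    declared = headers.get("content-length")
    if declared is None:
        return received, None  # close-delimited: nothing to compare against
    length = int(declared)
    return received[:length], len(received) >= length


# Content-Length counts the gzip-compressed bytes actually transferred.
payload = b"0" * 1_000_000
wire = gzip.compress(payload)
headers = {"content-encoding": "gzip", "content-length": str(len(wire))}

body, complete = frame_body(headers, wire)
decoded = gzip.decompress(body)
print(len(wire), len(decoded), complete)
```

So a length check alone passes here even though the decoded body is far larger than the declared size; any limit on decompressed size has to be enforced separately by the client.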


I read the complete article. It does feel like it was written by a human, but only if you don't read it all. So overall, I would have saved time if it had been written entirely by AI (because I would have stopped reading after a glance).


Only 20 training samples improved LLM performance? That sounds unrealistic! My experience with RLHF for LLM performance differs. Can you be more specific about the case where you achieved this and share technical details about how you did it?


We are not doing RLHF but fine-tuning directly on a reward function. Our task was improving a coding agent that writes JSONata (https://jsonata.org).

GPT-4o is quite bad at this, as there are not many JSONata snippets on the internet. We collected 20 coding problems; the reward function then just assigned a scalar value based on whether the model's code output was syntactically correct or not. (Most interestingly, we found that by optimizing for syntax, it also got better at getting the semantics correct.)

I think the discrepancy between our result with direct RL and your experience with RLHF comes from the fact that RLHF is built around non-verifiable/subjective domains, where intrinsically, the reward signal obtained by the HF-proxy is weak(er), i.e. for the same training scenario/prompt you need more samples to get to the same gradient.
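
A rough sketch of what a verifiable, syntax-only reward like the one described above could look like. This is not the commenter's code: the 1.0/0.0 scoring is an assumption, `syntax_reward` and `mean_reward` are hypothetical names, and Python's `ast.parse` stands in for a JSONata parser purely so the example runs without extra dependencies.

```python
import ast
from typing import Callable


def syntax_reward(code: str, parse: Callable[[str], object]) -> float:
    """Return 1.0 if `parse` accepts the code and 0.0 if it raises."""
    try:
        parse(code)
    except Exception:
        return 0.0
    return 1.0


def mean_reward(samples: list[str], parse: Callable[[str], object]) -> float:
    """Average reward over a batch of model completions."""
    return sum(syntax_reward(s, parse) for s in samples) / len(samples)


# Stand-in parser: Python's own ast.parse, NOT a JSONata parser.
completions = ["x = 1 + 2", "x = (1 + ", "print('ok')"]
rewards = [syntax_reward(c, ast.parse) for c in completions]
print(rewards, mean_reward(completions, ast.parse))
```

Because the check is deterministic, the same completion always gets the same score, which is the contrast with a learned human-feedback proxy drawn in the comment above.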


RLHF != RL


Why does everyone end up using Yjs as their CRDT framework of choice? Aren't there better alternatives, or is it just a matter of following the popular choice?


The author posted it on HN, marked their task done, and went to sleep. And we are wondering why we need another project management tool.


What is the architecture/tech stack used in building this? I didn't find this info in either the GitHub README or the website.

I like the fact that it is written in Go and small enough to skim over the weekend, but after repeatedly burning my time on dozens of LLM ecosystem tools, I'm careful about even exploring the code myself without seeing these basic disclosures upfront. I'm sure you'd see more people adopting your tool if you provided a high-level overview of the project's architecture (ideally in a visual manner).


Hey! Yes, that's something I was planning to do: complete documentation of the code, its architecture, and the entire stack, to allow others to develop alongside me. I just deployed a functional version, and soon the website will have documentation with its architecture and a visualization of the entire code.

But for now, here is the stack used:

Core Language: Go (chosen for performance, cross-platform compatibility, and single binary distribution)
CLI Framework: Cobra (for command-line interface structure)
LLM Integration: Ollama API (for embeddings and completions)
Storage: Local filesystem-based storage (JSON files for simplicity and portability)
Vector Search: Custom implementation of cosine similarity for embedding retrieval
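
To illustrate the "custom cosine similarity over JSON-stored embeddings" idea, here is a minimal brute-force retrieval sketch. It is written in Python for brevity rather than the project's Go, and it is not taken from the project: the record layout (`text`, `embedding`), the function names, and the toy 3-dimensional vectors are all assumptions.

```python
import json
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], records: list[dict], k: int = 2) -> list[tuple[float, str]]:
    """Score every stored embedding against the query and keep the k best."""
    scored = [(cosine_similarity(query, r["embedding"]), r["text"]) for r in records]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical on-disk format: a JSON array of {"text", "embedding"} objects.
stored = json.loads("""
[
  {"text": "go docs",   "embedding": [1.0, 0.0, 0.0]},
  {"text": "cobra cli", "embedding": [0.6, 0.8, 0.0]},
  {"text": "recipes",   "embedding": [0.0, 0.0, 1.0]}
]
""")

results = top_k([1.0, 0.0, 0.0], stored, k=2)
print(results)
```

A linear scan like this costs one similarity computation per stored chunk per query, which is the usual trade-off against an indexed vector database.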


Hi, if you want to keep using a Go embedded/in-process vector store, but with some additional features, you can check out my project https://github.com/philippgille/chromem-go


Why not use an established open source vector DB like pgvector etc.? I imagine your implementation is not going to be as performant.


Defeats the point of the single binary installation if you have to set up dependencies.


rlama requires a Python install (and several dependencies via pip) to extract text.

https://github.com/DonTizi/rlama/blob/main/internal/service/...


I recommend using this hybrid vector/full text search engine that works across many runtimes: https://github.com/oramasearch/orama


Love the creativity in the branding, but it did not work in my case either. I got gibberish raw content and errors when asking any question.


I got most of my answers from the README. It is well written, and I read most of it. Can you share what kind of resources (and how much of them) were required to fine-tune Wav2Vec2-BERT?


It takes about 45 minutes to do the current training run on an L4 GPU with these settings:

    # Training parameters
    "learning_rate": 5e-5,
    "num_epochs": 10,
    "train_batch_size": 12,
    "eval_batch_size": 32,
    "warmup_ratio": 0.2,
    "weight_decay": 0.05,

    # Evaluation parameters
    "eval_steps": 50,
    "save_steps": 50,
    "logging_steps": 5,

    # Model architecture parameters
    "num_frozen_layers": 20

I haven't seen a run do all 10 epochs recently. There's usually an early stop after about 4 epochs.

The current data set size is ~8,000 samples.
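
For a rough sense of scale, the settings above can be turned into step counts. This is back-of-the-envelope arithmetic under assumptions not stated in the comment: all ~8,000 samples are in the training split, there is a single GPU with no gradient accumulation, and warmup is computed from the planned (10-epoch) step count.

```python
import math

num_samples = 8_000       # "~8,000 samples"; assumes all are used for training
train_batch_size = 12
num_epochs = 10
warmup_ratio = 0.2
eval_steps = 50
early_stop_epoch = 4      # "usually an early stop after about 4 epochs"

# One optimizer step per batch (assumes no gradient accumulation).
steps_per_epoch = math.ceil(num_samples / train_batch_size)
planned_steps = steps_per_epoch * num_epochs
warmup_steps = round(planned_steps * warmup_ratio)
evals_per_epoch = steps_per_epoch // eval_steps
steps_at_early_stop = steps_per_epoch * early_stop_epoch

print(steps_per_epoch, planned_steps, warmup_steps, evals_per_epoch, steps_at_early_stop)
```

Under those assumptions, an epoch is about 667 steps with 13 evaluations, and the 1,334 warmup steps would cover roughly the first two epochs of a run that stops at around 2,668 steps.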


I taught my parents how to use LLM chat apps. I was pleasantly surprised to see them use them all the time, and even more shocked to see them pasting entire WhatsApp messages containing passwords, uploading income tax files, and sharing a lot more private details with LLMs. They rarely pause to think about privacy or security before sharing info with LLM services. So I'm working on an interface that works as a privacy filter, making sure private info does not leave the device. It uses an on-device model to redact, anonymize, or obfuscate private information in what we share with LLMs, and then plugs the private info back into the output so that it reads almost the same as it would have without the filter.
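
The redact-then-restore round trip can be sketched as follows. This is only an illustration of the idea, not the tool described above: it swaps the on-device model for two simple regular expressions, and `PATTERNS`, `redact`, `restore`, and the `<LABEL_n>` placeholder format are all hypothetical.

```python
import re

# Illustrative patterns only; the filter described above uses an on-device model.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PASSWORD": re.compile(r"(?<=password: )\S+", re.IGNORECASE),
}


def redact(text: str) -> tuple[str, dict[str, str]]:
    """Replace private spans with placeholders; return the text and the mapping."""
    mapping: dict[str, str] = {}
    for label, pattern in PATTERNS.items():
        def substitute(match: re.Match, label: str = label) -> str:
            placeholder = f"<{label}_{len(mapping) + 1}>"
            mapping[placeholder] = match.group(0)
            return placeholder
        text = pattern.sub(substitute, text)
    return text, mapping


def restore(text: str, mapping: dict[str, str]) -> str:
    """Put the original values back wherever a placeholder appears."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text


message = "Login is ana@example.com, password: hunter2!"
safe, mapping = redact(message)            # only `safe` would leave the device
reply = "Noted the account <EMAIL_1>."     # pretend LLM reply that echoes a placeholder
restored = restore(reply, mapping)
print(safe)
print(restored)
```

The mapping never leaves the device, so the remote service only ever sees placeholders, while the user sees the reply with the real values filled back in.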


