
Thanks for commenting! Actually, in this case "the work being done" can be really fast because it can be done asynchronously. For context, here’s how this translates to a real-world application.

The original algorithm was provided by DeepSeek, and our optimized implementation achieves a 92× speedup over it. The 5× number is a comparison against another baseline that hasn't been disclosed yet.

When integrating EPLB into vLLM, I discovered—somewhat unexpectedly—that the open-source algorithm consumes nearly half of the total time of a rearrangement step, with the remaining time spent transferring weights across GPUs. To address this, I applied OpenEvolve to the algorithm, setting the primary objective to improve speed while maintaining the same balance factor. It performed remarkably well. With additional optimizations to the weight transfer, the overall overhead has now become almost negligible.
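To illustrate the general idea of taking the rearrangement computation off the critical path, here's a minimal sketch. The names (`compute_rebalance_plan`, the round-robin policy, 4 GPUs) are hypothetical placeholders for illustration, not vLLM's or EPLB's actual API:

```python
import threading
import queue

# Hypothetical stand-in for the EPLB algorithm: compute a new
# expert-to-GPU placement from observed per-expert load statistics.
def compute_rebalance_plan(load_stats, num_gpus=4):
    # Toy policy: rank experts by load, assign round-robin across GPUs.
    ranked = sorted(load_stats, key=load_stats.get, reverse=True)
    return {expert: rank % num_gpus for rank, expert in enumerate(ranked)}

plans = queue.Queue()

def rebalance_worker(load_stats):
    # Runs off the critical path: serving can continue while this executes.
    plans.put(compute_rebalance_plan(load_stats))

# Kick off the rearrangement computation asynchronously...
stats = {"e0": 9, "e1": 3, "e2": 7, "e3": 1}
worker = threading.Thread(target=rebalance_worker, args=(stats,))
worker.start()

# ...keep serving requests here, then pick up the plan once it's ready.
worker.join()
plan = plans.get()
print(plan)
```

The point is only that the plan computation itself doesn't have to block serving; the actual weight transfer still happens on the GPUs afterwards.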


While no one will deny you (or I guess your system) the immense satisfaction of 100x improvement on a given step, I think it would be helpful to note the frequency of this rebalancing step, and to contextualize your result in terms of the runtime (or throughput) of the workload(s) you were using to evaluate.

e: also, a comparison with a fixed policy (nothing faster than 0!) and a random policy might be informative if your intent is to publish this as an improvement on the object problem, not just a demonstration of ARDS.


Thanks for letting us know! While we’re tackling different problems, the core idea around load balancing is quite similar.

The pattern might be a familiar trick to those experienced with this kind of problem — you can see my thoughts on it here: https://news.ycombinator.com/item?id=45688236#45689440


It's okay to acknowledge that you missed something in your literature search. Everyone does. It's not okay to sweep it under the rug or pretend that it's novel after having the prior work pointed out to you, especially when a central part of your thesis is that "AI" discovered a novel algorithm and it's very likely that this algorithm was part of the LLM's training data.


I spent 2~3 hours setting up; most of that time went into writing the evaluator.

Actually, I think the evaluator is the most important part of the whole pipeline.
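As a rough illustration of what such an evaluator might look like: gate candidates on a correctness/balance constraint, then score the survivors on speed. Everything here (the function names, the balance threshold, the scoring formula) is a hypothetical sketch, not OpenEvolve's actual evaluator interface:

```python
import time

def balance_factor(assignment, loads):
    # Ratio of max per-GPU load to mean per-GPU load; 1.0 is perfect balance.
    per_gpu = {}
    for expert, gpu in assignment.items():
        per_gpu[gpu] = per_gpu.get(gpu, 0) + loads[expert]
    vals = list(per_gpu.values())
    return max(vals) / (sum(vals) / len(vals))

def evaluate(candidate_fn, loads, max_factor=1.5):
    # Reject candidates that break the balance constraint; otherwise
    # score by speed (lower wall time => higher score).
    start = time.perf_counter()
    assignment = candidate_fn(loads)
    elapsed = time.perf_counter() - start
    if balance_factor(assignment, loads) > max_factor:
        return 0.0  # correctness gate: balance must hold
    return 1.0 / (elapsed + 1e-9)

# A trivial candidate: round-robin over 2 GPUs by descending load.
def candidate(loads):
    ranked = sorted(loads, key=loads.get, reverse=True)
    return {expert: i % 2 for i, expert in enumerate(ranked)}

score = evaluate(candidate, {"e0": 4, "e1": 4, "e2": 4, "e3": 4})
print(score > 0)
```

The correctness gate is what keeps the search honest: without it, the optimizer would happily "speed up" the algorithm by making it produce garbage placements.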


Yes, getting the right workloads and ensuring correctness are crucial parts of the process


It's directly usable, since it needs to pass the evaluator first; it also contains clear comments about its intent.


I assume this means it still went through human review, rather than the evaluator being complete enough to not require it?


Thanks! In realistic workloads, the differences won’t be orders of magnitude.

I agree that this is a fairly simple problem. Experienced engineers—or anyone who has faced similar challenges—can quickly come up with such solutions. The key point, however, is that others might get stuck in their research simply because they don’t realize these quick solutions exist (“I don’t know what I don’t know”). AI helps bridge that gap by making expert-level knowledge accessible to every researcher, allowing them to focus more on exploring the truly unknown parts.


Except that "AI" steals and mostly doesn't cite its sources.

EDIT: The chutzpah of downvoting this is striking. The paper says "surpasses highly optimized algorithms engineered by human experts to achieve a 5.0x speedup" and https://news.ycombinator.com/item?id=45689663 links to a 2024 paper where humans discovered a 4.2x speedup using a snake pattern. The 2024 paper is not cited.


Given that, maybe the submission title should be changed?


this should be the top comment

What "AI" is best at is enabling theft without crediting the true creators


that's true for any application of AI :(

