Hacker News | aurohacker's comments

Any tips on reading material for code generation and scheduling for parallel engines?

Are there similar efforts for pgvector, the PostgreSQL extension?


Great answers here: for MoE there's a compute saving but no memory saving, even though the network is super-sparse. It turns out there is a paper on predicting in advance which experts will be used in the next few layers: "Accelerating Mixture-of-Experts language model inference via plug-and-play lookahead gate on a single GPU". As to its efficacy, I'd love to know...
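To make the idea concrete, here's a minimal sketch (my own illustration, not code from the paper) of what a "lookahead gate" could look like: a small linear probe on the current layer's hidden state that guesses which experts the next layer's router will pick, so their weights can be prefetched before that layer runs. All names, shapes, and the `prefetch` placeholder are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

HIDDEN, NUM_EXPERTS, TOP_K = 64, 8, 2

# Lookahead gate parameters: in the paper's setting these would be trained
# to imitate the next layer's router; here they are random for illustration.
W_gate = rng.standard_normal((HIDDEN, NUM_EXPERTS)) * 0.02

def predict_next_experts(hidden_state: np.ndarray) -> list:
    """Return the TOP_K expert ids predicted for the next layer."""
    logits = hidden_state @ W_gate                # scores, shape (NUM_EXPERTS,)
    topk = np.argsort(logits)[-TOP_K:][::-1]      # highest-scoring expert ids
    return topk.tolist()

def prefetch(expert_ids: list) -> None:
    # Placeholder: a real system would issue async host-to-device copies of
    # these experts' weights while the current layer is still computing.
    print(f"prefetching experts {expert_ids}")

h = rng.standard_normal(HIDDEN)   # current layer's hidden state for one token
prefetch(predict_next_experts(h))
```

The point of the scheme is that only the predicted top-k experts need to be resident on the GPU per layer, which is where the memory saving would come from, at the cost of mispredictions forcing a stall to fetch the right expert.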


Figure 1 in the paper is all about the encoder and how the context and query are packaged and sent to the decoder. I wish it were more complete...

