
Each generated token may use only a subset of the parameters (86 billion instead of 314 billion), but the next generated token might use a different subset. If it's anything like Mixtral, it will switch between experts constantly. That helps with memory bandwidth, but all the parameters still need to be in RAM or it would be unbearably slow.
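
To make that concrete, here is a rough sketch of the routing pattern. It assumes a Mixtral-style layout of 8 experts with 2 active per token, and it uses uniform random selection as a stand-in for the learned router, so the names and numbers are illustrative and not taken from any real model's code. The point is just that each token touches a small slice, while a short run of tokens ends up touching every expert.

```python
import random

NUM_EXPERTS = 8  # assumption: Mixtral-style expert count
TOP_K = 2        # assumption: experts activated per token


def route_tokens(num_tokens, seed=0):
    """Pick TOP_K distinct experts for each token.

    Uniform random choice is a stand-in for a learned router,
    which would score experts from the token's hidden state.
    """
    rng = random.Random(seed)
    return [sorted(rng.sample(range(NUM_EXPERTS), TOP_K)) for _ in range(num_tokens)]


def experts_touched(routes):
    """Return the set of experts used by at least one token."""
    touched = set()
    for chosen in routes:
        touched.update(chosen)
    return touched


routes = route_tokens(200)
touched = experts_touched(routes)

# Per token, only TOP_K experts' weights are read (the bandwidth saving).
print(f"experts read per token: {TOP_K} of {NUM_EXPERTS}")
# Across the sequence, the working set grows to cover the experts,
# so all of them have to be resident in memory.
print(f"experts touched over {len(routes)} tokens: {len(touched)} of {NUM_EXPERTS}")
```

In a real model the choice is made per layer as well as per token, so the working set spreads out even faster than this toy version suggests.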

