
It’s not 8x86B. Total number of parameters is 314B.

Perhaps it’s 8x39B to fit on a single 8xA100 (40GB) server?



They all do this marketing bull.

Mixtral has an 8x7B model but it's actually 46.7B, not 56B params.

Kinda similar to how 4K displays are 3840 pixels wide, not true 4K which would be 4096. Marketing people called it 4K, not engineers.
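For anyone wondering where 46.7B comes from: only the feed-forward blocks are replicated per expert, while attention, embeddings and norms are shared. Here is a rough sketch of the count. The hyperparameters are my assumption, taken from the published Mixtral 8x7B config rather than from anything in this thread, and top-2 routing is assumed for the "active" figure.

```python
# Assumed Mixtral 8x7B hyperparameters (from the published config, not from this thread).
DIM = 4096          # model width
N_LAYERS = 32
FFN_HIDDEN = 14336  # hidden size of each expert's feed-forward block
N_HEADS = 32
N_KV_HEADS = 8      # grouped-query attention
HEAD_DIM = 128
VOCAB = 32000
N_EXPERTS = 8
TOP_K = 2           # experts used per token (assumed top-2 routing)


def count_params(experts_counted: int) -> int:
    """Count parameters when `experts_counted` experts per layer are included."""
    expert_ffn = 3 * DIM * FFN_HIDDEN                  # gate, up and down projections
    attention = (2 * DIM * N_HEADS * HEAD_DIM          # query and output projections
                 + 2 * DIM * N_KV_HEADS * HEAD_DIM)    # key and value projections
    router = DIM * N_EXPERTS                           # one linear gate per layer
    norms = 2 * DIM                                    # two RMSNorm weights per layer
    per_layer = experts_counted * expert_ffn + attention + router + norms
    embeddings = 2 * VOCAB * DIM                       # input embedding + output head
    final_norm = DIM
    return N_LAYERS * per_layer + embeddings + final_norm


total = count_params(N_EXPERTS)
active = count_params(TOP_K)

print(f"total:  {total / 1e9:.1f}B")   # total:  46.7B
print(f"active: {active / 1e9:.1f}B")  # active: 12.9B
print(f"naive 8 x 7B: {8 * 7}B")       # naive 8 x 7B: 56B
```

So the "8x7B" label counts the shared weights eight times over, which is where the gap to 56B comes from.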


I've always thought of 4K as "4x FullHD". In that way it makes sense.


Bleh no, K means thousand.

For a long time we specified displays by their vertical dimension -- 480p, 720p, 1080p.

Then the marketing guys came along and decided that the horizontal dimension sounds bigger. If we stuck with the less-bullshitty way of doing things and kept comparisons 1:1, we'd call 3840x2160 displays 2160p or "2K" displays, but instead, the marketing people decided that we're going to change things to horizontal and called 3840x2160 "4K".


TV and Digital Cinema have different standards, because of course they do


It's 2x Full HD though


2x in a single direction, 4x the number of pixels


Oh yeah... What I meant is, 1920x2 = 3840 ≈ 4000
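To put numbers on the 2x vs 4x point, a quick sketch using only the resolutions already mentioned in this thread (the constant names are mine):

```python
FULL_HD = (1920, 1080)
UHD = (3840, 2160)     # what consumer displays sell as "4K"
DCI_4K_WIDTH = 4096    # the "true 4K" width mentioned above

width_ratio = UHD[0] / FULL_HD[0]
pixel_ratio = (UHD[0] * UHD[1]) / (FULL_HD[0] * FULL_HD[1])
shortfall = DCI_4K_WIDTH - UHD[0]

print(width_ratio)   # 2.0 -> twice as wide (and twice as tall)
print(pixel_ratio)   # 4.0 -> four times the pixel count
print(shortfall)     # 256 -> columns short of 4096
```

Both readings of "4K" hold up in a loose sense: 4x the pixels of Full HD, and a width that rounds to 4000.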


Active parameters are 86B, so wouldn't that be the size of the two largest experts (they may all be the same size) plus the weights of the selector?


Most likely it's a MoE of Grok-0 which would be 8x33B + 50B for the router.
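You can back out a split from the two numbers in this thread (314B total, 86B active) if you assume 8 equally sized experts, 2 active per token, and everything else (attention, embeddings, router) shared. Those are assumptions, not confirmed architecture details, and both figures are rounded. A minimal sketch:

```python
TOTAL_B = 314.0   # total parameters, in billions (from this thread)
ACTIVE_B = 86.0   # active parameters per token, in billions (from this thread)
N_EXPERTS = 8
TOP_K = 2         # assumed: two experts active per token


def solve_split(total, active, n_experts, top_k):
    """Solve total = shared + n_experts * e and active = shared + top_k * e."""
    per_expert = (total - active) / (n_experts - top_k)
    shared = total - n_experts * per_expert
    return per_expert, shared


per_expert, shared = solve_split(TOTAL_B, ACTIVE_B, N_EXPERTS, TOP_K)
print(per_expert, shared)  # 38.0 10.0

# Check the "8x33B + 50B" guess against the same assumptions.
guess_total = 8 * 33 + 50       # 314, matches the total
guess_active = TOP_K * 33 + 50  # 116, does not match 86
print(guess_total, guess_active)
```

Under those assumptions you get roughly 8x38B plus about 10B shared. The 8x33B + 50B split adds up to 314B but would imply about 116B active with two experts, so it only fits if the routing works differently from what I assumed.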



