
The Kimi K2 paper said that model sparsity scales up well with parameter count (an MoE sparsity scaling law, as they call it, which basically amounts to calling Llama 4's MoE "done wrong"). Hence K2 has 128:1 sparsity.


I thought Kimi K2 uses 8 active experts out of 384? Its sparsity should be 48:1. Indeed, Llama 4 Maverick is the only one with 128:1 sparsity.
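
A minimal sketch of the arithmetic, assuming "sparsity" here means total routed experts divided by active routed experts per token; the K2 numbers are from this thread, and the Maverick figures (128 routed experts, 1 active) are the publicly reported config, so treat them as an assumption:

    # Sparsity ratio = total routed experts / active routed experts per token.
    def sparsity(total_experts: int, active_experts: int) -> float:
        return total_experts / active_experts

    print(sparsity(384, 8))   # Kimi K2: 48.0, i.e. 48:1
    print(sparsity(128, 1))   # Llama 4 Maverick (assumed config): 128.0, i.e. 128:1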


You are right, I misremembered K2's sparsity. For the "done wrong" part, I was thinking about how Scout -> Maverick -> Behemoth doesn't scale sparsity according to any formula (less sparse -> sparse -> less sparse).


> how Scout -> Maverick -> Behemoth doesn't scale sparsity according to any formula (less sparse -> sparse -> less sparse)

Ah, I see. I hadn't noticed that Behemoth has the same sparsity as Scout. That does seem quite random.
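
For reference, a quick sketch of the Llama 4 line-up under the same definition of sparsity. The expert counts below are the publicly reported configs (16 routed experts for Scout and Behemoth, 128 for Maverick, 1 active routed expert each), not numbers confirmed in this thread, so take them as assumptions:

    # Llama 4 family: sparsity does not scale monotonically with model size.
    # (total routed experts, active routed experts) -- assumed from public configs.
    models = {
        "Scout":    (16, 1),   # 16:1
        "Maverick": (128, 1),  # 128:1
        "Behemoth": (16, 1),   # 16:1, same as Scout
    }
    for name, (total, active) in models.items():
        print(f"{name}: {total // active}:1")

This prints 16:1 -> 128:1 -> 16:1 going up the size ladder, which is the "less sparse -> sparse -> less sparse" pattern described above.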



