It's not just Qwen; we also recently had GLM-4.7-Flash in the same roughly 30B-A3 range. Seems to me like there's no shortage of competition for good old GPT-OSS 20B (not just Qwen3.5-35B and GLM-4.7-Flash, but also Qwen3(-Coder)-30B or Granite 4 Small).
Yeah, I wrote the blog post to wrap my head around the question: how would someone even print weights onto a chip, and how would you even start thinking in that direction?
I didn't explore the actual manufacturing process.
That's the kind of hardware I'm rooting for, since it would encourage open-weights models and be much more private.
In fact, I was thinking: robots of the future could have slots like this, letting them swap in different models depending on the task they're given. Like a hardware MoE.
Is this accurate? I don't know enough about hardware, but perhaps someone could clarify: how hard would it be to reverse engineer this to "leak" the model weights? Is it even possible?
Some labs (Mistral, Cohere, etc.) sell access to their models without making the weights open. I could see more companies doing this if it turns out to be a viable delivery method, even to end customers, if reverse engineering is deemed impractical. You could have a device that does most of the inference locally and only "calls home" when stumped (think Alexa with local processing for intent detection and cloud processing for the rest, but better).
It's likely possible to extract model weights from the chip's design, but you'd need tooling at the level of an Intel R&D lab, not something any hobbyist could afford.
I doubt anyone would have the skills, wallet, and tools to RE one of these and extract model weights to run them on other hardware. Maybe state actors like the Chinese government or similar could pull that off.
I came across KOReader when I was trying to jailbreak my Kindle. Its UI looked great on the e-ink screen, and it handled almost all ebook formats properly.
Lately I've been using it on Android. The UI, being tuned for e-ink screens, looks a bit unpolished on phones, but that's just nitpicking. It's fully usable and keeps adding support for new platforms.
It really shines on e-ink Android readers such as the tablets Boox makes. I almost exclusively use it on my Boox because the built-in reading app is terrible.
I found Adam Mastroianni's blog through a HN post titled "How to debog Yourself". Unlike other pop-sci articles, this one had actual depth, and I enjoyed reading it.
Since then, I've read and digested most of his posts and comments. He usually writes about human behavior, not so much offering solutions as explaining the reasons behind things.
He's the author I've screenshotted the most in 2024. I'd recommend starting with this post:
Qwen3.5-35B-A3 in particular looks great for cheaper GPUs, since a quantized version of it would need less than 32 GB of RAM.
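To sanity-check the "<32 GB" claim, here's a rough weights-only back-of-envelope (the ~35B parameter count is just read off the model name, and KV cache plus activations add overhead on top of this):

```python
# Weights-only memory estimate for a ~35B-parameter model
# at common quantization widths. KV cache and activations
# are not included, so real usage will be somewhat higher.

def weight_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: ~{weight_gb(35, bits):.1f} GB")
# 16-bit: ~70.0 GB
#  8-bit: ~35.0 GB
#  4-bit: ~17.5 GB
```

So an 8-bit quant is borderline, but a 4-bit quant comfortably fits under 32 GB, leaving headroom for context.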