I guess Microsoft’s investment in Graphcore didn’t pay off. Not sure what they’re planning, but more of that isn’t going to cut it. At the time (late 2019) I was arguing for either a GPU approach or a specialized architecture targeting transformers.
There was a split at MS where the ‘Next Gen’ Bayesian work was being done in the US and the frequentist work was being shipped off to China. Chris Bishop being promoted to head of MSR Cambridge didn’t help.
Microsoft really is an institutionally stupid organization, so I have no idea which direction they’ll actually go. My best guess is that it’s all talk.
Microsoft lacks the credibility and track record for this to be anything but talk. Hardware doesn’t simply go from zero to gigawatts of infrastructure on talk. Even Apple is better positioned for such a thing.
They do have such a dedicated chip: the in-house Maia 100, which was designed in the transformer era, and it’s what’s being discussed in the interview.
I missed that; it’s been a few years since I paid attention to MS hardware, and it’s very possible my thoughts are out of date. I left MS with a rather bad taste in my mouth. I’m checking out the info on that chip now, and what I’m seeing is a little light on details. Just TPUs and fast interconnects.
What I’ve found: Maia 200, the next version, is having issues due to brain drain, and Maia 300 is supposed to be an entirely new architecture, so its status is rather uncertain.
I think a big reason MS invested so heavily in OpenAI was to have a marquee customer push cultural change through the org, which was a necessary move. If that eventually yields a useful chip I’ll be impressed; I hope it does.
I used to work for Graphcore. Microsoft gave up on Graphcore fairly early on actually and I think it was a wise decision. There were a number of issues with Graphcore's GC1 and GC2 chips:
* A bet on storing the entire model (and code) in 900MB of SRAM. A hell of a lot of SRAM, but it only really works for small models, and the world wants enormous models (rough math in the sketch after this list).
* Blew its weirdness budget by a lot. Everything is quite different, so porting software to it is a significant effort. You often did get a decent speedup (2-10x), but I doubt many people thought that was worth the software pain.
* The price was POA (price on application), so normal people couldn't buy one. (And it would have been too expensive for individuals anyway.) So there was little grassroots community support and research. Nvidia gets that because it's very easy to buy a consumer 4090 or whatever and run AI on it.
* Nvidia were killing it with Grace Hopper.
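A rough back-of-envelope sketch of why the all-in-SRAM bet is so limiting. The 900MB figure is from the comment above; the model sizes and datatypes are illustrative assumptions, not Graphcore or Microsoft specs:

```python
# Back-of-envelope: how many parameters fit entirely in on-chip SRAM?
# 900 MB comes from the comment above; dtypes and model sizes below are
# illustrative assumptions only.

SRAM_BYTES = 900 * 1024**2  # ~900 MB of on-chip SRAM

def params_that_fit(bytes_per_param: int) -> float:
    """Billions of parameters that fit if the weights live entirely in SRAM."""
    return SRAM_BYTES / bytes_per_param / 1e9

for name, bytes_per_param in [("fp32", 4), ("fp16/bf16", 2), ("int8", 1)]:
    print(f"{name}: ~{params_that_fit(bytes_per_param):.2f}B params fit in 900 MB")

# For comparison, weights alone for some common model sizes at fp16:
for params_b in [0.35, 7, 70, 175]:
    gb = params_b * 1e9 * 2 / 1024**3
    print(f"{params_b}B params @ fp16 ~= {gb:.0f} GB of weights")
```

Even at int8 you're capped around a billion parameters on-chip, while the models people actually want need tens to hundreds of GB of weights, which is exactly the gap the GC3 DRAM approach is meant to close.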
GC3 was way more ambitious and supports a mountain of DRAM with crazy memory bandwidth so if they ever finish it maybe they'll make a comeback.