"Swift for TensorFlow rethinks machine learning development ... I imagined, advocated for, coded the initial prototype and many of the subsystems after that; recruited, hired and trained an exceptional engineering team; we drove it to an open source launch and are continuing to build out and iterate on infrastructure."
Can we get some examples where the dominant cofounder was the reason why the business did not succeed? These success stories reference generally smart and savvy executives.
Nvidia was probably under contract for this before mining blew up to where it is today. Provisioning for these machines starts 5-10 years before they ever come online.
What’s not inside the Summit Supercomputer speaks volumes: Intel.
Knights Landing/Hill/Mill is simply not compelling; Omni-Path was created as an infiniband knockoff that doesn’t beat Mellanox. The Cray Gemini/Aries interconnects can be found all over the top of the list (and the Intel acquisition of those interconnects happened in 2012), but you don’t see Omni-Path replacing anything.
Meanwhile, Nvidia comes out with NVLink and begins to build small clusters of GPUs connected by larger networks containing IBM and Mellanox. A vacuum was created, and IBM and Mellanox moved (back) in.
The DOE (and DOD, with which I’m less familiar) tends to spread out these purchases over multiple vendors to keep multiple US-based providers able to build and support these machines (and I imagine to keep costs competitive).
The last few acquisitions by ORNL and LANL have been Crays while ANL and LLNL were buying IBM Blue Genes. With this generation, it looks like things have switched. As another poster mentioned, it certainly seems like ANL’s next one will be Cray/Intel. It was going to be based on Knights Hill, but Intel cancelling that sort of put the architecture up for grabs.
I would love to see Intel tweaking the Phi line with asymmetric cores like some ARMs do. Having a couple brawny proper Xeon cores and a bunch of smaller 4-thread cores, all coupled with local HBM (and maybe some dedicated HBM for each core) would make it a very versatile part that, with some tuning in number of cores, cache sizes, HBM size, etc, could cover from low-end server all the way to supercomputing.
I don't think there is much doubt that core counts will increase in all segments, and the asymmetric core tech currently used in ARM is pretty cool.
It makes sense for multi-node machines, much like we do some tasks mostly on CPUs and others on GPUs within a single node. A processor like this makes much more sense on desktops and general-purpose servers, as most of the time my Xeon cores are doing things an Atom would be perfectly capable of doing at a fraction of the power consumed. This translates into more heat and more cooling. If you consider a Xeon Phi uses 300 Watts for 256 threads, this translates roughly to 1.2 W per thread, which is well within what I would expect from a very puny Atom core. Being able to power down most of my computer while, say, I write this, would be a very nice feature.
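The per-thread figure is just a division; a quick sketch in Python, using only the 300 W and 256-thread numbers quoted above (the function name is made up for illustration):

```python
def watts_per_thread(package_watts: float, hardware_threads: int) -> float:
    """Average power budget per hardware thread, in watts."""
    if hardware_threads <= 0:
        raise ValueError("hardware_threads must be positive")
    return package_watts / hardware_threads


# Numbers from the comment above: 300 W package, 256 hardware threads.
per_thread = watts_per_thread(300, 256)
print(f"{per_thread:.2f} W per thread")  # 1.17 W, i.e. roughly 1.2 W
```

This is an average over the whole package, so it says nothing about how much an individual thread draws when the rest of the chip is idle or powered down.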
Intel knows better than anyone that they need to sell a roadmap, not a chip. Who’s going to put in a large advance order on a bunch of future silicon that may get cancelled as well?
Knockoff? Infinipath used Infiniband at first because we could only build one chip, not a host adapter and a switch. Turns out that Infiniband is flexible enough that we kept on doing that for several generations.
Omni-Path isn't supported on AMD chips, let alone Power 9. I don't agree with the decisions, but there you go. We switched from 3 generations of Pathscale IB adapters to Mellanox because of it.
Sometimes I wonder about the longer-term future of InfiniBand. Originally it was a multi-vendor standard, but nowadays it's a one-vendor show. It's squeezed from above by high-end networks supporting things like adaptive non-minimal routing, from below by Ethernet (RoCE etc.), and from the side by Intel with their deep pockets.
And apparently Mellanox is being pressured by some activist investors to reduce R&D expenses and pay more dividends instead.
Yes, but rumor has it that Intel will still be the supplier for Argonne’s next computer, which, if everything goes to plan, will be the first exascale machine.
Even the previous generation Phis pack about 3.x peak TFLOPS per socket. Tuning HBM size and AVX-512 pipelines can certainly help increase that, even if their 10nm is proving harder than expected. It's a matter of time before they can go full 10nm.
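For reference, peak double-precision throughput is usually estimated as cores × clock × FLOPs per cycle per core. A rough sketch in Python; the 68 cores, 1.4 GHz and two AVX-512 FMA units per core are my assumptions for a Knights Landing part along the lines of the 7250, not numbers given in this thread:

```python
def peak_dp_gflops(cores: int, clock_ghz: float, vector_units: int = 2,
                   dp_lanes: int = 8, flops_per_fma: int = 2) -> float:
    """Theoretical peak double-precision GFLOPS for one socket.

    Assumed defaults: 2 AVX-512 units per core, 8 doubles per 512-bit
    vector, and 2 FLOPs (multiply + add) per fused multiply-add.
    """
    return cores * clock_ghz * vector_units * dp_lanes * flops_per_fma


# Assumed example part: 68 cores at a 1.4 GHz base clock.
peak = peak_dp_gflops(cores=68, clock_ghz=1.4)
print(f"{peak / 1000:.2f} TFLOPS peak")  # about 3.05
```

That is a theoretical ceiling per socket, in the teraflops range; sustained numbers on real codes will be lower, and heavy AVX-512 use typically runs below base clock.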
You certainly see OPA replacing IB -- I personally know several examples, though some HPC vendors now seem to be making it difficult to buy. It has a fairly healthy showing at top500, considering it hasn't been widely available for long. I know the Mellanox propaganda, and have some experience of it, along with promises of help to make it work that evaporated.
I can't remember where to look for OPA's features to help MPI implementation, but someone else might be able to comment.