Yes, the base M4 has 120 GB/s, the Pro 273 GB/s and the Max 546 GB/s... If the Pro and Max scale the way the base chip did (120 to 153 GB/s), that means the M5 Pro is potentially around 348 GB/s and the M5 Max is almost at 700 GB/s. For comparison, a 4090 has around 1,000 GB/s. So pretty incredible!
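A rough sketch of the arithmetic behind those estimates, in Python. It assumes the base M5 sits at 153 GB/s and that the Pro and Max tiers scale by the same ratio as the base chip; that second part is a guess, not a published spec, and the names `M4_BANDWIDTH`, `M5_BASE` and `project_m5` are made up for illustration.

```python
# M4 memory bandwidths in GB/s, as quoted in the comment above.
M4_BANDWIDTH = {"base": 120, "pro": 273, "max": 546}

# Assumption: base M5 bandwidth in GB/s.
M5_BASE = 153


def project_m5(m4=M4_BANDWIDTH, m5_base=M5_BASE):
    """Scale every M4 tier by the base-chip ratio (a guess, not a spec)."""
    ratio = m5_base / m4["base"]
    return {tier: round(bw * ratio) for tier, bw in m4.items()}


if __name__ == "__main__":
    for tier, bw in project_m5().items():
        print(f"M5 {tier}: ~{bw} GB/s")
```

With these inputs the Pro lands at about 348 GB/s and the Max at about 696 GB/s, which is where the "almost 700" comes from.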
Also, I think even an M3 Ultra is more cost-effective at running LLMs than a 4090 or 5090, mostly because it is more energy efficient and less fragile than a gamer PC build.
It can run larger models, albeit quite slowly, but it lacks the matmul acceleration (included in the M5) that is very useful for context and prompt processing performance at inference time. I will probably burn my budget on an M5 Max with 256 GB (maybe even 512 GB) of memory. The price will be upsetting, but I guess that is life!
Yes! I think smaller models on the M3 Ultra are interesting enough, but now with matmul/tensor acceleration on the M5 Ultra or Max, plus decent unified memory, it will be a gamechanger.
I can easily imagine companies running Mac Studios in prod. Apple should release another Xserve.
DDR5-9600 is 153 GB/s from a single channel, and the Max has 4 channels… these are all theoretical values of course. In the real world none of these, not even the graphics card, will get anywhere near those numbers… so I'm not sure what you’re saying.
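For reference, the theoretical peak is just transfer rate times bus width. A quick Python sketch, assuming a "channel" here means a 128-bit bus (my assumption, the comment does not say) and using the hypothetical helper name `peak_bandwidth_gbs`:

```python
def peak_bandwidth_gbs(transfer_rate_mts, bus_width_bits):
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes).

    transfer_rate_mts: memory transfer rate in MT/s (e.g. 9600)
    bus_width_bits:    bus width in bits (128 is an assumption here)
    """
    bytes_per_transfer = bus_width_bits / 8
    # MT/s * bytes per transfer = MB/s; divide by 1000 for GB/s.
    return transfer_rate_mts * bytes_per_transfer / 1000


if __name__ == "__main__":
    single = peak_bandwidth_gbs(9600, 128)
    print(f"one 128-bit bus at 9600 MT/s: {single:.1f} GB/s")
    print(f"four of them: {4 * single:.1f} GB/s")
```

That gives 153.6 GB/s for one 128-bit bus and 614.4 GB/s for four, both paper ceilings rather than anything you would measure.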