
Novice question: If they built something other than classic DRAM modules with the wafers, maybe they could achieve faster bus speeds? How does Apple do it?




> How does Apple do it?

Apple uses off-the-shelf LPDDR modules. There is nothing whatsoever special about them.

Apple gets high bandwidth out of these modules not with high bus speeds, but with a very, very wide bus. This is expensive on the SoC side (requires a large die, which is why others don't necessarily do this), but allows for commodity memory modules.
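To put rough numbers on that, here is a minimal sketch of the arithmetic. The configurations are my assumptions for illustration (a 512-bit LPDDR5X-8533 interface for the top M4 Max, and a 128-bit dual-channel DDR5-6000 desktop for comparison), not figures from a datasheet. Peak bandwidth is just bus width times transfer rate:

```python
def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mts: float) -> float:
    """Theoretical peak bandwidth in GB/s (decimal) for a memory interface."""
    bytes_per_transfer = bus_width_bits / 8
    # bytes/transfer * mega-transfers/s gives MB/s; divide by 1000 for GB/s.
    return bytes_per_transfer * transfer_rate_mts / 1000


# Assumed configurations, for illustration only.
wide_lpddr = peak_bandwidth_gbs(512, 8533)    # very wide bus, commodity LPDDR5X
desktop_ddr5 = peak_bandwidth_gbs(128, 6000)  # typical dual-channel desktop

print(f"512-bit LPDDR5X-8533: {wide_lpddr:.0f} GB/s")
print(f"128-bit DDR5-6000:    {desktop_ddr5:.0f} GB/s")
```

The per-pin speeds are in the same ballpark; the roughly 5x gap in this sketch comes almost entirely from the 4x wider bus.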


That's what HBM is, actually. The memory dies are directly next to the GPU die, on the same substrate. The main difference between Apple SoCs and GPUs is that the former use regular LPDDR while GPUs use HBM.

One of the key points of HBM is that dies are stacked up with many, MANY more signals and channels. That's how NVIDIA has a memory bandwidth nearly an order of magnitude higher than the M4: roughly 550GB/s for the M4 Max, 4.8TB/s for the H200. And yes, that's bytes per second, not bits per second.


> while GPUs use HBM.

Some GPUs use HBM. Most use GDDR. AMD and Nvidia still extract huge bandwidth from GDDR via high bus speeds + wide buses (like the 1.79 TB/s on the 5090).
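A quick way to see the different trade-offs is to work backwards from bandwidth to the per-pin data rate each design needs. This is only a sketch: the bus widths are my assumptions (512-bit for the 5090 and the M4 Max, 6144-bit for the H200, i.e. six 1024-bit HBM stacks), and the bandwidths are rounded published figures.

```python
def per_pin_gbps(bandwidth_gbs: float, bus_width_bits: int) -> float:
    """Data rate each pin must sustain (Gbit/s) to reach a given bandwidth."""
    return bandwidth_gbs * 8 / bus_width_bits


# (bandwidth in GB/s, assumed bus width in bits); illustrative values only.
configs = {
    "RTX 5090 (GDDR7)": (1792, 512),
    "H200 (HBM3e)": (4800, 6144),
    "M4 Max (LPDDR5X)": (546, 512),
}

for name, (bandwidth, width) in configs.items():
    rate = per_pin_gbps(bandwidth, width)
    print(f"{name}: {width}-bit bus, {rate:.2f} Gbit/s per pin")
```

Under those assumptions GDDR gets there with very fast pins, HBM with a huge number of comparatively slow ones, and LPDDR sits in between on pin speed with the same width as the GDDR card.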


Indeed! I implied "AI GPUs" since that’s where HBM is commonly used (despite AMD pioneering it on some consumer cards). And yeah, thousand-bit wide busses get close in performance.

Soldered memory has shorter traces than a DIMM, which allows higher MT/s; CUDIMM/CAMM2 are meant to address this. The other part is the number of memory channels on the board. I'm not sure why, but most consumer DDR5 boards have only 2 memory channels, and you need to go to Threadripper to get 4 or 8. It's unclear to me whether this will still be an issue with future platforms.

You're not paying enough attention to the performance and cost impact of connectors and sockets. CAMM2/LPCAMM and CUDIMM have yet to be demonstrated operating at speeds that match the fastest soldered LPDDR, let alone GDDR; there's still a clear advantage for soldering memory.

CPU sockets with more than two memory channels are also far more expensive; the higher pin count usually increases the number of layers the motherboard needs, and the larger size of the socket requires more metal for stiffening (and EPYC CPUs still have issues with imperfect mounting leading to some IO lanes not working).

Using BGA soldering for both the processor and the memory sidesteps a bunch of engineering challenges.


> Using BGA soldering for both the processor and the memory sidesteps a bunch of engineering challenges.

By trading them for longevity challenges.

Even though two decades have passed since the arrival of the "usual suspect", lead-free solder, GPU or VRAM chips needing a reball are still a common occurrence, judging from a cursory look at the YouTube channels of professional electronics repair technicians.



