> What the VLIW of Itanium needed and never really got was proper compiler support.
This is kinda under-selling it. The fundamental problem with statically-scheduled VLIW machines like Itanium is that they put all of the complexity in the compiler. Unfortunately, it turns out it's just really hard to make a good static scheduler!
In contrast, dynamically-scheduled out-of-order superscalar machines work great but put all the complexity in silicon. The transistor overhead was expensive back in the day, so statically-scheduled VLIWs seemed like a good idea.
What happened was that static scheduling stayed really hard while the transistor overhead for dynamic scheduling became irrelevantly cheap. "Throw more hardware at it" won handily over "Make better software".
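To make the tradeoff concrete, here's a minimal C sketch (my own illustration, not from the thread) of why static scheduling stayed hard: the latency of the pointer-chasing load below depends on cache state the compiler can never see, while an out-of-order core simply resolves it at run time.

```c
#include <stddef.h>

struct node { struct node *next; long value; };

/* Pointer-chasing sum: each iteration starts with a load whose latency
 * is a few cycles on an L1 hit and hundreds of cycles on a DRAM miss. */
long sum_list(const struct node *n) {
    long sum = 0;
    while (n != NULL) {
        /* A VLIW compiler must decide at build time how many independent
         * instructions to bundle after this load to hide its latency,
         * without knowing which latency it is actually hiding.
         * An out-of-order core just keeps issuing younger, independent
         * work while the load is outstanding, whatever the latency
         * turns out to be. */
        sum += n->value;
        n = n->next;
    }
    return sum;
}
```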
No, VLIW is even worse than this. Describing it as a compiler problem undersells the issue. VLIW is not tractable for a multitasking / multi-tenant system due to cache residency issues. The compiler cannot efficiently schedule instructions without knowing what is in cache, but it can't know what's going to be in cache if it doesn't know what's occupying the adjacent task time slices. Add virtualization and it's a disaster.
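A minimal sketch of that point, using two hypothetical co-scheduled tasks of my own invention (task_a and task_b): even a perfect profile of task_a's cache behaviour is invalidated by whatever ran in the adjacent time slice.

```c
#include <string.h>

#define TABLE_WORDS  (32 * 1024)        /* ~256 KiB lookup table   */
#define STREAM_BYTES (64 * 1024 * 1024) /* 64 MiB streaming buffer */

static long table[TABLE_WORDS];
static char stream[STREAM_BYTES];

/* Task A: a static scheduler would love to assume `table` stays
 * cache-resident and plan around short, fixed load latencies. */
long task_a(const int *indices, int n) {
    long sum = 0;
    for (int i = 0; i < n; i++)
        sum += table[indices[i]];
    return sum;
}

/* Task B: a co-tenant in the next time slice that sweeps a large
 * buffer and evicts Task A's table. Nothing visible in Task A's
 * source (or binary) tells the compiler this will happen. */
void task_b(void) {
    memset(stream, 0, sizeof stream);
}
```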
If it's pure TFLOPs you're after, you do want a more or less statically scheduled GPU. But for CPU workloads, even the low-power efficiency cores in phones these days are out of order, and the size of reorder buffers in high-performance CPU cores keeps growing. If you try to run a CPU workload on GPU-like hardware, you'll just get pitifully low utilization.
So it's clearly true that the transistor overhead of dynamic scheduling is cheap compared to the (as-yet unsurmounted) cost of doing static scheduling for software that doesn't lend itself to that approach. But it's probably also true that dynamic scheduling is expensive compared to ALUs, or else we'd see more GPU-like architectures using dynamic scheduling to broaden the range of workloads they can run with competitive performance. Instead, it appears the most successful GPU company largely just keeps throwing ALUs at the problem.
I think OP meant "transistor count overhead" and that's true. There are bazillions of transistors available now. It does take a lot of power, and returns are diminishing, but there are still returns, even more so than just increasing core count. Overall what matters is performance per watt, and that's still going up.