The linear node layout is not the point at all.

It's serving two purposes here:

1) Providing us with a correct "guess" of what the next node is.
2) Ensuring that in all cases we're running from the L1 cache.

In real-world code, you'd be correct -- getting things to run out of L1/L2 is the most important attribute. This is specifically about a micro-optimization that lets you beat the obvious code even when it is already running completely from cache!
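
To make the "guess" concrete, here's a rough sketch in C of the shape of the idea. The `Node` struct and the function names are my own, made up for illustration, and this is only the logical structure: an optimizing compiler may well fold the guess-and-check back into a plain `node = node->next`, so a real version would likely need some kind of barrier to keep the guess alive.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node type, assumed for this sketch. */
typedef struct Node {
    uint64_t value;
    struct Node *next;
} Node;

/* The obvious loop: every iteration depends on the load of node->next,
 * so even with everything in L1 the loop is bound by load latency. */
static uint64_t sum_plain(const Node *node) {
    uint64_t sum = 0;
    while (node) {
        sum += node->value;
        node = node->next;
    }
    return sum;
}

/* The guess-and-check shape: assume the next node sits right after this
 * one in memory, and only fall back to the loaded pointer when the guess
 * is wrong. With a linear layout the guess is always right until the
 * final NULL, so the branch predicts well and the CPU can run ahead
 * without waiting for the load.
 * Assumption: the compiler does not simplify this back into sum_plain. */
static uint64_t sum_speculative(const Node *node) {
    uint64_t sum = 0;
    while (node) {
        sum += node->value;
        const Node *next = node->next;   /* the real answer */
        const Node *guess = node + 1;    /* the cheap guess */
        if (next != guess) {
            guess = next;                /* guess was wrong: use the real answer */
        }
        node = guess;
    }
    return sum;
}
```

Both functions return the same sum for any list, linear or not. The only thing the linear layout changes is how often the guess is right.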