It's serving two purposes here:
1) Providing us with a correct "guess" of what the next node is.
2) Ensuring that in all cases we're running from the L1 cache.
In real-world code, you'd be correct -- getting things to run out of L1/L2 is the most important thing. This is specifically about a micro-optimization that lets you beat the obvious code even when it's running entirely from cache!
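To make the "guess" concrete, here's a rough sketch. I'm assuming the trick under discussion is value speculation over a linked list whose nodes sit next to each other in one allocation, so "the node right after this one in memory" is almost always the right guess. The names (`Node`, `build_list`, `sum_plain`, `sum_speculative`) are made up for illustration and aren't from the original code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct Node {
    uint64_t value;
    struct Node *next;
} Node;

/* Hypothetical helper: allocate n nodes in one contiguous block and link
 * them in order, so node[i].next == &node[i + 1]. Values are 1..n.
 * Returns NULL if the allocation fails. */
Node *build_list(size_t n) {
    assert(n > 0);
    Node *nodes = malloc(n * sizeof *nodes);
    if (nodes == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < n; i++) {
        nodes[i].value = (uint64_t)(i + 1);
        nodes[i].next = (i + 1 < n) ? &nodes[i + 1] : NULL;
    }
    return nodes;
}

/* The obvious loop: every iteration has to wait for the load of
 * node->next before it can start on the following node. */
uint64_t sum_plain(const Node *node) {
    uint64_t sum = 0;
    while (node != NULL) {
        sum += node->value;
        node = node->next;
    }
    return sum;
}

/* Speculative loop: guess that the next node is the adjacent one, and
 * only fall back to the loaded pointer when the guess is wrong. The idea
 * is that a well-predicted branch lets the CPU run ahead on the guess
 * instead of stalling on the load. Caveat (assumption, not measured
 * here): an optimizing compiler may see that both paths end up with
 * node->next and fold this back into the plain loop, so getting the
 * intended machine code can take extra work. */
uint64_t sum_speculative(const Node *node) {
    uint64_t sum = 0;
    while (node != NULL) {
        sum += node->value;
        const Node *guess = node + 1;     /* adjacent node in memory */
        const Node *actual = node->next;  /* the real next pointer */
        if (guess != actual) {
            guess = actual;               /* misprediction: correct it */
        }
        node = guess;
    }
    return sum;
}
```

Both functions return the same sum for any list; the difference, if the compiler preserves the branch, is only in how long the CPU waits on each `next` load.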