Isn't this what Dalvik does? (And by extension, ART)
IIRC, a register-based virtual machine would alleviate the cache-miss behavior the GP talks about. (Because larger, more complex instructions = fewer hits to cache)
It's been a while since I've mucked around with this type of stuff, though.
Nope, you're thinking one level too low in the stack.
The issue is that each object is a reference, and because it's a reference its position in memory is by nature ambiguous (compacting GCs make this even worse by moving things around).
Until you can guarantee memory location, you don't know that the N+1 object you're going to access is in the same cache line (or prefetched), and any sort of cache optimization is shot to hell. Only by knowing your standard execution flow and data set size can you write code that's as efficient about cache misses as possible.
HotSpot tends to focus only on tight inner loops, whereas the stuff I'm talking about involves looking at the larger program (and the structures you choose to put data into). Think Structure of Arrays (SoA) rather than Array of Structures (AoS). Possible to do in Java, but very painful.
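To make the AoS vs. SoA point concrete, here's a rough sketch in Java. All the names (`ParticleLayouts`, `Particle`, `Particles`, `sumXAos`, `sumXSoa`) are made up for illustration, and it isn't a benchmark. It only shows the shape of the two layouts: in the AoS version every element is a reference to a separately allocated object, while in the SoA version each field lives in its own primitive array, which in practice is laid out contiguously.

```java
// Hypothetical example: names and fields are invented for illustration.
final class ParticleLayouts {

    // AoS: the array holds references; each Particle is its own heap object,
    // so neighbouring elements are not guaranteed to be neighbours in memory.
    static final class Particle {
        double x, y, z;

        Particle(double x, double y, double z) {
            this.x = x;
            this.y = y;
            this.z = z;
        }
    }

    static double sumXAos(Particle[] particles) {
        double sum = 0.0;
        for (Particle p : particles) {
            sum += p.x; // follows one reference per element
        }
        return sum;
    }

    // SoA: one primitive array per field, so a loop over a single field
    // walks one array sequentially.
    static final class Particles {
        final double[] x;
        final double[] y;
        final double[] z;

        Particles(int n) {
            x = new double[n];
            y = new double[n];
            z = new double[n];
        }
    }

    static double sumXSoa(Particles particles) {
        double sum = 0.0;
        for (double v : particles.x) {
            sum += v; // no per-element reference to chase
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 4;
        Particle[] aos = new Particle[n];
        Particles soa = new Particles(n);
        for (int i = 0; i < n; i++) {
            aos[i] = new Particle(i, 0.0, 0.0);
            soa.x[i] = i;
        }
        System.out.println(sumXAos(aos) + " " + sumXSoa(soa));
    }
}
```

The painful part is visible even here: with SoA there's no `Particle` object to pass around any more, so every piece of code that touches a particle has to carry an index plus the arrays.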
Not only that, but IIRC at the beginning each ref was a pointer into a table that would store the actual address of the object. So every ref was a double pointer dereference.
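A toy model of that kind of handle indirection, assuming the recollection above is right. This is not actual JVM code, and `HandleHeap` and its methods are invented names. It just shows why every read costs two loads, and why a moving collector likes the scheme: relocating an object only means updating one table entry.

```java
// Hypothetical model of handle-based references; not real JVM internals.
final class HandleHeap {
    private final long[] slots = new long[8]; // stand-in for object storage
    private final int[] table = new int[8];   // handle -> current slot index
    private int nextHandle = 0;

    // Store a value in the given slot and hand back a handle to it.
    int allocate(int slot, long value) {
        slots[slot] = value;
        table[nextHandle] = slot;
        return nextHandle++;
    }

    // Every access goes through the table first.
    long read(int handle) {
        int slot = table[handle]; // first load: the handle table
        return slots[slot];       // second load: the object itself
    }

    // What a compacting collector would do: copy the data, then fix up
    // the single table entry. Existing handles stay valid.
    void move(int handle, int newSlot) {
        slots[newSlot] = slots[table[handle]];
        table[handle] = newSlot;
    }
}
```

A handle returned by `allocate` keeps working after `move`, which is the upside; the downside is the extra load in `read`, which can be a second cache miss.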