Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

> You should first check if the compiler doesn't do it for you.

No, compiler cannot rewrite your bytecode into a threaded code.

> Otherwise you're just making your code harder to maintain and less portable for zero gain.

No. Threaded code is almost always faster, even if run only once.

> a proper jump table

I'm not talking about jump tables. I'm talking about replacing the bytecode indices with jump addresses once (yes, with a precalculated table) and then doing an indirect jump for each translated opcode from the input stream.

An example: https://github.com/ocaml/ocaml/blob/trunk/byterun/interp.c

(see the Next macro definition)



I don't follow you, what do computed gotos have to do with threaded code?

I'm not sure I understand the code you've linked but "goto (void )(jumptbl_base + *pc++)" looks very much like my big "match" statement except that I match only a subset of the 32 bit instruction.

I don't see why the compiler wouldn't be able to rewrite a bit switch/match statement as a jump table without having to spell it out directly.

> I'm not talking about jump tables. I'm talking about replacing the bytecode indices with jump addresses once (yes, with a precalculated table) and then doing an indirect jump for each translated opcode from the input stream.

Now you've lost me completely.


> I don't follow you, what do computed gotos have to do with threaded code?

1. Build a jump table

2. Read a bytecode

3. Replace all the instruction codes in the bytecode stream with values from the jump table (i.e., build a threaded code from your bytecode).

4. In a loop, take the next value at PC and jump to it - it's already an address. No table lookups (just once, when you load your bytecode). So it's much faster than any switch can be.

EDIT: probably you did not notice that for most of the targets jumptbl_base = 0, and an indirection is removed.


Oh I think I get it, thanks (although I still don't understand what "threaded" means in this context).

The only downside I can see is that you'd significantly increase the size of the code. On a 64bit architecture you trade each byte for a 64bit address, effectively multiplying by eight the footprint in the data cache. A 256 entry LUT on the other hand will fit snugly in cache and the lookup shouldn't be very costly.

Also if I understood you correctly what you're proposing doesn't have much to do with the "computed gotos" extension.


> I still don't understand what "threaded" means in this context

https://en.wikipedia.org/wiki/Threaded_code#Indirect_threadi...

> The only downside I can see is that you'd significantly increase the size of the code

This is why adding the jumptbl_base on 64-bit platforms. Pointers are still 32-bit, just with an added offset (and a tiny overhead).

> doesn't have much to do with the "computed gotos" extension

You cannot implement indirect threaded code without a computed goto. You can implement a direct threaded code, of course (generating jump or call instructions directly), but this is a totally different thing.


Ah, thank you for the link, that was the missing piece of the puzzle! :)


Even without inlining? Since the compiler cannot know what instruction will be called, it cannot inline it in the loop.


Why do you want any inlining? It's a threaded code. You know an address of each instruction handler, so you can replace an opcode with this address and eliminate a switch and any table lookups altogether.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: