
Isn't this crippling a compile-time thing?

Is there something in the binary that executes the best-performing instructions (as opposed to executing just the instructions compiled in) when it's run on a specific CPU? If so, how exactly does it work?



It's actually a runtime switch. A compiled x86 binary that uses extra-wide number-crunching instructions (SSE, etc.) but must also run on older processors that lack those instructions will contain two or more code paths. The code paths all perform equivalent computations, but using different instructions.

For example, if you are adding 4 pairs of 64-bit numbers, and there's a special add-4-pairs-of-64-bit-numbers instruction, but it's specified as part of SSE4 (I made that up, but it's the kind of thing that you would find), then you can ask the CPU (on x86, with the CPUID instruction) whether it supports SSE4. If it does, then you say, great, use this code path that requires SSE4, and we'll do the whole operation in three instructions: load, add, store. Or something.
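On x86, "asking the CPU" means executing the CPUID instruction and testing feature bits in the registers it returns. A minimal sketch, assuming GCC or Clang (for the `__get_cpuid` helper in their `<cpuid.h>`); the function names are illustrative, and the bit positions are the documented ones for leaf 1 (ECX bit 19 for SSE4.1, bit 20 for SSE4.2):

```c
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  /* GCC/Clang wrapper around the CPUID instruction */
#endif

/* Runs CPUID leaf 1 and returns ECX, which holds many feature bits.
   Returns 0 (no features) if the leaf is unavailable or this isn't x86. */
static unsigned int cpuid_leaf1_ecx(void) {
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return ecx;
#endif
    return 0;
}

/* SSE4.1 is ECX bit 19, SSE4.2 is ECX bit 20 (bit_SSE4_1 / bit_SSE4_2
   in <cpuid.h>); written out here so a non-x86 build also compiles. */
static bool cpu_has_sse41(void) { return (cpuid_leaf1_ecx() >> 19) & 1u; }
static bool cpu_has_sse42(void) { return (cpuid_leaf1_ecx() >> 20) & 1u; }
```

A program would typically call these once at startup and remember the answer, since CPUID is a serializing instruction and comparatively slow, and the answer doesn't change while the program runs.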

However, if the CPU says that it doesn't support SSE4, then you'd better have a backup plan. It doesn't have to run as fast, but it should compute the same answer. If it's compiled C code (as opposed to hand-written assembler) and the compiler has been told to generate multiple code paths, it will have you covered: instead of a single SSE4 instruction, maybe it will use 4 regular 64-bit x86 add instructions.

(And if you've written it in assembler, then you probably provided the compiler with a backup C implementation to use if SSE4 isn't supported.)
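The add-4-pairs example above was made up for SSE4, but AVX2 does have such an instruction: `vpaddq` on 256-bit registers, exposed in C as the intrinsic `_mm256_add_epi64`. Here is a sketch of the two equivalent code paths plus the runtime switch, assuming GCC or Clang on x86 (it relies on their `__attribute__((target("avx2")))` and `__builtin_cpu_supports`); on other architectures only the fallback is compiled. The names `add4`, `add4_scalar` and `add4_avx2` are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define HAVE_X86 1
#endif

/* Fallback path: four ordinary 64-bit adds. Runs on any CPU.
   Unsigned types so overflow wraps, matching what vpaddq does. */
static void add4_scalar(const uint64_t *a, const uint64_t *b, uint64_t *out) {
    for (int i = 0; i < 4; i++)
        out[i] = a[i] + b[i];
}

#ifdef HAVE_X86
/* Wide path: one AVX2 vpaddq adds all four 64-bit lanes at once.
   target("avx2") lets just this function use AVX2, even though the rest
   of the file is compiled for a baseline x86 CPU. Only call it after
   checking that the CPU supports AVX2. */
__attribute__((target("avx2")))
static void add4_avx2(const uint64_t *a, const uint64_t *b, uint64_t *out) {
    __m256i va = _mm256_loadu_si256((const __m256i *)a);   /* load  */
    __m256i vb = _mm256_loadu_si256((const __m256i *)b);
    __m256i vs = _mm256_add_epi64(va, vb);                  /* add   */
    _mm256_storeu_si256((__m256i *)out, vs);                /* store */
}
#endif

typedef void (*add4_fn)(const uint64_t *, const uint64_t *, uint64_t *);

/* The runtime switch: ask the CPU once, then pick a code path. */
static add4_fn pick_add4(void) {
#ifdef HAVE_X86
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return add4_avx2;
#endif
    return add4_scalar;
}

/* Public entry point. The choice is cached on first use
   (not thread-safe as written; fine for a single-threaded sketch). */
void add4(const uint64_t *a, const uint64_t *b, uint64_t *out) {
    static add4_fn impl = NULL;
    if (impl == NULL)
        impl = pick_add4();
    impl(a, b, out);
}
```

Both paths produce identical results, so callers never need to know which one ran. Caching a function pointer is only one way to do this; GCC can also have the dynamic loader make the choice via its `ifunc` or `target_clones` attributes.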

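Intel's compiler is being unfair to AMD CPUs because, even if they support the instructions that you want, it won't use them: its dispatcher checks whether the CPU's vendor string is "GenuineIntel" instead of relying only on the feature bits. On an AMD CPU it will unnecessarily fall back to the plain old non-SSE x86 instructions.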
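A simplified model of the difference between dispatching on feature bits and dispatching on the vendor. CPUID leaf 0 returns the 12-character vendor ID (for example "GenuineIntel" or "AuthenticAMD") in EBX, EDX and ECX. The sketch assumes GCC or Clang for `<cpuid.h>`, and the two decision functions are hypothetical illustrations, not Intel's actual dispatcher code:

```c
#include <stdbool.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Reads the 12-character vendor ID from CPUID leaf 0 into buf.
   The bytes come back in EBX, EDX, ECX, in that order; x86 is
   little-endian, so copying each register in turn yields the string.
   buf is left empty on non-x86 targets. */
static void cpu_vendor(char buf[13]) {
    memset(buf, 0, 13);
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid(0, &eax, &ebx, &ecx, &edx)) {
        memcpy(buf + 0, &ebx, 4);
        memcpy(buf + 4, &edx, 4);
        memcpy(buf + 8, &ecx, 4);
    }
#endif
}

/* Hypothetical feature-based dispatcher: trust the feature bit. */
static bool use_wide_path_by_feature(const char *vendor, bool has_feature) {
    (void)vendor;  /* who made the CPU does not matter */
    return has_feature;
}

/* Hypothetical vendor-gated dispatcher: the feature bit only counts
   if the vendor ID is "GenuineIntel"; everyone else gets the slow path. */
static bool use_wide_path_vendor_gated(const char *vendor, bool has_feature) {
    return has_feature && strcmp(vendor, "GenuineIntel") == 0;
}
```

Both functions receive the same information; the only difference is that the second one throws the feature bit away on non-Intel parts, so an AMD CPU that reports the feature still ends up on the fallback path.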


Thanks for your answer! It did not occur to me initially, but it makes a lot of sense!



