The main problem is that NVidia is screwing things up. NVidia only supports OpenCL 1.1, which means that if you want to use C++ / SPIR, you are pretty much locked to AMD / Intel. (Intel CPUs have an OpenCL -> AVX layer, so in the worst case you can always turn OpenCL code into native CPU code.)

NVidia of course owns CUDA, which means they want those "premium features" locked to CUDA-only.
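
If you want to check this on your own machine, here is a rough sketch of the logic. It assumes you already have the device's version string and extension string (the values `clGetDeviceInfo` returns for `CL_DEVICE_VERSION` and `CL_DEVICE_EXTENSIONS`). The function names and the sample strings are made up for illustration; the only things taken from the OpenCL spec are the `OpenCL <major>.<minor> <vendor info>` string layout and the `cl_khr_spir` extension name.

```python
import re


def parse_cl_version(version_string):
    """Return (major, minor) from a CL_DEVICE_VERSION-style string.

    The spec lays the string out as "OpenCL <major>.<minor> <vendor info>".
    """
    match = re.match(r"OpenCL (\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"unexpected version string: {version_string!r}")
    return int(match.group(1)), int(match.group(2))


def may_support_spir(version_string, extensions):
    """Heuristic: SPIR needs OpenCL 1.2+ and the optional cl_khr_spir extension.

    `extensions` is the space-separated CL_DEVICE_EXTENSIONS string.
    """
    new_enough = parse_cl_version(version_string) >= (1, 2)
    return new_enough and "cl_khr_spir" in extensions.split()


# Hypothetical device strings, not copied from real hardware.
nvidia_like = ("OpenCL 1.1 CUDA", "cl_khr_byte_addressable_store")
amd_like = ("OpenCL 1.2 AMD-APP", "cl_khr_fp64 cl_khr_spir")

for version, extensions in (nvidia_like, amd_like):
    print(version, "->", may_support_spir(version, extensions))
```

An OpenCL 1.1 device fails the check no matter what extensions it lists, which is the lock-in described above.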

--------

AMD's laptop chips have some intriguing features for OpenCL as well. Since their APUs have a CPU AND a GPU on the same die, data transfer between CPU and GPU on the AMD APUs (e.g. an A10 laptop chip) is absurdly fast. IIRC the two sides share the same physical memory (I'm not sure how much of the cache hierarchy is actually shared), so the data doesn't have to be copied across a PCIe bus at all.

But there's basically no point optimizing for that architecture, as far as I can tell anyway.