The main problem is that NVidia is screwing things up. NVidia only supports OpenCL 1.1, which means that if you want to use C++ / SPIR, you're pretty much locked to AMD / Intel. (Intel CPUs have an OpenCL -> AVX layer, so in the worst case you can always turn OpenCL code into native CPU code.)
NVidia of course owns CUDA, which means they want those "premium features" locked to CUDA-only.
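In practice this means checking what version a device reports before taking a newer code path. Here's a rough sketch of that check. It relies only on the fact that the string returned for CL_DEVICE_VERSION starts with "OpenCL <major>.<minor> " followed by vendor-specific text. The function names are made up, the sample strings are illustrative, and the (2, 0) minimum is a placeholder: check the spec for whatever feature you actually need.

```python
import re


def parse_opencl_version(version_string):
    """Return (major, minor) from a CL_DEVICE_VERSION-style string.

    The expected layout is "OpenCL <major>.<minor> <vendor-specific info>".
    """
    match = re.match(r"OpenCL (\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"not an OpenCL version string: {version_string!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(version_string, minimum=(2, 0)):
    """True if the reported version is at least `minimum`.

    `minimum` is a placeholder default, not a statement about which
    version any particular feature requires.
    """
    return parse_opencl_version(version_string) >= minimum


if __name__ == "__main__":
    # Illustrative strings only; real devices report their own vendor suffix.
    for reported in ("OpenCL 1.1 CUDA", "OpenCL 2.0 AMD-APP (1800.8)"):
        print(reported, "->", meets_minimum(reported))
```

You'd feed it whatever your OpenCL binding hands back for the device version query, and fall back to the plain OpenCL C path when it returns False.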
--------
AMD's laptop chips have some intriguing OpenCL features as well. Since their APUs have a CPU and a GPU on the same die, data transfer between the two (e.g. on an A10 laptop chip) is absurdly fast. IIRC they share L2 cache, so the data never leaves the chip or even hits main memory.
But there's basically no point optimizing for that architecture, as far as I can tell anyway.