Except it is only worth doing if, once you account for loading the data onto the GPU and getting the results back, it is still faster than running the whole thing on the CPU.
It doesn't help that the GPU beats the CPU in raw compute if a plain SIMD approach on the CPU wins on total execution time.
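To put rough numbers on that, here is a minimal back-of-the-envelope sketch in Python. The function names, the no-overlap assumption (transfers and kernel run strictly one after another), and every figure in the example are hypothetical and only there to illustrate the comparison; real timings have to be measured.

```python
def gpu_total_seconds(bytes_in, bytes_out, gpu_compute_s, link_bytes_per_s):
    """End-to-end GPU time: copy in + kernel + copy out.

    Assumes transfers and compute do not overlap (a simplification).
    """
    transfer_s = (bytes_in + bytes_out) / link_bytes_per_s
    return transfer_s + gpu_compute_s


def worth_offloading(cpu_total_s, bytes_in, bytes_out, gpu_compute_s, link_bytes_per_s):
    """True only if the GPU wins once the transfers are included."""
    total = gpu_total_seconds(bytes_in, bytes_out, gpu_compute_s, link_bytes_per_s)
    return total < cpu_total_s


# Hypothetical numbers: 1 GB each way over a 10 GB/s link,
# a GPU kernel that takes 10 ms, and a CPU SIMD version that takes 100 ms.
verdict = worth_offloading(
    cpu_total_s=0.100,
    bytes_in=1_000_000_000,
    bytes_out=1_000_000_000,
    gpu_compute_s=0.010,
    link_bytes_per_s=10e9,
)
print(verdict)  # False: the kernel is 10x faster, but the copies dominate
```

With these made-up figures the GPU kernel is ten times faster than the SIMD version, yet the two copies alone cost about 200 ms, so the offload loses on total time. It only pays off when the work per byte transferred is high enough, or when the data can stay on the GPU across several steps.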