As someone who has been writing hand-optimised GPU code for a while now, I'd argue that this high-level view (and thus an appropriate high-level functional language) can get you reasonably decent GPU performance with little effort for many highly parallel problems.
That said, there are still plenty of cases where this approach leaves an order of magnitude of performance on the table, because the compiler isn't smart enough to work out the optimal sequencing of operations. Often the computational core that benefits from running on a GPU is small and simple enough that, if performance is a priority, you are better off writing it in an imperative low-level language, as a library that can be called from higher-level code.
I'm asking because I'm part of the team working on the Futhark language[0], which is exactly what you're describing: a high-level array language in a functional style, targeting the GPU. We're always on the lookout for interesting applications or benchmarks. In particular, we want to know exactly where our shortcomings lie compared to hand-written GPU code, so that we can work on fixing them. For the curious, we think we do reasonably well against the hand-written implementations in benchmark suites such as Rodinia[1], though there are still cases where we cannot keep up.
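To give a flavour of the style, here is a minimal dot-product sketch in modern Futhark syntax (illustrative only, written from memory rather than taken from our benchmarks):

    -- Dot product: the compiler fuses map2 and reduce into a
    -- single GPU pass, so the intermediate array of products
    -- is never materialised in memory.
    def dot [n] (xs: [n]f32) (ys: [n]f32): f32 =
      reduce (+) 0f32 (map2 (*) xs ys)

Fused map-reduce compositions like this are exactly the "decent performance for little effort" case; the hard cases are the ones where the optimal kernel structure is less obvious from the high-level code.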