That's the only layer where it makes sense, because that's where you know what you're actually trying to achieve. The overheads in GPU programming are such that a single small assumption that doesn't hold in practice can sink your performance badly, so you need tight control over where and how data is laid out. For more generic work there are libraries, but those too run at the behest of your application. As this technology matures you'll see more and more abstraction and automation of the parts that squeeze out the last of the performance. But for now that's where the biggest gains are, just as with any other kind of special-purpose co-processor.