I don't know why you were downvoted for this. The OpenGL spec is a mess. I think a lot of the driver issues arise from the fact that OpenGL carries so much backwards compatibility and so much complexity.
Not to mention how obtuse it is to learn. Even without the backwards compatibility issues (which are very real), the entire mental model of how OpenGL works is completely messed up.
Things like having a texture ID, which you then have to bind to a particular target on a particular texture unit in order to use it, make so little sense to someone learning it for the first time. On the CPU, I just pass the pointer to my image to a function to manipulate it. I don't have to put it into a special slot of a special structure in a special place in memory! It took me years to understand many of these things, and I see others struggling in the exact same way on Stack Overflow, for example. So sad.
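To make that indirection concrete, here is a minimal sketch of the "bind to edit" model in plain C. It is a toy, not real OpenGL: every `toy_*` name is made up for illustration and only mirrors the shape of `glActiveTexture`, `glBindTexture` and `glTexParameteri`. The point is that the edit call never names the texture it changes; it looks it up through the active unit and the target.

```c
#include <assert.h>

/* Toy model of GL-style "bind to edit". NOT the real OpenGL API:
   all names are hypothetical and only mirror the shape of
   glActiveTexture / glBindTexture / glTexParameteri. */

enum { TOY_MAX_UNITS = 4, TOY_MAX_TEXTURES = 8 };
enum toy_target { TOY_TEXTURE_2D = 0, TOY_TEXTURE_CUBE = 1, TOY_TARGET_COUNT = 2 };

struct toy_texture { int min_filter; };

struct toy_context {
    struct toy_texture textures[TOY_MAX_TEXTURES];   /* indexed by texture ID */
    unsigned bound[TOY_MAX_UNITS][TOY_TARGET_COUNT]; /* ID bound per unit and target;
                                                        ID 0 stands in for the default texture */
    unsigned active_unit;                            /* hidden "current unit" selector */
};

/* Step 1: choose which texture unit later calls will affect. */
void toy_active_texture(struct toy_context *ctx, unsigned unit) {
    assert(unit < TOY_MAX_UNITS);
    ctx->active_unit = unit;
}

/* Step 2: put a texture ID into a target slot of the active unit. */
void toy_bind_texture(struct toy_context *ctx, enum toy_target target, unsigned id) {
    assert(id < TOY_MAX_TEXTURES);
    ctx->bound[ctx->active_unit][target] = id;
}

/* Step 3: edit "whatever is bound to this target on the active unit".
   Note that the texture itself is never passed in. */
void toy_tex_parameter(struct toy_context *ctx, enum toy_target target, int min_filter) {
    unsigned id = ctx->bound[ctx->active_unit][target];
    ctx->textures[id].min_filter = min_filter;
}

/* The CPU-style equivalent: just hand over a pointer to the thing you mean. */
void set_min_filter(struct toy_texture *tex, int min_filter) {
    tex->min_filter = min_filter;
}
```

In this toy, changing one setting on texture 3 takes `toy_active_texture`, `toy_bind_texture` and `toy_tex_parameter` in that order, and forgetting either of the first two silently edits some other texture. The pointer version is a single call with nothing hidden.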
Yeah, the API is basically insane. Not to mention how many "best" practices aren't. For instance, I remember a few years ago the advice was always "Use VBOs, don't use display lists, they're deprecated". Ok, but when I benchmarked them against each other, display lists were still twice as fast as very carefully constructed VBOs for the geometry I was rendering. Wtf.
If I'm not mistaken, there are hardware reasons why VBOs will never be as fast as other methods. At least that is what I dimly recall having heard in a talk by a guy at Valve / Nvidia.
Texture binding in GL is utterly insane. The D3D model, as I understand it, is really straightforward - textures are basically pointers to some information on an image buffer, and you can store a texture directly into a texture uniform. So to 'bind' a texture in D3D you just store it directly into the sampler. I forget whether sampling options like filtering mode are part of that or part of a separate structure - either way, in GL the sampling state is part of the texture, while in D3D it is separately configurable, which is a GODSEND.
IIRC in modern versions of D3D (10+? 11+?) they've expanded on this and just have the general concepts of views and buffers, so that you can treat textures and vertex buffers as if they share underlying properties, and shader code can manipulate them in similar ways. This is great for compute and GPU-accelerated processing and GPU feedback.
You can separate Sampler and Texture in OpenGL too, using sampler objects (and they were informally separate since the multitexturing extension circa GL 1.2). But when no Sampler is bound to the texture unit, the legacy Texture settings still apply of course :)
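A rough sketch of that fallback rule, again as a toy model in plain C rather than real GL. The `toy_*` names are hypothetical; in actual OpenGL the corresponding call is `glBindSampler` (sampler objects, core since GL 3.3), and this only models the lookup order, not the API.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of "sampler object overrides the texture's own sampling state".
   NOT real OpenGL: all names here are hypothetical. */

struct toy_sampling {
    int min_filter;
    int wrap_mode;
};

struct toy_texture {
    struct toy_sampling legacy;   /* sampling state stored inside the texture */
    /* image data omitted */
};

struct toy_unit {
    const struct toy_texture  *texture;  /* bound texture, or NULL */
    const struct toy_sampling *sampler;  /* bound sampler object, or NULL for none */
};

/* Which sampling state does a lookup through this unit actually use?
   A bound sampler wins; otherwise the texture's legacy state applies. */
const struct toy_sampling *toy_effective_sampling(const struct toy_unit *unit) {
    if (unit->texture == NULL) {
        return NULL;                     /* nothing to sample from */
    }
    if (unit->sampler != NULL) {
        return unit->sampler;            /* separate sampler overrides the texture */
    }
    return &unit->texture->legacy;       /* fall back to the texture's own settings */
}
```

So the same texture can be sampled with different filtering on two units without touching the texture itself, but the per-texture settings never go away; they are just shadowed while a sampler is bound.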
The difficulty of implementation is at most half the problem, IMO. Driver vendors can afford to maintain a 'merged' version of the spec with extension diffs applied, and can pay experts to acquire and retain knowledge on the breadth of the spec.
The challenges arise in validation and actual development. Validating that a driver works correctly is VERY difficult due to the complexity of the spec, and you can't really afford to have a huge test team with as much experience and knowledge as your driver development team. Even once you've validated and shipped your driver, you can't know how end-users are going to exercise it.
Then, as a developer, not only do you have poor knowledge of the spec, but you have no knowledge of how each vendor interpreted the spec, or whether their implementation matches your expectations.
As the surface area of OpenGL and the complexity of each entry point grow, this is only getting worse.