So it's the fault of OpenCL for being low-level, providing you more control and allowing you to tune for different architectures? Do you want languages like RenderScript to take over where you have less control? Then you can never get 100% performance out of any architecture. Take your pick.
It's not a fault. Some things have to be done that way, but having to do it for every single piece of hardware and every generation just makes it a pain in the ass. This is what APIs are supposed to do: lessen the burden on programmers by setting the rules and guidelines for creating something that is streamlined and works well on all hardware that supports the API. Yes, you still need to be wary of the underlying architecture, but not so much that you have to write extremely different paths for all GPUs, and then different paths again for different GPUs of the same generation.
Look at mining, for example. Can you tell me the difference between a CUDA miner and an OpenCL one? Which would work better overall if one were to optimize for the given hardware and use all the features of the given API?