I'm basically arguing that Nvidia has added a lot of features that are, and may always be, Nvidia-only. AMD is trying to get into the market (and failing so far), and Intel's GPUs and other chips will probably also be aimed heavily at training neural nets, but they almost certainly won't have the exact features Nvidia put in, so any work you do for Volta, any research you build on it, simply isn't transferable. And Nvidia's CUDA libraries were even worse in that respect. You got to set up neural nets faster, but now you're locked into Nvidia (they hope), and too bad you can't ever run those programs on anyone else's hardware! Come buy our $5k card, because you spent all those months writing for our proprietary tech and have no choice! The point is simple: Nvidia didn't build their libraries out of kindness. They did it because they gambled they'd make more money by locking people in than the libraries cost to build in the first place.
Nvidia didn't make anything you couldn't make with OpenCL; they just made it exclusive to their hardware and tried to lure you into their proprietary ecosystem. That spells nothing but trouble. It's what Sony used to do: buy products that only work with other Sony products! It's what Apple and Android both did, or tried to do, with their apps: you'll hesitate to switch if you've invested hundreds in apps that suddenly won't work anymore (not that anyone buys apps other than games these days, and those are free-to-play, so who cares). The point is, they lure you in by making it seem easy, then trap you by tying all the work you've done exclusively to their hardware.
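To make the lock-in concrete, here's a minimal sketch (not taken from anyone's post, just a made-up toy example): a trivial CUDA vector add. The `<<<grid, block>>>` launch syntax and the `cuda*` runtime calls only build with Nvidia's nvcc and only run on Nvidia GPUs; an OpenCL version of the same kernel would be compiled from a source string at runtime and could, at least in principle, target AMD, Intel or Nvidia devices.

```cuda
// Toy CUDA-only example: everything below assumes nvcc and an Nvidia GPU.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void vecAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    float *a, *b, *c;
    // Unified memory keeps the example short; it is also Nvidia-specific.
    cudaMallocManaged(&a, bytes);
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    // The triple-chevron launch is CUDA syntax, not standard C++.
    vecAdd<<<(n + 255) / 256, 256>>>(a, b, c, n);
    cudaDeviceSynchronize();

    printf("c[0] = %f\n", c[0]);  // expect 3.0
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```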
That ignores one of the big selling points, though: for many people it's CUDA's integration with the various frameworks, together with its heavily optimised libraries, that matters, and those libraries offer a lot of flexibility across diverse solutions and implementations.
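As a rough illustration of that draw (again a made-up sketch, with arbitrary sizes and data), here is what "use the optimised library" looks like in practice: a single-precision matrix multiply handed off to cuBLAS instead of being written and tuned by hand.

```cuda
// Sketch: let cuBLAS do the matrix multiply. Link with -lcublas.
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <vector>
#include <cstdio>

int main() {
    const int n = 512;  // square matrices, sizes chosen arbitrarily
    std::vector<float> hA(n * n, 1.0f), hB(n * n, 2.0f), hC(n * n, 0.0f);

    float *dA, *dB, *dC;
    cudaMalloc(&dA, n * n * sizeof(float));
    cudaMalloc(&dB, n * n * sizeof(float));
    cudaMalloc(&dC, n * n * sizeof(float));
    cudaMemcpy(dA, hA.data(), n * n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), n * n * sizeof(float), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);

    // C = alpha * A * B + beta * C, with kernels Nvidia tuned for their own GPUs.
    const float alpha = 1.0f, beta = 0.0f;
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                n, n, n, &alpha, dA, n, dB, n, &beta, dC, n);

    cudaMemcpy(hC.data(), dC, n * n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("C[0] = %f\n", hC[0]);  // expect 2 * n = 1024

    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}
```

The convenience is real; the trade-off is that the same few lines have no drop-in equivalent on other vendors' hardware, which is exactly the tension being argued about above.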
Whichever large-scale HW solution scientists/devs use, they will have to spend a lot of time learning it and optimising their code, especially if they need both modelling/simulation and training.
Importantly, Nvidia heavily supports a broad range of frameworks.
But as a reference, even moving from traditional Intel Xeon to Xeon Phi meant a lot of reprogramming and optimising to make it worthwhile; one of the HPC labs investigated this and published their work.
I agree CUDA will split opinions though, with some looking to avoid it while others embrace it, at least from an HPC perspective.