Not sure what you mean by "all spatial associativity" - yes, there are deep similarities with the human visual system, but the way it's implemented is quite different - it's still basically linear algebra on very large matrices/tensors. GPUs were not designed for that kind of workload; they were originally designed for pixel shaders, which are a mix of scalar and small-vector instructions with dependent texture fetches where the address depends on previous computations. There's nothing like that in deep learning at the moment...
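To make the contrast concrete, here's a minimal sketch (all sizes and names are illustrative, not from the discussion above) of what a single fully connected layer reduces to: one big dense matrix multiply plus a pointwise nonlinearity, with no data-dependent addressing anywhere.

```python
# Minimal sketch, assuming NumPy: a fully connected layer is just
# a large GEMM followed by a pointwise nonlinearity.
import numpy as np

batch, in_features, out_features = 256, 4096, 4096  # illustrative sizes

x = np.random.randn(batch, in_features).astype(np.float32)        # activations
W = np.random.randn(in_features, out_features).astype(np.float32) # weights
b = np.zeros(out_features, dtype=np.float32)                      # bias

y = np.maximum(x @ W + b, 0.0)  # one dense matmul + ReLU, nothing like a shader's
                                # dependent texture fetch
print(y.shape)  # (256, 4096)
```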
The logical connections for the human eye's neurons, linking the receptors to the brain, look almost exactly like any basic neural net layer visualization you'd see today. They are so close you'd mistake the two if you didn't pay attention. It's literally layers of neighboring neurons sending signals to the next layer up, and so on, based on the signals they receive. I'm just suggesting the reason GPUs happened to be better than CPUs is that GPUs were designed to produce visuals, which are processed by a set of real neurons hooked up in tight neighboring layers. So running a set of virtual neurons in tight neighboring layers is similar enough that a GPU ended up being pretty good at that as well.
This is especially true for training NNs. For deployment you can do your giant matrix calculations with specialized equipment much more easily, but training works so well on GPUs that major companies are happy to keep buying them instead of creating specialized chips (at least, specialized hardware for deploying NNs was the priority over replacing GPUs for training). But that's only true today, because neural nets today, as mentioned, look (organizationally) a lot like the visual neurons attached to our eyes. Early research suggests the rest of our brain is laid out rather differently in how the connections are arranged. That could be why neural nets today can do better than humans at recognizing images, yet after hundreds of thousands of hours of training they still can't drive to save their (or our) lives. It's also why you need something beyond a basic feedforward NN, like recurrent networks, to translate speech and so on. I.e. neural nets can do better than trained doctors at spotting problems in scans, but can't drive with a thousand sensors, while a 16 year old (hopefully, it's half a joke) can drive with just two eyes.
The way GPUs work at the moment, you can reasonably expect to fetch all the activations from external memory - for many (not all) workloads the working set is too large relative to the cache to get many hits. What the L2 cache is mostly good for is improving reuse between different SMs reading the same data within a single layer.
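A quick back-of-the-envelope sketch of why that is (the layer shape and cache size below are my own illustrative assumptions, not measurements from anyone's workload):

```python
# Back-of-the-envelope: compare one conv layer's activation footprint
# to a GPU L2 cache of a few MB. All numbers are illustrative assumptions.
batch, channels, height, width = 64, 256, 56, 56
bytes_per_value = 4  # fp32

activation_bytes = batch * channels * height * width * bytes_per_value
l2_cache_bytes = 6 * 1024 * 1024  # roughly a Volta-class L2

print(f"activations: {activation_bytes / 2**20:.0f} MiB, "
      f"L2: {l2_cache_bytes / 2**20:.0f} MiB, "
      f"ratio: {activation_bytes / l2_cache_bytes:.0f}x")
# The activations alone are dozens of times larger than L2, so most reads
# come from external memory; L2 mainly helps when several SMs touch the
# same tile of data within one layer.
```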
I really like the idea of using memristors for AI - it feels like a very good fit for some things, and I'm still reasonably excited about memristors despite their relative lack of progress (versus the early claims). But that summary seems to imply 91% accuracy on MNIST, which isn't very impressive... It reads more like a "good enough at very low cost" solution than a state-of-the-art one.
Maybe I'm misunderstanding what you're trying to say, but it sounds wrong to me. All of the major deep learning frameworks use NVIDIA's cuDNN library (which is largely hand-written assembly by NVIDIA). It was the first non-CPU deep learning API supported by a wide variety of frameworks, and NVIDIA was deeply involved with the framework developers to add support for it.
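As a hedged illustration of that relationship (PyTorch is just my example here, not something the comment above names): the framework dispatches GPU convolutions to cuDNN behind the scenes and only exposes a few knobs for it.

```python
# Frameworks route GPU convolutions through cuDNN and expose a few switches.
import torch

print(torch.backends.cudnn.is_available())  # True when a cuDNN build was found
print(torch.backends.cudnn.version())       # the linked cuDNN version

torch.backends.cudnn.benchmark = True  # let cuDNN auto-tune conv algorithms

conv = torch.nn.Conv2d(3, 64, kernel_size=3, padding=1)
if torch.cuda.is_available():
    x = torch.randn(8, 3, 224, 224, device="cuda")
    y = conv.cuda()(x)  # this convolution is dispatched to a cuDNN kernel
```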
NVIDIA basically applied the same strategy they had with "The Way It's Meant To Be Played": spending their own engineering resources to help those framework developers - that's why they have widespread support and nobody else does. Some of the newer frameworks, like recent TensorFlow and MXNet, have systems that let third-party HW vendors add their own acceleration more easily - but that wasn't the case back then. There's an argument this is unfair in the same way TWIMTBP was unfair, but your claim about wasting years of multi-million-dollar salaries developing specifically for Volta feels completely implausible to me.
I'm basically arguing that Nvidia has added a lot of features that are, and may always be, Nvidia-only. AMD is trying to get into the market (and failing so far), and Intel's GPUs and other chips will probably lean heavily on training neural nets. But they probably also won't have the exact features put in by Nvidia, so any work you do for Volta, any research built on it, is simply non-transferable. And Nvidia's CUDA libraries were even worse. You got to set up neural nets faster, but hey, now you're locked into Nvidia (they hope) - too bad you can't ever run those programs on anyone else's hardware! Come buy our $5k card, because you spent all those months writing for our tech and have no choice! The point is simple: Nvidia didn't build their libraries out of kindness. They did it because they gambled they'd make more money off it, by locking people in, than it cost them in the first place.
Nvidia didn't make anything you couldn't have made with OpenCL; they just made it exclusive to them and tried to lure you into their proprietary ecosystem. And that spells nothing but trouble - it's what Sony used to do: buy products that only work with other Sony products! It's what Apple and Android both did, or tried to, with their apps: you'll hesitate to switch if you've invested hundreds in apps that suddenly won't work anymore (not that anyone buys apps other than games anymore, and those are F2P, so who cares). Point is, they lure you in by making it seem easy, then trap you by locking all the work you've done exclusively to their hardware.