And only now, in 2015, for some strange reason these features suddenly appear.
There are a lot of reasons why "now" is a good time. For some reason people have gotten the impression that because we can do a pretty nice, portable, low-overhead API today, it was always possible. It absolutely wasn't... GPU hardware itself has only recently gotten to the level of generality required to support something like DX12. It would not have worked nearly as nicely on a ~DX10-era GPU...
Another huge shift in the industry lately is the centralization of rendering technology into a small number of engines, written by a fairly small number of experts. Back when every studio wrote its own rendering code, something like DX12 would have been far less viable. There are still people who do not want these lower-level APIs, of course, but most of them have already moved to Unreal/Unity/etc.
More than that, I've seen numerous developers trying to persuade MSFT that the change was needed, but it didn't happen until now.
The notion that it hasn't been blatantly clear to everyone for a long time that this was a problem is silliness. Obviously it has been undesirable from day one, and it's something that has been discussed during my entire career in graphics. Like I said, it's a combination of secondary factors that make things like DX12 practical now, not some single industry event.
Yet they still manage to drive forward a much-needed initiative like Mantle, and then everyone else kicks them in the shins, takes the project, and runs off with it. Up next: Nvidia and Intel take adaptive sync via DisplayPort and run off with it.
That's a twisted view of the world. Frankly, it's not terribly difficult to make an API that maps well to one piece of hardware - see basically every console API ever. If you feel bad that the other "modern" APIs are apparently ripping off Mantle (and just trivially adding portability in an afternoon), then you should really feel sorry for Sony and others in the console space who came up with these ideas much earlier still (see libgcm). But the whole notion is stupid - it's a small industry and we all work with each other and constantly build off each other's good ideas.
Ultimately, if Mantle turns out to be a waste of engineering time (which is far from clear yet), AMD has no one to blame but themselves for developing and shipping a production version. If their intention was really to drive the entire industry forward, they could have spent a couple of months on a proof of concept and been done with it.
The adaptive sync thing is particularly hilarious, as that case is a clear reaction to what NVIDIA was doing... but it was built on top of DP precisely because this stuff was already in eDP! It's also highly related to other ideas that have been cooking there for some time (panel self refresh). Don't get fooled by marketing's "we did it first" nonsense.
So much goes on behind the scenes of API development that I'm not sure we will ever know the full story, or whether, given the number of players and viewpoints, we even can.
Yep, as you say, I'm not even sure the notion of a "full story" is well-defined, despite human nature wanting a tidy narrative.
Certainly multiple parties have been working on related problems for quite a long time here, and no one party has perfect information. As folks may or may not remember from the B3D API interview (http://www.beyond3d.com/content/articles/120/3), these trends were already clearly established in 2011. Even if Mantle was already under development then (and at least it seems like Mike didn't know about it if it was), I certainly didn't know about it. Yet - strangely enough - we were still considering all of these possibilities then.
A lot can be done to shift things one way or another, and each party can cite their fraction of the overall story to justify their desired narrative.
Exactly, which is why it's ultimately a useless exercise. All of the statements can even be correct, because that's how the industry actually works: nothing ever comes from a vacuum.
It's kind of off topic, but imho GPU is "a better CPU. Period."
Where's the eye roll emoticon when I need it...
Yes. However, the GPU is VERY good at viewport culling and occlusion culling. This means that the GPU eventually needs to be able to submit all the draw calls (because it knows the visible set; the CPU does not).
I'm not sure I'm totally bought into either sentence there.
The GPU is pretty good at the math for culling, but it's not great with the data structures. You can argue that for the number of objects you're talking about you might as well just brute-force it, but as I showed a few years ago with light culling, that's actually not as power-efficient. Now, certainly GPUs will likely become somewhat better at dealing with less regular data structures, but conversely, as you note, CPUs are becoming better at math just as quickly, if not *more* quickly.
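As an aside, to make the "brute force" option concrete, here's a minimal sketch (in CUDA, since the thread doesn't imply any particular API) of a kernel that frustum-culls bounding spheres and compacts the survivors into a list plus a counter - the sort of output a GPU-submitted/indirect draw would consume. Everything here (Sphere, Plane, cullSpheres, visibleCount, the toy frustum in main) is an illustrative assumption, not code from DX12, Mantle, or any engine discussed above.

    // Brute-force GPU frustum culling sketch: one thread per object,
    // survivors compacted via an atomic counter.
    #include <cstdio>
    #include <cuda_runtime.h>

    struct Plane  { float nx, ny, nz, d; };   // n.x*x + n.y*y + n.z*z + d >= 0 means "inside"
    struct Sphere { float x, y, z, r; };      // world-space bounding sphere

    __global__ void cullSpheres(const Sphere* spheres, const Plane* frustum, int count,
                                unsigned int* visibleIndices, unsigned int* visibleCount)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= count) return;

        Sphere s = spheres[i];
        bool inside = true;
        for (int p = 0; p < 6 && inside; ++p) {
            float dist = frustum[p].nx * s.x + frustum[p].ny * s.y + frustum[p].nz * s.z + frustum[p].d;
            inside = (dist >= -s.r);          // reject only if fully outside a plane
        }
        if (inside) {
            unsigned int slot = atomicAdd(visibleCount, 1u);  // compact the survivors
            visibleIndices[slot] = (unsigned int)i;           // this list could feed indirect draw args
        }
    }

    int main()
    {
        const int N = 4;
        Sphere hostSpheres[N] = { { 0, 0, -5, 1 }, { 100, 0, -5, 1 }, { 0, 0, 5, 1 }, { 1, 1, -10, 2 } };
        // A crude axis-aligned "frustum": x,y in [-10,10], z in [-100,-1].
        Plane hostFrustum[6] = {
            {  1, 0, 0, 10 }, { -1, 0, 0, 10 },
            {  0, 1, 0, 10 }, {  0,-1, 0, 10 },
            {  0, 0,-1, -1 }, {  0, 0, 1, 100 },
        };

        Sphere* dSpheres; Plane* dFrustum; unsigned int *dIndices, *dCount;
        cudaMalloc(&dSpheres, sizeof(hostSpheres));
        cudaMalloc(&dFrustum, sizeof(hostFrustum));
        cudaMalloc(&dIndices, N * sizeof(unsigned int));
        cudaMalloc(&dCount, sizeof(unsigned int));
        cudaMemcpy(dSpheres, hostSpheres, sizeof(hostSpheres), cudaMemcpyHostToDevice);
        cudaMemcpy(dFrustum, hostFrustum, sizeof(hostFrustum), cudaMemcpyHostToDevice);
        cudaMemset(dCount, 0, sizeof(unsigned int));

        cullSpheres<<<(N + 255) / 256, 256>>>(dSpheres, dFrustum, N, dIndices, dCount);

        unsigned int visible = 0;
        cudaMemcpy(&visible, dCount, sizeof(unsigned int), cudaMemcpyDeviceToHost);
        printf("visible objects: %u of %d\n", visible, N);

        cudaFree(dSpheres); cudaFree(dFrustum); cudaFree(dIndices); cudaFree(dCount);
        return 0;
    }

The per-plane test is pure, regular math - exactly where the GPU shines - while anything smarter than this flat loop (hierarchies, occlusion data structures) is the irregular part that's harder to map to the GPU, which is the trade-off being argued here.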
For the second part of the statement, maybe for a discrete GPU. I'm not as convinced that latencies absolutely "must be high" because we "must buffer lots of commands to keep the GPU pipeline fed" in the long run. I think there is going to be pressure to support much more direct interaction between the CPU and GPU on SoCs for power-efficiency reasons, and in that world it is not at all clear what makes the most sense. For example, in DX12 we're already pretty close to a world where even a single ~3GHz Haswell core can generate commands faster than the ~1GHz GPU frontend can consume them, so looping in the GPU frontend is not really a clear win if we end up with significantly lower submission latencies.
It may go in the direction you're describing, and we certainly should pursue both approaches, but I disagree that it's a guaranteed endpoint.