Sure, but as with many such discussions in the past (we don't want dynamic branching on GPUs because it just encourages wasted lanes; we don't want to add caches to GPUs because it's better to prefetch or hide latency with occupancy; we don't want raytracing on GPUs because it's fundamentally divergent, bindless, random memory access; I can keep going...) I think we as an industry are just increasingly in denial about how this hardware is and will be used to produce the next generation of graphics.

I still somewhat disagree with the idea of giving graphics programmers the ability to do generalized indirect shader dispatches, since that will just encourage spilling.
Regular computation with predictable memory access patterns is great... until it isn't. Much of graphics is finding the right balance between regularity and efficient scaling, of course, but as we move into the next phase of global queries and tons of user-space acceleration structures, the architectures simply have to evolve or they will get crushed by inefficiency. There's obviously some push and pull, but ultimately if you want stuff like GI, you need a significant amount of what we would typically call "inefficient", irregular and often not-particularly-data-parallel computation. I hesitate to invoke the wrath of the "we just want texture mapped polygons at NATIVE 8k" crowd, but I think most would agree that a better use of the transistors at this point is stuff like RT, Nanite, Lumen, etc. But of course that stuff is not usually limited primarily by raw SIMD flops... to make it faster you need more robust hardware that doesn't fall on its face when you hit that one shader in the scene that needs a bazillion registers, or when you need to spawn some additional work on the GPU, and so on.
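For concreteness, "spawning additional work on the GPU" today mostly means something like D3D12's ExecuteIndirect, where a compute pass writes dispatch arguments and a count into buffers that the CPU never reads back. A minimal sketch, assuming `device`, `cmdList`, `argBuffer`, `countBuffer` and `maxDispatches` already exist (error handling omitted):

```cpp
// Sketch of GPU-driven dispatch via D3D12 ExecuteIndirect. The argument and
// count buffers are filled by an earlier compute pass on the GPU, so the CPU
// never knows how many dispatches will actually run.
#include <windows.h>
#include <d3d12.h>

// Created once at startup and kept alive for the lifetime of the app.
ID3D12CommandSignature* CreateDispatchSignature(ID3D12Device* device)
{
    // Each indirect command is just the three thread-group counts.
    D3D12_INDIRECT_ARGUMENT_DESC arg = {};
    arg.Type = D3D12_INDIRECT_ARGUMENT_TYPE_DISPATCH;

    D3D12_COMMAND_SIGNATURE_DESC sigDesc = {};
    sigDesc.ByteStride       = sizeof(D3D12_DISPATCH_ARGUMENTS); // 3 x UINT
    sigDesc.NumArgumentDescs = 1;
    sigDesc.pArgumentDescs   = &arg;

    ID3D12CommandSignature* sig = nullptr;
    device->CreateCommandSignature(&sigDesc, nullptr, IID_PPV_ARGS(&sig));
    return sig;
}

void RecordGpuDrivenDispatch(ID3D12GraphicsCommandList* cmdList,
                             ID3D12CommandSignature* dispatchSig,
                             ID3D12Resource* argBuffer,   // D3D12_DISPATCH_ARGUMENTS[]
                             ID3D12Resource* countBuffer, // UINT, written on the GPU
                             UINT maxDispatches)
{
    // The GPU reads how many dispatches to run from countBuffer and the
    // thread-group sizes from argBuffer; no CPU round trip required.
    cmdList->ExecuteIndirect(dispatchSig, maxDispatches,
                             argBuffer, 0,
                             countBuffer, 0);
}
```

Even that path is fairly coarse-grained, which is roughly the gap the newer GPU-side enqueue work is trying to close.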
Possibly, but before they did it, all the IHVs had been saying it was impossible for over a decade (and I say this from personal experience at an IHV), just like they did a decade before that with raytracing. I don't expect version 1 of either to be amazing, but we have an existence proof now and the needle can continue to shift.

If anyone else had attempted to pull off a similar move in a competitive environment such as the desktop or mobile graphics space, they'd either sink (hardware complexity/unsatisfactory performance) or swim (apps start taking advantage of the feature). Even Apple's dynamic register caching solution has limits: there's a *specific threshold* at which just enough spilling will start cratering their performance.
I think the pull towards more generally robust, performant hardware is undeniable at this point, at least on the high end. We're getting nowhere near the peak flops of these machines in complex workloads, so the solution is clearly not to just keep laying down more ALUs and shipping the next SKU. A decent chunk of the transistors and logic needs to go into making the current hardware run more efficiently.
The only real pull back in the other direction at this point is ML, to be honest. And hell, maybe we end up in a world where some significant chunk of rendering is just a giant, regular ML matrix multiply, and it's sufficiently more power efficient that it doesn't matter that it uses way more theoretical FLOPS to do it. That said, there's also some indication that the ML stuff needs to get pulled a bit in the other direction too, exposing the ability to run finer-grained small kernels inline with more general compute rather than as monolithic stop-the-world dispatches.
As always, it'll be interesting to see where this all goes, but it's clearly not sustainable to keep shipping literally gigabytes of shader permutations out of fear of what would happen to GPU performance if shaders were asked to do something as complicated as a function call... It's bad all the way from multi-hour (or multi-day) developer game cook times to end users dealing with large downloads and last-second JITs.
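To put a rough number on the permutation problem: every boolean feature that gets baked in as a compile-time define doubles the variant count. A toy illustration (the feature names here are made up):

```cpp
// Toy illustration of shader permutation explosion: every boolean feature
// that is baked in as a compile-time define doubles the variant count,
// so N toggles -> 2^N compiled shaders, before even counting enum-style
// options like light counts or material models.
#include <cstdint>
#include <cstdio>

int main()
{
    const char* toggles[] = {                      // hypothetical feature set
        "SHADOWS", "FOG", "SKINNING", "ALPHA_TEST",
        "NORMAL_MAP", "PARALLAX", "SSR", "DECALS",
        "CLUSTERED_LIGHTS", "VERTEX_COLOR"
    };
    const uint64_t n = sizeof(toggles) / sizeof(toggles[0]);
    const uint64_t variants = 1ull << n;           // 2^10 = 1024 variants

    std::printf("%llu boolean toggles -> %llu permutations\n",
                (unsigned long long)n, (unsigned long long)variants);
    return 0;
}
```

Multiply that by passes, platforms and non-boolean options and you get to gigabytes quickly; a runtime branch or a genuine function call collapses whole axes of that table.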
I think most of the IHVs would agree on this point, but for better or worse I think it was important for Microsoft to be able to tell people at the time that they could just upgrade to Windows 10, without worrying about buying a new PC, to get all these benefits. The fact that existing hardware like the Haswell IGP had to be supported (gotta be able to claim some high % of systems can upgrade to DX12) is certainly constraining on what you'd really want to do with the API. On the minor upside, prototyping on hardware that actually existed did help avoid some of the other pitfalls of previous API versions.

I don't think supporting GPUs that were not released as DX12 GPUs helped things.
Microsoft avoided some of the worst of the legacy issues with things like feature levels, heap tiers and now "DX12 Ultimate", but there are certainly a few areas of the API that could be simpler if it weren't for that initial hardware. That said, I will say that once you start digging into the details of state management for PSOs, it becomes pretty clear that there's a lot more divergence between hardware than you might think, especially if you ever want to include mobile hardware. Vulkan has certainly fallen into its own additional pitfalls on that front.
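For anyone who hasn't dug into it, the reason PSO state management is so hairy is that essentially everything (shaders, blend, rasterizer, depth/stencil, render target formats) gets baked into one immutable object up front, because on some hardware parts of that state feed back into shader compilation. A sketch, assuming `device`, `rootSig` and the compiled bytecode blobs already exist:

```cpp
// Sketch of a D3D12 graphics PSO: all of this state is fixed into a single
// immutable object at creation time. Error handling and most fields omitted.
#include <windows.h>
#include <d3d12.h>

ID3D12PipelineState* CreateSimplePso(ID3D12Device* device,
                                     ID3D12RootSignature* rootSig,
                                     D3D12_SHADER_BYTECODE vsBytecode,
                                     D3D12_SHADER_BYTECODE psBytecode)
{
    D3D12_GRAPHICS_PIPELINE_STATE_DESC desc = {};
    desc.pRootSignature           = rootSig;
    desc.VS                       = vsBytecode;
    desc.PS                       = psBytecode;
    desc.BlendState.RenderTarget[0].RenderTargetWriteMask =
        D3D12_COLOR_WRITE_ENABLE_ALL;
    desc.SampleMask               = 0xFFFFFFFFu;
    desc.RasterizerState.FillMode = D3D12_FILL_MODE_SOLID;
    desc.RasterizerState.CullMode = D3D12_CULL_MODE_BACK;
    desc.DepthStencilState.DepthEnable = FALSE;
    desc.PrimitiveTopologyType    = D3D12_PRIMITIVE_TOPOLOGY_TYPE_TRIANGLE;
    desc.NumRenderTargets         = 1;
    desc.RTVFormats[0]            = DXGI_FORMAT_R8G8B8A8_UNORM;
    desc.SampleDesc.Count         = 1;

    // Changing almost any of the above at runtime means building (and often
    // recompiling) a different PSO, which is why so many end up shipping.
    ID3D12PipelineState* pso = nullptr;
    device->CreateGraphicsPipelineState(&desc, IID_PPV_ARGS(&pso));
    return pso; // caller owns the reference (nullptr on failure)
}
```

Which of those fields a given GPU actually needs at shader-compile time varies a lot between vendors, and that variance is exactly the divergence being described.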