What gives you the idea that Nvidia doesn't work with Microsoft on evolving D3D too?
The recent posts by DavidGraham...
Obsolete features as in SSOs for everything except maybe fragment shaders. Possible future hardware designs may take even more advantage of PSOs, so maybe not even SSOs for fragment shaders as well ...
Obsolete features meaning geometry and tessellation shaders? GPL supports those shader stages too. The only real difference seems to be that AMD hardware is optimized for rolling vertex/geometry/tessellation shaders into one combined “primitive shader” stage, while Nvidia retained support for dynamically linking those discrete stages at runtime.
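As a point of reference, here is a rough sketch of what the GPL model looks like in Vulkan terms, assuming VK_EXT_graphics_pipeline_library: pipeline pieces are compiled as separate libraries and linked at the end. The structure and flag names come from that extension; the helper functions and the omitted device/layout/viewport/shader setup are placeholders, and a real implementation builds all four library types (vertex input, pre-rasterization, fragment shader, fragment output).

```cpp
// Sketch: compile the pre-rasterization stages as a pipeline library, then
// link libraries into a usable pipeline without a full recompile.
// Assumes a VkDevice, VkPipelineLayout and shader-stage infos already exist;
// viewport/rasterization/rendering info is omitted for brevity.
#include <vulkan/vulkan.h>

VkPipeline makePreRasterLibrary(VkDevice dev, VkPipelineLayout layout,
                                const VkPipelineShaderStageCreateInfo* stages,
                                uint32_t stageCount) {
    VkGraphicsPipelineLibraryCreateInfoEXT lib{};
    lib.sType = VK_STRUCTURE_TYPE_GRAPHICS_PIPELINE_LIBRARY_CREATE_INFO_EXT;
    lib.flags = VK_GRAPHICS_PIPELINE_LIBRARY_PRE_RASTERIZATION_SHADERS_BIT_EXT;

    VkGraphicsPipelineCreateInfo info{};
    info.sType = VK_STRUCTURE_TYPE_GRAPHICS_PIPELINE_CREATE_INFO;
    info.pNext = &lib;
    info.flags = VK_PIPELINE_CREATE_LIBRARY_BIT_KHR;
    info.stageCount = stageCount;   // vertex/tess/geometry (or task/mesh) stages
    info.pStages = stages;
    info.layout = layout;

    VkPipeline library = VK_NULL_HANDLE;
    vkCreateGraphicsPipelines(dev, VK_NULL_HANDLE, 1, &info, nullptr, &library);
    return library;
}

VkPipeline linkLibraries(VkDevice dev, VkPipelineLayout layout,
                         VkPipeline vertexInput, VkPipeline preRaster,
                         VkPipeline fragment, VkPipeline fragmentOutput) {
    const VkPipeline parts[] = { vertexInput, preRaster, fragment, fragmentOutput };

    VkPipelineLibraryCreateInfoKHR link{};
    link.sType = VK_STRUCTURE_TYPE_PIPELINE_LIBRARY_CREATE_INFO_KHR;
    link.libraryCount = 4;
    link.pLibraries = parts;

    VkGraphicsPipelineCreateInfo info{};
    info.sType = VK_STRUCTURE_TYPE_GRAPHICS_PIPELINE_CREATE_INFO;
    info.pNext = &link;             // fast link; no link-time optimization requested
    info.layout = layout;

    VkPipeline linked = VK_NULL_HANDLE;
    vkCreateGraphicsPipelines(dev, VK_NULL_HANDLE, 1, &info, nullptr, &linked);
    return linked;
}
```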
It could be that we may still stick with PSOs in the end, but there is nearly no industry momentum to have SSOs in its purest form. Both GPL and SSOs are a lot of work to implement, so it'll take some time to see them, if at all, but I wouldn't count on converging to the latter since there's still quite a bit of inertia against it (everyone NOT Nvidia) ...
GPL isn’t any more forward-looking than SSO. It simply maps better to AMD hardware.
Nvidia has a beta driver out with SSO support on day one so it seems they’re at least trying to fix the problem. Has any IHV implemented official support for GPL since it launched a year ago? It will be interesting to see where Microsoft takes DX13. Maybe they will drop support altogether for the legacy geometry pipeline.
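For contrast, a rough sketch of the SSO-style path in Vulkan terms, assuming this refers to VK_EXT_shader_object: each stage compiles to its own shader object and is bound independently at record time, with no monolithic pipeline to pre-link. The structure and function names are from that extension; the helper functions and the SPIR-V/device setup are placeholders, and the extension entry points are assumed to have been loaded with vkGetDeviceProcAddr.

```cpp
// Sketch: separate shader objects via VK_EXT_shader_object. Each stage is
// compiled on its own and bound independently; there is no monolithic
// pipeline object to pre-link. The extension entry points are passed in.
#include <vulkan/vulkan.h>
#include <vector>

VkShaderEXT makeShader(VkDevice dev, VkShaderStageFlagBits stage,
                       VkShaderStageFlags nextStage,
                       const std::vector<uint32_t>& spirv,
                       PFN_vkCreateShadersEXT createShaders) {
    VkShaderCreateInfoEXT info{};
    info.sType = VK_STRUCTURE_TYPE_SHADER_CREATE_INFO_EXT;
    info.stage = stage;                         // e.g. VK_SHADER_STAGE_VERTEX_BIT
    info.nextStage = nextStage;                 // stages allowed to follow this one
    info.codeType = VK_SHADER_CODE_TYPE_SPIRV_EXT;
    info.codeSize = spirv.size() * sizeof(uint32_t);
    info.pCode = spirv.data();
    info.pName = "main";
    // Descriptor set layouts / push constant ranges omitted for brevity.

    VkShaderEXT shader = VK_NULL_HANDLE;
    createShaders(dev, 1, &info, nullptr, &shader);
    return shader;
}

void bindAndDraw(VkCommandBuffer cmd, VkShaderEXT vs, VkShaderEXT fs,
                 PFN_vkCmdBindShadersEXT bindShaders) {
    const VkShaderStageFlagBits stages[] = { VK_SHADER_STAGE_VERTEX_BIT,
                                             VK_SHADER_STAGE_FRAGMENT_BIT };
    const VkShaderEXT shaders[] = { vs, fs };
    bindShaders(cmd, 2, stages, shaders);       // mix and match at record time
    vkCmdDraw(cmd, 3, 1, 0, 0);
}
```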
Still doesn't change the fact that the idea of Intel endorsing SSOs is tone-deaf since that's the absolute worst model for them ...
It is not "opposite for Nvidia"; they do both equally well (they actually do PSOs better than AMD atm, if we take the issues with TLOU on PC as a sign of who's handling this better right now). It is the deficit of non-Nv h/w which should be fixed in h/w - finally, as it's been more than 10 years of IHVs transferring this issue to s/w vendors instead.
And no, this is not "implementing D3D11 in h/w" because it has nothing to do with D3D11 beyond the fact that D3D11 s/w runs better on h/w which has fast state management. If you expose this advantage in a modern API like VK or D3D12 then it suddenly becomes "implementing the new feature of D3D12 in h/w". Sounds quite a bit different, doesn't it?
General rule of thumb is for 2 vendors to support a feature in order for Microsoft to expose an API for it. Effectively this means that the tie-breaker (either AMD or Intel) holds a filibuster. A Microsoft representative disclosed (timestamps 18:54 and 19:06) that they're mostly on the sidelines watching what happens and how it develops, so they have no intention of immediately forcing anyone to implement anything yet ...
What gives you the idea that Nvidia doesn't work with Microsoft on evolving D3D too? RT was added in D3D12 as an Nvidia-exclusive feature. A bunch of other changes to stuff like RS and such were implemented because the original spec missed some things in Nv h/w, making it run worse than it could. How are PSOs any different?
"Fast state management" is just the end result of hardware closely implementing the software model ...
Even if that is true (which I very much doubt, considering how the h/w is usually made, and why APIs are made after the h/w and not vice versa), you could apply the same logic to other h/w - that AMD's h/w is "closely implementing the D3D12 model" - which means that all the issues we have at present with games running through that model are issues of h/w which should be solved in h/w.
Obsolete features as in SSOs for everything except maybe fragment shaders. Possible future hardware designs may take even more advantage of PSOs so maybe not even SSOs for fragment shaders as well ...
A more monolithic approach also translates to a more flexible/general-purpose/stateless hardware design. Not including ray tracing, Nvidia has at least 8 different HW shader stages (vertex/hull/domain/geometry/amplification/mesh/pixel/compute) while AMD only needs 4 HW shader stages (hull/geometry/pixel/compute), and ray tracing ends up being compute shaders over there for good measure ...
PSOs allow hardware designers to move more state from hardware to programs/software. Hardware features essentially become software features ...
The immutable precompiled PSO model has been given a fair shake already and has been proven not to work well because of the dynamic nature of runtime game workloads. What’s going to change to make PSOs more viable?
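To make the "state moves into the program" point concrete, here is a minimal D3D12 sketch of what a graphics PSO bakes in alongside the shader bytecode. The descriptor fields are from the D3D12 API; the helper function, formats and the omitted root signature/shader setup are placeholders.

```cpp
// Sketch: a D3D12 graphics PSO freezes the shaders *and* most fixed-function
// state (blend, raster, depth, output formats) into one immutable object.
// Changing any of that state means compiling another PSO, which is where the
// permutation explosion comes from. Root signature and bytecode are assumed.
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

ComPtr<ID3D12PipelineState> makePso(ID3D12Device* device,
                                    ID3D12RootSignature* rootSig,
                                    D3D12_SHADER_BYTECODE vs,
                                    D3D12_SHADER_BYTECODE ps) {
    D3D12_GRAPHICS_PIPELINE_STATE_DESC desc{};
    desc.pRootSignature = rootSig;
    desc.VS = vs;                                               // shader code
    desc.PS = ps;
    desc.BlendState.RenderTarget[0].BlendEnable = FALSE;        // baked-in state
    desc.BlendState.RenderTarget[0].RenderTargetWriteMask = D3D12_COLOR_WRITE_ENABLE_ALL;
    desc.RasterizerState.FillMode = D3D12_FILL_MODE_SOLID;
    desc.RasterizerState.CullMode = D3D12_CULL_MODE_BACK;
    desc.DepthStencilState.DepthEnable = TRUE;
    desc.DepthStencilState.DepthWriteMask = D3D12_DEPTH_WRITE_MASK_ALL;
    desc.DepthStencilState.DepthFunc = D3D12_COMPARISON_FUNC_LESS;
    desc.SampleMask = 0xFFFFFFFFu;
    desc.PrimitiveTopologyType = D3D12_PRIMITIVE_TOPOLOGY_TYPE_TRIANGLE;
    desc.NumRenderTargets = 1;
    desc.RTVFormats[0] = DXGI_FORMAT_R8G8B8A8_UNORM;
    desc.DSVFormat = DXGI_FORMAT_D32_FLOAT;
    desc.SampleDesc.Count = 1;

    ComPtr<ID3D12PipelineState> pso;
    device->CreateGraphicsPipelineState(&desc, IID_PPV_ARGS(&pso));
    return pso;
}
```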
You're conflating performance with flexibility ...
If AMD’s hardware is more flexible then it presumably wouldn’t have any problem breaking down the pre-raster pipeline into more discrete steps. It seems that it’s actually less flexible, i.e. you “must” treat geometry processing as a single compiled stage to get the most out of the hardware, while Nvidia seems to give you the option to define geometry processing as either a monolithic stage or multiple decoupled stages that can be mixed and matched at runtime. So how is Nvidia’s approach not the more flexible option here?
Essentially, on hardware that supports SSO, implementing GPL and PSO is trivial. It doesn’t work the other way around. The narrative that SSO is bad but GPL is good doesn’t really make sense. Ideally we would blow all of this up and force everyone to use mesh shaders or do rasterization in compute à la Nanite. It’s currently a mess. Nobody is even using mesh shaders yet.
It makes sense that Microsoft doesn’t want to push API features that IHVs don’t want to support. The end result, though, is that #stutterstruggle rolls on.
Mobile graphics may not have the blending units we're used to seeing on desktop graphics hardware, so they implement blending equations by embedding the blend state into a program, instead of using that state to set the specific registers that put the fixed-function blending unit into the right equation, as desktop hardware would. The big implication in my example is that mobile hardware is running a *program* while desktop hardware is running a *finite-state machine*. This ultimately means that mobile hardware can make blending a programmable operation, while on desktop hardware it remains a fixed operation, since the former doesn't have a distinct/separate stage for that process in its hardware pipeline and blending instead becomes part of the fragment shader. I don't see how any graphics programmer would argue that the hardware in that case is less flexible when they've gained another programmable stage of the graphics pipeline, even though more PSOs were needed because of the permutations caused by the many different states that need to be embedded into the binaries ...
You can do both on h/w which provides "fixed functions" (again, I'd call them "accelerators" since they are advantageous in a world where perf/watt matters more than anything else) but can only do these in s/w on h/w which doesn't. Thus the former is more flexible than the latter, not vice versa.
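A small illustration of the blending example, assuming Vulkan on the fixed-function side: the first snippet is the real struct a desktop-style blend unit is programmed with, while the second is only a stand-in (plain C++, hypothetical) for the arithmetic a "programmable blend" design would fold into the fragment program.

```cpp
// Fixed-function path: classic alpha blending expressed as pipeline state
// that programs a dedicated blend unit on desktop-style hardware.
#include <vulkan/vulkan.h>

VkPipelineColorBlendAttachmentState fixedFunctionAlphaBlend() {
    VkPipelineColorBlendAttachmentState blend{};
    blend.blendEnable = VK_TRUE;
    blend.srcColorBlendFactor = VK_BLEND_FACTOR_SRC_ALPHA;
    blend.dstColorBlendFactor = VK_BLEND_FACTOR_ONE_MINUS_SRC_ALPHA;
    blend.colorBlendOp = VK_BLEND_OP_ADD;
    blend.srcAlphaBlendFactor = VK_BLEND_FACTOR_ONE;
    blend.dstAlphaBlendFactor = VK_BLEND_FACTOR_ZERO;
    blend.alphaBlendOp = VK_BLEND_OP_ADD;
    blend.colorWriteMask = VK_COLOR_COMPONENT_R_BIT | VK_COLOR_COMPONENT_G_BIT |
                           VK_COLOR_COMPONENT_B_BIT | VK_COLOR_COMPONENT_A_BIT;
    return blend;
}

// "Programmable blend" path: the same equation as code, i.e. roughly what a
// design without a dedicated blend unit folds into the fragment program.
// Hypothetical helper, written in C++ purely for illustration.
struct RGBA { float r, g, b, a; };

RGBA blendInShader(RGBA src, RGBA dst) {
    const float s = src.a;            // SRC_ALPHA
    const float d = 1.0f - src.a;     // ONE_MINUS_SRC_ALPHA
    return { src.r * s + dst.r * d,
             src.g * s + dst.g * d,
             src.b * s + dst.b * d,
             src.a };                 // alpha factors: ONE / ZERO
}
```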
PSOs allow hardware designers to move more state from hardware to programs/software. Hardware features essentially become software features ...
AMD HW specifically can probably implement other unique geometry pipelines, with more features/fewer restrictions in the shaders, besides the ones we already know like mesh shading or the current traditional geometry pipeline ...
That's assuming that these stages are any different from whatever units the h/w is using under the hood in modern APIs.
The Nvidia design to me looks like it maintains legacy stages because there’s still a lot of legacy software. People are even still writing DX11 and OpenGL-based software. In terms of the mesh and amplification stages, I guess on AMD they’re just Hull/Geometry stages?
Well, consoles don't care about runtime compilation, and on their fixed h/w the original PSO model works fine. You could say that it's a "console model" even, since they are the best fit for such an approach. ...
I've been saying for a while now that Digital Foundry should be testing CPUs and how quickly they compile shaders in these games with pre-compilation processes, because it will be an important metric for people who are considering specific CPU purchases. It looks as if it will be more applicable in the future than ever before. Though in truth, as newer Sony games will likely take PC into account from the very beginning, it's likely that these processes can be cut down a fair amount from what they are when PC isn't considered at all during development. The Detroit: Become Human guys touched upon this when porting that game over to PC, stating that their future game will be much more efficient in how they author shaders, vastly reducing the number of permutations.
Most games do the compilation process in less than 5 minutes. Most Unreal Engine 4 games which pre-compile the PSOs take like 1-2 min. ...
Well, consoles don't care about runtime compilation, and on their fixed h/w the original PSO model works fine. You could say that it's a "console model" even, since they are the best fit for such an approach.
On PC though the model obviously doesn't work well, and thus there are already various "enhancements" in place which aim at limiting the number of PSOs generated by a game as much as possible - this was the topic of the last several pages - and games will pretty much be forced to use them, or face hours of shader (re)compilation time eventually.
Most games do the compilation process in less than 5 minutes. Most Unreal Engine 4 games which pre-compile the PSOs take like 1-2 min. ...
That's because they either don't generate many PSOs, don't compile them all during the precompilation step, or use the options available to limit PSO numbers on PCs - or a mix of all three.
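For anyone curious what that precompilation step amounts to, a rough sketch of the usual pattern, assuming Vulkan: warm a VkPipelineCache up front and persist it, so later launches mostly hit the cache instead of recompiling. The cache file name and the compileAllKnownPsos callback are placeholders.

```cpp
// Sketch: warm a VkPipelineCache during a precompilation step and persist it,
// so subsequent launches reuse the driver's cached binaries instead of
// recompiling everything. The cache file name and compileAllKnownPsos()
// callback are hypothetical placeholders.
#include <vulkan/vulkan.h>
#include <fstream>
#include <iterator>
#include <vector>

void precompileAndPersist(VkDevice dev,
                          void (*compileAllKnownPsos)(VkDevice, VkPipelineCache)) {
    // Seed the cache with whatever was saved by a previous run, if anything.
    std::vector<char> seed;
    if (std::ifstream in{"pso_cache.bin", std::ios::binary}) {
        seed.assign(std::istreambuf_iterator<char>(in), {});
    }

    VkPipelineCacheCreateInfo ci{};
    ci.sType = VK_STRUCTURE_TYPE_PIPELINE_CACHE_CREATE_INFO;
    ci.initialDataSize = seed.size();
    ci.pInitialData = seed.empty() ? nullptr : seed.data();

    VkPipelineCache cache = VK_NULL_HANDLE;
    vkCreatePipelineCache(dev, &ci, nullptr, &cache);

    // The expensive part: compile every pipeline the game knows it will need.
    compileAllKnownPsos(dev, cache);

    // Persist the cache blob so the next launch skips most of the work.
    size_t size = 0;
    vkGetPipelineCacheData(dev, cache, &size, nullptr);
    std::vector<char> blob(size);
    vkGetPipelineCacheData(dev, cache, &size, blob.data());
    std::ofstream{"pso_cache.bin", std::ios::binary}
        .write(blob.data(), static_cast<std::streamsize>(size));

    vkDestroyPipelineCache(dev, cache, nullptr);
}
```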
The issue of extremely long shader compilation processes is overblown by a few outliers, like this game and Detroit, for example, and usually by people on mid-to-low-end CPUs. 16/24/32 thread CPUs chew through this stuff quite quickly. ...
Most people don't have even 16T CPUs, and as an owner of a 24T CPU I can't really agree that it's "quite quickly" in cases where it is actually visible.
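The reason thread count matters so much here is that PSO compilation is embarrassingly parallel, so wall time scales roughly with jobs divided by threads. A trivial sketch of the fan-out; compileOne and the job list are placeholders.

```cpp
// Sketch: fan PSO compilation out across all hardware threads. With N
// independent pipelines and T threads, wall time scales roughly with N/T,
// which is why high-thread-count CPUs chew through a precompile step so much
// faster. compileOne() and the job list are hypothetical placeholders.
#include <algorithm>
#include <future>
#include <thread>
#include <vector>

void compileAll(const std::vector<int>& psoJobs, void (*compileOne)(int)) {
    const unsigned threads = std::max(1u, std::thread::hardware_concurrency());
    std::vector<std::future<void>> inFlight;

    for (int job : psoJobs) {
        inFlight.push_back(std::async(std::launch::async, compileOne, job));
        if (inFlight.size() >= threads) {        // cap jobs in flight at T
            inFlight.front().get();
            inFlight.erase(inFlight.begin());
        }
    }
    for (auto& f : inFlight) f.get();            // drain the remainder
}
```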
And I disagree with the last point, because I think long before it ever gets THAT bad, there will be infrastructure in place to augment the shader pre-compilation process by connecting to a server which will throw tons of CPU cores at the issue, compiling shaders many times quicker. ...
This is likely to never happen.
Ok but you haven’t addressed the question of why this greater hardware flexibility doesn't accommodate the SSO API (which you seem to think is less flexible). ...
Even according to the link he posted, NVIDIA hardware is more flexible and faster in all modes:
....18:59 karolherbst: well, unless I missed anything, at least on Nvidia hardware it should be all the same in the end, just that pipelines objects might use more memory?
18:59 gfxstrand: karolherbst: Yes, this is ideal for NVIDIA
19:03 gfxstrand: karolherbst: Yeah, NVIDIA really is the only hardware where all this is easy.
That's because they either don't generate many PSOs, don't compile them all during the precompilation step, or use the options available to limit PSO numbers on PCs - or a mix of all three. ...
Right... which is what developers can do when they build a game with PC in mind from the start.
Just because it's not working on the application side doesn't mean it's not working from a hardware/driver design perspective. PSOs are a little more conservative in their design about making assumptions in how hardware designs work than developers would like, but the benefit is performance through safety on more configurations, since drivers can resolve state explicitly ahead of time ...
Yes, but the point is we’ve already tried that and it’s not working.
SSOs just represent one of the many other ways to apply state changes. Straight up, there's no flexibility gained or lost with SSOs in comparison to PSOs or GPL with regard to state changes. You can do the same state changes in all three models. Each design just has its own implications as to how fast/slow these state changes are for the hardware ...
Ok but you haven’t addressed the question of why this greater hardware flexibility doesn't accommodate the SSO API (which you seem to think is less flexible). You’re making points from two conflicting perspectives. Either the hardware is flexible enough to support PSO/GPL/SSO or it isn’t.
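To ground the "same state change in all three models" point, here is a sketch of one state change (toggling the depth test) expressed two ways in Vulkan: as a swap between prebuilt PSOs, and as a dynamic-state call (Vulkan 1.3, or vkCmdSetDepthTestEnableEXT with VK_EXT_extended_dynamic_state). The pipelines and command buffer setup are assumed.

```cpp
// Sketch: the same state change (toggling the depth test) expressed two ways.
// Under the strict PSO model it is a swap between pipelines compiled ahead of
// time; with dynamic state it is a cheap call recorded alongside the draw.
// Pipelines and the command buffer are assumed to exist already.
#include <vulkan/vulkan.h>

// PSO model: two immutable pipelines, one per depth-test setting.
void drawPsoStyle(VkCommandBuffer cmd, VkPipeline psoDepthOn,
                  VkPipeline psoDepthOff, bool depthTest) {
    vkCmdBindPipeline(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS,
                      depthTest ? psoDepthOn : psoDepthOff);
    vkCmdDraw(cmd, 3, 1, 0, 0);
}

// Dynamic-state model (Vulkan 1.3): one pipeline, the toggle is set at record time.
void drawDynamicStyle(VkCommandBuffer cmd, VkPipeline pso, bool depthTest) {
    vkCmdBindPipeline(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, pso);
    vkCmdSetDepthTestEnable(cmd, depthTest ? VK_TRUE : VK_FALSE);
    vkCmdDraw(cmd, 3, 1, 0, 0);
}
```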