Shader Compilation on PC: About to become a bigger bottleneck?

It’s not really believable that AMD could scare/coerce Microsoft into doing anything. What was their leverage? I assume Mantle gained traction because some very influential developers were clamoring to be unshackled from the heavy-handed APIs of the day. Mantle offered that promise and Microsoft probably just got caught up in the hype.

Fast forward to today and it’s clear everyone bit off more than they could chew with the Mantle approach. Developers really oversold their ability to do efficient state and memory management in modern games. Lesson learned, and now we need to recalibrate.
 
Yep, Microsoft took the Mantle "core" as the base of D3D12; it even shares the resource model... heck, even whole parts of the documentation, etc.

It also makes sense when you realize that a "similar" API is used on the XSX/XSS.
 
It’s not really believable that AMD could scare/coerce Microsoft into doing anything. What was their leverage? I assume Mantle gained traction because some very influential developers were clamoring to be unshackled from the heavy-handed APIs of the day. Mantle offered that promise and Microsoft probably just got caught up in the hype.

Fast forward to today and it’s clear everyone bit off more than they could chew with the Mantle approach. Developers really oversold their ability to do efficient state and memory management in modern games. Lesson learned, and now we need to recalibrate.

Yeah, I just don't get that argument either. The number of games that adopted Mantle was tiny. I remember running some on my Radeon 270X at the time, and frankly it was really hit or miss: one of the problems was increased VRAM usage (surprise!), which could make it run worse. There were some wins, such as Thief, where it really benefited my dual-core CPU at the time, but it also had some odd behavior relative to the DX11 path to boot. It was more of a curiosity than any real advantage at the time, and I struggle to think what kind of 'threat' MS could see this as. Obviously, D3D12 shares much in common with Mantle, but I don't necessarily see that involvement as coerced, given the power imbalance.

AMD's market penetration relative to Nvidia was better in 2017 than today, but that isn't saying much:

[attached chart: AMD vs. Nvidia GPU market share over time]
 
It was more of a curiosity than any real advantage at the time, and I struggle to think what kind of 'threat' MS could see this as.
As the quote above says: "At least in the time that I've been here, it has been quite uncommon - D3D has often published features first."
Mantle changed that and threatened to make Microsoft's leadership in deciding how the main gaming APIs on PC evolve a thing of the past, potentially making Windows irrelevant as a gaming platform.
That is a big threat to their core business, and one they had to react to quickly.
 
Possibly, but then that puts them smack dab back in the territory of custom-tailored drivers for every new game coming out, like in the DX11 days. I don't think any manufacturer wants to go back to needing separate internal driver codepaths for every individual popular game.

I'm thinking along the lines of DXVK: a translation layer that runs D3D9/10/11 on Vulkan. Or vkd3d-proton, translating D3D12 to Vulkan. Both have very good performance. So my thought is, you write D3D or Vulkan and it gets translated to a native API for Nvidia or AMD. Yeah, every time the native API changes the translation layer has to be updated, and that's a risk. If someone writes a program to target the native API directly, you have some backwards-compatibility issues to worry about. At the same time, you get real API abstractions that actually map to the hardware, so you can get the best out of everything. It probably wouldn't go that way, but right now the situation is that you have the two major players with different hardware designs, and a single API style doesn't map well to both of them.
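
To make that layering concrete, here's roughly what it would look like; purely a sketch, and every type/function name in it is hypothetical since no such vendor-native PC APIs actually exist:

```cpp
// A sketch of the layering described above: the game targets a common API,
// and a per-vendor translation backend turns that into hypothetical
// vendor-native commands (the role DXVK / vkd3d-proton play for Vulkan today).
// Every type and function name here is made up for illustration.
#include <cstdint>
#include <memory>

struct DrawArgs { uint32_t vertexCount = 0; uint32_t firstVertex = 0; };

// The stable, vendor-neutral surface the game codes against (think D3D/Vulkan).
class ICommandRecorder {
public:
    virtual ~ICommandRecorder() = default;
    virtual void Draw(const DrawArgs& args) = 0;
};

// One translation backend per hypothetical vendor-native API.
class NvNativeRecorder final : public ICommandRecorder {
public:
    void Draw(const DrawArgs&) override { /* encode Nvidia-native packets */ }
};

class AmdNativeRecorder final : public ICommandRecorder {
public:
    void Draw(const DrawArgs&) override { /* encode AMD-native packets */ }
};

// Picked once at startup based on the installed GPU; the game never sees which
// backend it got, which is what leaves the native APIs free to change.
std::unique_ptr<ICommandRecorder> CreateRecorder(bool isNvidia) {
    if (isNvidia) return std::make_unique<NvNativeRecorder>();
    return std::make_unique<AmdNativeRecorder>();
}
```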
 
I'm thinking along the lines of DXVK: a translation layer that runs D3D9/10/11 on Vulkan. Or vkd3d-proton, translating D3D12 to Vulkan. Both have very good performance. So my thought is, you write D3D or Vulkan and it gets translated to a native API for Nvidia or AMD. Yeah, every time the native API changes the translation layer has to be updated, and that's a risk. If someone writes a program to target the native API directly, you have some backwards-compatibility issues to worry about. At the same time, you get real API abstractions that actually map to the hardware, so you can get the best out of everything. It probably wouldn't go that way, but right now the situation is that you have the two major players with different hardware designs, and a single API style doesn't map well to both of them.

Drivers essentially translate from the common API to the IHV-specific native interfaces today. I think translation works well only if there's a 1:1 mapping between API features and hardware features. Otherwise you're also going to need emulation, which could be super slow. If a developer wanted to use SSOs in Vulkan, for example, AMD could theoretically emulate the behavior in a translation layer, but it would likely have terrible performance because it doesn't map well to the hardware.
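
For context, "SSOs" here presumably means the separate shader objects model (VK_EXT_shader_object), where each stage is compiled and bound on its own instead of baking monolithic pipelines. A rough sketch of what that looks like for the app (error handling omitted):

```cpp
// Rough sketch of the separate-shader-object style (VK_EXT_shader_object):
// stages are compiled and bound individually, with the remaining state set
// dynamically, instead of baking a monolithic VkPipeline per state combination.
// Error handling omitted; in a real app these extension entry points are
// fetched with vkGetDeviceProcAddr after enabling the extension.
#include <vulkan/vulkan.h>

VkShaderEXT CreateVertexShaderObject(VkDevice device,
                                     const uint32_t* spirv, size_t spirvBytes,
                                     VkDescriptorSetLayout setLayout)
{
    VkShaderCreateInfoEXT info{};
    info.sType          = VK_STRUCTURE_TYPE_SHADER_CREATE_INFO_EXT;
    info.stage          = VK_SHADER_STAGE_VERTEX_BIT;
    info.nextStage      = VK_SHADER_STAGE_FRAGMENT_BIT;  // stages it may feed
    info.codeType       = VK_SHADER_CODE_TYPE_SPIRV_EXT;
    info.codeSize       = spirvBytes;
    info.pCode          = spirv;
    info.pName          = "main";
    info.setLayoutCount = 1;
    info.pSetLayouts    = &setLayout;

    VkShaderEXT shader = VK_NULL_HANDLE;
    vkCreateShadersEXT(device, 1, &info, nullptr, &shader);
    return shader;
}

void BindStages(VkCommandBuffer cmd, VkShaderEXT vs, VkShaderEXT fs)
{
    const VkShaderStageFlagBits stages[] = { VK_SHADER_STAGE_VERTEX_BIT,
                                             VK_SHADER_STAGE_FRAGMENT_BIT };
    const VkShaderEXT shaders[] = { vs, fs };
    vkCmdBindShadersEXT(cmd, 2, stages, shaders);
    // ...followed by vkCmdSet* dynamic state calls and vkCmdDraw* as usual.
}
```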

If AMD and Nvidia were to expose custom APIs, game developers would need to implement their renderers multiple times, once per vendor, in order to fully unlock the hardware, similar to consoles. It could probably work for big engines like UE but would be a royal pain for most PC devs.

The current Vulkan model of a core API plus IHV-specific extensions is a good compromise. It gives developers the option to write custom paths if they want to take full advantage of certain hardware, but they're not forced to do it.
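
In practice that compromise is just a capability check: keep a portable core path and opt into an IHV extension only when the device reports it. A minimal sketch, using VK_NV_mesh_shader purely as an example of a vendor-specific extension:

```cpp
// Sketch of the "core API + IHV extension" compromise: keep a portable
// baseline path and take the vendor-specific path only if the device
// actually exposes the extension.
#include <vulkan/vulkan.h>
#include <cstring>
#include <vector>

bool DeviceHasExtension(VkPhysicalDevice gpu, const char* name)
{
    uint32_t count = 0;
    vkEnumerateDeviceExtensionProperties(gpu, nullptr, &count, nullptr);
    std::vector<VkExtensionProperties> props(count);
    vkEnumerateDeviceExtensionProperties(gpu, nullptr, &count, props.data());
    for (const VkExtensionProperties& p : props)
        if (std::strcmp(p.extensionName, name) == 0)
            return true;
    return false;
}

void ChooseGeometryPath(VkPhysicalDevice gpu)
{
    if (DeviceHasExtension(gpu, "VK_NV_mesh_shader")) {
        // Enable the extension at device creation and take the vendor path.
    } else {
        // Stick with the portable core-Vulkan vertex/compute path.
    }
}
```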
 
@trinibwoy I guess maybe it makes more sense for the drivers to just support the various APIs directly, as they do now, but potentially expose a native API the way AMD did with Mantle. I know the fears of having games that only run on particular GPUs. Just following a lot of the conversation about a single API extension, it's pretty clear there's a split in the industry about how the API should work, and you have people on both sides saying one of the GPU vendors needs to change its hardware to either add or remove state. It seems weird to me to handcuff the hardware based on a consensus model created by an API committee, especially when all sides of the argument complain that the API body has created a mess.
 
Am I understanding it correctly that Nvidia hardware is what needs to change and AMD’s approach is preferable?
 
Am I understanding it correctly that Nvidia hardware is what needs to change and AMD’s approach is preferable?

I don't know if it's that easy. The same person who wrote this blog also wrote this Twitter thread.


Her take is basically "pipelines suck," and something like the entity state objects extension is where they'd like to be in an "ideal world." It's easier to implement on Nvidia, a bit of a pain on AMD. Not sure if it's actually going to work out.

But even though it's harder to implement on AMD, maybe they'd still say the direct access model is preferable on the hardware side. Really don't know.
 
I don't know if it's that easy. The same person who wrote this blog also wrote this Twitter thread.


Her take is basically "pipelines suck," and something like the entity state objects extension is where they'd like to be in an "ideal world." It's easier to implement on Nvidia, a bit of a pain on AMD. Not sure if it's actually going to work out.

But even though it's harder to implement on AMD, maybe they'd still say the direct access model is preferable on the hardware side. Really don't know.
Do you know which hardware vendor she previously worked for?
 
Her take is basically "pipelines suck," and something like the entity state objects extension is where they'd like to be in an "ideal world." It's easier to implement on Nvidia, a bit of a pain on AMD. Not sure if it's actually going to work out.

But even though it's harder to implement on AMD, maybe they'd still say the direct access model is preferable on the hardware side. Really don't know.

I don’t think the resource binding / descriptor post is about the SSO/GPL pipeline issue. They’re related but separate topics.

DX12 defines shader resources (textures, constants, buffers, etc.) as a heap, which is basically a contiguous array of pointers to those resources. It seems Nvidia hardware is built around the same model, where resources are accessed by going through the heap. According to the same post, AMD hardware is more flexible and supports direct access to resources anywhere in memory without needing the intermediate heap reference. In this case you could argue the DX12 binding model was built around Nvidia hardware, as it doesn’t expose AMD’s direct-access capability.
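
To make the "contiguous array" framing concrete, this is roughly what the D3D12 side looks like; a minimal sketch with error handling omitted:

```cpp
// Minimal sketch of the heap model described above: one shader-visible
// descriptor heap, and a "binding" is just heap start + index * stride.
// Everything the shader touches has to be reachable through this array; there
// is no way to hand it a descriptor living elsewhere in memory, which is the
// direct-access flexibility AMD hardware reportedly has.
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

ComPtr<ID3D12DescriptorHeap> CreateSrvHeap(ID3D12Device* device, UINT numDescriptors)
{
    D3D12_DESCRIPTOR_HEAP_DESC desc = {};
    desc.Type           = D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV;
    desc.NumDescriptors = numDescriptors;   // up to ~10^6 on tiers 2/3
    desc.Flags          = D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE;

    ComPtr<ID3D12DescriptorHeap> heap;
    device->CreateDescriptorHeap(&desc, IID_PPV_ARGS(&heap));
    return heap;
}

D3D12_GPU_DESCRIPTOR_HANDLE DescriptorAt(ID3D12Device* device,
                                         ID3D12DescriptorHeap* heap,
                                         UINT index)
{
    const UINT stride = device->GetDescriptorHandleIncrementSize(
        D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV);
    D3D12_GPU_DESCRIPTOR_HANDLE handle = heap->GetGPUDescriptorHandleForHeapStart();
    handle.ptr += static_cast<UINT64>(index) * stride;
    return handle;
}
```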
 
Am I understanding it correctly that Nvidia hardware is what needs to change and AMD’s approach is preferable?
Nvidia h/w may change (not a big problem right now, as it works fine as is) to be less restrictive in its descriptor heaps, but that doesn't mean AMD h/w shouldn't also change to handle state changes without huge bubbles. These issues aren't directly connected.
 
I don’t think the resource binding / descriptor post is about the SSO/GPL pipeline issue. They’re related but separate topics.

DX12 defines shader resources (textures, constants, buffers, etc.) as a heap, which is basically a contiguous array of pointers to those resources. It seems Nvidia hardware is built around the same model, where resources are accessed by going through the heap. According to the same post, AMD hardware is more flexible and supports direct access to resources anywhere in memory without needing the intermediate heap reference. In this case you could argue the DX12 binding model was built around Nvidia hardware, as it doesn’t expose AMD’s direct-access capability.
Yes and no; some details are explained in the following...

Yes, since D3D12 tries to match the resource limits of Nvidia hardware. One of Nvidia's texture binding models consists of a 32-bit UINT index into a table, where 12 bits are assigned to the sampler and 20 bits to the image. A shader resource view can be either a buffer or a texture; the buffer case isn't very interesting here, but the texture case shows up in D3D12's resource binding model. The 10^6 SRV limit on resource binding tiers 2/3 matches 2^20 minus 48K reserved for internal driver use, hence the derived SRV limit. The sampler limit is similarly derived from Nvidia hardware: we have 4096 slots in total, split as follows: 2032 allocated for static samplers, 2048 for dynamic samplers (the number given for resource binding tiers 2/3), and 16 for internal driver use...
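
For anyone checking the arithmetic, the derived limits above (using the post's own numbers) work out like this:

```cpp
// Sanity-checking the derived limits quoted above, using the post's numbers.
#include <cstdint>

// 32-bit texture binding index: 20 bits for the image, 12 for the sampler.
constexpr uint32_t kImageBits   = 20;
constexpr uint32_t kSamplerBits = 12;

// 2^20 entries minus the 48K reserved for the driver lands at roughly 10^6,
// matching the SRV limit D3D12 documents for resource binding tiers 2/3.
constexpr uint32_t kDerivedSrvLimit = (1u << kImageBits) - 48 * 1024;
static_assert(kDerivedSrvLimit == 999'424, "2^20 - 48K");

// 2^12 = 4096 sampler slots: 2032 static + 2048 dynamic (the tier 2/3 sampler
// limit) + 16 reserved for internal driver use.
static_assert((1u << kSamplerBits) == 2032 + 2048 + 16, "sampler slot split");
```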

No, because D3D12 exposes bindless constant buffers, which is a slow path on Nvidia hardware, and developers use bindless constant buffers since root-descriptor constant buffers didn't initially support bounds checking. Bindless textures on Nvidia are designed for use with combined image/samplers as single unified bindings, exactly as in OpenGL (and optionally in Vulkan), while on Direct3D 10+ images and samplers have always been separate bindings. Developers also like modifying descriptors and reinterpreting them, so, as an example, SRV descriptors can change into CBV/UAV/sampler descriptors and vice versa; literally any of those descriptor types can mutate into any other. In Vulkan this feature is known as mutable descriptor types, and Mantle offered the same functionality with "generic" descriptors...
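
The combined-vs-separate binding difference shows up directly in a Vulkan descriptor set layout; a small core-Vulkan sketch of the two styles:

```cpp
// The combined-vs-separate distinction mentioned above, at the level of a
// Vulkan descriptor set layout (core Vulkan). Nvidia's bindless texture path
// is built around the first style; D3D10+ semantics correspond to the second.
#include <vulkan/vulkan.h>
#include <array>

std::array<VkDescriptorSetLayoutBinding, 3> ExampleBindings()
{
    // OpenGL-style: one binding carries the image and its sampler together.
    VkDescriptorSetLayoutBinding combined{};
    combined.binding         = 0;
    combined.descriptorType  = VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER;
    combined.descriptorCount = 1;
    combined.stageFlags      = VK_SHADER_STAGE_FRAGMENT_BIT;

    // D3D-style: the image and the sampler are independent bindings that the
    // shader pairs up itself.
    VkDescriptorSetLayoutBinding image{};
    image.binding         = 1;
    image.descriptorType  = VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE;
    image.descriptorCount = 1;
    image.stageFlags      = VK_SHADER_STAGE_FRAGMENT_BIT;

    VkDescriptorSetLayoutBinding sampler{};
    sampler.binding         = 2;
    sampler.descriptorType  = VK_DESCRIPTOR_TYPE_SAMPLER;
    sampler.descriptorCount = 1;
    sampler.stageFlags      = VK_SHADER_STAGE_FRAGMENT_BIT;

    return {{ combined, image, sampler }};
}
```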
 