Direct3D feature levels discussion

OK, GS bypass is still not supported. Move along, AMD; I want to defenestrate the need for geometry shaders wherever possible.
 
OK, GS bypass is still not supported. Move along, AMD; I want to defenestrate the need for geometry shaders wherever possible.
AMD certainly supports GS bypass. The OpenGL extension has been available for a long time already. AMD has slower geometry shaders than Intel and NVIDIA, which makes this feature more important for them. My measurements (on a Radeon HD 7970) show a 2.7x performance drop just from enabling the GS stage (simple GS: output = input).

Btw, what is the status of the Intel and NVIDIA drivers? All three IHVs have hardware support for GS bypass.
 
OK, I've asked around and got confirmation that AMD HW supports it. ( :

As for Intel, the HD 4400 iGPU in my Surface Pro 3 reports it as supported too. Unfortunately I don't have a D3D12-capable GeForce to query that cap-bit.
 
Last I checked the caps it was on for NVIDIA Maxwell 2 but off for Maxwell 1. I don't know if there are hardware differences there or whether they just have yet to implement it across all the architectures though. I also didn't test if it actually works yet, etc.

Should be working fine on all Intel parts, although obviously bugs are always possible with new features so let me know if you run into any :)
 
Last I checked the caps it was on for NVIDIA Maxwell 2 but off for Maxwell 1. I don't know if there are hardware differences there or whether they just have yet to implement it across all the architectures though. I also didn't test if it actually works yet, etc.
There are differences. Maxwell 2 supports an RT bitmask instead of an RT index. This allows Maxwell 2 to replicate the triangle to N viewports if needed. This is super nice for some algorithms.
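To make the index vs. bitmask difference concrete, here is a minimal sketch in plain Python. It is not any real driver or shader API; the function names and the 16-slot limit are assumptions chosen for illustration only.

```python
# Illustrative only: models "route a triangle by index" vs. "route by bitmask".
# SLOT_COUNT and both function names are hypothetical, not a real API.
SLOT_COUNT = 16


def targets_from_index(index):
    """An index selects exactly one viewport / render target slice."""
    if not 0 <= index < SLOT_COUNT:
        raise ValueError("index out of range")
    return [index]


def targets_from_mask(mask):
    """A bitmask selects every slot whose bit is set, so one input
    triangle can be replicated to N targets."""
    if mask < 0 or mask >> SLOT_COUNT:
        raise ValueError("mask has bits outside the slot range")
    return [slot for slot in range(SLOT_COUNT) if mask & (1 << slot)]
```

With an index, `targets_from_index(3)` names one destination; with a mask, `targets_from_mask(0b1011)` names slots 0, 1 and 3 at once, which is the replication case described above.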
Should be working fine on all Intel parts, although obviously bugs are always possible with new features so let me know if you run into any :)
Intel's geometry shaders are so fast that they are actually usable. Yet another proof of alien technology :D
Bypass will be better of course, but it is not as crucial as with other GPU brands.
 
AMD certainly supports GS bypass. The OpenGL extension has been available for a long time already. AMD has slower geometry shaders than Intel and NVIDIA, which makes this feature more important for them. My measurements (on a Radeon HD 7970) show a 2.7x performance drop just from enabling the GS stage (simple GS: output = input).
Have you tried Tonga or Fiji? There will still be a drop but in a lot of cases they will perform better.

How much does Intel drop in your test?
 
Have you tried Tonga or Fiji? There will still be a drop but in a lot of cases they will perform better.

How much does Intel drop in your test?
I have only tested with GCN gen 1 (1.0) and gen 2 (1.1). Both behave similarly. Our development computers have high-end cards, so no Tongas. That test was done before Fiji (Fury X) existed, and Fury X is still impossible to obtain in Finland.

The Kepler GK110 (GTX 780) slowdown was 1.5x. That is significantly better than AMD but still useless for shadow map rendering. I could re-run the test with Maxwell when I have time. Same goes for Intel. Xeons do not have GPUs, so I need to borrow a test laptop for Intel testing and optimization.
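As a rough sketch of what those factors mean for a frame budget, the snippet below scales a baseline pass time by the reported slowdown. It assumes the "performance drop" translates directly into a longer pass time, and the 2.0 ms baseline in the usage note is made up, not a measurement.

```python
# Slowdown factors reported in this thread (pass-through GS enabled vs. no GS).
GS_SLOWDOWN = {
    "Radeon HD 7970": 2.7,
    "GTX 780 (GK110)": 1.5,
}


def pass_time_with_gs(baseline_ms, slowdown):
    """Estimated pass time once the GS stage is enabled.
    Assumes the slowdown factor applies to the whole pass."""
    return baseline_ms * slowdown


def extra_cost_ms(baseline_ms, slowdown):
    """Additional milliseconds paid purely for having a GS bound."""
    return pass_time_with_gs(baseline_ms, slowdown) - baseline_ms
```

For a hypothetical 2.0 ms shadow pass, `extra_cost_ms(2.0, GS_SLOWDOWN["Radeon HD 7970"])` is the time lost to the GS stage alone, which is why even the 1.5x case is hard to justify for shadow maps.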

Secretly I hope that Intel releases a Xeon with 128 MB of EDRAM, a top-model GPU (Skylake supports FL 12.0) and 8 cores (16 threads). That would be perfect for a graphics programmer's workstation. It would make it easy to switch between the integrated GPU and one of the discrete ones, and would make DX12 explicit multiadapter (offload the tail of the frame to the iGPU) development much easier. It shouldn't be impossible to fit 8 cores + GPU inside the die, since 24-core (48-thread) Broadwell-EX chips already exist.

For Intel GS results, I recommend reading this:
http://www.joshbarczak.com/blog/?p=667
 
There are differences. Maxwell 2 supports an RT bitmask instead of an RT index. This allows Maxwell 2 to replicate the triangle to N viewports if needed. This is super nice for some algorithms.
Yes Maxwell 2 has stuff that goes above and beyond like the ability to "add" some attributes in GS but pass others through, and as you note the ability to multicast to several viewports. But I still thought that most NVIDIA hardware should be able to support the DX12 feature as-is, not just Maxwell 2, right? Guess it's probably just a driver thing.

Then again if everyone supports it, not sure why we need the cap bit ;)

Intel's geometry shaders are so fast that they are actually usable. Yet another proof of alien technology :D
Pro-tip: if you want something to be fast in Intel hardware, get it into a popular benchmark application ;)

In all seriousness though it's not terribly hard to make ~1:1 GS fast, I just don't think it has been very high priority for AMD/NVIDIA because game devs seem content to just not use it as long as it remains slow. If only that worked for other annoying/complex features too ;) Anyways with current consoles being what they are I don't anticipate this changing, even though GS does remain the most natural place to do a lot of stuff (including output VP index) and there's no need for it to be slow when it's 1:1.
 
I am pretty sure that all GPUs should now be able to use UAVs across all shader stages, even on FL 11.0 (Tier 1 of resource binding requires at least 8 UAV slots across all shader stages). I am not sure why this is not allowed in D3D11 (for example through a cap-bit).

This would be quite a non-trivial change, since feature level 11_0 in Direct3D 12 would be a superset of the same level in Direct3D 11... which brings us back to the question of whether "LEVEL_A" etc. would be a better naming scheme for Direct3D 12.

I had the time to take a look through whatever small bits MSDN has to offer on UAVs at every stage in Direct3D 11:
https://msdn.microsoft.com/en-us/li...=vs.85).aspx#use_uavs_at_every_pipeline_stage

I also took a closer look at resource heaps, resource binding and resource descriptors:
https://software.intel.com/en-us/articles/introduction-to-resource-binding-in-microsoft-directx-12
https://msdn.microsoft.com/en-us/library/windows/desktop/dn899109(v=vs.85).aspx
https://msdn.microsoft.com/en-us/library/windows/desktop/dn770451(v=vs.85).aspx
etc.



I believe it's certain that Kepler/Maxwell do in fact support UAVs in every pipeline stage in Direct3D 12, since UAVs are just standard resource descriptors in a descriptor heap which is shared across all stages by design, and Max McMullen talked about the 64 "slot" limitation on RB Tier 2 in an earlier post, so it's quite easy to guess which vendor he was talking about since there are only three of them.
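The slot counts mentioned in this thread can be put into a small sanity check. This is only a sketch: the table uses the two figures quoted here (8 slots as the Tier 1 minimum, 64 for Tier 2), treats the limit as a total shared by all stages, and the function and stage names are hypothetical.

```python
# UAV slots shared across all shader stages, using the figures quoted in
# this thread. Tier 1 is the stated minimum (8); Tier 2 is the 64-slot limit.
UAV_SLOTS_BY_TIER = {1: 8, 2: 64}


def fits_uav_budget(uavs_per_stage, tier):
    """Return True if the UAVs used by all stages together fit the
    assumed slot budget of the given resource binding tier."""
    total = sum(uavs_per_stage.values())
    return total <= UAV_SLOTS_BY_TIER[tier]
```

For example, `fits_uav_budget({"vertex": 2, "geometry": 1, "pixel": 5}, tier=1)` uses exactly 8 slots and fits, while one more UAV in any stage would need the Tier 2 budget.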

All the confusion, including the clarifying statement on Guru3D, stems from the fact that UAVs in every stage were tied with increased UAV slot count in Direct3D 11.1 and were not made two separate optional features on feature level 11_0 - and whatever reasons Microsoft had for enforcing this requirement, they do not seem to be valid for current Direct3D 12 hardware anymore....


We just need someone to actually test it on Kepler/Maxwell 1 to confirm.
 