DmitryKo: Actually, MSDN has been updated with a detailed description of the GpuMmu virtual addressing model in WDDM 2.0...
> OK, GS bypass is still not supported. Move along AMD, I want to defenestrate the need for geometry shaders wherever possible.
I actually had to google that word to make sure it was real, thank you for my new word of the day.
> OK, GS bypass is still not supported. Move along AMD, I want to defenestrate the need for geometry shaders wherever possible.
AMD certainly supports GS bypass. An OpenGL extension has been available for a long time already. AMD has slower geometry shaders than Intel and NVIDIA, increasing the importance of this feature for them. My measurements (on a Radeon HD 7970) show a 2.7x performance drop just from enabling the GS stage (simple GS: output = input).
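For context, a pass-through geometry shader of the kind described above ("output = input") might look roughly like the sketch below; the vertex layout and the D3DCompile plumbing are just illustrative assumptions. The point of GS bypass is that the vertex shader can write SV_ViewportArrayIndex / SV_RenderTargetArrayIndex itself, so even this do-nothing stage can be dropped.

```cpp
// Hypothetical sketch of a do-nothing pass-through geometry shader, embedded
// as an HLSL string the way it might be handed to D3DCompile. Merely having
// this stage bound is what costs the ~2.7x in the GCN1 measurement above.
#include <d3dcompiler.h>
#pragma comment(lib, "d3dcompiler.lib")

static const char kPassthroughGS[] = R"(
struct V2G {
    float4 pos : SV_Position;
    float2 uv  : TEXCOORD0;
};

[maxvertexcount(3)]
void main(triangle V2G input[3], inout TriangleStream<V2G> stream)
{
    // output = input: forward the triangle unchanged.
    [unroll]
    for (int i = 0; i < 3; ++i)
        stream.Append(input[i]);
    stream.RestartStrip();
}
)";

HRESULT CompilePassthroughGS(ID3DBlob** blob)
{
    return D3DCompile(kPassthroughGS, sizeof(kPassthroughGS) - 1,
                      nullptr, nullptr, nullptr,
                      "main", "gs_5_0", 0, 0, blob, nullptr);
}
```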
> Last I checked the caps it was on for NVIDIA Maxwell 2 but off for Maxwell 1. I don't know if there are hardware differences there or whether they just have yet to implement it across all the architectures, though. I also didn't test whether it actually works yet, etc.
There are differences. Maxwell 2 supports an RT bitmask instead of an RT index. This allows Maxwell 2 to replicate the triangle to N viewports if needed. This is super nice for some algorithms.
> Should be working fine on all Intel parts, although obviously bugs are always possible with new features, so let me know if you run into any.
Intel's geometry shaders are so fast that they are actually usable. Yet another proof of alien technology.
> What is GS bypass? Haven't come across that yet.
DX12 calls it VPAndRTArrayIndexFromAnyShaderFeedingRasterizerSupportedWithoutGSEmulation.
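If anyone wants to test for it at runtime, the cap sits in the D3D12 options struct; a minimal sketch, assuming device is an already-created ID3D12Device*:

```cpp
// Minimal sketch: query the GS-bypass cap through CheckFeatureSupport.
// Assumes 'device' is an already-created ID3D12Device*.
#include <d3d12.h>

bool SupportsGSBypass(ID3D12Device* device)
{
    D3D12_FEATURE_DATA_D3D12_OPTIONS options = {};
    if (FAILED(device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS,
                                           &options, sizeof(options))))
        return false;

    // TRUE means SV_ViewportArrayIndex / SV_RenderTargetArrayIndex can be
    // written by the shader stage feeding the rasterizer (e.g. the vertex
    // shader) without the runtime falling back to a geometry shader.
    return options.VPAndRTArrayIndexFromAnyShaderFeedingRasterizerSupportedWithoutGSEmulation != FALSE;
}
```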
> Choose yourself whether you use the OpenGL extension name or the most awesome ever DX12 name. But choose wisely.
I don't think I'm ready for such responsibility yet, sensei. But that is indeed awesome nameage.
> AMD certainly supports GS bypass. An OpenGL extension has been available for a long time already. AMD has slower geometry shaders than Intel and NVIDIA, increasing the importance of this feature for them. My measurements (on a Radeon HD 7970) show a 2.7x performance drop just from enabling the GS stage (simple GS: output = input).
Have you tried Tonga or Fiji? There will still be a drop, but in a lot of cases they will perform better.
> Have you tried Tonga or Fiji? There will still be a drop, but in a lot of cases they will perform better.
I have only tested with GCN gen1 (1.0) and gen2 (1.1). Both behave similarly. Our development computers have high-end cards, so no Tongas. That test was done before Fiji (Fury X) existed, and the Fury X is still impossible to obtain in Finland.
How much does Intel drop in your test?
> Are there any features that Maxwell 1 supports that Kepler doesn't?
Typed UAV Loads. Kepler should not support them.
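For reference, typed UAV loads are exposed as a cap rather than being tied to a feature level; a rough sketch of how one might check them per format (the three R32 formats are always supported, everything else needs the additional-formats cap plus a per-format check):

```cpp
// Sketch: check typed UAV load support for a given format.
#include <d3d12.h>

bool SupportsTypedUAVLoad(ID3D12Device* device, DXGI_FORMAT format)
{
    // The three 32-bit single-channel formats always support typed UAV loads.
    if (format == DXGI_FORMAT_R32_FLOAT || format == DXGI_FORMAT_R32_UINT ||
        format == DXGI_FORMAT_R32_SINT)
        return true;

    // Anything else needs the TypedUAVLoadAdditionalFormats cap...
    D3D12_FEATURE_DATA_D3D12_OPTIONS options = {};
    if (FAILED(device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS,
                                           &options, sizeof(options))) ||
        !options.TypedUAVLoadAdditionalFormats)
        return false;

    // ...plus a per-format confirmation through Support2.
    D3D12_FEATURE_DATA_FORMAT_SUPPORT fmt = { format };
    if (FAILED(device->CheckFeatureSupport(D3D12_FEATURE_FORMAT_SUPPORT,
                                           &fmt, sizeof(fmt))))
        return false;

    return (fmt.Support2 & D3D12_FORMAT_SUPPORT2_UAV_TYPED_LOAD) != 0;
}
```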
> Just... wow
'Cause ViewportAndRenderTargetArrayIndexFromAnyShaderFeedingRasterizerSupportedWithoutGeometryShaderEmulation was too verbose... u.u
> There are differences. Maxwell 2 supports an RT bitmask instead of an RT index. This allows Maxwell 2 to replicate the triangle to N viewports if needed. This is super nice for some algorithms.
Yes, Maxwell 2 has stuff that goes above and beyond, like the ability to "add" some attributes in the GS but pass others through, and, as you note, the ability to multicast to several viewports. But I still thought that most NVIDIA hardware should be able to support the DX12 feature as-is, not just Maxwell 2, right? Guess it's probably just a driver thing.
> Intel's geometry shaders are so fast that they are actually usable. Yet another proof of alien technology.
Pro tip: if you want something to be fast on Intel hardware, get it into a popular benchmark application.
I am pretty sure that all GPUs now should be able to use UAVs across all shader stages, even on FL 11.0 (Tier 1 of resource binding requires at least 8 UAV slots across all shader stages). I am not sure why this is not allowed in D3D11 (e.g. through a cap bit).
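As a quick illustration of where that tier figure comes from, it can be read from the same options struct; a minimal sketch, again assuming an existing ID3D12Device*:

```cpp
// Sketch: query the resource binding tier. Tier 1 (the FL 11.0 baseline
// mentioned above) guarantees 8 UAV slots shared across all shader stages;
// higher tiers raise the limits.
#include <d3d12.h>

D3D12_RESOURCE_BINDING_TIER QueryBindingTier(ID3D12Device* device)
{
    D3D12_FEATURE_DATA_D3D12_OPTIONS options = {};
    device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS,
                                &options, sizeof(options));
    return options.ResourceBindingTier;
}
```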
This would be quite a non-trivial change, since feature level 11_0 in Direct3D 12 would be a superset of the same level in Direct3D 11... which brings us back to the question of whether "LEVEL_A" etc. would be a better naming scheme for Direct3D 12.