DX12 Performance Discussion And Analysis Thread

Ahh, that's the old GPCBenchmark. Carsten, can you post some numbers from the local memory sub-test with Fiji and Hawaii?

By the way, Fiji doubles the L2 size because of the doubled count of the memory controllers -- 32*64KB partitions = 2048KB, the bandwidth should also scale proportionally.
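To spell out the arithmetic, here is a quick sketch. The 64 KB partition size and Fiji's 32 partitions come from the figures above; Hawaii's 16 partitions is only inferred from "doubled", and the function name is made up for illustration.

```python
# Sketch only: L2 size as partition count times partition size.
# 64 KB per partition and 32 partitions for Fiji are taken from the post;
# 16 partitions for Hawaii is an assumption inferred from "doubled".
KB_PER_PARTITION = 64


def l2_size_kb(partitions: int, kb_per_partition: int = KB_PER_PARTITION) -> int:
    """Return the total L2 size in KB for a given number of partitions."""
    return partitions * kb_per_partition


hawaii_kb = l2_size_kb(16)  # assumed partition count
fiji_kb = l2_size_kb(32)    # partition count from the post

print(f"Hawaii L2: {hawaii_kb} KB, Fiji L2: {fiji_kb} KB")
print(f"Fiji / Hawaii ratio: {fiji_kb / hawaii_kb:.1f}x")
```

If bandwidth really scales with the partition count, the same ratio would apply to peak L2 bandwidth at equal clocks, but that part is a guess rather than a measured result.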
 
Ahh, that's the old GPCBenchmark. Carsten, can you post some numbers from the local memory sub-test with Fiji and Hawaii?
That test IMHO is quite erratic, so take this with another extra dose of salt. Erratic in the sense that the results can vary a couple of hundred GB/s from run to run. I took the best out of ~10 tries, so here you go.
[Image: 0Cns1F1.png]
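For anyone who wants to repeat the "best out of ~10 tries" approach with their own test, a minimal sketch of the idea follows. The workload here is a plain CPU memory copy used as a stand-in, not the GPU local memory sub-test, and all names are hypothetical.

```python
import time


def best_and_spread(run, tries=10):
    """Call run() `tries` times; return (best result, max minus min)."""
    results = [run() for _ in range(tries)]
    return max(results), max(results) - min(results)


def copy_bandwidth_gbs(size_bytes=16 * 1024 * 1024):
    """Stand-in workload: time one host memory copy and report GB/s."""
    src = bytearray(size_bytes)
    start = time.perf_counter()
    dst = bytes(src)  # one read pass plus one write pass
    elapsed = max(time.perf_counter() - start, 1e-9)
    assert len(dst) == size_bytes
    return (2 * size_bytes) / elapsed / 1e9


best, spread = best_and_spread(copy_bandwidth_gbs, tries=10)
print(f"best: {best:.1f} GB/s, run-to-run spread: {spread:.1f} GB/s")
```

Reporting the spread next to the best value makes it easier to see how erratic a given test is.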


By the way, Fiji doubles the L2 size because of the doubled count of the memory controllers -- 32*64KB partitions = 2048KB, the bandwidth should also scale proportionally.
The Fiji block diagram seems to imply otherwise:
http://www.hotchips.org/wp-content/...-GPU-Epub/HC27.25.520-Fury-Macri-AMD-GPU2.pdf

--
@Jawed: Good input. If no one beats me to it, I'll do it as soon as I can find the time. :)
 
That test IMHO is quite erratic, so take this with another extra dose of salt. Erratic in the sense that the results can vary a couple of hundred GB/s from run to run. I took the best out of ~10 tries, so here you go.
Thanks.
Indeed, it is erratic. I noticed that the application doesn't trigger the highest P-state or boost clock on my 980Ti. I have to find a way to run the tests with power management off somehow.
 
Yep, that's a problem with the GeForce cards. Radeons, though, react quickly enough for the very short duration of each test run.
 
Thanks.
Indeed, it is erratic. I noticed that the application doesn't trigger the highest P-state or boost clock on my 980Ti. I have to find a way to run the tests with power management off somehow.

You can always flash a custom BIOS :p

That's what I did to keep the voltage stable under 3D load (and effectively "disable" GPU Boost), because the way Nvidia have it set up created big fluctuations in games where the card wasn't being pushed enough, and I was getting driver crashes.
 
I think they touch on something interesting regarding AMD: GPU memory management, and how it differs between the Fury range, which uses HBM, and the lower cards, which have more memory, albeit GDDR5.
I assume developers need to consider, as part of their optimisation, how to handle the dynamic memory solution on the Fury range more aggressively, as opposed to the approach for the lower cards, which are not as bandwidth-efficient but benefit from the extra memory.

Cheers
 
From the Quantum Break presentation
[Image: dx12hmuxk.png]

DX11 drivers are able to circumvent HW pitfalls. We’re matching DX11 GPU perf on Maxwell + AMD.
CPU perf: Sure DX12 can be much faster, but if your engine design is such that you don’t swamp the API with draw calls, the actual API overhead might not be significant in your overall CPU cost. We saved ~10% overall renderer time

Full presentation here: Developing The Northlight Engine: Lessons Learned
 
Shader Model 6? The shader model shipped with DirectX 12 is 5.1, which is essentially SM 5.0 + direct resource indexing (oh, and Root Signature via HLSL). Yeah, we have not had a truly new shader model since SM 4.0...
 