DX12 Performance Discussion And Analysis Thread

fellix · Jun 24, 2016

I can't get the latency reading below 12ms.

huebie · Jun 25, 2016

Did I misinterpret something, or is the status quo that AC works fine on Maxwell? What has changed so far? I lost track of this discussion, so a short summary would be very kind.

Davros · Jun 27, 2016

Hope this hasn't already been posted.

Kaotik · Jun 27, 2016

huebie said:
Did I misinterpret something, or is the status quo that AC works fine on Maxwell? What has changed so far? I lost track of this discussion, so a short summary would be very kind.

Apparently the current situation is "yes, and it's a disaster".
Yes: It can run concurrent graphics & compute tasks.
It's a disaster: You partition the GPU resources up front. Say you split them 50/50 between graphics and compute, and your graphics task finishes before the compute task: now 50% of your resources sit idle, waiting for the compute to finish. You can't change that on the fly; you have to do an expensive context switch to change the partitioning.
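
To put rough numbers on the idle-partition problem, here is a toy model of a fixed split. It is only a sketch: the function name, the "full-GPU milliseconds" unit, and the workload sizes are all made up for illustration, and it assumes a task's runtime scales inversely with the share of the GPU it gets.

```python
def static_partition_idle(graphics_work, compute_work, graphics_share=0.5):
    """Toy model of a fixed GPU split between graphics and compute.

    Work is given in hypothetical "full-GPU milliseconds": a task with
    work W running on a fraction f of the GPU takes W / f milliseconds.
    Returns (frame_time_ms, fraction_of_gpu_capacity_left_idle).
    """
    compute_share = 1.0 - graphics_share
    graphics_time = graphics_work / graphics_share
    compute_time = compute_work / compute_share
    # The frame is done only when the slower of the two tasks is done.
    frame_time = max(graphics_time, compute_time)
    # Capacity wasted by whichever partition finished early and then waited.
    idle = ((frame_time - graphics_time) * graphics_share
            + (frame_time - compute_time) * compute_share)
    return frame_time, idle / frame_time


# Made-up workload: graphics needs 4 ms of the whole GPU, compute needs 8 ms.
frame_time, idle_fraction = static_partition_idle(graphics_work=4.0, compute_work=8.0)
print(frame_time, idle_fraction)
```

With these made-up numbers the frame takes 16 ms and a quarter of the GPU's capacity goes unused, while the same 12 ms of total work would fit in 12 ms if the split could be adjusted for free.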

sebbbi · Jun 28, 2016

Kaarlisk said:
In case anybody finds this of interest, I ran it on a GT2 Haswell (4210U).

Could you also post your "Latency (compute starts 10ms after graphics)" results? I am really interested in that particular case on Intel GPUs.

CarstenS · Jun 28, 2016

Here's HD530 (def. clocks, but DDR4-3000) in an overclocked i7-6700K.

CarstenS · Jun 28, 2016

And since I know no one else will bother, here's GTX 580. Yep, _5_80.
BTW - MDolenc: Does your program require 64-bit Windows or a certain amount of dedicated video memory? It does not run on my cheap-ass tablet with an Atom Z3735G and x86-W10.

MDolenc · Jun 28, 2016

It's 64-bit, yes, and it requires at least 128 MB of video memory. Let me know if you want to check that one out.

But it would probably be better to shorten the graphics part for that one; it takes half a second per run on not-so-slow integrated Intel GPUs.

CarstenS · Jun 28, 2016

Na, it's ok. No one's really interested in that 4-EU-crap anyway I guess. It displays the browser window alright and shows some YT vids.

Alessio1989 · Jun 28, 2016

MDolenc said:
I checked this today real quick. Added a new case to the sample, so scenario is a bit different:
- main queue renders to offscreen target (no buffering of frames, trivial VS/PS) - 128 draws.
- there's a high priority queue that executes a compute kernel after 10ms delay.
Seems to work on GCN only. That is, on a 380X graphics finishes in 70ms and compute in 1.5ms. Reaction time (time from issue to completion signal on the high priority queue, minus the average kernel runtime) seems to be in the 0.2-0.5ms range. Checked on Maxwell and HD 4600, and in both cases the high priority queue only kicked in after the graphics queue was done.

It would be interesting to see how the high priority queue will react on GCN Gen 4 GPUs: both latency and total rendering time. PS: NDA for RX 480 ends tomorrow
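
For anyone comparing numbers, the reaction time described in the quoted post can be written down as a small helper. This is only a sketch of that definition, not MDolenc's actual code: the function name and the timestamps are hypothetical, and only the 1.5 ms kernel runtime comes from the quoted 380X result.

```python
from statistics import mean


def reaction_time_ms(issue_ms, completion_ms, kernel_runtimes_ms):
    """Reaction time of the high priority queue, per the quoted definition:
    (completion signal - issue) minus the average kernel runtime."""
    return (completion_ms - issue_ms) - mean(kernel_runtimes_ms)


# Hypothetical timestamps: kernel issued at 10.0 ms, completion signalled at 11.8 ms.
# The 1.5 ms runtimes echo the 380X compute time quoted above.
reaction = reaction_time_ms(10.0, 11.8, [1.5, 1.5, 1.5])
print(round(reaction, 3))
```

Subtracting the average kernel runtime leaves only the time the queue spent waiting to get onto the GPU, which is the part that differs between vendors.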

Ext3h · Jun 28, 2016

Alessio1989 said:
It would be interesting to see how the high priority queue will react on GCN Gen 4 GPUs: both latency and total rendering time.

While at it, Fiji should be tested again as well. And then compare the 480 results against Fury with a recent driver.

Dygaza · Jun 28, 2016

Here are stock Fury-X results for comparison, as requested. A lot of variation in the latency test. I ran it twice and both runs were the same (browsers closed).

lanek · Jun 28, 2016

AMD HD7970 (1050 MHz). Not quite sure about the results, as I have two installed (but CFX was disabled).

Kaarlisk · Jun 28, 2016

sebbbi said:
Could you also post your "Latency (compute starts 10ms after graphics)" results? I am really interested in that particular case on Intel GPUs.

Oops. That was not intentional.

sebbbi · Jun 29, 2016

CarstenS said:
Here's HD530 (def. clocks, but DDR4-3000) in an overclocked i7-6700K.

Kaarlisk said:
Oops. That was not intentional.

600+ milliseconds of waiting = no high prio queue support at all (queue it after the GPU is idle). Ouch!

High priority queues are definitely NOT working properly on Intel GPUs. Intel has UMA and shared caches making low latency GPGPU a perfect use case. Too bad the latency completely tanks when the GPU is rendering at the same time.
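
A rough way to read these latency numbers is a two-case toy model. It is a sketch under simplifying assumptions (one graphics job, one compute job, zero switching cost), and the function name and the 600 ms graphics time are illustrative stand-ins for the "600+" figure above, not measured values.

```python
def expected_latency_ms(graphics_ms, compute_ms, delay_ms, priority_honored):
    """Latency from compute submission to compute completion in a toy model.

    The graphics job starts at t = 0 and the compute job is submitted at
    t = delay_ms on a high priority queue.
    """
    if priority_honored:
        # The compute job gets the GPU right away (switching cost ignored).
        return compute_ms
    # Otherwise the compute job waits until the graphics job has drained.
    remaining_graphics = max(0.0, graphics_ms - delay_ms)
    return remaining_graphics + compute_ms


# 380X-like numbers from earlier in the thread: 70 ms graphics, 1.5 ms compute.
print(expected_latency_ms(70.0, 1.5, 10.0, priority_honored=True))
print(expected_latency_ms(70.0, 1.5, 10.0, priority_honored=False))
# Slow iGPU with an assumed 600 ms graphics pass and no working priority.
print(expected_latency_ms(600.0, 1.5, 10.0, priority_honored=False))
```

If a measured latency is close to the remaining graphics time plus the kernel runtime, the queue was effectively serialized behind the graphics work, which is what the results above suggest.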

Of course there's also a possibility to use the Intel iGPU solely for gameplay GPGPU tasks and discrete GPU solely for rendering. DX12 explicit multiadapter makes this possible. But if you are using the same shared scene data structures in the GPGPU code and in the rendering code, you need to duplicate them and maintain the state of both (copy modifications between the memory pools). Makes things complicated. And even this doesn't solve the case where the consumer only has an Intel iGPU or if he/she has a 6+ core Xeon/i7 (no iGPU, only discrete).

It seems that high prio compute queues are NOT yet ready for shipping games. AMD has been ready since 2011. Nvidia is now ready with Maxwell and Pascal (Maxwell suffers some penalty, but gets the job done). Hopefully Intel could fix their high prio queues with a new driver (to match Maxwell's functionality). People seem to be blinded by concurrent execution. It is solely a GPU performance gain. High priority queues on the other hand enable games to offload game logic to the GPU, allowing completely new gameplay. Some modern console games do this already, making it hard to port them to PC.

Alessio1989 · Jun 29, 2016

GCN Gen 4 should provide better support for high priority compute queues. I guess the same feature will be in the next iteration of the Microsoft and Sony consoles (though I hope for a new rasterizer too!). It would be very interesting to see how important such a feature becomes. But yes, we are still far from having three priority options on engine queues (D3D12 actually exposes only two priority values, normal and high; low/background priority is missing).

Ext3h · Jun 29, 2016

Alessio1989 said:
GCN Gen 4 should provide better support for high priority compute queue.

In fact, that part is only software. GCN3, especially Fiji, *used to* have worse support than it does now. With a recent driver, Fiji uses an entirely different MEC firmware, feature-wise very similar to what is known about Polaris. I'm not exactly sure about Tonga, I can't remember whether Tonga already had sufficient memory for the full MEC firmware, or only a cut down version.

lanek · Jun 29, 2016

Ext3h said:
In fact, that part is only software. GCN3, especially Fiji, *used to* have worse support than it does now. With a recent driver, Fiji uses an entirely different MEC firmware, feature-wise very similar to what is known about Polaris. I'm not exactly sure about Tonga, I can't remember whether Tonga already had sufficient memory for the full MEC firmware, or only a cut down version.

I see that on Polaris the HWS feature can be updated via microcode, but I imagine that was already the case before.

CarstenS · Jun 29, 2016

It was.

Ext3h · Jun 29, 2016

lanek said:
I see that on Polaris the HWS feature can be updated via microcode, but I imagine that was already the case before.

It was. However, the maximum possible size of the microcode differs significantly. Only since Fiji has there been sufficient memory available to pack all the desired functionality into a single firmware image.
