DX12 Performance Discussion And Analysis Thread

That would be the case for the results we're getting with MDolenc's tests, yes. (BTW, that's a "highway lanes" analogy, not a "cars" analogy ;) )

Basically, if (graphics+compute) time = (graphics time) + (compute time), then at least with this code the hardware isn't running Async Compute.
And that's what we're seeing with both Kepler+Maxwell 1 (which do not support Async Compute by nVidia's own spec) and Maxwell 2.
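
To make that check concrete, here's a minimal sketch of the pass/fail logic being applied to the numbers. The timings are placeholders standing in for a card's measured values, and this is of course not MDolenc's actual test code:

Code:
#include <cstdio>

// Placeholder timings standing in for one card's measured results.
int main() {
    double graphicsMs = 49.7;  // graphics-only pass
    double computeMs  = 26.1;  // compute-only pass
    double combinedMs = 75.8;  // graphics + compute pass

    double serialEstimate = graphicsMs + computeMs;

    // If the combined pass costs about as much as running the two
    // passes back-to-back, the GPU serialized them: no Async Compute.
    // Real overlap would pull the combined time noticeably below the
    // serial estimate, ideally towards max(graphics, compute).
    if (combinedMs >= 0.95 * serialEstimate)
        std::printf("looks serial: %.2f ms ~ %.2f ms\n", combinedMs, serialEstimate);
    else
        std::printf("some overlap: %.2f ms < %.2f ms\n", combinedMs, serialEstimate);
    return 0;
}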

As far as I can see, there are 3 very odd things with the results so far:

1 - Maxwell 2 isn't doing Async Compute in this test. Pretty much all results are showing that.
Razor1 pointed to someone with two Titan Xs being seemingly able to do Async, but it seems the driver is just cleverly sending the render to one card and the compute to the other (which is something you could already toggle in the driver for PhysX since G80, so the capability has been there for many years). Of course, if you're using two Maxwell cards in SLI in the typical Alternate Frame Rendering mode, this "feature" will be useless because both cards are rendering. The same thing will happen for a VR implementation where each card renders one eye.

2 - Forcing "no Async" in the test (single command queue) makes nVidia chips serialize everything (see the queue sketch after this list). This means the last test, with rendering + 512 kernels, will take (render time) + 512 × (compute time of one kernel). That's why the test times end up ballooning, which eventually crashes the display driver.


3 - Forcing "no Async" makes GCN 1.1 chips do some very weird stuff (perhaps the driver is recognizing a pattern and skipping some calculations, as suggested before?). GCN 1.0, like Tahiti in the 7950, is behaving like it "should": (compute[n] + render) time = (compute[n] time) + (render time).
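
For reference, the "Async" vs "no Async" split in the test comes down to which queues the work is submitted on: the async path uses a dedicated COMPUTE queue next to the DIRECT (graphics) queue, while the single-command-queue path feeds everything through the DIRECT queue alone. A minimal D3D12 sketch of creating the two queues (an illustration, not the test's actual code):

Code:
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Create one DIRECT (graphics) queue and one COMPUTE queue.
// Submitting compute work on the separate COMPUTE queue is what gives
// the hardware the *opportunity* to overlap it with graphics work;
// pushing everything through the DIRECT queue forces a single stream.
void CreateQueues(ID3D12Device* device,
                  ComPtr<ID3D12CommandQueue>& gfxQueue,
                  ComPtr<ID3D12CommandQueue>& computeQueue)
{
    D3D12_COMMAND_QUEUE_DESC desc = {};

    desc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;   // graphics (+ compute)
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&gfxQueue));

    desc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;  // compute-only
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&computeQueue));
}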

MDolenc, any thoughts? Are we interpreting functional Async Compute correctly?
 
It doesn't matter what the performance difference is; if even a small amount of code is running async, it is functional.

Yeah, except it's not functional at all:

[Chart: 3NrhGRo.png]

There's an almost constant step-up between the blue and the red lines, and that step is almost always equal to the constant value of the green line. In other words, the combined time fits (render + context switch + compute), not max(render, compute). This means the GPU is doing rendering + context switching + compute as one serial sequence.
There's no Async Compute happening at the hardware level at all.
 
It doesn't matter what the performance difference is; if even a small amount of code is running async, it is functional. It's not about the end performance across the different IHVs, it's about whether it is capable or not, and it is capable. The serial path should always take the same time or longer than doing it asynchronously, if the variables are the same and the processor is being tasked enough.

Alright, but I am a little bit confused: are these tests showing that nVidia has poor performance with async compute enabled? If that's the case, then it could be an implementation issue instead of a hardware one, right?

PS: Great! So there are Async Shaders in the house!
 
Why don't you explain, then, why the difference is there on all Maxwell 2 cards in favor of async compute vs. serial? Why is it always faster by a few microseconds? I have looked at the majority of the reports too. You said it: compile the data and see.

The users did report a major spike in CPU usage on NV when it tries to do compute + graphics asynchronously, corresponding to a drop in GPU usage.

My theory?

NV is emulating Async Compute support, pushing some compute to be processed on the CPU, giving a very small % improvement (probably within the margin of error, since the ms deltas are so low). When the single command list mode is pushed through with all the kernels at once, it saturates the CPU, causing a major stall (spikes into 2,000 ms+) and a system crash.

That is actually related to what Oxide is saying: there is a conflict with the software emulation causing a major spike in CPU usage.

"unfortunate complex interaction between software scheduling trying to emmulate it which appeared to incure some heavy CPU costs."

http://www.overclock.net/t/1569897/...ingularity-dx12-benchmarks/1400#post_24360916

 
there ARE specific instances of +/- faster/slower in async mode
Which could easily be explained by the driver detecting a small GPU load and reducing the clocks.


Alright, but I am a little bit confused: are these tests showing that nVidia has poor performance with async compute enabled?

No. MDolenc's test has been unable to activate Async Compute in any form or shape using any nVidia card so far.
 
Alright, but I am a little bit confused: are these tests showing that nVidia has poor performance with async compute enabled? If that's the case, then it could be an implementation issue instead of a hardware one, right?

PS: Great! So there are Async Shaders in the house!


LOL

Yes, that is my understanding.

RedditUserB,

I don't think the Oxide dev is being forthcoming. He is unwilling to share any hard data on how they reached that conclusion from their end, but he seems to be free to talk about it in generalizations, which is not being transparent at all. Have you watched politicians talk? This is how he is talking: giving out "this is what might be happening". If he is a dev at Oxide, he should be able to tell us exactly what is going on in his program, with a definitive stance and data to back his comments up. He doesn't need to share source code; just pseudocode is enough, plus profiler data. Is that too much to ask for, given what he is saying?
 
I don't think the Oxide dev is being forthcoming. He is unwilling to share any hard data

Wow, it's almost like that Oxide developer lives in this weird world where employees of a company have to sign NDAs and all that he states publicly must go through his higher-ups and/or the legal department first.

How very strange.
I'm shocked.
 
No. MDolenc's test has been unable to activate Async Compute in any form or shape using any nVidia card so far.

B-b-b-but... my Async Shaders! Nooo!

Seriously, there are like 5 people stating the results show it works on nVidia... and some other people stating otherwise... Any easy explanation for a high-level programmer (who doesn't work with graphics programming atm, but will be interested in doing so in the future, btw)?
 
After making the statements as he did:
A) he is already in hot water
B) he already got consent to say some things but not others, which is why he is being vague, which points more to C
C) he is not part of Oxide

This is why when I read what he posted, I said he shouldn't have talked about it in the manner he did.

This is also the reason why we never see developers in the industry come out and say things like this without definitively showing why, in any industry for that matter.
 
I think it's pretty clear that, while some operations *may* be happening asynchronously on Maxwell 2, and while overall it has lower latency than GCN, GCN's latency grows more slowly as load increases. This tells me that either this test program is flawed for nVidia GPUs, or the driver isn't doing something right, or Maxwell 2 just can't do async compute as well as GCN, or at all.

My results (Sapphire r9-270x 4G Dual-X):
Compute only:
1. 26.08ms [25.75]
Graphics only:
49.71ms (33.75G pixels/s)
Graphics + compute:
1. 49.70ms (33.75G pixels/s) [26.13] {0.00 G pixels/s}
Graphics, compute single commandlist:
1. 75.48ms (22.23G pixels/s) [25.74] {33.88 G pixels/s}

For whatever reason (maybe I'm just blind), there's no summary for compute-only like there is for the other tests, so I just picked the first result for my post. Can't upload here for some reason; uploaded it here instead.
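
Reading those numbers against the serial model from earlier in the thread:

serial estimate: 49.71 ms (graphics) + 26.08 ms (compute) ≈ 75.79 ms
graphics + compute (two queues): 49.70 ms ≈ the graphics-only time, so the compute work is hidden entirely
single commandlist: 75.48 ms ≈ the serial estimate, so no overlap, as expected from a forced-serial path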
 
After making the statements as he did:
A) he is already in hot water
B) he already got consent to say some things but not others, which is why he is being vague, which points more to C
C) he is not part of Oxide

I emailed oxide when this started, Collock is legit.
(I still think he, and Oxide in general, act as AMD spokesmen; they evade the fact that the Fury X is a total fail at the tests, even with async, which puts AMD even worse than nVidia. Just saying, no fanboyism or anything, I love my HD 5870.)
Also, AMD Robert is just doing his typical PR stuff. (Remember the 4GB AMD campaign? That was some FUNNY stuff.)
 
B-b-b-but... my Async Shaders! Nooo!

Seriously, there are like 5 people stating the results show it works on nVidia... and some other people stating otherwise... Any easy explanation for a high-level programmer (who doesn't work with graphics programming atm, but will be interested in doing so in the future, btw)?
How hard can it be to see? Graphics+Compute time on NVIDIA = Graphics time + Compute time; they're done separately, one after the other, with no asynchronous work done, simple as that. If there were, Graphics+Compute time would be clearly under Graphics time + Compute time.
 
Also, something I noticed: the only time the bars overlap for Maxwell 2 is when compute takes more time than usual (or else the tops of the bars would drop). Definitely weird.
 
I think a clear indicator of async compute is when the tops and bottoms of the bars move together, rather than just the top or bottom. You can see this clearly on GCN cards (particularly the Fury X), where the bar drops completely into the blue when it's doing async, rather than just extending its bottom into the blue. Obviously, something wrong is happening on the Fury X, because it's not doing async continuously, but that does make it easier to illustrate.

Sorry for the multi-posts...would edit if I could.
 
There's something really wrong with GCN altogether in this test. Compute times are just horrible, and GPU usage is way too low (max 10% under compute). Well, granted, it's not a benchmark made for pure performance.
 
Now, all in all, how does this affect a GTX 980 Ti? (Objectively speaking.) Trying to decide between a Fury X and this for longevity...

I'd wait another year or buy a mid-range part to tide yourself over. The Fury X has issues speed-wise compared to the 980 Ti/Titan X and likely not enough VRAM for later games, whereas with the 980 Ti, regarding Async Compute, we have what can best be described as a clusterfuck.

They're both suboptimal in not-so-nice ways for long-term (thinking 5 years) usage.
 
The test ended with a crash on my GTX 970; is that normal? Just got a notification of official Windows 10 drivers being released, so I'm installing those now.
 

Attachments

  • perf.zip (206 KB)