DX12 Performance Discussion And Analysis Thread

That would be the case for the results we're getting with MDolenc's tests, yes. (BTW, that's a "highway lanes" analogy, not a "cars" analogy ;) )

Basically, if (graphics+compute) time = (graphics time) + (compute time), then at least with this code the hardware isn't running Async Compute.
And that's what we're seeing with both Kepler+Maxwell 1 (which do not support Async Compute by nVidia's own spec) and Maxwell 2.
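
To make that check concrete, here's a minimal sketch of the pass/fail logic being applied to the numbers. The timings are placeholders standing in for a card's measured values, and this is of course not MDolenc's actual test code:

Code:
#include <cstdio>

// Placeholder timings standing in for one card's measured results.
int main() {
    double graphicsMs = 49.7;  // graphics-only pass
    double computeMs  = 26.1;  // compute-only pass
    double combinedMs = 75.8;  // graphics + compute pass

    double serialEstimate = graphicsMs + computeMs;

    // If the combined pass costs about as much as running the two
    // passes back-to-back, the GPU serialized them: no Async Compute.
    // Real overlap would pull the combined time noticeably below the
    // serial estimate, ideally towards max(graphics, compute).
    if (combinedMs >= 0.95 * serialEstimate)
        std::printf("looks serial: %.2f ms ~ %.2f ms\n", combinedMs, serialEstimate);
    else
        std::printf("some overlap: %.2f ms < %.2f ms\n", combinedMs, serialEstimate);
    return 0;
}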

As far as I can see, there are 3 very odd things with the results so far:

1 - Maxwell 2 isn't doing Async Compute in this test. Pretty much all results are showing that.
Razor1 pointed to someone with two Titan Xs being seemingly able to do Async, but it seems the driver is just cleverly sending the render to one card and the compute to the other (which is something you could already toggle in the driver for PhysX since G80, so the capability has been there for many years). Of course, if you're using two Maxwell cards in SLI in the typical Alternate Frame Rendering mode, this "feature" will be useless because both cards are rendering. The same thing will happen for a VR implementation where each card renders one eye.

2 - Forcing "no Async" in the test (single command queue) makes nVidia chips serialize everything (see the queue sketch after this list). This means the last test, with rendering + 512 kernels, will take (render time) + 512 × (compute time of one kernel). That's why the test times end up ballooning, which eventually crashes the display driver.


3 - Forcing "no Async" makes GCN 1.1 chips do some very weird stuff (perhaps the driver is recognizing a pattern and skipping some calculations, as suggested before?). GCN 1.0, like Tahiti in the 7950, is behaving like it "should": (compute[n] + render) time = (compute[n] time) + (render time).
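
For reference, the "Async" vs "no Async" split in the test comes down to which queues the work is submitted on: the async path uses a dedicated COMPUTE queue next to the DIRECT (graphics) queue, while the single-command-queue path feeds everything through the DIRECT queue alone. A minimal D3D12 sketch of creating the two queues (an illustration, not the test's actual code):

Code:
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Create one DIRECT (graphics) queue and one COMPUTE queue.
// Submitting compute work on the separate COMPUTE queue is what gives
// the hardware the *opportunity* to overlap it with graphics work;
// pushing everything through the DIRECT queue forces a single stream.
void CreateQueues(ID3D12Device* device,
                  ComPtr<ID3D12CommandQueue>& gfxQueue,
                  ComPtr<ID3D12CommandQueue>& computeQueue)
{
    D3D12_COMMAND_QUEUE_DESC desc = {};

    desc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;   // graphics (+ compute)
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&gfxQueue));

    desc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;  // compute-only
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&computeQueue));
}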

MDolenc, any thoughts? Are we interpreting functional Async Compute correctly?
 
It doesn't matter what the performance difference is; if even a small amount of code is running async, it is functional.

Yeah, except it's not functional at all:

[Chart: 3NrhGRo.png]

There's an almost constant step-up between the blue and the red lines, and that step is almost always equal to the constant value of the green line. In other words, the combined time fits (render + context switch + compute), not max(render, compute). This means the GPU is doing rendering + context switching + compute as one serial sequence.
There's no Async Compute happening at the hardware level at all.
 
It doesn't matter what the performance difference is; if even a small amount of code is running async, it is functional. It's not about the end performance across the different IHVs, it's about whether it is capable or not, and it is capable. The serial path should always take the same time or longer than doing it asynchronously, if the variables are the same and the processor is being tasked enough.

Alright, but I am a little bit confused: are these tests showing that nVidia has poor performance with async compute enabled? If that's the case, then it could be an implementation issue instead of a hardware one, right?

PS: Great! So there are Async Shaders in the house!
 
Why don't you explain, then, why the difference is there on all Maxwell 2 cards in favor of async compute vs. serial? Why is it always faster by a few microseconds? I have looked at the majority of the reports too. You said it: compile the data and see.

The users did report a major spike in CPU usage on NV when it tries to do compute + graphics asynchronously, corresponding to a drop in GPU usage.

My theory?

NV is emulating Async Compute support, pushing some compute to be processed on the CPU, giving a very small % improvement (probably within the margin of error, since the ms deltas are so low). When the single command list mode is pushed through with all the kernels at once, it saturates the CPU, causing a major stall (spikes into 2,000 ms+) and a system crash.

That is actually related to what Oxide is saying: there is a conflict with the software emulation causing a major spike in CPU usage.

"unfortunate complex interaction between software scheduling trying to emmulate it which appeared to incure some heavy CPU costs."

http://www.overclock.net/t/1569897/...ingularity-dx12-benchmarks/1400#post_24360916

 
there ARE specific instances of +/- faster/slower in async mode
Which could easily be explained by the driver detecting a small GPU load and reducing the clocks.


Alright, but I am a little bit confused: are these tests showing that nVidia has poor performance with async compute enabled?

No. MDolenc's test has been unable to activate Async Compute in any form or shape using any nVidia card so far.
 
Alright, but I am a little bit confused: are these tests showing that nVidia has poor performance with async compute enabled? If that's the case, then it could be an implementation issue instead of a hardware one, right?

PS: Great! So there are Async Shaders in the house!


LOL

Yes, that is my understanding.

RedditUserB,

I don't think the Oxide dev is being forthcoming. He is unwilling to share any hard data on how they reached that conclusion from their end, but he seems to be free to talk about it in generalizations, which is not being transparent at all. Have you watched politicians talk? This is how he is talking: giving out "this is what might be happening". If he is a dev at Oxide, he should be able to tell us exactly what is going on in his program, with a definitive stance and data to back his comments up. He doesn't need to share source code; just pseudocode is enough, plus profiler data. Is that too much to ask for, given what he is saying?
 
I don't think the Oxide dev is being forthcoming. He is unwilling to share any hard data

Wow, it's almost like that Oxide developer lives in this weird world where employees of a company have to sign NDAs and all that he states publicly must go through his higher-ups and/or the legal department first.

How very strange.
I'm shocked.
 
No. MDolenc's test has been unable to activate Async Compute in any form or shape using any nVidia card so far.

B-b-b-but... my Async Shaders! Nooo!

Seriously, there are like 5 people stating the results show it works on nVidia... and some other people stating otherwise... Any easy explanation for a high-level programmer (who doesn't work with graphics programming atm, but will be interested in doing so in the future, btw)?
 
After making the statements as he did:
A) he is already in hot water
B) he already got consent to say some things but not others, which is why he is being vague, which points more to C
C) he is not part of Oxide

This is why when I read what he posted, I said he shouldn't have talked about it in the manner he did.

This is also the reason why we never see developers in the industry come out and say things like this without definitively showing why, in any industry for that matter.
 
I think it's pretty clear that, while some operations *may* be happening asynchronously on Maxwell 2, and while overall it has lower latency than GCN, GCN's latency grows more slowly as load increases. This tells me that either this test program is flawed for nVidia GPUs, or the driver isn't doing something right, or Maxwell 2 just can't do async compute as well as GCN, or at all.

My results (Sapphire r9-270x 4G Dual-X):
Compute only:
1. 26.08ms [25.75]
Graphics only:
49.71ms (33.75G pixels/s)
Graphics + compute:
1. 49.70ms (33.75G pixels/s) [26.13] {0.00 G pixels/s}
Graphics, compute single commandlist:
1. 75.48ms (22.23G pixels/s) [25.74] {33.88 G pixels/s}

For whatever reason (maybe I'm just blind), there's no summary for compute-only like there is for the other tests, so I just picked the first result for my post. Can't upload here for some reason; uploaded it here instead.
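
Reading those numbers against the serial model from earlier in the thread:

serial estimate: 49.71 ms (graphics) + 26.08 ms (compute) ≈ 75.79 ms
graphics + compute (two queues): 49.70 ms ≈ the graphics-only time, so the compute work is hidden entirely
single commandlist: 75.48 ms ≈ the serial estimate, so no overlap, as expected from a forced-serial path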
 
After making the statements as he did:
A) he is already in hot water
B) he already got consent to say some things but not others, which is why he is being vague, which points more to C
C) he is not part of Oxide

I emailed oxide when this started, Collock is legit.
(I still think he, and Oxide in general, act as AMD spokesmen; they evade the fact that the Fury X is a total fail at the tests, even with async, which puts AMD even worse than nVidia. Just saying, no fanboyism or anything, I love my HD 5870.)
Also, AMD Robert is just doing his typical PR stuff. (Remember the 4GB AMD campaign? That was some FUNNY stuff.)
 
B-b-b-but... my Async Shaders! Nooo!

Seriously, there are like 5 people stating the results show it works on nVidia... and some other people stating otherwise... Any easy explanation for a high-level programmer (who doesn't work with graphics programming atm, but will be interested in doing so in the future, btw)?
How hard can it be to see? Graphics+Compute time on NVIDIA = Graphics time + Compute time; they're done separately, one after the other, with no asynchronous work done, simple as that. If there were, Graphics+Compute time would be clearly under Graphics time + Compute time.
 
Also, something I noticed: the only time the bars overlap for Maxwell 2 is when compute takes more time than usual (or else the tops of the bars would drop). Definitely weird.
 
I think a clear indicator of async compute is when the tops and bottoms of the bars move together, rather than just the top or bottom. You can see this clearly on GCN cards (particularly the Fury X), where the bar drops completely into the blue when it's doing async, rather than just extending its bottom into the blue. Obviously, something wrong is happening on the Fury X, because it's not doing async continuously, but that does make it easier to illustrate.

Sorry for the multi-posts...would edit if I could.
 
There's something really wrong with GCN altogether in this test. Compute times are just horrible, and GPU usage is way too low (max 10% under compute). Well, granted, it's not a benchmark made for pure performance.
 
Now, all in all, how does this affect a GTX 980 Ti? (Objectively speaking.) Trying to decide between a Fury X and this for longevity...

I'd wait another year or buy a mid-range part to tide yourself over. The Fury X has issues speed-wise compared to the 980 Ti/Titan X and likely not enough VRAM for later games, whereas with the 980 Ti, regarding Async Compute, we have what can best be described as a clusterfuck.

They're both suboptimal in not-so-nice ways for long-term (thinking 5 years) usage.
 
The test ended with a crash on my GTX 970; is that normal? Just got a notification of official Windows 10 drivers being released, so I'm installing those now.
 

Attachments

  • perf.zip (206 KB)