DX12 Performance Discussion And Analysis Thread

But for async compute, isn't the idea to reserve less hardware to complete the jobs? I get that compute jobs, depending on size, will span as many CUs as needed, but if the idea (for async compute) is to insert compute jobs where there are stalls in the graphics work (sync points in the shader, waiting for resources to load, etc.), then ideally wouldn't you want to reserve less hardware?
Ideally you'd profile your graphics work (e.g. shadow mapping), see how many compute resources are needed for graphics, and limit the resources available to async compute, but no one's going to do this on PC for all possible GPUs. On PC the driver will do the best it can. You generally don't want to reserve hardware for compute shaders, but you do want to ensure they don't take over the system and starve the fixed-function graphics pipeline.
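The fill-in idea being discussed can be sketched with a toy timeline model. All the durations below are invented purely for illustration; real GPUs schedule at a much finer grain than this:

```python
# Toy model: a graphics workload with stall gaps, and independent compute
# work that can either run serially afterwards or be slotted into the gaps.
# All durations are in arbitrary "cycle" units and are made up for illustration.

def serial_time(graphics_busy, graphics_stalls, compute_work):
    # Graphics (busy time plus stalls) runs first, then compute runs after it.
    return graphics_busy + graphics_stalls + compute_work

def async_time(graphics_busy, graphics_stalls, compute_work):
    # Async case: compute fills the stall gaps first; only the overflow
    # (compute work that doesn't fit in the gaps) extends the frame.
    overflow = max(0, compute_work - graphics_stalls)
    return graphics_busy + graphics_stalls + overflow

frame_serial = serial_time(graphics_busy=800, graphics_stalls=200, compute_work=150)
frame_async = async_time(graphics_busy=800, graphics_stalls=200, compute_work=150)
print(frame_serial, frame_async)  # 1150 1000
```

The point of the model is only that the async win is bounded by how much stall time the graphics workload actually has, which is why profiling it first matters.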

I assume most async compute jobs are being called from frame N, while the GPU is working on N-1.
I wouldn't assume this. I'm not saying you're incorrect, but it's too early in the life of async compute to assume how it will be used.
 
Is the user that posted that comparison a reliable source? Seems the Mods aren't giving him the benefit of the doubt at the moment. It's a pretty big claim to make with nothing I can see to back it up thus far.

He's a pretty well-known member on that board. He says he's working on a VR project on the PS4, and he seems to be very knowledgeable based on his post history. Hard to believe he would have access to a different version of the demo, but who knows.
 
He's a pretty well-known member on that board. He says he's working on a VR project on the PS4, and he seems to be very knowledgeable based on his post history. Hard to believe he would have access to a different version of the demo, but who knows.
Unless the version he had access to is the same used to provide ExtremeTech with the graph produced by AMD PR team.
 
Unless the version he had access to is the same used to provide ExtremeTech with the graph produced by AMD PR team.

In the Anandtech piece, Ryan mentioned AMD had a new driver with optimizations for this demo, but it was too late to include it in the article.
 
In the Anandtech piece, Ryan mentioned AMD had a new driver with optimizations for this demo, but it was too late to include it in the article.
I don't think the new driver would include the ability to turn async compute off/on in the demo as mentioned by the poster ...
Another bench comparing async on and off. This benchmark version might not be the same as the one reviewers got.
 
I don't think the new driver would include the ability to turn async compute off/on in the demo as mentioned by the poster ...

I don't think he used the same demo. Looking at what he posted, I can imagine he has access to a developer sample that isn't locked down like the demo sent to reviewers.

Of course, until we can get a sample and check it ourselves, this won't help us much.
 
His post would have more credibility if Fury X scored 88fps with new drivers. As of now the reviewers have it at around the 75fps mark at 1080p (even ExtremeTech, who apparently use the new driver, unlike Anandtech), while he is getting 87.8fps, which is OCed 980 Ti territory, as in TR's review.
 
TBH, at this point I take anything coming out of ExtremeTech with a grain of salt. Based on his initial review, I would not doubt that they are now using the "AMD PR demo" version.
 
I read they couldn't change the settings of the benchmark. Anyway, the 390X has a great showing: with the same shader count as a 980 Ti, it's about 15% off its performance, with a similar difference in clock speeds.

If Fury cards scaled well, this would have been quite a coup for AMD considering UE4 doesn't run that well on AMD cards in dx11.
 
Don’ts
  • Don’t toggle between compute and graphics on the same command queue more than absolutely necessary
    • This is still a heavyweight switch to make
  • Don’t toggle tessellation on/off more than absolutely necessary
    • Again, this is still a heavyweight switch to make
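The "don't toggle" advice above can be illustrated with a toy cost model: if every graphics↔compute transition on the same queue pays a fixed overhead, batching work by type pays fewer of them than interleaving. The switch cost and job durations below are entirely invented:

```python
# Toy cost model for the "don't toggle" advice: each switch between graphics
# and compute work on the same queue pays a fixed overhead. All costs invented.

SWITCH_COST = 50  # hypothetical cost of one graphics<->compute transition

def queue_cost(jobs):
    # jobs is a list of (kind, work) pairs processed in order;
    # a switch cost is charged every time the job kind changes.
    cost, prev = 0, None
    for kind, work in jobs:
        if prev is not None and kind != prev:
            cost += SWITCH_COST
        cost += work
        prev = kind
    return cost

interleaved = [("gfx", 100), ("comp", 30), ("gfx", 100), ("comp", 30)]
batched = sorted(interleaved, key=lambda j: j[0])  # all "comp" first, then all "gfx"
print(queue_cost(interleaved), queue_cost(batched))  # 410 310
```

Same total work in both cases; the interleaved order just pays the transition overhead three times instead of once.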

That's interesting. Could that be taken as an answer to all this?
 
"No toggling" refers to switching kernel execution on and off; I don't think it means the instructions themselves stop being processed.

Plus, the programmer doesn't have access to what is being done (order-wise) when things are executed asynchronously.
 
His post would have more credibility if Fury X scored 88fps with new drivers. As of now the reviewers have it at around the 75fps mark at 1080p (even ExtremeTech, who apparently use the new driver, unlike Anandtech), while he is getting 87.8fps, which is OCed 980 Ti territory, as in TR's review.

I tend to give him the benefit of the doubt. He looks very well informed in the VR space, and I recognize in his posts some information about LiquidVR that only a few aficionados, or people working with them (Oculus VR), could have seen right now (especially on the console question about VR). Those things are not public right now, and he phrases things in a way that lets you circle around them for a while. (By that, I mean he points in the right direction without divulging anything.)
 
I tend to give him the benefit of the doubt. He looks very well informed in the VR space, and I recognize in his posts some information about LiquidVR that only a few aficionados, or people working with them (Oculus VR), could have seen right now (especially on the console question about VR). Those things are not public right now, and he phrases things in a way that lets you circle around them for a while. (By that, I mean he points in the right direction without divulging anything.)

If he is correct then I'd say this is quite a big coup for AMD - effectively another DX12 benchmark where they are kicking NV in the nuts thanks to async compute (AMD were already doing very well at performance tiers below Fiji). It's also quite an achievement of the NV marketing department to cover that up in regards to this benchmark.

I'll wait for more data points but I'm seriously starting to question my certainty that Pascal will be my next GPU purchase.
 
If he is correct then I'd say this is quite a big coup for AMD - effectively another DX12 benchmark where they are kicking NV in the nuts thanks to async compute (AMD were already doing very well at performance tiers below Fiji). It's also quite an achievement of the NV marketing department to cover that up in regards to this benchmark.

I'll wait for more data points but I'm seriously starting to question my certainty that Pascal will be my next GPU purchase.
I think he is specifically a PS4 developer, so we probably can't equate this to the PC environment.
Cheers
 
Isn't that for the "4GB" NVIDIA card (the GTX 970) where only 3.5GB is "fast" memory and the rest is "slow" memory?

Yup, and it's in the same vein as the GTX 660 Ti, which had 1.5GB at 192-bit and 512MB at 64-bit.

I don't think there are modern lower-end parts using TurboCache, but I imagine that would also be a big problem.
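The segmented layouts mentioned above can be illustrated with a simple weighted-bandwidth calculation. The GTX 970 figures below (~196 GB/s for the 3.5GB segment, ~28 GB/s for the 0.5GB segment) are approximations of the commonly cited specs, and spreading accesses proportionally to where data lives is a simplification (drivers try to keep hot data in the fast segment):

```python
# Rough illustration of why segmented VRAM hurts once allocations spill
# into the slow segment. Bandwidth figures are approximate public specs.

def blended_bandwidth(used_gb, fast_gb, fast_bw, slow_bw):
    # Average bandwidth if accesses are spread proportionally to where the
    # data lives -- a simplification of real driver placement behavior.
    fast_used = min(used_gb, fast_gb)
    slow_used = used_gb - fast_used
    return (fast_used * fast_bw + slow_used * slow_bw) / used_gb

# GTX 970: 3.5 GB at ~196 GB/s plus 0.5 GB at ~28 GB/s
print(round(blended_bandwidth(3.5, 3.5, 196, 28), 1))  # 196.0 while within the fast segment
print(round(blended_bandwidth(4.0, 3.5, 196, 28), 1))  # 175.0 once the slow segment is in use
```

The same shape of calculation applies to the 660 Ti's 1.5GB/0.5GB split; the takeaway is that average bandwidth degrades only when the working set actually spills past the fast segment.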
 