DX12 Performance Discussion And Analysis Thread

But for async compute, isn't the idea to reserve less hardware to complete the jobs? I get that compute jobs, depending on size, will span as many CUs as needed, but if the idea (for async compute) is to insert compute jobs where there are stalls in the graphics work (sync points in the shader, waiting for resources to load, etc.), then ideally wouldn't you want to reserve less hardware?
Ideally you'd profile your graphics work (e.g. shadow mapping), see how many compute resources are needed for graphics, and limit the resources available to async compute, but no one's going to do this on PC for all possible GPUs. On PC the driver will do the best it can. You generally don't want to reserve hardware for compute shaders, but you do want to ensure they don't take over the system and starve the fixed-function graphics pipeline.
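The fill-in idea being discussed can be sketched with a toy timeline model. All the durations below are invented purely for illustration; real GPUs schedule at a much finer grain than this:

```python
# Toy model: a graphics workload with stall gaps, and independent compute
# work that can either run serially afterwards or be slotted into the gaps.
# All durations are in arbitrary "cycle" units and are made up for illustration.

def serial_time(graphics_busy, graphics_stalls, compute_work):
    # Graphics (busy time plus stalls) runs first, then compute runs after it.
    return graphics_busy + graphics_stalls + compute_work

def async_time(graphics_busy, graphics_stalls, compute_work):
    # Async case: compute fills the stall gaps first; only the overflow
    # (compute work that doesn't fit in the gaps) extends the frame.
    overflow = max(0, compute_work - graphics_stalls)
    return graphics_busy + graphics_stalls + overflow

frame_serial = serial_time(graphics_busy=800, graphics_stalls=200, compute_work=150)
frame_async = async_time(graphics_busy=800, graphics_stalls=200, compute_work=150)
print(frame_serial, frame_async)  # 1150 1000
```

The point of the model is only that the async win is bounded by how much stall time the graphics workload actually has, which is why profiling it first matters.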

I assume most async compute jobs are being called from frame N, while the GPU is working on N-1.
I wouldn't assume this. I'm not saying you're incorrect, but it's too early in the life of async compute to assume how it will be used.
 
Is the user that posted that comparison a reliable source? Seems the Mods aren't giving him the benefit of the doubt at the moment. It's a pretty big claim to make with nothing I can see to back it up thus far.

He's a pretty well-known member on that board. He says he's working on a VR project on the PS4, and he seems to be very knowledgeable based on his post history. Hard to believe he would have access to a different version of the demo, but who knows.
 
He's a pretty well-known member on that board. He says he's working on a VR project on the PS4, and he seems to be very knowledgeable based on his post history. Hard to believe he would have access to a different version of the demo, but who knows.
Unless the version he had access to is the same used to provide ExtremeTech with the graph produced by AMD PR team.
 
Unless the version he had access to is the same used to provide ExtremeTech with the graph produced by AMD PR team.

In the Anandtech piece, Ryan mentioned AMD had a new driver with optimizations for this demo, but it was too late to include it in the article.
 
In the Anandtech piece, Ryan mentioned AMD had a new driver with optimizations for this demo, but it was too late to include it in the article.
I don't think the new driver would include the ability to turn async compute off/on in the demo as mentioned by the poster ...
Another bench comparing async on and off. This benchmark version might not be the same as the one reviewers got.
 
I don't think the new driver would include the ability to turn async compute off/on in the demo as mentioned by the poster ...

I don't think he used the same demo. Looking at what he posted, I can imagine he has access to a developer sample that isn't locked down like the demo sent to reviewers.

Of course, until we can get a sample and check it ourselves, this won't help us much.
 
His post would have more credibility if Fury X scored 88fps with new drivers. As of now the reviewers have it at around the 75fps mark at 1080p (even ExtremeTech, who apparently use the new driver, unlike Anandtech), while he is getting 87.8fps, which is OCed 980 Ti territory, as in TR's review.
 
TBH, at this point I take anything coming out of ExtremeTech with a grain of salt. Based on his initial review, I would not doubt that they are now using the "AMD PR demo" version.
 
I read they couldn't change the settings of the benchmark. Anyway, the 390X has a great showing: with the same shader count as a 980 Ti, it's about 15% off its performance, with a similar difference in clock speeds.

If Fury cards scaled well, this would have been quite a coup for AMD considering UE4 doesn't run that well on AMD cards in dx11.
 
Don’ts
  • Don’t toggle between compute and graphics on the same command queue more than absolutely necessary
    • This is still a heavyweight switch to make
  • Don’t toggle tessellation on/off more than absolutely necessary
    • Again, this is still a heavyweight switch to make
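The "don't toggle" advice above can be illustrated with a toy cost model: if every graphics↔compute transition on the same queue pays a fixed overhead, batching work by type pays fewer of them than interleaving. The switch cost and job durations below are entirely invented:

```python
# Toy cost model for the "don't toggle" advice: each switch between graphics
# and compute work on the same queue pays a fixed overhead. All costs invented.

SWITCH_COST = 50  # hypothetical cost of one graphics<->compute transition

def queue_cost(jobs):
    # jobs is a list of (kind, work) pairs processed in order;
    # a switch cost is charged every time the job kind changes.
    cost, prev = 0, None
    for kind, work in jobs:
        if prev is not None and kind != prev:
            cost += SWITCH_COST
        cost += work
        prev = kind
    return cost

interleaved = [("gfx", 100), ("comp", 30), ("gfx", 100), ("comp", 30)]
batched = sorted(interleaved, key=lambda j: j[0])  # all "comp" first, then all "gfx"
print(queue_cost(interleaved), queue_cost(batched))  # 410 310
```

Same total work in both cases; the interleaved order just pays the transition overhead three times instead of once.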

That's interesting. Could that be taken as an answer to all this?
 
"No toggling" refers to switching kernel execution on and off; I don't think it means the instructions themselves stop being processed.

Plus, the programmer doesn't have access to what is being done (order-wise) when things are executed asynchronously.
 
His post would have more credibility if Fury X scored 88fps with new drivers. As of now the reviewers have it at around the 75fps mark at 1080p (even ExtremeTech, who apparently use the new driver, unlike Anandtech), while he is getting 87.8fps, which is OCed 980 Ti territory, as in TR's review.

I tend to give him the benefit of the doubt. He looks very well informed in the VR space, and I recognize in his posts some information about LiquidVR that only a few aficionados, or people working with them (Oculus VR), could have seen right now (especially on the console question about VR). Those things are not public right now, and he phrases things in a way that lets you circle around them for a while. (By that, I mean he points in the right direction without divulging anything.)
 
I tend to give him the benefit of the doubt. He looks very well informed in the VR space, and I recognize in his posts some information about LiquidVR that only a few aficionados, or people working with them (Oculus VR), could have seen right now (especially on the console question about VR). Those things are not public right now, and he phrases things in a way that lets you circle around them for a while. (By that, I mean he points in the right direction without divulging anything.)

If he is correct then I'd say this is quite a big coup for AMD - effectively another DX12 benchmark where they are kicking NV in the nuts thanks to async compute (AMD were already doing very well at performance tiers below Fiji). It's also quite an achievement of the NV marketing department to cover that up in regards to this benchmark.

I'll wait for more data points but I'm seriously starting to question my certainty that Pascal will be my next GPU purchase.
 
If he is correct then I'd say this is quite a big coup for AMD - effectively another DX12 benchmark where they are kicking NV in the nuts thanks to async compute (AMD were already doing very well at performance tiers below Fiji). It's also quite an achievement of the NV marketing department to cover that up in regards to this benchmark.

I'll wait for more data points but I'm seriously starting to question my certainty that Pascal will be my next GPU purchase.
I think he is specifically a PS4 developer, so we probably can't equate this to the PC environment.
Cheers
 
Isn't that for the "4GB" NVIDIA card (the GTX 970) where only 3.5GB is "fast" memory and the rest is "slow" memory?

Yup, and it's in the same vein as the GTX 660 Ti, which had 1.5GB at 192-bit and 512MB at 64-bit.

I don't think there are modern lower-end parts using TurboCache, but I imagine that would also be a big problem.
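The segmented layouts mentioned above can be illustrated with a simple weighted-bandwidth calculation. The GTX 970 figures below (~196 GB/s for the 3.5GB segment, ~28 GB/s for the 0.5GB segment) are approximations of the commonly cited specs, and spreading accesses proportionally to where data lives is a simplification (drivers try to keep hot data in the fast segment):

```python
# Rough illustration of why segmented VRAM hurts once allocations spill
# into the slow segment. Bandwidth figures are approximate public specs.

def blended_bandwidth(used_gb, fast_gb, fast_bw, slow_bw):
    # Average bandwidth if accesses are spread proportionally to where the
    # data lives -- a simplification of real driver placement behavior.
    fast_used = min(used_gb, fast_gb)
    slow_used = used_gb - fast_used
    return (fast_used * fast_bw + slow_used * slow_bw) / used_gb

# GTX 970: 3.5 GB at ~196 GB/s plus 0.5 GB at ~28 GB/s
print(round(blended_bandwidth(3.5, 3.5, 196, 28), 1))  # 196.0 while within the fast segment
print(round(blended_bandwidth(4.0, 3.5, 196, 28), 1))  # 175.0 once the slow segment is in use
```

The same shape of calculation applies to the 660 Ti's 1.5GB/0.5GB split; the takeaway is that average bandwidth degrades only when the working set actually spills past the fast segment.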
 