Deleted member 2197
Guest
He could also have spoken more about the use of 12_1 features like Conservative Raster or Raster Ordered Views in their benchmark engine.
Hmm, everything in the CUDA developer toolkit says otherwise about Maxwell 2 not having async shaders.
I think it would have been better if he didn't say anything lol
Our use of Async Compute, however, pales in comparison to some of the things which the console guys are starting to do. Most of those haven't made their way to the PC yet, but I've heard of developers getting 30% more GPU performance by using Async Compute. Too early to tell, of course, but it could end up being pretty disruptive in a year or so as these GCN-built and optimized engines start coming to the PC. I don't think Unreal titles will show this very much though, so likely we'll have to wait to see. Has anyone profiled Ark yet?
Curiously, their driver reported this feature was functional but attempting to use it was an unmitigated disaster in terms of performance and conformance so we shut it down on their hardware. As far as I know, Maxwell doesn't really have Async Compute so I don't know why their driver was trying to expose that.
How sure are we that this is an actual developer? There are NO "async compute" flags or cap bits.
With deferred texturing you can get gains significantly higher than 30%, since the G-buffer pixel shader doesn't do any texturing or use much bandwidth. The exact number of course depends heavily on your lighting and post-processing algorithms and how much parallelism your pipeline offers.
If NVidia's driver is saying it's FL12_0, doesn't that mean that it's saying it has async compute? So they have to resort to hardware detection?
Some features, such as async compute and ExecuteIndirect, are new DirectX 12 API features. These features are supported by all GPUs.
Indeed. And an optional feature at that. You don't need async compute to be DX12 compliant, which is why even Fermi will (eventually) be DX12 capable.
Having more queues (threads) is mostly helpful when the queues (threads) are issuing kernels (parallel_for_each invocations) that do not fill the GPU (CPU) on their own. Not all tasks can be split into hundreds of thousands of parallel work items.
I think it actually gets a bit more complicated than that. My initial understanding was the same: more command queues, more chances to keep the GPU busy. Though 128 queues in my benchmark probably went a bit overboard (I should have put some diagnostics in, but maybe that's where GCN crashed?).
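A back-of-the-envelope model of the point about queues and kernels that don't fill the GPU. This is only a sketch: `modeled_utilization`, the lane counts, and the one-kernel-in-flight-per-queue assumption are all made up for illustration, and it says nothing about how any real GPU or driver schedules work.

```cpp
#include <algorithm>
#include <cassert>

// Toy model only (hypothetical helper, not part of any API).
// Assumes each queue has exactly one kernel in flight, that each kernel
// occupies `lanes_per_kernel` lanes, and that kernels from different queues
// overlap perfectly. Ignores bandwidth, scheduling overhead, and any
// hardware limit on the number of concurrent queues.
// Returns the fraction of the GPU's lanes that are busy (0.0 to 1.0).
double modeled_utilization(int queues, int lanes_per_kernel, int total_lanes) {
    assert(queues >= 0 && lanes_per_kernel >= 0 && total_lanes > 0);
    const long long busy = static_cast<long long>(queues) * lanes_per_kernel;
    const long long capped = std::min<long long>(busy, total_lanes);
    return static_cast<double>(capped) / total_lanes;
}
```

Under these assumptions, a kernel that already fills a 2048-lane GPU gains nothing from extra queues, while 128 queues of single-lane kernels would still only reach 128/2048 of the machine.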
Yes. DX12 has manual hazard tracking. All modern GPUs can also fetch multiple commands from a single queue and run them concurrently, assuming the resource barriers you put in the command queue allow that. In my example single-lane shader, a barrier is required on both sides (before and after), preventing any parallelism within the same queue.
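To make the barrier point concrete, here is a minimal sketch that treats a queue as a flat sequence of dispatches and barriers and groups the dispatches the GPU would be allowed to overlap. `Cmd` and `overlappable_batches` are hypothetical names for this illustration, not D3D12 types, and the model assumes every barrier is a full barrier between everything before it and everything after it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the contents of a single queue.
enum class Cmd { Dispatch, Barrier };

// Toy model: dispatches with no barrier between them form one batch that
// the GPU is *allowed* to run concurrently; a barrier closes the batch.
// Returns the size of each batch in submission order.
std::vector<std::size_t> overlappable_batches(const std::vector<Cmd>& queue) {
    std::vector<std::size_t> batches;
    std::size_t current = 0;     // dispatches in the batch being built
    std::size_t dispatches = 0;  // total dispatches seen
    for (Cmd c : queue) {
        if (c == Cmd::Dispatch) {
            ++current;
            ++dispatches;
        } else if (current > 0) {  // barrier: close the open batch
            batches.push_back(current);
            current = 0;
        }
    }
    if (current > 0) batches.push_back(current);

    // Sanity check: every dispatch landed in exactly one batch.
    std::size_t total = 0;
    for (std::size_t b : batches) total += b;
    assert(total == dispatches);
    return batches;
}
```

Dispatch, Barrier, Dispatch, Barrier, Dispatch yields three batches of one, which is the single-lane case with a barrier on both sides; three dispatches with no barriers yield one batch of three.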
There are basically two commands: Dispatch and Draw. They are put into a command list, and the command list gets executed on a command queue. What I'm seeing at the moment, though, is that adding a bunch of (single-lane) Dispatch calls to a single command list executed on a single queue will actually run the dispatches in parallel on Kepler. Two command lists, however, will not run in parallel (at least not on Kepler).
The interesting bit, of course, is compute + graphics; I'm extending the benchmark in that direction at the moment.
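For what it's worth, the behaviour reported above can be written down as a tiny timing model. This is a sketch of that one observation, not a statement about how Kepler actually schedules: `Dispatch`, `CommandList`, and `modeled_time_ms` are hypothetical names, and the model assumes dispatches inside one command list overlap perfectly while command lists run strictly one after another.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-ins, not D3D12 types.
struct Dispatch { double ms; };            // duration when run alone
using CommandList = std::vector<Dispatch>;

// Toy model of the observation in the post: dispatches within a command
// list overlap fully, so a list costs as much as its longest dispatch;
// separate command lists are serialized, so their costs add up.
double modeled_time_ms(const std::vector<CommandList>& lists) {
    double total = 0.0;
    for (const CommandList& list : lists) {
        double longest = 0.0;
        for (const Dispatch& d : list) {
            assert(d.ms >= 0.0);
            longest = std::max(longest, d.ms);
        }
        total += longest;
    }
    return total;
}
```

Under this model, four 1 ms dispatches in one list take 1 ms, while the same four split across two lists take 2 ms.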
So async compute is optional in D3D12. I suppose over time NVidia will work out which games to say "nope" to when queried for this then.
You can't query for this in d3d12. It's just there. Just as there is a bunch of FUD about this feature.
And if the GPU hardware doesn't support it, the "threads" are simply executed serially?