Nvidia Pascal Reviews [1080XP, 1080ti, 1080, 1070ti, 1070, 1060, 1050, and 1030]

Isn't async compute simply the fact that a GPU can run compute shaders independently and asynchronously with graphics workloads?
If so, doing it inter-shader instead of intra-shader should be sufficient to meet that definition.
Nobody said that it has to be the most efficient or fastest implementation in existence. Similarly, nobody said that enabling async compute has to be faster than not enabling it: if a particular implementation is such that it can't find inefficiencies to exploit, then so be it.
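
For illustration, here is a minimal D3D12 sketch of what that definition amounts to at the API level (function name and structure are mine, not from any particular engine): "async compute" is nothing more than a second, independently fed command queue.

#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Create one graphics (DIRECT) queue and one COMPUTE queue. The application
// submits to them independently; whether the GPU actually overlaps the work
// is entirely up to the hardware and driver.
void CreateQueues(ID3D12Device* device,
                  ComPtr<ID3D12CommandQueue>& gfxQueue,
                  ComPtr<ID3D12CommandQueue>& computeQueue)
{
    D3D12_COMMAND_QUEUE_DESC desc = {};
    desc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;   // graphics + compute + copy
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&gfxQueue));

    desc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;  // compute + copy only
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&computeQueue));
}

The spec only requires that submissions to the two queues *may* overlap, not that they *must*.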
 
Yes, cooperative scheduling is perfectly sufficient to fulfill the specification. Maxwell did that already; for that matter, you can do it on any hardware.

But the problem with Maxwell was that it would essentially flush the entire graphics pipeline and all SMMs, and stall the command processors, in order to reconfigure the hardware for compute. That made the switch extremely expensive: GPU utilization suffers while the remaining draw calls drain, and the GPC isn't allowed to dispatch anything new in the meantime.

Nowhere did the specs say that you had to gain anything from async compute, but that penalty should not have happened either.
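
To make the cost concrete, here is a hedged sketch of the usual cross-queue handoff (all names are mine; the fence would be created elsewhere with ID3D12Device::CreateFence). On hardware that must stop and reconfigure its SMs to run compute, as described above, this pattern buys nothing: the graphics queue drains completely before the dispatch runs, so you pay the switch cost without gaining any overlap.

#include <d3d12.h>

void SubmitFrame(ID3D12CommandQueue* gfxQueue,
                 ID3D12CommandQueue* computeQueue,
                 ID3D12CommandList*  computeList,
                 ID3D12CommandList*  gfxList,
                 ID3D12Fence* computeDone, UINT64& fenceValue)
{
    // Kick the compute work (e.g. a physics pass) off on its own queue...
    computeQueue->ExecuteCommandLists(1, &computeList);
    computeQueue->Signal(computeDone, ++fenceValue);

    // ...and make the graphics queue wait (GPU-side) for the result before
    // consuming it. On hardware with fine-grained scheduling, independent
    // graphics work already in flight can overlap the dispatch; on hardware
    // that reconfigures for compute, everything serializes here.
    gfxQueue->Wait(computeDone, fenceValue);
    gfxQueue->ExecuteCommandLists(1, &gfxList);
}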
 
Fine. That's Maxwell. So with Pascal, they're able to avoid this flush and reassign the SMs dynamically? That's a major improvement, right? So why the complaints? It's not perfect; it doesn't have the granularity of AMD's implementation. It's not the first time a feature has worked better for one vendor than the other.
 
It is well known that AMD hardware has suffered from underutilization for generations. Async compute helps them achieve better utilization. That doesn't mean NVIDIA should follow suit; there are other ways for an architecture to maximize its throughput.
 
The pity is that the PCB is not shorter. A card like this at the size of a Fury Nano would be perfect for a mini-ITX living room system.
Yeah, I'm waiting to see how the mid-range looks for my HTPC replacement, as I want to move to a dedicated discrete card instead of Steam streaming from my gaming PC, provided all the checkboxes are there for 4K, HDR, etc. I'm hoping Polaris will shine for perf/W.
 
I'm only complaining that they are apparently not putting the hardware to FULL use yet. Now that they've fixed the fundamentals in Pascal, it's about time they moved the DX12 compute queues to the GMU as well; until now, the hardware queues in it have still been reserved for CUDA only. In hindsight it makes sense that they didn't do that for Maxwell yet, since it just wouldn't have worked properly at all. But with Pascal that limitation is gone, and there are still *actual* gains to be had there.

Apart from that, it is impressive that they managed to fix the fundamental problem so thoroughly this time.
 

I like what Nvidia has brought with the 1080, but honestly, when I read something like "(up to) 3x the performance over the previous generation" on the main page (of course, maybe that's for a really specific case of a specific VR implementation, even though we can't really verify it), I don't know what to say.

Personally, I'm waiting for them to actually demo a VR game running 3x as fast... It should be easy to check: the game would have to run faster in VR than on a standard 1080p monitor (I'm joking a bit about the marketing).
 
Spec for spec it's very close to a Titan-X. I understand NV are saying it will be faster, although likely only marginally so. Factory-overclocked versions, though, should easily cruise past the Titan-X.

Some compute applications, like physics simulation (i.e., fluid, smoke), are memory-bandwidth limited.
There, frame-buffer compression does not help, and the 80 GB/s bandwidth deficit of the 1070 will hurt.

Take, for example, the Wave simulation at http://users.skynet.be/fquake/
I'm getting 573 FPS on a Titan-X at 1080p screen resolution
(654 FPS with the memory overclocked to 8 GHz).
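
A quick back-of-envelope check, assuming the workload is purely bandwidth-bound (FPS then scales linearly with effective memory clock; Titan-X stock memory is ~7 GHz effective):

#include <cstdio>

int main() {
    const double fpsStock  = 573.0;                 // measured at stock ~7 GHz memory
    const double predicted = fpsStock * 8.0 / 7.0;  // scale to the 8 GHz overclock
    std::printf("predicted: %.0f FPS\n", predicted); // ~655; measured: 654
    return 0;
}

The near-exact match (predicted ~655 vs. measured 654) supports the bandwidth-bound claim.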
 
Nice scaling. :)
1080: 646 FPS
Fury X: 808 FPS

Where is this from?
From here:
http://www.pcgameshardware.de/Nvidi...cials/Geforce-GTX-1080-GTX-1070-KFKA-1195567/
and confirmed by Nvidia.
 