Nvidia Pascal Reviews [1080XP, 1080ti, 1080, 1070ti, 1070, 1060, 1050, and 1030]

Make sure you consider the variable boost as well. In most cases you cannot really be sure which clock rates are being compared, and some people are really sloppy regarding their benchmarks' reproducibility and comparability.
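(One low-effort way to document this, assuming a reasonably recent driver: log the actual SM clock alongside each run with nvidia-smi --query-gpu=clocks.sm --format=csv -l 1, so readers can at least see what each card actually boosted to during the benchmark.)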

It's true that we don't have all the data for many of these reviews, though I don't think the boost clock differential can account for this much variability, unless a reviewer disabled boost on their 1070 sample altogether and did not do the same for their 1080. 33% is the maximum difference of any functional unit comparing 1070 to 1080; a few MHz here or there can't account for the remaining 17-18% seen in some of those other tests. It would need to be quite a large difference (several hundred MHz).
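To put a rough number on it: if a review shows the 1080 ~50% ahead overall, the clock ratio needed on top of the 33% unit advantage is about 1.50 / 1.33 ≈ 1.13, i.e. roughly 200 MHz at the ~1.7 GHz boost clocks these cards run, which is why only a several-hundred-MHz boost delta could plausibly explain it.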
 
I can only speak for our review, where we made sure the clocks were as consistent as the benchmark runs themselves (i.e. very low variability). From what I saw at a quick glance (didn't do all the Excel math), we have no case among the gaming tests where the 1080 is more than 29% faster than the 1070. In synthetics, I think 33% occurs once.

We also have two outliers. One is a higher texture bandwidth rate for the 1070 with 8 black textures (repeatable, might be cache-line related). The other, for which I am awaiting feedback from Nvidia, is that the 1070 is faster than the 1080 in some Luxmark scenes (from 3.0 and 3.1, not in the online review), which normally should not be possible but is also repeatable.

But yes, boost clock difference should not account for massive performance differences.
 
The other, for which I am awaiting feedback from Nvidia, is that the 1070 is faster than the 1080 in some Luxmark scenes (from 3.0 and 3.1, not in the online review), which normally should not be possible but is also repeatable.
Why shouldn't that be possible? Possibly a more favorable ratio of L2 size to active threads, coincidentally hitting a sweet spot.
 
[...] the 1070 is faster than the 1080 in some Luxmark scenes (from 3.0 and 3.1, not in the online review), which normally should not be possible but is also repeatable.

You're with PCGH? Great work you guys do :yes: I'd seen that Luxmark anomaly in another thread where I posted this question, on Tech Report's forums; a user there cited HotHardware's Luxmark results. Strange indeed.
 
Why shouldn't that be possible? Possibly a more favorable ratio of L2 size to active threads, coincidentally hitting a sweet spot.

Is the L2 in Pascal located outside of the GPCs, and is it a portion that isn't disabled on the GP104 found in the 1070?
 
L2 is located with the ROPs and memory controllers. 1080 and 1070 are both fully enabled in that respect, so both get 2MB of L2.
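To put numbers on the sweet-spot idea above (if I have the partitioning right): GP104 pairs each of its eight 32-bit memory controllers with a 256 KB slice of L2, so 8 × 256 KB = 2 MB, and none of those slices are fused off on the 1070. That same 2 MB is then shared by 15 SMs on the 1070 versus 20 on the 1080, i.e. roughly 137 KB vs 102 KB of L2 per SM, which is exactly a "more favorable ratio of L2 size to active threads".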

Thanks, makes sense. How's that AT 1080 review coming along? I'd probably have known the answer to my question already if it was here... ;)
 
Update: looks like I don't know how to calculate. 1 GPC being disabled means the shader and geometry processing differences between 1070 and 1080 are identical, unlike the figures I was working with (28.75% and 37% differences, respectively - it should be 37% across the board).
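(Sanity check on the 37%, assuming reference clocks: 2560 / 1920 ALUs = 1.33x, times the rated boost clock ratio of 1733 / 1683 MHz ≈ 1.03, lands at ≈1.37x.)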
 
The 1070 looks like a really great card and a solid upgrade from my 970. Still, I'll probably wait for Volta (1170?).
 
I've been playing around with FP16 throughput in CUDA with a willing 1080 owner, in a little benchmark thing I've been working on for a couple of days. Might release the source, might not, but here's the quick documentation I wrote.

https://gist.github.com/rys/f427c0a85fcc367087c40fd8ffbdccb7

Nothing really new, other than the performance data near the end.
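For context, here's a minimal sketch of the kind of packed half2 FMA loop a benchmark like this would measure; this is my reconstruction, not the gist's actual code, and the kernel name and launch shape are made up. Native __hfma2 needs compute capability 5.3 or higher.

Code:
#include <cuda_fp16.h>

// Hypothetical throughput kernel: each thread runs a dependent chain of
// packed half2 FMAs, so every instruction retires 2 FP16 ops.
// Only compiles to native FP16 math on sm_53+ devices.
__global__ void fp16x2_fma_loop(__half2 *out, __half2 a, __half2 b, int iters)
{
    __half2 acc = __float2half2_rn(0.0f);
    for (int i = 0; i < iters; ++i)
        acc = __hfma2(a, b, acc);          // acc = a * b + acc, per 16-bit lane
    out[blockIdx.x * blockDim.x + threadIdx.x] = acc;  // keep the work live
}

Launched wide enough to fill the machine, total ops = 2 * iters * threads, and the rate falls out of the kernel time.
 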
So that means most upper Maxwell 2 cards are faster at FP16 than the GTX 1080?
That is, I thought they had a 1:1 FP16 ratio, or was that just certain Maxwell models?

It would be rather strange logic on Nvidia's part if older models outperform this generation, assuming it's right that some of the Maxwell models could do 1:1 for FP16.
I guess we will not know for sure what is going on until GP102 is released for each sector (Tesla, Quadro, Titan).
Any chance you can get your hands on a Tegra X1?

Cheers
 
So that means most upper Maxwell 2 cards are faster at FP16 than the GTX 1080?
That is, I thought they had a 1:1 FP16 ratio, or was that just certain Maxwell models?
Rys' code mentions compute level 5.3 (Tegra X1) and higher.
Maxwell 2 is compute level 5.2.

IOW: Maxwell 2 doesn't have native FP16 at all.

At least that's how I understand it.
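You can see that cut-off directly in CUDA: the half2 arithmetic intrinsics are only available for compute capability 5.3 and up, so anything that also wants to run on Maxwell 2 has to guard on __CUDA_ARCH__ and fall back to FP32 math. A sketch (hypothetical helper name):

Code:
#include <cuda_fp16.h>

__device__ __half2 fma_h2(__half2 a, __half2 b, __half2 c)
{
#if __CUDA_ARCH__ >= 530
    return __hfma2(a, b, c);    // native FP16x2 path (Tegra X1 and later)
#else
    // sm_50/sm_52: no FP16 ALUs, so convert to FP32, FMA, and repack
    float2 fa = __half22float2(a);
    float2 fb = __half22float2(b);
    float2 fc = __half22float2(c);
    return __floats2half2_rn(fmaf(fa.x, fb.x, fc.x),
                             fmaf(fa.y, fb.y, fc.y));
#endif
}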
 
... assuming it's right that some of the Maxwell models could do 1:1 for FP16.

No... sm_50 and sm_52 do not have this capability.

If so, you would just be able to access half-words and would achieve 64 ops/clock/SMM.

Nothing really new, other than the performance data near the end.

So 2*M*N fp16 vs. 1*M*N fp32 takes 6x more time?

A simulated (necessary for pre-sm_53) half2 FMA takes 11 instructions but you get 2 fp16 FMAs per thread.

Sounds like 6 ops per fp16 to me.

One way to discover more about what's going on under the hood is to not perform FMAs but just MULs.

If F2F conversion is happening then you'll be skipping unpacking/packing of the addend.
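To illustrate that: the simulated path has to unpack every half2 operand with F2F conversions, and a MUL-only variant (hypothetical helper below, instruction counts are rough) drops the addend, so two F2F per iteration disappear. If the measured gap shrinks accordingly, conversion overhead is the culprit. The 11 instructions for 2 FMAs is also where the ~6 (5.5) ops per fp16 figure comes from.

Code:
#include <cuda_fp16.h>

// Pre-sm_53 simulated half2 MUL: only two operands to unpack, so it
// skips the two F2F conversions the FMA path spends on the addend.
__device__ __half2 mul_h2_simulated(__half2 a, __half2 b)
{
    float2 fa = __half22float2(a);          // ~2x F2F.F32.F16
    float2 fb = __half22float2(b);          // ~2x F2F.F32.F16
    return __floats2half2_rn(fa.x * fb.x,   // 2x FMUL
                             fa.y * fb.y);  // ~2x F2F.F16.F32 + pack
}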
 