Nvidia Pascal Reviews [1080XP, 1080ti, 1080, 1070ti, 1070, 1060, 1050, and 1030]

ShaidarHaran · May 30, 2016

CarstenS said:
Make sure you consider the variable boost as well. Mostly, you cannot really be sure what clock rates are being compared and some people are really sloppy regarding their benchmarks' reproducibility and comparability.

It's true that we don't have all the data for many of these reviews, though I don't think the boost clock differential can account for this much variability, unless a reviewer disabled boost on their 1070 sample altogether and did not do the same for their 1080. 33% is the maximum difference in any functional unit between the 1070 and the 1080; a few MHz here or there can't account for the remaining 17-18% seen in some of those other tests. It would take quite a large clock difference (several hundred MHz).
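As a rough sanity check on that argument, here is a small Python sketch of how large a clock gap would be needed to explain a result beyond the unit-count difference. It assumes performance scales linearly with unit count times clock (an idealisation), treats the remaining 17-18% as additive on top of the 33% (so an observed gap of about 50%), and uses a hypothetical 1700 MHz clock for the 1070. The function name and constants are illustrative only.

```python
def required_clock_ratio(observed_speedup, unit_ratio):
    """Clock ratio needed to explain observed_speedup under linear scaling."""
    return observed_speedup / unit_ratio

UNIT_RATIO = 4 / 3        # 1080 has 33% more of any functional unit (from the post)
OBSERVED = 1.50           # assumed: 33% plus the ~17% extra seen in some reviews
BASE_CLOCK_MHZ = 1700.0   # hypothetical 1070 clock, for illustration only

ratio = required_clock_ratio(OBSERVED, UNIT_RATIO)
extra_mhz = (ratio - 1.0) * BASE_CLOCK_MHZ
print(f"clock ratio needed: {ratio:.3f} (about {extra_mhz:.0f} MHz extra)")
```

Under these assumptions the 1080 would need to run roughly 12-13% faster than the 1070, which is a couple of hundred MHz rather than a few MHz.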

CarstenS · May 30, 2016

I can only speak for our review, where we made sure the clocks are as consistent as the benchmark runs themselves (i.e. very low variability). From what I saw at a quick glance (didn't do all the Excel math), we have no case among the gaming tests where the 1080 is more than 29% faster than the 1070. In synthetics, I think 33% occurs once.

We also have two outliers: one is the higher texture bandwidth rate for the 1070 with 8 black textures (repeatable, might be cache-line related), and the other, for which I'm awaiting feedback from Nvidia, is that the 1070 is faster than the 1080 in some Luxmark scenes (from 3.0 and 3.1, not in the online review), which normally should not be possible, but is also repeatable.

But yes, boost clock difference should not account for massive performance differences.

Ext3h · May 30, 2016

CarstenS said:
where the 1070 is faster than the 1080 in some Luxmark scenes (from 3.0 and 3.1, not in the online review), which normally should not be possible, but is also repeatable.

Why shouldn't that be possible? More favorable ratio of L2 size to active threads possibly, coincidentally hitting a sweet spot.

CarstenS · May 30, 2016

Should've written: Normally not possible.

ShaidarHaran · May 30, 2016

CarstenS said:
I can only speak for our review, where we made sure the clocks are as consistent as the benchmark runs themselves (i.e. very low variability). From what I saw at a quick glance (didn't do all the Excel math), we have no case among the gaming tests where the 1080 is more than 29% faster than the 1070. In synthetics, I think 33% occurs once.

We also have two outliers: one is the higher texture bandwidth rate for the 1070 with 8 black textures (repeatable, might be cache-line related), and the other, for which I'm awaiting feedback from Nvidia, is that the 1070 is faster than the 1080 in some Luxmark scenes (from 3.0 and 3.1, not in the online review), which normally should not be possible, but is also repeatable.

But yes, boost clock difference should not account for massive performance differences.

You're with PCGH? Great work you guys do :yes:

I'd seen that Luxmark anomaly in another thread I started about this question on Tech Report's forums, where a user cited HotHardware's Luxmark results. Strange indeed.

ShaidarHaran · May 30, 2016

Ext3h said:
Why shouldn't that be possible? More favorable ratio of L2 size to active threads possibly, coincidentally hitting a sweet spot.

Is the L2 in Pascal located outside the GPCs, and is it left fully enabled on the GP104 used in the 1070?

Ryan Smith · May 30, 2016

ShaidarHaran said:
Is the L2 in Pascal located outside the GPCs, and is it left fully enabled on the GP104 used in the 1070?

L2 is located with the ROPs and memory controllers. 1080 and 1070 are both fully enabled in that respect, so both get 2MB of L2.

ShaidarHaran · May 30, 2016

Ryan Smith said:
L2 is located with the ROPs and memory controllers. 1080 and 1070 are both fully enabled in that respect, so both get 2MB of L2.

Thanks, makes sense. How's that AT 1080 review coming along? I'd probably have known the answer to my question already if it was here...

ShaidarHaran · May 31, 2016

Update: looks like I don't know how to calculate. 1 GPC being disabled means the shader and geometry processing differences between 1070 and 1080 are identical, unlike the figures I was working with (28.75% and 37% differences, respectively - it should be 37% across the board).
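One way to arrive at a figure near 37% is to multiply the unit-count ratio by the rated boost-clock ratio. The Python sketch below assumes the published reference specifications (2560 vs 1920 CUDA cores, 1733 vs 1683 MHz rated boost), which are not stated in this thread, and assumes peak throughput scales as units times clock. It may or may not be how the figure above was derived, and real boost clocks vary per card.

```python
# Assumed reference specs (not taken from this thread): CUDA cores, rated boost MHz.
GTX_1080 = {"cores": 2560, "boost_mhz": 1733}
GTX_1070 = {"cores": 1920, "boost_mhz": 1683}

def theoretical_ratio(a, b):
    """Ratio of cores * clock, i.e. peak shader throughput under ideal scaling."""
    return (a["cores"] * a["boost_mhz"]) / (b["cores"] * b["boost_mhz"])

unit_ratio = GTX_1080["cores"] / GTX_1070["cores"]
peak_ratio = theoretical_ratio(GTX_1080, GTX_1070)
print(f"units only: +{(unit_ratio - 1) * 100:.1f}%")
print(f"units * rated boost: +{(peak_ratio - 1) * 100:.1f}%")
```

With these inputs the unit count alone gives about a third more, and folding in the rated boost clocks pushes the theoretical gap to roughly 37%.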

Clukos · May 31, 2016

Some Fire Strike results from oc.net with a 1080 FE at 2.1 GHz (100% fan probably means not much clock variation): http://www.3dmark.com/fs/8632279

The GPU score seems to be the equivalent of a 980 Ti clocked at 1.7 GHz: http://www.3dmark.com/fs/7844643

And it is well above (by over 1k) the best Fire Strike result with a Titan X in the HOF (at 1.65 GHz): http://www.3dmark.com/fs/5397064

xEx · May 31, 2016

Clukos said:
Some Fire Strike results from oc.net with a 1080 FE at 2.1 GHz (100% fan probably means not much clock variation): http://www.3dmark.com/fs/8632279

The GPU score seems to be the equivalent of a 980 Ti clocked at 1.7 GHz: http://www.3dmark.com/fs/7844643

And it is well above (by over 1k) the best Fire Strike result with a Titan X in the HOF (at 1.65 GHz): http://www.3dmark.com/fs/5397064

Why was the Physics score so low on the 1080? 13k vs 23k

Clukos · May 31, 2016

xEx said:
Why was the Physics score so low on the 1080? 13k vs 23k

Physics is always CPU

i7-4790K vs i7-5960X

Deleted member 2197 · May 31, 2016

Nvidia GTX 1080 Tested with Core i7 6950X / 6900K / 6850K & 6800K Processor
http://www.guru3d.com/articles_pages/core_i7_6950x_6900k_6850k_and_6800k_processor_review,1.html

homerdog · May 31, 2016

The 1070 looks like a really great card and a solid upgrade from my 970. Still, I'll probably wait for Volta (1170?)

Rys · Jun 1, 2016

I've been playing around with FP16 throughput in CUDA with a willing 1080 owner, in a little benchmark thing I've been working on for a couple of days. Might release the source, might not, but here's the quick documentation I wrote.

https://gist.github.com/rys/f427c0a85fcc367087c40fd8ffbdccb7

Nothing really new, other than the performance data near the end.

CSI PC · Jun 1, 2016

Rys said:
I've been playing around with FP16 throughput in CUDA with a willing 1080 owner, in a little benchmark thing I've been working on for a couple of days. Might release the source, might not, but here's the quick documentation I wrote.

https://gist.github.com/rys/f427c0a85fcc367087c40fd8ffbdccb7

Nothing really new, other than the performance data near the end.

So that means most upper Maxwell 2 cards are faster at FP16 than the GTX 1080?
That is, I thought they had a 1:1 relationship, or was that just certain Maxwell models?

Rather strange logic Nvidia has applied, as older models outperform this generation - if it's right that some of the Maxwell models could do 1:1 for FP16.
I guess we will not know for sure what is going on until GP102 is released for each segment: Tesla, Quadro, Titan.
Any chance you can get your hands on a Tegra X1?

Cheers

Jawed · Jun 1, 2016

Since I'm on a roll

https://forum.beyond3d.com/posts/1886240

I wonder if NVidia will lock "high performance" cuDNN on Pascal to the Tesla variants, like it did with double-precision.

silent_guy · Jun 1, 2016

CSI PC said:
So that means most upper Maxwell 2 cards are faster at FP16 than the GTX 1080?
That is, I thought they had a 1:1 relationship, or was that just certain Maxwell models?

Rys' code mentions compute level 5.3 (Tegra X1) and higher.
Maxwell 2 is compute level 5.2.

IOW: Maxwell 2 doesn't have native FP16 at all.

At least that's how I understand it.
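That rule of thumb can be written down as a simple check. This is only a sketch in Python: the function name is hypothetical, the 5.3 threshold is taken from the post above rather than from an official table, and the 6.1 entry for GP104 is an assumption not stated in this thread.

```python
def has_native_fp16(compute_capability):
    """True if the (major, minor) compute capability is 5.3 or higher."""
    return tuple(compute_capability) >= (5, 3)

# Example parts; the GP104 compute capability is an assumption for illustration.
PARTS = [
    ("Maxwell 2 (sm_52)", (5, 2)),
    ("Tegra X1 (sm_53)", (5, 3)),
    ("GP104 (sm_61)", (6, 1)),
]

for name, cc in PARTS:
    print(f"{name}: native FP16 = {has_native_fp16(cc)}")
```

Note that this only says whether FP16 arithmetic instructions exist on the part; it says nothing about the rate at which they execute.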

pixelio · Jun 1, 2016

CSI PC said:
... if it's right that some of the Maxwell models could do 1:1 for FP16.

No... sm_50 and sm_52 do not have this capability.

If so you would just be able to access half-words and would achieve 64 ops/clock/SMM.

Rys said:
Nothing really new, other than the performance data near the end.

So 2*M*N fp16 vs. 1*M*N fp32 takes 6x more time?

A simulated (necessary for pre-sm_53) half2 FMA takes 11 instructions but you get 2 fp16 FMAs per thread.

Sounds like 6 ops per fp16 to me.

One way to discover more about what's going on under the hood is to not perform FMAs but just MULs.

If F2F conversion is happening then you'll be skipping unpacking/packing of the addend.
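The instruction-count estimate above can be worked through in a few lines of Python. The 11-instruction figure and the two FMAs per half2 come from the post; the function name is made up, and the comparison assumes a native fp32 FMA costs one instruction.

```python
INSTRUCTIONS_PER_HALF2_FMA = 11  # simulated half2 FMA on pre-sm_53, from the post
FP16_FMAS_PER_HALF2 = 2          # a half2 packs two fp16 values

def instructions_per_fp16_fma(instr_per_half2, fmas_per_half2):
    """Average instruction cost of one emulated fp16 FMA."""
    return instr_per_half2 / fmas_per_half2

cost = instructions_per_fp16_fma(INSTRUCTIONS_PER_HALF2_FMA, FP16_FMAS_PER_HALF2)
print(f"{cost} instructions per emulated fp16 FMA vs 1 for a native fp32 FMA")
```

That comes to 5.5 instructions per fp16 FMA, which is in the same ballpark as the roughly 6x slowdown being discussed.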

homerdog · Jun 1, 2016

Jawed said:
Since I'm on a roll

https://forum.beyond3d.com/posts/1886240

Does GP104 even have fast FP16 in hardware?
