Pascal vs. Maxwell - a Generational Performance Comparison & Analysis

ShaidarHaran

GP104 provides a performance increase over its direct predecessor, GM204, as well as over GM200, the largest GPU of the Maxwell (2) generation. This is not entirely unprecedented: GM204 itself was in a similar position upon its introduction relative to the prior (complete) generation, Kepler. The difference this time, however, appears to be much greater, such that this initial observation also applies to the 2nd-tier GP104 SKU, GTX 1070, and not only to the top-tier SKU as with the prior generation. Given the reduced price of GTX 1070 relative to GTX 1080 or 980 Ti, this makes it an excellent value proposition at the top end of the graphics card performance spectrum, with the 1080 offering additional performance at a higher price point.

When GP104 was unveiled, I became curious as to its performance relative not only to GM204 but also GM200 as a means of determining what (if any) architectural improvements have been made. To determine the answer to this question, I have utilized overclockersclub.com's 1070 overclocking review which provides performance numbers for all of the aforementioned GPUs in 9 current games at resolutions of 1920x1080, 2560x1440, and 4k (assuming 3840x2160). I've compiled a spreadsheet (Google docs) of data from this review and have made a number of calculations comparing the GPUs in question. I'll provide links to the review and the spreadsheet in question at the end of this post. From this data, I've been able to make some interesting observations.

GTX 970 vs. 1070:
I believe this to be the natural comparison of SKUs between these generations based on their MSRPs ($329+ vs. $379-449). My current gaming PC has a Gigabyte GTX 970 G1 Gaming with an EK waterblock and is overclocked to 1554/8000, almost exactly the same speed as the 970 featured by overclockersclub in their review. Contemplating an upgrade, I found this to be a near-perfect natural comparison, so I began to dig into the numbers. Given the overclocked speeds of the cards in question, the 1070 has a clockspeed increase of 30.9% over the 970 (2050 vs. 1566MHz). There is also an SP count differential of 15.4% (1920 vs. 1664). Multiplying these increases (1.309 x 1.154), we arrive at an expected SP throughput difference of 51.1%. Therefore, in purely shader-bound workloads the performance difference from 970 to 1070 should be close to this number. Averaging the performance across the 3 resolutions tested, we arrive at a 53.2% performance advantage for the 1070. It is possible that there were some variances in the reported clockspeeds which could account for this difference; it is also possible that there are architectural differences at play. Without further supporting information I cannot say whether either or both are true. Nonetheless, it is an interesting finding and what seems to be a rather close correlation. One last data point of note: FPS/W is 73.3% higher with the 1070. Further breakdowns of the performance of each card are available in the spreadsheet.
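
For anyone who wants to redo the expected-scaling arithmetic, here is a rough Python sketch. It is only an illustration of the calculation described in this section, not the spreadsheet's actual formulas; the function name `expected_sp_scaling` is made up for the example, and it assumes throughput scales linearly with clock x SP count.

```python
def expected_sp_scaling(clock_new, clock_old, sp_new, sp_old):
    """Return (clock gain, SP-count gain, combined gain) as fractions.

    Assumes shader throughput scales linearly with clock * SP count,
    which only holds for purely shader-bound workloads.
    """
    clock_gain = clock_new / clock_old - 1.0
    sp_gain = sp_new / sp_old - 1.0
    # Gains compound multiplicatively, not additively.
    combined = (1.0 + clock_gain) * (1.0 + sp_gain) - 1.0
    return clock_gain, sp_gain, combined


# GTX 1070 (2050 MHz, 1920 SPs) vs. GTX 970 (1566 MHz, 1664 SPs), as quoted in this post.
clock_gain, sp_gain, combined = expected_sp_scaling(2050, 1566, 1920, 1664)
print(f"clock: {clock_gain:+.1%}, SPs: {sp_gain:+.1%}, combined: {combined:+.1%}")
```

The combined figure comes out at roughly 51%; whether it rounds to 51.0% or 51.1% depends on whether the intermediate percentages are rounded first.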

GTX 980 Ti vs 1080:
Here's the basic breakdown: 1080 has a clockspeed increase of 40.9% vs. 980 Ti (2050 vs. 1455), with a 9.1% decrease in SP count (2560 vs. 2816). This equates to a 28.1% theoretical increase in SP throughput. The average performance increase across all 3 resolutions was 22.6%. The observed performance increase is not as high as with 1070 vs. 970. This leads me to believe that SP throughput is not the only limiting factor in the observed performance. I am not an electrical engineer (let alone architect) so I don't have much insight into this phenomenon. FPS/W is 51.5% higher with 1080. Given that fact, its performance gain over 980 Ti is impressive.
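
To put the two comparisons side by side, one could divide the observed speedup by the theoretical one. The sketch below does that with the figures quoted in this post; the names `PAIRS`, `theoretical_gain` and `scaling_ratio` are invented for the example, and "theoretical" here means nothing more than clock x SP count.

```python
# Figures quoted in the post: (clock MHz, SP count) per card, plus the
# measured average performance gain across the three resolutions.
PAIRS = {
    "970 -> 1070": {"old": (1566, 1664), "new": (2050, 1920), "observed": 0.532},
    "980 Ti -> 1080": {"old": (1455, 2816), "new": (2050, 2560), "observed": 0.226},
}


def theoretical_gain(old, new):
    """Fractional gain in clock * SP count (a shader-bound estimate)."""
    old_clock, old_sps = old
    new_clock, new_sps = new
    return (new_clock * new_sps) / (old_clock * old_sps) - 1.0


def scaling_ratio(pair):
    """Observed speedup divided by theoretical speedup (1.0 = perfect scaling)."""
    expected = theoretical_gain(pair["old"], pair["new"])
    return (1.0 + pair["observed"]) / (1.0 + expected)


for name, pair in PAIRS.items():
    expected = theoretical_gain(pair["old"], pair["new"])
    print(f"{name}: expected {expected:+.1%}, observed {pair['observed']:+.1%}, "
          f"ratio {scaling_ratio(pair):.3f}")
```

A ratio slightly above 1 for the 1070 and a few percent below 1 for the 1080 is the gap discussed above; the sketch says nothing about what causes it.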

There's a lot more data in the spreadsheet, including performance for each card at each resolution. Anyone interested in highlighting further data points should feel free to do so; I hope this can be a good topic for discussion. Kudos to OCC for their hard work in this review and for including so many data points.

Overclockers Club review: http://www.overclockersclub.com/reviews/nvidia_geforcegtx_1070_overclocking/
Spreadsheet: https://docs.google.com/spreadsheets/d/10mwNtNsQXNJCjHQPtE5K636f-rXjZTRrmHszOUWyI6Y/edit?usp=sharing

Note: FPS/W figures were calculated based on measured total system power consumption. The ratios are the same as what they would be if the power of each card was measured individually, but the calculated numbers themselves are lower.
 
That would be great! Given that the expected scaling from 970 -> 1070 lines up nearly perfectly with the observed scaling, it seems safe to say there are likely to be very few architectural optimizations from Maxwell to Pascal which benefit the majority of graphics workloads. Obviously, compute is a different story. From what I have heard, VR seems to have gotten the most attention in this regard, for the consumer products at least. I didn't delve into VR performance in my investigation.
 
Isn't the multiview port most responsible for the performance gains in VR, though? Not having to replicate geometry calculations per view is a big deal, I would think.

I updated the spreadsheet with data for the 980 and made a 980 -> 1070 comparison also. No 980 -> 1080 comparison though, at least not yet.
 
Don't forget lens matched shading. It will greatly reduce the number of pixels to shade in VR apps.
That's probably a much bigger factor than the geometry reduction, since it's a guaranteed, fixed, and substantial number of pixels that don't need to be shaded.
Who knows how much load is really caused by the geometry...
 
A very nice analysis overall, thank you!
Note: FPS/W figures were calculated based on measured total system power consumption. The ratios are the same as what they would be if the power of each card was measured individually, but the calculated numbers themselves are lower.

This, however, is not true. When the GTX 1070 achieves ~50% more fps in a given game, the rest of the system (CPU, buses, memory, sometimes even storage) has to provide 50% more data in the same amount of time, which increases its workload and thus its power consumption on top of the increase from the card alone.
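
A quick way to see how much the measurement basis matters is to run the arithmetic with made-up numbers. Everything in this sketch is hypothetical (frame rates, wattages, the +20 W for the rest of the system, and the helper name `fps_per_watt_ratio`); it only shows how card-only and whole-system FPS/W ratios can diverge, not what the reviewed cards actually draw.

```python
def fps_per_watt_ratio(fps_a, watts_a, fps_b, watts_b):
    """Ratio of efficiency (FPS/W) of configuration A over configuration B."""
    return (fps_a / watts_a) / (fps_b / watts_b)


# All numbers below are hypothetical, chosen only to illustrate the arithmetic.
fps_old, fps_new = 60.0, 90.0            # new card is 50% faster
card_old_w, card_new_w = 200.0, 150.0    # card-only power draw
rest_old_w = 100.0                       # CPU, memory, etc. with the old card

card_only = fps_per_watt_ratio(fps_new, card_new_w, fps_old, card_old_w)

# Case 1: rest-of-system power stays constant.
system_const = fps_per_watt_ratio(fps_new, card_new_w + rest_old_w,
                                  fps_old, card_old_w + rest_old_w)

# Case 2: rest-of-system power rises with frame rate (assumed +20 W here).
rest_new_w = rest_old_w + 20.0
system_scaled = fps_per_watt_ratio(fps_new, card_new_w + rest_new_w,
                                   fps_old, card_old_w + rest_old_w)

print(f"card only:                   {card_only:.3f}")
print(f"whole system, constant rest: {system_const:.3f}")
print(f"whole system, rest +20 W:    {system_scaled:.3f}")
```

With these inputs the whole-system ratio is already smaller than the card-only ratio when the rest of the system draws a constant amount, and it shrinks further if that draw grows with frame rate.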
 
Nice information.
From a technical perspective, comparing a 980ti to a 1080 also needs to consider the basic architecture and not just Cuda cores, especially when it comes to games, at least as a caveat.
One reason the 980ti is still relevant today compared to the 980 is that the 980ti has a full complement of 6 GPCs with the associated number of Polymorph Engines/SMs/Cuda cores, albeit with the Polymorph Engine/SM/core count slightly reduced compared to the Titan X.
This makes a notable difference, as can be seen in how the 980 struggles these days against a 390x while a 980ti is still very strong against a Fury X. I appreciate one can only draw a very loose correlation this way, but trends do show the 980ti still being very competitive/strong in modern games.

So ideally the 1080 needs to be compared with both the 980 and the 980ti: the 980 shows the like-for-like improvement (albeit the 1080 has 1 extra SM per GPC), and the 980ti is the 'near' complete Maxwell 2 architecture and the higher performance tier to beat.
Cheers
 
The closest comparison you can make to see the generation-to-generation performance increase between Maxwell and Pascal is a 980 against a 1070 at the exact same clock speed.

Same number of ROPs: 64
Almost the same number of TMUs: 120 (1070) vs 128 (980)
Same memory bus width: 256 bit
Similar number of SPs: 1920 (1070) vs 2048 (980)

The only thing you have to do is either overclock the 980's memory to match the 1070 or underclock the 1070's memory to match the 7GHz memory clock of the 980, with both running the same core clock speed (around 1500MHz). Now that would be interesting.
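
As a rough sketch of what such a same-clock test would be normalizing for, the unit counts listed in this post can be turned into per-clock ratios. The dictionary layout and function names are invented for the example, and the 8 Gbps memory rate for the 1070 is its stock spec rather than a number from this thread.

```python
# Unit counts as listed in this post; memory data rates are effective GDDR5 rates in Gbps.
# The 8 Gbps figure for the GTX 1070 is its stock spec, not a number from this thread.
GTX_980 = {"sps": 2048, "tmus": 128, "rops": 64, "bus_bits": 256, "mem_gbps": 7.0}
GTX_1070 = {"sps": 1920, "tmus": 120, "rops": 64, "bus_bits": 256, "mem_gbps": 8.0}


def bandwidth_gb_s(card):
    """Peak memory bandwidth in GB/s: bus width in bytes * per-pin data rate."""
    return card["bus_bits"] / 8 * card["mem_gbps"]


def same_clock_ratios(new, old):
    """Per-unit throughput ratios when both cards run the same core clock."""
    return {unit: new[unit] / old[unit] for unit in ("sps", "tmus", "rops")}


ratios = same_clock_ratios(GTX_1070, GTX_980)
print(ratios)
print(bandwidth_gb_s(GTX_980), bandwidth_gb_s(GTX_1070))

# Underclocking the 1070's memory to 7 Gbps equalizes peak bandwidth.
matched_1070 = dict(GTX_1070, mem_gbps=7.0)
print(bandwidth_gb_s(matched_1070) == bandwidth_gb_s(GTX_980))
```

At equal core and memory clocks the 1070 would have about 6% fewer shader and texture units than the 980, so a result at or above the 980's would hint at per-clock gains.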
 
I disagree. As I said, the *ratio* stays the same because we're using the same measurement in both instances. Also, CPU power consumption due to driver workload is unlikely to vary greatly between situations where you're already GPU-limited.

I know you have a lot of experience and probably have touched more GPUs than I ever will (though I do buy them like they're going out of style), so if you have data that indicates otherwise please do share.
 
The original post was made before an update to the spreadsheet. Though I didn't create an analysis column for 980 to 1080 specifically, the raw data for both is there. IIRC the difference between 980 and 1080 is about 60-70%.

Also, since this post was created I have in fact purchased a 1070 (MSI Gaming X model) and have been able to make a few observations thus far, though the only benchmarking I've performed has been numerous runs of Firestrike. My observations of this particular model:

Pros:
-fast (previous generation high-end performance for a good amount less money)
-low power consumption
-great stock cooling solution

Cons:
-low availability/overpriced compared to MSRP
-doesn't overclock very well since the stock boost is already very high

The key point here (for me) is the overclocking. I'm coming from a VBIOS-modded 980 Ti under water which hit a very good OC of 1552/8000. My 1070 only hits 2063MHz peak/2050MHz sustained while benchmarking, though the RAM OCs to a quite nice 9200MHz. Given this comparison the 1070 actually comes out slightly behind, which has left me mildly disappointed. I do hope that the Pascal VBIOS is eventually cracked so we can overcome these power limits and unlock Pascal's true potential. By my calculations I need around 2150-2200MHz to outscore my old 980 Ti. Keep in mind this is not the case for most people, as it was rare to see greater than about 1450MHz on GM200 without a VBIOS mod and water.
 
Yeah, agreed, the BIOS and voltage situation is frustrating.
Just as a heads-up, it looks like Asus has already released their XOC custom BIOS, but it is not designed for air-cooled 1080s, and with its higher voltage I am not sure it is even really designed for the standard reference card.
Some info on it being used can be seen here:
http://forum.hwbot.org/showthread.php?t=159025
Someone applied the BIOS to their Gigabyte card and said it was working OK, but I am not sure that is a good idea, as it is not guaranteed that manufacturers use the same components/ratings on their custom designs (in terms of the PWM controller, power stage, and inductors).

Here is a bit from the post regarding its use on the Asus Strix:
Here’s the original BIOS, NVFLASH and the XOC BIOS:
1080strixXOC.rar at Filehorst - filehorst.de
Just drag and drop the XOC file on the prepared NVFLASH shortcut…

This BIOS has no power target and a fixed measured voltage of around 1.24V. If you try this BIOS on air, you’ll notice a very fast temperature increase, so be careful.
I did some initial testing with the original BIOS and found that 1.24V is good enough for 2250 to 2300.
Also being paranoid I would want the source to be validated as a secure-safe one to use.
Cheers
 
I am interested to see GM vs GP on a clock-for-clock basis. Has such a test been done?

My 980Ti can also reach 1450mhz in gaming, so I figured that would put me ahead of the 1070.

The gains of GP come from its surprisingly high boost clocks. Has there ever been a jump as big from one GPU gen to the next?

Nvidia did claim Pascal = Maxwell + HBM + 16nm
 
I have only anecdotal data, apart from the impressions gathered during testing and intermittent glances at the watt meter plugged into the wall outlet, which amounts to only a rough guesstimate.

The anecdotal data:
The two different cards: MSI GTX 970 Gaming 4G and Asus R9-390X Strix DC3OC (both are OC cards and do in fact have higher power draw than their respective reference models).
In isolated measurements under high gaming loads, it's 71% more power for the 390X.
In a whole-PC scenario this ratio gets higher, with 77% more power for the PC equipped with the R9 390X, while it is ~27% faster in this particular test.

On the clock-for-clock question: we tested an MSI Lightning at 1430 MHz alongside the GTX 1070/1080, if that's any help:
http://www.pcgameshardware.de/Nvidi.../GTX-1070-Benchmarks-Test-Preis-1196360/2/#a1
 
Glad to see my overclocked 980Ti still hanging at the top at 1440p!
Is ~1733mhz the 1080's average clock during the tests, or is it actually boosting higher?

At 1733mhz vs 1430mhz, the 1080 and 980Ti come closer in raw numbers, I think.
At 1430, the 980Ti has a fillrate advantage, but the 1080 has gen3 ROPs iirc, so that evens things out...

The wins of the 1080 could then be down to architectural gains, or just to certain games benefiting from the higher theoretical texturing/memory bandwidth/compute flops of a 1733mhz 1080?
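
For reference, here is a back-of-the-envelope sketch of those raw numbers. The SP and ROP counts are the cards' public specs rather than figures from this thread, the clocks are the two mentioned in this post (which may not match what the cards actually sustained in testing), and the helper names are made up.

```python
# Unit counts are public specs (not taken from this thread); clocks are the
# ones mentioned in this post and may not match the clocks sustained in testing.
GTX_980_TI = {"clock_mhz": 1430, "sps": 2816, "rops": 96}
GTX_1080 = {"clock_mhz": 1733, "sps": 2560, "rops": 64}


def fp32_tflops(card):
    """Peak FP32 rate: 2 FLOPs (one FMA) per SP per clock."""
    return 2 * card["sps"] * card["clock_mhz"] * 1e6 / 1e12


def pixel_fillrate_gpix(card):
    """Peak pixel fill rate in Gpixels/s, assuming one pixel per ROP per clock."""
    return card["rops"] * card["clock_mhz"] * 1e6 / 1e9


for name, card in (("980 Ti", GTX_980_TI), ("1080", GTX_1080)):
    print(f"{name}: {fp32_tflops(card):.2f} TFLOPS, "
          f"{pixel_fillrate_gpix(card):.1f} Gpix/s")
```

On these theoretical peaks the 1080 at 1733mhz is about 10% ahead in shader throughput, while the 980Ti at 1430mhz is ahead in ROP-count fillrate; real-world behaviour depends on much more than these two figures.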

Come to think of it, the 980Ti was a 'swell' big gpu. I think Nvidia was quite nervous about Fury X then, so they priced the 980Ti at 'just' $649 and even threw in a free $60 game (a new, big title)!

Nvidia knew of Fury X's HBM and its twice the shader cores of the 290X, which made it sound like a theoretical beast; they probably didn't know its final clocks. If AMD could have clocked Fury X to 1200~1400mhz, then Fury X would have won...
 

Fiji doesn't have twice the cores of Hawaii. It's actually just 45% more compute units and the same amount of ROPs. Not even the bandwidth is that much greater, since it's just 30% more than a 390X. One would think that color compression would make a larger difference than the raw bandwidth, but it actually didn't in GCN3.
 
6-Gen GeForce GTX x80 Comparison: GTX 470, 580, 680, 780, 980 & 1080


http://www.hardwareunboxed.com/6-gen-geforce-gtx-x80-comparison-gtx-470-580-680-780-980-1080/
 