AMD: R9xx Speculation

Even if it's a 25% boost for 2GB as in that review, it'd still trail 23 to 14-15

Anywho I wrote this piece as my theory for why all the reviews are all over the place. This is my own speculation based on the #'s. Cayman is really interesting the more that I delve into it:

Going off these 3dMark feature scores:

http://h-5.abload.de/img/69704mxh.jpg

The 6970 trounces the 5870 in everything, except Perlin Noise. Taking a look at Perlin Noise, which is a score reflecting shader power, we get:

5870: 175.42
6970: 146.02

Thus, 6970 is just 83.24% of the 5870's power in shaders. Now why is this significant?

As I wrote before, 5870 is VLIW-5 which is (w,x,y,z,t) where t is the transcendental unit. 6970 is now VLIW-4 which is (w,x,y,z) with 3 of the 4 shaders being used to calculate a transcendental.

Now, why the big gap in Perlin Noise if VLIW-4 Cayman has 384 VLIW units (1536 / 4) and Cypress has 320 (1600 / 5)?

Well first, 1536 is 96% of the 1600 shaders that Cypress has assuming all shaders are firing. However, that alone doesn't account for the 6970 scoring only 83% of the 5870. As mczak wrote, there are two possibilities:

a) The VLIW-4 compiler isn't doing as great a job yet, in which case drivers and optimizations may improve performance
b) 3dMark uses a lot of transcendentals, and since Cayman no longer has the dedicated t-unit, it takes a performance hit on them
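
As a quick sanity check on the numbers so far, here is a minimal Python sketch using only the figures quoted in this post (the two Perlin Noise scores and the 1600 / 1536 shader counts). The variable names are mine and purely illustrative; the "unexplained" figure is just the score ratio divided by the shader-count ratio, not anything measured.

```python
# Figures taken from the post above.
PERLIN_5870 = 175.42
PERLIN_6970 = 146.02
ALUS_CYPRESS = 1600   # VLIW-5: (w, x, y, z, t)
ALUS_CAYMAN = 1536    # VLIW-4: (w, x, y, z)

# How the 6970 scores relative to the 5870 in Perlin Noise.
score_ratio = PERLIN_6970 / PERLIN_5870

# How many shaders Cayman has relative to Cypress.
alu_ratio = ALUS_CAYMAN / ALUS_CYPRESS

# Number of VLIW units on each chip (shaders divided by VLIW width).
units_cypress = ALUS_CYPRESS // 5
units_cayman = ALUS_CAYMAN // 4

# Part of the score gap that the raw shader count does not explain.
unexplained = score_ratio / alu_ratio

print(f"Perlin Noise, 6970 vs 5870: {score_ratio:.2%}")
print(f"Shader count, Cayman vs Cypress: {alu_ratio:.2%}")
print(f"VLIW units: Cypress {units_cypress}, Cayman {units_cayman}")
print(f"Left over after shader count: {unexplained:.2%}")
```

So the shader count only explains a few percent of the gap; the rest has to come from somewhere else, which is where the two possibilities above come in.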

So either way you look at it, the 6970 still has room for improvement with regards to 3dMark by improving the compiler and/or optimizing the 3dMark code for VLIW-4. Thus, at this time, 3dMark is not very indicative of actual in game 6970 performance.

(Besides, look at 5870 scores in Vantage at release and today... it's a whopping increase over a year of driver optimizations for a synthetic. Cayman should get even more seeing as how it is a different architecture)

So what's all this mean? Well, I've been saying it for some time now, but I can see in-game performance putting the 6970 ~GTX 580 levels and the 6950 ~GTX570 levels. The key is that the 69xx improvement over Cypress will range greatly - and hence your perspective of how good the card is may differ.

One of the key things from the release of the GTX 480 and now 580 is that in some games, Nvidia has a whopping lead, and in others, the 5870 barely trails or even takes the lead. That's because Fermi's architecture enabled it to take advantage of certain games far better than the 5870 (esp. in some DX11 tessellation), whereas in others, the 5870's pure shader and texturing advantages bring it close.

However, this creates a wide variation in performance figures - some say Cypress trails only 15%, others say it trails 30%, etc. from the 480. What will be interesting to see is how "stable" Cayman performs relative to the GTX 580/570 - in other words, in games where Cypress trails heavily, does Cayman get a considerable boost over Cypress showing that Cayman is truly different from Cypress? We've seen from the Stalker benchmark, Cayman does get a 33% boost over Cypress so it's quite possible.

IMO this is what we'll see:

6970 will be anywhere from 10 to 50% faster than the 5870 depending on the game - in heavily tessellated games the lead should increase, in games with heavy shaders and little tessellation the lead is probably lower. The 6970 should be close to the GTX 580 overall, though it will probably trail in quite a few games, and it should show less variation in performance relative to the 580 than the 5870 did. Likewise, in situations where CF doesn't scale well and/or tessellation is heavier (where the 5870 is weak), the 6970 might be very close to the 5970, but in other games it will trail heavily.

My own hypothesis: Games where Cayman barely improves on the 5870 are games with little to no tessellation that are heavily shader bound. These will also be games where the 5870 is really, really close to the 480/580. Games where Cayman pulls far ahead of the 5870 are games with higher levels of tessellation and/or that are less shader bound, as well as DX11 titles and titles making heavy use of DirectCompute features.

That's my assessment of the situation, and why so many people were giving doom and gloom when certain benchmarks were showing Cayman barely edging Cypress (3dMark and some older games), whereas in other benchmarks (such as Unigine Heaven, ComputeMark, Stalker etc.), the 6970 seems to beat the 5870 handily (and often gets close to the 5970).

Stalker COP:
comparep.jpg

Metro2033:
http://www.hardwareluxx.de/community/15873506-post1363.html

What'll be interesting to see is what games the reviewers use to compare. If they use games that are heavily 5870 favored, Cayman might not look great - however, if they show games where the 5870 struggled and 480/580 excelled, it's possible Cayman looks amazing. Of course, this will show who's biased toward whom...

TL;dr - 3dMark and other benches are optimized for VLIW-5 and not yet for VLIW-4, so 3dMark isn't representative of Cayman performance yet. Where Cayman will shine and probably pull ahead is in DX11 games where DirectCompute and tessellation are necessary. Ultimately, the suite/games tested will determine whether the 6970 looks like a big improvement. What's most important though is comparing it to the 480/580/570 and seeing if it is a more consistent performer relative to those cards than the 5870 was.

Oh and Antilles will be a beast
 
At 1920 x 1200 the added memory didn't do anything notable, but they weren't using AA.

At 2560 x 1600 the 1 GB cards failed.

So I guess for Metro 2033 we need a benchmark where we can eliminate the amount of memory being an issue.

Edit: The 5970 only has 1 GB per chip afaik.
 
3dMark and other benches are optimized for VLIW-5 and not yet for VLIW-4, so 3dMark isn't representative of Cayman performance yet. Where Cayman will shine and probably pull ahead is in DX11 games where DirectCompute and tessellation are necessary.

and where is DC and tessellation running exactly?:rolleyes:
Last I saw Cayman numbers, it was battling the GTX 480, winning some, losing some; so a lead in most games is proof the drivers have gotten better since then. Or, in the worst case scenario, the games AMD used in the earlier posted slides are ones in which the 6970 works quite well.
As for Antilles, I won't put much faith in CF if the single card drivers were barely better.
 
Targeting your opponent's last-gen highest chip seems like a recipe to ensure losing.
 
Well first, 1536 is 96% of the 1600 shaders that Cypress has assuming all shaders are firing.
Actually, if you take clocks into account, it's not just 96%, it's 100%.
For it to be 20% slower in Perlin Noise under the (optimistic...) assumption that utilization is 100% for VLIW-5, the test would need about 10% of the instructions to be transcendentals (or other t-slot exclusive instructions). (That is, if you have 50 instructions, 5 of them transcendentals, you could ideally execute that in 10 VLIW-5 instructions. With VLIW-4 those 5 transcendentals translate to 15 slots, hence 60 slots in total, so 15 VLIW-4 instructions. Then you take into account the (slight) clock difference and the extra SIMDs.)
Of course, this is very simplified, depending how instructions can actually be bundled together.
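
The slot arithmetic above can be written out as a rough Python sketch. It assumes ideal packing, 3 slots per transcendental on VLIW-4 (as stated earlier in the thread), 320 and 384 VLIW units, and reference core clocks of 850 MHz (5870) and 880 MHz (6970). The clocks are not stated in this thread and are my assumption, and all function and variable names are made up for illustration.

```python
import math

# Assumed chip parameters. The unit counts follow from 1600 / 5 and 1536 / 4;
# the clocks are assumed reference clocks, not figures from this thread.
CYPRESS = {"units": 320, "clock_mhz": 850}
CAYMAN = {"units": 384, "clock_mhz": 880}


def bundles_vliw5(total_ops, transcendental_ops):
    """Ideal VLIW-5 bundle count: 5 ops per bundle, but only the t-slot can
    take a transcendental, so at most one transcendental per bundle."""
    return max(math.ceil(total_ops / 5), transcendental_ops)


def bundles_vliw4(total_ops, transcendental_ops, slots_per_transcendental=3):
    """Ideal VLIW-4 bundle count: each transcendental occupies several slots."""
    slots = (total_ops - transcendental_ops) \
        + transcendental_ops * slots_per_transcendental
    return math.ceil(slots / 4)


def cayman_vs_cypress(total_ops, transcendental_ops):
    """Relative throughput of Cayman vs Cypress under this simplified model."""
    rate_cypress = (CYPRESS["units"] * CYPRESS["clock_mhz"]
                    / bundles_vliw5(total_ops, transcendental_ops))
    rate_cayman = (CAYMAN["units"] * CAYMAN["clock_mhz"]
                   / bundles_vliw4(total_ops, transcendental_ops))
    return rate_cayman / rate_cypress


# The example from the post: 50 instructions, 5 of them transcendentals.
relative = cayman_vs_cypress(50, 5)
print(f"VLIW-5 bundles: {bundles_vliw5(50, 5)}")
print(f"VLIW-4 bundles: {bundles_vliw4(50, 5)}")
print(f"Cayman vs Cypress in this model: {relative:.1%}")
```

With those assumptions the model lands at roughly 83%, which is in the same range as the Perlin Noise scores quoted in the first post. That doesn't prove the transcendental explanation; a compiler that packs VLIW-4 bundles poorly could produce a similar gap.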
 
Thanks for the info. Been a long time since I got to play with architecture stuff since I've been out of EE for a while :p

Good catch on clocks. I'd love to see the Perlin Noise figures after a few rounds of drivers/optimizations... would be interesting to see where the gap is coming from


Hey, Dave, on a scale of 1-10, how hard and stressful was this release? :)

I'd love to see another one of those Anandtech behind-the-scenes stories from AMD. Those are the best features by far
 
nice, though october slides and still TBD in place of shaders

clocks of various blocks?

and ati have their own CSAA now, how many AA modes does that make..:oops:

and faster double precision@1/4 the SP rate.:LOL:

Isn't that the point?

Your opponent is never going to sit still, presumably.

Edit: unless you're joking.

gtx480 was the top card, so the performance was being compared to it.
 
I think you should set PowerTune to +20%, or just disable PowerTune (how?), before running the Perlin Noise test, since it behaves like a fillrate-type test.
 