AMD: R9xx Speculation

I don't think so. Looking at ComputerBase, HD6970 is faster than HD5870 by:
16% at 1920×1200 / AA 4x
31% at 2560×1600 / AA 4x

die-size increased from 334mm² to 389mm² = 16%
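A quick back-of-the-envelope check of those numbers (a minimal Python sketch; the die sizes and performance deltas are the ones quoted above):

```python
# Rough perf-per-area check for HD6970 vs HD5870 (numbers quoted above).
die_5870 = 334.0  # mm^2, Cypress
die_6970 = 389.0  # mm^2, Cayman

area_gain = die_6970 / die_5870 - 1.0  # ~0.16, i.e. ~16%

for res, perf_gain in [("1920x1200 / 4xAA", 0.16), ("2560x1600 / 4xAA", 0.31)]:
    # Performance gained relative to the extra die area spent.
    print(f"{res}: +{perf_gain:.0%} perf for +{area_gain:.0%} area "
          f"-> ratio {perf_gain / area_gain:.2f}")
```

So at 2560×1600 the performance gain is nearly double the area gain, while at 1920×1200 it's barely at parity with it.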

And this is a comparison of drivers which were polished for a year against a fresh and evidently buggy driver. I think this launch is very similar to R520's launch...

Is that compared to the 2GB 5870 or the 1GB? If it is compared to the latter then it's not a fair comparison, especially at high resolution. If there is a memory discrepancy, a better comparison would be to look at 1650 or below, to see how much improvement ATi have got out of the chip itself and not from the memory increase.
 
And this is a comparison of drivers which were polished for a year against a fresh and evidently buggy driver. I think this launch is very similar to R520's launch...
Not quite. R520 had about a year of software bring-up; Cayman, on the other hand, only taped out in May. The amount of exposure to this new architecture is a lot less due to these schedules.
 
Hardware.fr has comprehensive HD6970 versus HD5870-2GB results. Fantastic article.

Cayman is worse than I was expecting. ~15% better than HD5870. After the slide leaks I was thinking 30% better (25-50% range).
 
Hardware.fr has comprehensive HD6970 versus HD5870-2GB results. Fantastic article.

Cayman is worse than I was expecting. ~15% better than HD5870. After the slide leaks I was thinking 30% better (25-50% range).


Yep, some people expected a 30-40% lead over GTX580... Where is it? ;( Only Barts is looking good with those 1120 SPs compared to Cypress' 1600 SPs. Or is it just confusion, and increasing the number of shaders doesn't scale well? Really confusing. The 6970 would be a great product if its price was 200 USD. :(
 
Thanks for the tip. Yeah, in that light I think Cayman is pretty disappointing. In a like-for-like comparison the 6970 outperforms the 5870 2GB by around 10-15% all around. ATi need to try harder next time around: to catch up with the GTX580 they need roughly a 20% improvement, sometimes more, sometimes less. A linear increase in die size would take them to around 480mm², but given the difficulty of getting the heat away it will end up bigger, and ATi don't have the same level of experience with bigger dice that Nvidia has.

Like Rangers, I think the bigger-die strategy is now paying dividends for Nvidia: they have a profitable business model despite much larger dice, and ATi are playing catch-up. Definitely a problem for ATi; hopefully AMD can bring home the bacon.

Edit: This is the key picture:

[images: HD6970 vs HD5870 2GB performance summary charts]


You can see in these pictures that the 6970 is around 10% faster than the 2GB 5870. The more evidence that comes out, the less impressive Cayman becomes. Looking at the core clock, there is around a 5% boost coming from increased clocks, and the memory bandwidth has had a big increase too. Cayman itself looks like it is just 5-7% better than Cypress for a 15% increase in die size.
 
Two 6950s CFed for 600 bucks are looking pretty good in comparison with THE OUT OF STOCK 530 USD GTX 580. :) Maybe that is the sweet spot and people have to go in that direction. :)
 
The GTX 580 gives only 70-ish % of the performance of 2× 6950, while costing 90-ish % of their price. So I think a CF system is the better choice.

But to me it is strange. It looks like AMD took some performance from the single cards and traded it for improved CrossFire scaling. More people would be happy if it were the opposite: CF scaling staying the same while the single cards offered better performance.
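Working the price/performance numbers from the first paragraph through a minimal sketch (the ~70% figure and the $600/$530 prices are the ones quoted in this thread):

```python
# Rough perf-per-dollar comparison, using the numbers quoted above.
cf_price, cf_perf = 600.0, 1.00    # 2x HD6950 in CrossFire (baseline)
gtx_price, gtx_perf = 530.0, 0.70  # GTX 580 at ~70% of the CF pair

cf_value = cf_perf / cf_price
gtx_value = gtx_perf / gtx_price

print(f"CF pair perf/$: {cf_value:.5f}")
print(f"GTX 580 perf/$: {gtx_value:.5f}")
print(f"CF advantage:   {cf_value / gtx_value - 1:.0%}")  # ~26% better perf/$
```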




 
Two 6950s CFed for 600 bucks are looking pretty good in comparison with THE OUT OF STOCK 530 USD GTX 580. :) Maybe that is the sweet spot and people have to go in that direction. :)

It's out of stock on the omelette but I have it shown in stock on another site for $499, which is 83% of dual 6950s.
 
Why no ALU hot-clock below two times the base clock? S3 showed it with Chrome 400/500.
1.5 times the base clock would be a bit above the previous ALU:TEX ratio.

Considering the amount of shaders AMD uses, and the fact that heavy shader use is what's currently running into the PowerTune TDP limits, raising the base clock of the ALUs would just mean PowerTune kicks in more often.

That isn't necessarily bad if we look at an example below, but wouldn't be the most efficient use of resources.

I agree that it is more complex, but if you break it down to its bare elements then it is just an anti-Furmark switch.

Hardly, and it is definitely a lot more complex than the power monitoring and throttling solutions on previous GPUs from both vendors.

Compare to the GTX 580 right now, as that uses perhaps the worst possible type of power containment there is: app detection. If something like that kicks in, you lower your clocks across the board and everything is reduced in performance; min, avg, and max FPS would all be affected.

Now with PowerTune: what it allows AMD to do is run a higher base clock without exceeding the TDP requirements, which are meant to prevent a card/chip from frying and destroying itself.

So, without PowerTune, the 6970 may have necessitated a lower base clock. Let's say PowerTune, when it kicks in, reduces clocks to those levels. Now let's compare the theoretical effects on games.

6970 w/PT but no throttling - higher clocks = higher min, higher average, and higher max than a 6970 without PT, due to the latter's lower base clocks.

6970 w/PT but with throttling - same or lower clocks than a 6970 without PT while throttling = higher min (same as above), higher average although not as high as above, and a variable max FPS that could be higher, the same, or lower than a 6970 without PT.

In all cases, min FPS will be higher. Average FPS will be higher (unless during design they REALLY borked the chosen base clock speed). Max FPS may be higher, the same, or lower.

Unless they boosted clocks so aggressively that the card is throttling the majority of the time, you'll generally see an overall performance boost with PowerTune versus without it. The exceptions are generally the outliers/power viruses/programs that stress one singular part of a GPU in a pathological way. I.e., generally not games.

In the very worst-case scenario, where the base clock was boosted so high that PowerTune was throttling 100% of the time, you'd end up with a situation where min FPS = average FPS = max FPS, which would likely still be significantly faster than a non-PowerTune-clocked GPU. But that wouldn't be a very efficient way to design a GPU. As well, power consumption would be higher overall, as you're now basically running the card at 100% power consumption 100% of the time (well, except when idle).
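To make the min/avg/max argument concrete, here's a toy model in Python. The numbers are entirely hypothetical and the power model is a crude linear one; real PowerTune estimates power from on-chip activity counters, so treat this as a sketch of the idea, not of the mechanism:

```python
# Toy model of the PowerTune argument above. Hypothetical numbers throughout;
# assumes FPS scales with clock, and power scales with clock * workload activity.
import random

random.seed(1)
POWER_CAP = 250.0                                  # board power limit, watts
WATT_PER_MHZ_LOAD = 0.3                            # crude power-model constant
frames = [random.uniform(0.6, 1.1) for _ in range(1000)]  # per-frame activity

def run(base_clock, powertune):
    """Return the effective clock for each frame."""
    clocks = []
    for load in frames:
        clock = base_clock
        power = load * clock * WATT_PER_MHZ_LOAD
        if powertune and power > POWER_CAP:
            # Throttle just enough to sit at the cap (fine-grained throttling).
            clock = POWER_CAP / (load * WATT_PER_MHZ_LOAD)
        clocks.append(clock)
    return clocks

# Without PowerTune the base clock must be sized for the worst-case frame
# (load 1.1): 250 / (1.1 * 0.3) ~= 757 MHz, so pick a conservative 750 MHz.
no_pt = run(750, powertune=False)
# With PowerTune the base clock can be set higher; heavy frames throttle.
with_pt = run(880, powertune=True)

for name, c in [("no PT @750", no_pt), ("PT    @880", with_pt)]:
    print(f"{name}: min {min(c):.0f}  avg {sum(c)/len(c):.0f}  max {max(c):.0f} MHz")
```

With these made-up numbers the PowerTune card's min, average, and max clocks all come out at or above the conservative card's fixed 750MHz, which is exactly the outcome described above; note the throttled floor (~757MHz here) is set by the power cap, not by how high the base clock is pushed.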

Of course, this is also going to depend on the granularity of the throttling. If it's very coarse, then it may not be beneficial all the time, à la Intel's throttling in the past, where the clock speed would drop rather drastically instead of decreasing gradually to match the increased load.

PowerTune is by far the most exciting thing about Cayman, IMO, with regard to increasing the performance of video cards going forward, as it allows for higher utilization of a card versus the same card with a fixed base clock and no PowerTune. This is assuming it allows for a higher base clock, as advertised.

The card itself may or may not be disappointing for a variety of reasons. But PowerTune is certainly something I'm looking forward to.

Regards,
SB
 
Is that compared to the 2GB 5870 or the 1GB? If it is compared to the latter then it's not a fair comparison, especially at high resolution.
There's no performance difference between the 1GB and 2GB models of the HD5870 at 2560×1600 / MSAA 4×. Check Hardware.fr's review of the GTX580, where both 1GB and 2GB models are tested at all resolutions and MSAA levels. The only game which benefits from the bigger memory at 2560×1600/4× is Crysis (17 FPS vs. 20 FPS). The situation changes with MSAA 8×, but I compared only MSAA 4× results.
 
Hardware.fr has comprehensive HD6970 versus HD5870-2GB results. Fantastic article.

Cayman is worse than I was expecting. ~15% better than HD5870. After the slide leaks I was thinking 30% better (25-50% range).
How far do you think better drivers could push these new cards?
I wonder because some in-game results don't add up with either the increased bandwidth or the ALU power. There are even (rare) situations where the HD5870 pulls ahead of the HD69xx.

---------------------------------------
Overall I feel like AMD should have stuck to its previous naming scheme: Barts should be 67xx and Cayman 68xx, especially as, due to the delayed 28nm process, they might not be able to refresh the HD57xx line. I didn't expect Cayman to beat Nvidia's latest GPUs (even though some late rumours gave me unreasonable hopes), but I wish AMD had stuck to its "sweet spot" strategy; instead their GPUs have inflated too much.
It's easier said than done, but with 28nm being pushed back they should have given up some raw power and fixed their arch (which they did to some extent with Cayman).
As I see it, Barts and Cayman should have shared the same architecture (4-wide, improved RBEs, 2 geometry engines, etc.) but smaller.

-Barts should have been something like 14/12 SIMD arrays (smaller ones like their lower-end parts: 32 ALUs, 2 tex units), 16 RBEs, mostly the same size as the HD5770 but clocked higher (900/800MHz), with higher-clocked memory. They should have been called HD67xx. <200mm²

-Cayman: 14/12 SIMD arrays, 32 ROPs, 900/800MHz, 2/1GB, ~250mm² (the same size and clocks as the actual Barts, with faster memory). They should have been called HD68xx. <300mm²
Actually, within those die sizes AMD might have packed in extra improvements.

It would have been clearer that AMD is not competing toe to toe with Nvidia on the high end; it should have been Antilles' job to take the performance crown. Having to deal with smaller/cooler/less power-hungry chips may also have helped AMD launch the 6970 when it's needed for maximum marketing impact (i.e. now, before Christmas).
 
Hmmm... it seems AMD really aimed low this time. If they had upped the shader count and built a bigger GPU (around 450mm² for a 1920 SP part?) they could have reached GTX580 performance, maybe with lower power consumption too. As it is, the new card consumes less (with more RAM, clocked higher) than a 570 but is also priced higher while offering the same speed (or a little more, depending on the game suite used). Maybe drivers will push it higher, but I don't think we'll see more than a 10% improvement. Possibly they also chose not to risk going after a big GPU AND a new architecture all at once.
That said, CrossFire scaling is very good, and this means Antilles will be a monster card (in power consumption, too); and if sales are not good, AMD can always lower the price and offer better value (especially with a 1GB version of the 6950).
 
There's no performance difference between 1 and 2GB model of HD5870 at 2560×1600 / MSAA 4×. Check Hardware.fr review of GTX580, where both 1GB and 2GB models are tested at all resolution and MSAA levels. The only game, which benefits from the bigger memory at 2560×1600/4× is Crysis (17 FPS vs. 20 FPS). Situation changes with MSAA 8×, but I compared only MSAA 4× results.
This is unfortunately untrue for Metro 2033 as tested there, where the HD5870 completely tanks (1 fps). Because of that, the HD5870 ends up with quite a bit lower an overall score at that resolution (you can also see this in the fact that the GTX 470 is closer to the HD5870 at that resolution, whereas the difference typically grows with increasing resolution).
My guess is that without that result, the HD6970 would not be more than 20% faster than the HD5870 at 2560×1600 with 4×AA.
Still, with these benchmarks your point somewhat stands: it is of comparable efficiency to the HD5870 and not really worse. But Cypress wasn't terribly efficient either, not in comparison to other Evergreen members (well, Juniper mostly) and certainly not compared to Barts. So I think people were expecting something more along the lines of Barts' efficiency. Granted, it has more features (well, mostly just DP; the increased geometry/tessellation throughput doesn't do much in most games, though some apps that are included got a disproportionate boost), but still, with a 50% die-size increase over Barts, something like a 35-40% performance increase wouldn't have been unexpected (see the quick calculation after the edit below). But just as with Cypress compared to Juniper, it apparently still doesn't scale well at the high end, for whatever reason.

edit: of course, it does have a few more features - the compute stuff (multiple kernels), and CSAA - err, I meant EQAA (btw, the performance hit seems to be larger than what Nvidia takes - maybe those way-too-many ROPs with their ultra-high Z rate are good for something after all). Still, it does not seem like that should decrease overall efficiency so much.
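For concreteness, here's the expectation described above as a quick calculation. The die sizes and the 70-75% scaling-efficiency factors are assumptions chosen to reproduce the 35-40% range, not measured values:

```python
# Expected Barts -> Cayman scaling. Die sizes are the commonly cited ones
# (Barts ~255 mm^2, Cayman ~389 mm^2) -- treat them as assumptions.
barts, cayman = 255.0, 389.0
area_increase = cayman / barts - 1.0               # ~53% more die area
print(f"Die area increase: {area_increase:.0%}")

# If performance tracked area sub-linearly, at 70-75% efficiency:
for eff in (0.70, 0.75):
    print(f"At {eff:.0%} scaling efficiency: +{area_increase * eff:.0%} perf expected")
```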
 
mczak: Yes, I agree. I expected that scaling with the number of SIMDs would be solved in Cayman. It wasn't, but despite that I think the GPU has a bit more potential than we currently see in reviews (20-25% above Cypress on average @1920×1200/AA 4× isn't out of the question, I believe).

Anyway, it has comparable or slightly better efficiency than Cypress, which is more efficient than GF104. That's still not bad. The competitor's product, which was developed primarily for gaming, is still less efficient in games than a product which was developed with HPC in mind.

I'm quite curious what AMD will release next year. They can't release any faster product with the current (nonexistent) performance scaling with the number of SIMDs...
 
AMD GPU Computing

I think the 6950 seems like a very good deal, if games are all you care about.

Unfortunately for me, it isn't. I also want compute value, in the form of impressive speed gains in video editing like with Adobe Premiere Pro CS5, and right now only nVidia offers that.

This is what I'm talking about: http://ppbm5.com/Benchmark5.html

Now, the 69XX is set to make inroads into GPU computing, but the big question is how soon, and to what extent, can we expect AMD to support this?

I have been told that the engineering effort behind the CUDA support in CS5 is more or less a split effort between Adobe and nVidia, so I highly doubt just shipping some good OpenCL extensions will cut the mustard.

I'm really looking forward to getting some good news from AMD on this front. How does hiring a big group of DevRel people sound to you? I'll take it! :cool:
 