AMD: R9xx Speculation

Comparing to HD6870, same clocks: +37% more shaders, up to +20% for 4D, +XX% for improved geometry and tess, -YY% for unoptimized drivers = .... I guess it can match the GTX580, but it's going to lose in tessellation benchmarks, thus it won't be the fastest GPU on the planet.
I hope that was sarcastic, because you could pick a flops benchmark and Fermi wouldn't be the fastest GPU on the planet .. :p
 
That is incorrect:
http://www.computerbase.de/artikel/...te-2/22/#abschnitt_performancerating_mit_aaaf

More like :
10-15% in the majority of cases
5% in some cases
20% in games with tessellation

Why don't we take real overall numbers instead of talking about "cases"?

1680x1050: 12% Faster
1920x1200: 9% Faster
2560x1600: 3% Faster

4xaa/16af
1680x1050: 15% Faster
1920x1200: 14% Faster
2560x1600: 20% Faster

8xaa/16af
1680x1050: 11% Faster
1920x1200: 8% Faster

Yup, I agree. Add to that the smaller AA hit.

So Evergreen loses performance at 4xaa but gains performance at 8xaa.
 
AMD doesn't have to worry about the 580 if Nvidia can only squeeze out a few cards at a higher price while AMD can churn out a lot of Cayman at a lower price.

Personally I think Cayman will be faster, smaller, cheaper, cooler. AMD was targeting a full Fermi last year as competition, so if 580 is finally what should have been delivered last year, AMD will have no problems dealing with it.

Do you mean the 6990 by that? Because it's the only card that, going by specs, is going to be faster, if rumors are to be believed. ATI, now AMD, has been playing catch-up for some years now, and with a dual Cayman on one board I have a hard time believing that a single card will be a monster ....
 
I expect 6970 to be slower than G110 but 6990 to be faster, just like the current generation.

How much 6970 is slower though is another matter. They might even end up even with AMD winning some and Nvidia the others.
 
I would think it depends on the price. Also, some people want the fastest card and if 6990 is faster then people will buy it.
Everyone wants the fastest card, but I'm under no illusions about how they're gonna be pricing that dual-GPU monster. :p

I don't think they'll have a problem filling the demand, I just really don't see the demand as being overwhelmingly high.
 
Since I said I'd do it, here comes an update on the HD6870 and undervolting/clocking.
I've only had the card for two days in spite of ordering it on launch day, so I haven't had the opportunity to gather a lot of datapoints.
The ASUS ships with software that allows setting GPU and memory clocks and GPU voltage. That's it (the fan settings seem broken), but that GPU voltage is critical - it allows dropping GPU voltage to 0.95V. It would have been nice to be able to adjust memory voltage as well - AMD apparently drives it at 1.6V up from nominal 1.5V, but no such adjustment is provided.

Default GPU clock (900MHz) is, on my sample, reliable from 1.05V, so the default 1.175V provides, for my card, quite a large safety margin. Dropping frequencies, I've been able to determine that the system is definitely stable at 1.00V at 850(+)MHz, yielding a GPU power draw of (850/900)*(1.00/1.175)^2=0.68, or 68% of the standard settings. 0.97V at 800(+)MHz, yielding a GPU power draw of (800/900)*(0.97/1.175)^2=0.61, or roughly 60% of the standard settings, has also proven perfectly reliable. (+) indicates that I simply haven't had the occasion to move upwards from these settings to see just where instability occurs, only that it occurs within the next 50MHz.
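For reference, here is the rule of thumb behind those numbers as a minimal Python sketch: dynamic power scaling roughly with frequency times voltage squared, with the stock 900MHz/1.175V point as the baseline (leakage/static power is ignored, so treat it as an approximation).

```python
# Rough dynamic-power scaling: P ~ f * V^2 (leakage/static power ignored).
def relative_gpu_power(freq_mhz, volts, base_freq=900.0, base_volts=1.175):
    """GPU power draw relative to the stock 900 MHz @ 1.175 V operating point."""
    return (freq_mhz / base_freq) * (volts / base_volts) ** 2

print(f"{relative_gpu_power(850, 1.00):.2f}")  # ~0.68 -> about 68% of stock
print(f"{relative_gpu_power(800, 0.97):.2f}")  # ~0.61 -> roughly 60% of stock
```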

Conclusions: yes, the HD6870 is pushed far enough up the voltage/clock curve that dropping frequencies slightly allows significantly lower voltages and power draw. This translates directly into lower fan noise, but the bad news is that the cooling solution doesn't provide much margin for the card - at default settings you get both rather high noise levels and rather high temperatures. Dropping power draw lowers both noise and temperatures significantly, but the noise is still higher than I'd like (by far the noisiest component of my system), and there is no way I can adjust the fan profile.

Actually, GPU-Z and RBE seem to allow extracting the card BIOS and adjusting the fan settings (but not voltages); however, there is no utility that I know of that can yet perform the actual flashing of the card. Kudos to both W1zzard and BAGZZlash for their community service with GPU-Z and RBE.
 
By the way, why is it that AMD states much lower idle power for Barts than Cypress, but most reviews I've seen show very little difference, if any?
 
By the way, why is it that AMD states much lower idle power for Barts than Cypress, but most reviews I've seen show very little difference, if any?

Which reviews are you referring to? If you look at total system power there's not a huge amount of difference, but for the GPU alone it's pretty significant. Taking 5 watts off of the 15W Cypress was using is quite a bit.
 
Which reviews are you referring to? If you look at total system power there's not a huge amount of difference, but for the GPU alone it's pretty significant. Taking 5 watts off of the 15W Cypress was using is quite a bit.
In the reviews I've seen which measure power draw directly, idle power isn't really much lower either: http://ht4u.net/reviews/2010/powercolor_pcs_plus_6850/index12.php
I guess at least part of that is due to not lowering the memory voltage (and using more than the specified 1.5V), which is the same for Cypress/Barts. Also, idle GPU voltage seems to be mostly the same as Cypress, and the chip is only about 20% less complex, so I guess this isn't really a big surprise. Maybe the possibly cheaper VRM doesn't help either.
 
In the reviews I've seen which measure power draw directly, idle power isn't really much lower either: http://ht4u.net/reviews/2010/powercolor_pcs_plus_6850/index12.php
I guess at least part of that is due to not lowering the memory voltage (and using more than the specified 1.5V), which is the same for Cypress/Barts. Also, idle GPU voltage seems to be mostly the same as Cypress, and the chip is only about 20% less complex, so I guess this isn't really a big surprise. Maybe the possibly cheaper VRM doesn't help either.

5 watts at HardOCP

8 watts at bit-tech

5 watts or so at xbit

Seems like it's more than nothing to me.
 
Does this type of connection between the organization of shaders AND performance still stand? Because Charlie, who is pretty close to ATI/AMD, said:



What I mean is: Cypress has a 4+1 configuration. Cayman seems to have just 4 "medium" shaders. Before, you had 320 "powered" shaders (the "1" in the sum). Now you don't have powered shaders, but pretty much equal ones. The way I understand this is that your calculations for Cayman don't work anymore, because even IF there are 64 more "medium" shaders, those 384 shaders' performance is EQUAL to the 320 "powered" shaders, except on the more complex operations, where they are faster. In the end, shader power would stay, on average, the same.

What the above actually meant was that if you take 320 VLIW-5 shaders, take away the T units while beefing up the remaining 4 lanes for a VLIW-4 config, it is still ~98% as fast in general gaming. So that's 1600 VLIW-5 ALUs vs. 1280 VLIW-4 ALUs, and both are still roughly on par.

Can you see how 1536 VLIW-4 will be faster now?
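Putting rough numbers on that (a back-of-the-envelope Python sketch; the ~98% figure and the ALU counts are the ones from the post above, the rest is simple arithmetic):

```python
# 320 VLIW-5 units = 1600 ALUs; the same 320 units as VLIW-4 = 1280 ALUs.
# If 1280 VLIW-4 ALUs deliver ~98% of the gaming throughput of 1600 VLIW-5 ALUs,
# each VLIW-4 ALU is worth ~1.22 VLIW-5 ALUs in typical games.
per_alu_factor = 0.98 * 1600 / 1280        # ~1.225

# Scale up to the rumoured 1536 VLIW-4 ALUs (384 units) and compare to Cypress:
relative_to_cypress = 1536 * per_alu_factor / 1600
print(f"{relative_to_cypress:.2f}")        # ~1.18 -> roughly 18% more shader throughput
```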
 
So Evergreen loses performance at 4xaa but gains performance at 8xaa.
Yup, that seems to be the case, AMD's 8xAA is still a little more efficient than NVIDIA's.
What the above actually meant was that if you take 320 VLIW-5 shaders, take away the T units while beefing up the remaining 4 lanes for a VLIW-4 config, it is still ~98% as fast in general gaming. So that's 1600 VLIW-5 ALUs vs. 1280 VLIW-4 ALUs, and both are still roughly on par.

Can you see how 1536 VLIW-4 will be faster now?
:D

That is an oversimplification. ALUs don't get faster by being beefed up, they are not buffaloes for god's sake. ALUs have been stuck processing a single instruction every clock for many years (or 2 if you count FMA), so unless you do something about that, they are not getting faster.

However, ALUs can be beefed up to allow for higher clocks, to support new forms of instructions, or to enable wider instruction words (e.g. 24-bit to 32-bit), etc.

The increase in performance that results from going from VLIW-5 to VLIW-4 comes from the higher utilization rate; however, that advantage could very well go to waste and be neutralized by the lack of a T unit and its ability to perform special functions .. unless all four ALUs become capable of that, of course, which is a waste of die area and resources.
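To illustrate that trade-off with made-up numbers (purely hypothetical utilization figures, just to show how a narrower-but-better-packed unit can match a wider one, and how special-function instructions occupying several slots could hand part of the gain back):

```python
def effective_ops_per_clock(width, utilization):
    # Average ALU slots actually filled per VLIW unit per clock.
    return width * utilization

# Hypothetical packing rates for an "ordinary" game shader:
vliw5 = effective_ops_per_clock(5, 0.70)    # 3.5 ops/clock per unit
vliw4 = effective_ops_per_clock(4, 0.875)   # 3.5 ops/clock per unit -> parity

# If a transcendental has to occupy several of the four slots instead of going
# to a dedicated T unit, utilization on such shaders drops and part of the
# packing advantage is neutralized, as argued above.
vliw4_heavy_sf = effective_ops_per_clock(4, 0.70)  # back below the VLIW-5 figure
print(vliw5, vliw4, vliw4_heavy_sf)
```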
 
Both of those are at-the-wall measurements, which I don't really trust.
5 watts or so at xbit
This one is more trustworthy. This indeed shows Barts is quite a bit better (too bad only the HD6870 had meaningful figures).

I've said it before, I'll say it again - ASICs have variable levels of leakage, and you cannot take an absolute power differential when there is just a sample of one (of each).
Well, in total there are 7 HD6850/HD6870 cards there, ranging from 17.6 to 20.6W, so that imho draws a pretty clear picture for those cards: they do vary a bit, but all within 15% (though I don't know why the one xbit measured had so much lower power draw). You are right that there's only one HD5870 and one HD5850 in there as reference, but you can easily find more data points: http://ht4u.net/reviews/2010/his_hd5870_icooler_v_turbo/index12.php. So that would give an average of about 19W for Barts-based cards and 21W for the HD5870 (if you pick a review of an HD5850 you get more data points for those cards as well, though it looks like the data is skewed there because some manufacturers managed to screw up power management, in particular idle voltage).
 