R6XX Performance Problems

I know that this info can be obtained by reading through the various threads on this board, but I think it'd be beneficial if we had a thread that highlighted some of the architectural inefficiencies found in R6XX, just to get it all in one place.

As it stands I still don't fully understand why this card is doing poorly compared to the competition, so this thread also serves to cement my understanding of the issue.

Anyway, from my understanding the main issue with the architecture is an inadequate texture sampling/filtering rate, which leads to degraded performance in many titles when compared with the competition's offerings (G80 & G92).

The AA resolve issues have also been brought up in various threads on this board (as well as in B3D's architectural overview). Although my understanding of this issue is limited (if someone can clarify, that would be wonderful), it seems the inefficient/broken implementation of hardware resolve is the main factor limiting AA performance on R6XX.

Others have also pointed out that the VLIW architecture implemented in the shader core could lead to inefficiencies if the compiler and thread scheduler fail to extract enough instruction-level parallelism to keep the ALUs properly fed.
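
To make the ILP point concrete, here's a toy model of that packing problem - plain C I threw together, nothing like the real compiler or ISA, with only the 5-wide bundle width borrowed from R6xx. With independent work every slot gets filled; with a dependent chain only one of the five ALUs does anything per issue.

Code:
#include <stdio.h>

#define WIDTH 5      /* slots per bundle, loosely modelled on R6xx's 5-wide units */
#define MAX_OPS 64

/* dep[i] is the index of the op that op i depends on, or -1 if independent.
 * Returns how many WIDTH-wide bundles a greedy scheduler needs. */
static int count_bundles(const int dep[], int n)
{
    int bundle_of[MAX_OPS];
    int slots_used[MAX_OPS] = {0};
    int bundles = 0;
    for (int i = 0; i < n; i++) {
        /* earliest legal bundle: the one after the bundle holding the dependency */
        int b = (dep[i] >= 0) ? bundle_of[dep[i]] + 1 : 0;
        while (slots_used[b] == WIDTH)
            b++;                          /* find a bundle with a free slot */
        bundle_of[i] = b;
        slots_used[b]++;
        if (b + 1 > bundles)
            bundles = b + 1;
    }
    return bundles;
}

int main(void)
{
    int indep[10] = {-1,-1,-1,-1,-1,-1,-1,-1,-1,-1};   /* 10 independent ops    */
    int chain[10] = {-1, 0, 1, 2, 3, 4, 5, 6, 7, 8};   /* 10-op dependent chain */
    printf("independent ops: %d bundles (all 5 ALUs busy)\n", count_bundles(indep, 10)); /* 2  */
    printf("dependent chain: %d bundles (1 of 5 ALUs busy)\n", count_bundles(chain, 10)); /* 10 */
    return 0;
}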

So I'd like to ask for some clarification on these issues as my understanding is still full of gaps. Are the issues outlined here the main factors contributing to the less than stellar performance seen on R600?
 
R600 was EOL'd and achieved practically zero market penetration. I fail to see the fascination with it. RV670 appears to be (mostly) more of the same, but is likely to achieve much higher market penetration thanks to its competitive pricing. Still won't hold a candle to GF8800 sales when all's said and done though, so why even bother?

Sorry, I'm just so down on AMD right now I just can't find much of anything positive to say about them or their products.
 
R600 was EOL'd and achieved practically zero market penetration. I fail to see the fascination with it. RV670 appears to be (mostly) more of the same, but is likely to achieve much higher market penetration thanks to its competitive pricing. Still won't hold a candle to GF8800 sales when all's said and done though, so why even bother?

Sorry, I'm just so down on AMD right now I just can't find much of anything positive to say about them or their products.

I think the HD3850 has a lot going for it at the moment. But that's pretty much the only AMD product that looks attractive right now in light of the competition, IMO.
 
R600 was EOL'd and achieved practically zero market penetration. I fail to see the fascination with it. RV670 appears to be (mostly) more of the same, but is likely to achieve much higher market penetration thanks to its competitive pricing. Still won't hold a candle to GF8800 sales when all's said and done though, so why even bother?

Sorry, I'm just so down on AMD right now I just can't find much of anything positive to say about them or their products.

Haha yeah I understand that, but it just seems like R600 should have done better than it did given its exceptional paper specs. Remember when we were all amazed at the 512-bit memory bus providing >100GB/s of bandwidth, the 320 stream processors and whatnot? I'm just trying to figure out what happened is all. And this thread is directly applicable to RV670, seeing as it's little more than a die shrink.
 
VLIW sucks compared to SIMD. Too low of an AA sample rate, too low texture filtering rate, and overall clocks just too low. Those are the failures of the R6xx architecture.
 
AA sample rate isn't too low. It's the same as on R580 and adequate. The problem for older games and a lot of today's games is the performance drop (percentage-wise much higher compared to R580) caused by shader-based resolve. G92 outputs ~twice as many multisamples per clock compared to RV670, but look at the MSAA 8x results - RV670 is comparable or faster in many cases despite a theoretically half MSAA rate.

Filtering rate is only an issue for Int8; FP16 filtering is quite fast.
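
To be clear about what the resolve step actually is: a plain box-filter MSAA resolve is just averaging the N sub-samples of each pixel. A rough sketch in C of the per-pixel arithmetic (not actual driver or shader code, and it ignores compression, gamma and R6xx's custom tent filters):

Code:
typedef struct { float r, g, b, a; } rgba;

/* Box-filter resolve for one pixel: average its nsamples multisamples
 * into a single output colour. */
static rgba resolve_pixel(const rgba *samples, int nsamples)
{
    rgba out = {0.0f, 0.0f, 0.0f, 0.0f};
    for (int s = 0; s < nsamples; s++) {
        out.r += samples[s].r;
        out.g += samples[s].g;
        out.b += samples[s].b;
        out.a += samples[s].a;
    }
    out.r /= nsamples;
    out.g /= nsamples;
    out.b /= nsamples;
    out.a /= nsamples;
    return out;
}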
 
AA sample rate isn't too low. It's the same as on R580 and adequate. The problem for older games and a lot of today's games is the performance drop (percentage-wise much higher compared to R580) caused by shader-based resolve. G92 outputs ~twice as many multisamples per clock compared to RV670, but look at the MSAA 8x results - RV670 is comparable or faster in many cases despite a theoretically half MSAA rate.

Filtering rate is only an issue for Int8; FP16 filtering is quite fast.

Too low by comparison. NV's design choices w/G80 are clearly superior in just about every regard (except for higher precision filtering, the lone win for R6xx).

The whole "performance drop because of shader AA resolve" has already been debunked in other threads on this forum.
 
Yeah I read in another thread that shader-based resolve isn't the problem. The basic conclusion was that the software resolve wouldn't take a significant amount of time to compute, therefore it shouldn't be affecting AA performance. But the benchmarks would say otherwise... ah god this architecture is such a mystery (to me at least lol).
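
For what it's worth, the back-of-envelope numbers behind that conclusion look something like this - the resolution, sample count, flops-per-sample and the ~475 Gflops peak for R600 are all my own rough assumptions, so treat it as an order-of-magnitude estimate only:

Code:
#include <stdio.h>

int main(void)
{
    /* Assumed, not measured: 1600x1200 at 4xAA, ~5 flops per sample for the
     * accumulate + divide, ~475 Gflops peak shader rate for R600. */
    double pixels  = 1600.0 * 1200.0;
    double samples = 4.0;
    double flops   = pixels * samples * 5.0;   /* ~38 Mflops per resolve pass */
    double peak    = 475e9;                    /* flops per second            */
    printf("resolve: ~%.0f Mflops, ~%.3f ms at peak\n", flops / 1e6, flops / peak * 1e3);
    return 0;
}

That works out to roughly 38 Mflops and well under a tenth of a millisecond at peak, i.e. the raw ALU work is tiny - so if shader resolve really is hurting, the cost is more likely in scheduling/setup overhead or memory traffic than in the math itself.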
 
If AA sampling rate was an issue then adding more samples should make it worse, yet 8x AA seems not to be as much of an issue as it is on G80/G90 -

http://forum.beyond3d.com/showthread.php?t=45482

Or, if you actually read that thread, you'd notice that there is no performance drop for 3870 going from 4x AA to 8x AA. Now, unless AMD has added magic AA fairies to RV670, I don't think the performance results we're seeing there are valid (at least not in the context they've been presented).
 
I'm not so sure that the R6XX performance problems come purely from hardware; they may come from software, because most of today's 3D games are developed around Nvidia GPUs.
 
I'm not so sure that the R6XX performance problems come purely from hardware; they may come from software, because most of today's 3D games are developed around Nvidia GPUs.

There is some merit to this, but the counter-argument is that this would be AMD's fault for being so late to market (compared to G80) and for failing to put as much effort into developer relations as Nvidia does.
 
Or, if you actually read that thread, you'd notice that there is no performance drop for 3870 going from 4x AA to 8x AA. Now, unless AMD has added magic AA fairies to RV670, I don't think the performance results we're seeing there are valid (at least not in the context they've been presented).

I don't see where you're seeing this; in every game tested on computerbase.de there's a drop from 4x to 8x. One of the posters in the thread got confused by the flashMX graph that normalizes RV670 to 100% on mouse-over. Check the link out again.
 
Freak'n Big Panda said:
Yeah I read in another thread that shader-based resolve isn't the problem. The basic conclusion was that the software resolve wouldn't take a significant amount of time to compute, therefore it shouldn't be affecting AA performance.
I've started to think it isn't the amount of work to be done, but a problem of scheduling the work to be done efficiently. Would explain the hit with 2xAA in relation to the rest...
 
I don't see where you're seeing this; in every game tested on computerbase.de there's a drop from 4x to 8x. One of the posters in the thread got confused by the flashMX graph that normalizes RV670 to 100% on mouse-over. Check the link out again.

Ah, I see. Silly percent normalization... Give me FPS damnit!

Anyway, who expects to use 8x AA on a single 3870/8800 GT anyway? I know I don't and I'm planning on building a rig including 2x of one or the other.
 
Anyway, who expects to use 8x AA on a single 3870/8800 GT anyway?
Yes, you can CrossFire them and boost overall performance. Anyway, there are situations where one HD3870 gives similar results to a GTX/Ultra with AA 8x enabled. If you consider GTX/Ultra's AA 8x performance usable (in these games), then the HD3870 is usable, too.
 
VLIW sucks compared to SIMD. Too low of an AA sample rate, too low texture filtering rate, and overall clocks just too low. Those are the failures of the R6xx architecture.

The R600 is MIMD with four SIMD channels (every cluster can do its own independent instructions).
R600's Z-fill rate is too low.
 
I'll take a stab at it.

Insufficient texturing seems to be the consensus for one of the problems.

In the long run, it'll never be on par with the best of the G80+ series until they increase the speed of the shader arrays. That shouldn't be a big effort, because it's just one unit on the chip repeated 64 times. (And I'm not talking about adding more pipeline stages to make it happen; that's cheating and counter-productive. Take the Fast14 tech that was such a big deal a few years ago and apply it already, so that it's the same except faster.)
 
VLIW sucks compared to SIMD. Too low of an AA sample rate, too low texture filtering rate, and overall clocks just too low. Those are the failures of the R6xx architecture.
Calling R600 VLIW and G80 SIMD is not very accurate, though I know what you're getting at. This is unlikely to be the cause of R6xx's speed deficit compared to G80, as ATI was able to pack in a lot more ALUs than G80. Go to digit-life and you can find some ALU-limited shader tests which show R600 with the expected advantage over the GTS (i.e. no loss of speed due to "VLIW").

Texture filtering speed is the real culprit, as well as a design that is notably less powerful than R580 per transistor, whereas G80 probably matched G70 in this regard while stepping up functionality even more than R580->R600.
 
In contrast to the quick AA/AF analysis some reviews have done based on synthetic benchmarks, it looks to me like R600/RV670 is definitely bottlenecked by insufficient filtering capacity.
Unfortunately most reviews enable AA and AF together (which makes perfect sense, but doesn't tell you whether the performance hit came mostly from AA or from AF). There are some AF-only numbers, though (I've taken them from the computerbase.de review, which has AF-only numbers for titles where AA doesn't work).
Going from 1xAF to 16xAF at 1600x1200, the losses are:
- 8800GTS (640): Gothic 4%, Stalker 17%, Bioshock 1%
- 8800GT: Gothic 4%, Stalker 14%, Bioshock 1%
- R600: Gothic 28%, Stalker 34%, Bioshock 5%
- RV670 (reference 3870): Gothic 32%, Stalker 33%, Bioshock 5%

That imho paints a pretty clear picture: R600/RV670 is definitely suffering from a lack of texture filtering capacity at times, and I'd guess that if it had, say, 24 TMUs instead of 16, that would greatly help it be more competitive in those situations.
So, I don't think there's much of a "mystery" behind these "performance problems", though I'd definitely like to see more benchmarks investigating this.
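
In case anyone wants to reproduce those loss figures from a review's raw FPS numbers, it's just this (the FPS values in the example are made up for illustration, not taken from computerbase.de):

Code:
#include <stdio.h>

/* Percentage of performance lost going from no AF to 16xAF. */
static double af_loss_percent(double fps_1xaf, double fps_16xaf)
{
    return 100.0 * (1.0 - fps_16xaf / fps_1xaf);
}

int main(void)
{
    /* hypothetical example: a card dropping from 60 fps to 41 fps with 16xAF */
    printf("%.0f%% loss\n", af_loss_percent(60.0, 41.0));   /* ~32% */
    return 0;
}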
 