R6XX Performance Problems

Seems to be a useful resource, although much more limited in terms of the cards compared. I'll save the URL; the more data, the better.
 
From Tech Report's 8800 review:


Seems that G71 is beating R580. I too thought it was the other way around, but this seems to say otherwise.
That's probably from the indoor parts of the game where Nvidia cards do fine. The outdoor sections are where they get slaughtered.

TomsHardware VGAcharts is a good resource for comparing videocards over different generations. The latest, 2007 version, obviously has the most recent games (7 games + 3DM06). Whether this accurately describes what people actually play is a different question - for some reason the hardware sites don't want to test WoW and its peers. :) If you go back in time to the earlier charts you get a few more comparison points.

http://www23.tomshardware.com/graphics_2007.html

The 7900GTX is faster than the X1900XTX in five out of seven 2007 games, one of two wins for the X1900XTX incidentally being Oblivion.
Maybe at 1024x768, but at 1600x1200, the X1900 XTX wins in BF2142, DMOMAM, Doom 3, Oblivion and Prey, according to those benchmarks. That's 5 out of 7 games.

And none of those games were released in 2007!

Entropy: THW-chart results are often different from other reviews. Hardware.fr or computerbase.de still tests GF7 and X1 with many recent games:

http://www.computerbase.de/artikel/...hd_3870_rv670/28/#abschnitt_performancerating
Does anyone still benchmark the X1800 XT? I own one and would really like to see how it's still holding up.
 
TomsHardware VGAcharts is a good resource for comparing videocards over different generations. The latest, 2007 version, obviously has the most recent games (7 games + 3DM06). Whether this accurately describes what people actually play is a different question - for some reason the hardware sites don't want to test WoW and its peers. :) If you go back in time to the earlier charts you get a few more comparison points.

http://www23.tomshardware.com/graphics_2007.html

The 7900GTX is faster than the X1900XTX in five out of seven 2007 games, one of two wins for the X1900XTX incidentally being Oblivion.

Huh? I make it 4 clear wins for the X1900XTX, 2 for the 7900GTX and 1 (Dark Messiah) that switches between the two depending on resolution.

But I would argue that the games Toms is using aren't a particularly good modern benchmarking suite anyway. I think the R580 would look a lot better if games like these were used:

Crysis
Lost Planet
BioShock
Colin McRae: DiRT
Call of Juarez
Tomb Raider: Legend
HL2: Ep2
Call of Duty 4
Unreal Tournament 3
etc....

Here's a benchmark comparing the two in Crysis:

http://www.amdzone.com/index.php/co...demo-dx9-gpu-and-cpu-core-scaling-performance
 
I said slower per-clock. R520 was around 20% faster than G70 w/AA at launch (in Direct3D), but was clocked 45% higher (625 MHz vs. G70's 430 MHz). That's why Nvidia were able to quickly take back the performance crown with the 7800 GTX 512, and had ATI not introduced the R580, the X1800 would have had to face off against the 7900 GTX (still only 650 MHz vs. 625 for the X1800).
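To put rough numbers on that (a purely illustrative back-of-the-envelope, using only the launch figures quoted above):

```python
# Back-of-the-envelope per-clock comparison of R520 vs. G70 at launch.
# Figures from the post above: R520 ~20% faster overall (w/AA, in D3D),
# but clocked 625 MHz vs. G70's 430 MHz.
g70_clock_mhz = 430    # 7800 GTX
r520_clock_mhz = 625   # X1800 XT
relative_perf = 1.20   # R520 vs. G70, overall

clock_ratio = r520_clock_mhz / g70_clock_mhz        # ~1.45
per_clock = relative_perf / clock_ratio             # ~0.83

print(f"R520 per-clock vs. G70: {per_clock:.2f}x")  # ~0.83x, i.e. roughly 17% slower per clock
```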

It doesn't matter what the cards were clocked at. It's whether the cards were able to meet their design requirements. G70 had a completely different design philosophy than R520. Wide and slow compared to narrow and fast. Running within the design parameters of each chip, R520 turned out to be the faster design.

Yeah, the 7800 GTX 512 was great for the approximately 1000 people that were actually able to buy one before Nvidia ran out of hand-picked GPUs. And yeah, it was great that people were spending close to, and sometimes over, 1k USD for it.

True, the 7900 GTX was generally better than R520, but then again R580 was definitely better than G70. Why are you comparing it to the past parts? Both G71 and R580 were on the internal roadmaps before G70 and R520 even launched. It isn't like they were a direct response to R520 and G70 respectively.

Luckily for ATI, R580 was on time, in contrast to the delays that R520 experienced.

As for R600, I just think that ATI massively underestimated how important texture loads would continue to be, and how much they would continue to increase relative to ALU loads.

Regards,
SB
 
It doesn't matter what the cards were clocked at. It's whether the cards were able to meet their design requirements. G70 had a completely different design philosophy than R520. Wide and slow compared to narrow and fast. Running within the design parameters of each chip, R520 turned out to be the faster design.
Yes, on a smaller process. When Nvidia migrated to 90 nm, they were able to match ATI's 'narrow and fast' architecture in terms of clock speed, while keeping their huge per-clock advantage.

Now yes R520 launched first, and should have arrived 6 months earlier anyway, but that's irrelevant, because my point was simply that R580 successfully closed a huge performance gap; and that the transistor count disparity between G71 and R580 was because ATI had to improve R520's per-clock performance significantly.

Yeah, the 7800 GTX 512 was great for the approximately 1000 people that were actually able to buy one before Nvidia ran out of hand-picked GPUs. And yeah, it was great that people were spending close to, and sometimes over, 1k USD for it.
Nevertheless, it still overshadowed R520. And again, my point was simply that even with a 125 MHz clock deficit, Nvidia were still able to beat the R520.

True, the 7900 GTX was generally better than R520, but then again R580 was definitely better than G70. Why are you comparing it to the past parts?
To measure the amount of work ATI had to do to catch up with Nvidia, without having a process advantage.

Both G71 and R580 were on the internal roadmaps before G70 and R520 even launched. It isn't like they were a direct response to R520 and G70 respectively.
True, and R580 was meant to launch in 2006 according to Dave, but we're still going to view cards in light of their actual release timeframe, and in light of what the competition has to offer. Was RV670 a response to Nvidia's G92 (in the form of the 8800 GT)? No. But that's how we're going to view it.
 
...

To measure the amount of work ATI had to do to catch up with Nvidia, without having a process advantage.

....
You really think the 580 was a work in progress to catch up to Nvidia? I would say the 580 was already designed and it just came out that way. Chips are designed years in advance.
 
True, and R580 was meant to launch in 2006 according to Dave,

R580 did launch in 2006. The end of January, to be precise. I should know, I bought an XTX the day before launch and a CF master card a couple months later.

but we're still going to view cards in light of their actual release timeframe, and in light of what the competition has to offer. Was RV670 a response to Nvidia's G92 (in the form of the 8800 GT)? No. But that's how we're going to view it.

While it's true that RV670 and G92 are direct competitors, it is not true that either is a reaction to the other from an engineering standpoint. NV made a last-minute decision to increase the SP count (and possibly the clocks) as a reaction to RV670's unexpected "almost full R600" specs, but G92 already had those units, so it's not like NV redesigned it at the last minute. Enabling/disabling units for SKUs that share the same GPU is pretty common, and certainly not an engineering decision.
 
You really think the 580 was a work in progress to catch up to Nvidia? I would say the 580 was already designed and it just came out that way. Chips are designed years in advance.
No, I don't, but the performance difference is still there. Read: the amount of work, in hindsight, that they had to do/that had to be done. Given that R520 had double the transistor count of the previous generation for maybe 20-30% more per-clock performance, I think ATI knew they had to do a lot more to be competitive longer term (and the R520 was originally a spring part anyway).

The transistor disparity between R580 and G71 was therefore because ATI had to use R520 as a base.

Perhaps I'm just not expressing myself very well.

R580 did launch in 2006. The end of January, to be precise. I should know, I bought an XTX the day before launch and a CF master card a couple months later.
Whoops. I meant 2005.

While it's true that RV670 and G92 are direct competitors, it is not true that either is a reaction to the other from an engineering standpoint. NV made a last-minute decision to increase the SP count (and possibly the clocks) as a reaction to RV670's unexpected "almost full R600" specs, but G92 already had those units, so it's not like NV redesigned it at the last minute. Enabling/disabling units for SKUs that share the same GPU is pretty common, and certainly not an engineering decision.
I'm not disagreeing with any of that.
 
I was on hiatus for nearly half a year, starting right before R600 launched, but from the meager amount I have read, R600 does seem TMU-poor and there appear to be some AA issues. Or am I missing anything else that is obvious?

One thing, looking back, is that ATI seems to have misjudged the part, which can be seen in the memory bandwidth. All indications are that ATI expected R600 to require significant amounts of bandwidth--so what gives? Are there cases where it finds all that bandwidth very useful? (Appears to be corner cases at best.) Or is the chip truly "broken" in some ways, resulting in sub-par performance, while the memory system was retained from their original target?
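For reference, a rough peak-bandwidth comparison (a minimal sketch using the commonly quoted board specs of roughly 825 MHz GDDR3 on a 512-bit bus for the HD 2900 XT and 900 MHz on a 384-bit bus for the 8800 GTX, so treat the exact figures as approximate):

```python
# Rough peak memory bandwidth from bus width and memory clock (DDR: 2 transfers per clock).
def peak_bandwidth_gb_s(bus_bits, mem_clock_mhz):
    return bus_bits / 8 * mem_clock_mhz * 2 / 1000

hd2900xt = peak_bandwidth_gb_s(512, 825)  # R600 board:  ~105.6 GB/s
gtx8800  = peak_bandwidth_gb_s(384, 900)  # G80 8800 GTX: ~86.4 GB/s

print(f"HD 2900 XT: {hd2900xt:.1f} GB/s, 8800 GTX: {gtx8800:.1f} GB/s")
```

In other words, R600 shipped with more raw bandwidth than the generally faster 8800 GTX, which is exactly why the "so what gives?" question above is a fair one.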

I agree with previous posts that ATI seems to be designing for "games not yet, but coming". Part of this is devrel--a self-fulfilling prophecy even--and in equal parts the trouble of forecasting game development 36 months out as well as hitting a solid "good performance now, but with legs".

While I can appreciate a number of their moves (e.g. investing in a solid SM3.0 design), it just doesn't pay off. It takes 2-3 years to make software, sometimes longer to significantly alter the tool chain and add new effects. Let's take SM3.0. When NV released their SM3.0 part in summer 2004, NV had to know it would be a solid 2-3 years before anyone really delivered a game that made notable, non-checkbox use of the technology. And by then the NV40-class hardware would be slow anyway--which would only encourage consumers to upgrade.

So NV nailed the SM3.0 checkbox with minimal investment, got the basic technology into developers' hands for testing purposes (along with SLI to project future performance), and now that SM3.0 features are becoming viable we are two generations later (7x00 and now 8x00) and, ta-da, NV has performant SM3.0.

And while NV has been progressive, they don't seem to take their eye off of today's games. They don't sacrifice today's performance for tomorrow's potential--because you are going to sell based on today's games. By the time tomorrow's games are out you will have a refresh or, better yet, a new architecture.

Between these issues and a couple of nasty delays with the R520 (which really took ATI from first-mover to catch-up position), ATI has had a tough time. The RV670 seems to be a step in the right direction, so for competition's sake I hope they have developed some better processes and contingencies, and are re-examining their approach to the market.
 
Entropy: THW-chart results are often different from other reviews. Hardware.fr or computerbase.de still tests GF7 and X1 with many recent games:

http://www.computerbase.de/artikel/...hd_3870_rv670/28/#abschnitt_performancerating
Computerbase.de really gimps G7x by disabling all texturing optimizations (this affects R580 much less), sometimes lopping off more than half the framerate.

I understand that there's some shimmering with G7x, but the (relatively) minor quality improvement is not worth that kind of hit. People rarely play with optimizations disabled. Moreover, it's possible that ATI has spent more time making the driver run the HW optimally in HQ mode whereas NVidia doesn't care.

In other words, it's not representative of how people use their cards. Forcing cards to produce output that is as identical as possible is stupid. One should optimize the perf/IQ tradeoff separately on each card and judge from there.
 
G70's AF was undersampled. Many driver versions caused heavy undersampling. So, AF 16x wasn't really AF 16x.

Let's imagine that ATi used only 2 per-pixel samples for MSAA 4x to compensate for R600's slow MSAA performance. Would it be fair to compare the performance results of this "MSAA 4x" to G80's true MSAA 4x performance?
 
G70's AF was undersampled. Many driver versions caused heavy undersampling. So, AF 16x wasn't really AF 16x.

Pffft. Never mind their AF. They didn't bother to trilinear filter worth a damn half the time. I wish I'd made some screens of what Oblivion, HL2 and KOTOR can look like on 6x00 and 7x00 cards. Run after those mip transitions! If you install Qarl's Texture Pack 3 on Oblivion and go up to snow country, get ready for a show!

NV's performance at their default settings becomes rather moot if you are annoyed by their shortcuts. On every ATI card since the Radeon 9500, filtering has been very nice. Not so with NV 6x00 and 7x00. Sometimes they were OK, but other times things got obviously ugly, especially if you had been on an ATI card previously and knew better.

I must say though that my 8800 has some pretty flawless filtering without any tweaking at all.
 
G70's AF was undersampled. Many driver versions caused heavy undersampling. So, AF 16x wasn't really AF 16x.

Let's imagine that ATi used only 2 per-pixel samples for MSAA 4x to compensate for R600's slow MSAA performance. Would it be fair to compare the performance results of this "MSAA 4x" to G80's true MSAA 4x performance?

It depends on exactly what undersampling a driver does while filtering. Most shimmering artifacts originate from undersampling that starts at the bi- or trilinear level, and that basic filtering is itself a form of texture antialiasing.

Take a G7x and let it run with the default optimisations enabled; find a spot with shimmering artifacts and then look at the scene again, this time with 4x supersampling enabled. In the majority of cases supersampling will only slightly reduce the shimmering or any other side-effects caused by underfiltering. Disable one or all optimisations and the shimmering goes away, with or without supersampling.

The most likely scenario in my mind here is that the driver is skipping on basic filtering with AF enabled, rather than reducing AF sample amounts.

Disable the LOD clamp on a G80 and leave the trilinear optimisation (often called "brilinear") enabled, and you'll see that whenever a texture in a scene has a negative LOD value, shimmering is still visible (albeit not as much as on G7x). Here the major difference is that it makes little sense not to play with high quality on a G80.
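For anyone unfamiliar with the "LOD clamp" mentioned above: conceptually it just stops an application's negative LOD bias from pushing texture sampling into undersampling territory. A minimal sketch of the idea (the real driver/hardware logic is of course more involved):

```python
# Conceptual "negative LOD bias clamp". A negative bias selects a more detailed
# mip level than the pixel footprint warrants, which undersamples the texture
# and shows up as shimmering in motion.
def effective_lod(computed_lod, app_lod_bias, clamp_negative_bias=True):
    bias = max(app_lod_bias, 0.0) if clamp_negative_bias else app_lod_bias
    return computed_lod + bias

print(effective_lod(3.2, -1.0, clamp_negative_bias=False))  # 2.2 -> sharper, but prone to shimmer
print(effective_lod(3.2, -1.0, clamp_negative_bias=True))   # 3.2 -> footprint-correct sampling
```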

Your last example doesn't make much sense either; AF algorithms are adaptive, i.e. they apply as many samples as needed according to texture "steepness". The average number of samples applied across a frame should, even nowadays, still be well under 4x. If you hypothetically reduced the sample count on those very few surfaces (if there even are any) that truly need >8x samples, the gain wouldn't be worth the trouble.
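To illustrate what "adaptive" means here, a toy sketch of how an AF unit might pick its sample count from the pixel footprint (real implementations differ in the details; the function and numbers are made up for illustration):

```python
import math

# Toy model: the degree of anisotropy is the ratio of the pixel footprint's
# major and minor axes in texture space; the sample count follows that ratio,
# capped at the user-selected maximum (e.g. 16x).
def af_samples(major_axis, minor_axis, max_aniso=16):
    ratio = major_axis / max(minor_axis, 1e-6)
    return min(max_aniso, max(1, math.ceil(ratio)))

print(af_samples(2.0, 1.9))   # nearly head-on surface: 2 samples
print(af_samples(12.0, 1.0))  # steep surface: 12 samples
print(af_samples(40.0, 1.0))  # extreme angle: capped at 16
```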

That is unlike, of course, the gain between bilinear and trilinear. You could say that trilinear is almost free on G8x, yet not on architectures where it's one clock for bilinear and two for trilinear. If you take a scene with extremely high texture demands and compare a G7x and a G8x, once with quality and once with high quality, the former would lose a roughly 3x higher percentage.
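A simplified illustration of why the quality-to-high-quality hit differs so much between the two architectures (toy numbers for a purely texture-fetch-bound case, not measurements):

```python
# Toy model of a texture-fetch-bound scene. "Quality" here approximates mostly
# bilinear/brilinear sampling, "high quality" full trilinear everywhere; the
# per-texel costs are illustrative only.
def relative_fps(cost_per_texel):
    return 1.0 / cost_per_texel

g7x_quality, g7x_high_quality = relative_fps(1.1), relative_fps(2.0)  # trilinear = 2 clocks
g8x_quality, g8x_high_quality = relative_fps(1.0), relative_fps(1.1)  # trilinear nearly free

print(f"G7x-class hit: {1 - g7x_high_quality / g7x_quality:.0%}")  # ~45%
print(f"G8x-class hit: {1 - g8x_high_quality / g8x_quality:.0%}")  # ~9%
```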

In Cliff's Notes form: of course it was (up to) 16xAF with the default settings on a G7x; the difference was that an application didn't receive trilinear wherever it called for it, and I'm not so sure the same wasn't also the case for bilinear with the more aggressive optimisations.
 
Computerbase.de really gimps G7x by disabling all texturing optimizations (this affects R580 much less), sometimes lopping off more than half the framerate.

I understand that there's some shimmering with G7x, but the (relatively) minor quality improvement is not worth that kind of hit. People rarely play with optimizations disabled. Moreover, it's possible that ATI has spent more time making the driver run the HW optimally in HQ mode whereas NVidia doesn't care.

In other words, it's not representative of how people use their cards. Forcing cards to produce output that is as identical as possible is stupid. One should optimize the perf/IQ tradeoff separately on each card and judge from there.

As for the 2nd paragraph, I don't think it also holds for G8x HW. It's rather the opposite: the small overall performance hit between quality and high quality makes it rather nonsensical to tolerate any side-effects.

I constantly ran my old G7x overclocked in order to compensate for the high quality/no optimisations performance drop, and while I'm probably among the minority of users who are so finicky about stuff like that, it annoys me like hell.

Finally, since I had my humble little share in the past of protesting against such side-effects, I fear that G7x got way too much criticism for its driver default settings. Radeons also have (and had) AF-related optimisations enabled by default, and there's no chance in hell that someone can convince me that no side-effects appear due to those either.

In the end when comparing G7x against R5x0 it is unfair to compare one side with optimisations disabled and the other with them enabled, based on the degree of visibility of side-effects. It's either all on or all off in my book. If there are IQ differences, a reviewer can point them out. And before anyone protests that switching off AI entirely also disables other, non-AF-related optimisations: pardon me, but that's AMD's problem and could have been solved a long time ago if they had wanted to. It's a very neat way to keep your userbase and the reviewing community from shutting off optimisations.

If a reviewer today compared R6x0 against G8x with AF optimisations disabled, I'm afraid it would put the former in a far worse light when it comes to pure AF performance. IMHLO there should be no optimisations at all enabled by default, and I've held that standpoint for years now. If the user wants to have dancing meanders over his screen and goes all the way down to point sampling, I couldn't care less; but when an application calls for trilinear, it should receive trilinear. There are no but-buts for that one.
 
I constantly ran my old G7x overclocked in order to compensate for the high quality/no optimisations performance drop, and while I'm probably among the minority of users who are so finicky about stuff like that, it annoys me like hell.
According to the computerbase.de numbers, overclocking would barely make a dent in the difference. They disable all optimizations, and framerate is often half that of other sites. Is that really worth it to you?

Finally, since I had my humble little share in the past of protesting against such side-effects, I fear that G7x got way too much criticism for its driver default settings. Radeons also have (and had) AF-related optimisations enabled by default, and there's no chance in hell that someone can convince me that no side-effects appear due to those either.
If the image quality difference was really that big, you'd see ATI run its parts with HQ filtering by default since R580 doesn't take as much of a perf hit as G71. Then reviewers would notice right away. However, they went the same route as NVidia since they obviously felt neither reviewers nor users in general would appreciate the IQ difference.

I think most reviewers are idiots when it comes to IQ for one reason: Nearly all test at different resolutions without AF as a baseline. Any time AF gives you a big perf hit, it means you're missing out on a lot of detail, so running at a high resolution is useless. You could pick random people off the streets and in a side-by-side comparison they would choose one step lower resolution with AF enabled. I even tolerated bilinear AF on R100/R200 because the difference in detail is so staggering.

Anyway, regardless of this incompetence, if you want to evaluate architectural decisions, you have to look at how reviewers test the cards. NVidia saw their good filtering and AF IQ go almost unnoticed with GF3 (though even considering IQ the AF hit was insane), so they went the other way entirely with NV3x and had really crappy default settings and were burned for it in reviews. With NV4x/G7x they found the sweet spot.

I'm not sure why they upped the default texture IQ for G80, but maybe it was to emphasize the IQ improvement (angle independent AF and 16xCSAA being the other big features). The decoupled texture units would also drastically reduce the hit.

In the end when comparing G7x against R5x0 it is unfair to compare one side with optimisations disabled and the other with them enabled, based on the degree of visibility of side-effects. It's either all on or all off in my book.
Well, computerbase.de is fair in that sense. I'm just saying that 50% perf hits are not tolerated lightly by video card buyers. They're being disingenuous about their testing methodology by not showing how fast the cards are with optimizations enabled.

From your experience with G7x, how would you subjectively rate the IQ improvement of disabling all filtering optimizations? Is it:
A) As important as going from noAA to 4xAA?
B) As important as going from noAF to 16xAF?

I can't imagine that you'd say yes to either of those, and they usually entail a lower perf hit. You look at a site like HardOCP that tries to choose the best resolution, AF, and AA settings for a given framerate, and they skimp on the latter two to get rather small improvements in framerate. You can bet your ass that if they included filtering optimizations as part of the parameter space, they'd choose to keep them enabled all the time.
If a reviewer today compared R6x0 against G8x with AF optimisations disabled, I'm afraid it would put the former in a far worse light when it comes to pure AF performance. IMHLO there should be no optimisations at all enabled by default, and I've held that standpoint for years now. If the user wants to have dancing meanders over his screen and goes all the way down to point sampling, I couldn't care less; but when an application calls for trilinear, it should receive trilinear. There are no but-buts for that one.
Point sampling is glaringly obvious, and accordingly no hardware runs faster with filtering disabled for this reason. Other things are not so obvious. You'll have "dancing meanders" many places anyway due to shader aliasing and inadequate AA, so a few more due to optimized filtering isn't a deal-breaker to me. I think "brilinear" is a great optimization as long as it isn't taken too far.
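For readers who haven't seen "brilinear" spelled out: it keeps pure bilinear over most of each mip level and only blends between mips in a narrow band around the transition, instead of blending across the whole level. A rough sketch of the idea (the band width is the tunable part, and shrinking it too far is exactly what produces visible mip transitions):

```python
# Conceptual "brilinear" filtering: shrink the trilinear blend band around each
# mip transition. band = 1.0 is full trilinear; band -> 0 degenerates to bilinear.
def mip_blend_weight(lod_fraction, band=0.3):
    half = band / 2.0
    if lod_fraction < 0.5 - half:
        return 0.0                                   # pure bilinear from the finer mip
    if lod_fraction > 0.5 + half:
        return 1.0                                   # pure bilinear from the coarser mip
    return (lod_fraction - (0.5 - half)) / band      # blend only inside the narrow band

print(mip_blend_weight(0.20))  # 0.0 -> no blend, cheaper
print(mip_blend_weight(0.50))  # 0.5 -> halfway through the transition
print(mip_blend_weight(0.80))  # 1.0
```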
 
Joshua Luna said:
One thing, looking back, is that ATI seems to have misjudged the part, which can be seen in the memory bandwidth. All indications are that ATI expected R600 to require significant amounts of bandwidth--so what gives? Are there cases where it finds all that bandwidth very useful? (Appears to be corner cases at best.) Or is the chip truly "broken" in some ways, resulting in sub-par performance, while the memory system was retained from their original target?

Maybe the bandwidth had some use in GPGPU applications? Regardless, it doesn't really make much sense to me either. Anyone care to speculate?
 
According to the computerbase.de numbers, overclocking would barely make a dent in the difference. They disable all optimizations, and framerate is often half that of other sites. Is that really worth it to you?

It wasn't nearly half the framerate for the games I played back then; I had even written a minor write-up about it and the difference was slightly above the oc gain.

If the image quality difference was really that big, you'd see ATI run its parts with HQ filtering by default since R580 doesn't take as much of a perf hit as G71. Then reviewers would notice right away. However, they went the same route as NVidia since they obviously felt neither reviewers nor users in general would appreciate the IQ difference.

Albeit irrelevant, I recall that ATI started that optimisation circus many years ago, and no, I'm in no way against it per se, as long as one can get rid of the optimisations. There are way too many applications out there where some developer had the funky idea of using an idiotic negative LOD for texture X. You get shimmering there even without optimisations; enable optimisations and all hell can break loose. Battlefield games were nice examples, and it wasn't just a tiny spot somewhere; it was a shimmering fest across the screen.

When G70 launched there must have been some switch broken in the driver: with the first drivers, enabling high quality didn't disable any optimisation. NV was quick enough to deliver the 78.03 beta driver that fixed the problem. The reaction from the userbase back then about the shimmering was quite sizeable, and a heated debate about it could be found on almost any public forum.

I'm not talking about angle-invariance here, since that's the major gain of enabling HQ on R5x0. Even enabling high quality there only turns on the less angle-dependent filtering yet leaves the optimisations enabled. In order to disable the latter one needs to switch AI off, and that unfortunately disables (or used to disable) other, non-AF-related optimisations as well.


Anyway, regardless of this incompetence, if you want to evaluate architectural decisions, you have to look at how reviewers test the cards. NVidia saw their good filtering and AF IQ go almost unnoticed with GF3 (though even considering IQ the AF hit was insane), so they went the other way entirely with NV3x and had really crappy default settings and were burned for it in reviews. With NV4x/G7x they found the sweet spot.

I don't think that much has changed in terms of AF-related optimisations from NV3x to G7x. IMHLO they merely saved on transistors on NV4x/G7x with the far greater angle dependency, something that bounced back to NV3x levels with G80 again.

I'm not sure why they upped the default texture IQ for G80, but maybe it was to emphasize the IQ improvement (angle independent AF and 16xCSAA being the other big features). The decoupled texture units would also drastically reduce the hit.

NV4x/G7x had more than just the trilinear optimisation enabled, whereas G8x has only that one enabled by default (at least in terms of what's transparent to the user). It's many times harder to detect in a real game a weird angle where less angle dependency makes a huge difference than it is to notice side-effects caused by underfiltering. Angle dependency wasn't my problem on G7x; the too-aggressive AF optimisations in the driver were.


Well, computerbase.de is fair in that sense. I'm just saying that 50% perf hits are not tolerated lightly by video card buyers. They're being disingenuous about their testing methodology by not showing how fast the cards are with optimizations enabled.

Aspects like that should be shown in such cases, but I'd prefer all cards to be tested with optimisations both on and off, with the right footnotes underneath about what the reader should keep in mind.

From your experience with G7x, how would you subjectively rate the IQ improvement of disabling all filtering optimizations? Is it:
A) As important as going from noAA to 4xAA?
B) As important as going from noAF to 16xAF?

That's a weird dilemma; aggressive optimisations lead to underfiltering, which as noted above can skimp severely on texture antialiasing. It doesn't do me any good to apply a crapload of MSAA samples in a scene if my textures (which still make up the majority of any given scene) shimmer around happily.

I'd rather have 4x good AF samples than 16x crappy AF samples; just as folks around the GF3 timeframe made extensive use of 2xRGMS instead of 4xOGMS.

In a case where severe underfiltering is present, saying that I'm "using AA" would be a nice oxymoron, since that AA is limited to polygon edges/intersections. Given how much of a scene polygon interiors make up, proper texture AA is far more important to me than any MSAA.

If you had captured a screenshot from the first Battlefield game, the noise was so apparent that even still screenshots looked horrendous.

I can't imagine that you'd say yes to either of those, and they usually entail a lower perf hit. You look at a site like HardOCP that tries to choose the best resolution, AF, and AA settings for a given framerate, and they skimp on the latter two to get rather small improvements in framerate. You can bet your ass that if they included filtering optimizations as part of the parameter space, they'd choose to keep them enabled all the time.

Imagine you could pick, in driver panel X, between a lossless MSAA algorithm and a quite lossy one; the latter ends up quite a bit faster. Now which one would you choose, given that the latter would fail to antialias a healthy percentage of poly edges, for instance?

Now, would the same apply to reviewers/users? Otherwise, why wasn't stuff like Quincunx or ATI's new custom filters generally received with enthusiasm?

Point sampling is glaringly obvious, and accordingly no hardware runs faster with filtering disabled for this reason. Other things are not so obvious. You'll have "dancing meanders" many places anyway due to shader aliasing and inadequate AA, so a few more due to optimized filtering isn't a deal-breaker to me. I think "brilinear" is a great optimization as long as it isn't taken too far.

Well, you could have skipped the obvious exaggeration. Shader aliasing and/or crappy content (such as negative LOD on textures) could be catered for by developers themselves; the first needs way more resources, and I could find a good excuse for its absence. For the latter, though, I cannot find a single viable excuse.
 