G80 vs R600 Part X: The Blunt & The Rich Feature

Jawed

MOD: This is an offspring of the R7xx thread, as it was starting to run significantly off-topic after a few pages... Please keep discussing this here.

---

ATI's "problem" is the desire to stuff way more technology into each GPU - some argue there's too much technology in there for what it needs to do today (apply that to R520, R580, R600).

NVidia is the one playing catch-up technologically, and has got a long way to go architecturally if the imbalances of G80 are any indication. The serial scalar ALU is the only point of merit in G80. G80 won because of its big blunt tool approach, not because it's clever.

Jawed
 
ATI's "problem" is the desire to stuff way more technology into each GPU - some argue there's too much technology in there for what it needs to do today (apply that to R520, R580, R600).

NVidia is the one playing catch-up technologically, and has got a long way to go architecturally if the imbalances of G80 are any indication. The serial scalar ALU is the only point of merit in G80. G80 won because of its big blunt tool approach, not because it's clever.

Jawed

Your position I can understand... but this is, umm, rather over the top? Now the G80 is a stupid POS and the R600 is the most technologically advanced thing in existence? How so?
 
If you call piling on TMUs and ROPs clever, technologically or architecturally, then I guess you have an argument.

Jawed
 
NVidia is the one playing catch-up technologically, and has got a long way to go architecturally if the imbalances of G80 are any indication. The serial scalar ALU is the only point of merit in G80. G80 won because of its big blunt tool approach, not because it's clever.
That may be right, but in the end the net result is the final measure of a successful architecture, isn't it?
But for the sake of your point, let's take and compare, for example, the huge Z/stencil rate in G80 versus R600's HiZ and newly added hierarchical stencil test buffers. Sure, NV threw a considerable amount of fill-rate logic (read: transistors) into its ROPs, but the same holds for ATi, which spent no less on the more "clever" approach, both for the same goal.
Well, I can say the stencil shadowing in Q4 is quite fast on R600 now, but that didn't come out of thin air, obviously. :D
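As a rough illustration of the "clever" approach: HiZ (and now hierarchical stencil) keeps a coarse per-tile summary so whole blocks of pixels can be rejected with a single test, rather than buying rejection rate with raw per-pixel throughput the way G80's ROPs do. A minimal sketch of the idea - tile size, data layout and names are purely illustrative, not ATi's actual implementation:

```python
TILE = 8  # 8x8 pixel tiles - an arbitrary choice for this sketch

def try_reject_tile(tile_max_depth, incoming_min_depth):
    """One coarse comparison instead of TILE*TILE per-pixel Z tests."""
    # Convention: smaller z = closer. If the nearest incoming fragment is still
    # behind the farthest depth already stored in the tile, nothing can pass.
    return incoming_min_depth >= tile_max_depth

def z_test_block(hiz, zbuf, tile_xy, incoming):
    """hiz: per-tile max depth, zbuf: full-res depth, incoming: TILE x TILE depths."""
    tx, ty = tile_xy
    if try_reject_tile(hiz[ty][tx], min(min(row) for row in incoming)):
        return 0  # whole tile culled, no per-pixel Z traffic at all
    passed = 0
    for y in range(TILE):
        for x in range(TILE):
            px, py = tx * TILE + x, ty * TILE + y
            if incoming[y][x] < zbuf[py][px]:
                zbuf[py][px] = incoming[y][x]
                passed += 1
    # keep the coarse value conservative: the max depth now stored in the tile
    hiz[ty][tx] = max(zbuf[ty * TILE + y][tx * TILE + x]
                      for y in range(TILE) for x in range(TILE))
    return passed
```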
 
G80 won because of its big blunt tool approach, not because it's clever.
Well, just like creating a highly sophisticated system can be clever, it can also be just as clever to create a simpler one that is actually as efficient, or even more efficient. While I do understand what you meant by this, I think there is some potential to read too much into it because of your choice of words... Of course, feel free to disagree.

If you call piling on TMUs and ROPs clever, technologically or architecturally, then I guess you have an argument.
I know you're critical of G80's ALU-TEX ratio (and I am fairly critical of G84's myself, as it is even more extreme), but I think it's probably fair to point out that many of the games used to review G80 when it came out wouldn't have seen astonishing performance gains from twice the number of ALUs. So part of the equation is 'What games do we optimize for?'

The other part of the equation that I like to insist upon is that for any given game (or frame), the ideal ratio in terms of perf/mm2 depends on what your units cost. Imagine a game that had two phases, one limited entirely by the ROPs and the other limited entirely by the ALUs. And then imagine that two IHVs have the same performance per ROP and per ALU, but for IHV1 the ROP costs 10mm2 and the ALU costs 10mm2, while for IHV2 the ROP costs 7mm2 and the ALU costs 12mm2.

Based on this, it should be easy to see that the 'ideal ratio' for these two IHVs is different, even for the same scene. Put another way, if IHV1 had better perf/mm2, that doesn't mean IHV2 would have had better perf/mm2 by picking IHV1's ratio without changing the die size. In fact, it might just make things worse!
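A tiny brute-force version of that thought experiment makes the arithmetic concrete. The 200mm2 die budget and the equal split of ROP-bound and ALU-bound work are invented numbers purely for illustration:

```python
# Hypothetical numbers from the post above: same per-unit performance for both
# IHVs, but IHV1 pays 10mm2 per ROP and 10mm2 per ALU, IHV2 pays 7mm2 and 12mm2.
# The 200mm2 die budget and equal workload split are made up for illustration.

def frame_time(n_rop, n_alu, rop_work=100.0, alu_work=100.0):
    # one phase limited entirely by the ROPs, the other entirely by the ALUs
    return rop_work / n_rop + alu_work / n_alu

def best_config(rop_mm2, alu_mm2, die_mm2=200.0):
    best = None
    for n_rop in range(1, 100):
        alu_budget = die_mm2 - n_rop * rop_mm2
        if alu_budget < alu_mm2:
            break
        n_alu = int(alu_budget // alu_mm2)
        t = frame_time(n_rop, n_alu)
        if best is None or t < best[0]:
            best = (t, n_rop, n_alu)
    return best

for name, rop_mm2, alu_mm2 in [("IHV1", 10.0, 10.0), ("IHV2", 7.0, 12.0)]:
    t, n_rop, n_alu = best_config(rop_mm2, alu_mm2)
    print(f"{name}: {n_rop} ROPs, {n_alu} ALUs (ratio {n_rop / n_alu:.2f}), "
          f"frame time {t:.2f}")
```

Run it and the two 'ideal' ROP:ALU ratios come out different (1.00 versus roughly 1.4) even though the scene and the per-unit performance are identical.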

That isn't to say that NVIDIA's ratios are optimal. I would be very surprised if they were optimal for the games that were reviewed in G84 vs RV630 comparisons, for example. That doesn't mean they are as bad as some people think they are, however, because we just don't have the necessary information to conclusively say so.

As such, saying that the ratios are completely ridiculous is akin to saying that NVIDIA's architecture & competitive analysis groups are staffed by monkeys with typewriters. And if that was truly the case, then given how much they're most likely paid for that anyway, I'd have to seriously consider buying a monkey costume too! :)
 
ATI's "problem" is the desire to stuff way more technology into each GPU - some argue there's too much technology in there for what it needs to do today (apply that to R520, R580, R600).
While this was certainly true for R5xx, I'm not sure it still holds for R600.
R600, given its specs, is simply an underperforming chip. We don't know why yet (or at least I haven't read or heard any plausible explanation), but it doesn't seem to me that R600 was bloated with so much extra 'useless' stuff that it can't reach its full potential.
NVidia is the one playing catch-up technologically, and has got a long way to go architecturally if the imbalances of G80 are any indication. The serial scalar ALU is the only point of merit in G80. G80 won because of its big blunt tool approach, not because it's clever.
Both GPUs have unique features that will likely cross over in the next few years.
 
While this was certainly true for R5xx, I'm not sure it still holds for R600.
R600, given its specs, is simply an underperforming chip. We don't know why yet (or at least I haven't read or heard any plausible explanation), but it doesn't seem to me that R600 was bloated with so much extra 'useless' stuff that it can't reach its full potential.
Both GPUs have unique features that will likely cross over in the next few years.

As far as I can tell the main issue is current leakage in the chip process, which stopped it reaching its originally rumoured 1 GHz core speed.

In addition to the lower-than-intended clock, a couple of design decisions (a low number of TMUs and AA done in the shaders) really hurt performance on current titles, even though they may benefit shader-heavy games in the future.
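A back-of-envelope texel-rate comparison shows how big the TMU gap is, assuming the commonly quoted unit counts and clocks (16 texture units at 742MHz for the HD 2900 XT, 32 address / 64 filter units at 575MHz for the 8800 GTX) - treat these as assumptions, not vendor-confirmed figures:

```python
def gtexels_per_s(units, core_mhz):
    return units * core_mhz / 1000.0  # billions of bilinear texels per second

r600_bilinear = gtexels_per_s(16, 742)  # ~11.9 GTexel/s
g80_addressed = gtexels_per_s(32, 575)  # ~18.4 GTexel/s of addressing
g80_filtered  = gtexels_per_s(64, 575)  # ~36.8 GTexel/s of raw filtering,
                                        # which is what makes fp16/AF so cheap

print(f"R600: {r600_bilinear:.1f}, G80: {g80_addressed:.1f} addressed / "
      f"{g80_filtered:.1f} filtered GTexel/s")
```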

And of course it's all exacerbated by G80 being a very powerful chip that really hits current games (i.e. those that are going to be used for benchmarking) very hard indeed.

I'm hoping that R700 (or even the R650 refresh) doesn't run into this same old business of cutting-edge processes simply not being ready for such complex chip designs, forcing serious compromises in performance and execution.
 
As far as I can tell the main issue is current leakage in the chip process, which stopped it reaching its originally rumoured 1 GHz core speed.
Uhm...
http://www.beyond3d.com/content/interviews/39/2
Eric Demers said:
We knew we were going to hit the power envelope with the higher leakage parts, and we could have done some more binning to create more SKUs. 742 MHz was in the range of our expectations, though we thought we would be a little higher (~800 MHz) initially. Due to simplifying the binning process and various design decisions, the 740 ended up as a good middle ground that gives a competitive part in this price range, and gives good yields.
 
I'm not sure if I got it. Is Eric Demers speaking about R600 or HD2900XT? I thought he meant HD2900XT ("...competitive part in this price range").
 
If you call piling on TMUs and ROPs clever, technologically or architecturally, then I guess you have an argument.

Jawed

Isn't that a rather simplistic way of looking at things? What's so incredibly out-of-this-world advanced in R600? The fact that they screwed themselves (again) by focusing solely on math? That doesn't seem to pay off at all. The fact is that both IHVs made decisions, and nV seems to have gotten more of them right. Even in the light of the never-ending "but ATi's gonna own in future games". Because that doesn't seem to be the case, given how the cards perform in the latest, DX10-aware titles (which it is safe to assume are the heavy hitters in terms of shading). Will drivers change that? Probably so, but most likely too late for it to matter in the long run.

I agree that in a wank-over-paper-specs way, the R600 is very mouthwatering... heck, I drooled over its specs as well. But IRL, it underperforms. It's like the really hot, stylist-produced, designer-clothed girl you pick up, only to find out she's against oral sex, anal sex, sex games and that she only likes doing it missionary style. Whilst the G80 is like the cheerleader with a really really short skirt and a flimsy top, that chews bubble-gum and bangs you wildly till you drop from exhaustion (btw, this is mildly exaggerated :D). What would you choose?
 

And if the process had been more advanced and less leaky (as we'll probably see by the end of the year...?

You don't deliberately make a leaky design; it ends up that way because of the process limitations, and you have to bin/volt/clock around it.

Sure, you can say after the fact and late in the game you got what you "expected", but when you're six months late, leaky as hell, underperforming and trying to pitch all this as what you planned, rather than a bit of a disaster, there's not much else you can say.
 
Based on this, it should be easy to see that the 'ideal ratio' for these two IHVs is different, even for the same scene. Put another way, if IHV1 had better perf/mm2, that doesn't mean IHV2 would have had better perf/mm2 by picking IHV1's ratio without changing the die size. In fact, it might just make things worse!
I agree with all this and the rest of the points in your post.

With R600, ATI aimed way too low in terms of performance margins over R5xx, bizarrely put in way too much bandwidth and then spent 6-8 months fiddling around trying to get it working. And it looks like it's got another 6 months of fiddling before you can say, hand on heart, that most games will work better than on R5xx. All this to squeeze in stuff that I think is only barely getting a workout.

As such, saying that the ratios are completely ridiculous is akin to saying that NVIDIA's architecture & competitive analysis groups are staffed by monkeys with typewriters. And if that was truly the case, then given how much they're most likely paid for that anyway, I'd have to seriously consider buying a monkey costume too! :)
I'm quite convinced that NVidia took a line of least resistance with G80 - concentrating on unification and the sequential-scalar ALU as well as revising TMUs for single-cycle fp16/trilinear/AF. I'm not saying they're trivial changes.

The way I see it, NVidia's got another heavy dose of architectural reform coming in order to get to D3D11. Virtual memory and multiple context support (with fine-grained switching support given to the CPU) seem to me to be aspects of R600 that are pretty much in place, but sketchy at best in G80. My pet theory is that ATI brought this stuff as far forward as possible because it's intrinsic to the R700-style multi-GPU thing - but hey...

Jawed
 
I disagree, Jawed; this way of reasoning doesn't go anywhere, imho.
What are you going to say if, when D3D10.2 or D3D11 GPUs are out, nvidia still outperforms AMD 'just' because AMD (again) introduced architectural modifications to its GPUs to support D3D12?
Thinking about the future is good, but forgetting about the present is bad.
 
R600, given its specs, is simply an underperforming chip.
Huh? In games where the drivers are working properly it performs exactly as expected, or significantly better, when using R580 as a baseline (given that all its rates, except for Z-only with AA off, are identical per clock to R580). Eric said it himself: R600 was not targeted as 2x faster than R580. R580 doesn't normally exhaust its bandwidth - the GDDR4 version of R580 showed significantly less performance gain than the bandwidth increase would suggest.

It's anything but an underperforming chip for its specs - this has long been my argument, that in terms of its theoreticals it's way ahead of G80 in games. It's the drivers that are the problem. And the bandwidth is a complete misdirection, sadly...
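The back-of-envelope version of that baseline argument, using the shipping clocks of the X1900 XTX and HD 2900 XT and taking the 'identical per-clock rates' point above as given:

```python
r580_mhz, r600_mhz = 650, 742  # shipping core clocks, X1900 XTX and HD 2900 XT

scaling = r600_mhz / r580_mhz
print(f"Identical per-clock texture/fill rates -> only ~{(scaling - 1) * 100:.0f}% "
      f"expected gain over R580 where those rates are the limit")
# Anything beyond that has to come from the reworked shader core, the drivers
# or bandwidth, not from the fixed-function rates.
```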

Jawed
 
Huh? In games where the drivers are working properly it performs exactly as expected, or significantly better, when using R580 as a baseline (given that all its rates, except for Z-only with AA off, are identical per clock to R580). Eric said it himself: R600 was not targeted as 2x faster than R580. R580 doesn't normally exhaust its bandwidth - the GDDR4 version of R580 showed significantly less performance gain than the bandwidth increase would suggest.
I don't know how you can just say that when R600 doesn't perform well it's just because of drivers.
That said, I also don't know what else to call a chip which more or less employs the same number of transistors as its direct competitor, a more advanced fab process, draws more power, uses a wider bus, and is slower than the competition in a lot of tests (benchmarks and games), so much so that AMD is not even positioning it against its natural counterpart.
If you don't wanna call it an underperformer that's fine, we can call G80 an overperformer then; in the end it really does not matter ;)

It's anything but an underperforming chip for its specs - this has long been my argument, that in terms of its theoreticals it's way ahead of G80 in games. It's the drivers that are the problem. And the bandwidth is a complete misdirection, sadly...
In terms of its theoreticals it's way ahead of G80 in games... what does this mean?
 
I disagree, Jawed; this way of reasoning doesn't go anywhere, imho.
What are you going to say if, when D3D10.2 or D3D11 GPUs are out, nvidia still outperforms AMD 'just' because AMD (again) introduced architectural modifications to its GPUs to support D3D12?
Thinking about the future is good, but forgetting about the present is bad.
I agree, I'm not defending ATI for aiming so low on texture and fill rates - it seems they painted themselves into a corner with the other stuff they were trying to do. I'm merely talking about the different technology and architecture focus of each of them.

At the same time I find it mildly disturbing that the tech sites are so much on the bleeding edge that they are unwilling to apply hindsight. E.g. 7900GTX performance generally sucks against X1900XTX as newer games come out, but this point goes completely unexamined - e.g. a 40% deficit:

http://www.computerbase.de/artikel/...on_hd_2900_xt/32/#abschnitt_performancerating

For some reason NVidia's architecture/technology/marketing gets a clean bill of health despite these clear deficits. I find the native-English-language webbies extremely dishonest, in general... So, no wonder that G80 is painted in a glowing light, when people aren't asking fundamental questions about the architectures.

Jawed
 
I don't know how you can just say that when R600 doesn't perform well it's just because of drivers.
You think the fact it's slower than R580 is because of the chip? :rolleyes:

That said, I also don't know what else to call a chip which more or less employs the same number of transistors as its direct competitor,
You're forgetting the supposed 150M transistors of NVIO, and I'm arguing there's "technology/architecture" that consumes transistors in R600 but not in G80 - the virtual memory/context-switching stuff, for example, like being able to hide the latency associated with register spillage into VRAM.
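As a toy illustration of why that kind of latency hiding is worth transistors (the cycle counts are invented and this is the generic idea, not R600's actual scheduler):

```python
SPILL_LATENCY = 400  # cycles for a spill/refill round-trip to VRAM (made up)
WORK_PER_WAVE = 50   # cycles of independent ALU work each wavefront has queued

def cycles_until_resume(waves_in_flight):
    # while one wavefront waits on its spill, the others' ALU work can be issued
    coverable = (waves_in_flight - 1) * WORK_PER_WAVE
    stall = max(0, SPILL_LATENCY - coverable)
    return WORK_PER_WAVE + stall

for waves in (1, 4, 9, 16):
    print(f"{waves:2d} wavefronts in flight: {cycles_until_resume(waves)} cycles "
          f"before the spilling wave can continue")
```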

a more advanced fab process,
TSMC's "half-nodes" have a long history of being somewhat broken/hard to use, it seems.

draws more power, uses a wider bus
Actually it uses about half its bandwidth, I expect (perhaps with exceptions, e.g. during streamout?) - as I said, the bus width is a complete misdirection. X1600XT has the same problem.

and is slower than the competition in a lot of tests (benchmarks and games), so much so that AMD is not even positioning it against its natural counterpart.
Eric said they were aiming for about 30% faster than R580, effectively for DX9. When they decided that, do you think they expected NVidia to aim for less than 100% faster than G71?

In terms of its theoreticals it's way ahead of G80 in games... what does this mean?
Per unit of texture and fill rate it's going great, when the drivers aren't screwing it over.

For what it's worth, if you include "ease of creating performant drivers" and "just works when new games are released" then I think these are mighty demerits of R6xx's technology. CrossFire-dependent R670 and R700 both sound like the shit and the fan poised for a re-match :cry: :cry:

Jawed
 
Huh? In games where the drivers are working properly it performs exactly as expected, or significantly better, when using R580 as a baseline (given that all its rates, except for Z-only with AA off, are identical per clock to R580).
I think when nAo said "given its specs", he meant given the transistor count and memory bandwidth.

Those are pretty much the starting points for GPU design, and the specs/constraints that a design team works with. Their job is to make the best-selling part (which usually means the fastest at an IQ level that consumers use) given those specs. The other way to do it is to target a speed and use as few transistors as possible, but that's really the same thing.

"Not targetting 2x the speed" is either just a smokescreen/excuse for R600's performance or a sign of seriously misplaced priorities at ATI. They should be going for a part as fast as they can with their transistor budget. nAo is fully justified in calling it underperforming.
 
They should be going for a part as fast as they can with their transistor budget.
Ironic, really, as you keep arguing that R5xx is way oversized for its performance - so in your eyes ATI already has a track record of not designing solely for performance per transistor or mm2 or watt. R600 is just more of the same.

nAo is fully justified in calling it underperforming.
I agree it underperforms in the marketplace, as a $500+ part. I was shocked when Eric said where they were aiming. I'm still shocked they thought they needed a 512-bit bus. I'd really like to see any evidence that more than about 64GB/s is needed in a game for R600's rates...
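For reference, the peak numbers that estimate is being compared against (the memory clocks are the commonly listed ones for the HD 2900 XT and 8800 GTX; the ~64GB/s figure is the estimate above, not a measurement):

```python
def peak_gb_per_s(bus_bits, effective_mt_per_s):
    return bus_bits / 8 * effective_mt_per_s / 1000.0

r600_peak = peak_gb_per_s(512, 1650)  # ~105.6 GB/s, HD 2900 XT
g80_peak  = peak_gb_per_s(384, 1800)  # ~86.4 GB/s, 8800 GTX

print(f"R600 peak: {r600_peak:.1f} GB/s, G80 peak: {g80_peak:.1f} GB/s")
print(f"~64 GB/s would be about {64 / r600_peak:.0%} of R600's peak")
```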

Jawed
 
You think the fact it's slower than R580 is because of the chip? :rolleyes:
It can easily be. I know for a fact that there are situations where R100 is faster than R200 due to a simple oversight. RV250 corrected it. Memory controller changes can have drastic consequences in utilization also. Many things can be chip related.

You're forgetting the supposed 150M transistors of NVIO, and I'm arguing there's "technology/architecture" that consumes transistors in R600 but not in G80 - the virtual memory/context-switching stuff, for example, like being able to hide the latency associated with register spillage into VRAM.
First of all, that's just a guess for NVIO. Logically you can't do 150M transistors' worth of work in a separate chip like that. Secondly, even making the ridiculous assumption of an 830M-transistor G80, that is not enough to justify the performance deficit of a 700M-transistor R600. Finally, that technology doesn't matter if it doesn't get used.

Actually it uses about half its bandwidth, I expect (perhaps with exceptions, e.g. during streamout?) - as I said, the bus width is a complete misdirection. X1600XT has the same problem.
I think he's talking about 512-bit vs. 384/320-bit, depending on which nVidia part we compare it to.

Per unit of texture and fill rate it's going great, when the drivers aren't screwing it over.
Useless metric. Look at G73 vs. RV530, for example.
 