G80 vs R600 Part X: The Blunt & The Rich Feature

Ironic really, as you keep arguing that R5xx is way oversized for its performance - so in your eyes ATI already has a track record for not designing solely for performance-per-transistor or mm2 or watt. R600 is just more of the same.
Do you know what ironic means? There's nothing ironic about what I said.

R600 takes it to a new level as it is far worse than R580 by any of those metrics. If G71 was 32 pipes instead of 24, it wouldn't beat R580 by anywhere near the margin that we're seeing with the 8800GTX vs. 2900XT. It's one thing to goof up when scaling parts down like with RV530, but a whole other thing to have the entire family be uncompetitive. If NVidia releases a half-G80 on 65nm, then ATI is going to look really bad.
 
It can easily be. I know for a fact that there are situations where R100 is faster than R200 due to a simple oversight; RV250 corrected it. Memory controller changes can have drastic consequences for utilization also. Many things can be chip-related.
There's zero evidence that the chip has any such problems. I'll let you dig for evidence to the contrary - it's ridiculous to use this as the basis of any kind of argument.

First of all that's just a guess for NVIO. Logically you can't do 150M transistors worth of work in a separate chip like that. Secondly, even making a ridiculous assumption of a 830M G80 is not enough to justify the performance deficit of a 700M R600. Finally, that technology doesn't matter if it doesn't get used.
About as ridiculous as ignoring the yield-advantages that R6xx's fine-grained redundancy affords ATI?

Make no mistake, NVIO is off-die because NVidia ran out of die budget - and not by a trivial amount. You're being ridiculous by asserting this is irrelevant.

Useless metric. Look at G73 vs. RV530, for example.
Hey? G73 has ~ twice the rates of RV530. For ~ same theoretical rates, compare G73 and RV560 - with AF/AA the latter is ~ 35% faster:

http://www.computerbase.de/artikel/...on_hd_2900_xt/32/#abschnitt_performancerating

7900GS (cut-down 7900GTX) is the only thing competitive with X1650XT (cut-down X1950Pro).

Jawed
 
You think the fact it's slower than R580 is because of the chip? :rolleyes:
Why not? It's a possibility, we just don't know. Bugs in the hardware can happen, you know.

You're forgetting supposedly 150M transistors of NVIO and I'm arguing there's "technology/architecture" that consumes transistors in R600 but not in G80 - the virtual memory/context-switching stuff for example. Like being able to hide the latency associated with register spillage into VRAM.
150M what?
Anyway... who cares, right now, about the virtual memory/context switching? I don't, and I program these things for a living. IF (another big if) AMD devoted a relevant part of the chip to this stuff, that's their fault. If you want to introduce new features at all costs and you want developers to use them, just put them on the chip in crippled form so that developers can at least experiment with them; if you have to give away a relevant amount of performance/die estate just for that, then don't do it.

TSMC's "half-nodes" have a long history of being somewhat broken/hard-to-use it seems.
Another bad choice?

Actually uses about half its bandwidth I expect (perhaps with exceptions, e.g. during streamout?) - as I said the bus width is a complete misdirection. X1600XT has the same problem.
Errare humanum est, perseverare diabolicum (to err is human; to persist in error is diabolical).

Eric said they were aiming for about 30% faster than R580, effectively for DX9. When they decided that, do you think they expected NVidia to aim for less than 100% faster than G71?
I don't know what NVidia was aiming for, but try to answer this question: what should AMD have aimed for to reach G80's performance-per-watt level?
You aim at what you can hit, or a bit above the bar, anything more is going to be a disaster.

Per unit of texture and fill rate it's going great, when the drivers aren't screwing it over.
Drivers are the new scapegoat?
For what it's worth, if you include "ease of creating performant drivers" and "just works when new games are released" then I think these are mighty demerits of R6xx's technology. CrossFire-dependent R670 and R700 both sound like the shit and the fan poised for a re-match :cry: :cry:
It's beyond me how AMD could get R600, their second-generation unified shading architecture (as they call it), so wrong, when their first incarnation, aka Xenos, is such a little jewel.
edit: fixed some half sentences and typos
 
Jawed said:
Make no mistake, NVIO is off-die because NVidia ran out of die budget - and not by a trivial amount. You're being ridiculous by asserting this is irrelevant.
And you'd be wrong. The decision to make NVIO a separate chip was done far far before the final die size for G80 was known.
 
Why not? It's a possibility, we just don't know. Bugs in the hardware can happen, you know.
Eric lays the "blame" squarely on drivers.

I don't know what NVidia was aiming for, but try to answer this question: what should AMD have aimed for to reach G80's performance-per-watt level?
Hey, I dunno. Remember way, way back when ATI apparently asserted that "we need 90nm to go unified"? Mystified the pants off me.

As I said earlier in the thread, I think some of R6xx's architecture is aimed squarely at the CF single-board SKU (R670/R700). So, single GPU may not be the correct way to judge the performance of the architecture. A bit like trying to judge R5xx performance based on R520, when the architecture was really heading for a 3:1 ALU:TMU ratio.

You aim at what you can hit, or a bit above the bar, anything more is going to be a disaster.
Eric says that R600 is a ground-up D3D10 GPU.

Jawed
 
And you'd be wrong. The decision to make NVIO a separate chip was done far far before the final die size for G80 was known.
Thanks for the info.

What's the logic of doing so for G80 but none of the rest of the G8x GPUs?

Jawed
 
G80 won because of its big blunt tool approach, not because it's clever.

Is this beyond3d or twilightzone.com? What exactly is so clever about R600 anyway? Virtual memory and context-switching? As if anyone cares.... :p

I'll recognize R600's merits when it actually has some.
 
There's zero evidence that the chip has any such problems. I'll let you dig for evidence to the contrary - it's ridiculous to use this as the basis of any kind of argument.
Zero evidence? How about the fact that, as you pointed out, there are cases where R600 underperforms R580? And that these cases scale with resolution?

It is far more ridiculous for you to say that all performance problems must be driver related and it's impossible for the chip to have a design flaw than for me to say it's a possibility.

About as ridiculous as ignoring the yield-advantages that R6xx's fine-grained redundancy affords ATI?
None of us have any way of quantifying this, so there's not much to say. NVidia has a huge advantage in gross margins over ATI, so it's unlikely that the advantage is that big.

Hey? G73 has ~ twice the rates of RV530. For ~ same theoretical rates, compare G73 and RV560 - with AF/AA the latter is ~ 35% faster:
Geez, why can't you get this through your head? Who the hell cares about "equal theoretical rates" comparisons? ATI and NVidia sure as hell don't. With the same chunk of silicon and same memory BW, NVidia destroyed ATI in the low end last gen, so they got to sell their part at a higher price.

And please, stop posting your cherry-picked computerbase.de benchmarks. They gimp NVidia G7x cards by disabling filtering optimizations. ATI is affected too, but not nearly as much. If you don't like English websites, quote digit-life (translated from the Russian ixbt, AFAIK).

The fact is that the target market reads anandtech, firingsquad, tomshardware, etc., orders of magnitude more than that niche site, and they also keep their settings at factory default aside from the AA/AF slider, just like these sites do.
 
That is exactly what I think G92 is, although I don't understand how it's going to make them look bad. ATi is also bringing out a 2900pro and 2900gt in the Fall.
A half G80 on 65nm is going to be the same size as RV630, give or take. It should have a higher clock too, so I think we might see near 8800GTS (and thus 2900XT) performance. BW would be lower, but still. It would absolutely clobber RV630, which is well below half the speed of the 2900XT.
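
Rough numbers behind that guess (the die sizes are approximate figures, and the (65/90)^2 factor is an ideal optical shrink that real layouts never quite hit):

```python
# Back-of-the-envelope only: die sizes are approximate, and an ideal optical
# shrink by (65/90)^2 flatters the 65nm number a bit.
G80_AREA_90NM = 480.0                  # mm^2, approximate
RV630_AREA_65NM = 150.0                # mm^2, approximate
SHRINK_90_TO_65 = (65.0 / 90.0) ** 2   # ~0.52 ideal area scaling

half_g80_at_65nm = (G80_AREA_90NM / 2.0) * SHRINK_90_TO_65
print(f"hypothetical half-G80 at 65nm: ~{half_g80_at_65nm:.0f} mm^2")
print(f"RV630 at 65nm:                 ~{RV630_AREA_65NM:.0f} mm^2")
```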

However, NVidia would rather go for higher margins than blow ATI out of the water at the same pricepoint, so you're right in that it would probably go up against a 2900pro or 2900gt.
 
A half G80 on 65nm is going to be the same size as RV630, give or take. It should have a higher clock too, so I think we might see near 8800GTS (and thus 2900XT) performance. BW would be lower, but still. It would absolutely clobber RV630, which is well below half the speed of the 2900XT.

However, NVidia would rather go for higher margins than blow ATI out of the water at the same pricepoint, so you're right in that it would probably go up against a 2900pro or 2900gt.


I think G92 will hold its own against an 8800GTS value-wise. I would not say performance-wise. After all... it's a "half" G80 with a 256-bit bus. :p (if the rumours are to be believed...) Remember, the 7900gt still had its place when the 7900gs came in.

The 2900pro, I think, is the v7600. If that's the case, the only things being cut are 2 render back-ends and half the bus width. The 2900pro should give G92 a little run. :eek:

Sounds like the 2900GT will be pitted against the 8600. If that's the case, that's not good for NVidia in the benchmarks. This is the SKU that I think has half the shaders of the XT and a theoretical power draw of 160W.


Although I'm sure ATi would have rather had RV670 pitted against G92, these should serve them fairly well until that one arrives. I think the 55nm process is to blame for that one.
 
It is far more ridiculous for you to say that all performance problems must be driver related and it's impossible for the chip to have a design flaw than for me to say it's a possibility.

You would also think that if these problems were simple driver problems then they would have been fixed by now.
 
And please, stop posting your cherry-picked computerbase.de benchmarks. They gimp NVidia G7x cards by disabling filtering optimizations.
Isn't this correct? G7x cards have significantly worse default AF quality, but if you want to compare apples to oranges...
 
There's zero evidence that the chip has any such problems. I'll let you dig for evidence to the contrary - it's ridiculous to use this as the basis of any kind of argument.

A few days ago, somebody posted a link to graphs that illustrate performance vs memory clock speed. There were ugly effects in there with negative correlation. That's often a sign of chaos theory at work and very difficult to design away.
The unusually high performance drops when switching on AA are suspicious also.
A lot of resources are fighting in parallel for the ALUs and the memory controllers. It's extremely easy to overlook secondary effects during design that can cut a large slice off your theoretical performance (especially with decentralized bus architectures).

Several times, ATI has praised their ability to tweak their arbiters for individual performance cases, which has been interpreted by many as a great advantage. A different point of view would be that this exposes a significant weakness in their architecture. If you've ever been involved in the design of anything arbitration-related, you'll know that software driver guys hate to mess with these kinds of knobs they don't understand and usually leave them at one standard setting. It also gives little hope to those who want to see a game optimized that isn't part of the top-20 marketing line-up.
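
To make the "knobs" point concrete, here's a purely hypothetical sketch - the client names and weights are invented, not taken from any real driver - of what a weighted-share arbiter boils down to: a handful of relative weights that could be tuned per title, but that in practice ship as one default profile.

```python
# Purely hypothetical client names and weights: the point is just that the
# "tuning" is a few relative weights the driver could set per title,
# but usually doesn't.
def bandwidth_shares(weights):
    """Split memory bandwidth among clients in proportion to their weights."""
    total = sum(weights.values())
    return {client: weight / total for client, weight in weights.items()}

default_profile = {"texture": 4, "colour": 3, "z": 2, "vertex": 1}   # the one standard setting
tuned_for_title = {"texture": 6, "colour": 2, "z": 3, "vertex": 1}   # a hand-tuned per-game profile

for name, profile in (("default", default_profile), ("tuned", tuned_for_title)):
    shares = bandwidth_shares(profile)
    print(name, {client: f"{share:.0%}" for client, share in shares.items()})
```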

About as ridiculous as ignoring the yield-advantages that R6xx's fine-grained redundancy affords ATI?
All we know is that they have 1 spare for every 16 ALUs. We've yet to see an example of other places with redundancy, unlike, say, an 8800GTS, which has both cluster and (a first) fine-grained MC redundancy. It's possible that R600 supports MC configurations of, say, 448 bits, but I doubt it. R600 doesn't seem to be able to use its full bandwidth anyway, so there's little reason not to improve yields this way.

As we discussed a long time ago, it is nice to have extremely fine-grained redundancy, but it's more important to first make sure that as much of the chip's area as possible is potentially redundant, while still having a nice ratio of active vs disabled units. With the ALUs of R600 alone, the overall area part is not covered. And the R600 configuration with a 256-bit MC doesn't have a nice ratio.
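
To put crude numbers on the spare-per-16 case - assuming a simple Poisson defect model, with completely made-up defect density and per-ALU area:

```python
# Poisson/binomial yield sketch for N-of-M redundancy. The defect density and
# per-unit area below are made-up illustrative numbers, not R600 data.
from math import comb, exp

def unit_ok(defect_density, unit_area_mm2):
    """Probability a single unit catches no killer defect (Poisson model)."""
    return exp(-defect_density * unit_area_mm2)

def block_yield(defect_density, unit_area_mm2, total_units, needed_units):
    """Probability that at least needed_units out of total_units are good."""
    p = unit_ok(defect_density, unit_area_mm2)
    return sum(comb(total_units, k) * p ** k * (1.0 - p) ** (total_units - k)
               for k in range(needed_units, total_units + 1))

D0 = 0.005        # defects per mm^2 (illustrative)
ALU_AREA = 2.0    # mm^2 per ALU block (illustrative)

print(f"16-of-16 (no spare): {block_yield(D0, ALU_AREA, 16, 16):.1%}")
print(f"16-of-17 (1 spare):  {block_yield(D0, ALU_AREA, 17, 16):.1%}")
```

The spare buys a lot for the area it covers; the catch, as above, is how much of the die actually gets that treatment.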

With that in mind, I don't think there's a lot of regret at Nvidia about their redundancy strategy.
 
What's the logic of doing so for G80 but none of the rest of the G8x GPUs?
Analog blocks are often ready long before the digital part is ready to tape out. It is very common in the semiconductor world to create separate test chips for the analog blocks before taping out the real chip.

If NVIO contains the display logic, a large part of it should be analog, which is notoriously hard to get right. As seen here, the initial G80 was an A02 revision, while the NVIO was A03. Decoupling into two chips makes it easier to still hit schedule or start debugging one before the other is available.

That doesn't answer why it was only done now and not for previous chips, but it may be at least part of it.

Tesla cards.

Tesla cards still have an NVIO. See here:
The basic unit of the current Tesla line, the Tesla C870, should be very familiar to anyone who's seen the GeForce 8800. It's essentially an 8800 GTX--a 575MHz core clock and 128 SPs at 1.35GHz--with 1.5GiB of GDDR3 RAM. Of course, it's not quite an 8800 GTX--there are no display outputs at all on the card, even though it has a new version of the NVIO chip.
 
Jawed said:
G80 won because of its big blunt tool approach, not because it's clever.

This seems to me to be bass ackwards. The G80 design, if anything, seems elegant, a fine example of KISS principles in action.

I mean, what do you call an architecture with an overengineered amount of bandwidth and ALUs, that requires extremely complex drivers in order to match the power of a card with less raw bandwidth and math power?

At least you finally seem to be admitting that writing drivers for the R600 is more difficult. The last time I was involved in one of these threads, you were strongly arguing that there is no inherent advantage in the G80's scalar approach, vis-a-vis the driver compiler, compared to the R600.

To me, blunt power is when you load up on raw computation resources in order to overcompensate for a weak ability to maximize utilization of those resources. The G80 obtains high utilization rates, so it really seems absurd to call it a 'brute force' approach.

Complexity in and of itself makes neither a superior design, nor an elegant, less brute-force one. Only complexity that is spent on 'saving' work and increasing effective utilization is evidence of a good design, and for me the value of the R600's design -- well, the jury is still out on that one. Likewise for the so-called margin/yield advantages. nVidia has consistently shown good margin management in recent years, so unless we see evidence to the contrary, I wouldn't give any props to the R600 on this either.
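
For what it's worth, here's a toy model of the utilization side of that argument. It's purely illustrative - the instruction mixes are invented and the packing rule is grossly simplified - but it shows why a scalar machine doesn't much care how the shader is written, while a 5-wide unit (as R600's ALUs are usually described) leans on the compiler to find work:

```python
# Toy model only: each entry is how many independent scalar ops the compiler
# managed to find for one issue slot. Real compilers and schedulers are far
# more sophisticated than this.
def vliw_occupancy(groups, width=5):
    """Fraction of VLIW lanes kept busy if each group fills one issue slot."""
    issued = sum(min(g, width) for g in groups)
    return issued / (len(groups) * width)

vec4_heavy   = [4, 4, 4, 4]        # classic vec4 colour math packs well
scalar_heavy = [1, 2, 1, 1, 2, 1]  # dependent scalar chains pack badly

for name, mix in (("vec4-heavy", vec4_heavy), ("scalar-heavy", scalar_heavy)):
    print(f"{name}: 5-wide unit ~{vliw_occupancy(mix):.0%} lane occupancy; "
          f"a scalar machine issues the same ops at ~100% (dependencies permitting)")
```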
 
Analog blocks are often ready long before the digital part is ready to tape out. It is very common in the semiconductor world to create separate test chips for the analog blocks before taping out the real chip.
That is definitely an interesting factor I hadn't considered, thanks for the insight :)
That doesn't answer why it was only done now and not for previous chips, but it may be at least part of it.
Well, previous chips were rarely about performance at all costs. There never were 400mm2+ GPUs (or CPUs, afaik!) before G80 and R600. And certainly, in the early days of 3D graphics, the analogue part was a separate chip - or even a separate product! Voodoo Graphics, anyone? This approach got lost for obvious cost reasons.

Tesla cards still have an NVIO. See here:
We were told that there might eventually be Teslas without NVIO though, iirc. But indeed, this doesn't seem like much of a priority for NVIDIA.
 