G80 vs R600 Part X: The Blunt & The Rich Feature

How many transistors on NVIO are doing real work, vs just filling die space?
I wrote back in December that "a large part" of NVIO is just filler to get the pad count needed. We knew the percentage at the time but didn't quote it; a quick check of my notes hasn't turned it up. I'm confident it's more than 50%, though.
 
Zero evidence? How about the fact that, as you pointed out, there are cases where R600 underperforms R580? And that these cases scale with resolution?

It is far more ridiculous for you to say that all performance problems must be driver related and it's impossible for the chip to have a design flaw than for me to say it's a possibility.
I'm sure Eric appreciates being called a liar. The ATI guys have been honest about the problems they've encountered these past few years.

None of us have any way of quantifying this, so there's not much to say.
No, but you and plenty of others keep talking in these terms without having any facts.

Geez, why can't you get this through your head? Who the hell cares about "equal theoretical rates" comparisons?
I'm interested in architecture, technology, scalability, futures. Apparently you're not.

ATI and NVidia sure as hell don't. With the same chunk of silicon and same memory BW, NVidia destroyed ATI in the low end last gen, so they got to sell their part at a higher price.
Whoops there you go again, with zero facts.

And please, stop posting your cherry-picked computerbase.de benchmarks. They gimp NVidia G7x cards by disabling filtering optimizations. ATI is affected too, but not nearly as much. If you don't like English websites, quote digit-life (translated from the Russian ixbt, AFAIK).
They're not cherry picked. They're about the only benchmarks that cover a lot of recent games for a decent range of cards.

The fact is that the target market reads anandtech, firingsquad, tomshardware, etc., orders of magnitude more than that niche site, and they also keep their settings at factory default aside from the AA/AF slider, just like these sites do.
And the target market sucks up whatever they're told.

Last night I found some Call of Juarez D3D10 benchmarks that make a mockery of Anandtech's, for example. But there's no point me linking them because this website's seemingly more thorough methodology is not what the target market wants.

Jawed
 
A few days ago, somebody posted a link to graphs that illustrate performance vs memory clock speed. There were ugly effects in there with negative correlation. That's often a sign of chaos theory at work and very difficult to design away.
Yeah. We saw some signs of this with R580+. We also have clearly documented cases where R5xx performance in AF/AA improves at the same time as non-AF/AA performance degrades. There's a pile of "chaos" that seemingly has to be laid at the feet of the memory controllers in R5xx and R6xx.
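As a quick illustration of what "negative correlation" between memory clock and frame rate means here, a minimal sketch (all the clock/fps pairs below are made up purely for the example):

```python
# Hypothetical (memory clock MHz, average fps) pairs -- illustrative only.
samples = [(700, 62.0), (750, 61.1), (800, 59.8), (850, 60.4), (900, 58.9)]

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

clocks, fps = zip(*samples)
print(f"correlation = {pearson(clocks, fps):+.2f}")
# A negative value means: faster memory clock, *lower* frame rate -- the kind
# of ugly effect those graphs showed.
```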

The unusually high performance drops when switching on AA are suspicious also.
You're ignoring that R600's theoretical fillrate advantage over R580 with AA on is small, about 14% - it's the no-AA case where R600's 128% higher z-only fillrate distorts things. Of course that's going to make the AA drop look big.
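Roughly where those percentages come from, treating the exact figures as assumptions (16 colour/z units on both chips, ~650 MHz for R580, ~742 MHz for R600, double-rate z-only assumed on R600):

```python
# Assumed clocks and per-clock rates -- rough figures for illustration.
r580_clk, r600_clk = 650e6, 742e6
units = 16                      # colour/z units assumed on both chips

# AA-on fillrate: same unit count, so the gap is basically the clock ratio.
aa_advantage = (units * r600_clk) / (units * r580_clk) - 1
print(f"AA-on fillrate advantage: {aa_advantage:.0%}")        # ~14%

# z-only, no AA: assume R600 does 2 z per unit per clock vs 1 on R580.
z_advantage = (units * 2 * r600_clk) / (units * 1 * r580_clk) - 1
print(f"z-only fillrate advantage: {z_advantage:.0%}")        # ~128%
```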

A lot of resources are fighting in parallel for the ALUs and the memory controllers. It's extremely easy to overlook secondary effects during design that can cut a large slice off your theoretical performance (especially with decentralized bus architectures).
Not sure why you mention ALUs. If you'd mentioned TUs and RBEs, then fair enough. R5xx's long history of driver performance tweaks centred on the MC seems evidence enough.

Several times, ATI has praised their ability to tweak their arbiters for individual performance cases, which has been interpreted by many as a great advantage. A different point of view would be that this exposes a significant weakness in their architecture. If you've ever been involved in the design of anything arbitration-related, you'll know that software driver guys hate to mess with knobs they don't understand and usually leave them at one standard setting. It also gives little hope to those who want to see a game optimized that's not part of the top-20 marketing line-up.
I've concluded much the same, that ATI's built a bit of a frankenstein. Sort of like MS Office, way beyond what people wanted or needed or could fathom...

G80 SLI under Vista is still not working. Do you think it'll ever work on G80 or will G92 be the first GPU where it works properly?

All we know is that they have 1 spare for every 16 ALUs.
At one point the patent application talks about 1 spare per 64!

We've yet to see an example of other places with redundancy,
But how could we? It's logically invisible unless you have the right diagnostics or can find some trace of this in the BIOS.

unlike, say, an 8800GTS which has both cluster and (a first) fine grained MC redundancy.
Turning off an entire MC (along with its associated ROPs and L2) is not fine-grained.

As we discussed a long time ago, it is nice to have extremely fine-grained redundancy, but it's more important to first make sure that as much of the chip's area as possible is potentially redundant, while still keeping a nice ratio of active vs disabled units. With the ALUs of R600 alone, the overall-area part is not covered.
In R5xx it would seem that ATI proved the concept of fine-grained redundancy solely using ALUs. If fine-grained redundancy is widespread within R600 then the "overall area" problem is solved.

And the R600 configuration with a 256-bit MC doesn't have a nice ratio.
Coarse-grained redundancy's gotta hurt. What's puzzling me is the idea of an R600Pro based on R600 with bits turned off - surely the required volume of this part can't be sustained long term. Perhaps it's like X1950GT, an interim solution (that only lasted several months).

RV570/560 using coarse-grained redundancy does imply that "overall area" was still a fundamental problem.
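To put some rough numbers on the "overall area" argument, a toy Poisson yield sketch - die size, defect density and the repairable fraction are all assumptions chosen only to illustrate the point:

```python
from math import exp

# Toy yield model: defects in the repairable fraction of the die are assumed
# fixable by spares; defects elsewhere kill the die. Numbers are illustrative.
die_area_cm2    = 4.2          # ~420 mm^2, roughly R600-class (assumption)
defects_per_cm2 = 0.4          # assumed defect density
repairable_frac = 0.30         # say ~30% of the die (the ALUs) has spares

def yield_poisson(area_cm2, d0):
    return exp(-area_cm2 * d0)

naive    = yield_poisson(die_area_cm2, defects_per_cm2)
repaired = yield_poisson(die_area_cm2 * (1 - repairable_frac), defects_per_cm2)

print(f"no redundancy:         {naive:.0%}")      # ~19%
print(f"ALU-only fine-grained: {repaired:.0%}")   # ~31%
# The other ~70% of the area is still unprotected, which is why coarse-grained
# salvage parts (RV570/560-style) stay attractive despite the waste.
```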

Jawed
 
I'm interested in architecture, technology, scalability, futures.

But what are your criteria for evaluating an architecture or the underlying technology? It seems that you favor R600's "complexity", but what good is complexity just for the sake of being complex? Complex can be re-interpreted as kludgy, inelegant, inefficient or overengineered depending on where you stand.
 
I'm sure Eric appreciates being called a liar. The ATI guys have been honest about the problems they've encountered these past few years.

You're kidding, right? Please tell me you're kidding. Like he's going to come out and say making drivers for R600 is a PITA - he wouldn't have his job for very long if he did. And ATI honest in the past? Come on, did they come out and say, sorry, R520 will be late because we're having issues? R600? Come on, man.


Last night I found some Call of Juarez D3D10 benchmarks that make a mockery of Anandtech's, for example. But there's no point me linking them because this website's seemingly more thorough methodology is not what the target market wants.
Jawed


Please do share.
 
G80 SLI under Vista is still not working. Do you think it'll ever work on G80 or will G92 be the first GPU where it works properly?

Jawed

I have SLI under vista and it works just fine. Yes there are problems here and there, but it is far from not working at all as you imply.
 
G80 SLI under Vista is still not working. Do you think it'll ever work on G80 or will G92 be the first GPU where it works properly?

I do hope you're kidding. The GeForce 8800GTX in SLI offers pretty much equivalent performance, give or take a little, compared to XP. ((No, they aren't identical, but they aren't miles apart or fundamentally broken as you seem to be implying.)) The only difference is that SFR does not currently fully function under Windows Vista. Considering every SLI profile that exists is AFR, I don't know how meaningful this is to the non-power user.

I have a dual-boot XP/Vista platform and I primarily game under Vista these days with SLI. The only time I use XP is when I run into software compatibility issues which aren't driver-related at all. If you don't want to take my word for it ((which I'm sure you won't)), just look at any modern Vista benchmarks, like the recent FiringSquad article...
 
Hmm, I don't see how you can call the G80 inelegant compared to the R600; in just about every category it's better or faster. The R600 has a hell of a lot more math power, but it's artificially held back because of design choices - it's not all drivers. If you look at tests without AA and AF, the R600's performance has increased from the release drivers till now and it performs around the GTX, but with any amount of IQ features its performance drops like a rock, and performance hasn't increased across driver releases with AA and AF on. I can see some problems being driver-related, but not all of the R600's woes.
 
Jawed said:
Turning off an entire MC (along with its associated ROPs and L2) is not fine-grained.
As opposed to what? Prior architectures could either not turn off MCs/ROPs/L2s at all, or could only go down to the next lower power of 2 (ie: turn off half the MCs).

G80 can turn on/off MCs at a 64-bit granularity(*). I call that fine-grained. This is as fine-grained as you can go without having to add more pads, more external wiring or more DRAMs.

(*) Actually, it's per-MC. The MCs just happen to be 64 bits wide, which matches up pretty well with GDDR3.
 
This thread should be about RV670 and R670 rather than R7xx IMHO. Hell, even RV635 and RV620 would do too. We have a long way to go till R700, but RV670 and R670 sound just as interesting.
 
I'm sure Eric appreciates being called a liar. The ATI guys have been honest about the problems they've encountered these past few years.
Where the hell did anyone call Eric a liar? You often don't find hardware issues for a while and just assume it's drivers. The bug I was telling you about with R200 was the same way.

You've completely forgotten where this bickering started. nAo said, "I don't know how you can just say that when R600 doesn't perform well is just because of drivers." He is 100% right. You are jumping to huge conclusions to dispute that statement.

No but you and plenty of others keep talking in these terms without having any facts.
I just gave you some, but you ignored them. NVidia's margins are way above ATI's. Obviously ATI's yield advantage is not a big enough issue to negate their larger die size for equally priced parts and lower price (due to demand) for equally sized parts.
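A back-of-the-envelope on why die size tends to dominate a modest yield edge - every number here is an assumption picked only to illustrate the trade-off, not a claim about actual ATI/NVidia costs:

```python
from math import exp, pi

# Toy cost-per-good-die comparison under a simple Poisson yield model.
wafer_cost = 5000.0            # dollars per 300 mm wafer (assumed)
wafer_diam = 300.0             # mm
d0 = 0.4                       # defects per cm^2 (assumed)

def dies_per_wafer(die_mm2):
    # Crude gross-die estimate, ignoring edge-loss details.
    return (pi * (wafer_diam / 2) ** 2) / die_mm2

def cost_per_good_die(die_mm2, defect_density):
    yield_frac = exp(-(die_mm2 / 100.0) * defect_density)
    return wafer_cost / (dies_per_wafer(die_mm2) * yield_frac)

small = cost_per_good_die(200, d0)          # smaller die
large = cost_per_good_die(300, d0 * 0.8)    # bigger die, assumed 20% better
print(f"200 mm^2 die: ${small:.0f} per good die")
print(f"300 mm^2 die: ${large:.0f} per good die")
# Even granting the bigger die a better effective defect density (redundancy),
# it still comes out markedly more expensive per good die.
```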

I'm interested in architecture, technology, scalability, futures. Apparently you're not.
That has nothing to do with any of your replies to me or nAo. Any discussion of performance on parallel workloads is utterly meaningless without the context of cost.

Whoops there you go again, with zero facts.
Whoops, there you go again ignoring the facts. The 7600 always sold at a higher price than the X1600. Why? Because it blew the X1600 away in performance.

They're not cherry picked. They're about the only benchmarks that cover a lot of recent games for a decent range of cards.
They absolutely are cherry-picked. No other site shows G7x in such a bad light by gimping it so thoroughly. You won't find 0.1% of G7x buyers running their video card with the settings of that site. Sites like xbitlabs test even more games. For any game that both computerbase.de and other sites test, the results from the former are completely out of line with everyone else's. When 10 sites have mostly agreeable results and computerbase.de deviates from them so heavily in favour of ATI (when compared to G7x), how can you not call it cherry picking?

But there's no point me linking them because this website's seemingly more thorough methodology is not what the target market wants.
There's nothing more thorough about their G7x testing methodology. They arbitrarily decide that viewers are interested in G7x performance when gimped by 50% from image quality settings that barely improve the gaming experience. It's absurd.

More importantly, design decisions are based on the opinions of 99% of the market, not those of IQ freaks. ATI is not selling cards only to the few people that value computerbase.de benchmarks over everything else. For you to judge the hardware engineering of ATI and NVidia with the results of this site is ludicrous. If you continue to do so then there is no point for me or anyone else to debate 3D hardware performance with you.
 
Do you think when they've added in all the extra ALUs (two of these have gotta get close to 1TFLOP) and D3D10.1 functionality that'll still be true?

Jawed
What? Read my post. I'm talking about a theoretical half G80 on 65nm. Nothing more. But if you want to go down this road, fine. If increasing the ALU:TEX ratio makes NVidia's chips even stronger per mm2, then that makes my claim even more justified.
 
This seems to me to be bass ackwards. The G80 design, if anything, seems elegant, a fine example of KISS principles in action.
Hmm, that's what I've been saying, keeping a lid on change to produce the minimal architecture/tech to produce a D3D10 GPU.

At least you finally seem to be admitting that writing drivers for the R600 is more difficult. The last time I was involved in one of these threads, you were strongly arguing that there is no inherent advantage in the G80's scalar approach vis-a-vis driver compiler vs the R600.
Hey? What has a driver compiler for ALU-utilisation got to do with the optimisation of the use of TUs, RBEs and bandwidth?

To me, blunt power is when you load up on raw computation resources in order to overcompensate for a weak ability to maximize utilization of those resources. The G80 obtains high utilization rates, so it really seems absurd to call it a 'brute force' approach.
G80's high utilisation only comes in single-function synthetics. A nice example is the z-only fillrate which is comically high (in a fuck-me, that's incredible, sense) and under-utilised in games.

I like to defend G80's ability to do 4xAA per loop - but the total Z capacity seems wildly out of proportion with either available bandwidth or triangle rate for things like z-only passes.
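Rough arithmetic on why the z-only rate looks out of proportion with bandwidth - the per-clock z rate, clock and bandwidth below are round-number assumptions, not quoted specs:

```python
# Toy check: how much bandwidth would peak z-only fill need vs. what's there?
z_per_clock = 192        # assumed peak z-samples per clock for a G80-class part
core_clk    = 575e6      # Hz (assumed)
bytes_per_z = 4          # uncompressed 32-bit depth (assumption)
bandwidth   = 86.4e9     # bytes/s, roughly an 8800 GTX-class figure (assumed)

needed = z_per_clock * core_clk * bytes_per_z
print(f"needed at peak: {needed / 1e9:.0f} GB/s vs available {bandwidth / 1e9:.0f} GB/s")
print(f"compression needed just to keep up: {needed / bandwidth:.1f}x")
# Under these assumptions the raw z rate outruns bandwidth by ~5x, hence the
# "wildly out of proportion" impression for z-only passes.
```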

Complexity in and of itself makes neither a superior design, nor an elegant, less brute-force one. Only complexity that is spent on 'saving' work and increasing effective utilization is evidence of a good design, and for me the value of the R600's design -- well, the jury is still out on that one. Likewise for so-called margin/yield advantages. NVidia has consistently shown good margin management in recent years, so unless we see evidence to the contrary, I wouldn't give any props to the R600 on this either.
I'm not giving props to R600 generally - I'm suggesting that ATI's focus is to lay foundations - they're running to a very different technology/architecture timetable than NVidia.

You could say since R300/NV30 this has been the norm.

Jawed
 
As opposed to what?
Fine-grained as in the SKU is still fully capable after the redundancy has kicked in, which is what R600 is doing, apparently. Not resulting in a second SKU to mop up faulty dies with <100% capability.

Jawed
 
Jawed said:
Fine-grained as in the SKU is still fully capable after the redundancy has kicked in,
And you would do this how, exactly? And how would that scale when 2 MCs are faulty?

If you include a fully functional MC which is never used (either disabled or defective) then you're wasting precious die area. What do you do when all MCs are fully functional? With the G80 architecture, you get the benefit of extra bandwidth. Think of G80 as having 320-bits to memory, and if all MCs are functional, you get an extra 64-bits. Isn't that awesome?

If you always disabled a MC, you wouldn't get that extra bandwidth. So you pay more and get less. Great.
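A quick sketch of what per-MC (64-bit) granularity buys, using round GDDR3-era data rates that are only assumptions for the example:

```python
# Bandwidth as a function of how many 64-bit memory controllers are active.
# Memory data rates are assumed round numbers, not quoted specs.
def bandwidth_gb_s(active_mcs, data_rate_mt_s):
    bus_bits = active_mcs * 64
    return bus_bits / 8 * data_rate_mt_s * 1e6 / 1e9

print(f"6 MCs @ 1800 MT/s: {bandwidth_gb_s(6, 1800):.1f} GB/s")   # 384-bit, GTX-like
print(f"5 MCs @ 1600 MT/s: {bandwidth_gb_s(5, 1600):.1f} GB/s")   # 320-bit, GTS-like
# A die with one bad MC is still a viable 320-bit SKU, while a fully working
# die picks up the extra 64 bits of bus "for free".
```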

Weren't you complaining about G80 being "brute-force" just a few posts back?
 
I guess the idea would be that if an MC consists of a number of repeated structures (in the same way that a G80 cluster contains multiple ALUs) you'd have an extra one or two of those structures (whatever they are) in each MC.

From your reply it sounds like you might be able to use that kind of approach in the ALUs used for ROP blending/stencil/Z, but that the MC itself is not so easily subdivided into identical things. Is that a fair statement?

Also a bit OT, but do current GPUs protect memory structures against defects? I had heard that in the CPU world, at least the larger last level caches are manufactured with extra lines for just this reason...
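On the cache question: the usual CPU-world trick is spare rows plus a small remap table programmed at test time. Whether GPUs of this era do the same isn't public, but here's a minimal, vendor-neutral sketch of the idea:

```python
# Minimal sketch of spare-row repair for an SRAM array: rows found defective
# at manufacturing test are transparently remapped to spare rows.
class RepairableSram:
    def __init__(self, rows, spares, bad_rows):
        if len(bad_rows) > spares:
            raise ValueError("more defective rows than spares: die is scrap")
        self.storage = [0] * (rows + spares)
        # Remap table, conceptually burned into fuses at test time.
        self.remap = {bad: rows + i for i, bad in enumerate(bad_rows)}

    def _row(self, addr):
        return self.remap.get(addr, addr)

    def write(self, addr, value):
        self.storage[self._row(addr)] = value

    def read(self, addr):
        return self.storage[self._row(addr)]

# Example: rows 3 and 17 failed test; accesses transparently hit the spares.
sram = RepairableSram(rows=256, spares=4, bad_rows=[3, 17])
sram.write(3, 0xDEADBEEF)
assert sram.read(3) == 0xDEADBEEF
```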
 
Hmm, that's what I've been saying, keeping a lid on change to produce the minimal architecture/tech to produce a D3D10 GPU.

Oh, so G7x->G80 was keeping a lid on "change"? Whereas the evolutionary tweak of the R600 over the R580/Xenos is, what, revolutionary change?

Hey? What has a driver compiler for ALU-utilisation got to do with the optimisation of the use of TUs, RBEs and bandwidth?

Quite a bit when the R600 keeps losing shader benchmarks that they haven't specifically optimized for. Of course, the retort is "oh, that shader is not 100% math. They sampled a texture! False comparison!"

G80's high utilisation only comes in single-function synthetics.

Right... which explains why a card with far less peak ALU power and far less bandwidth continually equals and beats a card with far more on-card resources. I guess it must be the low utilization rates on G80 GPU resources that gimp it. :)

I'm not giving props to R600 generally - I'm suggesting that ATI's focus is to lay foundations - they're running to a very different technology/architecture timetable than NVidia.

Jawed, you are going way out of your way to try and defend ATI's decisions, so you are definitely trying to give props to a design for which the evidence is not there. Maybe I should say that the NV3x "laid the foundation" for the G7x, and therefore we should have ignored the NV3x's bugs and deficiencies?

If the R700 comes out and has none of the issues of the R600, will you then claim that the R600 was of course, the natural stepping stone that "laid the foundation" for a better chip? And what do you say to people who bought chips with idle silicon not reaching its full potential in the meantime?

You say they are running a so-called different technology/architecture timetable than NVidia, but what I see is that NVidia had two timetables: an evolutionary one that was a continual branch off the NV2x->G7x line, and a parallel 4+ years-in-the-making G8x one.

What I see is that you are spending a frightful amount of effort in the forums to defend ATI's design decisions, decisions for which no evidence exists of a benefit to ATI's financial bottom line, nor to its end users.

I also see loads of assumptions and speculations as to DX10 performance, for which no real solid evidence exists. (So-called limitations in geometry shader performance or streamout seem to be jumping the gun, as CUDA shows different results, so clearly there is room left for tweaking in G80's drivers too.)
 