G80 vs R600 Part X: The Blunt & The Rich Feature

Oh, so G7x->G80 was keeping a lid on "change"? Whereas the evolutionary tweak of the R600 over the R580/Xenos is, what, revolutionary change?



Quite a bit when the R600 keeps losing shader benchmarks that they haven't specifically optimized for. Of course, the retort is "oh, that shader is not 100% math. They sampled a texture! False comparison!"



Right... which explains why a card with far less peak ALU power and far less bandwidth continually equals and beats a card with far more on-card resources. I guess it must be the low utilization rates on G80 GPU resources that gimp it. :)



Jawed, you are going way out of your way to try and defend ATI's decisions, so you are definitely trying to give props to a design for which the evidence is not there. Maybe I should say that the NV3x "laid the foundation" for the G7x, and therefore we should have ignored the NV3x's bugs and deficiencies?

If the R700 comes out and has none of the issues of the R600, will you then claim that the R600 was of course, the natural stepping stone that "laid the foundation" for a better chip? And what do you say to people who bought chips with idle silicon not reaching its full potential in the meantime?

You say they are running a so-called different technology/architecture timetable than NVidia, but what I see is that NVidia had two timetables: an evolutionary one that was a continual branch off NV2x->G7x, and a parallel 4+ years-in-the-making G8x one.

What I see is that you are spending a frightful amount of effort in the forums to defend ATI's design decisions, decisions for which no evidence exists of benefits to ATI's financial bottom line, nor to its end users.

I also see loads of assumptions and speculation as to DX10 performance, for which no real solid evidence exists. (So-called assumptions of limitations in geometry shader performance or stream-out seem to be jumping the gun, as CUDA shows different results, so clearly there is room left in the G80's drivers for tweaking too.)



G80 is designed for time to market, whereas the R600 is specialized in the rich feature.

Is that right ????


I believe R700 is similar to G80 in terms of the level of the design.


Regards

Vincent
 
Some basic engineering training tells me the success or elegance of a design is determined by its full-system performance under given resource constraints such as area and TDP. If somebody argued the Pentium 4 is better than the K8 just because it can run its ALUs at double the clock rate (>7GHz), I would LMAO. You could probably also say the Pentium 4 laid the foundation for future chips by switching from one OOO mechanism (RS + ROB) to another (physical register file and active list). Well, it didn't bode well.

Another example is the war between CISC and RISC. You can say RISC is pretty much brute force, piling on registers and pursuing high clock rates. But so what? At the end of the day it is the FULL SYSTEM performance that matters.
 
It'll certainly be interesting if they work out to be very similar architecturally to R700. I'm wondering what is really going to distinguish them, actually.

Jawed


The only thing that would be similar between R670 and R700 is that they will both have two chips.:p
 
Jawed said:
e.g. a 65th ALU for a set of 64 ALUs.
You still haven't answered the question: How does that work for an MC? This scheme may work for (say) an RBE/ROP, but when there's a flaw in the arbitration logic for an MC, your only choices are to plunk down a whole extra arbiter per MC (fun!), or a whole perpetually disabled MC, or to disable the entire MC.

It also doesn't address what happens when you have 2 defects.

G80's architecture is very elegant here: you pay only for the silicon that works, and there is no secret unused hardware.


Vincent said:
G80 is designed for time to market, whereas the R600 is specialized in the rich feature.
Oh the irony. I think I will sig this.
 
Oh, so G7x->G80 was keeping a lid on "change"? Whereas the evolutionary tweak of the R600 over the R580/Xenos is, what, revolutionary change?
I can't think of anything in R600 that's like R580 or Xenos, based on the patents I've read. Beyond that, my argument is that R600 is a lot closer, architecturally, to what ATI intends to make a D3D11 GPU than G80 is. It seems to me NVidia has planned more equally-sized architectural/technological steps from here to D3D12, say. Whereas ATI appears to be front-loading that.

Some of this may well be fall-out from D3D10 horse-trading.

Some of this front-loading may be specific to R700, too.

Quite a bit when the R600 keeps losing shader benchmarks that they haven't specifically optimized for. Of course, the retort is "oh, that shader is not 100% math. They sampled a texture! False comparison!"
Our earlier discussion was in terms of ALU instruction throughput/utilisation and ease of compilation for G80's MAD+SF/MUL co-issue versus R600's 5-way issue.
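
To make that co-issue-versus-5-way contrast concrete, here's a toy Python sketch (my own illustration, not either vendor's real compiler, and the dependency chain is entirely made up): it greedily packs independent ops into 5-wide bundles, R600-style, and compares slot utilisation against a machine that simply issues one op per clock, G80-style. A mostly-serial shader leaves most of the VLIW slots empty, which is the utilisation/compilation headache in a nutshell.

```python
# Toy model (my own sketch, not ATI's or NVIDIA's actual compiler):
# greedily pack independent ops into 5-wide bundles (R600-style) and
# compare slot utilisation against single-op-per-clock issue (G80-style).
# 'ops' is a hypothetical shader: each entry lists the ops it depends on.

def pack_vliw(ops, width=5):
    """Greedy list scheduling: an op may issue once all its inputs are done."""
    done, bundles = set(), []
    while len(done) < len(ops):
        bundle = [i for i, deps in enumerate(ops)
                  if i not in done and all(d in done for d in deps)][:width]
        bundles.append(bundle)
        done.update(bundle)
    return bundles

# A mostly-serial chain with a little parallelism (made up for illustration).
ops = [[], [0], [1], [1], [2, 3], [4], [4], [5, 6]]

bundles = pack_vliw(ops)
used = sum(len(b) for b in bundles)
print(f"VLIW: {len(bundles)} cycles, {used}/{len(bundles) * 5} slots filled "
      f"({100 * used / (len(bundles) * 5):.0f}% utilisation)")
print(f"Scalar issue: {len(ops)} cycles, its single slot always busy")
```

With that made-up chain the packer finishes in fewer cycles but fills only 8 of 30 slots, while the scalar machine just grinds through one op per clock at full occupancy; that is the trade-off the earlier discussion was about.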

Right... which explains why a card with far less peak ALU power and far less bandwidth continually equals and beats a card with far more on-card resources.
When you have ALU- or bandwidth-limited games that back up this assertion then fine...

Jawed, you are going way out of your way to try and defend ATI's decisions, so you are definitely trying to give props to a design for which the evidence is not there.
I'm trying to explain it, not defend it. Of course it's much easier to read Anandtech and say G80 has won. Parts of R600, e.g. the bandwidth, just appear to be lunatic. That pad density thing they've done may come in handy for when they try to put a 256-bit bus on a 100mm2 die (or put a 128-bit bus on that die alongside a 128-bit connection to a partner die?).

If the R700 comes out and has none of the issues of the R600, will you then claim that the R600 was of course, the natural stepping stone that "laid the foundation" for a better chip?
Let's guess at R700: say it is a 2xR600 configuration on 55nm, each die with a 256-bit memory bus (70GB/s?), with an additional 140GB/s connection between them, and it performs 120%+ faster than R600 on "CF compatible" games (clocks should increase from where they are now). Which part of the architecture and technology of R600 are you expecting to be redundant? I can't think of anything.

e.g. I'm guessing the two dies' L2 caches will share data and that CF will suffer none of the "traditional" 2x "distinct pools of memory" problems that SLI and original CF suffered from.

It doesn't mean I like the idea of a 2 die R700, but I'm trying to correlate aspects of R600 with that direction as well as think of function points in D3D11 that steer the architecture. R700, conceptually, hasn't just popped out of nowhere. After they got it running, ATI didn't go "oh shit, R600 is a dead end, what are we gonna do? Oh, we could put two of them together on one board."

I dislike the idea, because I think game compatibility will go right out the window. I'm also pessimistic about the compatibility of AFR with the more intricate rendering algorithms. So CF-incompatible games will be just as wasteful as (if not more wasteful than) they are on a single R600.

And what do you say to people who bought chips with idle silicon not reaching its full potential in the meantime?
Considering that I advise friends against R600, what do you expect me to say?

I also see loads of assumptions and speculations as to DX10 performance, for which no real solid evidence exists.
As far as R600 goes, I'm trying to restrict my evaluation to games that have a decent chance of a reasonably optimised driver. The result is there's practically no useful game-based data :cry:

Jawed
 
G80's high utilisation only comes in single-function synthetics. A nice example is the z-only fillrate which is comically high (in a fuck-me, that's incredible, sense) and under-utilised in games.

I like to defend G80's ability to do 4xAA per loop - but the total Z capacity seems wildly out of proportion with either available bandwidth or triangle rate for things like z-only passes.
You seem to have a very poor understanding of "marginal cost".

If you have the hardware for 4 AA samples per loop, then you might as well use that hardware at full speed in non-AA scenarios. G80 outputs 55 Gpix/s for Z-only without AA and 13.8 Gpix/s for Z-only with AA. Is the latter really that outrageous? Several generations of GPUs have been doing near those levels without AA, and now G80 can do that with 4xAA.
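
As an aside, those two numbers fall straight out of a back-of-the-envelope calculation if you assume the commonly quoted G80 layout of 24 ROPs at 575 MHz with 4 Z samples per ROP per clock (my assumption of the configuration, not something stated above):

```python
# Back-of-the-envelope Z fillrate, assuming 24 ROPs at 575 MHz and
# 4 Z samples per ROP per clock (assumed figures, not from the post).
rops, clock_ghz, z_per_rop_per_clk = 24, 0.575, 4

z_rate = rops * clock_ghz * z_per_rop_per_clk       # Gsamples/s
print(f"Z-only, no AA: {z_rate:.1f} Gpix/s")        # ~55.2 (1 sample per pixel)
print(f"Z-only, 4xAA : {z_rate / 4:.1f} Gpix/s")    # ~13.8 (4 samples per pixel)
```

Same hardware, same sample throughput; the 4x gap is just four samples per pixel, which is the marginal-cost point.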

It doesn't matter if games don't use the stupendous non-AA z-only fillrate because if you already decided to do 4xAA per loop, then the marginal cost is almost zero.

If you want to discuss the "total Z capacity" of 8 samples per clock, that's there for early-Z rejection. You definitely benefit from it being faster than your fillrate, so 8 makes sense.
 
the link Jawed referred to earlier

http://www.digital-daily.com/video/d...ark/index3.htm

Anand
CPU: Intel Core 2 Extreme X6800 (2.93GHz/4MB)
Motherboard: ASUS P5W-DH
Chipset: Intel 975X
Chipset Drivers: Intel 8.2.0.1014
Hard Disk: Seagate 7200.7 160GB SATA
Memory: Corsair XMS2 DDR2-800 4-4-4-12 (1GB x 2)
Video Card: Various
Video Drivers: ATI Catalyst 8.38.9.1-rc2
NVIDIA ForceWare 162.18
Desktop Resolution: 1280 x 800 - 32-bit @ 60Hz
OS: Windows Vista x86


Link above
Bus PCI-Express
CPU Intel Core2 Extreme Quad-Core QX6700 @ 2.93 GHz
MB Foxconn N68S7AA-8EKRS2H
Memory Corsair XMS Xpert DDRII-800 4x512 MB
OS Windows Vista 64 Premium, DirectX 10
PSU Hiper HPU-4M730 730 W

We ran the tests using the ForceWare 158.45 and Catalyst 7.5 drivers.

Can anyone here say it's an honest review or an apples-to-apples comparison? Outside of Jawed.
 
You're ignoring the theoretically small fillrate advantage that R600 has over R580 with AA on, 14% - the no-AA case is where R600's 128% higher z-only fillrate distorts things. Of course that's going to make the AA-drop look big.
Ok, I wasn't really comparing to a R580 specifically, but just looking at it stand-alone without the context of other chips. It stresses the point, though, that it was very unfortunate of ATI to not aim for a much higher performance multiple compared to R580.

Not sure why you mention ALUs. If you'd mentioned TUs and RBEs, then fair enough. R5xx's long history of driver performance tweaks centred on the MC seem evidence enough.
I skipped a step here: chaos tends to enter a system faster when you have more interacting agents. When you're going to run AA functionality on your shaders, the number of shader clients goes from 2 (VS/PS) or 3 (VS/GS/PS) to 3 or 4. That can only make things more complicated and harder to analyze. Not only because of how you're going to schedule the shader units, but also because the resulting memory traffic may now be less coherent.

G80 SLI under Vista is still not working. Do you think it'll ever work on G80 or will G92 be the first GPU where it works properly?
My turn now to question why this suddenly entered the discussion. ;) SLI seems to be used in benchmarks? That's about as far as my knowledge reaches.

But how could we? It's logically invisible unless you have the right diagnostics or can find some trace of this in the BIOS.

Turning off an entire MC (along with its associated ROPs and L2) is not fine-grained.

In R5xx it would seem that ATI proved the concept of fine-grained redundancy solely using ALUs. If fine-grained redundancy is widespread within R600 then the "overall area" problem is solved.
I'm with Bob on this.
Unlike RAMs, random logic has no established redundancy techniques that I know of. What ATI seems to be doing for its ALUs is about as far as you can go, but it's not usable for other pieces of logic. Other than dumb duplication (expensive, and even then you have a single point of failure in the multiplexers), there is no good way to design redundant state machines, multiplexers, address counters, FIFOs, or basically any non-parallel, non-regular structure on a chip, yet those blocks take up the lion's share of the combinational area.

So what I meant by fine-grained redundancy is that it has a high active/disabled ratio. An equally important metric is the number of single points of failure in the design, which I suspect is very high in the case of ALU redundancy. When you have the ability to disable large monolithic blocks of logic, like an MC, that number should be very low, mostly limited to a bunch of multiplexers or gates that surround it.
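
To illustrate why the active/disabled ratio and the single points of failure matter, here's a deliberately crude Poisson yield sketch (the numbers and the lumped-spare simplification are mine, purely illustrative, not real die data): a spare ALU only buys you tolerance on the fraction of the die it actually covers, while a defect landing anywhere in the unprotected logic still kills the chip or forces you to disable a big block.

```python
import math

# Crude Poisson yield model -- illustrative numbers only, not real die data.
D = 0.4           # defects per cm^2 (assumed)
die_area = 4.2    # cm^2 (assumed, roughly R600-sized)
alu_frac = 0.35   # fraction of the die covered by ALUs with a spare (assumed)

def yield_no_spare(area, d=D):
    # Probability of zero defects in 'area'.
    return math.exp(-d * area)

def yield_one_spare(area, d=D):
    # Lumped simplification: the protected region tolerates at most one defect.
    lam = d * area
    return math.exp(-lam) * (1 + lam)

protected = die_area * alu_frac
unprotected = die_area - protected

print(f"no redundancy  : {yield_no_spare(die_area):.1%}")
print(f"spare ALUs only: {yield_no_spare(unprotected) * yield_one_spare(protected):.1%}")
# Everything outside the ALUs (MCs, arbiters, state machines, FIFOs...)
# remains a single point of failure, which is the point being made above.
```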
 
I think the problem with DX10 tests is that we don't know whether they are testing DX10/SM4-only features (GS, stream-out, etc.) or just longer shaders, without people dissecting the shaders and render state to see what's going on.

Jawed said:
Beyond that, my argument is that R600 is a lot closer, architecturally, to what ATI intends to make a D3D11 GPU than G80 is. It seems to me NVidia has planned more equally-sized architectural/technological steps from here to D3D12, say. Whereas ATI appears to be front-loading that.

D3D11? We have no inkling of what this is, unless you think it is DX10 + tessellator. The only things from DirectX Next that didn't make it seem to be the tessellator and frame-buffer reads in the shader (e.g. do blending yourself).

I can see the former, but I see no R600 advantage for the latter feature. Talk about D3D11 and D3D12 "architecture" at this point seems to be nonsense.
 
Let's guess at R700: say it is a 2xR600 configuration on 55nm, each die with a 256-bit memory bus (70GB/s?), with an additional 140GB/s connection between them, and it performs 120%+ faster than R600 on "CF compatible" games (clocks should increase from where they are now). Which part of the architecture and technology of R600 are you expecting to be redundant? I can't think of anything.

e.g. I'm guessing the two dies' L2 caches will share data and that CF will suffer none of the "traditional" 2x "distinct pools of memory" problems that SLI and original CF suffered from.

It doesn't mean I like the idea of a 2 die R700, but I'm trying to correlate aspects of R600 with that direction as well as think of function points in D3D11 that steer the architecture. R700, conceptually, hasn't just popped out of nowhere. After they got it running, ATI didn't go "oh shit, R600 is a dead end, what are we gonna do? Oh, we could put two of them together on one board."

I dislike the idea, because I think game compatibility will go right out the window. I'm also pessimistic about the compatibility of AFR with the more intricate rendering algorithms. So CF-incompatible games will be just as wasteful as (if not more wasteful than) they are on a single R600.

But maybe, as an extension of your L2 sharing idea, we can think of the R700 as a single processor on multiple dies? Like a 386+387, a Voodoo 1, or a Pentium Pro (with one or two L2 dies). One of the dies would be the master and speak to the PCIe bus, and you'd effectively have a single GPU software-wise. The on-package interconnects would have to be really fast (how is that done on the Pentium Pro? Or the L3 dies on POWER chips?)

Is that feasible, and could they still easily use a single die for midrange boards?
 
They absolutely are cherry-picked. No other site shows G7x in such a bad light by gimping it so thoroughly. You won't find 0.1% of G7x buyers running their video card with that site's settings. Sites like xbitlabs test even more games. For any game that both computerbase.de and other sites test, the results from the former are completely out of line with everyone else's. When 10 sites have mostly agreeable results and computerbase.de deviates from them so heavily in favour of ATI (when compared to G7x), how can you not call it cherry-picking?

There's nothing more thorough about their G7x testing methodology. They arbitrarily decide that viewers are interested in G7x performance when it's gimped by 50% by image-quality settings that barely improve the gaming experience. It's absurd.

More importantly, design decisions are based on the opinions of 99% of the market, not those of IQ freaks. ATI is not selling cards only to the few people who value computerbase.de benchmarks over everything else. For you to judge the hardware engineering of ATI and NVidia by the results of this site is ludicrous. If you continue to do so then there is no point in me or anyone else debating 3D hardware performance with you.


You don't have a point here. I bought a 7900 GTX, thinking it was the best at the time. A little after, a cousin of mine bought a 1900 XTX, and guess what: we did a side-by-side comparison, and the IQ of my 7900 GTX was shitty compared to his 1900 XTX. I didn't have to be an IQ freak, as you implied, to see the gigantic difference. I changed various settings to get the same visual quality, and that brought me big drops in performance.
My conclusion? NVidia was cheating. Yes, cheating. And those same sites you mentioned allowed it. The cards were not rendering the same image, so the FPS graphs shown by those sites are pointless. It's apples to oranges. I think it's valid for a site to disable the optimizations of both cards to show the performance on equal ground. The 7900 loses more performance because its default is a mess, and without the optimizations it has to render the proper image, as intended by the game developer; the 1900 loses a lot less because the difference between its default and the "optimization-free" output is negligible.
ATI could have fooled its customers by setting the default rendering quality of the 1900 XTX to match the 7900 GTX's, getting a performance advantage and more sales.
I sold my 7900 and bought a 1900 XTX. I'm not a fan of either company (in fact, today I have an 8800 GTX, the best card I have ever owned). But graphs shown by websites that compare the performance of a card that renders games properly against a card that cheats and noticeably degrades IQ are pointless.
 
fbomber666:

Unless I'm mistaken, what Jawed wants to prove with the G7x/R580 benchmarks is that the R580 was much more forward-looking, which of course should be visible in current benchmarks.

He then uses benchmarks from a source that disables filtering optimizations (G70) to prove that. Not saying that this isn't correct from an IQ perspective (I didn't own any of these cards myself), but it's certainly not that interesting if you want to prove the forward-looking aspect of the R580 and its shader power advantages compared to the G7x. In fact, the best thing would be to do the opposite: disable AF & MSAA.
 
NVidia is the one playing catch-up, technologically and has got a long way to go

I'm trying to understand where you got this from, but I can't.

Actually it's exactly the opposite, or have you already forgotten NV40 with SM3.0 vs. R420? If you're implying that stuff like the ring-bus is some technological advance, you're obviously wrong. While it's a nice checklist feature, it had and has no use whatsoever in real life, nor does it show any speed gains in real games.

From what I see, ATI are the ones playing catch-up here, since their parts underperform and usually arrive 6 months later than the NV parts.
 
Well, ATI does have the 512-bit bus done, and from what Eric stated, even with a die shrink they shouldn't have much trouble using it, and of course there's the shader power. But it's not like NVidia will have much trouble going to a 512-bit bus, and if they do, they will have more shader power.

I think nV really started to innovate after they were forced into a corner with the NV30 on many fronts, not just features but also process technology, which is now giving them the advantage with the G80. I don't know how fast AMD will be able to counter this; it's pretty obvious it took nV quite some time to make the G80 (design phase), much longer than previous-generation chips.
 
Jawed seems to have an infatuation with overengineered, out-of-spec features as his metric for measuring the better architecture. I find this ironic, because for years NVidia was criticized for putting forward-looking features in their architectures before the market needed them, features which seriously gimped their performance and gave no real benefit in games for years. NVidia made the first consumer card (besides 3dlabs' DCC stuff) with a fixed-function T&L unit. It didn't help end users one iota, as most games were inlining vertex transforms as C preprocessor macros, not making library calls. Then there was the NV30 disaster, designing a chip to support shader features that went beyond PS2.0 in some respects (PS2.0a). NV4x introduced VTF, PS3.0 dynamic branching (DB), and other stuff that games really didn't need, and for which a high-performance version would consume massive die space (R5xx), all the while most games did not need these features and alternatives were available (e.g. Humus' alpha/stencil tricks). Just look at Heavenly Sword: it doesn't really need R580/Xenos-level DB. NVidia sported PCF for years, yet it was underutilized (although its marginal cost probably wasn't high). The NV3x had double-Z and z-scissor, and their practical consequence was a benefit in only one game: Doom.


So Nvidia used to get accused of "frontloading" features prior to market demand, and ended up sabotaging their designs by sacrificing transistor budget and making consumers pay for useless features.

Here's Jawed's assumption:

1) R600 "frontloaded" features like virtual memory, multicontexts, and tessellation, so when the magical D3D11/12 games arrive in 3-4 years, they'll have a head start on the R700/R800, and NVidia's architecture will be behind.

2) The G8x architecture has no evolutionary upgrade path to these features without NVidia going back to the drawing board and doing major work; therefore, future G9x/G1xx architectures won't be ready for D3D11/12.


Here's my opinion:
1) The G8x *is* feature-rich; it just lacks two cherry-picked features that Jawed values highly. On the other hand, it supports every other feature of DX10, plus CSAA, plus better anisotropic filtering, plus some non-exposed TMU/ROP features.

2) and in supporting all these features, the G8x achieves high performance with less chaotic/unpredictable driver performance.

3) all with fewer ALUs and less bandwidth on a bigger process node

4) with lower power and noise

5) and its high-end model still owns the performance crown, while ATI's flagship is "midrange"


If GPUs were cars, it seems you'd be complaining that the G8x's more "fuel efficient" design, with fewer cylinders and a smaller displacement, is a "brute force" design, versus a car with a huge engine, many more cylinders, and super-advanced computer-controlled fuel injectors that, because of complexity, can't keep the engine at top efficiency.

It all comes down to what you view as "brute force". If you think adding more TMUs and ROPs is brute force, then the G8x is brute force. But what if you think adding a huge number of ALUs and a 512-bit memory bus, while being unable to saturate that bus or keep the whole chip balanced and busy, is the brute-force design?
 
I think nV really started to innovate after they were forced into a corner with the nv30 on many fronts, not just features, but also process technology
I think that happened right about the time they got their first NV30 wafers back and determined the yields... If my data on that is correct, it was quite an amusing figure! ;) Zero redundancy + first spin + new process ftw? I think that was for low-k too, but I'm not completely sure.

Anyway, here is one argument I'd like to highlight: even if G80 had no easy path to implementing some features required by future versions of DirectX (which I don't really agree with), why does this even matter when NVIDIA is making ludicrous profits that will allow them to spend more on R&D if they need to?

Furthermore, from what I've seen, there is not a SINGLE DX11-level patent that can be publicly viewed today, at least on the NVIDIA side. I would be very surprised if NVIDIA's DX11 architecture is an evolutionary step from the G8x. Quite the contrary: it is probably safe to presume that it will be a completely new architecture again.

In fact, interestingly, that might also be the case on the AMD side. But then again, even 'completely new architectures' tend to share some things with previous generations, so the real question is how much both will be able to keep from their previous efforts. I'm willing to bet that, indeed, AMD is in a better position to do that. But once again, why does this matter, when NVIDIA is able to just spend more money and manpower on it to compensate?

If your argument is that this is a good investment for the future, then I disagree completely: from a financial perspective, just making more money on the current generation will more than compensate for that on NVIDIA's side.
 
In fact, interestingly, that might also be the case on the AMD side. But then again, even 'completely new architectures' tend to share some things with previous generations, so the real question is how much both will be able to keep from their previous efforts. I'm willing to bet that, indeed, AMD is in a better position to do that.

The question is whether it's an advantage to reuse "old" stuff. Xenos and the supposedly forward-looking R580 didn't help ATI that much with R600.
 