G80 vs R600 Part X: The Blunt & The Rich Feature

In terms of evaluating the hardware, there's another flaw in that methodology. NVidia probably doesn't have to disable all optimizations to improve IQ substantially. They give users the option, but since very few customers use it, they don't bother finding out which settings are most important.

This is probably true. Nvidia never did a good job of explaining what these settings did, or how they impacted performance. Some of the G7x/NV4x opts were very aggressive, the Anisotropic Mip Filter optimisation (a texture-stage optimisation) being one of them. That single optimisation, while beneficial to performance, was also the largest culprit for the image issues we saw in games. The trilinear opts, along with the LoD optimisations they had in place, were not nearly as satanic. Unfortunately very few people actually looked at these opts to see exactly what they were doing and just chose to hit the HQ button. An effective but perhaps often overkill approach to an architecture that had some of its shader performance tied into its texturing abilities.

Chris
 
G80 still won't do full tril. aniso (in certain places?). I was shocked to see it in at least COJ (DX9) and FEAR. You have to look closely, but with a keen eye you can still see mipmap boundaries.

This is with AF set to application preference and all optimisations off in the control panel.

The R600 does full tril. AF AFAICS ;) but it also introduces some shimmering.
 
I think the problem here is that the application only requests bilinear anisotropy instead of trilinear aniso. G80 is certainly fully capable of doing almost perfect angle-independent tri-AF.

Could be that ATi always applies trilinear filtering when AF is requested, because requesting only bilinear aniso is a common mistake made by developers.
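To make that concrete, here's a sketch of the D3D9 sampler setup involved (the sampler states are the real D3D9 enums; the helper function and its use here are a hypothetical illustration, not taken from any of the games mentioned). If the app leaves the mip filter at POINT while requesting anisotropy, no amount of hardware capability will give it trilinear aniso:

#include <d3d9.h>

// 'device' is an already-created IDirect3DDevice9*. The MIPFILTER state is
// what decides whether anisotropic filtering blends between mip levels
// (trilinear aniso) or snaps to the nearest level (bilinear aniso, which is
// where the visible mipmap boundaries come from).
void SetAnisoFiltering(IDirect3DDevice9* device, bool trilinear)
{
    device->SetSamplerState(0, D3DSAMP_MINFILTER, D3DTEXF_ANISOTROPIC);
    device->SetSamplerState(0, D3DSAMP_MAGFILTER, D3DTEXF_LINEAR);
    device->SetSamplerState(0, D3DSAMP_MAXANISOTROPY, 16);
    device->SetSamplerState(0, D3DSAMP_MIPFILTER,
                            trilinear ? D3DTEXF_LINEAR    // trilinear aniso
                                      : D3DTEXF_POINT);   // bilinear aniso
}

A driver can of course override what the app asks for from the control panel, which is presumably what the "always apply trilinear" behaviour above would amount to.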
 
I sense some major miscommunication going on here, and it's probably just me :D , but when I first read Jawed stating that G80 was not designed 'cleverly' (to paraphrase) what I took from that is that G80 followed the KISS methodology, whereas R600 engineers tried to be clever (perhaps too clever by half) and put in a bunch of stuff that was currently not useful, and only perhaps useful in the future. Not only that, but this clever complexity demands a lot more work in drivers to fully optimize.

I didn't sense that he was necessarily being a fanboy about R600, nor saying which design was better overall in the end; he was just contrasting the design philosophies.

Could be wrong...

To me it seems obvious that 'simple' can often be both very elegant and very efficient.
:???:
ERK
 
I sense some major miscommunication going on here, and it's probably just me :D , but when I first read Jawed stating that G80 was not designed 'cleverly' (to paraphrase) what I took from that is that G80 followed the KISS methodology, whereas R600 engineers tried to be clever (perhaps too clever by half) and put in a bunch of stuff that was currently not useful, and only perhaps useful in the future. Not only that, but this clever complexity demands a lot more work in drivers to fully optimize.
Yep.

I'm still trying to decide whether it's worth responding to the last 47-odd hours' worth of postings. I could just carry on gaming to make up for lost time from when 3D was off-limits this past week.

Jawed
 
Jawed must really be kidding.

1) G80 has better performance than R600.
2) G80 uses a lot less power than R600.
3) G80 does all this with less memory bandwidth.
4) G80 supports all DirectX10 features just like R600.
5) G80 delivers more consistent performance across a wide range of games, while R600 is very erratic about which games it performs well in.
6) G80 was launched many months earlier than R600.

With all these facts in place, sorry, I can't see how you could promote R600 that much. And believe me, I had three ATi cards before I switched to G80 a few weeks back.

I think that R600 has general manufacturing problems, like errors in the AA logic or something. I think there is more to it than we know.

5) G80 delivers more consistent performance across a wide range of games, while R600 is very erratic about which games it performs well in.

Tell that to people with the GTS slowdown problem. :)
 
Well, the problem is, in English, the word "clever" has positive connotations, whereas "Blunt" and "KISS" do not. Moreover, KISS designs are inherently "clever". One of the mainstays of cleverness is "doing more with less".

For example, software engineers view "clever" algorithms and hacks as those which are compact, elegant, and sometimes ingenious but obfuscated. (Remember the Quake3 SQRT hack?) A design which gets the same work done using a much simpler mechanism is often viewed as "clever" in engineering fields.
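For reference, the hack in question looks roughly like this (shown with memcpy instead of the original pointer casts so it stays well-defined; the function name follows the id Software original):

#include <cstdint>
#include <cstring>

// Approximate 1/sqrt(x): a bit-level "magic constant" first guess,
// followed by a single Newton-Raphson refinement step.
float Q_rsqrt(float number)
{
    float y = number;
    std::uint32_t i;
    std::memcpy(&i, &y, sizeof(i));            // reinterpret the float's bits
    i = 0x5f3759df - (i >> 1);                 // the famous magic constant
    std::memcpy(&y, &i, sizeof(y));
    y = y * (1.5f - (number * 0.5f) * y * y);  // one Newton-Raphson iteration
    return y;
}

Compact, fast, and utterly opaque until you know the trick - which is exactly the kind of "clever" being described.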

For example, take bipedal robots. Many researchers build phenomenally complex actuators and control mechanisms to allow robots to walk with a human gait; on the other hand, simple passive-dynamic walkers, even unpowered ones, have shown this ability with *no* control logic at all, and the powered passive-dynamic walkers have extraordinarily lower power requirements for a given distance covered. Undoubtedly, Jawed would view the Asimo robot as inherently superior to a boring old "toy" potential/kinetic-energy passive-dynamic robot, because the Asimo has multimillion-dollar actuators and tons of CPU computation. However, the reality is that it's *overkill* for the job and a much simpler design exists.

What I object to is calling the G80 "brute force". There are many aspects of the design that are clever. The fact that the architecture is simpler is irrelevant. Complexity != cleverness. I defer to Occam's Razor, KISS, and elegance in design as my valuation of what is clever. In software, I view more expressive programming languages, which permit vastly complex programs or solutions to be specified in compact code, as clever. So for example, ML/Haskell's "quicksort in 2 lines" appeals to my sense of cleverness. It may not even be practical, but it's clever.
 
I basically agree with you, DC. It's just a shame that Jawed has to take such flak from a lot of people who read something into what he said that he didn't really intend... mostly some semantic argument over 'clever,' etc.

I agree simple designs can be clever, but the Quake code is definitely not simple, despite its brevity, not to mention genius.

ERK
 
To accomplish the same amount of work, the simpler design is the more clever one. Complexity equals stupidity IF it does not achieve more.

I basically agree with you, DC. It's just a shame that Jawed

I agree simple designs can be clever, but the Quake code is definitely not simple, despite its brevity, not to mention genius.

ERK
 
I didn't say the Quake code was simple, I said it was compact, elegant, and ingenious. The passive-dynamic walker is an example where simplicity combines with compactness, elegance, and ingeniousness.

There are things which are simple and compact, and things where the simplest solution is the more verbose one. Often, when solutions of the former kind appear, we are gobsmacked and say "of course! it's so damn obvious!" and marvel at how we overlooked something so ingenious yet so simple.

I think the G80 is an example of the former, BTW. One of those "of course, it's obvious now, scalar is the way to go." In fact, it's so simple that when the G8x was announced, I was still erroneously assuming all kinds of complexity with respect to the register file, swizzling, etc. that no longer existed. Then when I mentally went through the translation of vector code to scalar, I was like "aha! of course. It's self-evident why this has major benefits!"

Of course, Mint may disagree, because you have to be able to address 4x the registers anyway, but conceptually, it makes writing the compiler so much more pleasurable.
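A toy sketch of that translation (plain C++ standing in for shader code, not actual G80 ISA - the names and structure are purely illustrative): a float3 dot product occupies a vec4 ALU and wastes a lane, while on a scalar machine it's just a short chain of multiplies and multiply-adds, and the hardware fills idle issue slots with scalar work from other threads rather than leaving lanes dark.

struct float3 { float x, y, z; };

// Models the vec4 view of dot(float3, float3): four lanes are computed
// even though only three carry useful work - the .w lane is wasted.
float dot_vec4_style(const float3& a, const float3& b)
{
    float lane[4] = { a.x * b.x, a.y * b.y, a.z * b.z, 0.0f };
    return (lane[0] + lane[1]) + (lane[2] + lane[3]);
}

// The scalar view: the same result as a multiply followed by two
// multiply-adds, with no packing or swizzling constraints for the compiler.
float dot_scalar_style(const float3& a, const float3& b)
{
    float r = a.x * b.x;
    r = a.y * b.y + r;   // MAD
    r = a.z * b.z + r;   // MAD
    return r;
}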
 
It's just a shame that Jawed has to take such flak from a lot of people who read something into what he said that he didn't really intend... mostly some semantic argument over 'clever,' etc.

I disagree. I think that Jawed really meant what he wrote :)
 
I like to defend G80's ability to do 4xAA per loop - but the total Z capacity seems wildly out of proportion with either available bandwidth or triangle rate for things like z-only passes.

But it's almost inherent - so why artificially cripple it, even if it cannot be utilized most of the time? You have six quad-ROP partitions netting fully featured, 4x AA'd pixels. Instead of every one of those 96 samples/clk carrying colour+Z, you now also have the option of making that Z+Z.
Can't imagine that this costs an arm and a leg in terms of transistor count.

edit: Sorry, apparently I was too late with this.

Fine-grained as in the SKU is still fully capable after the redundancy has kicked in, which is what R600 is doing, apparently. Not resulting in a second SKU to mop up faulty dies with <100% capability.

Jawed
Redundancy necessarily implies having die space sitting idle until activated. Is it really a more elegant approach to have die space on a fully functional chip sitting idle than to sell that GPU at its full potential?

More fine-grained - yes. Elegant - questionable, I'd say.

edit: Sorry, apparently I was too late with this as well.
 
When you have ALU- or bandwidth-limited games that back up this assertion then fine...
That's one of the problems: there are none, and there probably never will be in R600's or G80's lifespan. Not even a scene - damn, not even a single frame - is determinedly limited by one single resource. Balance between those resources is what counts.

Or are you trying to imply that R600 was not designed for games but for spitting out vast numbers of (non-serialized) GFLOPs in purely mathematical environments?

Let's guess at R700: say it is 2xR600 configuration on 55nm, each die with 256-bit memory bus (70GB/s?), with an additional 140GB/s connection between them and performs 120%+ faster than R600 on "CF compatible" games (clocks should increase from where they are now). Which part of the architecture and technology of R600 are you expecting to be redundant? I can't think of anything.
Quite a large bit, I'd say. It'd really be an astonishing feat if AMD were not able to clock all of R600 20% higher when going to 55nm with that die.

Then you'd have excess silicon as large as one of your two R600/256-bit dies being produced for nothing.

Even if clock rates are left out of the equation, a 20 percent performance gain for a 100 percent die-size increase does not seem fit for a company concerned with making its shareholders happy with large profit margins.

The DB level was somewhat a by-product of the architecture that followed down from R300. DB granularity actually went down from R520 to R580.
And down from R580 (48) to R600 (64) - right?

Actually, they couldn't. ATI designed R5xx to have low performance impact with high quality. R580 can usually do math while the TMUs spend more cycles to improve quality. G7x can't do that. Reducing the quality wouldn't make ATI's cards run much faster at all.

If there's math to be done, that is. They can only hide their texturing latency when there's something to hide it behind.
 
I disagree that G80 is "simpler" than R600. It is also a fully unified architecture that is heavily threaded (look at CUDA) to compensate for latencies, and its individual parts are all designed very cleverly.

Just because ATI didn't have enough transistors left to put enough texture units in R600 - because they wasted much more on the ALUs (remember, nVIDIA saved a lot here because of the double-pumped design) and cache - doesn't make that design "more clever". A brute-force external 512-bit bus that isn't even fully utilized isn't very clever either. Here I also have to say that I like nVIDIA's 384-bit intermediate-step solution much more.

The only thing that I see which could be more clever with R600 is the handling of geometry shaders with huge data expansion because it streams out to VRAM. But I don't think that will make a difference in practice.

I also have no indication yet that very ALU-heavy shader workloads will do much better on R600 than on G80, besides very synthetic shaders specifically designed to favor one architecture. Which makes calling G80 "blunt" even more ridiculous.
 
Just because ATI didn't have enough transistors left to put enough texture units in R600 - because they wasted much more on the ALUs (remember, nVIDIA saved a lot here because of the double-pumped design) and cache - doesn't make that design "more clever".
I'm not sure we know that. I don't know of anyone who has come out with data on the relative amounts of transistors devoted to the ALUs for each design.

We don't know the amount of die space taken up by different types of units in each design. Relative density might affect things as well, as some kinds of logic such as cache compress more easily than control and ALU logic.

If only they'd put out clear die shots, we could settle this.
 
You seem to have a very poor understanding of "marginal cost".

If you have the hardware for 4 AA samples per loop, then you might as well use that hardware at full speed in non-AA scenarios. G80 outputs 55 Gpix/s for Z-only without AA and 13.8 Gpix/s for Z-only with AA. Is the latter really that outrageous? Several generations of GPUs have been doing near those levels without AA, and now G80 can do that with 4xAA.

It doesn't matter if games don't use the stupendous non-AA z-only fillrate because if you already decided to do 4xAA per loop, then the marginal cost is almost zero.

If you want to discuss the "total Z capacity" of 8 samples per clock, that's there for early-Z rejection. You definitely benefit from it being faster than your fillrate, so 8 makes sense.
Actually I agree with you on all this stuff. I think a high z:colour ratio is pretty spiffy.

The colour rate is what seems over the top, which then leads into the apparent superfluity of Z. Since NVidia has bound the ROPs and MCs tightly, that's the way the cookie crumbles (last time we discussed ROPs, this is the conclusion I came to). My interpretation (as before) is that NVidia did this to reduce the complexity/count of crossbars within G80.

Thinking of the ratio of pixel-instructions:colour-writes, this ratio is headed skywards (prettier pixels). D3D10's increase to 8 MRTs seems to run counter to that. It'd be interesting to see what sort of colour rate a deferred renderer achieves during G-buffer creation. Clearly a D3D10-specific DR could chew through way more than a DX9-based one.

Jawed
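For reference, the fill figures quoted above line up with a quick back-of-the-envelope check, assuming the 8800 GTX's 24 ROPs running at the 575MHz core clock:

24 ROPs x 575MHz = 13.8 Gpix/s with 4xAA (one fully AA'd pixel per ROP per clock)
13.8 Gpix/s x 4 samples/pixel = 55.2 Gsamples/s, i.e. the ~55 Gpix/s Z-only rate without AA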
 
You bought the 7900GTX, didn't you? That's all NVidia wants. Also, would you have known that it was a problem without doing a side by side comparison? Did you play games at those settings you mentioned before you saw this comparison? (BTW, what was the IQ problem? Texture shimmering?)

Nearly everyone else would not sell their 7900GTX because they wouldn't have this comparison. Few complained about NV4x quality in comparison to R4xx, so NVidia figured everything was just fine. This is not like the drastic mipmap fudging and blurring we saw with NV31 and NV34. Looking at sales and margins, NVidia judged it correctly.

That's what I call fooling customers.
Btw, the problem was overall texture quality. For me, and many others, texture quality is one of the most important things that makes or breaks a game's IQ. It is not, as you stated, something that barely improves the gaming experience. Quite the contrary.


Remember, also, that I was talking more about the lower end parts. R580 has 80% more die space than the 7900GTX. Compare the performance of the 7600GT and the X1600XT. It's a complete asswhooping by a smaller chip. Would the average gamer be willing to sacrifice around half his framerate for the better IQ of the X1600XT?

Completely different market segments. Those who buy high-end seek the best performance WITH the best IQ. Those who buy mid to low end, for example the 7600GT, have to compromise IQ for speed in order to get acceptable performance.

But I agree with you: G73 was a much better chip for the market it targeted. It offered great performance at acceptable IQ, all while being smaller. So, more profits for Nvidia. ATI made wrong decisions regarding the X1600XT that cost the company losses and customers' confidence. I also agree with the people here who say that, as a company that has to act like a company - making profits and surviving - Nvidia has ultimately been making better decisions and delivering products on time, while AMD (ATI) is failing to do so. I think that, if things continue this way, Nvidia will be better prepared for Intel's entrance, be it in 2008 or 2009.
 
Completely different market segments. Those who buy high-end seek the best performance WITH the best IQ. Those who buy mid to low end, for example the 7600GT, have to compromise IQ for speed in order to get acceptable performance.

But I agree with you: G73 was a much better chip for the market it targeted. It offered great performance at acceptable IQ, all while being smaller. So, more profits for Nvidia. ATI made wrong decisions regarding the X1600XT that cost the company losses and customers' confidence.
You need to understand that it's all related. You can't have different architectures at the high and low end, as the investment is too much. Making R580 have a small performance hit with high IQ means the architecture gains little with medium IQ, so it's actually a disadvantage in many ways. Now, where do you think ATI and NVidia get more total profits from?

Anyway, I still think 80% of the IQ argument has nothing to do with hardware. If NVidia tuned the drivers to settings that you find acceptable, as opposed to the lesser standards that most review sites use, it would still beat R580 substantially in perf/mm2. It wouldn't be anywhere near as big a drop as we see on computerbase.de.
 