G80 vs R600 Part X: The Blunt & The Rich Feature

Discussion in 'Architecture and Products' started by Jawed, Aug 11, 2007.

  1. ChrisRay

    ChrisRay R.I.P. 1983-
    Veteran

    Joined:
    Nov 25, 2002
    Messages:
    2,234
    Likes Received:
    26
    This is probably true. Nvidia never did a good job of explaining what these settings did, or how they impacted performance. Some of the G7x/NV4x opts were very aggressive (the Anisotropic Mip Filter optimisation being one of them), which were texture stage optimisations. This single optimisation (while beneficial to performance) was also the largest culprit for the image issues we saw in games. The trilinear opts, along with the LoD optimisations they had in place, were not nearly as satanic. Unfortunately very few people actually looked at these opts to see exactly what they were doing and just chose to hit the HQ button. An effective but perhaps often overkill approach to an architecture that had some of its shader performance tied into its texturing abilities.

    Chris
     
  2. Skinner

    Regular

    Joined:
    Sep 13, 2003
    Messages:
    871
    Likes Received:
    9
    Location:
    Zwijndrecht/Rotterdam, Netherlands and Phobos
    G80 still won't do full trilinear aniso (in certain places?); I was shocked to see it in at least COJ (DX9) and FEAR. You have to look closely, but with a keen eye you can still see mipmap boundaries.

    This is with AF set to application preference and all optimisations off in the control panel.

    The R600 does full trilinear AF AFAICS ;) but also introduces some shimmering.
     
  3. ChrisRay

    ChrisRay R.I.P. 1983-
    Veteran

    Joined:
    Nov 25, 2002
    Messages:
    2,234
    Likes Received:
    26
    Did you try unclamping the LoD, Skinner?
     
  4. Skinner

    Regular

    Joined:
    Sep 13, 2003
    Messages:
    871
    Likes Received:
    9
    Location:
    Zwijndrecht/Rotterdam, Netherlands and Phobos
    Yes, I always allow negative LODBias.
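    (For context, a rough illustrative sketch of what a negative LOD bias means on the application side - this is a hypothetical Direct3D 9 snippet, not taken from any particular game; the driver's "Clamp/Allow negative LOD bias" switch decides whether such a request is honoured or clamped to zero.)

    #include <d3d9.h>

    // Illustrative only: an app sharpens textures by biasing mip selection negative.
    void ApplyLodBias(IDirect3DDevice9* device, float bias /* e.g. -0.5f */)
    {
        // D3DSAMP_MIPMAPLODBIAS expects the float's bit pattern passed as a DWORD.
        device->SetSamplerState(0, D3DSAMP_MIPMAPLODBIAS, *reinterpret_cast<DWORD*>(&bias));
    }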
     
  5. Novum

    Regular

    Joined:
    Jun 28, 2006
    Messages:
    335
    Likes Received:
    8
    Location:
    Germany
    I think the problem here is that the application only requests bilinear anisotropy instead of trilinear aniso. G80 is certainly fully capable of doing almost perfect angle-independent tri-AF.

    Could be that ATi is always applying trilinear filtering when AF is requested, because that's a common mistake made by developers.
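    To illustrate the distinction (a hedged Direct3D 9 sketch, not taken from COJ or FEAR): whether AF comes out "bilinear" or "trilinear" depends on the mip filter the application requests alongside the anisotropic min filter.

    #include <d3d9.h>

    void RequestAnisotropicFiltering(IDirect3DDevice9* device, DWORD maxAniso, bool trilinear)
    {
        device->SetSamplerState(0, D3DSAMP_MINFILTER, D3DTEXF_ANISOTROPIC);
        device->SetSamplerState(0, D3DSAMP_MAGFILTER, D3DTEXF_LINEAR);
        // LINEAR mip filter = trilinear AF; POINT = "bilinear AF" with visible mip boundaries.
        device->SetSamplerState(0, D3DSAMP_MIPFILTER, trilinear ? D3DTEXF_LINEAR : D3DTEXF_POINT);
        device->SetSamplerState(0, D3DSAMP_MAXANISOTROPY, maxAniso);
    }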
     
  6. ERK

    ERK
    Regular

    Joined:
    Mar 31, 2004
    Messages:
    287
    Likes Received:
    10
    Location:
    SoCal
    I sense some major miscommunication going on here, and it's probably just me :grin:, but when I first read Jawed stating that G80 was not designed 'cleverly' (to paraphrase), what I took from that is that G80 followed the KISS methodology, whereas R600's engineers tried to be clever (perhaps too clever by half) and put in a bunch of stuff that is not currently useful, and only perhaps useful in the future. Not only that, but this clever complexity demands a lot more work in drivers to fully optimize.

    I didn't sense that he was necessarily being a fanboy about R600, nor saying which design was better overall in the end - just describing the design philosophies.

    Could be wrong...

    To me it seems obvious that 'simple' can often be both very elegant and very efficient.
    :???:
    ERK
     
  7. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    Yep.

    I'm still trying to decide whether it's worth responding to the last 47-odd hours' worth of postings. I could just carry on gaming to make up for the lost time while 3D was off-limits this past week.

    Jawed
     
  8. rwolf

    rwolf Rock Star
    Regular

    Joined:
    Oct 25, 2002
    Messages:
    968
    Likes Received:
    54
    Location:
    Canada
    I think that R600 has general manufacturing problems, like errors in the AA logic or something. I think there is more to it than we know.

    Tell that to people with the GTS slowdown problem. :)
     
  9. DemoCoder

    Veteran

    Joined:
    Feb 9, 2002
    Messages:
    4,733
    Likes Received:
    81
    Location:
    California
    Well, the problem is, in English, the word "clever" has positive connotations, whereas "Blunt" and "KISS" do not. Moreover, KISS designs are inherently "clever". One of the mainstays of cleverness is "doing more with less".

    For example, software engineers view "clever" algorithms and hacks as those which are compact, elegant, and sometimes ingenious but obfuscated. (Remember the Quake3 SQRT hack?) A design which gets the same work done using a much simpler mechanism is often viewed as "clever" in engineering fields.
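    (For reference, the hack being alluded to - the widely circulated fast inverse square root from the released Quake III source, reproduced here as an example of "compact but obfuscated":)

    float Q_rsqrt(float number)
    {
        long i;                                 // assumes a 32-bit long, as on the platforms of the day
        float x2, y;
        const float threehalfs = 1.5F;

        x2 = number * 0.5F;
        y  = number;
        i  = *(long *)&y;                       // reinterpret the float's bits as an integer
        i  = 0x5f3759df - (i >> 1);             // magic-constant initial guess
        y  = *(float *)&i;
        y  = y * (threehalfs - (x2 * y * y));   // one Newton-Raphson refinement step
        return y;
    }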

    For example, take bipedal robots. Many researchers build phenomenally complex actuators and control mechanisms to allow robots to walk with a human gait; on the other hand, simple passive-dynamic walkers, even unpowered ones, have shown this ability with *no* control logic at all, and the powered passive-dynamic walkers have extraordinarily lower power requirements for a given distance covered. Undoubtedly, Jawed would view the Asimo robot as inherently superior to a boring old "toy" potential/kinetic-energy passive-dynamic robot, because the Asimo has multimillion-dollar actuators and tons of CPU computation. However, the reality is, it's *overkill* for the job and a much simpler design exists.

    What I object to is calling the G80 "brute force". There are many aspects of the design that are clever. The fact that the architecture is simpler is irrelevant. Complexity != cleverness. I defer to Occam's Razor, KISS, and elegance in design as my valuation of what is clever. In software, I view more expressive programming languages, which permit vastly complex programs or solutions to be specified in compact code, as clever. So, for example, ML/Haskell's "quicksort in 2 lines" appeals to my sense of cleverness. It may not even be practical, but it's clever.
     
  10. ERK

    ERK
    Regular

    Joined:
    Mar 31, 2004
    Messages:
    287
    Likes Received:
    10
    Location:
    SoCal
    I basically agree with you, DC. It's just a shame that Jawed has to take such flak from a lot of people who read something into what he said that he didn't really intend... mostly some semantic argument over 'clever,' etc.

    I agree simple designs can be clever, but the Quake code, despite its brevity (not to mention its genius), is definitely not simple.

    ERK
     
  11. lik

    lik
    Newcomer

    Joined:
    Jun 30, 2006
    Messages:
    13
    Likes Received:
    1
    To accomplish the same amount of work, a simpler design is more clever. Complexity equals stupidity IF it does not achieve more.

     
  12. DemoCoder

    Veteran

    Joined:
    Feb 9, 2002
    Messages:
    4,733
    Likes Received:
    81
    Location:
    California
    I didn't say the Quake code was simple, I said it was compact, elegant, and ingenious. The passive-dynamic walker is an example where simplicity combines with compactness, elegance, and ingeniousness.

    There are things which are simple and compact, and things where the simplest solution is the more verbose one. Often, when solutions of the former kind appear, we are gobsmacked and say "of course! it's so damn obvious!" and marvel at how we overlooked something so ingenious yet so simple.

    I think the G80 is an example of the former, BTW. One of those "of course, it's obvious now, scalar is the way to go" moments. In fact, so simple that when the G8x was announced, I was still erroneously assuming all kinds of complexity with respect to the register file, swizzling, etc. that no longer existed. Then when I mentally went through the translation of vector code to scalar, I was like "aha! of course. It's self-evident why this has major benefits!"
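    A toy illustration of that mental exercise (plain C++, not real shader ISA): on a 4-wide vector unit a vec3 operation leaves a lane idle, while a scalar machine simply issues independent scalar ops and can fill every slot with work from other pixels.

    struct vec4 { float x, y, z, w; };

    // "Vector" style: one 4-wide multiply, but the .w lane is wasted when the
    // shader only needs a vec3 result.
    vec4 mul_vec4(vec4 a, vec4 b)
    {
        return { a.x * b.x, a.y * b.y, a.z * b.z, a.w * b.w };
    }

    // "Scalar" style: the same math as independent scalar operations, which a
    // scalar scheduler is free to interleave with ops from other threads.
    void mul_scalar(const float* a, const float* b, float* out, int n)
    {
        for (int i = 0; i < n; ++i)
            out[i] = a[i] * b[i];
    }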

    Of course, Mint may disagree, because you have to be able to address 4x the registers anyway, but conceptually, it makes writing the compiler so much more pleasurable.
     
  13. Bjorn

    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    1,775
    Likes Received:
    1
    Location:
    Luleå, Sweden
    I disagree. I think that Jawed really meant what he wrote :)
     
  14. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,830
    Likes Received:
    2,120
    Location:
    Germany
    But it's almost inherent - so why artificially cripple it, even if it cannot be utilized most of the time? You have six quad-ROP partitions netting fully featured, 4x AA'ed pixels. Instead of every one of those 96 samples/clk having c+z, you now also have the option of making that z+z.
    I can't imagine that this costs an arm and a leg in terms of transistor count.

    edit: Sorry, apparently I've been too late with this.

    Redundancy necessarily implies having die space sitting idle until activated. Is it really a more elegant approach to have die space on a fully functional chip sitting idle than selling this GPU at its full potential?

    More fine-grained - yes. Elegant - questionable, I'd say.

    edit: Sorry, apparently I've been too late with this also.
     
    #114 CarstenS, Aug 15, 2007
    Last edited by a moderator: Aug 15, 2007
  15. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,830
    Likes Received:
    2,120
    Location:
    Germany
    That's one of the problems: there are none and probably never will be in R600's or G80's lifespan. Not even a scene - damn, not even a single frame - is determinedly limited by one single resource. Balance between those resources is what counts.

    Or are you trying to imply that R600 was not designed for games but for spitting out vast numbers of (non-serialized) GFLOPs in purely mathematical environments?

    Quite a large bit, I'd say. Because it'd really be an astonishing feat if AMD were not able to clock all of R600 20% higher when going to 55nm with that die.

    Then you'd have excess silicon as large as one of your two R600/256-bit dies being produced for nothing.

    Even if clock rates are left out of the equation - a 20 percent performance gain for a 100 percent die-size increase does not seem fitting for a company concerned with making its shareholders happy with large profit margins.

    And down from R580 (48) to R600 (64) - right?

    If there's math to be done, that is. They can hide their texturing latency only when there's something to hide it behind.
     
    #115 CarstenS, Aug 15, 2007
    Last edited by a moderator: Aug 15, 2007
  16. Novum

    Regular

    Joined:
    Jun 28, 2006
    Messages:
    335
    Likes Received:
    8
    Location:
    Germany
    I disagree that G80 is "simpler" than R600. It is also a fully unified architecture that is heavily threaded (look at CUDA) to compensate for latencies, and its individual parts are all designed very cleverly.

    Just because ATI didn't have enough transistors left to put enough texture units in R600 - because they wasted much more on the ALUs (remember, nVIDIA saved a lot here because of the double-pumped design) and cache - doesn't make that design "more clever". A brute-force external 512-bit bus that isn't even fully utilized isn't very clever either. Here I also have to say that I like nVIDIA's 384-bit intermediate step much more.

    The only thing I see that could be more clever in R600 is the handling of geometry shaders with huge data expansion, because it streams out to VRAM. But I don't think that will make a difference in practice.

    I also have no indication yet that very ALU-heavy shader workloads will do much better on R600 than on G80, apart from very synthetic shaders especially designed to favor one architecture. Which makes calling G80 "blunt" even more ridiculous.
     
  17. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,193
    Likes Received:
    3,134
    Location:
    Well within 3d
    I'm not sure we know that. I don't know of anyone who has come out with data on the relative amounts of transistors devoted to the ALUs for each design.

    We don't know the amount of die space taken up by different types of units in each design. Relative density might affect things as well, as some kinds of logic such as cache compress more easily than control and ALU logic.

    If only they'd put out clear die shots, we could settle this.
     
  18. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    Actually I agree with you on all this stuff. I think a high z:colour ratio is pretty spiffy.

    The colour rate is what seems over the top, which then leads into the apparent superfluity of Z. Since NVidia has bound the ROPs and MCs tightly, that's the way the cookie crumbles (last time we discussed ROPs, this is the conclusion I came to). My interpretation (as before) is that NVidia did this to reduce the complexity/count of crossbars within G80.

    Thinking of the ratio of pixel-instructions:colour-writes, this ratio is headed skywards (prettier pixels). D3D10's increase to 8 MRTs seems to run counter to that. It'd be interesting to see what sort of colour rate a deferred renderer achieves during G-buffer creation. Clearly a D3D10-specific DR could chew through way more than a DX9-based one.
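    (A hypothetical sketch of such a G-buffer pass, D3D9-style with its four-MRT limit - D3D10 raises that to eight: every pixel shaded in the pass writes one colour value per bound target, which is why colour write rate matters again here. The target names are illustrative, not from any specific renderer.)

    #include <d3d9.h>

    // Illustrative only: a deferred renderer binding its G-buffer targets.
    void BindGBuffer(IDirect3DDevice9* device,
                     IDirect3DSurface9* albedo, IDirect3DSurface9* normals,
                     IDirect3DSurface9* depth,  IDirect3DSurface9* material)
    {
        device->SetRenderTarget(0, albedo);
        device->SetRenderTarget(1, normals);
        device->SetRenderTarget(2, depth);
        device->SetRenderTarget(3, material);
    }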

    Jawed
     
  19. fbomber

    Newcomer

    Joined:
    Jun 9, 2004
    Messages:
    156
    Likes Received:
    17
    That's what I call fooling customers.
    Btw, the problem was overall texture quality. For me, and many others, texture quality is one of the most important things that makes or breaks a game's IQ. Not, as you stated, something that barely improves the gaming experience. Quite the contrary.


    Completely different market segments. Those who buy high-end seek the best performance WITH the best IQ. Those who buy mid to low end, for example the 7600GT, have to compromise IQ for speed in order to get acceptable performance.

    But I agree with you: G73 was a much better chip for the market it targeted. It offered great performance at acceptable IQ, all while being smaller. So, more profits for Nvidia. ATI made wrong decisions regarding the X1600XT that cost the company losses and customers' confidence. I also agree with people here saying that, as a company that has to act like a company - making profits and surviving - Nvidia has ultimately been making better decisions and is delivering products on time, while AMD (ATI) is failing to do so. I think that, if things continue this way, Nvidia will be better prepared for Intel's entrance, be it in 2008 or 2009.
     
  20. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    You need to understand that it's all related. You can't have different architectures at the high and low end, as the investment is too great. Making R580 take only a small performance hit with high IQ means the architecture gains little with medium IQ, so it's actually a disadvantage in many ways. Now, where do you think ATI and NVidia get more total profits from?

    Anyway, I still think 80% of the IQ argument has nothing to do with hardware. If NVidia tuned the drivers to settings that you find acceptable, as opposed to the lesser standards that most review sites use, it would still beat R580 substantially in perf/mm2. It wouldn't be anywhere near as big a drop as we see on computerbase.de.
     