The New and Improved "G80 Rumours Thread" *DailyTech specs at #802*

Discussion in 'Pre-release GPU Speculation' started by Geo, Sep 11, 2006.

Thread Status:
Not open for further replies.
  1. aeryon

    Newcomer

    Joined:
    Oct 5, 2006
    Messages:
    85
    Likes Received:
    3
    Location:
    France / China
    here we go...

    8800GTX SLI stock on core2 at 3600MHz gives 15k with shipped drivers (old)


    single 8800GTX o/c 660/2200 and 96.92 drivers breaks 14k 3dM06 barrier on o/c kentsfield:
    [screenshot]



    and finally 8800GTX SLI stock on kentsfield o/c at 3700MHz and NF590 platform gives 17k with up-to-date 96.97 drivers:

    [screenshot]


    we are waiting for 8800GTX SLI o/c on kentsfield o/c to maybe break the 20k barrier!

    source : http://www.xtremesystems.org/forums/showthread.php?t=121980
     
    #2641 aeryon, Nov 7, 2006
    Last edited by a moderator: Nov 7, 2006
  2. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,714
    Likes Received:
    2,135
    Location:
    London
    Maybe if its texturing architecture were efficient.

    Until your test is PS limited, there's no point in making a comparison between two GPUs — e.g. one doesn't compare high-end GPUs at 640x480. Far Cry's 1600x1200 no-AA/no-AF is clearly a useless baseline.

    I hope you're not now referring to the Far Cry results specifically. Clearly at anything less than 1600x1200 4xAA/16xAF "other stuff" is happening.

    But there's an uncanny bandwidth correspondence between both GTS and GTX performance, compared against XTX, at the max resolution in these game tests. The only exceptions are where stencil shadowing is playing a big part.

    Clearly there's no point talking about no-AA/no-AF results, because bandwidth plays a much lighter role regardless of the GPU under test.

    Jawed
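[The bandwidth correspondence Jawed describes can be sketched numerically. A minimal check, using the commonly cited launch specs for bus width and effective memory clock (board-vendor numbers may differ slightly):]

```python
# Peak memory bandwidth from bus width and effective memory clock.
# Specs below are the commonly cited launch figures (assumptions, not
# vendor-verified): clocks are in effective MT/s (DDR-doubled).
def bandwidth_gb_s(bus_bits, mem_mt_s):
    """GB/s = bytes per transfer * transfers per second."""
    return bus_bits / 8 * mem_mt_s * 1e6 / 1e9

cards = {
    "8800GTX": (384, 1800),   # 900 MHz GDDR3
    "8800GTS": (320, 1600),   # 800 MHz GDDR3
    "X1950XTX": (256, 2000),  # 1000 MHz GDDR4
}
for name, (bus, clk) in cards.items():
    print(f"{name}: {bandwidth_gb_s(bus, clk):.1f} GB/s")
```

[With these figures the GTS and XTX land on identical 64 GB/s, which is consistent with the bandwidth-limited correspondence claimed above.]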
     
  3. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    Well, if ATI can add 32 full shader processors in 60M transistors, I can't imagine it would cost that much for single-purpose texture address calculators, even if they could only handle plain 2D textures.

    I think I'm missing something, though. For single-cycle trilinear or volume fetch or 64-bit textures, you don't need any extra latency hiding because you're only bringing one vec4 of data into the stream processor. Truly independent texture requests would need more pixels to be in flight to maintain efficiency.

    So that's probably why they did this. At 1.35 GHz, the FIFOs will already have to be pretty huge to hide the latency of a texture access. Makes sense now. I'm still amazed that G80 does so well in the texture-heavy ShaderMark tests compared to G71 without having twice the address units.
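[The FIFO-sizing argument is simple arithmetic. A back-of-envelope sketch — the ~400-cycle latency figure is an assumption for illustration only; G80's real memory latency was never published:]

```python
# To keep a texture unit busy, enough independent pixels must be in
# flight to cover the full memory latency of one fetch.
# Both inputs here are illustrative assumptions.
def pixels_in_flight(mem_latency_cycles, fetches_per_cycle):
    return mem_latency_cycles * fetches_per_cycle

# e.g. ~400 cycles of latency at the 1.35 GHz shader clock, one fetch
# issued per clock per unit:
print(pixels_in_flight(400, 1))  # 400 pixels' worth of FIFO entries
```

[The point being: the higher the clock, the more cycles a fixed-nanosecond memory latency spans, so the deeper the FIFOs must be.]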
     
  4. psurge

    Regular

    Joined:
    Feb 6, 2002
    Messages:
    955
    Likes Received:
    52
    Location:
    LA, California
    I hadn't thought of that but it's an excellent point and should have at least as much impact as the texturing HW cost. I'm still very curious as to what types of textures get filtered at what speed (if they do 2 fp16 samples per clock I'll be amazed). Someone leak the b3d review already :grin:
     
  5. RoOoBo

    Regular

    Joined:
    Jun 12, 2002
    Messages:
    308
    Likes Received:
    31
    In UT2004, Doom3 and Quake4 most textures have AF set to maximum. I think in Doom3 one of the texture units is set to bilinear or less, likely because it's used as a lookup table to normalize values. Of course, that is if the OpenGL driver actually honours what the programmer has (for good or bad) decided for texture sampling.

    The AF algorithm used to get those numbers is the 8-petal (4-axis) algorithm, so it can't be detecting more anisotropy than ATI's HQ.
     
  6. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,714
    Likes Received:
    2,135
    Location:
    London
  7. Geeforcer

    Geeforcer Harmlessly Evil
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,320
    Likes Received:
    525
    ...

     
  8. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,714
    Likes Received:
    2,135
    Location:
    London
    The way I see it, while the 64 TMUs in G80 can do Int8 filtering (with the address shared across pairs of TMUs), each is designed to run a single Int16 or fp16 fetch-and-filter per clock. Now two "bytes" are being fetched and filtered per clock, depending upon one address unit — as opposed to the one byte per clock that we see in NV40 etc. When NV40 wants to fetch and filter 2 bytes (per channel, e.g. Int16 or fp16) it takes twice as long. With 4 bytes, it takes four times as long.

    It seems similar to the duality that NV40 etc. has, with support for either fp16 or fp32 math in its PS ALUs.

    Jawed
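[Jawed's cost model above reduces to "clocks scale with format width over datapath width". A minimal sketch, where the per-channel datapath widths (1 byte for NV40-style, 2 bytes for G80-style) follow the post and are assumptions, not confirmed specs:]

```python
# Clocks to fetch-and-filter one texel if the filter datapath is
# unit_width_bytes wide per channel: wider formats loop over more clocks.
def clocks_per_filtered_texel(bytes_per_channel, unit_width_bytes):
    return max(1, bytes_per_channel // unit_width_bytes)

# NV40-style unit (1 byte/channel/clock):
print(clocks_per_filtered_texel(1, 1))  # Int8 -> 1 clock
print(clocks_per_filtered_texel(2, 1))  # fp16 -> 2 clocks
print(clocks_per_filtered_texel(4, 1))  # fp32 -> 4 clocks
# G80-style unit (2 bytes/channel/clock), per the post:
print(clocks_per_filtered_texel(2, 2))  # fp16 -> 1 clock
```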
     
  9. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,058
    Likes Received:
    3,116
    Location:
    New York
    The end user scores in FEAR for the GTS at 4x16x also line up eerily close to the XTX. So from early appearances R580 is making just as effective use of its bandwidth as G80.
     
  10. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    It's not useless because at 1024x768 the ceilings are notably higher. It's not ideal, but they unfortunately didn't test any higher resolution.

    BTW, my comment about the AA hit wasn't meant to imply anything about the AA abilities of G80 and R580+, but rather the fact that G80 has a large non-PS component in its framerates.

    And what makes you so sure that "other stuff" is not happening even at that resolution? I bet if you went to 2048x1536, the GTS would pull ahead.

    The "uncanny" correlation between bandwidth will always appear to be there between the GTS and GTX because all their capabilities are scaled at the same ratio. So any time the X1950XTX happens matches the GTS you can make this so-called "uncanny" claim, especially since you ignore G71.

    Anyway, there's Oblivion, 3DMark, Quake4 (which I'm told is usually tested without stencil shadows), and Serious Sam 2, which don't match your pattern. In SS2 you can really see that ATI's HDR+AA works a lot better. Lack of compression in G80? Immature drivers?

    Couple that with the fact that HL2 and FarCry are not fillrate/PS limited on G80, and your pattern is pretty weak.
     
  11. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    That makes no sense. How is any of that not explained by having more pixel shaders? There's no proof of out-of-order advantages there.

    Geeforcer mentions the Archmark tests, but those differences are ~1% except for the single-texture bilinear test, which could be due to any number of issues (drivers, thrashing, interleaving settings, etc.).
     
  12. Geeforcer

    Geeforcer Harmlessly Evil
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,320
    Likes Received:
    525
    I mentioned Archmark as evidence of the lack of an R580-over-R520 performance advantage, hence the "margin-of-error-ish" remark... which, BTW, is almost what Dave states when commenting on some of the cited test results.
     
  13. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    And I thank you for that. :smile:
     
  14. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,058
    Likes Received:
    3,116
    Location:
    New York
    What exactly is this debate about? I thought the whole deal with the R580 3:1 thing is to alleviate the ALU bottleneck in R520. Out-of-order threading also benefits ALU throughput, not texturing. Why would anyone expect faster texturing in R580 compared to R520? What I would expect is higher ALU throughput when texturing (compared to both R520 and G7x).

    I also don't get Jawed's comment above about better texturing efficiency in R5xx compared to G7x.
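[The 3:1 figure trinibwoy mentions falls straight out of the unit counts reported at the time (taken here as assumptions; G71 shown for contrast):]

```python
# (pixel-shader processors, texture units) per chip, as commonly reported.
chips = {
    "R520": (16, 16),
    "R580": (48, 16),
    "G71":  (24, 24),
}
for name, (alu, tex) in chips.items():
    print(f"{name}: {alu}:{tex} -> {alu // tex}:1 ALU:TEX ratio")
```

[R580 tripled the shader count while keeping R520's 16 texture units, which is why extra ALU throughput — not faster texturing — is the expected win.]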
     
  15. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    Yeah, but just because you got the mipmap selection to match the 8-petal pattern doesn't mean you know the number of samples. The specifics of how many samples are taken are unknown and probably confidential, and both ATI and NVidia would have trimmed it down to the bare minimum to avoid aliasing (NVidia went a little past that point). GF4 takes a much larger hit than R520 HQ.

    But I find it unbelievable that going from bilinear to AF the total number of texture samples per frame goes up by 4x. Do you have any other statistics you can share with us? How many stencil pixels were drawn per frame? How many shaded pixels? How many texture instructions from the shader? How many instructions total?

    Maybe I should fiddle around with the simulator myself.
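[A minimal sketch of why AF inflates sample counts, for scale: a degree-N anisotropic footprint is covered by up to N (tri)linear probes. The tap counts here are the textbook values, not measured hardware behaviour — as Mintmaster says, real implementations trim samples aggressively:]

```python
# Upper-bound texture taps for one anisotropically filtered fetch.
def af_taps(aniso_degree, trilinear=True):
    taps_per_probe = 8 if trilinear else 4  # trilinear = 2 bilinear levels
    return aniso_degree * taps_per_probe

print(af_taps(1, trilinear=False))  # plain bilinear: 4 taps
print(af_taps(16))                  # worst-case 16x AF: 128 taps
```

[So a frame-wide 4x increase implies an average anisotropy degree of only ~4 even before any sample trimming, which is why the simulator's number is at least plausible as an upper bound.]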
     
  16. dnavas

    Regular

    Joined:
    Apr 12, 2004
    Messages:
    375
    Likes Received:
    7
    No, it doesn't. Neither does it sound good as is, even. What's the point in a unified approach, if there is a fixed split anyway? That sounds like it has all the disadvantages of unified with none of the advantages.
     
  17. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,058
    Likes Received:
    3,116
    Location:
    New York
    Well the assumption there was that "compatibility mode" refers to a fallback when dynamic load-balancing is fubared. The option was described in that famous patent.
     
  18. Razor1

    Veteran

    Joined:
    Jul 24, 2004
    Messages:
    4,232
    Likes Received:
    749
    Location:
    NY, NY

    Maybe it will just take time for them to optimize the drivers for it, or they are just keeping it that way for now since there really is no need for its full performance yet. I don't see how the performance will really increase unless one of the shader types is bottlenecking, which they don't seem to be. Need more benchmarks to see that better, though.
     
  19. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,058
    Likes Received:
    3,116
    Location:
    New York
    Well, something isn't working properly, because the 7900GTX is beating up on the 8800GTS when it has no right to.
     