NVIDIA GT200 Rumours & Speculation Thread

Discussion in 'Architecture and Products' started by Arun, Feb 10, 2008.

Thread Status:
Not open for further replies.
  1. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    Both probably right - as soon as you run badly out of memory on the former cards. ;)
     
  2. AnarchX

    Veteran

    Joined:
    Apr 19, 2007
    Messages:
    1,559
    Likes Received:
    34
    Yes, 2x 9800GX2 would be in some cases < 2x 8800 Ultra (scroll down).

    But the 7k in Vantage Extreme seems to have been GTX 280 SLI, or some people simply misunderstood the "2x 9800GX2".
     
  3. Lukfi

    Regular

    Joined:
    Apr 27, 2008
    Messages:
    423
    Likes Received:
    0
    Location:
    Prague, Czech Republic
    I was just checking Charlie Demerjian's news against NDA info I have myself, and I realized that although Charlie's interpretation is very anti-nVidia, some of it is right. So I'm inclined to believe Charlie knows the GTX 200 specs and clocks; it's just his info about RV770 that is way off, which is why he claims that a single RV770 will be on par with the GTX 260.
     
  4. Love_In_Rio

    Veteran

    Joined:
    Apr 21, 2004
    Messages:
    1,627
    Likes Received:
    226
    And it has 3 times its floating point performance? So maybe the missing MUL is still missing...
     
  5. AnarchX

    Veteran

    Joined:
    Apr 19, 2007
    Messages:
    1,559
    Likes Received:
    34
    933 GFLOPs / 384 GFLOPs is only 2.4x, and I think (or at least heard some time ago) that modern code is more MADD-heavy.
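
    As a quick check of the ratio above (a minimal sketch; the two figures are the ones quoted in this post, and the variable names are only illustrative):

```python
# Peak-throughput figures quoted in the post, in GFLOPs.
gt200_gflops = 933.0
previous_gflops = 384.0

# Ratio between the two quoted peak figures.
ratio = gt200_gflops / previous_gflops
print(f"ratio = {ratio:.2f}x")
```

    So the quoted numbers give a bit over 2.4x, short of the "3 times" asked about above.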
     
  6. nicolasb

    Regular

    Joined:
    Oct 21, 2006
    Messages:
    421
    Likes Received:
    4
    For what it's worth:

    http://www.fudzilla.com/index.php?option=com_content&task=view&id=7604&Itemid=1


    and: http://www.fudzilla.com/index.php?option=com_content&task=view&id=7603&Itemid=1

     
  7. igg

    Newcomer

    Joined:
    May 16, 2008
    Messages:
    63
    Likes Received:
    0
    Nvidia GT200 successor tapes out

    This doesn't look very good, maybe even disappointing:

    ...
    ...
    ...
     
  8. CJ

    Regular

    Joined:
    Apr 28, 2004
    Messages:
    816
    Likes Received:
    40
    Location:
    MSI Europe HQ
    /Offtopic... I wish people would actually go back a few pages to check whether a link has already been posted.
     
  9. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,059
    Likes Received:
    3,119
    Location:
    New York
    What's the current blending rate?
     
  10. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    Thanks.

    One thing that I'm a bit dubious about is the way that a cluster is split into multiprocessor and TMU. There are parts of a cluster that aren't either, as far as I can tell, relating to scheduling/instruction issue.

    So I'm wondering if some of the area that's being counted as TMU is something else.

    I'm curious about the batch size (number of elements in a hardware thread) on GT200. G80 has an underlying batch size of 16 but I have the impression that it'll be 32 in GT200. I wonder if this leads to some simplification of the multiprocessors, or at least in the scoreboarding/scheduling/instruction-issuing.

    I'm also wondering if the register file and shared memory bandwidth is doubled. G80 is capable of fetching 16 elements per clock (from each, though I'm not sure if it can fetch from both simultaneously at that rate), so I'm wondering if GT200 fetches 32. I've long suspected there's a register bandwidth bottleneck in G80, so it'll be interesting to see if this has changed and is therefore a factor in improved TMU utilisation and usage of the MUL, both of which I suspect are bound by register bandwidth in G80.
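
    To make the suspected bottleneck concrete, here is a rough sketch. Only the 16 elements per clock comes from the post; the eight lanes per multiprocessor, three source operands for a MADD, two for a MUL, and a single shared clock domain for ALUs and register file are all simplifying assumptions, so treat the result as illustrative only.

```python
LANES = 8                 # assumed ALU lanes per multiprocessor
FETCH_PER_CLOCK = 16      # register elements fetched per clock (from the post)
MADD_OPERANDS = 3         # a * b + c reads three sources (assumption)
MUL_OPERANDS = 2          # a * b reads two sources (assumption)

def operands_needed(operands_per_lane):
    """Source operands the lanes consume per clock at full issue rate."""
    return LANES * operands_per_lane

def clocks_per_batch(batch_size):
    """Clocks needed to push one batch through the lanes."""
    return batch_size // LANES

madd_demand = operands_needed(MADD_OPERANDS)
dual_demand = operands_needed(MADD_OPERANDS + MUL_OPERANDS)
print(f"MADD only:  need {madd_demand}/clock, have {FETCH_PER_CLOCK}")
print(f"MADD + MUL: need {dual_demand}/clock, have {FETCH_PER_CLOCK}")
print(f"batch of 16: {clocks_per_batch(16)} clocks, batch of 32: {clocks_per_batch(32)} clocks")
```

    Under these assumptions, operand demand exceeds the quoted fetch rate even before the MUL is issued alongside the MADD, which is the kind of limit the post speculates about. Real hardware can reuse operands or read constants from elsewhere, so this is an upper bound on demand, not a measurement.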

    Jawed
     
  11. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    I'm sure there'll be a lot of CUDA-using people jumping for joy over the register file increase :smile:

    Jawed
     
  12. Arun

    Arun Unknown.
    Legend

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    302
    Location:
    UK
    No, that's part of the SMs. There are indeed things which are only present at the cluster level (I think constant cache is one of them, but I can't remember right now) and there's definitely some basic scheduling there too. However, it's probably fair to say that a significant majority of it is related to texturing or fetches in general.
     
  13. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    Of course the missing MUL was a serious hit to NVidia's claims for the efficiency of their ALUs. If it routinely achieves 2/3 of the "headline" GFLOPs rating - then it's better to just pretend it doesn't exist, which is why we have G80 as 346GFLOPs, not 518GFLOPs, etc.
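
    The two G80 figures can be reproduced with a back-of-the-envelope calculation. This is only a sketch: the 128 SPs and 1.35 GHz shader clock are the commonly quoted 8800 GTX figures, not numbers given in this post, and the function name is made up for illustration.

```python
def peak_gflops(sp_count, shader_clock_ghz, flops_per_sp_per_clock):
    """Theoretical peak: units x clock (GHz) x flops issued per unit per clock."""
    return sp_count * shader_clock_ghz * flops_per_sp_per_clock

# Assumed 8800 GTX (G80) figures, not taken from the post.
SP_COUNT = 128
SHADER_CLOCK_GHZ = 1.35

madd_only = peak_gflops(SP_COUNT, SHADER_CLOCK_GHZ, 2)      # MADD = 2 flops
madd_plus_mul = peak_gflops(SP_COUNT, SHADER_CLOCK_GHZ, 3)  # MADD + MUL = 3 flops

print(f"MADD only:  {madd_only:.1f} GFLOPs")
print(f"MADD + MUL: {madd_plus_mul:.1f} GFLOPs")
print(f"fraction:   {madd_only / madd_plus_mul:.3f}")
```

    Counting the MADD alone gives the lower figure, which is exactly 2/3 of the headline number that also counts the MUL.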

    Presumably as far as GT200 marketing is concerned, it has more than twice the GFLOPs of G92.

    Jawed
     
  14. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,059
    Likes Received:
    3,119
    Location:
    New York
  15. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    OK. One thing I'm still unclear about is whether NVidia's architecture has dedicated point-samplers in addition to the "TMU"s or whether in fact the modular configuration of texture fetching/filtering (e.g. addressing unit, fetching, filtering) allows them to run all fetches through the same samplers. It seems likely to be the latter.

    Additionally, modularisation implies an overhead, as each unit theoretically needs to independently handle its own instruction/operand fetching and result posting.

    Jawed
     
  16. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,059
    Likes Received:
    3,119
    Location:
    New York
    B3D article:

     
  17. Twinkie

    Regular

    Joined:
    Oct 22, 2006
    Messages:
    386
    Likes Received:
    5
    From the slide posted by trini

    "Do not distribute" :lol:

    I kind of feel sorry for those who bought a 9800GX2. I mean, there would be no point in nVIDIA spending a lot of resources, especially on the driver side of things, to maintain scaling for the GX2 in newer titles, since it's pretty much reached EOL (due to GT200). Quad SLI becomes meaningless once again, just like with the 7950GX2.
     
  18. fellix

    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,552
    Likes Received:
    514
    Location:
    Varna, Bulgaria
    I dare to propose that the batch size is unchanged from G80, i.e. an additional SIMD array of eight SPs per cluster just increases the threading parallelism...
    So now GT200 has a kind of triple instruction issue, to put a name on it.

    I wonder how this would impact the interpolation rate compared to G80/92.
     
  19. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,059
    Likes Received:
    3,119
    Location:
    New York
    If I understand correctly, the purpose of a cluster is just to share the texturing unit. All of the multiprocessors are independent and contain their own scheduling and operand issue hardware, register file and shared memory. So the number of multiprocessors per cluster won't affect any ALU processing rate, but it should increase TMU utilization.

    CUDA blocks are distributed at the multiprocessor level and all inter-thread communication is limited to a single multiprocessor.
     
  20. Arun

    Arun Unknown.
    Legend

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    302
    Location:
    UK
    BTW, here's a thought: I think the only way Charlie could possibly be right about the GT200b die size being 'a little more than 400mm²' while GT200 is presumably 576mm²... is if GT200b uses GDDR5. 576*0.81 is about 467mm², but if you assume the per-bit I/O for GDDR5 isn't massively larger than for GDDR3, you could save maybe 20mm² there. And if you save a bit on the MCs too, there you go, low 400s.
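
    Spelled out as a rough sketch (the 576mm² and the 0.81 scale factor are from the post; the 20mm² I/O saving is the post's guess, and the memory-controller saving is left as a free parameter because no number is given):

```python
GT200_AREA_MM2 = 576.0   # presumed GT200 die size, from the post
SHRINK_FACTOR = 0.81     # area scale factor used in the post
IO_SAVING_MM2 = 20.0     # guessed saving from GDDR5 I/O, from the post

def gt200b_area(mc_saving_mm2=0.0):
    """Estimated GT200b area after the shrink, minus I/O and MC savings."""
    return GT200_AREA_MM2 * SHRINK_FACTOR - IO_SAVING_MM2 - mc_saving_mm2

shrunk = GT200_AREA_MM2 * SHRINK_FACTOR
print(f"straight shrink: {shrunk:.1f} mm^2")
print(f"with I/O saving: {gt200b_area():.1f} mm^2")
# Hypothetical MC savings, only to show how the estimate moves.
for mc_saving in (10.0, 20.0, 30.0):
    print(f"MC saving {mc_saving:.0f} mm^2 -> {gt200b_area(mc_saving):.1f} mm^2")
```

    With the I/O saving alone the estimate is still in the mid 440s, so getting to "a little more than 400mm²" depends on how much the memory controllers shrink as well.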
     