Beyond3D's GT200 GPU and Architecture Analysis

Discussion in 'Architecture and Products' started by Arun, Jun 16, 2008.

  1. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    Nice article, as usual.

    Couple of things:
    Moving these things to the shader core requires gobs of operand bandwidth. For custom filtering, you can already do point samples and whatever you want from there.

    One thing you should realize is that pure math logic isn't very expensive at all. It's the routing and temporary storage of data that uses most of the transistors. Filtering alone needs only a fraction of the logic of a shader core, and triangle setup needs to be done in front of the triangle rasterization. I'm not too sure why triangle setup hasn't been improved beyond once per clock, but I think there may be difficulties in parallelizing while preserving order of the triangles and their quads throughout the pipeline. I don't see anything that can't be overcome, though.

    Also, on the last page, don't you mean 1/10th of a terazixel instead of petazixel? I have a tough time believing a 1500 fold increase over G80. ;)
     
  2. randomhack

    Newcomer

    Joined:
    Apr 4, 2008
    Messages:
    41
    Likes Received:
    0
    Is it 1/8th or 1/12th? Can the DP unit not issue the MUL along with the MAD?
     
  3. Arun

    Arun Unknown.
    Moderator Legend Veteran

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    299
    Location:
    UK
    The reason why the FP64 unit cannot be used at the same time is that there is no scheduling hardware dedicated to it (you can't issue one extra instruction per cycle for it), it shares the register file (so it needs part of the FP32 ALUs' register files' banks/ports to be reserved for it while used), and so forth. As we say in the article, it was claimed that it could be used along with FP32 up to a certain extent, so we speculate that it can feed from the same ports as the SFU/MUL, which would then have to idle meanwhile.

    As for memory support, NVIDIA has been claiming they could use GDDR4 for the last three centuries or so, but it never happened. And it's never going to happen either. There are both political and technical reasons for that; in the G8x/G9x case, the architectures were fully optimized around GDDR3's burst length and could not have supported GDDR4 at equally high efficiency. In the GT200's case, I'm not completely sure about that part given that GT2xx will use GDDR5 eventually, but I would be very surprised if either the MCs or the PHYs supported GDDR4.

    As for Direct3D 10.1... Well, not sure what we can or cannot say, so uhhh we'll let you know eventually if we can!

    Heh, the NDA has been lifted, I just haven't had the chance to finish my article yet... :)
    Bah, how much of this thread needs to be dedicated to Rys' jokes? ;)
    The unit itself is 1/8th the width of the main ALUs, so considering the MUL the theoretical peak flop rating is 1/12th that of SP, correct. Similarly, in AMD's case with RV770/Firestream 9150, it seems to be 1/5th of the SP rate (for a much smaller die size), although presumably still without denormal or rounding errors. Should be fine for most apps anyway probably.
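    The 1/8th versus 1/12th figures above can be checked with a little arithmetic. This is a minimal sketch, assuming 8 FP32 lanes and 1 FP64 lane per shader cluster (implied by "1/8th the width" but not spelled out in the thread), and counting a MAD as 2 flops and the extra MUL as 1 flop. The names are illustrative only.

```python
from fractions import Fraction

# Assumed figures, not stated explicitly in the thread:
# 8 FP32 lanes and 1 FP64 lane per cluster ("1/8th the width").
SP_LANES = 8
DP_LANES = 1
FLOPS_MAD = 2  # a multiply-add counts as two floating-point operations
FLOPS_MUL = 1  # the extra co-issued MUL counts as one


def dp_to_sp_ratio(count_extra_mul: bool) -> Fraction:
    """Peak DP rate as a fraction of peak SP rate, per clock."""
    sp_per_lane = FLOPS_MAD + (FLOPS_MUL if count_extra_mul else 0)
    sp_flops = SP_LANES * sp_per_lane
    dp_flops = DP_LANES * FLOPS_MAD  # the DP unit issues no extra MUL
    return Fraction(dp_flops, sp_flops)


print(dp_to_sp_ratio(count_extra_mul=False))  # 1/8
print(dp_to_sp_ratio(count_extra_mul=True))   # 1/12
```

    So the answer depends only on whether the SP peak includes the extra MUL: 2/16 without it, 2/24 with it.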
     
  4. Rys

    Rys PowerVR
    Moderator Veteran Alpha

    Joined:
    Oct 9, 2003
    Messages:
    4,156
    Likes Received:
    1,433
    Location:
    Beyond3D HQ
    Heh, yeah :razz:

    I see Arun threw in his rant about shader core triangle setup too, without really asking :twisted:
     
  5. Love_In_Rio

    Veteran

    Joined:
    Apr 21, 2004
    Messages:
    1,452
    Likes Received:
    110
    It took Nvidia too much space to add double precision hardware. It is a shame it's not usable for gaming. Are ATI's double precision units usable for gaming when not used for GPGPU?
    I would say that Larrabee has a lot to do with the design of this new chip, and that has led Nvidia to forget to look in the rear-view mirror at ATI.
     
  6. bdmosky

    Newcomer

    Joined:
    Jul 31, 2002
    Messages:
    167
    Likes Received:
    22
    I wonder how much work and/or how many extra transistors it would take to expose the FP64 unit alongside the others. Would it be reasonable to assume a future derivative might do this?

    *Edit* Perhaps even replacing some more of the FP32 units to work in tandem with the others.
     
    #26 bdmosky, Jun 16, 2008
    Last edited by a moderator: Jun 16, 2008
  7. Bludd

    Bludd Experiencing A Significant Gravitas Shortfall
    Veteran

    Joined:
    Oct 26, 2003
    Messages:
    3,237
    Likes Received:
    807
    Location:
    Funny, It Worked Last Time...
    I'm positive the GT200 will never support using you as its main memory. :lol::lol:
     
  8. Andrew Lauritzen

    Moderator Veteran

    Joined:
    May 21, 2004
    Messages:
    2,526
    Likes Received:
    454
    Location:
    British Columbia, Canada
    Swwwweeet, thanks for the article guys. I love it when the first thing I can read about new hardware hotness is the low-level tech details :) Then I can happily move on to the benches and see how it pans out in reality.

    So awesome job as always guys and keep up the great work!

    And lol @ "petazixel" :D
     
  9. DSC

    DSC
    Banned

    Joined:
    Jul 12, 2003
    Messages:
    689
    Likes Received:
    3
    Does the PureVideo VP2 unit in GTX 200 GPUs now support full VC-1 and MPEG-2 decoding on the GPU like the GeForce 8200 mGPU does, or still only partial on-GPU decoding à la G92?
     
  10. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    So does anyone think NVidia is going to be a bit worried this time around? The GT200 seems to really hurt their performance per mm2, whereas all indications are that ATI really improved it with RV770. Even their HPC line may be in trouble with Firestream's 200 GFlops DP performance.

    Half of the problem with GT200's cost effectiveness is the lower shader clock, and half is the extras that bloated the transistor count. I'd expect both to go away with value models. We may see a situation like the GeForce 6xxx series, where NV43 provided much more perf/$ than the high end parts.

    Nonetheless, after RV670 achieved near parity with NVidia (again, in terms of the cost effectiveness of an architecture), ATI looks like they'll be even better this time around.
     
  11. INKster

    Veteran

    Joined:
    Apr 30, 2006
    Messages:
    2,110
    Likes Received:
    30
    Location:
    Io, lava pit number 12
    I think GT200's VP2 does H.264/MPEG 2 only, just like G84/G86/G94/G92, etc.
    And, if we think about it, it does make sense not to spend more transistors on it in a product such as the GTX 2xx.

    No one in their right mind would pair a top-end card like this with a Celeron or even a Pentium Dual-Core, and as such (seeing as VC-1 is not as compute intensive as H.264), it's only logical to provide VP3 capabilities on platforms already potentially limited by their low-end CPUs, such as the ones using IGPs (MCP78/7A) and low-end GPUs (G98, etc.) from Nvidia.


    edit
    Scratch all that above. Apparently it does have full VC-1 decode capabilities, therefore matching G98's VP3 and AMD's UVD engines.
     
    #31 INKster, Jun 16, 2008
    Last edited by a moderator: Jun 16, 2008
  12. zsouthboy

    Regular

    Joined:
    Aug 1, 2003
    Messages:
    563
    Likes Received:
    9
    Location:
    Derry, NH
    I loved the article, guys - it's why I came here in the first place - technical information in an informal tone.

    Although I had to google Kim Kardas.. Kar.. whatever her name was. :D
     
  13. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,015
    Likes Received:
    112
    Shouldn't that be 32-bit integer pixels made up of three int10 channels and 2 bits for alpha or does a fp10 float format really exist?

    Also, I've got to wonder... 8800GTX/G80 didn't really seem to suffer from a lack of blend rate, and the bandwidth per ROP didn't increase either, so does increasing the blend rate per ROP really help there?
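    For reference, the integer layout described above (three 10-bit channels plus 2 bits of alpha in one 32-bit word) can be sketched as follows. The bit order used here, red in the low bits and alpha in the top two, is an assumption for illustration; the thread does not say how the hardware orders the channels, and the function names are hypothetical.

```python
def pack_10_10_10_2(r: int, g: int, b: int, a: int) -> int:
    """Pack three 10-bit channels and a 2-bit alpha into one 32-bit word.

    Assumed layout (illustrative): bits 0-9 red, 10-19 green,
    20-29 blue, 30-31 alpha.
    """
    for name, value, bits in (("r", r, 10), ("g", g, 10), ("b", b, 10), ("a", a, 2)):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name}={value} does not fit in {bits} bits")
    return (a << 30) | (b << 20) | (g << 10) | r


def unpack_10_10_10_2(word: int) -> tuple[int, int, int, int]:
    """Inverse of pack_10_10_10_2: returns (r, g, b, a)."""
    return (
        word & 0x3FF,
        (word >> 10) & 0x3FF,
        (word >> 20) & 0x3FF,
        (word >> 30) & 0x3,
    )


word = pack_10_10_10_2(1023, 0, 512, 3)
print(hex(word), unpack_10_10_10_2(word))
```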
     
  14. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,015
    Likes Received:
    112
    I'd really like to see a pic of the card (without the cooler...). Does someone really sell 2 Gbit GDDR3 RAM chips, or are they using two 1 Gbit chips in parallel, requiring an even more complex PCB (and hence the quite a bit lower memory clock)?
     
  15. Rys

    Rys PowerVR
    Moderator Veteran Alpha

    Joined:
    Oct 9, 2003
    Messages:
    4,156
    Likes Received:
    1,433
    Location:
    Beyond3D HQ
    It's an FP format, s6e3 if I remember rightly (could be wrong there though, I'll check). And it looks like Tridam couldn't find full speed FP10 or FP16....
     
  16. Vincent

    Newcomer

    Joined:
    May 28, 2007
    Messages:
    235
    Likes Received:
    0
    Location:
    London
    My view is that-


    I think Jade Raymond is far more attractive than GT280.:lol:
     
  17. fivefeet8

    Newcomer

    Joined:
    Sep 13, 2004
    Messages:
    8
    Likes Received:
    0
    I think there is a spelling error on Page 6.

    "Portunately they can."

    Nice article by the way.
     
  18. AlBran

    AlBran Ferro-Fibrous
    Moderator Legend

    Joined:
    Feb 29, 2004
    Messages:
    20,727
    Likes Received:
    5,820
    Location:
    ಠ_ಠ
    7-bit mantissa, 3-bit exponent. :)
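    A 7-bit mantissa plus a 3-bit exponent fills all 10 bits, which leaves no room for a sign bit. Decoding such a channel might look like the sketch below. Everything beyond the 7/3 split is an assumption made for illustration: an unsigned value, the exponent in the top bits, an exponent bias of 3, denormals when the exponent field is zero, and no Inf/NaN encodings. The thread does not confirm any of those details for GT200.

```python
MANTISSA_BITS = 7
EXPONENT_BITS = 3
EXPONENT_BIAS = 3  # assumed; not confirmed anywhere in the thread


def decode_fp10(bits: int) -> float:
    """Decode one unsigned 10-bit float channel (7-bit mantissa, 3-bit exponent).

    Assumed encoding: exponent in the top 3 bits, mantissa in the low 7,
    denormals when the exponent field is 0, no Inf/NaN.
    """
    if not 0 <= bits < (1 << (MANTISSA_BITS + EXPONENT_BITS)):
        raise ValueError("value does not fit in 10 bits")
    mantissa = bits & ((1 << MANTISSA_BITS) - 1)
    exponent = bits >> MANTISSA_BITS
    fraction = mantissa / (1 << MANTISSA_BITS)
    if exponent == 0:
        # Denormal: no implicit leading 1, smallest exponent reused.
        return fraction * 2.0 ** (1 - EXPONENT_BIAS)
    return (1.0 + fraction) * 2.0 ** (exponent - EXPONENT_BIAS)


print(decode_fp10(0))       # smallest encoding
print(decode_fp10(3 << 7))  # exponent field equal to the bias, zero mantissa
print(decode_fp10(0x3FF))   # largest encoding
```

    Under these assumptions the representable range runs from 0 up to just under 32, with the finest steps near zero.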
     
  19. AnarchX

    Veteran

    Joined:
    Apr 19, 2007
    Messages:
    1,559
    Likes Received:
    34
    http://hkepc.com/?id=1321&fs=c1hh
     
  20. CJ

    CJ
    Regular

    Joined:
    Apr 28, 2004
    Messages:
    816
    Likes Received:
    40
    Location:
    MSI Europe HQ
    Oh really?

    :wink:
     