Nvidia GT300 core: Speculation

Discussion in 'Architecture and Products' started by Shtal, Jul 20, 2008.

Thread Status:
Not open for further replies.
  1. dnavas

    Regular

    Joined:
    Apr 12, 2004
    Messages:
    375
    Likes Received:
    7
    True, that wouldn't make sense. I'm trying to make sense of the four blue bits. Maybe those are all of the transcendentals :shrug:

    Except RCP is in the SFU, and that seems wrong to me -- from the same usefulness perspective.

    -Dave
     
  2. LordEC911

    Regular

    Joined:
    Nov 25, 2007
    Messages:
    877
    Likes Received:
    208
    Location:
    'Zona
    Extremely hard to do right now and wouldn't be accurate due to having nothing other than G200 to base it on.

    Would be much easier if we knew some more details on the GT2xx derivatives: specs, die size and transistor count. Even if we knew that info it would still be pretty inaccurate due to G300 being a new architecture.

    Also, ATi's transistor density is much better than Nvidia's, so basing any estimates on how much ATi increased their transistor density isn't going to be very accurate.

    Purely going on linear shrinks, I am getting about the same number as you, 3.2b, but with a smaller die size of ~580mm2. So I would estimate somewhere around 3.1-3.3b for a die size around 560-600mm2, covering the most recent rumors of a G200-sized die.
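
    For reference, the linear-shrink arithmetic can be sketched roughly as below. The starting point (a 55nm GT200b at about 1.4 billion transistors and about 470mm2) is an assumption based on commonly cited figures, not something stated in the post, and an ideal shrink ignores everything that does not scale (I/O, analog, a new architecture).

```python
# Back-of-envelope ideal linear shrink. All GT200b inputs are assumed figures.

GT200B_TRANSISTORS_M = 1400.0  # assumed: ~1.4 billion transistors, in millions
GT200B_DIE_MM2 = 470.0         # assumed: ~470 mm^2 at 55 nm


def linear_shrink_density(density_m_per_mm2, old_nm, new_nm):
    """Ideal density after a shrink: area per transistor scales with (new/old)^2."""
    return density_m_per_mm2 * (old_nm / new_nm) ** 2


def transistors_billion(die_mm2, density_m_per_mm2):
    """Transistor count in billions for a die of the given area."""
    return die_mm2 * density_m_per_mm2 / 1000.0


density_55 = GT200B_TRANSISTORS_M / GT200B_DIE_MM2
density_40 = linear_shrink_density(density_55, old_nm=55, new_nm=40)

for die in (560, 580, 600):
    print(f"{die} mm^2 -> {transistors_billion(die, density_40):.2f}b transistors")
```

    Under these assumptions the result lands in the same low-3-billion range quoted above; a different baseline chip would shift it.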
     
  3. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,708
    Likes Received:
    2,132
    Location:
    London
    There's wanton excess in NVidia too - just for some reason a lot of people lose critical faculties whenever the subject is raised. I'm certainly not going to rake through my long-standing arguments about the terrible inefficiencies there.

    Magic, hmm...

    There's no doubt, GT21x has been a bracing failure so far, not exactly shining a positive light on NVidia's current abilities to deliver.

    Jawed
     
  4. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    3,240
    Likes Received:
    3,393
    What inefficiencies are there in GT200 besides the MSAA 8x performance? Even separate DP units can't really be called an obvious mistake of design, it was just a choice they made back then which wasn't necessarily bad considering the results.

    I still haven't seen any information on how they did it beyond the pointless marketing buzz.

    Was it NVIDIA GT21x specifically or TSMC 40G in general though?
     
  5. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,708
    Likes Received:
    2,132
    Location:
    London
    You mean write anywhere? That's a feature of R700 LDS too.

    That's not to say that R800 LDS doesn't work better than R700 LDS.

    You mean on this side of GPU design, as opposed to the Larrabee-like future?

    But only for clause lengths > X, many times greater than a hardware implementation? Also, is DWF able to stand up to the strain of nested branching?

    This is one of my big questions about D3D11, as it seems to declare open day for out of order pixel shader memory-accesses.

    R800, by the sound of it, has beefed-up buffers as a step in this direction. Additionally the ability of TUs to read render targets sounds like there's a connection of data from RBE cache to L2 (which is for TU), in order to provide a monster pixel data bandwidth into the ALUs. (That's a guess).

    But I'd still like to know more about what's happening there.

    L2 in Larrabee with 32 cores at 1.5GHz provides about 3TB/s of bandwidth. We're looking at 1TB/s L1/LDS (guessing LDS bandwidth) in RV870 and 435GB/s L2->L1. GT200's shared memory bandwidth is about 1.4TB/s; it would be reasonable to expect ~doubling in GF100.
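
    A rough sketch of how peak figures like these are usually derived: units x clock x bytes per clock. The per-unit widths and the GT200 clock below are assumptions chosen for illustration (64 bytes per clock per Larrabee core; 30 multiprocessors with 16 four-byte banks serviced every two shader clocks at roughly 1.476GHz), not numbers given in the post.

```python
def aggregate_bandwidth_tb_s(units, clock_ghz, bytes_per_clock):
    """Peak aggregate bandwidth in TB/s (1 TB = 1e12 bytes)."""
    return units * clock_ghz * 1e9 * bytes_per_clock / 1e12


# Assumed: each Larrabee core can pull 64 bytes per clock from its L2 slice.
larrabee_l2 = aggregate_bandwidth_tb_s(units=32, clock_ghz=1.5, bytes_per_clock=64)

# Assumed: 16 banks x 4 bytes every 2 shader clocks = 32 bytes per clock,
# 30 multiprocessors, ~1.476 GHz shader clock.
gt200_shared = aggregate_bandwidth_tb_s(units=30, clock_ghz=1.476, bytes_per_clock=32)

print(f"Larrabee L2 (assumed widths): {larrabee_l2:.2f} TB/s")
print(f"GT200 shared memory (assumed widths): {gt200_shared:.2f} TB/s")
```

    These are best-case peaks; they say nothing about what survives irregular access patterns.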

    I still think shared memory is a short-term fix that'll hobble programming these things later on.

    Oh and Ct is getting closer:

    http://makebettercode.com/ct_tech/survey.php

    even if Intel appears to believe that it's an interim thing.

    Jawed
     
  6. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,708
    Likes Received:
    2,132
    Location:
    London
    I don't really understand what you mean by "usefulness" - you're referring to some absolute capability? You're saying that it should be at 50%, or higher, throughput compared with MUL?

    Jawed
     
  7. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    7,610
    Likes Received:
    825
    Which gets decimated (literally) with relatively random gathers.
     
  8. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,708
    Likes Received:
    2,132
    Location:
    London
    The whole damn thing.

    Compared with x86 it's pretty appalling.

    Well, journalists have their chances to find out more.

    GT218 is a case in point. We've seen the power/performance comparisons with RV710. RV740 doesn't need any such excuses. What's NVidia doing?

    Jawed
     
  9. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,708
    Likes Received:
    2,132
    Location:
    London
    Shared memory in GT200 is only slightly less decimated by the same kinds of patterns - and that's provided you play ball within a murderously tight budget of allocated memory per strand. Of course GF100 could be way better. Gotta wait and see.
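
    To illustrate why gather patterns matter for banked shared memory, here is a simplified model. It assumes 16 banks of 4-byte words with consecutive words in consecutive banks, 16 threads serviced together, and that threads reading the same word cost nothing extra; real hardware has more rules than this.

```python
import random

NUM_BANKS = 16   # assumed bank count
HALF_WARP = 16   # assumed number of threads serviced together
WORDS = 4096     # assumed: 16 KB of shared memory viewed as 4-byte words


def conflict_degree(word_indices, num_banks=NUM_BANKS):
    """Serialized passes needed: the largest number of distinct words in one bank."""
    per_bank = {}
    for word in set(word_indices):
        bank = word % num_banks
        per_bank[bank] = per_bank.get(bank, 0) + 1
    return max(per_bank.values())


def average_random_degree(trials=2000, seed=0):
    """Average conflict degree over uniformly random gathers."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        gather = [rng.randrange(WORDS) for _ in range(HALF_WARP)]
        total += conflict_degree(gather)
    return total / trials


linear = conflict_degree(range(HALF_WARP))                            # one word per bank
strided = conflict_degree([i * NUM_BANKS for i in range(HALF_WARP)])  # all in bank 0
print(linear, strided, average_random_degree())
```

    Effective bandwidth in this model is the peak divided by the conflict degree, so the stride-16 case is the worst possible and random gathers fall somewhere in between.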

    Jawed
     
  10. apoppin

    Regular

    Joined:
    Feb 12, 2006
    Messages:
    255
    Likes Received:
    0
    Location:
    Hi Desert SoCal
    Well,
    NVIDIA Collaborates with Microsoft on High Performance GPU Computing

    I believe that Nvidia is betting 'the house' that GPU computing will become as important as CPU processing. That is what their GTC is about: their future.

    I am going to check it out. I was at Nvision08, and I am packing right now and heading for San Jose tonight to report on GTC for my site.

    .. and no worries, I will ask Jensen about Fermi there (or someone else will) .. but my own sources have already confirmed it
     
  11. jaredpace

    Newcomer

    Joined:
    Sep 28, 2009
    Messages:
    157
    Likes Received:
    0
    Well this is a speculation thread, and since I'm new here I hope you guys don't mind me posting my speculation. :p This is what I am guessing:

    GF100 (Saw-zall GTX)
    40nm DX11 Cuda3
    ~590mm^2, ~3.2 billion transistors
    24.5 x 24.5mm die
    1536MB 0.4ns Samsung/Hynix 5Gbps
    ~233GB/s bandwidth
    700c / 1750s / 1250m
    512 MIMD / 128 tmu / 64 rop
    195watt TDP
    launch nov 25th, major retail christmas/jan.
    $549 - $599
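
    A quick consistency check on the guessed numbers, as a sketch only. The 384-bit bus is an assumption inferred from the 1536MB figure (twelve 32-bit memory chips); the post does not state a bus width.

```python
def die_area_mm2(width_mm, height_mm):
    """Area of a rectangular die."""
    return width_mm * height_mm


def memory_bandwidth_gb_s(bus_width_bits, data_rate_gbps):
    """Peak memory bandwidth in GB/s from bus width and per-pin data rate."""
    return bus_width_bits * data_rate_gbps / 8


ASSUMED_BUS_BITS = 384  # assumption: not given in the post

area = die_area_mm2(24.5, 24.5)
peak_bw = memory_bandwidth_gb_s(ASSUMED_BUS_BITS, 5.0)
implied_rate = 233 * 8 / ASSUMED_BUS_BITS  # per-pin rate that would give ~233 GB/s

print(f"die area: {area} mm^2")
print(f"bandwidth at 5 Gbps: {peak_bw} GB/s")
print(f"data rate implied by 233 GB/s: {implied_rate:.2f} Gbps")
```

    With that bus width, a 24.5mm square die comes out slightly above the ~590mm^2 guess, and ~233GB/s implies memory running a little under the 5Gbps rating.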
     
  12. dnavas

    Regular

    Joined:
    Apr 12, 2004
    Messages:
    375
    Likes Received:
    7
    I'm saying that, comparatively speaking, that 5% number for transcendentals wouldn't hold for RCP. Or, it's more obvious (to me, a non-gfx, non-sci-analysis programmer) why MUL and RCP would be 1:1 than it is for, say, ADD and MUL.

    -Dave
     
  13. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    7,610
    Likes Received:
    825
    If I had done the work to prove my opinion I wouldn't have said I think :)
    That's just a question of heuristics ... maybe profile-guided branch probability hints could help? In the end nothing beats MIMD, but the assumption is that up to a point the trade-off in area remains worth it.
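
    As a toy illustration of what regrouping threads can buy on a single branch, the sketch below counts issue passes for fixed warps versus an idealized regrouping by branch outcome. The function names and the 16-wide warp are hypothetical, and the idealized case ignores constraints a real dynamic warp formation scheme would face (for example, which lanes threads may occupy), so it is an upper bound rather than a prediction.

```python
import math


def static_warp_passes(outcomes, warp_size):
    """Fixed warps: a warp containing both branch outcomes executes both paths."""
    passes = 0
    for start in range(0, len(outcomes), warp_size):
        warp = outcomes[start:start + warp_size]
        passes += len(set(warp))
    return passes


def regrouped_passes(outcomes, warp_size):
    """Idealized regrouping: threads with the same outcome are packed into full warps."""
    taken = sum(1 for o in outcomes if o)
    not_taken = len(outcomes) - taken
    return math.ceil(taken / warp_size) + math.ceil(not_taken / warp_size)


# Worst case for fixed warps: outcomes alternate thread by thread.
alternating = [i % 2 == 0 for i in range(64)]
# Best case: outcomes are already coherent within each warp.
coherent = [i < 32 for i in range(64)]

print(static_warp_passes(alternating, 16), regrouped_passes(alternating, 16))
print(static_warp_passes(coherent, 16), regrouped_passes(coherent, 16))
```

    Nested branches multiply the number of distinct paths, which is where the question of whether regrouping keeps up becomes interesting.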

    I wonder if Intel has any automated tools for dynamic strand formation yet.
    Coincidentally that's what I think providing snooping cache coherency does for Larrabee ... just teaches bad habits with something which is convenient but scales like shit.
     
  14. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,400
    Likes Received:
    440
    Location:
    San Francisco
    CELL programmers are still banging their heads against the wall, don't make them cry even more :)
     
  15. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    7,610
    Likes Received:
    825
    I'm not saying local stores are necessary, I'm saying that removing the need to think carefully about data communication by just throwing lots of snooping bandwidth at it and allowing each and every cache to contain a copy of a memory location is a bit too extreme.
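
    A crude way to see the scaling concern, assuming a pure broadcast-snoop model in which every miss is checked against every other cache, compared with a hypothetical directory that only contacts actual sharers. Both models and all parameters are illustrative assumptions, not a description of any specific chip.

```python
def snoop_lookups(cores, misses_per_core):
    """Broadcast snooping: each miss probes every other core's cache."""
    return cores * misses_per_core * (cores - 1)


def directory_lookups(cores, misses_per_core, avg_sharers):
    """Hypothetical directory: one directory lookup plus probes to actual sharers."""
    return cores * misses_per_core * (1 + avg_sharers)


for cores in (8, 16, 32, 64):
    snoop = snoop_lookups(cores, misses_per_core=1000)
    directory = directory_lookups(cores, misses_per_core=1000, avg_sharers=2)
    print(f"{cores:3d} cores: snoop={snoop:>9d}  directory={directory:>8d}")
```

    In this model snoop work grows roughly with the square of the core count while the directory case grows linearly, which is the trade-off being argued over here.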
     
  16. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,400
    Likes Received:
    440
    Location:
    San Francisco
    (HW) implementation details aside, programmers that really care about performance will simply try to keep snooping traffic low. Which sounds simpler than managing 27 different (all partially incoherent) memory types.
    Perhaps tomorrow someone will find a way to make it easy for the sw developers and simple from an hw implementation standpoint, although I doubt it will ever happen :)
     
  17. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    3,240
    Likes Received:
    3,393
    That's a bit of an overstatement.

    So how come nobody did?

    GT218 is a 60mm^2 GPU. I don't think that you can compare it to the 140mm^2 RV740. And you surely can't compare it to a GPU made on another process.
    In other words, we need more information before any conclusion on GT21x being a failure can be made. One review of GT218 isn't enough for such a conclusion.
     
  18. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    7,610
    Likes Received:
    825
    The software will be written for the hardware of the day ... the next generation of hardware will be designed partly for the software written for the hardware of yesterday.

    Once they go down this road it will be hard to make a turn.
    There are many ways to guarantee coherency.
     
  19. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,055
    Likes Received:
    3,110
    Location:
    New York
    Fair point. I'm also holding judgment till I see proper reviews. It's not like there are high volume parts on 40nm from anyone out there as yet.
     
  20. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,400
    Likes Received:
    440
    Location:
    San Francisco
    Sure. Any favourite model of yours?
     