Nvidia GT300 core: Speculation

Discussion in 'Architecture and Products' started by Shtal, Jul 20, 2008.

Thread Status:
Not open for further replies.
  1. Humus

    Humus Crazy coder
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    3,217
    Likes Received:
    77
    Location:
    Stockholm, Sweden
    Previously the interpolator provided the interpolated values to the shader. In SM 5.0 the shader can request interpolated values itself, using functions such as EvaluateAttributeAtCentroid(), EvaluateAttributeAtSample() and EvaluateAttributeSnapped().
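A rough illustration of what "the shader asks for the interpolated value itself" can mean: if triangle setup hands the shader a screen-space plane equation per attribute, then evaluating at the pixel center or at a sub-pixel sample offset is just a matter of plugging in a different (x, y). This Python sketch assumes that plane-equation model. `AttributePlane`, `plane_from_triangle` and `evaluate_at_offset` are hypothetical names, not the HLSL intrinsics, and perspective correction is ignored.

```python
from dataclasses import dataclass


@dataclass
class AttributePlane:
    """Hypothetical per-triangle plane equation: value(x, y) = a*x + b*y + c."""
    a: float
    b: float
    c: float

    def eval(self, x: float, y: float) -> float:
        return self.a * x + self.b * y + self.c


def plane_from_triangle(p0, p1, p2, v0, v1, v2) -> AttributePlane:
    """Fit the plane through three screen-space vertices and their attribute values."""
    (x0, y0), (x1, y1), (x2, y2) = p0, p1, p2
    dx1, dy1, dv1 = x1 - x0, y1 - y0, v1 - v0
    dx2, dy2, dv2 = x2 - x0, y2 - y0, v2 - v0
    det = dx1 * dy2 - dx2 * dy1  # twice the signed triangle area
    a = (dv1 * dy2 - dv2 * dy1) / det
    b = (dx1 * dv2 - dx2 * dv1) / det
    return AttributePlane(a, b, v0 - a * x0 - b * y0)


def evaluate_at_offset(plane, px, py, offset=(0.0, 0.0)):
    """Evaluate at the center of pixel (px, py) plus a sub-pixel offset (illustrative)."""
    return plane.eval(px + 0.5 + offset[0], py + 0.5 + offset[1])


plane = plane_from_triangle((0, 0), (4, 0), (0, 4), 0.0, 4.0, 8.0)
print(evaluate_at_offset(plane, 1, 1))                 # 4.5 (pixel center)
print(evaluate_at_offset(plane, 1, 1, (0.25, -0.25)))  # 4.25 (a sample offset)
```

A centroid-style evaluation would be the same call with the offset moved to a covered sample; how real hardware picks that location is not modeled here.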

    5% would be a pretty reasonable estimate IMO, and it might even be lower for optimized shaders. RSQ is probably used more, perhaps in the 10% range, because normalize() is used fairly frequently.
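Why normalize() pushes up the RSQ count: a vector normalize is typically lowered to a dot product, one reciprocal square root and a few multiplies, rather than a square root followed by divides. The Python sketch below mirrors that instruction sequence; `rsq` is only a stand-in for the hardware instruction, not a real API.

```python
import math


def rsq(x: float) -> float:
    """Stand-in for a hardware reciprocal-square-root (RSQ) instruction."""
    return 1.0 / math.sqrt(x)


def normalize(v):
    """normalize(v) = v * rsq(dot(v, v)): one RSQ and a few multiplies, no per-component divide."""
    d = sum(c * c for c in v)  # dot(v, v)
    s = rsq(d)
    return tuple(c * s for c in v)


n = normalize((3.0, 0.0, 4.0))
print(n)
```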
     
  2. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,122
    Likes Received:
    2,873
    Location:
    Well within 3d
    Any such data path puts RV870 one step closer to fully closing the write/read loop in the manner CPU caches do.
    It would probably still be less flexible and have higher latency, but at least there's an on-chip path.

    The LDS bandwidth makes sense, assuming the 64-byte data path in RV770 remains without further elaboration.

    As a side note, I'm curious about the additional non-texture L1 that was added alongside the regular texture cache, as mentioned in the Anandtech article. I'm not sure what it brings to the table at that size compared to the larger texture cache and LDS. I suppose it would help with thrashing if graphics and compute shaders hit the same SIMD.
    In a GPGPU situation, what would it offer over using the larger L1?
     
  3. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,325
    Likes Received:
    93
    Location:
    San Francisco
    Now that both NVIDIA and ATI interpolate in the shader cores, divisions are used even more often :)
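For context on where that division comes from: perspective-correct interpolation interpolates attribute/w and 1/w linearly in screen space, then divides per pixel to recover the attribute. A minimal Python sketch, assuming the screen-space barycentric weights are already available; the function and parameter names are illustrative only.

```python
def perspective_correct(bary, attrs, ws):
    """Perspective-correct interpolation of one scalar attribute (sketch).

    bary  -- screen-space barycentric weights (b0, b1, b2) summing to 1
    attrs -- the attribute's value at each vertex
    ws    -- each vertex's clip-space w
    """
    # attr/w and 1/w vary linearly in screen space, so these are plain
    # multiply-adds (the per-vertex 1/w can be computed once at setup).
    num = sum(b * a / w for b, a, w in zip(bary, attrs, ws))
    inv_w = sum(b / w for b, w in zip(bary, ws))
    # The per-pixel division that moves into the shader core.
    return num / inv_w
```

With equal w at all three vertices this collapses to ordinary linear interpolation. With weights (0.5, 0.5, 0), attribute values (0, 1, 0) and w = (1, 3, 1) it gives 0.25 instead of the affine 0.5. Hardware may well compute the reciprocal of the interpolated 1/w once and multiply each attribute by it, but that is still one reciprocal per pixel.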
     
  4. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    6,805
    Likes Received:
    473
    Directory based with no replication at all of writeable data.
     
  5. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,325
    Likes Received:
    93
    Location:
    San Francisco
    Well, in the short term NVIDIA and/or Intel have the chance to make you happy, although I get the feeling you are quite skeptical...
     
  6. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    10,430
    Likes Received:
    432
    Location:
    New York
    How do you figure that?

    I get 30 * 16 banks * 4 bytes * 600MHz =~ 1.15TB/s
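That figure is a straight peak product. The sketch below reproduces it, assuming every bank services one 4-byte access per clock with no bank conflicts; real throughput would be lower whenever accesses collide. The function name is illustrative.

```python
def peak_onchip_bandwidth(units, banks, bytes_per_bank, clock_hz):
    """Peak bytes/s, assuming every bank serves one access per clock with no conflicts."""
    return units * banks * bytes_per_bank * clock_hz


bw = peak_onchip_bandwidth(30, 16, 4, 600e6)
print(f"{bw / 1e12:.2f} TB/s")  # 1.15 TB/s
```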
     
  7. Arty

    Arty KEPLER
    Veteran

    Joined:
    Jun 16, 2005
    Messages:
    1,906
    Likes Received:
    55
    You also 'reported' that GT300 taped out in Jan-Mar (massive lol) and that it was 512b. I'm waiting for you to backtrack on that bit and conveniently jump on the 384b bandwagon, thanks to the hint from Rys. FYI, you also 'reported' that Cypress had 1200 SPs and whatnot .. :roll:

    Anyone care to reason why the change from 512b to 384b? (Assuming it was 512b to start with.) Does that improve yields? And if it does, is that significant? This also raises an awkward point: irregular memory configurations (Arun?). A 384b GF100 would mean 1.5GB of memory (50% more than Cypress's 1GB), which would be an additional cost, no?

    FYI, if I'm reading CJ correctly, G300/Fermi/GF100 recently returned from a spin.
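Where the 1.5GB comes from: capacity follows from bus width once you fix the device width and density. The sketch assumes one 32-bit-wide GDDR5 device per 32-bit channel and 1Gbit devices; both defaults are assumptions, and other densities or clamshell configurations would change the result.

```python
def framebuffer_size(bus_width_bits, chip_width_bits=32, chip_density_mbit=1024):
    """Return (device count, size in MB), assuming one device per 32-bit channel."""
    chips = bus_width_bits // chip_width_bits
    return chips, chips * chip_density_mbit // 8


for width in (256, 384, 512):
    chips, mb = framebuffer_size(width)
    print(f"{width}-bit: {chips} devices = {mb} MB")
```

Under these assumptions a 384-bit bus gives 12 devices and 1536MB, 1.5x the 1024MB of a 256-bit card; 512Mbit devices would give 768MB instead.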
     
  8. dnavas

    Regular

    Joined:
    Apr 12, 2004
    Messages:
    375
    Likes Received:
    7
    Thanks for the data on RSQ in shaders. Interesting. I figured 5% was about right for divisions in my own code, but then the percentage for transcendentals in my own code would be roughly 0% :) I also figured the number of adds vs. multiplies would put muls at around 10%, but then a lot of adds are address-related and loop-related. I'd be hard-pressed to take an educated guess at the ratio absent those two items, but my stab in the dark would say that there are more adds per mul than there are muls per div. :shrug:

    At any rate, from my limited experience, if I wanted to make my ALUs more generic, DIV would have to be close to the top of my list....

    -Dave
     
  9. LordEC911

    Regular

    Joined:
    Nov 25, 2007
    Messages:
    789
    Likes Received:
    74
    Location:
    'Zona
    Seems like you are going with 384-bit, but your ROPs don't match it. It should be either 24 or 48 ROPs; 48 sounds a bit more likely.
    Not sure how you are getting 233GB/s; 384b with 5GHz (effective) GDDR5 is 240GB/s. It's probably going to have slightly lower clocks, though, to save on power consumption, unless it is heavily bottlenecked by bandwidth.
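Both numbers fall out of simple arithmetic. Peak bandwidth is bus width in bytes times the per-pin data rate. The ROP options assume ROPs are grouped per 64-bit memory partition, with 4 per partition as on G80 (24 ROPs, 384-bit) and GT200 (32 ROPs, 512-bit), or a hypothetical 8 per partition. The function names are illustrative.

```python
def gddr5_bandwidth_gb_s(bus_width_bits, data_rate_gbps):
    """Peak GB/s: bytes per transfer across the bus times the per-pin data rate."""
    return bus_width_bits / 8 * data_rate_gbps


def rop_options(bus_width_bits, partition_bits=64, rops_per_partition=(4, 8)):
    """ROP counts if ROPs are tied to 64-bit memory partitions (assumption)."""
    partitions = bus_width_bits // partition_bits
    return [partitions * r for r in rops_per_partition]


print(gddr5_bandwidth_gb_s(384, 5.0))  # 240.0
print(rop_options(384))                # [24, 48]
```

Working backwards, 233GB/s over 384 bits implies roughly 4.85Gbps per pin.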
     
  10. mapel110

    Newcomer

    Joined:
    Apr 24, 2003
    Messages:
    150
    Likes Received:
    0
    Location:
    Germany
    Let me try. ^^

    40nm DX11
    ~450mm^2
    ~2.8 billion transistors
    2048 MB GDDR5
    320GB/s bandwidth
    512-bit memory interface
    700MHz core / 1600MHz shader / 1250MHz memory
    512 MIMD / 160 TMUs / 32 ROPs at much higher frequency
    170W TDP
    launch November, availability December
    $450 for top model
     
  11. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    8,183
    Likes Received:
    1,840
    Location:
    Finland
    As some others already pointed out, you don't have the best track record either on reliable reports ;)
     
  12. CJ

    CJ
    Regular

    Joined:
    Apr 28, 2004
    Messages:
    816
    Likes Received:
    40
    Location:
    MSI Europe HQ
    Actually, that's not what I said. ;) I said it needs a spin and not that it returned from a spin.
     
  13. Unknown Soldier

    Veteran

    Joined:
    Jul 28, 2002
    Messages:
    2,238
    Likes Received:
    33
    or 27th ;)
     
  14. Arty

    Arty KEPLER
    Veteran

    Joined:
    Jun 16, 2005
    Messages:
    1,906
    Likes Received:
    55
    Which was posted last week in this same thread, unless you like regurgitated news at that link.
     
  15. ninelven

    Veteran

    Joined:
    Dec 27, 2002
    Messages:
    1,702
    Likes Received:
    117
    Clearly wrong... Quick game of find the inconsistencies.
     
  16. Wirmish

    Newcomer

    Joined:
    May 4, 2007
    Messages:
    160
    Likes Received:
    0

    You sure? ---> FiringSquad
     
  17. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,325
    Likes Received:
    93
    Location:
    San Francisco
    Hardly conclusive tests. Increasing the memory clock could simply generate more data transfer errors. You can't tweak a couple of knobs and expect a complex architecture to behave linearly.
     
  18. FUDie

    Regular

    Joined:
    Sep 25, 2002
    Messages:
    581
    Likes Received:
    34
    Except that increasing the engine clock by 9% alone was enough to gain 5%. Increasing the memory clock by 9% as well couldn't get you more than an additional 4%, so the engine clock has more impact than the memory clock.

    -FUDie
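FUDie's numbers can be read as a crude scaling efficiency: the share of a clock increase that shows up as performance. This sketch only does that division. It assumes performance responds roughly linearly to each clock, which is exactly the assumption nAo questions, so treat the ratios as indicative at best.

```python
def scaling_efficiency(perf_gain_pct, clock_gain_pct):
    """Fraction of a clock increase that shows up as performance (1.0 = linear)."""
    return perf_gain_pct / clock_gain_pct


engine = scaling_efficiency(5, 9)  # +9% engine clock -> +5% performance
memory = scaling_efficiency(4, 9)  # +9% memory clock -> at most +4% more
print(f"engine: {engine:.2f}, memory (upper bound): {memory:.2f}")
```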
     
  19. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,325
    Likes Received:
    93
    Location:
    San Francisco
    You clearly haven't read what I wrote about data transfer errors. We are dealing with GDDR5: it won't fail outright, but it will scale badly or even hurt perf. Moreover, no app is entirely ALU-limited or bandwidth-limited; bottlenecks are dynamic and constantly change while rendering a single frame.
    BTW, I'm not just talking about the memory modules 'failing'; the GDDR5 interface can fail as well.
     

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.